Macs in Chemistry

Insanely great science


Data Creator

I have compiled a comprehensive list of data analysis applications, and one of the things I’m occasionally asked for is a test data set that can be used to evaluate an application. Whilst I keep a couple of data sets that I can use, perhaps Data Creator will provide a more comprehensive solution.

Data Creator is an application designed to fill this important niche. It can be used to build very large data sets using field types defined by the user, which are then filled with random, realistic content. When you start the application you are presented with an empty data set, and the controls will be very familiar to anyone who has used Mac OS X applications or preference panes. Simply click on the “+” button (highlighted in red) to add a field, then click on the text box to name the field and choose the field type from the drop-down menu. The fields are automatically added to the right-hand panel.


You can then fill the fields with data one record at a time by clicking the “+” button on the right-hand panel (highlighted in green); if you are not happy with the content you can delete it using the “-” button. This is a nice way to check that the content is what you were expecting. If you decide that you want the fields in a different order, you can just drag the field labels in the right-hand panel into the desired order. When you have added all the fields needed, you can populate the data set with the number of records required by clicking the “Add More Records” button (highlighted in blue). I created a data set with 6 fields and 10,000 records on my laptop in under 30 seconds. Exporting to a tab-delimited text file together with the field names (CSV is also available) then took only a few seconds.
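For readers who want to see roughly what such a generated file looks like, here is a minimal Python sketch that builds a 6-field, 10,000-record data set and writes it as tab-delimited text with a header row. It is only an illustration of the idea, not how Data Creator itself works; the field names, the value lists and the helper functions are all assumptions made up for this example.

```python
import csv
import random
import string

# Hypothetical value lists; Data Creator's own content is not known here.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Erin"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def random_number(rng, length):
    """Return a random integer with exactly `length` digits."""
    low = 10 ** (length - 1)
    return rng.randint(low, 10 ** length - 1)


def random_code(rng, length=4):
    """Return a random string of upper-case letters."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


# Each field is a (name, generator) pair; the order here is the column order.
FIELDS = [
    ("ID", lambda rng: random_number(rng, 6)),
    ("FirstName", lambda rng: rng.choice(FIRST_NAMES)),
    ("Weekday", lambda rng: rng.choice(WEEKDAYS)),
    ("Month", lambda rng: rng.choice(MONTHS)),
    ("Code", random_code),
    ("Active", lambda rng: rng.choice(["Yes", "No"])),
]


def make_records(count, seed=None):
    """Build `count` records, one random value per field."""
    rng = random.Random(seed)
    return [[generate(rng) for _, generate in FIELDS] for _ in range(count)]


def write_tab_delimited(records, handle):
    """Write the field names as a header row, then one line per record."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([name for name, _ in FIELDS])
    writer.writerows(records)
```

To produce a file, open it with `open("test_data.txt", "w", newline="")` and pass the handle to `write_tab_delimited(make_records(10000), handle)`. Changing `delimiter` to a comma gives the CSV equivalent.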

I had a look at the resulting tab delimited file in Aabel and the results were certainly random.

The currently available data types are:

Data Creator makes use of the recent Lion technologies

Data Creator is very straightforward to use, and its performance suggests it can create large data sets without excessive demands on the hardware.

There are a few things that might enhance the application. Adding a numeric field allows you to define the length of the number but not its range (e.g. between 21 and 105), and it would also be useful to have an option to add fixed-point and floating-point non-integral numbers. Whilst you can add days of the week and months of the year, it would be helpful to be able to add dates intelligently, i.e. with a day number that respects the number of days in a particular month. Whilst there are currently 50 different data types, it would also be very useful if users could add their own.
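To make these suggestions concrete, the Python sketch below shows one way such generators could behave: an integer drawn from a fixed range, a non-integral number rounded to a chosen number of decimal places, and a date whose day is valid for its month. These functions are hypothetical illustrations of the requested features, not part of Data Creator.

```python
import calendar
import datetime
import random


def random_int_in_range(rng, low, high):
    """Return a random integer between low and high, both inclusive."""
    return rng.randint(low, high)


def random_decimal(rng, low, high, places):
    """Return a random non-integral number rounded to `places` decimals."""
    return round(rng.uniform(low, high), places)


def random_date(rng, year):
    """Return a random valid date in `year`.

    calendar.monthrange gives the number of days in the chosen month,
    so February never receives a 30th and leap years are respected.
    """
    month = rng.randint(1, 12)
    days_in_month = calendar.monthrange(year, month)[1]
    day = rng.randint(1, days_in_month)
    return datetime.date(year, month, day)
```

For example, `random_int_in_range(random.Random(), 21, 105)` always returns a value from 21 to 105, which is the kind of range control a length-only numeric field cannot offer.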

Updated 14 January 2012