Data Creator
I have compiled a comprehensive list of data analysis applications, and one of the things I’m occasionally asked for is a test data set that can be used to evaluate an application. Whilst I keep a couple of data sets for this purpose, Data Creator may provide a more comprehensive solution.
Data Creator is an application designed to fill this niche. It can be used to build very large data sets using field types defined by the user and then filled with random, realistic content. When you start the application you are presented with an empty data set; the controls will be familiar to anyone who has used Mac OS X applications or preference panes. Simply click the “+” button (highlighted in red) to add a field, then click the text box to name the field and choose the field type from the drop-down menu. Fields are automatically added to the right-hand panel.
You can then fill the fields with data one record at a time by clicking the “+” button on the right-hand panel (highlighted in green); if you are not happy with the content you can delete it using the “-” button. This is a nice way to check that the content is what you were expecting. If you decide that you want the fields in a different order, you can just drag the field labels in the right-hand panel into the desired order. When you have added all the fields needed, you can populate the data set with the number of records required by clicking the “Add More Records” button (highlighted in blue). I created a data set with 6 fields and 10,000 records on my laptop in under 30 seconds; exporting to a tab-delimited text file together with the field names (CSV is also available) then took only a few seconds.
I had a look at the resulting tab delimited file in Aabel and the results were certainly random.
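If you would rather do a quick sanity check without a charting package, a tab-delimited export with a header row can be summarised in a few lines of Python. This is only a minimal sketch and not part of Data Creator; the function name `summarize_tsv` and the sample field names and values are made up for illustration.

```python
import csv
import io


def summarize_tsv(handle):
    """Return (field names, record count, distinct values per field)."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Accessing fieldnames reads the header row of the file.
    fieldnames = list(reader.fieldnames)
    distinct = {name: set() for name in fieldnames}
    count = 0
    for row in reader:
        count += 1
        for name in fieldnames:
            distinct[name].add(row[name])
    return fieldnames, count, {k: len(v) for k, v in distinct.items()}


# Hypothetical sample standing in for an exported file.
sample = "Name\tColor\tCode\nAnna\tRed\tA1\nBruno\tBlue\tB2\nAnna\tGreen\tC3\n"
fields, n_records, uniques = summarize_tsv(io.StringIO(sample))
print(fields, n_records, uniques)
```

For a real export you would pass an open file instead of the `io.StringIO` sample, for example `open(path, newline="")`, where `path` points at the exported file. A field with very few distinct values relative to the record count would suggest a small underlying word list.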
The currently available data types are:
- Alpha lowercase (Fixed length or inside a specific range)
- Alpha Uppercase (Fixed length or inside a specific range)
- Alpha Anycase (Fixed length or inside a specific range)
- Alphanumeric lowercase (Fixed length or inside a specific range)
- Alphanumeric Uppercase (Fixed length or inside a specific range)
- Alphanumeric Anycase (Fixed length or inside a specific range)
- Numeric (Fixed length or inside a specific range)
- Hexadecimal Uppercase (Fixed length or inside a specific range)
- Animals
- Cakes
- Cheeses
- Colors
- Italian Wines
- Countries
- USA States
- French Departments
- Italian Provinces
- Spanish Provinces
- Days of Week
- Months of Year
- USA Streets
- French Streets
- German Streets
- Italian Streets
- Spanish Streets
- English Names (female)
- English Names (male)
- English Names (mixed)
- English Surname (mixed)
- English Names (female) + Surname
- English Names (male) + Surname
- English Names (mixed) + Surname
- French Names (female)
- French Names (male)
- French Names (mixed)
- French Surname (mixed)
- French Names (female) + Surname
- French Names (male) + Surname
- French Names (mixed) + Surname
- German Names (female)
- German Names (male)
- German Names (mixed)
- German Surname (mixed)
- German Names (female) + Surname
- German Names (male) + Surname
- German Names (mixed) + Surname
- Italian Names (female)
- Italian Names (male)
- Italian Names (mixed)
- Italian Surname (mixed)
- Italian Names (female) + Surname
- Italian Names (male) + Surname
- Italian Names (mixed) + Surname
- Spanish Names (female)
- Spanish Names (male)
- Spanish Names (mixed)
- Spanish Surname (mixed)
- Spanish Names (female) + Surname
- Spanish Names (male) + Surname
- Spanish Names (mixed) + Surname
Data Creator makes use of the recent Lion technologies:
- Resume – the app reopens in the state the user left it, including open documents and unsaved ones
- Auto save – the app saves using the Mac OS X Lion autosave functionality
- Versions – the user can browse previous versions of a document and restore any earlier one (select Revert to Saved from the menu, or use the down arrow at the right of the window title, which appears only when you move the cursor over it)
Data Creator is very straightforward to use and the performance would suggest it is able to create large data sets without excessive demands on the hardware.
There are a few things that might enhance the application. Numeric fields let you define the length of the number but not a range of values (e.g. between 21 and 105), and it would also be useful to have an option to add fixed-point and floating-point non-integral numbers. Whilst you can add days of the week and months of the year, it would be helpful to be able to add dates intelligently, i.e. with a day number that respects the number of days in the particular month. Whilst there are currently 50 different data types, it would also be very useful if users could add their own.
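Until such options exist, columns of this kind are easy to generate separately and paste into an exported file. The Python sketch below illustrates the three wished-for field types: an integer within a value range, a fixed-point number, and a date whose day respects the length of its month. It is an illustration only, not how Data Creator works; the function names and the example range are assumptions made here.

```python
import calendar
import datetime
import random


def random_int_in_range(rng, low, high):
    """Integer drawn uniformly from low..high inclusive."""
    return rng.randint(low, high)


def random_fixed_point(rng, low, high, places=2):
    """Non-integral number between low and high, rounded to `places` decimals."""
    return round(rng.uniform(low, high), places)


def random_date(rng, year):
    """Valid calendar date in `year`; the day never exceeds the month's length."""
    month = rng.randint(1, 12)
    # monthrange returns (weekday of the 1st, number of days in the month),
    # so February gets 29 days in leap years and 28 otherwise.
    days_in_month = calendar.monthrange(year, month)[1]
    day = rng.randint(1, days_in_month)
    return datetime.date(year, month, day)


rng = random.Random(42)  # seeded so that a run can be repeated
for _ in range(3):
    print(
        random_int_in_range(rng, 21, 105),
        random_fixed_point(rng, 21, 105),
        random_date(rng, 2012).isoformat(),
        sep="\t",
    )
```

Each printed line is one tab-separated record, so the output can be redirected to a file and merged with a Data Creator export in a spreadsheet.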
Updated 14 January 2012