Top 12 Unix commands for data scientists.
A really useful post on KDnuggets.
With its beautiful, intuitive interface, it is sometimes easy to forget that Mac OS X has Unix underpinnings and that the Terminal gives access to a whole set of invaluable tools.
This post is a short overview of a dozen command-line tools for Unix-like operating systems that can be useful for data science tasks. The list does not include general file management commands (pwd, ls, mkdir, rm, ...) or remote session management tools (rsh, ssh, ...). Instead, it is made up of utilities that are useful from a data science perspective, generally those concerned with inspecting and processing data. All of them are included in a typical Unix-like operating system.
If you regularly have to deal with very large data files, some of these commands will be invaluable. For example:
head outputs the first n lines of a file (10, by default) to standard output. The number of lines displayed can be set with the -n option.
head -n 5 myfile.txt
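To make this concrete, here is a minimal sketch of how head might be used to peek at a data file without loading all of it. The file name sample.csv and its contents are made up for illustration and are not from the original post.

```shell
# Create a small sample file to experiment on (hypothetical name and contents).
printf 'id,value\n1,10\n2,20\n3,30\n' > sample.csv

# Print only the first line, a quick way to check a CSV's column names.
head -n 1 sample.csv

# Print the header plus the first data row.
head -n 2 sample.csv

# With no -n option, head prints up to the first 10 lines
# (here the whole file, since it has only 4 lines).
head sample.csv
```

Because head stops reading once it has printed the requested lines, this kind of check stays fast even on very large files.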