The latest update to the CRAN R archive brings the total number of packages to 9004.
2016-08-22: 9000 packages
2016-02-29: 8000 packages
2015-08-12: 7000 packages
2014-10-29: 6000 packages
2013-11-08: 5000 packages
2012-08-23: 4000 packages
2011-05-12: 3000 packages
2009-10-04: 2000 packages
2007-04-12: 1000 packages
2004-10-01: 500 packages
2003-04-01: 250 packages
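For a rough sense of how the pace has changed, the gaps between milestones can be worked out with a few lines of Python. This is only a sketch: the dates and counts are copied from the list above, and `growth_intervals` is a name made up for illustration.

```python
from datetime import date

# Milestone dates and package counts copied from the list above (oldest first).
milestones = [
    (date(2003, 4, 1), 250),
    (date(2004, 10, 1), 500),
    (date(2007, 4, 12), 1000),
    (date(2009, 10, 4), 2000),
    (date(2011, 5, 12), 3000),
    (date(2012, 8, 23), 4000),
    (date(2013, 11, 8), 5000),
    (date(2014, 10, 29), 6000),
    (date(2015, 8, 12), 7000),
    (date(2016, 2, 29), 8000),
    (date(2016, 8, 22), 9000),
]


def growth_intervals(points):
    """Return (start_count, end_count, days, packages_per_day) per consecutive pair."""
    rows = []
    for (d0, n0), (d1, n1) in zip(points, points[1:]):
        days = (d1 - d0).days
        rows.append((n0, n1, days, (n1 - n0) / days))
    return rows


for n0, n1, days, rate in growth_intervals(milestones):
    print(f"{n0:>5} -> {n1:<5} {days:>5} days  {rate:.2f} packages/day")
```

The milestones only record when a round number was passed, so the rates are averages over each interval rather than exact daily figures.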
There is a listing of data analysis tools for Mac OS X here.
I just thought I'd flag a paper in the Journal of Cheminformatics, "RRegrs: an R package for computer-aided model selection with multiple regression models" (DOI).
We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending on the caret package.
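For anyone unfamiliar with the validation schemes mentioned in the abstract, here is a minimal Python sketch of how repeated k-fold and leave-one-out splits can be generated. It is not how RRegrs or caret implement them; the function names, the seed and the default of three repeats are all assumptions made for illustration.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repeat
        folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
        for f, test in enumerate(folds):
            held_out = set(test)
            train = sorted(i for i in idx if i not in held_out)
            yield r, f, train, sorted(test)


def leave_one_out(n_samples):
    """Yield (train_idx, test_idx); each sample is held out exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


# Example: 25 samples, 10 folds, 2 repeats gives 20 train/test splits.
splits = list(repeated_kfold(25, k=10, repeats=2))
print(len(splits))
```

A model would be fitted on each `train` set and scored on the matching `test` set, with the scores averaged over all folds and repeats.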
The results of the 16th annual KDnuggets Software Poll on data analysis tools are in.
The top 10 tools by share of users were:
R, 46.9% (38.5% in 2014, 37% in 2013)
RapidMiner, 31.5% (44.2% in 2014, 39% in 2013)
SQL, 30.9% (25.3% in 2014, NA in 2013)
Python, 30.3% (19.5% in 2014, 13% in 2013)
Excel, 22.9% (25.8% in 2014, 28% in 2013)
KNIME, 20.0% (15.0% in 2014, 6% in 2013)
Hadoop, 18.4% (12.7% in 2014, 9% in 2013)
Tableau, 12.4% (9.1% in 2014, NA in 2013)
SAS, 11.3% (10.9% in 2014, 10.7% in 2013)
Spark, 11.3% (2.6% in 2014, NA in 2013)
The results very much reflect my own interactions: whilst R has a significant installed user base and, of course, a vast repository of open source packages, Python seems to be gaining traction, certainly in part because it seems to have become the lingua franca for scientific computing.
Whilst R is a very comprehensive statistical and data analysis package, it does have a very steep learning curve.
R Instructor is an iPhone, iPad and iPod Touch application that uses plain, non-technical language and over 30 videos to explain how to make and modify plots, manage data and conduct both parametric and non-parametric statistical tests.
Now added to the mobile science site.
ChemmineOB provides an R interface to a subset of cheminformatics functionalities implemented by the OpenBabel C++ project. OpenBabel is an open source cheminformatics toolbox that includes utilities for structure format interconversions, descriptor calculations, compound similarity searching and more. ChemmineOB aims to make a subset of these utilities available from within R. For non-developers, ChemmineOB is primarily intended to be used from ChemmineR as an add-on package rather than used directly.
I just noticed that there is an update to R on the CRAN website.
This binary distribution of R and the GUI supports 64-bit Intel based Macs on Mac OS X 10.6 (Snow Leopard) or higher. Since R 3.0.0 the binary is a single-arch build and contains only the x86_64 (64-bit Intel) architecture. PowerPC Macs and 32-bit Macs are only supported by building from sources or by older binary R versions. The default package type is "mac.binary" and the binary repository layout has changed accordingly.
I’m not a big user of R, a free software environment for statistical computing and graphics, but occasionally I notice cheminformatics modules being published. The latest issue of Bioinformatics (DOI) has a paper describing “fmcsR: Mismatch Tolerant Maximum Common Substructure Searching in R”.
The fmcsR package provides an R interface, with the time consuming steps of the FMCS algorithm implemented in C++. It includes utilities for pairwise compound comparisons, structure similarity searching, clustering and visualization of MCSs. In comparison to an existing MCS tool, fmcsR shows better time performance over a wide range of compound sizes. When mismatching of atoms or bonds is turned on, the compute times increase as expected, and the resulting FMCSs are often substantially larger than their strict MCS counterparts. Based on extensive virtual screening (VS) tests, the flexible matching feature enhances the enrichment of active structures at the top of MCS-based similarity search results. With respect to overall and early enrichment performance, FMCS outperforms most of the seven other VS methods considered in these tests.
fmcsR is freely available for all common operating systems from the Bioconductor site http://www.bioconductor.org/packages/devel/bioc/html/fmcsR.html.
ChemmineR, a cheminformatics package for analyzing drug-like small molecule data in R, was recently updated. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.
To install, start R and enter
R, the language and environment for statistical computing and graphics, has now reached version 3.0.0.
Whilst there is a list of new features and updates, those listed as most significant are shown below.
- Packages need to be (re-)installed under this version (3.0.0) of R.
- There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.
- It is now possible for 64-bit builds to allocate amounts of memory limited only by the OS. It may be wise to use OS facilities (e.g. ulimit in a bash shell, limit in csh), to set limits on overall memory consumption of an R process, particularly in a multi-user environment. A number of packages need a limit of at least 4GB of virtual memory to load. 64-bit Windows builds of R are by default limited in memory usage to the amount of RAM installed: this limit can be changed by command-line option --max-mem-size or setting environment variable R_MAX_MEM_SIZE.
- Negative numbers for colours are consistently an error: previously they were sometimes taken as transparent, sometimes mapped into the current palette and sometimes an error.
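The index change and the memory note in the list above are related, and a little arithmetic shows why. The Python sketch below assumes that R integers are signed 32-bit values and that each numeric (double) element takes 8 bytes; the function names are made up for illustration and the vector header is ignored.

```python
# Assumption: R integers are signed 32-bit, and a numeric element is an 8-byte double.
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
DOUBLE_BYTES = 8


def is_long_vector_index(index):
    """True when a 1-based index no longer fits in a signed 32-bit integer."""
    return index > INT32_MAX


def numeric_vector_gib(length):
    """Approximate data size in GiB of a double vector of the given length."""
    return length * DOUBLE_BYTES / 2**30


print(is_long_vector_index(2**31 - 1))  # False: still an ordinary index
print(is_long_vector_index(2**31))      # True: only valid for long vectors
print(numeric_vector_gib(2**31))        # 16.0
```

So an assignment like x[2^31] <- y on a numeric vector asks for roughly 16 GiB, which is a good reason to set a memory limit on shared machines.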
Wizard, the point-and-click statistical analysis application for Mac, has been updated.
The focus of this release is supporting several new import formats, including the oft-requested XLSX and Numbers document formats.
A major change in the product line is that reading and writing R files and generating R code has now "graduated" from the Pro version and is now available in the Standard version. But Pro users shouldn't feel left out: this release adds support for importing binary SAS files and generating SAS code, both features only available in the Pro version.
- Import XLSX spreadsheets
- Import Numbers documents
New Features (Pro Version):
- Import SAS binary files (.sas7bdat)
- Import plain-text data with SAS commands (.sas)
- Generate SAS model estimation commands
New Features (Standard Version):
- Import/export R files
- Generate R commands
Bug Fixes:
- Fix a crash when zero observations are included in the Model view
- Fix a bug when importing multiple sheets in XLS documents
- Fix a bug where Q-Q plots were not properly exported as PDF
KNIME 2.7 has been released.
KNIME now runs on Java 7 for Windows and Linux systems (Mac stays on Java 6). The update to Eclipse 3.7 increases stability on Mac and some Linux systems. BIRT 3.7 brings Open Office support, among other new features.
JFreeChart nodes now have more settings in the “General Plot Options” tab of their configuration window.
Under R -> Local there are a number of new nodes for moving data between KNIME and R:
- “Table to R” can read a KNIME table into R and output the R workspace.
- “R to Table” takes an R workspace and outputs a KNIME table.
- “R +Data to R” takes an R workspace and optional data input and outputs an R workspace.
- “R to R-View” takes an R workspace and outputs a KNIME view
There is a KNIME tutorial here