A complete guide to the K-means clustering algorithm
A little while back I compared different options for clustering large datasets of molecules.
Clustering is an invaluable cheminformatics technique for subdividing a typically large compound collection into small groups of similar compounds. One advantage is that once the collection has been clustered, you can store the cluster identifiers and refer to them later; this is particularly valuable when dealing with very large datasets. Clustering is often used in the analysis of high-throughput screening results, or of virtual screening and docking studies.
One popular (and quick) technique for clustering is K-means clustering. I just came across this very useful explanation of K-means clustering; it is well worth a read.
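To make the idea concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping when nothing changes. The function names (`kmeans`, `squared_distance`), the compound names and the 2-D descriptor values are all hypothetical and for illustration only. It also assumes each molecule has already been reduced to a numeric descriptor vector compared by Euclidean distance; real molecular clustering often works on fingerprints with other similarity measures, so treat this as a sketch of the technique rather than a production workflow.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm; returns (centroids, labels). Hypothetical helper."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids by picking k distinct input points at random
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


# Hypothetical 2-D descriptor vectors for six compounds (illustrative only)
compounds = {
    "cpd1": (0.0, 0.0), "cpd2": (0.0, 1.0), "cpd3": (1.0, 0.0),
    "cpd4": (10.0, 10.0), "cpd5": (10.0, 11.0), "cpd6": (11.0, 10.0),
}
names = list(compounds)
centroids, labels = kmeans([compounds[n] for n in names], k=2)
# Store the cluster identifiers so they can be looked up later
cluster_ids = dict(zip(names, labels))
print(cluster_ids)
```

The `cluster_ids` dictionary is the kind of stored cluster identifier described above. Two design caveats apply: K-means needs the number of clusters `k` chosen in advance, and the result can depend on the random initial centroids, so in practice it is common to run it from several seeds and keep the best solution.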