CSFP - A New Molecular Fingerprint


An interesting paper has appeared in JCIM: "Connected Subgraph Fingerprints: Representing Molecules Using Exhaustive Subgraph Enumeration" (DOI).

The very popular ECFP fingerprint enumerates all circular substructures, i.e. substructures with a central atom and a spherical extension around it. However, not all chemically reasonable substructures are circular. They can be shaped as paths, cycles or any other irregular form and consequently cannot be represented as single features in ECFPs. To overcome these limitations, we developed a novel algorithm named CONSENS that systematically enumerates all connected substructures within given size limits. CONSENS is the central element of a novel fingerprint named CSFP (Connected Subgraph Fingerprint). CSFPs not only represent a richer set of substructures but also allow fine-grained control over the encoded chemical model.
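To make the circular-versus-connected distinction concrete, here is a small brute-force sketch in C++. It is not CONSENS and not the CSFP implementation. It assumes a substructure is just a set of atom indices (no bond or atom typing), treats an ECFP-style environment simply as "all atoms within r bonds of a center", and finds connected subsets by testing every bitmask, so it only suits tiny graphs. The helper names (`isConnected`, `ball`, `connectedSubsets`, `circularEnvironments`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Molecular graph as adjacency lists: g[i] holds the atoms bonded to atom i.
using Graph = std::vector<std::vector<int>>;

// True if the atoms whose bits are set in `mask` induce a connected subgraph.
bool isConnected(const Graph& g, std::uint32_t mask) {
    if (mask == 0) return false;
    int start = 0;
    while (((mask >> start) & 1u) == 0) ++start;  // lowest atom in the subset
    std::uint32_t seen = 1u << start;
    std::vector<int> stack{start};
    while (!stack.empty()) {
        int v = stack.back();
        stack.pop_back();
        for (int w : g[v]) {
            const std::uint32_t bit = 1u << w;
            if ((mask & bit) && !(seen & bit)) {  // only walk inside the subset
                seen |= bit;
                stack.push_back(w);
            }
        }
    }
    return seen == mask;
}

// Simplified ECFP-style environment: all atoms within `radius` bonds of `center`.
std::uint32_t ball(const Graph& g, int center, int radius) {
    std::uint32_t visited = 1u << center;
    std::uint32_t frontier = visited;
    for (int r = 0; r < radius; ++r) {
        std::uint32_t next = 0;
        for (int v = 0; v < static_cast<int>(g.size()); ++v)
            if ((frontier >> v) & 1u)
                for (int w : g[v]) next |= 1u << w;
        frontier = next & ~visited;  // atoms first reached at distance r + 1
        visited |= next;
    }
    return visited;
}

// Every connected atom subset with at most `maxSize` atoms (brute force over all masks).
std::vector<std::uint32_t> connectedSubsets(const Graph& g, int maxSize) {
    assert(g.size() < 32 && "bitmask sketch only handles tiny graphs");
    const std::uint32_t limit = 1u << g.size();
    std::vector<std::uint32_t> out;
    for (std::uint32_t mask = 1; mask < limit; ++mask) {
        int size = 0;
        for (std::uint32_t m = mask; m != 0; m &= m - 1) ++size;  // popcount
        if (size <= maxSize && isConnected(g, mask)) out.push_back(mask);
    }
    return out;
}

// Distinct circular environments over all centers and radii 0..maxRadius.
std::set<std::uint32_t> circularEnvironments(const Graph& g, int maxRadius) {
    std::set<std::uint32_t> envs;
    for (int c = 0; c < static_cast<int>(g.size()); ++c)
        for (int r = 0; r <= maxRadius; ++r) envs.insert(ball(g, c, r));
    return envs;
}
```

For the heavy-atom graph of isobutane (central carbon 0 bonded to carbons 1, 2 and 3), `connectedSubsets(g, 4)` returns 11 sets while `circularEnvironments(g, 2)` yields only 8. The three missing ones are the three-carbon paths such as {1, 0, 2}, which no single center and radius captures in this simplified model.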

ConsensLib is a header-only C++ library for efficient enumeration of connected induced subgraphs. The CONSENS (Connected Subgraph Enumeration Strategy) algorithm enumerates all node sets that form a connected subgraph of a given query graph. It is available on GitHub.
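The sketch below shows what "enumerating all node sets that form a connected subgraph" means in practice. It does not reproduce CONSENS or use ConsensLib's API; instead it uses Wernicke's ESU scheme, a well-known method that lists every connected induced subgraph up to a size limit exactly once. The function names `extendSubgraph` and `enumerateConnectedSubgraphs` are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Graph as adjacency lists: g[i] holds the neighbours of node i.
using Graph = std::vector<std::vector<int>>;

// One ESU search-tree node: `sub` is connected, `ext` holds candidate nodes to add.
void extendSubgraph(const Graph& g, int root, int maxSize, std::vector<int>& sub,
                    std::vector<bool>& inSub, std::vector<int> ext,
                    std::vector<std::vector<int>>& out) {
    out.push_back(sub);  // each tree node is a distinct connected node set
    if (static_cast<int>(sub.size()) == maxSize) return;
    while (!ext.empty()) {
        const int w = ext.back();
        ext.pop_back();  // w is not offered again to later sibling branches
        // Next candidates: the remaining ones plus w's exclusive neighbours,
        // i.e. nodes above `root` that are neither in `sub` nor adjacent to it.
        std::vector<int> next = ext;
        for (int u : g[w]) {
            if (u <= root || inSub[u]) continue;
            bool exclusive = true;
            for (int x : g[u])
                if (inSub[x]) { exclusive = false; break; }
            if (exclusive) next.push_back(u);
        }
        sub.push_back(w);
        inSub[w] = true;
        extendSubgraph(g, root, maxSize, sub, inSub, std::move(next), out);
        sub.pop_back();
        inSub[w] = false;
    }
}

// All connected induced subgraphs with 1..maxSize nodes, each listed exactly once.
std::vector<std::vector<int>> enumerateConnectedSubgraphs(const Graph& g, int maxSize) {
    assert(maxSize >= 1 && "maxSize must be at least 1");
    std::vector<std::vector<int>> out;
    std::vector<bool> inSub(g.size(), false);
    std::vector<int> sub;
    for (int v = 0; v < static_cast<int>(g.size()); ++v) {
        std::vector<int> ext;  // only neighbours above v, so v is the set's minimum
        for (int u : g[v])
            if (u > v) ext.push_back(u);
        sub.push_back(v);
        inSub[v] = true;
        extendSubgraph(g, v, maxSize, sub, inSub, std::move(ext), out);
        sub.pop_back();
        inSub[v] = false;
    }
    return out;
}
```

On a four-node chain (0-1-2-3) with `maxSize = 4` this returns the 10 contiguous node sets. Each set is reached only from its smallest node, through a unique sequence of exclusive-neighbour extensions, which is why no separate deduplication step is needed.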

