rdkit.Chem.rdSynthonSpaceSearch module

Module containing implementation of SynthonSpace search of Synthon-based chemical libraries such as Enamine REAL. NOTE: This functionality is experimental and the API and/or results may change in future releases.

rdkit.Chem.rdSynthonSpaceSearch.ConvertTextToDBFile((str)inFilename, (str)outFilename[, (AtomPairsParameters)fpGen=None]) None :

Convert the text file into the binary DB file in our format. Assumes that all synthons from a reaction are contiguous in the input file. This uses a lot less memory than using ReadTextFile() followed by WriteDBFile().

- inFilename: the name of the text file
- outFilename: the name of the binary file
- fpGen: optional fingerprint generator

C++ signature :

void ConvertTextToDBFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > [,boost::python::api::object=None])
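
As a rough illustration of the conversion described above, the sketch below converts a text file and then loads the resulting binary file with ReadDBFile(). The helper name `convert_and_load` and its arguments are hypothetical; only the RDKit calls come from this module's documentation. The import is done inside the function so that the snippet can be loaded even where RDKit is not installed.

```python
def convert_and_load(text_path: str, db_path: str):
    """Convert a synthon text file to binary format, then load it.

    `convert_and_load` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so this snippet can be loaded without RDKit present.
    from rdkit.Chem import rdSynthonSpaceSearch

    # Stream the text file straight into the binary format. The input is
    # assumed to keep all synthons of a reaction contiguous.
    rdSynthonSpaceSearch.ConvertTextToDBFile(text_path, db_path)

    # Load the binary file into a fresh SynthonSpace.
    space = rdSynthonSpaceSearch.SynthonSpace()
    space.ReadDBFile(db_path)
    return space
```

A call such as `convert_and_load("synthons.txt", "synthons.spc")` would use placeholder file names; substitute the paths to your own library files.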

rdkit.Chem.rdSynthonSpaceSearch.FormattedIntegerString((int)value) str :

Formats an integer with a space every 3 digits for ease of reading.

C++ signature :

std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > FormattedIntegerString(long)
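
To make the described format concrete, here is a pure-Python sketch of the same idea. It is not the RDKit implementation, and the function name `format_integer_with_spaces` is made up for this example; the handling of negative values in particular is an assumption.

```python
def format_integer_with_spaces(value: int) -> str:
    """Group the digits of an integer in threes, separated by spaces."""
    # Python's "," format spec groups digits in threes; swap commas for spaces.
    return f"{value:,}".replace(",", " ")


print(format_integer_with_spaces(1234567))  # prints "1 234 567"
```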

class rdkit.Chem.rdSynthonSpaceSearch.SubstructureResult

Bases: instance

Used to return results of SynthonSpace searches.

This class cannot be instantiated from Python; attempting to do so raises an exception.

GetCancelled((SubstructureResult)arg1) bool :

Returns whether the search was cancelled or not.

C++ signature :

bool GetCancelled(RDKit::SynthonSpaceSearch::SearchResults {lvalue})

GetHitMolecules((SubstructureResult)self) list :

Returns the hit molecules from the search.

C++ signature :

boost::python::list GetHitMolecules(RDKit::SynthonSpaceSearch::SearchResults)

GetMaxNumResults((SubstructureResult)arg1) int :

The upper bound on number of results possible. There may be fewer than this in practice for several reasons such as duplicate reagent sets being removed or the final product not matching the query even though the synthons suggested they would.

C++ signature :

unsigned long GetMaxNumResults(RDKit::SynthonSpaceSearch::SearchResults {lvalue})

GetTimedOut((SubstructureResult)arg1) bool :

Returns whether the search timed out or not.

C++ signature :

bool GetTimedOut(RDKit::SynthonSpaceSearch::SearchResults {lvalue})

class rdkit.Chem.rdSynthonSpaceSearch.SynthonSpace((object)arg1)

Bases: instance

SynthonSpaceSearch object.

C++ signature :

void __init__(_object*)

BuildSynthonFingerprints((SynthonSpace)self, (FingerprintGenerator64)fingerprintGenerator) None :

Build the synthon fingerprints ready for similarity searching. This is done automatically when the first similarity search is done, but if converting a text file to binary format it might need to be done explicitly.

C++ signature :

void BuildSynthonFingerprints(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},RDKit::FingerprintGenerator<unsigned long>)

FingerprintSearch((SynthonSpace)self, (Mol)query, (AtomPairsParameters)fingerprintGenerator[, (AtomPairsParameters)params=None]) SubstructureResult :

Does a fingerprint search in the SynthonSpace using the FingerprintGenerator passed in.

C++ signature :

RDKit::SynthonSpaceSearch::SearchResults FingerprintSearch(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},RDKit::ROMol,boost::python::api::object [,boost::python::api::object=None])
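
A minimal sketch of how a fingerprint search might be set up is shown below. The helper name `fingerprint_search` and its arguments are hypothetical. The Morgan generator comes from `rdkit.Chem.rdFingerprintGenerator`, which is outside this module, and the choice of radius 2 is only an example.

```python
def fingerprint_search(db_path: str, query_smiles: str,
                       cutoff: float = 0.5, max_hits: int = 100):
    """Run a fingerprint similarity search and return the hit molecules.

    `fingerprint_search` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so this snippet can be loaded without RDKit present.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator, rdSynthonSpaceSearch

    query = Chem.MolFromSmiles(query_smiles)
    if query is None:
        raise ValueError(f"Could not parse SMILES: {query_smiles!r}")

    space = rdSynthonSpaceSearch.SynthonSpace()
    space.ReadDBFile(db_path)

    params = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
    params.similarityCutoff = cutoff
    params.maxHits = max_hits

    # Example generator choice; any fingerprint generator could be passed.
    fp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
    results = space.FingerprintSearch(query, fp_gen, params)
    return results.GetHitMolecules()
```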

GetNumProducts((SynthonSpace)self) int :

Returns number of products in the SynthonSpace, with multiple counting of any duplicates.

C++ signature :

unsigned long GetNumProducts(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue})

GetNumReactions((SynthonSpace)self) int :

Returns number of reactions in the SynthonSpace.

C++ signature :

unsigned long GetNumReactions(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue})

GetSynthonFingerprintType((SynthonSpace)self) str :

Returns the information string for the fingerprint generator used to create this space.

C++ signature :

std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > GetSynthonFingerprintType(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue})

RascalSearch((SynthonSpace)self, (Mol)query, (AtomPairsParameters)rascalOptions[, (AtomPairsParameters)params=None]) SubstructureResult :

Does a search using the Rascal similarity score. The similarity threshold used is provided by rascalOptions, and the one in params is ignored.

C++ signature :

RDKit::SynthonSpaceSearch::SearchResults RascalSearch(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},RDKit::ROMol,boost::python::api::object [,boost::python::api::object=None])

ReadDBFile((SynthonSpace)self, (str)inFile[, (int)numThreads=1]) None :

Reads a binary database file. Takes an optional number of threads, default=1.

C++ signature :

void ReadDBFile(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > [,int=1])

ReadTextFile((SynthonSpace)self, (str)inFile) None :

Reads text file of the sort used by ChemSpace/Enamine.

C++ signature :

void ReadTextFile(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)

SubstructureSearch((SynthonSpace)self, (Mol)query[, (AtomPairsParameters)substructMatchParams=None[, (AtomPairsParameters)params=None]]) SubstructureResult :

Does a substructure search in the SynthonSpace.

C++ signature :

RDKit::SynthonSpaceSearch::SearchResults SubstructureSearch(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},RDKit::ROMol [,boost::python::api::object=None [,boost::python::api::object=None]])

SubstructureSearch( (SynthonSpace)self, (object)query [, (AtomPairsParameters)substructMatchParams=None [, (AtomPairsParameters)params=None]]) -> SubstructureResult :

Does a substructure search in the SynthonSpace using an extended query.

C++ signature :

RDKit::SynthonSpaceSearch::SearchResults SubstructureSearch(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},RDKit::GeneralizedSubstruct::ExtendedQueryMol [,boost::python::api::object=None [,boost::python::api::object=None]])
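
The following sketch shows one way a substructure search could be wired together using the methods documented in this module. The helper name `substructure_search` and its arguments are hypothetical, and the database path is whatever binary file you have built.

```python
def substructure_search(db_path: str, smarts: str, max_hits: int = 10):
    """Search a SynthonSpace for a SMARTS query.

    Returns the hit molecules and the upper bound on the number of results.
    `substructure_search` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so this snippet can be loaded without RDKit present.
    from rdkit import Chem
    from rdkit.Chem import rdSynthonSpaceSearch

    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"Could not parse SMARTS: {smarts!r}")

    space = rdSynthonSpaceSearch.SynthonSpace()
    space.ReadDBFile(db_path)

    params = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
    params.maxHits = max_hits

    results = space.SubstructureSearch(query, params=params)
    if results.GetTimedOut():
        # The hit list may be incomplete if the time limit was reached.
        print("Search timed out; results may be incomplete.")
    return results.GetHitMolecules(), results.GetMaxNumResults()
```

Returning GetMaxNumResults() alongside the hits lets the caller see how many hits were possible in principle compared with how many were actually built.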

Summarise((SynthonSpace)self) None :

Writes a summary of the SynthonSpace to stdout.

C++ signature :

void Summarise(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue})

WriteDBFile((SynthonSpace)self, (str)outFile) None :

Writes binary database file.

C++ signature :

void WriteDBFile(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)

WriteEnumeratedFile((SynthonSpace)self, (str)outFile) None :

Writes enumerated library to file.

C++ signature :

void WriteEnumeratedFile(RDKit::SynthonSpaceSearch::SynthonSpace {lvalue},std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)

class rdkit.Chem.rdSynthonSpaceSearch.SynthonSpaceSearchParams((object)arg1)

Bases: instance

SynthonSpaceSearch parameters.

C++ signature :

void __init__(_object*)

property approxSimilarityAdjuster

The fingerprint search uses an approximate similarity method before building a product and doing a final check. The similarityCutoff is reduced by this value for the approximate check. A lower value will give faster run times at the risk of missing some hits. The value you use should have a positive correlation with your FOMO. The default of 0.1 is appropriate for Morgan fingerprints. With RDKit fingerprints, 0.05 is adequate, and higher than that has been seen to produce long run times.
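
The arithmetic described above can be sketched as follows. This is only an illustration of how the two parameters relate, not the library's code, and the function name `approximate_cutoff` is made up for this example.

```python
def approximate_cutoff(similarity_cutoff: float = 0.5,
                       approx_similarity_adjuster: float = 0.1) -> float:
    """Threshold used for the approximate check, per the description above."""
    # The approximate check is looser than the final check by the adjuster.
    return similarity_cutoff - approx_similarity_adjuster


# With the documented defaults the approximate check uses roughly 0.4.
print(round(approximate_cutoff(), 2))
# A smaller adjuster gives a stricter approximate check.
print(round(approximate_cutoff(0.5, 0.05), 2))
```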

property buildHits

If false, reports the maximum number of hits that the search could produce, but doesn’t return them.

property fragSimilarityAdjuster

Similarities of fragments are generally low due to low bit densities. For the fragment matching, reduce the similarity cutoff by this amount. Default=0.1.

property hitStart

The sequence number of the hit to start from. This lets you return the next batch of hits from a search after having already obtained the earlier ones. Default=0.

property maxHits

The maximum number of hits to return. Default=1000. Use -1 for no maximum.
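
To illustrate how hitStart and maxHits might combine to page through results, here is a pure-Python sketch over an ordinary list. It assumes hitStart is a zero-based offset, which the default of 0 suggests but the text does not state. The function name `select_hit_window` is hypothetical and this is not the library's implementation.

```python
def select_hit_window(all_hits, hit_start: int = 0, max_hits: int = 1000):
    """Return the slice of hits selected by a start offset and a maximum."""
    if max_hits == -1:
        # -1 means no maximum: take everything from hit_start onwards.
        return list(all_hits[hit_start:])
    if max_hits < -1:
        # Other negative values are not described in the documentation.
        raise ValueError("max_hits must be -1 or non-negative")
    return list(all_hits[hit_start:hit_start + max_hits])


hits = list(range(10))
print(select_hit_window(hits, hit_start=0, max_hits=4))  # first page
print(select_hit_window(hits, hit_start=4, max_hits=4))  # second page
```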

property maxNumFrags

The maximum number of fragments the query can be broken into. Big molecules will create huge numbers of fragments that may cause excessive memory use. If the number of fragments hits this number, fragmentation stops and the search results will likely be incomplete. Default=100000.

property numRandomSweeps

The random sampling doesn’t always produce the required number of hits in one go. This parameter controls how many loops it makes to try to get the hits before giving up. Default=10.

property numThreads

The number of threads to use for search. If > 0, will use that number. If <= 0, will use the number of hardware threads plus this number. So if the number of hardware threads is 8, and numThreads is -1, it will use 7 threads. Default=1.
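
The rule above can be written out as a small pure-Python sketch. The function name `resolve_num_threads` is made up for this example, and the behaviour when the result would be zero or negative is not described in the text, so it is left unhandled here.

```python
import os


def resolve_num_threads(num_threads: int, hardware_threads: int) -> int:
    """Apply the numThreads rule described above."""
    if num_threads > 0:
        # A positive value is used as given.
        return num_threads
    # Zero or negative: offset from the number of hardware threads.
    return hardware_threads + num_threads


# The documented example: 8 hardware threads and numThreads=-1 gives 7.
print(resolve_num_threads(-1, 8))
# On the current machine (os.cpu_count() may return None, hence the fallback).
print(resolve_num_threads(0, os.cpu_count() or 1))
```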

property randomSample

If True, returns a random sample of the hits, up to maxHits in number. Default=False.

property randomSeed

If using randomSample, this seeds the random number generator so as to give reproducible results. Default=-1 means use a random seed.
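
The effect of a fixed seed can be illustrated with Python's standard `random` module. This is a conceptual sketch only: it does not reproduce the sampling that RDKit performs, the function name `sample_hits` is hypothetical, and it assumes a non-negative maximum number of hits.

```python
import random


def sample_hits(all_hits, max_hits: int, random_seed: int = -1):
    """Draw up to max_hits items at random, reproducibly if a seed is given."""
    # Mirror the documented convention: -1 means pick a random seed.
    rng = random.Random(None if random_seed == -1 else random_seed)
    # Never ask for more items than are available.
    k = min(max_hits, len(all_hits))
    return rng.sample(list(all_hits), k)


hits = list(range(100))
# The same seed gives the same sample on every call.
print(sample_hits(hits, 5, random_seed=42) == sample_hits(hits, 5, random_seed=42))
```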

property similarityCutoff

Similarity cutoff for returning hits by fingerprint similarity. At present the fp is hard-coded to be Morgan, bits, radius=2. Default=0.5.

property timeOut

Time limit for the search, in seconds. Default is 600s; 0 means no timeout. Must be an integer.