Chemistry
space refers to the combinatorial and
configurational space spanned by all possible molecules (i.e.
those combinations of atoms allowed by the rules of valence in
energetically stable spatial arrangements). It is estimated that the
total number of possible small organic molecules populating chemistry
space could exceed 10^60
— a number that exceeds the total number of atoms in the known
universe, and is vastly greater than the number of molecules that
have actually been isolated or synthesized.
Chemistry
space is, of course, more than an
uncategorized list of possible molecules. Molecules in chemistry
space are related to each other in several ways: by similarity
relationships, by chemical reaction pathways connecting different
molecules, and in other ways. Chemical reactions allow us to move
from one molecule or chemical structure to another, defining a
chemical reaction network in chemistry space. The
similarity relationships include those of constitutional similarity
(similarity of atoms in the molecule), structural similarity
(similarity of substructures comprising the molecule), similarity of
three-dimensional shape, similarity of chemical properties, or
similarity of effects on the human (or animal) body due to binding to
similar proteins. There are certainly more ways to assess molecular
similarity than there are to skin the proverbial cat. Each similarity
metric can be used to define a pairwise distance between molecules,
which in turn can be used to generate a weighted or unweighted
network. While many of these similarity measures are related to each
other, they are not identical, and thus each will result in a
different network.
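The step from a similarity metric to a network can be illustrated with a minimal sketch. It assumes, purely for illustration, that each molecule is represented by a set of substructure feature ids and that similarity is measured with the Tanimoto coefficient; the molecule names, feature ids and threshold are hypothetical, and any other similarity measure could be substituted.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def similarity_network(fingerprints: dict, threshold: float) -> list:
    """Return edges (mol_i, mol_j, distance) for pairs whose
    similarity is at least `threshold`; distance = 1 - similarity."""
    edges = []
    for (i, fa), (j, fb) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fa, fb)
        if sim >= threshold:
            edges.append((i, j, 1.0 - sim))
    return edges


# Hypothetical toy fingerprints: sets of substructure feature ids.
fingerprints = {
    "mol_A": {1, 2, 3, 4},
    "mol_B": {1, 2, 3, 5},
    "mol_C": {7, 8, 9},
}

edges = similarity_network(fingerprints, threshold=0.5)
```

Keeping the distance on each edge gives a weighted network; dropping it gives the unweighted one. Changing the fingerprint, the similarity function or the threshold changes which edges appear, which is why each metric yields a different network.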
The
topological characteristics of these chemistry
space networks are of considerable
interest, both for fundamental reasons and for practical applications
to drug design. But the enormous size of chemistry space makes its
thorough exploration impossible. Thus a key question in drug design
is how to optimally direct research efforts towards regions of
chemistry space that are most likely to contain molecules with useful
biological activity. The regions of chemistry space that have been
mapped through experimental investigations are extremely limited and
constitute an obviously biased sample. Chemists isolate, synthesize
and study molecules for a variety of reasons, which include but are
not limited to novelty, structural diversity, similarity to known
drug leads, availability of source materials, unusual properties,
peer pressure, etc.
Thus it is not clear a priori
whether different regions of chemistry space or chemistry spaces
constructed using different similarity metrics should have any common
characteristics or whether the network topology of chemistry spaces
should be more similar to biological networks or to social networks.
Not
all chemical spaces are created equal!
Relating
chemical similarity to similarity in biological activity produced by
the molecules introduces yet another level of complication [1].
Changes in biological activities resulting from changes in molecular
structure are described by chemists through structure-activity
relationships. The fundamental assumption implicit in such studies is
that similar molecules should exhibit similar activities in
biological assays — this is known as the similarity principle [2].
(More generally, while similar molecules may not always exhibit
similar activities in individual biological assays, similar molecules
do display similar broad patterns of biological activities across a
range of related protein targets [3-6].) Yet significant deviations
from the similarity principle have been observed: very similar
molecules often exhibit very different biological activities [2].
This is one of the major reasons for the failure of
structure-activity relationship models [7]. Gerry
Maggiora postulated that such deviations arise on account of the
complex nature of the activity landscape associated with biological
assays, and he coined the term “activity cliffs” to characterize
such regions of the structure activity landscape [8]. In Maggiora's
topographical metaphor, smooth regions of the structure activity
landscape (either flat like Kansas or like the rolling hills of
England) are those that best satisfy the similarity principle.
Measures such as the structure activity landscape index (SALI)
[9-12], which quantifies the change in biological activity produced
by a given change in chemical structure, have been devised to
characterize activity cliffs. Utilizing a cutoff value of the index
enables one to represent sets of molecules through network graphs
that highlight abrupt changes in biological activity associated with
the steepest cliffs. Steep activity cliffs (Bryce Canyon-like
regions), associated with high SALI values, represent the most
challenging regions of a structure activity relationship to model
quantitatively, but they are also the most interesting regions for
purposes of drug design, because small structural modifications in a
molecule can lead to a drug with vastly improved potency. This
process is known as lead optimization.
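As a rough illustration of how such an index and cutoff might be applied, the sketch below uses the commonly quoted pairwise form of SALI from Guha and Van Drie [9], SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), where A is the measured activity and sim is a similarity between 0 and 1. The activity values, similarity values and cutoff are hypothetical, and directing each edge from the less active to the more active molecule is a convention assumed here.

```python
import math
from itertools import combinations


def sali(act_i: float, act_j: float, sim_ij: float) -> float:
    """Pairwise SALI: activity difference divided by (1 - similarity)."""
    diff = abs(act_i - act_j)
    if sim_ij >= 1.0:
        # Identical representations: the index diverges unless the
        # activities are also identical.
        return math.inf if diff > 0 else 0.0
    return diff / (1.0 - sim_ij)


def sali_edges(activities: dict, similarity: dict, cutoff: float) -> list:
    """Return (less_active, more_active, sali) for pairs at or above cutoff."""
    edges = []
    for i, j in combinations(sorted(activities), 2):
        value = sali(activities[i], activities[j], similarity[(i, j)])
        if value >= cutoff:
            low, high = sorted((i, j), key=lambda m: activities[m])
            edges.append((low, high, value))
    return edges


# Hypothetical activities (e.g. pIC50) and pairwise similarities.
activities = {"mol_A": 5.0, "mol_B": 8.0, "mol_C": 5.5}
similarity = {
    ("mol_A", "mol_B"): 0.9,
    ("mol_A", "mol_C"): 0.5,
    ("mol_B", "mol_C"): 0.6,
}

cliffs = sali_edges(activities, similarity, cutoff=10.0)
```

With these toy numbers only the mol_A/mol_B pair survives the cutoff: the two molecules are highly similar yet differ by three activity units. Raising the cutoff keeps only the steepest cliffs, while lowering it admits progressively gentler slopes into the graph.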
Network topology of chemistry spaces
The
degree distribution P(k) is the probability that a given node in a
network has exactly k links or connections to other nodes. Scale-free
networks are characterized by a power-law degree distribution: the
probability that a node has k links follows P(k) ∼ k^(-γ).
Such distributions appear linear on a plot of log P(k) versus log k.
The properties of a scale-free network are often determined by a
relatively small number of highly connected nodes (hubs). In contrast,
the tail of the degree distribution of a random network decreases
exponentially with the degree k, as P(k) ∼ exp(-k), so nodes whose
degrees deviate significantly from the average degree are extremely
rare. For a chemistry space network, we take each molecule as a node of
the network, and use a discretized similarity measure to define the
edges. Investigation of a number of chemistry space networks using a
variety of similarity measures has revealed the heavy-tailed degree
distribution characteristic of a small-world network [13-15], as seen
in the figure below.
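The degree distribution defined above can be computed directly from an edge list. The sketch below is a minimal illustration on a hypothetical toy graph; it estimates γ from an ordinary least-squares fit of log P(k) against log k, which is a crude check rather than a rigorous test for power-law behavior, and it counts only nodes that appear in at least one edge.

```python
import math
from collections import Counter


def degree_distribution(edges: list) -> dict:
    """Return {k: P(k)} over nodes that appear in at least one edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}


def loglog_slope(pk: dict) -> float:
    """Least-squares slope of log P(k) versus log k."""
    if len(pk) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy network: one hub linked to four molecules,
# two of which are also linked to each other.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]

pk = degree_distribution(edges)
gamma_estimate = -loglog_slope(pk)  # P(k) ~ k^(-gamma) implies slope = -gamma
```

A five-node graph cannot establish a power law; the example only shows the mechanics. On a real chemistry space network the same functions would be applied to the edges obtained by thresholding a similarity measure, and a roughly straight log-log plot would suggest a heavy-tailed distribution.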
Hubs
in chemistry space are represented by molecules with high leverage in
structure-activity relationship models. Such molecules are important
for maintaining the diversity of a chemical library and for ensuring
good predictive performance of structure activity relationship models
across a wide domain of applicability. This ability to identify
diverse structures spanning very different bond frameworks or
structural scaffolds with similar activities (known as scaffold
hopping) is of great importance for drug design.
Activity
cliffs lead to breakdown of simple structure activity relationship
models in their vicinity. Differences between the characteristics of
biological networks and those of the networks of commonly used chemical
representations are one reason activity cliffs are encountered. Mapping
the locations of activity cliffs for different representations, and
comparing the global characteristics of SALI sub-networks with those
of the underlying chemistry space networks generated using each
representation, can guide the modeler in the choice of an appropriate
chemical structure representation.
The
figure above shows the SALI sub-network (in red) of a small set of
molecules superimposed upon the underlying chemistry space network
(in black). A higher density of SALI edges in any region of a
chemistry space network graph with a particular chemical structure
representation is an indication of a more challenging structure
activity relationship using that representation in that region of
chemistry space. Appreciation
for the role of polypharmacology (the interaction of a drug with
multiple targets) is also leading to a rapidly growing
interest in the investigation of networks in chemistry space [16-17].
References:
1. Bajorath, J.; Peltason, L.; Wawer, M.; Guha, R.; Lajiness, M. S.; Van Drie, J. H. Navigating structure-activity landscapes. Drug Discov. Today, 2009, 14 (13-14), 698–705.
2. Martin, Y. C.; Kofron, J. L.; Traphagen, L. M. Do structurally similar molecules have similar biological activity? J. Med. Chem., 2002, 45, 4350–4358.
3. Fliri, A. F.; Loging, W. T.; Thadeio, P. F.; Volkmann, R. A. Biospectra analysis: Model proteome characterization for linking molecular structure and biological response. J. Med. Chem., 2005, 48, 6918–6925.
4. Fliri, A. F.; Loging, W. T.; Thadeio, P. F.; Volkmann, R. A. Biological spectra analysis: Linking biological activity profiles to molecular structure. Proc. Nat. Acad. Sci. USA, 2005, 102, 261–266.
5. Klabunde, T. Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol., 2007, 152 (1), 5–7.
6. Rognan, D. Chemogenomic approaches to rational drug design. Br. J. Pharmacol., 2007, 152, 38–52.
7. Kubinyi, H. Why Models Fail. http://www.kubinyi.de/sanfrancisco-09-06.pdf
8. Maggiora, G. M. On Outliers and Activity Cliffs - Why QSAR Often Disappoints. J. Chem. Inf. Model., 2006, 46 (4), 1535.
9. Guha, R.; Van Drie, J. H. Structure-Activity Landscape Index: Identifying and Quantifying Activity Cliffs. J. Chem. Inf. Model., 2008, 48, 646–658.
10. Guha, R.; Van Drie, J. H. Assessing How Well a Modeling Protocol Captures a Structure-Activity Landscape. J. Chem. Inf. Model., 2008, 48 (8), 1716–1728.
11. Peltason, L.; Bajorath, J. SAR Index: quantifying the nature of structure-activity relationships. J. Med. Chem., 2007, 50, 5571–5578.
12. Wawer, M.; Peltason, L.; Weskamp, N.; Teckentrup, A.; Bajorath, J. Structure-activity relationship anatomy by network-like similarity graphs and local structure-activity relationship indices. J. Med. Chem., 2008, 51, 6075–6084.
13. Benz, R. W.; Swamidass, J.; Baldi, P. Discovery of Power-Laws in Chemical Space. J. Chem. Inf. Model., 2008, 48, 1138–1151.
14. Tanaka, N.; Ohno, K.; Niimi, T.; Moritomo, A.; Mori, K.; Orita, M. Small-World Phenomena in Chemical Library Networks: Application to Fragment-Based Drug Discovery. J. Chem. Inf. Model., 2009, 49 (12), 2677–2686.
15. Krein, M. P.; Sukumar, N. Exploration of the Topology of Chemical Spaces with Network Measures. J. Phys. Chem. A, 2011, 11:6; DOI:
16. Hopkins, A. L. Network pharmacology: the next paradigm in drug discovery. Nature Chem. Biol., 2008, 4, 682–690.
17. Milletti, F.; Vulpetti, A. Predicting polypharmacology by binding site similarity: from kinases to the protein universe. J. Chem. Inf. Model., 2010, 50 (8), 1418–1431.