Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2860184

Formats

Article sections

- Abstract
- I. INTRODUCTION
- II. BACKGROUND
- III. GRAPH PATTERN MATCHING KERNELS
- IV. EXPERIMENTAL STUDY
- V. CONCLUSIONS
- References

Authors

Related links

Proc IEEE Int Symp Bioinformatics Bioeng. Author manuscript; available in PMC 2010 April 27.

Published in final edited form as:

Proc IEEE Int Symp Bioinformatics Bioeng. 2008 December 8; 2008: 1–6.

doi: 10.1109/BIBE.2008.4696654PMCID: PMC2860184

NIHMSID: NIHMS118198

Aaron Smalter, Department of Electrical Engineering and Computer Science, University of Kansas, USA;

Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogenous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge.

In this paper, we demonstrate a novel technique called Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and explore their application to kernel classifiers (e.g. support vector machine) for graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the database and use a diffusion process to label nodes in the graphs. Finally the kernel is computed using a set matching algorithm. We performed experiments on 16 chemical structure data sets and have compared our methods to other major graph kernels. The experimental results demonstrate excellent performance of our method.

The fast accumulation of data describing chemical structures [1] and biological activity calls for the development of efficient informatics tools. *Cheminformatics* is a rapidly emerging research discipline that employs a wide array of statistical, data mining, and machine learning techniques with the goal of establishing robust relationships between chemical structures and their biological properties[25].

Publicly-available large-scale chemical compound databases have offered tremendous opportunities for creating highly efficient *in silico* drug design methods. Many machine learning and data mining algorithms have been applied to study the structure-activity relationship of chemicals with the goal of building classifiers for graph-structured data. Additional applications include protein function prediction based on structure [9] and gene regulation networks analysis [10].

Recently Support Vector Machines (SVM) have gained popularity in drug design and cheminformatics. A key insight of SVM is the utilization of kernel functions (i.e. inner product of two points in a Hilbert Space) to transform a non-linear classification problem into a linear one. Design of a good kernel function for graphs is therefore a critical issue and several have been studied.

The initial work was done by Haussler in his work of *R-convolution* kernel, providing a framework of which many current graph kernel function follow [7]. Recent progress of graph kernel functions can be roughly divided into two categories. The first group of kernel functions consider the full adjacency matrix of graphs and hence measure the global similarity of two graphs. These include product graph kernels [6], random walk based kernels [12], and kernels based on shortest paths between pair of nodes [13]. The second group of kernel functions try to capture the local similarity of two graphs by counting the shared subcomponents of graphs. These include the subtree kernels [20], cyclic kernels [24], spectrum kernel [4], and recently subgraph kernels [23].

In this paper, we explore the second avenue and aim to leverage existing frequent pattern mining algorithms in building accurate graph kernel functions. Towards that end, we demonstrate a novel technique called graph pattern matching kernel (GPM). We have tested our algorithm using 16 chemical structure data sets. The experimental results demonstrate that our method outperforms existing state-of-the-art methods with a large margin.

The rest of the paper is organized as follows. In the remainder of this section, we give a brief survey of research efforts that are closely related to our current work. In section II, we provide background information about graphs. In section III, we present the details of our graph pattern matching kernels. In section IV we use real-world data sets to evaluate our proposed methods and perform a comparison of ours to the current state-of-the-art. Finally we conclude and present our future plan in section V.

We survey the work related to graph classification methods by dividing them into two categories. The first category of methods explicitly collect a set of *features* from the graphs. Once a set of features is determined, a graph is described by a feature vector, and any existing classification methods such as Classification based on Association (CBA) [2] and decision tree [19] that work in an *n-*dimensional Euclidian space, may be applied for graph classification.

The second approach is to implicitly collect a (possibly infinite) set of features from graphs. Rather than computing the features, this approach computes the similarity of graphs, using the framework of “kernel functions” [26]. The advantage of a kernel method is that it has low chance of over fitting, which is a serious concern in high dimensional space with low sample size. We review these kernel functions in the following section.

In recent years a variety of graph kernel functions have been developed, with promising application results as described by Ralaviola *et al.* [21].

Product graph kernels use a feature space of all possible node label sequences for walks in graphs. Since the number of possible walks are infinite, there is no way to enumerate all the features in kernel computation [6]. Instead, a *product graph* is computed in order to make the kernel function computation feasible.

Rather than computing the shared paths exactly, which has prohibitive computational cost for large graphs, Kashima *et al*. [12] developed the *marginalized kernel* that uses a Markov model to generate random walks of a labeled graph. The kernel is then computed using the number of shared walks.

Spectrum kernels aim to simplify the aforementioned kernels by working in a finite dimensional feature space based on a set of subgraphs (or as special cases, trees, cycles, and paths). The kernel function is computed as the inner product between two feature vectors, such as counts of subgraph occurrences as in [4]. Transformations of the inner product, such as min-max kernel [27] and Tanimoto kernel [14], are also widely used. The subtree kernel [17] is a variation on the spectrum kernel that uses subtrees instead of paths.

The optimal assignment kernel, proposed by Frölich et al [5], differs significantly from the marginalized graph kernel in that it attempts to align two graphs, rather than compare sets of linear substructures. The similarity between the two graphs is computed by finding the maximal weighted bipartite graph between the two sets of nodes.

In this section we discuss a few important definitions for graph database mining: labeled graphs, subgraph isomorphic relation, graph kernel function, and graph classification.

A **labeled graph** *G* is a quadruple *G* = (*V,E,*Σ,λ) where *V* is a set of vertices or nodes and *E* *V* × *V* is a set of undirected edges. Σ is a set of (disjoint) vertex and edge labels, and λ: *V* *E* → Σ is a function that assigns labels to vertices and edges. We assume that a total ordering is defined on the labels in Σ.

A *graph database* is a set of labeled graphs.

A graph *G*′ = (*V*′, *E*′, Σ′, λ′) is **subgraph isomorphic** to *G* = (*V,E,*Σ,λ), denoted by *G*′ *G*, if there exists a 1-1 mapping *f: V*′ → *V* such that

- ∀
*v*V′,λ′(*v*) = λ(*f*(*v*)) - ∀(
*u, v*)*E*′, (*f*(*u*),*f*(*v*))*E*, and - ∀(
*u, v*)*E*′, λ′ (*u,v*) = λ(*f*(*u*),*f*(*v*))

The function *f* is a *subgraph isomorphism* from graph *G*′ to graph *G*. We say *G*′ *occurs* in *G* if *G*′ *G*. Given a subgraph isomorphism *f*, the image of the domain *V*′(*f*(*V*′)) is an *embedding* of *G*′ in *G*.

Figure 1 shows a graph database of three labeled graphs. The mapping (isomorphism) *q*_{1} → *p*_{3}, *q*_{2} → *p*_{1}, and *q*_{3} → *p*_{2} demonstrates that graph Q is subgraph isomorphic to *P* and hence *Q occurs in P*. Set {*p*_{1}, *p*_{2}, *p*_{3}} is an embedding of *Q* in *P*. Similarly, graph *S* occurs in graph *P* but not *Q*.

Here we present our design of a graph matching kernel with diffusion. We start the section by first presenting a general framework for graph matching. Then we present the pattern based graph matching kernel. Finally we show a technique we call “pattern diffusion” that significantly improves graph classification accuracy in practice.

To derive an efficient algorithm scalable to large graphs, our idea is to use a function Γ: *V* → * ^{n}* to map nodes in a graph to a

$${K}_{m}(G,{G}^{\prime})=\sum _{(u,v)\in V[G]\times V[{G}^{\prime}]}K(\mathrm{\Gamma}(u),\mathrm{\Gamma}(v))$$

(1)

*K* can be any kernel function defined in the co-domain of Γ. We call this function *K _{m} a graph matching kernel*. The following theorem indicates that

The graph matching kernel is symmetric and positive semi-definite if the function *K* is symmetric and positive semi-definite.

Proof sketch: the matching kernel is a special case of the *R*-convolution kernel and is hence positive semi-definite as proved in [16].

We visualize the kernel function by constructing a weighted complete bipartite graph: connecting every node pair (u,v) *V*[*G*] × *V*[*G*′] with an edge. The weight of the edge (u,v) is *K*(Γ (*v*), Γ (*v*)). In Figure 2, we show a weighted complete bipartite graph for *V*[*G*] = {*v*_{1},*v*_{2},*v*_{3}} and *V*[*G*′] = {*u*_{1},*u*_{2},*u*_{3}}

The maximum weighted bipartite graph for graph matching. Highlighted edges (*v*1, *u*2), (*v*2, *u*1), (*v*3, *u*3) have larger weights than the rest of the edges (dashed).

From the figure we see that if two nodes are quite dissimilar, the weight of the related edge is small. Since dissimilar node pairs usually outnumber similar node pairs, if we use linear kernel for nodes, we may have a noisy kernel function and hence loose our signal. In our design, we use the RBF kernel function, as specified below, to penalize dissimilar node pairs.

$$K(X,Y)={e}^{\frac{-{\left|\right|X-Y\left|\right|}_{2}^{2}}{2}}$$

(2)

where
${\left|\right|X\left|\right|}_{2}^{2}$ is the squared *L*_{2} norm of a vector *X*.

One way to design the function Γ is to take advantage of frequent patterns mined from a set of graphs. Intuitively if a node belongs to a subgraph *F*, we have some information about the local topology of the node. Following the intuition, given a node *v* in a graph *G* and a frequent subgraph *F*, we design a function Γ* _{F}* such that

$${\mathrm{\Gamma}}_{F}(v)=\{\begin{array}{ll}1\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}u\phantom{\rule{0.16667em}{0ex}}\text{belongs}\phantom{\rule{0.16667em}{0ex}}\text{an}\phantom{\rule{0.16667em}{0ex}}\text{embedding}\phantom{\rule{0.16667em}{0ex}}\text{of}\phantom{\rule{0.16667em}{0ex}}F\phantom{\rule{0.16667em}{0ex}}\text{in}\phantom{\rule{0.16667em}{0ex}}G\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$

We call the function Γ* _{F}* as a “pattern membership function” since this function tests whether a node occurs in a specific subgraph feature (“membership to a subgraph”).

Given a set of frequent subgraph = *F*_{1}, *F*_{2},…,*F*_{n}, we treat each membership function as a dimension and design the function Γ_{} as below:

$${\mathrm{\Gamma}}_{\mathcal{F}}(v)={({\mathrm{\Gamma}}_{{F}_{i}}(v))}_{i}^{n}$$

(3)

In other words, given *n* frequent subgraph, the function Γ maps a node *v* in *G* to a *n*-dimensional space, indexed by the *n* subgraphs, where values of the features indicate whether the node is part of the related subgraph in *G*.

In Figure 3, we duplicated the figure *Q* in Figure 1. We show two subgraph features *F*_{1} and *F*_{2}. *F*_{1} has an embedding in *Q* at {*q*_{1}, *q*_{2}} and *F*_{2} occurs in *Q* at {*q*_{1}, *q*_{3}}. We depict the occurrences using shadings with different color and orientations. For node *q*_{1}, if we consider subgraph *F*_{1} as a feature, we have Γ_{F1}(*q*_{1}) = 1 since *q*_{1} is part of an embedding of *F*_{1} in *Q*. Also, we have Γ_{F1}(*q*_{3}) = 0 since *q*_{3} is not part of an embedding of *F*_{1} in *Q*. Similarly we have Γ_{F2}(*q*_{1}) = 1 and Γ_{F2}(*q*_{3}) = 1. Hence Γ_{F1,F2}(*q*_{1}) = (1,1) and Γ_{F1,F2}(*q*_{3}) = (0,1). The values of the function Γ_{F1,F2} are also illustrated in the same figure using the annotated *Q*.

Here we introduce a better technique than the pattern membership function to capture the local topology information of nodes. We call this technique “pattern diffusion”. Our design has the following advantages:

- Our design is generic and does not assume any domain knowledge from a specific application. The diffusion process may be applied to graphs with dramatically different characteristics.
- The diffusion process is straightforward to implement and can be computed efficiently.
- We prove that the diffusion process is related to the probability distribution of a graph random walk. This explains why the simple process may be used to summarize local topological information.

Below, we outline the pattern diffusion kernel in three steps.

In the first step, we identify a seed as a starting point for the diffusion. In our design, a “seed” could be a single node, or a set of connected nodes in the original graph. In our experimental study, we always use frequent subgraphs for seeds since we can easily compare a seed from one graph to a seed in another graph.

In the second step given a set of nodes *S* as seed, we recursively define a diffusion function *f _{t}* in the following way.

The base *f*_{0} is defined as:

$${f}_{0}(u)=\{\begin{array}{ll}1/\mid S\mid \hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}u\in S\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$

We define *f _{t}*

$${f}_{t+1}(v)={f}_{t}(v)\times (1-\frac{\lambda}{d(v)})+\sum _{u\in N(v)}{f}_{t}(u)\times \frac{\lambda}{d(u)}$$

(4)

In the notation, *N*(*v*) = {*u*|(*u,v*) is an edge} is the set of nodes that connects to *v* directly. *d*(*v*) = |*N*(*v*)| is the node degree of *v*. λ is a parameter that controls the diffusion rate.

The formula 4 describes a process where each node distributes a λ fraction of its value to its neighbors evenly and in the same way receives some value from its neighbors. We call it “diffusion” because the process simulate the way a value is spreading in a network. Our intuition is that the distribution of such a value encodes information about the local topology of the network.

To constrain the diffusion process to a local region, we use one parameter called diffusion time, denoted by *τ*, to control the diffusion process. Specifically we limit the diffusion process to a local region of the original graph with nodes that are at most *τ* hops away from a node in the seed *S*. In this sense, the diffusion should be named “local diffusion”.

Finally in the last step, for the seed *S*, we define the mapping function
${\mathrm{\Gamma}}_{S}^{d}$ as the limit function of *f _{t}* as

$${\mathrm{\Gamma}}_{S}^{d}=\underset{t\to \infty}{lim}{f}_{t}$$

(5)

And given a set of frequent subgraph = *F*_{1},*F*_{2},…,*F _{n}* as seeds, we design the pattern diffusion function
${\mathrm{\Gamma}}_{\mathcal{F}}^{d}$ as:

$${\mathrm{\Gamma}}_{\mathcal{F}}^{d}(v)={({\mathrm{\Gamma}}_{{F}_{i}}^{d}(v))}_{i}^{n}$$

(6)

Here we show the connection of pattern matching kernel function to the marginalized graph kernel [12], which uses a Markov model to randomly generate walks of a labeled graph.

Given a graph *G* with nodes set *V*[*G*] = {*v*_{1}, *v*_{2},…, *v _{n}*} and a seed

$$M(i,j)=\{\begin{array}{ll}{\scriptstyle \frac{\lambda}{d({v}_{j})}}\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}i\ne j\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}i\in N(j)\hfill \\ 1-{\scriptstyle \frac{\lambda}{d({v}_{i})}}\hfill & i=j\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$

In this representation, we compute the stationary distribution (*f _{S}* = lim

We notice that the matrix *M* corresponds to a probability matrix corresponding to a Markov Chain since

- all entries are non-negative
- column sum is 1 for each column

Therefore the vector *M*^{∞} ×*U*_{0} corresponds to the stationary distribution of the local random walk as specified by *M*. In other words, rather than using random walk to retrieve information about the local topology of a graph, we use the stationary distribution to retrieve information about the local topology. Our experimental study shows that this in fact is an efficient way for graph classification.

The optimal assignment (OA) kernel [5] carries the same spirit of our graph pattern matching kernel in that OA uses pairwise node kernel function to construct a graph kernel function. OA kernel has been utilized for cheminformatics applications and is found to deliver good results empirically.

There are two major differences between ours and the OA kernel. (1) OA kernel is not positive semi-definite and hence is not Mercer kernel in a strict sense. Non Mercer kernel functions are used to train SVM model and the problem is that the convex optimizer utilized in SVM will not converge to a global optimal and hence the performance of the SVM training may not be reliable. (2) OA utilizes a complicated recursive function to compute the similarity between nodes, which make the computation of the kernel function runs slowly for large graphs [23].

We summarize the discussions we present so far and show how the kernel function is utilized to construct an efficient graph classification algorithm at both the training and testing phases.

In the training phase, we divide graphs of the training data set $D={\{({G}_{i},{T}_{i},)\}}_{i=1}^{n}$ into groups according to their class labels. For example in binary classification, we have two groups of graphs: positive or negative. For multi-class classification, we partition graphs according to their class label where graphs have the same class labels are grouped together. The training phase is composed of four steps:

- Obtain frequent subgraphs. We identify frequent subgraphs from each graph group and union the subgraph sets together as our seed set .
- For each graph
*G*in the training data set, we use the node pattern diffusion function ${\mathrm{\Gamma}}_{\mathcal{F}}^{d}$ to label nodes in*G*. Thus the feature vector of a node*v*is a vector ${L}_{V}={({\mathrm{\Gamma}}_{{F}_{i}}^{d}(v))}_{i=1}^{m}$ with length*m*= ||.For two graphs*G, G*′, we construct the complete weighted bipartite graph as described in section III-A and compute the kernel*K*(_{m}*G*,*G*′) using Equation 1 and Equation 2.Train a predictive model using a kernel classifier.

In the testing phase, we compute the kernel function for graphs in the testing and training data sets. We use the trained model to make predictions about graph in the testing set.

- For each graph
*G*in the testing data set, we use ${\mathrm{\Gamma}}_{\mathcal{F}}^{d}$ to label nodes in*G*and create feature vectors as we did in the training phase.We use Equation 1 and Equation 2 to compute the kernel function*K*(_{m}*G, G*′) for each graph*G*in the testing data set and for each graph*G*′ in the training data set.Use kernel classifier and trained models to obtain prediction accuracy of the testing data set

Below we present our empirical study of different kernel functions including our pattern diffusion kernel.

We conducted classification experiments using six different graph kernel functions, including our Pattern Diffusion kernel, on sixteen different data sets. There are twelve chemical-protein binding data sets, and the rest are chemical toxicity data sets. We performed all of our experiments on a desktop computer with a 3Ghz Pertium 4 processor and 1 GB of RAM. In the following subsections, we describe the data sets and the classification methods in more detail along with the associated results.

In all classification experiments, we used the LibSVM [3] as our kernel classifier. We used nu-SVC with *v* = 0:5. Our classification accuracy (TP+TN/S, TP: true positive, TN: true negative, *S*: total number of testing samples) is computed by averaging over a 10-fold cross-validation experiment. Standard deviation is computed similarly. To have a fair comparison, we simply used default SVM parameters in all cases, and did not tune any parameters to increase the accuracy of any method.

We have selected sixteen data sets covering prediction of chemical-protein binding activity and chemical toxicity. The first seven data sets are manually extracted from the BindindDB database [15]. The next five are established data sets taken from Jorissen et al. [11]. The last four are from the Predictive Toxicology Challenge[8] (PTC). Detailed information for the data sets is available in supplementary material.

The BindingDB database contains more than 450 proteins. For each protein, the database record chemicals that bind to the protein. Two types of activity measurements K* _{i}* and IC

The Jorissen data sets also contains information about chemical-protein binding activity. In this case the provider of the data set carefully selected positive and negative samples and hence are more reliable than the data sets we created from BindingDB. For more information about the creation of the data sets, see [11] in details. The data sets from this study are: CDK2, COX2, FXa, PDE5, and A1A.

The Predictive Toxicology Challenge (PTC) data sets contain a series of chemical compounds classified according to their toxicity in male rats, female rats, male mice, and female mice. While chemical-protein binding activity is an important type of chemical activity, it is not the only type. Toxicity is another important, though different, kind of chemical activity we would like to predict in drug design. These data sets (PTC-FR/FM/MR/MM) are well curated and highly reliable.

We have selected 6 different kernel functions for evaluation: Marginalized[12], spectrum[4], tanimoto[14], subtree[17], optimal assignment[5], together with our graph pattern matching kernel.

Four kernel functions (Marginalized, spectrum, tanimoto, subtree) are computed using the open source Chemcpp v1.0.2 package [18]. The optimal assignment kernel was computed using the JOELib2 package, and is not strictly a kernel function, but still provides good prediction accuracy. Our graph pattern matching kernel was computed using our own MATLAB code.

Here we present the results of our graph classification experiments with various kernel functions. Due to space constraints we are unable to present the entirety of our data here, but it can be viewed in our technical report [22]. Figure 4 shows the classification accuracy for different kernel functions and data sets, averaged over a 10-fold cross validation experiment. The standard deviations (omitted) of the accuracies are generally very high, from 5–10%, so statistically significant differences between kernel functions are generally not observed.

We can see from the data that our method is competitive for all sixteen data sets. If we examine the accuracy of each kernel function averaged over all data sets, we see that our GPM kernel performs the best overall. Again, the standard deviations are high so the differences between the top performing kernels are not statistically significant. Still, with 16 different data sets we can see some clear trends: GPM kernel delivers the highest classification accuracy in 8 out of the 16 data sets, with tanimoto kernel best in 4, marginalized best in 2, subtree in 2, optimal assignment in 1 and spectrum in none.

Although GPM does not work well on a few data sets such as AChE, HIV-RT, MAPK, and PTC-FR/MR, overall it performs better when compared to any other kernel for a majority of data sets. It is better than every other kernel function in at least 10 of the 16 data sets.

In general the GPM, spectrum and tanimoto kernels perform the best, with over all average accuracy of about 80%. The subtree, optimal assignment, and marginalized also perform very good, in mid to high 70%. The min/max tanimoto kernel performed much worse than the other methods, and hence it was not included in the figure. Note that the optimal assignment kernel is missing a prediction accuracy for the FXa data set, this was due to a terminal error in the JOELib2 software used to calculate this kernel on this data set.

Graphs have proven to be powerful tools for modeling complex, high-dimensional biological data; building highly accurate predictive models for chemical graph classification is a goal for cheminformatics and drug design. In this paper we have demonstrated the utility of a novel graph kernel function, graph pattern matching kernel (GPM kernel). We showed that the GPM kernel can capture the intrinsic connection between a chemical and its class label and has the lowest testing error in majority of the data sets we evaluated.

This work has been supported by the Kansas IDeA Network for Biomedical Research Excellence (NIH/NCRR award #P20 RR016475), the KU Center of Excellence for Chemical Methodology and Library Development (NIH/NIGM award #P50 GM069663), and NIH grant #R01 GM868665.

Aaron Smalter, Department of Electrical Engineering and Computer Science, University of Kansas, USA.

Jun Huan, Department of Electrical Engineering and Computer Science, University of Kansas, USA.

Gerald Lushington, Molecular Graphics and Modeling Laboratory, University of Kansas, USA.

1. Austin C, Brady L, Insel T, Collins F. Nih molecular libraries initiative. Science. 2004;306(5699):1138–9. [PubMed]

2. Liu YM Bing, Hsu Wynne. Integrating classification and association rule mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining; 1998.

3. Chang C, Lin C. Libsvm: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.

4. Deshpande M, Kuramochi M, Karypis G. Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering. 2005

5. Fröohlich, Wegner J, Sieker F, Zell A. Kernel functions for attributed molecular graphs - a new similarity-based approach to adme prediction in classification. QSAR & Combinatorial Science. 2006

6. Gärtner T, Flach P, Wrobel S. On graph kernels: Hardness results and efficient alternatives. Sixteenth Annual Conference on Computational Learning Theory and Seventh Kernel Workshop; 2003.

7. Haussler D. Technical Report UCSC-CRL099-10. Computer Science Department; UC Santa Cruz: 1999. Convolution kernels on discrete structures.

8. Helma C, King R, Kramer S. The predictive toxicology challenge 2000–2001. Bioinformatics. 2001;17(1):107–108.

9. Huan J, Wang W, Washington A, Prins J, Shah R, Tropsha A. Accurate classification of protein structural families based on coherent subgraph analysis. Proceedings of the Pacific Symposium on Biocomputing (PSB) 2004:411–422. [PubMed]

10. Huang Y, Li H, Hu H, Yan X, Waterman MS, Huang H, Zhou XJ. Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics. 2007:222–229. ISMB/ECCB Supplement. [PubMed]

11. Jorissen R, Gilson M. Virtual screening of molecular databases using a support vector machine. J Chem Inf Model. 2005;45(3):549–561. [PubMed]

12. Kashima H, Tsuda K, Inokuchi A. Marginalized kernels between labeled graphs. Proc. of the Twentieth Int. Conf. on Machine Learning (ICML); 2003.

13. BKM, KH-P Shortest-path kernels on graphs. Proc. of International Conference on Data Mining; 2005.

14. RL, SSJ, SH, BP Graph kernels for chemical informatics. Neural Networks. 2005;18:1093–1110. [PubMed]

15. Liu T, Lin Y, Wen X, Jorrisen RN, Gilson M. Bindingdb: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research. 2007;35:D198–D201. [PubMed]

16. Lyu S. Mercer kernels for object recognition with local features. IEEE Computer Vision and Pattern Recognition. 2005:223–229.

17. Mahe P, Vert J. Graph kernels based on tree patterns for molecules; Technical Report HAL:ccsd-00095488; Ecoles des Mines de Paris; Sep, 2006.

18. Perret J-L, Mahe P, Vert J-P. Chemcpp: an open source c++ toolbox for kernel functions on chemical compounds. 2007. Software available at http://chemcpp.sourceforge.net.

19. Quinlan JR. C4.5: Programs for Machine Learning. Morgan Kaufmann; 1993.

20. Ramon J, Gärtner T. Expressivity versus efficiency of graph kernels. Technical Report, First International Workshop on Mining Graphs, Trees and Sequences; 2003.

21. Ravaliola L, Swamidass SJ, Saigo H. Graph kernels for chemical informatics. Neural Networks. 2005 [PubMed]

22. Smalter A, Huan J, Lushington G. Technical report. University of Kansas; Aug, 2008. Gpm: A graph pattern matching kernel with diffusion for accurate graph classification. [PMC free article] [PubMed]

23. Smalter A, Huan J, Lushington G. Structure-based pattern mining for chemical compound classification. Proceedings of the 6th Asia Pacific Bioinformatics Conference; 2008. [PMC free article] [PubMed]

24. Horvath SW Tamas, Gartner Thomas. Cyclic pattern kernels for predictive graph mining. SIGKDD. 2004

25. Tolliday N, Clemons PA, Ferraiolo P, Koehler AN, Lewis TA, Li X, Schreiber SL, Gerhard DS, Eliasof S. Small molecules, big players: the national cancer institute’s initiative for chemical genetics. Cancer Research. 2006;66:8935–42. [PubMed]

26. Vapnik V. Statistical Learning Theory. John Wiley; 1998.

27. Wale N, Watson I, Karypis G. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems. 2007

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |