Angiogenesis is the growth of new blood vessels from existing vasculature. Excessive vascularization is associated with a number of diseases including cancer. Anti-angiogenic therapies have the potential to stunt cancer progression. Peptides derived from type IV collagen are potent inhibitors of angiogenesis. We wanted to gain a better understanding of collagen IV structure-activity relationships using a ligand-based approach. We developed novel peptide-specific QSAR models to study the activity of the peptides in endothelial cell proliferation, migration, and adhesion inhibition assays. We found that the models produced quantitatively accurate predictions of activity and provided insight into collagen IV derived peptide structure-activity relationships.
Excessive vascularization is a hallmark of many diseases including cancer, rheumatoid arthritis, diabetic nephropathy, pathologic obesity, age-related macular degeneration, and asthma. Compounds that inhibit angiogenesis represent potential therapeutics for many diseases. Judah Folkman performed pioneering research in the field of angiogenesis;1 his work led to the identification of a number of polypeptides with anti-angiogenic activity.2 One of these polypeptides, endostatin, was derived from the noncollagenous (NC1) domain of collagen XVIII.3 Work led by Raghu Kalluri resulted in the development of small antiangiogenic peptides from the NC1 domain of collagen IV including canstatin,4 arresten,5 and tumstatin.6 These collagen IV derived fragments were reviewed in the context of other angiogenesis modulating compounds.7–9 Based on these parent compounds, work in our laboratory identified more than 100 similar peptide sequences from diverse parent proteins throughout the proteome.10 The set of parent proteins included collagen IV, CXC chemokines, type I thrombospondin domain (TSP-1)-containing proteins, serpins, somatotropins, and tissue inhibitors of metalloproteinases (TIMPs). Work carried out in our group experimentally validated in vitro inhibition of endothelial cell (EC) proliferation and migration by peptides derived from type IV collagens,11 thrombospondin domain-containing proteins,12, 13 and CXC chemokines.14 These studies showed that a large fraction of the peptides have antiangiogenic potential. Subsequently, our laboratory tested some of these peptides in vivo using mouse xenograft models of breast and lung cancer,15, 16 and ocular models.17 The peptides derived from type IV collagen are attractive targets because of their efficacy against multiple angiogenic properties (i.e., endothelial cell proliferation, migration, and adhesion).18
A better understanding of the structure-activity relationship of type IV collagen peptides could help clarify the mechanism of action and produce more active peptides. For many of these peptides, the receptor had not been elucidated. When the receptor is unknown, ligand-based modeling approaches must be used. Examples of ligand-based design methods include pharmacophore modeling19–22 and quantitative structure-activity relationship (QSAR)23–26 analysis. These methods correlate diverse aspects of molecular structure and flexibility with a quantitative measure of activity. Some work has been done on developing peptide-specific feature sets for QSAR.27, 28 Others make use of position weight matrices to describe a family of peptides.29 Many of these methods require solving NP-hard30 problems, for which no polynomial-time algorithm is known. For large datasets, these methods must resort to inexact approaches and heuristics.
To continue developing the type IV collagen-derived peptides, we aimed to (i) develop techniques for computationally efficient, peptide-specific, QSAR analysis, (ii) enable predictions of peptide activity, and (iii) gain a better understanding of the structure-activity relationship of collagen IV derived peptides. In this work, we described several novel peptide-specific QSAR methods that helped us address these aims. We formulated the models using convex optimization in a way that could be solved quickly to global optimality. We used experimentally-determined activity data from collagen IV peptides to develop individual models for endothelial cell proliferation, migration, and adhesion. We validated the QSAR models by making activity predictions and performing experiments for an external set of peptides. The activity of the external set of peptides was verified by endothelial cell proliferation, migration, adhesion, and tube formation assays.
This study is based on a library of 23 collagen IV derived peptides. The founding peptide 0 (SP2000)10 was found as a homolog of tumstatin6 in the human proteome. These peptides consisted of a series of truncations and selected amino acid substitutions designed to improve translational potential. In Table 1 we present the activity of the 23 (21 training + 2 external verification) peptides in endothelial cell proliferation (at 100μM), migration (at 50μM), and adhesion (at 100μM). Peptide concentrations were chosen to provide diversity in activity measurements. All experiments were performed in duplicate and the result of each experiment was the average of three replicates on the same plate. Activity measurements are given as a percentage of the vehicle control.
In Figure 1, we outline the peptide modeling procedure. The methods are based on data that associate peptide features with a quantitative activity score (e.g., endothelial cell (EC) proliferation inhibition activity). Each peptide is converted into a unique sparse vector of features. For example, Figure 2 shows the vectorization of the short peptide LRRFSTMPFMF. In the simplest methodology that we consider, each feature uniquely identifies an amino acid at a single position. We use convex optimization to select features that differentiate highly active and inactive peptides. We formulate the convex optimization objective in a way that can be solved quickly to global optimality.
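The vectorization step for the simplest case (one feature per amino acid per position) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the alphabet ordering, the index layout, and the helper names `vectorize` and `describe` are hypothetical, and the symbols X (L-α-amino-n-butyric acid) and * (truncation) are appended to the 20 natural amino acids only because they appear in the peptide tables.

```python
# Hypothetical alphabet: 20 natural amino acids, plus X and * (assumed extras).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + "X" + "*"

def vectorize(peptide, alphabet=ALPHABET):
    """Map a peptide to a sparse dict {feature_index: 1.0}.

    Each feature identifies one residue at one position, so a peptide of
    length n has exactly n non-zero entries out of n * len(alphabet).
    """
    k = len(alphabet)
    features = {}
    for pos, residue in enumerate(peptide):
        features[pos * k + alphabet.index(residue)] = 1.0
    return features

def describe(index, alphabet=ALPHABET):
    """Turn a feature index back into a readable label such as 'L1'."""
    pos, residue_idx = divmod(index, len(alphabet))
    return f"{alphabet[residue_idx]}{pos + 1}"

vec = vectorize("LRRFSTMPFMF")
labels = [describe(i) for i in sorted(vec)]
```

Storing only the non-zero entries keeps the representation small even when the full feature space is large.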
We developed four approaches to model the data in Table 1 and learn about the structure-activity relationship of type IV collagen peptides. The approaches were based on the least absolute shrinkage and selection operator (Lasso).31 The approaches differed in the features that they consider and the weight assigned to training examples. The specific details of these approaches can be found in the Materials and Methods section.
In Table 2, we compared four methods for their ability to predict peptide efficacy. We compared each of these methods to a naive featureless method that always predicted the average activity from the training set. The methods were evaluated on three datasets that measured the ability of peptides to inhibit endothelial cell proliferation (A), migration (B), and adhesion (C). To compare these approaches, we took a leave-one-out cross validation (LOOCV) approach. In LOOCV, we use all but a single peptide to train the model. We then use that model to predict the efficacy of the single peptide that was left out. This allowed us to compute the error between predicted and observed activity measurements. To determine which methods were statistically superior to others, we conducted t-tests for all pairs of methods based on their squared test errors. Lower test errors indicate better performance. The table gives the p-value associated with each two-tailed paired t-test. At the 0.05 level, all of the models had lower error than the naive featureless method. Also, the non-linear Lasso method had significantly less error than the Lasso method. These results held over all three datasets. Based on these results, the rest of the study was performed using non-linear Lasso. In Figure 3, we show the observed and leave-one-out predictions for each method for all peptides in the dataset in endothelial cell proliferation, migration, and adhesion assays. The figure illustrates that no single method had the least error in all trials, and that the predictive performance is good even in cases where percent inhibition is negative, as seen in the migration and adhesion datasets.
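The leave-one-out procedure and the naive featureless baseline described above can be sketched as below. The function names and the `fit`/`predict` interface are hypothetical, and the numbers are toy values chosen for illustration, not data from Table 1.

```python
def loocv_squared_errors(X, y, fit, predict):
    """Return one squared test error per example.

    Each error comes from a model trained on all examples except the one
    being predicted.
    """
    errors = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        errors.append((predict(model, X[i]) - y[i]) ** 2)
    return errors

# Naive featureless baseline: always predict the mean training activity.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model  # the features x are ignored on purpose

# Toy values for illustration only (not from Table 1).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [10.0, 20.0, 30.0, 40.0]
naive_errors = loocv_squared_errors(X, y, fit_mean, predict_mean)
```

Any of the four modeling approaches could be plugged in through `fit` and `predict`, which yields per-peptide squared errors that can then be compared pairwise.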
In the previous sections we made extensive use of leave-one-out cross validation to estimate generalization error. We concluded from these analyses that non-linear Lasso had statistically lower generalization error than Lasso. Low generalization error is an indication that the features used in the models may be useful for understanding the structure-activity relationship of type IV collagen peptides.
In this section, unlike the previous sections, we train models for endothelial cell proliferation, migration, and adhesion based on all of the data in Table 1 except for the external validation set consisting of peptides 27 and 35. The models are structured such that important features receive high weight. The model features (first column) and weights (second column) are given in decreasing order in Table 3. The features are indicated for each row by the change in sequence from the preceding row. The weights were determined using the non-linear Lasso method (as described in Materials and Methods). We use these features to analyze the structure-activity relationship. This approach gives us a way of indirectly identifying putative pharmacophores for the collagen IV derived peptides.
When multiple amino acids are viable options in a position, they are shown in decreasing order of importance. In the migration model (Table 3, C), L-α-amino-n-butyric acid (indicated by X) is preferred in the 18th position with a weight of 0.018 over alanine with a weight of 0.016. The proliferation model (Table 3, A) makes it clear that there are important regions on the N-terminus (LRRF) and the C-terminus (NINNVXN). In the adhesion model (Table 3, B), the highly weighted asterisk in the 20th position indicates that truncation of the phenylalanine may improve the anti-adhesion activity of the peptide. Like the proliferation model, the adhesion model selects the regions on the N-terminus (LRRF) and C-terminus (NINNVX). Unlike the proliferation model, it ranks the L-α-amino-n-butyric acid in the 12th position among the most important features for anti-adhesion activity. The migration model (Table 3, C) highlights the C-terminal sequence (ANINNVXN) as a useful indicator of anti-migration activity; however, for full anti-migration activity the LRRF sequence is also required. From all three models we found that both the N-terminal sequence LRRF and the C-terminal sequence XNINNVXN are required for full activity.
We examined the structure of peptide 0 as it exists in the native type IV collagen NC1 domain (pdb:1T60). In Figure 4, we show the conformation of the peptide in the native protein. By computing the solvent accessible surfaces of the protein, we found two exposed regions corresponding to the N-terminal (LRR) and C-terminal (INN) ends of the peptide. These regions correlate with the peptide motifs needed for anti-angiogenic activity.
Two peptides, 27 and 35, were held out as an external validation set. Models for proliferation, migration, and adhesion were trained using all other peptides from Table 1. Based on these models, peptides 27 and 35 were predicted to have similar activity. They were predicted to have 54.15, 93.35, and 97.54 percent proliferation, migration, and adhesion inhibition, respectively. Based on the experimentally determined activities given in Table 1 and predicted activities, R² values on the external validation set were 0.84, 0.85, and 0.99 for the proliferation, migration, and adhesion models, respectively.32 From the R² values on the external validation set, we could conclude that the models were predictive for anti-angiogenesis phenotypes. In Figure 5, endothelial cell tube formation assays at 100μM confirmed the potency of peptides 27 (Figure 5, C) and 35 (Figure 5, D), relative to a vehicle control (Figure 5, A) and a weaker peptide 8 (SP2008) (Figure 5, B).
Type IV collagens are basement membrane proteins that are essential for binding cells to the extracellular matrix.33 Type IV collagen derived peptides have proven to be effective inhibitors of angiogenesis.34 Using the models trained on the data from Table 1, we found that a pair of regions, namely LRRF at the N-terminus and XNINNVXN at the C-terminus, is needed for full activity. This pair of important regions indicates that secondary structure or multiple binding sites may be important for the endothelial cell proliferation, migration, and adhesion inhibition activity of type IV collagen derived peptides. These results are consistent with a previous study on the tumstatin peptide by Eikesdal et al.,35 who found that mutations to the NINN region resulted in a significant change in EC proliferation inhibition. These results also indicate that truncations to the 20-mer peptide, with the exception of the phenylalanine in the 20th position, would be detrimental to the activity of the collagen IV derived peptides.
In this article, we describe four novel peptide-specific QSAR approaches. We compared these approaches by testing their ability to predict the outcome of in vitro experiments. The comparison indicated that one approach, called non-linear Lasso, had statistically lower generalization error than Lasso (Table 2). We showed the individual predictions made by this approach in Figure 3. We found that the predictions made using all four approaches were statistically significant compared to a method based on naive predictions. These results gave us confidence in the utility of the peptide-specific QSAR models. We analyzed the features of these models to learn about the structure-activity relationship of collagen IV derived peptides. By analyzing the structure of the collagen IV NC1 domain, we found that the solvent accessible regions of the peptide in the parent protein correlated with the motifs needed for anti-angiogenic activity.
All peptides were synthesized by New England Peptide with at least 95% purity evaluated using both HPLC and MALDI by the manufacturer. Table 1 gives the compound structures in terms of the one letter amino acid codes. Truncated amino acids are indicated by asterisks. The error in the activity measurements was based on two biological replicates each derived from the mean of three technical replicates. The data are shown as percent inhibition relative to a vehicle control. A single dose was selected for each dataset that produced a diverse set of activities for the candidate peptides. Proliferation and adhesion measurements were taken at a peptide concentration of 100μM, while migration measurements were taken with a compound dose of 50μM.
Human umbilical vein endothelial cells (HUVEC) were purchased from Lonza and were grown under the manufacturer’s recommendation using Endothelial Basal Media (EBM-2) supplemented with the Bullet Kit (EGM-2, Lonza). Cells of passages 2–7 were used for experiments. Cells were grown at 37°C in a humidified incubator with 5% CO2.
Colorimetric WST-1 reagent (Roche, IN) was used to perform the proliferation assays. HUVECs were plated in 96-well plates at a density of 2000 cells/well. Peptides at 100 μM in fully supplemented media were added to the adherent cells and incubated for 72 hours. WST-1 reagent was added in serum-free media for four hours and the color intensity was measured at 450 nm with a Victor-V plate reader (Perkin Elmer, MA).
The effect of the migration inhibition of the peptides on the cells was determined using electrical impedance measurements with a continuous and real time migration assay (RT-CIM, ACEA Biosciences, CA). The top compartment of the CIM plate was coated with fibronectin (20μg/ml) and 45,000 HUVEC/well were added either in the presence or absence of the peptide at 50 μM. Fully supplemented media was added to the bottom compartment serving as chemoattractant. The migration of the cells is measured by the integrated sensors in the bottom side of the porous membrane which divides the two chambers. This technology allows for easy quantification of cell migration by monitoring the cell index (derived from the measured impedances).
The adhesion inhibitory potential of the peptides was also measured using RT-CIM technology. In this instance, single-compartment E-plates (ACEA Biosciences, CA) were used, in which 25,000 HUVEC/well were plated in the presence or absence of the peptides at 100 μM, and adhesion was measured by the changes in the cell index amplitude over 3 hours.
The tube formation assay was performed following the published protocol of Arnaoutova et al.36 Briefly, 96-well plates were coated with Geltrex, Reduced Growth Factor Basement Membrane Matrix (Invitrogen, CA) (50μl/well) and incubated at 37°C for 30 minutes to allow gelation to occur. HUVECs were added to the top of the gel at a density of 15,000 cells/well in the presence or absence of the peptide (100 μM). The positive control included the same amount of solvation vehicle (i.e., DMSO) as the experimental condition. Cells were incubated at 37°C with 5% CO2 overnight and pictures were captured with a CCD Sensicam camera mounted on a Nikon inverted microscope.
We took as input a set of peptide sequences along with an experimentally measured efficacy for each peptide. The method returned a model that could be used to predict the efficacy of hypothetical peptides from the same class. The method worked by converting each peptide sequence into an input space of amino acids and positions. Those were the explanatory variables in the peptide-specific QSAR modeling framework. A weight for each feature was learned using non-negative Lasso regression37 with the peptide efficacies as response variables. The scaling term for the L1-norm regularizer was determined using leave-one-out cross validation. Despite evaluating many features, the use of L1-norm regularization allowed the model to avoid over-fitting. The convex nature of the optimization problem allowed the method to quickly reach the globally optimal solution without a combinatorial search of the input space. The software, which was implemented in Matlab using CVX,38 is freely available upon request.
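A minimal sketch of non-negative Lasso regression is given below. It is not the authors' Matlab/CVX implementation: it substitutes plain cyclic coordinate descent in Python, and it assumes the objective is scaled as 0.5·||Ax − b||² + λ·Σx with x ≥ 0. The function name `nonneg_lasso`, the iteration count, and the toy inputs are hypothetical.

```python
def nonneg_lasso(A, b, lam, n_iter=500):
    """Minimize 0.5*||A x - b||^2 + lam*sum(x) subject to x >= 0.

    Uses cyclic coordinate descent. A is a list of rows, b a list of
    responses, lam the L1 regularization strength (assumed scaling).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = list(b)  # residual = b - A x, and x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # feature never occurs, so its weight stays zero
            # Correlation of column j with the residual excluding feature j.
            rho = sum(A[i][j] * (residual[i] + A[i][j] * x[j]) for i in range(m))
            # Shrink by lam and clip at zero (non-negativity constraint).
            new_xj = max(0.0, (rho - lam) / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(m):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x

# Toy problem for illustration only.
A_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_toy = [2.0, 0.0, 2.0]
x_toy = nonneg_lasso(A_toy, b_toy, lam=0.1)
```

Because the problem is convex, any solver that converges reaches the same global optimum; in practice λ would be chosen by leave-one-out cross validation as the text describes.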
Without loss of generality, we describe the method in terms of the 20 common amino acids. Given m peptides of length n, let pij be amino acid j in peptide i. Let r be a list of all 20 natural amino acids. Let S be a 20 by 20 amino acid association matrix such that S(a,b) gives the association between amino acids a and b; in this study we use the PAM250 matrix39 as a principled way to give weight to amino acids with similar biochemical properties. Let A be an m by 20n matrix that encodes the amino acid sequences, such that
Let b be a vector of length m representing the activity of each peptide. In this study, the quantitative measure of activity is given by percent endothelial cell proliferation, migration, or adhesion inhibition. Our goal is to learn values in the weight vector x of length 20n. The values in the weight vector x correspond to the relative importance of the features considered in the model. Using this formulation, we solve the standard Lasso objective subject to x ≥ 0. Lasso is composed of the least-squares objective regularized by the L1 norm of the weight vector. The parameter λ influences the sparsity of the weight vector x.
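Written out from the description above, the non-negative Lasso objective takes the following form. The symbols A, b, x, and λ are those defined in the text; the exact scaling of the two terms is an assumption.

```latex
\begin{equation}
\begin{aligned}
\min_{x \in \mathbb{R}^{20n}} \quad & \lVert A x - b \rVert_2^2 + \lambda \lVert x \rVert_1 \\
\text{subject to} \quad & x \ge 0
\end{aligned}
\end{equation}
```

Larger values of λ drive more entries of x to exactly zero, which is what makes the selected features interpretable.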
In the previous section, we described the linear version of Lasso using only the input space described in A. As an alternate approach, we expand the input space given in the previous approach to a feature space consisting of pairs of features. Let A′ be an m by (20n)² matrix. Although the number of features is large, we use sparse matrices to eliminate unused variables and reduce the problem size. We make use of aggressive regularization to avoid over-fitting. The Lagrange multiplier λ is selected automatically by leave-one-out cross validation. We use the objective from equation (2) except that we make use of A′ and the x vector is of length (20n)².
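One way to build the pairwise feature space while keeping it sparse is sketched below. It assumes each pair feature is the product of two single features and that pairs are indexed as i·N + j, where N is the number of single features; both choices, and the name `pair_features`, are assumptions made for illustration.

```python
def pair_features(sparse_vec, n_features):
    """Expand {index: value} into products over ordered pairs of features.

    Only pairs of non-zero entries are stored, so a vector with k active
    features yields k*k entries out of n_features**2 possible ones.
    """
    pairs = {}
    for i, vi in sparse_vec.items():
        for j, vj in sparse_vec.items():
            pairs[i * n_features + j] = vi * vj
    return pairs

# Toy vector with two active features out of five (illustration only).
expanded = pair_features({1: 1.0, 4: 1.0}, n_features=5)
```

For a one-hot encoded peptide of length n, this stores n² entries per peptide rather than (20n)², which is why the large nominal feature count stays manageable.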
We extend both linear and non-linear Lasso to construct locally-weighted variants of both methods. The idea is that we will weight training examples in A by their proximity to the vectorized peptide y to be predicted. The intuition is that we prefer to make smaller training errors for points close to the test point y. The weight w assigned to each training example in A is given in equation (3).
The weighted objective for the linear version of Lasso is given in equation (4).
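A weighted objective consistent with this description can be written as follows. Here a_i denotes row i of A and w_i the weight of training example i. The functional form of w_i as a function of proximity to y is not specified here and is left abstract; the scaling of the terms is an assumption.

```latex
\begin{equation}
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{m} w_i \left( a_i^{\top} x - b_i \right)^2 + \lambda \lVert x \rVert_1 \\
\text{subject to} \quad & x \ge 0
\end{aligned}
\end{equation}
```

Setting every w_i to 1 recovers the unweighted Lasso objective, and the problem remains convex for any non-negative weights.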
To evaluate the quality of the predictions given by the peptide-specific QSAR approaches, we perform leave-one-out cross validation. For each of the m peptide examples, we split the examples into a test set containing the ith peptide and a training set containing all other peptides. We use the training set of peptides to obtain the weight vector x. Let pi be the vector of length 20n that encodes the ith peptide. The predicted activity qi for the ith peptide is given by
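For the linear model, the natural form of this prediction is the inner product of the encoded peptide with the learned weights, shown here as an assumption consistent with the definitions of p_i and x above:

```latex
\begin{equation}
q_i = p_i^{\top} x
\end{equation}
```

For the non-linear variant, p_i would be replaced by the corresponding pairwise feature vector of length (20n)^2.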
The statistical significance of the predictions is determined by comparing the set of residuals generated using our model predictions with residuals generated using naive model predictions. We test the null hypothesis that the residuals between the observed and predicted values are equal to the residuals between the observed and naive model predictions (i.e., a model that always predicts the mean training efficacy). The alternative hypothesis is that the residuals between the observed and predicted values are less than the residuals between the observed and naive model predictions. We generate a p-value for each model using a one-sided paired t-test. We used R² as a metric of model performance on the external validation set.
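The paired comparison of model errors against naive errors can be sketched as below. The function computes only the paired t statistic and its degrees of freedom using the Python standard library; the name `paired_t_statistic` and the toy error values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(errors_model, errors_naive):
    """Return (t, degrees_of_freedom) for paired per-peptide errors.

    A negative t means the model's errors are smaller on average than
    the naive baseline's errors.
    """
    diffs = [a - b for a, b in zip(errors_model, errors_naive)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Toy squared errors for illustration only (not from the study).
t_stat, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
```

Converting the statistic to a one-sided p-value requires the Student t distribution, which is available, for example, through `scipy.stats.ttest_rel` with its `alternative` argument.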
In this metric, experimentally observed values y are compared with predicted values ŷ relative to the mean observed value from the training set.
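Under that description, the metric can be written as follows, where the sums run over the peptides in the external validation set and the symbol \bar{y}_{\mathrm{train}} for the training-set mean is notation introduced here:

```latex
\begin{equation}
R^2 = 1 - \frac{\sum_{i} \left( y_i - \hat{y}_i \right)^2}{\sum_{i} \left( y_i - \bar{y}_{\mathrm{train}} \right)^2}
\end{equation}
```

A value near 1 means the model's errors are small relative to those of a predictor that always returns the training mean.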
The work was supported by NIH grants R01 CA138264, R01 HL101200, and U54RR020839.
Authors’ contributions: CGR designed the method, performed the analysis, and wrote the paper. EVR, JEK and NBP performed the in vitro experiments. JSB and ASP motivated the problem and provided guidance for the analysis and manuscript. All co-authors edited the paper.
The authors declare no competing interests.