Glycans are a diverse group of carbohydrates that play an intricate role in fundamental physiological processes by modulating protein activity and fine-tuning biological responses [4]. Sulfated glycosaminoglycans (GAGs) are linear polysaccharides that occur as the carbohydrate chains of proteoglycans at cell surfaces and in the extracellular matrix. Positioned at the cell-extracellular matrix interface, GAGs are well placed to interact with a wide spectrum of proteins, including growth factors, cytokines, chemokines, morphogens, proteases, antiproteases, cell adhesion molecules, and extracellular matrix components [1], [8]. These GAG-protein interactions provide a mechanism by which GAGs control critical biological processes. However, unlike the binary on/off model generally applied to the regulation of protein activity (e.g., receptor binding and activation), GAGs exert graded control through the diversity of their disaccharide sequence, chemical organization, and chain length. This inherent heterogeneity of GAG chains is a product of non-template-driven biosynthesis [5]. Characterizing this complexity remains a challenge for current analytical methods, and as a consequence, progress in deciphering GAG structure-function relationships has been hindered.
The most structurally complex member of the GAG family is heparan sulfate (HS), an information-dense chain that can potentially carry 48 different disaccharide structures, segregated into blocks of highly sulfated and largely unsulfated domains [1], [8], [9]. HS is a major physiological player at the cell-extracellular matrix interface, where it has been shown to bind and mediate the activity of a number of proteins. Many HS-protein interactions appear to depend less on the specific disaccharide sequence and more on the domain organization of the HS chain [3], [5], [9], [12]. Because current analytical methods have limited ability to detect domain organization, new methods that provide insight into HS chain organization would be highly useful to researchers in the field.
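The figure of 48 disaccharides is commonly reached by treating each modification of the repeating uronic acid-glucosamine unit as an independent choice. The short sketch below enumerates those combinations under that assumption, using five choices: the uronic acid epimer (GlcA or IdoA), 2-O-sulfation of the uronic acid, the glucosamine N-substituent (N-acetyl, N-sulfo, or free amine), 6-O-sulfation, and 3-O-sulfation, giving 2 × 2 × 3 × 2 × 2 = 48. It counts possible structures only and does not imply that all 48 occur in natural HS; the constant and function names are illustrative.

```python
from itertools import product

# Assumed independent modification choices for one HS disaccharide
# (uronic acid linked to glucosamine).
URONIC_ACIDS = ("GlcA", "IdoA")             # C5 epimer of the uronic acid
URONIC_2S = (False, True)                   # 2-O-sulfate on the uronic acid
GLCN_N_FORMS = ("GlcNAc", "GlcNS", "GlcN")  # N-acetyl, N-sulfo, free amine
GLCN_6S = (False, True)                     # 6-O-sulfate on glucosamine
GLCN_3S = (False, True)                     # 3-O-sulfate on glucosamine


def enumerate_disaccharides():
    """Return every combination of the modification choices above."""
    return list(product(URONIC_ACIDS, URONIC_2S, GLCN_N_FORMS, GLCN_6S, GLCN_3S))


def label(unit):
    """Build a readable name such as 'IdoA2S-GlcNS6S'."""
    uronic, s2, glcn, s6, s3 = unit
    return (uronic + ("2S" if s2 else "") + "-" + glcn
            + ("6S" if s6 else "") + ("3S" if s3 else ""))


if __name__ == "__main__":
    units = enumerate_disaccharides()
    print(len(units))  # 2 * 2 * 3 * 2 * 2 = 48
    print(label(units[-1]))
```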
Several early studies used mathematical modeling [51] and computer simulation [52]–[55] to examine the structure of heparin. These methods were used in conjunction with cleavage experiments to test alternative hypotheses concerning the action pattern of heparin lyase I and the arrangement of specific oligosaccharides within the heparin chain. Although the methodology was later applied to hyaluronic acid and the action of hyaluronate lyase [56], it was never extended to the more complex structure of HS.
The computational approach described in this study offers a new way to probe the organizational structure of HS chains. Using minimal experimental data from disaccharide analysis and selective heparin lyase digestion, the computational routines generate chains according to rules of HS biosynthesis and lyase specificity and then transform them into strings of user-defined domains for pattern analysis. As demonstrated with HS chain populations from two different cell culture sources, the model can predict significant differences in overall domain organization as well as in the density and distribution of specific functional motifs. HS activity measurements revealed that these structural differences are related to functional differences in HS-protein interactions. Hence these tools can be used in conjunction with experimental measurement to investigate the relationship between proposed structural requirements and functional activities of HS.
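The generate-digest-encode pipeline can be illustrated with a deliberately simplified sketch. The biosynthetic and lyase rules of the actual model are not reproduced here; instead, this assumes a hypothetical two-letter alphabet ("S" for a highly sulfated unit, "A" for a largely unsulfated unit), swaps in a two-state Markov chain with geometric run lengths as a stand-in for the biosynthesis rules, and uses a placeholder cut rule in place of lyase specificity. All function names are illustrative.

```python
import random
from itertools import groupby

# Hypothetical two-letter alphabet: one character per disaccharide.
SULFATED = "S"    # stands in for a highly sulfated (NS-type) unit
UNSULFATED = "A"  # stands in for a largely unsulfated (NA-type) unit


def generate_chain(n_units, sulfated_fraction, mean_s_run, rng):
    """Generate one chain as a two-state Markov sequence.

    Runs of each state have geometric lengths. mean_s_run sets the average
    sulfated block length; the unsulfated mean is derived so that the
    expected sulfated fraction equals sulfated_fraction (e.g. taken from
    disaccharide analysis).
    """
    if not 0 < sulfated_fraction < 1:
        raise ValueError("sulfated_fraction must lie strictly between 0 and 1")
    mean_a_run = mean_s_run * (1 - sulfated_fraction) / sulfated_fraction
    if mean_s_run < 1 or mean_a_run < 1:
        raise ValueError("mean run lengths must be at least one unit")
    leave = {SULFATED: 1 / mean_s_run, UNSULFATED: 1 / mean_a_run}
    state = SULFATED if rng.random() < sulfated_fraction else UNSULFATED
    units = []
    for _ in range(n_units):
        units.append(state)
        if rng.random() < leave[state]:
            state = UNSULFATED if state == SULFATED else SULFATED
    return "".join(units)


def digest(chain, cut_pairs):
    """Cut a chain at every linkage whose (left, right) units are in cut_pairs.

    cut_pairs is a placeholder for a lyase specificity rule.
    """
    fragments, start = [], 0
    for i in range(1, len(chain)):
        if (chain[i - 1], chain[i]) in cut_pairs:
            fragments.append(chain[start:i])
            start = i
    fragments.append(chain[start:])
    return fragments


def to_domain_string(chain):
    """Run-length encode a chain into domain tokens, e.g. 'SSSAA' -> 'S3A2'."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(chain))


if __name__ == "__main__":
    rng = random.Random(7)
    chain = generate_chain(60, sulfated_fraction=0.4, mean_s_run=4, rng=rng)
    print(chain)
    print(to_domain_string(chain))
    # Hypothetical rule: cut only inside unsulfated stretches.
    print(digest(chain, {(UNSULFATED, UNSULFATED)}))
```

Encoding domains as run-length tokens keeps the pattern-analysis step independent of how chains were generated, so a more realistic biosynthesis rule could replace `generate_chain` without changing the downstream code.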
A causal relationship between cell-surface HS structure and cell behavior is supported by various studies. HS chains from different cell types have been reported to show consistent structural variations that result in distinct biological functions [57]–[59]. For example, HS chains purified from the surfaces of mouse mammary gland epithelial cells and embryonic fibroblasts differed in structure (chain length and disaccharide composition of highly sulfated domains) as well as in binding to type I collagen [58]. The current investigation is consistent with these earlier studies and provides additional evidence for cell type-specific differences in HS structure that have direct functional consequences. In this case, however, the evidence includes not only analytical measurements of disaccharide composition for the whole chain and for lyase-specific domains but also computer-predicted patterns of overall domain structure and specific functional motifs. On both large and small scales, differences in HS structure appear to contribute to the ability of different cell types to respond appropriately to molecular signals in their particular microenvironments.
The pulmonary fibroblasts and epithelial cells used in this study occupy quite different cellular microenvironments. The epithelial cells form a tight barrier of cell-to-cell contacts in a layer (the epithelium) that lines the pulmonary airspace. There is minimal extracellular matrix around the epithelial cells except at the basal surface, where they rest on a thin layer called the basal lamina. By contrast, the fibroblasts are isolated from one another within a generous layer of extracellular matrix and fibrous polymers that forms the connective tissue support of the epithelium [60]. Because of these distinct microenvironments, the resident cells must mount very different biological responses. Positioned at the forefront of the airway, the epithelial cells primarily defend the lung: they provide a barrier and clearance mechanism for environmental agents, modulate the inflammatory response, and regulate cellular activities in response to injury [61]. The fibroblasts, embedded within the interior of the tissue, maintain its structural integrity by producing the components of the extracellular matrix (e.g., collagens, elastin, fibronectin, and proteoglycans) and, when required, migrate to sites of injury to proliferate and produce large amounts of matrix [60], [62].
Considering the diverse biological functions of pulmonary fibroblasts and epithelial cells, it is not surprising that a different structural organization is predicted for the cell-surface HS chains of these two cell types. Because the epithelial cells are the first line of defense against injury caused by excessive release of elastase from neutrophils, they may need a more potent HS structure for binding and inhibiting this protease, as a means of restricting its action to sites of injury or infection. Their HS structure may be more condensed (a higher frequency of highly sulfated domains) because of the shorter range of operation among closely packed epithelial cells. The fibroblasts, on the other hand, exist as a sparse population in the extracellular matrix, where distances are considerable, and an HS structure that is more spread out (a lower frequency of highly sulfated domains) may be better suited to these longer range interactions. Because the extracellular matrix also contains a ready source of HS proteoglycans, excessive elastase activity within the matrix may be more readily addressed by extracellular HS chains, either as intact proteoglycans or as fragments released by injury. Consequently, the HS chains on the fibroblast surface may not need to inhibit elastase as effectively as their counterparts on the epithelial cells.
The ability to read and interpret the patterns of HS chains will have far-reaching implications for understanding the biological function of these complex glycans. These patterns can be analyzed in many different ways, and as illustrated with the computer-generated chains of this study, different methods can reveal different aspects of the same chain. If the emphasis is on the large-scale organization of the chain, the average domain size and the Fourier power spectrum are reasonable ways to characterize the overall domain pattern of a group of chains. However, the model generates unique chains, and although the general pattern conveys the properties and potential activity of the population as a whole, no single chain exactly matches the average chain pattern (compare the average pattern with the individual chain patterns in the corresponding figures). Because each chain has a unique sequence, it is also possible to evaluate the relative density of rare patterns within the population or to search for distinct local patterns within each chain. Thus, differences in the biological activities of HS chain populations, including HS isolated from diseased and nondiseased tissues, can be correlated with differences in either the overall domain organization or the presence of specific structural motifs within the population.
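The three pattern measures named above (average domain size, Fourier power spectrum, and motif density) can be sketched on chains encoded one character per disaccharide. This assumes the same hypothetical "S"/"A" encoding as a simple two-state domain model; the spectrum is computed from the mean-subtracted 0/1 indicator of sulfated units with NumPy's real FFT, which is one reasonable choice rather than the study's exact procedure. Function names are illustrative.

```python
from itertools import groupby

import numpy as np


def average_domain_size(chain, symbol="S"):
    """Mean length of the runs of `symbol` in a chain string."""
    runs = [len(list(run)) for s, run in groupby(chain) if s == symbol]
    return sum(runs) / len(runs) if runs else 0.0


def power_spectrum(chain, symbol="S"):
    """Power spectrum of the 0/1 indicator sequence for `symbol`.

    The mean is subtracted first so the zero-frequency term does not
    dominate. Frequencies are in cycles per disaccharide.
    """
    x = np.fromiter((1.0 if c == symbol else 0.0 for c in chain), dtype=float)
    x -= x.mean()
    return np.fft.rfftfreq(x.size, d=1.0), np.abs(np.fft.rfft(x)) ** 2


def mean_power_spectrum(chains, symbol="S"):
    """Average spectrum over chains; all chains must share one length."""
    if len({len(c) for c in chains}) != 1:
        raise ValueError("chains must have equal length for aligned frequency bins")
    spectra = [power_spectrum(c, symbol) for c in chains]
    return spectra[0][0], np.mean([power for _, power in spectra], axis=0)


def motif_density(chains, motif):
    """Overlapping occurrences of `motif` per disaccharide, pooled over chains."""
    hits = sum(
        sum(1 for i in range(len(c) - len(motif) + 1) if c.startswith(motif, i))
        for c in chains
    )
    total = sum(len(c) for c in chains)
    return hits / total if total else 0.0


if __name__ == "__main__":
    chain = "SSSSAAAA" * 4
    freqs, power = power_spectrum(chain)
    print(average_domain_size(chain))     # 4.0
    print(freqs[np.argmax(power)])        # 0.125, i.e. a period of 8 units
    print(motif_density([chain], "SA"))   # 4 hits / 32 units = 0.125
```

The spectrum summarizes periodicity across a whole population, while `motif_density` works on individual sequences, mirroring the distinction between the average pattern and the unique chains drawn above.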
Although sequence analysis was not the impetus for developing this computational model, it may prove to be a powerful byproduct of the overall process. A system that integrates the computational model with current analytical sequencing technology could potentially “sequence” entire biologically active HS chains. The strategy would be to use a sequencing technique to explicitly define the major oligosaccharides (ten or fewer sugars) produced by partial degradation of the chain with one or more schemes. These fully sequenced fragments would then serve as a set of constraints for the model: as each chain is generated, the simulated sequence would be searched for matches with the real fragments. Chains would be ranked by the number of matches, and the top-scoring chain or chains would represent the best solution for the sequence of the real chain.
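The proposed ranking step can be sketched in a few lines. This assumes each simulated chain and each sequenced fragment is encoded as a string with one character per disaccharide (a hypothetical encoding), and it scores a chain by the number of distinct known fragments it contains; counting total occurrences, or weighting fragments by abundance, would be equally plausible choices. Function names are illustrative.

```python
def count_fragment_matches(chain, fragments):
    """Number of distinct known fragments found at least once in `chain`."""
    return sum(1 for fragment in set(fragments) if fragment in chain)


def rank_chains(chains, fragments, top=3):
    """Return the `top` (score, chain) pairs, best first.

    Python's sort is stable, so chains with equal scores keep their
    generation order.
    """
    scored = [(count_fragment_matches(c, fragments), c) for c in chains]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    simulated = ["SSAAS", "AAAAA", "SASAS"]
    known = ["SA", "AS", "SS"]
    print(rank_chains(simulated, known, top=2))  # [(3, 'SSAAS'), (2, 'SASAS')]
```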
Although future work will focus on refining the computational model, there are two aspects of its basic design that should be emphasized. On a practical level, the first and perhaps more important factor is that the model does not require extraordinary means to achieve results. The experimental data are fairly straightforward to obtain by standard laboratory methods, and the computer program is executable on a personal computer. The second and less obvious factor is that the model has a modular structure. This allows for great flexibility in modifying specific parts of the model, such as the rules for chain position or lyase digestion, or in adding new parts, such as the generation of chains from a normal distribution of lengths. Moreover, because of this modular structure, the model is rather broad in application and can be tailored to other glycosaminoglycans or enzymes, such as chondroitin sulfate and associated chondroitin lyases.
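The modular design described above can be illustrated by treating each stage as a swappable function. In this sketch, which is an assumption about how such a model might be organized rather than the actual program, chain length sampling, chain building, and digestion are plug-in callables; replacing the arbitrary uniform length range with a normal distribution of lengths changes one component and leaves the others untouched. `ChainModel`, `random_composition`, and the other names are hypothetical, and the composition-only builder is a placeholder for real biosynthesis rules.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical plug-in signatures for the modular components.
LengthSampler = Callable[[random.Random], int]
ChainBuilder = Callable[[int, random.Random], str]
Digester = Callable[[str], List[str]]


def uniform_lengths(low: int, high: int) -> LengthSampler:
    """Chain lengths drawn uniformly from an arbitrary range (inclusive)."""
    return lambda rng: rng.randint(low, high)


def normal_lengths(mean: float, sd: float, minimum: int = 1) -> LengthSampler:
    """Chain lengths drawn from a normal distribution, rounded and floored."""
    return lambda rng: max(minimum, round(rng.gauss(mean, sd)))


def random_composition(sulfated_fraction: float) -> ChainBuilder:
    """Placeholder builder: independent units with a fixed sulfated fraction."""
    return lambda n, rng: "".join(
        "S" if rng.random() < sulfated_fraction else "A" for _ in range(n)
    )


def no_digestion(chain: str) -> List[str]:
    """Placeholder digester that returns the intact chain."""
    return [chain]


@dataclass
class ChainModel:
    sample_length: LengthSampler
    build_chain: ChainBuilder
    digest: Digester

    def run(self, n_chains: int, seed: int = 0):
        """Generate n_chains chains and their digestion products."""
        rng = random.Random(seed)
        chains = [self.build_chain(self.sample_length(rng), rng) for _ in range(n_chains)]
        return chains, [self.digest(c) for c in chains]


if __name__ == "__main__":
    base = ChainModel(uniform_lengths(40, 120), random_composition(0.4), no_digestion)
    # Swapping one component leaves the rest of the model untouched.
    alt = ChainModel(normal_lengths(80, 15), base.build_chain, base.digest)
    for model in (base, alt):
        chains, _ = model.run(5, seed=1)
        print([len(c) for c in chains])
```

The same pattern would let a chondroitin sulfate builder or a chondroitin lyase digester be dropped in without changing `ChainModel` itself.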
While basic information is sufficient for operation of the model, the more complete these data are, the more closely the predicted chains will represent the real chains. For example, instead of using literature estimates of the glucuronic acid/iduronic acid ratio, improved values can be determined by comparing data from chemical and enzymatic degradations of the actual sample [36]. As another example, if the molecular weight distribution of the sample is obtained, an average chain length or a distribution of chain lengths can be defined, replacing the generation of chains over an arbitrary range of lengths with a more realistic representation of the sample [63].
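Converting a measured molecular weight distribution into chain lengths is mostly arithmetic: divide each chain mass by a mean disaccharide mass and sample lengths in proportion to abundance. The sketch below assumes the distribution is available as (mass, abundance) bins and that a mean disaccharide mass has been derived separately, for example from the sample's own disaccharide composition. The value 450 Da in the usage block is a hypothetical placeholder, the input bins are invented for illustration and are not measurements from this study, and the function names are illustrative.

```python
import random


def lengths_from_masses(masses_da, mean_disaccharide_mass_da):
    """Convert chain molecular weights (Da) into disaccharide counts."""
    if mean_disaccharide_mass_da <= 0:
        raise ValueError("mean disaccharide mass must be positive")
    return [max(1, round(m / mean_disaccharide_mass_da)) for m in masses_da]


def average_length(masses_da, abundances, mean_disaccharide_mass_da):
    """Abundance-weighted mean chain length in disaccharides."""
    lengths = lengths_from_masses(masses_da, mean_disaccharide_mass_da)
    return sum(n * w for n, w in zip(lengths, abundances)) / sum(abundances)


def sample_lengths(masses_da, abundances, mean_disaccharide_mass_da, n, rng):
    """Draw n chain lengths, picking each bin in proportion to its abundance."""
    lengths = lengths_from_masses(masses_da, mean_disaccharide_mass_da)
    return rng.choices(lengths, weights=abundances, k=n)


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from this study.
    masses = [15000, 25000, 35000]
    abundances = [0.2, 0.5, 0.3]
    per_unit = 450.0  # hypothetical mean disaccharide mass (Da)
    print(lengths_from_masses(masses, per_unit))  # [33, 56, 78]
    print(average_length(masses, abundances, per_unit))
    print(sample_lengths(masses, abundances, per_unit, 5, random.Random(0)))
```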
An overriding issue for any simulation is how well its predictions agree with the real system. Confidence in the predicted results can only be established through model validation, and future work will focus on this critical element [64]. Although too little is known about the domain organization of HS chains for a direct comparison, the model can be tested indirectly against experimental HPLC profiles that identify both the disaccharide and the oligosaccharide products of selective heparin lyase digestion. Substantial disagreement may suggest refinements to the internal rules for chain synthesis or enzyme degradation that bring the model predictions closer to reality, whereas close agreement between the predicted results and the data will establish credibility for the model.
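One simple form of the indirect comparison is to reduce simulated digestion products to a size profile and measure its distance from the measured HPLC profile. The sketch below makes two simplifying assumptions: products are grouped by size in disaccharides only (real HPLC profiles also resolve composition), and abundances are treated as molar fractions. Total variation distance is one choice among many (cosine similarity or a chi-square statistic would also work); it is not presented as the study's validation metric, and the inputs in the usage block are hypothetical.

```python
from collections import Counter


def fragment_profile(fragments):
    """Molar fraction of digestion products keyed by size in disaccharides."""
    counts = Counter(len(f) for f in fragments if f)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two profiles (0 = identical, 1 = disjoint)."""
    sizes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in sizes)


if __name__ == "__main__":
    # Hypothetical simulated fragments and a hypothetical measured profile.
    predicted = fragment_profile(["S", "A", "SS", "SAS", "A", "SS"])
    measured = {1: 0.40, 2: 0.35, 3: 0.25}
    print(predicted)
    print(total_variation(predicted, measured))
```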
The true value of this model rests on whether it answers the question that prompted its development: namely, whether knowing the domain organization of HS chains increases understanding of HS function. For example, activities of HS samples such as protein binding, enzyme inhibition, and cell regulation can be measured and related to domain structure. Toward this end, evaluation of the elastase inhibitory potential of the HS samples used in this study has indicated differences that might relate to altered domain organization. These differences may also have physiological implications regarding the particular role of these HS populations in their cell type of origin. For instance, the ability of HS from lung epithelial cells to inhibit elastase activity may contribute to the normal control of tissue damage at sites of inflammation where neutrophil elastase has been shown to be involved. As the model is refined and applied in conjunction with additional functional measurements on a wide range of HS samples, there are reasonable expectations that new mechanisms for the activity of HS and proteoglycans will be revealed.
The conceptual framework for an innovative computational approach to predict patterns of domain organization within a population of HS chains is presented. This model will give investigators the ability to consider high-level chain organization in understanding HS-protein interactions. The approach described here will likely provide the basis for the development of a new class of tools to probe for structure-function relationships in glycosaminoglycans that may ultimately be used to design selective drugs that target GAG-protein interactions associated with disease.