Atherosclerosis, a chronic inflammatory disease of the vascular system, presents significant challenges to developing effective molecular diagnostics and novel therapies. A systems biology approach integrating data from large-scale measurements (e.g. transcriptomics, proteomics and genomics) is successfully contributing to deciphering regulatory networks underlying the response of many different cellular systems to perturbations. Such a network analysis strategy using pathway information and data from multiple measurement platforms, tissues and species is a promising approach to elucidate the mechanistic underpinnings of complex diseases. Here, we present our views on the contributions that a systems approach can bring to the study of atherosclerosis, propose ways to tackle the complexity of the disease in a systems manner and review recent systems-level studies of the disease.
Atherosclerosis is a complex multifactorial disease characterized by the accumulation of inflammatory cells, lipoproteins and fibrous tissue in the wall of large arteries (Lusis, 2000). It is the primary cause of heart attacks and strokes and thus the leading underlying cause of death globally, accounting for approximately 29% of all deaths worldwide. Cardiovascular disease disproportionately affects low- and middle-income countries and is projected to remain the single leading cause of death worldwide for the next 20 years (World Health Organization, 2009). Despite the enormous economic and social burden of this disease, we lack both a full understanding of its underlying mechanism and the ability to personalize its diagnosis and treatment.
Research over the past several decades has revolutionized our understanding of the pathogenesis of atherosclerosis. Previously, atherosclerosis was viewed primarily as a passive process of cholesterol accumulation in the vessel wall, and the clinical manifestations were attributed primarily to the degree of stenosis. We now understand that atherosclerosis is a complex and active process and that the ultimate clinical presentation results from the interaction of multiple cell types and organ systems (Corti et al, 2004; Libby & Theroux, 2005). Because of this underlying complexity, the study and treatment of atherosclerosis present several fundamental challenges that the emerging discipline of systems biology is uniquely suited to address.
Systems biology is the comprehensive, quantitative analysis of the manner in which all of the components of a biological system interact over time (Zak & Aderem, 2009). To study atherosclerosis, for example, one would consider the human body as the biological system; its components are then all the molecules, cells, tissues and organs that play a role in the pathology of the disease. While systems biology is dependent on new ‘omics’-scale technologies, it is not defined by these technologies. Rather, the systems approach involves the integration of the data derived from these measurement tools into comprehensive predictive models.
Systems biology is hypothesis-driven, global, quantitative, iterative, integrative and dynamic. Its practice begins with the acquisition of global sets of biological data from as many hierarchical levels of information as possible [i.e. deoxyribonucleic acid (DNA) sequences, ribonucleic acid (RNA) expression, protein or lipid abundance]. This is the starting point for formulating detailed graphical or mathematical models, which are then iteratively refined through a hypothesis-driven process of system perturbation and data integration. Cycles of this process will result in more accurate models; ultimately, these models will explain the systems-level properties of the biological system of interest. Once the model is sufficiently accurate and detailed, it will allow researchers to accomplish two tasks never before possible: (1) predict the behaviour of the system given any perturbation and (2) redesign or perturb the molecular network to create new emergent properties. Taking the atherosclerosis analogy further, a researcher or clinician could (1) predict the body's response to, for example, a new diet or medication and (2) design an appropriate intervention that prevents atherosclerosis-promoting events or shifts them to anti-atherosclerotic ones. This latter possibility lies at the heart of preventative medicine.
Cholesterol: A steroid metabolite found in cell membranes. It is transported in the blood in lipoprotein particles and excess circulating cholesterol is associated with atherosclerosis.
Data-mining: Pattern-finding in a large dataset.
Familial hypercholesterolemia: A genetic disorder usually caused by mutations in the LDL-receptor or in ApoB that results in high levels of LDL and premature atherosclerosis.
Fatty streak: Early atherosclerotic lesion containing mainly cholesterol and macrophages.
Feedback loop: A network motif in which a node (molecule) indirectly (or directly) regulates itself.
Feed-forward loop: A network motif in which a node (molecule) both directly and indirectly regulates a downstream target node.
HDL: High density lipoprotein, ‘good cholesterol’; high levels are thought to be protective against atherosclerosis.
LDL: Low density lipoprotein, ‘bad cholesterol’; high circulating levels have been shown to correlate with atherosclerosis.
Plaque: The buildup of cells and cholesterol in the arterial wall. Severe plaque buildup can narrow the arterial lumen, interfering with the flow of blood.
Seed: A group of nodes (molecules) used as the starting point in ad hoc network construction.
Stenosis: The abnormal narrowing of an artery.
Tangier disease: A genetic disorder resulting in very low levels of HDL caused by a mutation in the ABCA1 transporter.
Thrombosis: Formation of a blood clot.
Atherosclerosis involves the interplay among thousands of molecules in multiple interacting cell types including macrophages, endothelial cells and smooth muscle cells (SMCs). The disease occurs in different forms throughout the body (Trogan et al, 2006) and is affected by inputs from multiple organ systems including the vascular system, the endocrine system, adipose tissue, the liver, the gastrointestinal tract and the kidneys (Fig 1). Epidemiologic and treatment studies have also shown that the disease is modulated by a variety of genetic and environmental factors (Yusuf et al, 2004). For example, alteration in the relative abundance of various plasma lipoproteins such as low-density lipoprotein (LDL) and high-density lipoprotein (HDL) has been shown to be of primary importance in the development of the disease. The levels of these lipoproteins are influenced by multiple genetic factors, such as mutations in the LDL receptor gene, which cause familial hypercholesterolemia (Brown & Goldstein, 1974), and mutations in the ABCA1 gene, which cause Tangier disease (Rust et al, 1999), as well as by diet, exercise and medications (Steinberg, 2004, 2005a, b, 2006). Many of the risk factors for atherosclerosis, including dyslipidemia, hypertension, diabetes and obesity, involve the interaction of several organ systems such as the liver, kidneys, gastrointestinal tract and hormonal systems (Assmann et al, 1999). Furthermore, systemic inflammation, which has been shown to be critically involved in both the development and eventual clinical complications of atherosclerosis, involves immune cells and mediators located at the site of plaque formation as well as distal organ systems such as the liver and adipose tissue (Libby et al, 2002; Ross, 1999). Thus, atherosclerosis results from the complex interplay of genetic and environmental risk factors at a whole-organism level (Fig 1).
Atherosclerosis progresses through multiple stages, from early fatty streaks to advanced lesions to plaque rupture, with each stage being characterized by different cellular and molecular components (Fig 1D). Atherosclerotic plaque development begins with endothelial cell activation, including overexpression of leukocyte adhesion proteins such as vascular cell adhesion molecule 1 (VCAM-1) (Cybulsky & Gimbrone, 1991). Chemoattractants such as monocyte chemoattractant protein 1 (MCP-1) then promote migration of leukocytes into the intima (Boring et al, 1998), where macrophage colony stimulating factor (CSF1) promotes the differentiation of monocytes into macrophages (Rajavashisth et al, 1990). These macrophages express scavenger receptors that allow them to engulf and modify lipoproteins and become foam cells, which secrete inflammatory mediators [such as interleukin-1 (IL-1), tumour-necrosis factor-α (TNF-α), nitric oxide and endothelin] (Hansson et al, 2006) that amplify inflammation in the vessel wall and can contribute to additional leukocyte accumulation, SMC proliferation and extracellular matrix remodelling (Brown & Goldstein, 1983; Greaves & Gordon, 2009). Multiple other leukocytes are recruited to the lesion and have been demonstrated to play a critical role in disease development (Weber et al, 2008). Whereas foam cell accumulation characterizes fatty streaks, deposition of fibrous tissue defines the more advanced atherosclerotic lesion. SMCs synthesize the bulk of the extracellular matrix that characterizes this phase of plaque evolution (Raines & Ferri, 2005). Plaque rupture resulting from inflammatory activation, and the ensuing thrombosis, commonly cause the most acute complications of atherosclerosis, such as myocardial infarction or stroke (Fuster et al, 2005).
There are multiple challenges facing clinicians in the treatment of atherosclerotic vascular disease. Atherosclerosis is a chronic condition that develops silently over decades, presenting with clinical manifestations only very late in the course of the disease. Current strategies to detect early disease rely heavily on population-based risk factor assessment but lack the ability to individualize these risk assessments. Currently available diagnostic tools are only able to detect advanced disease. Furthermore, while many pharmacologic interventions directed at reducing correlative risk factors have been shown to reduce the population-based cardiovascular mortality rate, no methods are currently available to track the vascular response in an individual patient and therefore to predict that patient's risk of future events. Thus, current strategies do not holistically address the multiple factors that contribute to the observed pathology.
Given the complexities of atherosclerosis, a systems biology approach that samples multiple levels of hierarchical data and then integrates the results into coherent network models offers many advantages. Complex biological networks are organized around sub-networks of gene modules that contribute to the robustness of the entire system. From a systems-level perspective, disease states represent perturbations, from genetic or environmental factors, on complex networks of interacting components on multiple scales (molecules, macromolecules, organelles, cells, tissues and organs).
The recent development of global measurement and analysis technologies, and their integration under the aegis of systems biology, offer an unprecedented opportunity to overcome the difficulties inherent in atherosclerosis research and treatment. The complex spatial and temporal relationships involved in the disease need to be understood in the context of a dynamic interaction network. Because atherosclerotic plaques evolve over time from simple fatty streaks to advanced lesions prone to plaque rupture, a useful model of the disease process must be able to accommodate changes in molecular composition and interactions over time. Below, we describe the use of interaction networks in systems biology and their application to atherosclerosis research.
A network is a framework that represents the relationships among the features that make up complex biological systems. Biological networks are made up of nodes, which represent molecular entities (such as DNA variations, RNA, proteins and metabolites), edges that represent the relationships between these entities, and network properties that represent the state of the molecular entities over time (Fig 2). The network topology represents all of the interactions involved in a given biological system. A cornerstone of the systems biology approach is the construction of a network representing the disease process, using a collection of methods that together can be called network analysis. Network analysis can help identify feedback mechanisms and network regulatory motifs that capture the emergent properties of the system, such as robustness to perturbation, multistability or homeostatic control.
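The node-and-edge representation and the two regulatory motifs defined in the glossary can be sketched in a few lines of Python. This is a minimal illustration only: the node names (`A` to `D`) are hypothetical placeholders rather than molecules from any study, edges are treated as unsigned directed regulatory links, and the function names are our own.

```python
from collections import defaultdict

def build_network(edges):
    """Return an adjacency map {source: set(targets)} from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touch the key so that pure targets also appear as nodes
    return dict(adjacency)

def in_feedback_loop(adjacency, node):
    """True if `node` directly or indirectly regulates itself (a directed cycle)."""
    stack = list(adjacency.get(node, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(adjacency.get(current, ()))
    return False

def feed_forward_loops(adjacency):
    """List (x, y, z) triples where x regulates z directly and also via y."""
    loops = []
    for x, targets in adjacency.items():
        for y in targets:
            for z in adjacency.get(y, ()):
                if z in targets and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical toy network: A -> B -> C and A -> C (feed-forward), C <-> D (feedback).
toy = build_network([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "C")])
print(in_feedback_loop(toy, "C"), feed_forward_loops(toy))
```

A real analysis would also carry edge signs (activation or repression) and time-dependent node states, which this sketch omits.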
Network analysis can be divided into two approaches: ad hoc network construction and pathway analysis. Both are useful for analysing expression data pertaining to a complex disease such as atherosclerosis, and they have complementary advantages and limitations. The pathway approach is more straightforward to interpret, but it is limited to those biological functions and processes that are represented in a pathway database. Arbitrary divisions between canonical pathways can also limit the effectiveness of the pathway approach. The ad hoc approach, on the other hand, has greater potential to reveal novel molecular connections within the data, but the networks it generates can be challenging to interpret. We outline a procedure and relevant resources for each approach below.
In the pathway approach a network diagram is organized around curated lists of interactions known to be involved in a specific molecular process, for example, ‘eicosanoid biosynthesis’ or ‘toll-like receptor signalling’. These pathway-oriented interaction networks can be obtained from freely accessible pathway repositories (Table 1) and from commercial pathway databases such as Ingenuity Pathways Analysis (IPA, Ingenuity), MetaCore (GeneGo) and Pathway Studio (Ariadne). There are also several freely available software tools that can mine multiple pathway databases (see Table 1). Beyond analysis in terms of separate pathways, an atherosclerosis dataset can also be analysed across multiple pathways to identify molecules or genes that operate within more than one atherosclerosis-associated pathway (e.g. see Ghazalpour et al, 2004).
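The cross-pathway analysis mentioned above amounts to inverting a pathway-to-gene mapping and keeping genes that appear in more than one pathway. The sketch below assumes pathway membership has already been exported from a repository as plain sets; the gene identifiers (`G1` to `G4`) and the function name are hypothetical.

```python
from collections import defaultdict

def multi_pathway_genes(pathways, min_pathways=2):
    """Map each gene found in at least `min_pathways` pathways to those pathways."""
    membership = defaultdict(set)
    for pathway_name, genes in pathways.items():
        for gene in genes:
            membership[gene].add(pathway_name)
    return {
        gene: sorted(names)
        for gene, names in membership.items()
        if len(names) >= min_pathways
    }

# Hypothetical membership lists; real ones would come from a pathway database.
pathways = {
    "eicosanoid biosynthesis": {"G1", "G2", "G3"},
    "toll-like receptor signalling": {"G3", "G4"},
}
shared = multi_pathway_genes(pathways)
```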
As opposed to the pathway approach, the ad hoc approach is unbiased. Here, differentially expressed molecules [ideally identified using a significance threshold that accounts for multiple hypothesis tests (Storey & Tibshirani, 2003)] are grouped into sub-networks that are highly interconnected in an interaction dataset. Often, differentially expressed molecules may share an interacting partner that is not itself differentially expressed, such as a common regulator or substrate. To reveal such indirect connections, molecules can be added to the network that, by virtue of their interactions, enhance the network's connectedness [e.g. its clustering coefficient (Watts & Strogatz, 1998)]. The ad hoc network construction approach does not rely on curated pathway information, but instead is applied using large databases of molecular interactions or associations. Most commonly used are databases of protein–protein interactions (PPI), protein–DNA interactions and protein–metabolite interactions. Interaction databases fall into two categories: large-scale interaction repositories, which may aggregate interactions from high-throughput PPI screens from various species and tissue types, and interaction databases that are focused on a specific class of molecules or functions. A listing of commonly used, publicly accessible interaction databases is given in Table 2. An important limitation of many of these databases is that they aggregate findings from a variety of model cell types, and thus the interaction data are not necessarily derived from atherosclerosis-relevant tissues or models. Another caveat is that literature-based interaction databases necessarily provide more information on better-studied molecules, which may introduce bias into the network model.
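To make the notion of connectedness concrete, the following sketch computes the local clustering coefficient of a node, C = 2e / (k(k - 1)), where k is the number of neighbours and e the number of links among them, and averages it over the network. It then shows how adding a shared partner that is not itself differentially expressed raises the average. The node names (`D1` to `D3` for differentially expressed molecules, `R` for the shared partner) are hypothetical, and counting nodes with fewer than two neighbours as zero is one common convention, assumed here.

```python
from itertools import combinations

def neighbour_map(edges):
    """Build an undirected neighbour map {node: set(neighbours)}, ignoring self-links."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours

def local_clustering(neighbours, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = neighbours.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(neighbours):
    """Mean local clustering coefficient over all nodes in the map."""
    if not neighbours:
        return 0.0
    return sum(local_clustering(neighbours, n) for n in neighbours) / len(neighbours)

# Hypothetical example: only D1 and D2 interact directly ...
before = neighbour_map([("D1", "D2")])
# ... but all three differentially expressed molecules share the partner R.
after = neighbour_map([("D1", "D2"), ("R", "D1"), ("R", "D2"), ("R", "D3")])
gain = average_clustering(after) - average_clustering(before)
```

In this toy case the gain is positive, which is the kind of signal that would justify adding `R` to the network.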
Typically, ad hoc network analysis begins with a network constructed from a ‘seed’ collection of molecules identified in an expression study and any direct interactions between them. To this seed network are then added molecules that have high connectivity to the seed network, thus growing a highly interconnected molecular network in an iterative fashion. The resulting network can be analysed for enrichment of functional annotations to gain insight into its specific biological functions. Several software tools are available that can perform ad hoc network construction and analysis, as well as statistical and bioinformatic analysis of high-throughput data (Table 3). Integrating data from different high-throughput measurement platforms (e.g. transcriptomic and proteomic) can be particularly useful to comprehensively detect disease-associated genes, and statistical approaches have been specifically developed to iteratively expand a molecular network using multiple data types (Hwang et al, 2005).
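The iterative growth step can be sketched as a greedy loop: at each round, the candidate molecule with the most interactions to the current network is added, until no candidate meets a minimum number of links. This is only one simple way to realize the idea and is not the algorithm of any particular tool in Table 3; the parameters `min_links` and `max_added`, the alphabetical tie-break and all node names are assumptions made for illustration.

```python
def neighbour_map(edges):
    """Build an undirected neighbour map {node: set(neighbours)}."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours

def grow_network(seed, neighbours, min_links=2, max_added=10):
    """Greedily add the non-member with the most links into the current network."""
    network = set(seed)
    added = []
    while len(added) < max_added:
        # Count, for every outside molecule, its interactions with current members.
        scores = {}
        for member in network:
            for candidate in neighbours.get(member, ()):
                if candidate not in network:
                    scores[candidate] = scores.get(candidate, 0) + 1
        eligible = [(count, name) for name, count in scores.items() if count >= min_links]
        if not eligible:
            break
        best_count = max(count for count, _ in eligible)
        # Alphabetical tie-break keeps the result deterministic.
        best = min(name for count, name in eligible if count == best_count)
        network.add(best)
        added.append(best)
    return network, added

# Hypothetical interaction data: H links all three seed molecules; X and Y are peripheral.
interactions = neighbour_map([
    ("S1", "S2"), ("S1", "H"), ("S2", "H"), ("S3", "H"),
    ("S3", "X"), ("H", "Y"), ("X", "Y"),
])
network, added = grow_network({"S1", "S2", "S3"}, interactions)
```

Lowering `min_links` makes the network grow further into weakly connected molecules, which illustrates why the resulting networks can become hard to interpret.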
Analysing large-scale expression measurements in the context of molecular pathways or interaction networks can reveal key regulatory molecules and functional modules involved in the disease process and suggest hypotheses regarding the system response to perturbation. Originally used in the context of model organisms such as yeast and bacteria, ad hoc network analysis has more recently been applied to the study of mammalian systems. Particularly relevant to atherosclerosis, several studies have examined networks involved in the inflammatory response in macrophages. Gilchrist et al combined transcriptional profiling and analysis of promoter sequences to identify activating transcription factor 3 (ATF3) as a regulator of macrophage response to the bacterial endotoxin lipopolysaccharide (Gilchrist et al, 2006). Based on the network analysis, ATF3 was predicted to act as a negative regulator of Toll-like receptor 4 (TLR4)-induced expression of key pro-inflammatory genes such as Il6 and Il12b, and this prediction was validated both in vitro and in vivo. A second transcriptomic study analysed the dynamic transcriptional response of macrophages to stimulation with various Toll-like receptor (TLR) agonists. Using a probabilistic framework, transcriptomic data were integrated with promoter sequence scanning (scanning for cis-regulatory motifs from the TRANSFAC database) to predict transcription factors that regulate clusters of TLR-responsive genes. TGFB-induced factor homeobox 1 (TGIF1) was identified as a potential novel transcriptional regulator of a cluster containing the cytokines Csf2 and Gm1960 (Ramsey et al, 2008).
Network analysis can enhance the utility of even simple transcriptomic studies, as we illustrate in Fig 3. Expression data-mining based solely on gene functional annotations is limited by incomplete annotations and the fact that key regulators of disease progression may not be differentially expressed. Furthermore, within a disease-associated functional module, only a small fraction of molecular species may be differentially expressed. Network analysis can extend beyond gene annotation enrichment analysis by taking into account interactions between the molecules in the expression dataset (and intermediaries). Among other advantages, this enables the identification of regulatory molecules that may not be differentially expressed (Fig 3A). A differentially expressed molecule can also be identified as a candidate regulator based on its proximity to disease-associated molecules in the interaction network (Fig 3B). Moreover, the sub-network involving a differentially expressed molecule may indicate its significance, for example, if it resides in a feedback loop regulating the level of a disease-associated protein or metabolite. As an example of this network-oriented approach, we analysed transcriptomic data from murine macrophages treated with oxidatively modified LDL (oxLDL), a stimulus that is associated with foam cell formation. The analysis and the resulting network, which are detailed in Box 1 and Fig 4, show how a potential regulator of the response (p65/Rela) can be identified even though it is not differentially expressed. This simple example also reveals the potential for network analysis to provide a more complete picture of the cellular response than would analysing the annotations of only the differentially expressed genes. Moreover, network analysis can uncover network regulatory motifs [e.g. feedback loops, feed-forward loops, etc. (Alon, 2007)] controlling the response.
As an example of network analysis, we analysed transcriptomic data from murine macrophages treated with oxLDL, a stimulus that is associated with foam cell formation. Using the PLIER algorithm (Affymetrix, 2005), 542 genes were identified as differentially expressed. Ad hoc network analysis was performed on the 542 genes using MetaCore, yielding a network (Fig 4) associated with three biological processes relevant to foam cell formation: ‘regulation of foam cell differentiation’ (P < 10⁻¹⁰), ‘regulation of macrophage differentiation’ (P < 10⁻⁹) and ‘lipoprotein catabolic process’ (P < 10⁻⁹). A functional enrichment test of only the differentially expressed genes in the network fails to detect these enrichments, because only a fraction of the molecules in the network are differentially expressed. The interaction network has multiple interactions with sphingolipids, consistent with the findings of Wheelock et al (2009) based on the data from Kleemann et al (2007).
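A functional enrichment P value of the kind quoted above is commonly computed with a hypergeometric (one-sided Fisher) test, which asks how likely it is to see at least the observed number of annotated genes in a list drawn at random from the genome. The sketch below shows that calculation under this assumption; it is not necessarily the statistic that MetaCore reports, and the counts in the example are invented.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Invented numbers: 10 genes in total, 4 annotated, a list of 3 genes with 2 annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
```

Running the same test on the full network and on only its differentially expressed members makes the point in the text explicit: the smaller list contains fewer annotated genes, so its P value is correspondingly weaker.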
The application of both pathway (Cagnin et al, 2009) and ad hoc (Skogsberg et al, 2008) network analysis has facilitated extraction of biologically meaningful information from microarray messenger RNA (mRNA) studies of atherosclerotic plaques. Such studies are critical in that they analyse disease-relevant tissue [e.g. whole mouse aorta (Skogsberg et al, 2008) or human coronary and carotid arteries (Cagnin et al, 2009)] but they pose numerous analysis challenges. For example, the lesion samples contain a mix of cell types obtained at a fixed time point. These studies allow the construction of a ‘parts list’ of molecules that may participate in the process—which is extremely useful in the interpretation and analysis of complementary in vitro studies. However, to go beyond such lists requires the application of network analysis. By doing this type of analysis the authors provided insights into disease pathogenesis that would not otherwise have been apparent. For example, network analysis of transcriptional data from lesions suggested a small group of cholesterol-responsive genes whose functional annotations were suggestive of involvement in lipid uptake or metabolism (Skogsberg et al, 2008). Screening this gene network with siRNA in an in vitro macrophage cholesterol accumulation assay showed that no single intervention ablates foam cell formation. These findings are consistent with the viewpoint that foam cell formation in vivo is likely to be resistant to targeting a single molecule, and instead may require a combined therapeutic approach.
Both pathway and ad hoc network analysis depend on the quality and comprehensiveness of the underlying interaction database. One approach to extend beyond the available databases is to construct networks using molecular associations based on semantic mining of relevant scientific literature. This has the advantage of enabling an investigator to explore networks organized around relevant biological search terms such as ‘atherosclerosis’, ‘foam cell’ or ‘cardiovascular disease’. The interactions in these networks are based on co-occurrence of molecule names within abstracts of disease-associated articles, with the molecule names separated by a keyword that is suggestive of molecular interaction (e.g. ‘binds’, ‘modifies’, ‘phosphorylates’, etc.). This literature network approach has been used in the analysis of mRNA profiles of human atherosclerotic tissue (Ashley et al, 2006; King et al, 2005). Due to limitations inherent in human studies, these investigations could not identify transcriptional markers of early-stage disease, and they identified a relatively modest number of differentially expressed genes compared to controlled transcriptomic studies of lesions in mice. Integrating data from both human and model systems maximizes the probability of obtaining robust, physiologically relevant findings. Tabibiazar et al leveraged transcriptional data from mouse aortas and human coronary arteries to identify atherosclerosis-related genes that are predictive of disease severity in mouse and lesion grade in human (Tabibiazar et al, 2005). Their analysis confirmed many genes whose transcript levels are known to correlate with disease severity and identified functional classes of genes that are novel in the context of atherosclerosis (such as RUNX transcription factors and histone deacetylases). Such a dual-species approach has the benefit of ensuring that the transcriptional correlates of disease severity identified in mice are relevant in human.
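The co-occurrence rule described above (two molecule names in the same sentence, separated by an interaction keyword) can be approximated with a regular expression. The sketch below is deliberately naive: it splits sentences on full stops, uses only the three example keywords from the text, and the molecule names (`MolA`, `MolB`, `MolC`) and abstracts are hypothetical. Production text-mining systems handle synonyms, negation and sentence structure far more carefully.

```python
import re

INTERACTION_KEYWORDS = ("binds", "modifies", "phosphorylates")

def cooccurrence_edges(abstracts, molecule_names, keywords=INTERACTION_KEYWORDS):
    """Return sorted (molecule, keyword, molecule) triples found within one sentence."""
    # Longest names first so that a short name cannot shadow a longer one.
    names = "|".join(re.escape(n) for n in sorted(molecule_names, key=len, reverse=True))
    verbs = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(rf"\b({names})\b.*?\b({verbs})\b.*?\b({names})\b")
    edges = set()
    for text in abstracts:
        for sentence in text.split("."):  # crude sentence splitting (assumption)
            for match in pattern.finditer(sentence):
                first, verb, second = match.groups()
                if first != second:
                    edges.add((first, verb, second))
    return sorted(edges)

# Hypothetical abstracts, for illustration only.
abstracts = [
    "MolA binds MolB in foam cells. MolC was unchanged.",
    "MolB phosphorylates MolC.",
]
edges = cooccurrence_edges(abstracts, {"MolA", "MolB", "MolC"})
```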
Integrating transcriptomic, proteomic and metabolomic measurements into network analysis can yield a more complete picture of changes responsible for the initiation of atherosclerotic vascular disease than can be obtained by analysing a single measurement type. Applying such an approach in a mouse model of atherosclerosis has demonstrated the importance of transcriptional and metabolic reprogramming of the liver as a key driver of the inflammatory process underlying atherogenesis (Clish et al, 2004; Kleemann et al, 2007).
Examining a combination of genetic and transcriptomic measurements in the context of molecular pathways and biological processes associated with atherosclerosis has also been used as a novel method to uncover disease biomarkers (Hagg et al, 2008; Torkamani et al, 2008). For example, this approach identified insulin receptor substrate 2 (IRS2), whose expression is higher in macrophages from individuals with atherosclerosis than from control subjects. Through genetic association analysis of a larger cohort, a single nucleotide polymorphism (SNP) in the IRS2 promoter was identified that results in higher IRS2 gene expression and increased risk of coronary heart disease (Hagg et al, 2008). Thus, this approach utilizing a tiered study design has identified a potential novel biomarker for the development of coronary heart disease.
A number of excellent genome-wide association studies (GWAS) have uncovered multiple genetic loci that are associated with the development of atherosclerosis (Assimes et al, 2008; Aulchenko et al, 2009; Erdmann et al, 2009; Jarinova et al, 2009; McPherson et al, 2007; Tregouet et al, 2009), the subsequent clinical complications of atherosclerosis such as myocardial infarction (Kathiresan et al, 2009a) as well as risk factors for atherosclerosis such as hypertension (Newton-Cheh et al, 2009), dyslipidemia (Kathiresan et al, 2007, 2008a, b, 2009b) and obesity (Lindgren et al, 2009; Willer et al, 2009). These studies demonstrate the power of current high-throughput technologies as they have revealed multiple loci that would not have been uncovered with more traditional hypothesis-based methods.
However, the utility of these studies is limited by the fact that GWAS does not necessarily directly indicate the causal gene and does not establish the biological context in which the causal gene operates. The integration of GWAS data with studies that examine the downstream changes in the RNA, protein and metabolite state has the potential to reveal the perturbations in molecular networks that are associated with disease. Additional discovery power can be achieved through integration of QTL and transcriptional profiling analysis of strain-intercrossed mice with varying susceptibility to atherosclerosis (Smith et al, 2006a, b).
The identification of the macrophage-enriched metabolic network (MEMN) is an excellent example of the power of such integrated approaches. This network was constructed by integrating multiple data types including genetic studies and expression data from human and mouse liver and adipose tissue (Chen et al, 2008; Emilsson et al, 2008). The MEMN was strongly associated with obesity, diabetes and heart disease and this was confirmed experimentally in studies that indicate complex feedback control within the network (Mehrabian et al, 2005; Schadt et al, 2008; Yang et al, 2009). These studies identified several genes, including a newly discovered phosphatase gene Ppm1l, that were associated with multiple cardiovascular risk factors including weight, glucose tolerance, levels of free fatty acids and blood pressure. Additionally, these studies suggested that the macrophage not only plays a key role at the local level in the plaque but that it is also a driver of many complex metabolic diseases that are associated with atherosclerosis (Schadt, 2009).
This body of literature has made substantial contributions to the study of atherosclerosis. By increasing our ability to extract biologically meaningful information from high-throughput data sets, network analysis has allowed the identification of potential novel therapeutic targets and diagnostic markers. Moreover, studies that publish complete high-throughput data sets are particularly valuable to the research community because they enable other investigators to mine the data and formulate novel testable hypotheses.
The network analysis tools and methods described above can be extended and refined to accommodate complex study designs spanning multiple tissues. This is particularly relevant to atherosclerosis, where available models of plaque formation and leukocyte infiltration necessarily involve a trade-off between ease of expression profiling and physiological relevance. Network analysis can be performed on data from multiple expression studies, for example, from studies using different models or using different high-throughput measurement platforms. We briefly mention two possible strategies. (i) Differential expression data across multiple studies can be clustered and the clusters used as ‘seed’ lists for ad hoc network construction. This approach is predicated on the hypothesis that clusters derived from data from multiple complementary expression studies (which will have different model artefacts and ‘blind spots’) will enable the construction of a more physiologically relevant network than would be possible using a single expression study. (ii) Present/absent detection calls for gene expression from more physiological models can be used to constrain the list of possible molecules for network construction. We believe this is a particularly promising strategy for extracting maximal information from in vitro expression studies using in vivo-derived expression data.
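Strategy (ii) can be illustrated as a simple filter: interactions are retained for network construction only if both partners have a 'present' detection call in the more physiological dataset. The gene identifiers and the function name below are hypothetical, and requiring both partners to be present is our assumption; a looser rule could require only one.

```python
def constrain_by_present_calls(interactions, present_in_vivo):
    """Keep interactions whose two partners are both detected in the in vivo data."""
    return [
        (a, b)
        for a, b in interactions
        if a in present_in_vivo and b in present_in_vivo
    ]

# Hypothetical inputs: G4 is not detected in the in vivo-derived expression data.
candidate_interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
present_calls = {"G1", "G2", "G3"}
usable = constrain_by_present_calls(candidate_interactions, present_calls)
```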
Although the application of systems biology to the study of complex diseases is in its early stages, these studies are already providing novel insights into atherosclerosis and powerful tools to continue to decipher the intricacies of this disease. The promise of a systems approach includes disease prediction and prevention as well as personalized medicine.
We thank Alan Diercks for helpful comments. The authors acknowledge support from NIH contract HHSN272200700038C (National Institute for Allergy and Infectious Diseases). S.A.R. was supported by Award Number K25HL098807 from the National Heart, Lung and Blood Institute.
The authors declare that they have no conflict of interest.
American Heart Association:
European Atherosclerosis Society:
United States National Heart Lung and Blood Institute:
International Atherosclerosis Society:
Systems Immunology website:
Lipid MAPS project website:
Systems Biology portal: