This article provides an introduction to the Framingham Heart Study (FHS) and the genetic research related to cardiovascular diseases conducted in this unique population. It briefly describes the origins of the study, the risk factors that contribute to heart disease and the approaches taken to discover the genetic basis of some of these risk factors. The genetic architecture of several biological risk factors has been explained using family studies, segregation analysis, heritability, and phenotypic and genetic correlations. Many quantitative trait loci underlying cardiovascular diseases have been discovered using different molecular markers. Additionally, results from genome-wide association studies using 100,000 markers, and the prospects of using 550,000 markers for association studies, are presented. Finally, the use of this unique sample in studies of genotype and environment interaction is described.
“Nature is all that a man brings with himself into the world; nurture is every influence from without that affects him after his birth.”
- Francis Galton (1890, p. 9)
“Why should you,…put yourself to the trouble of being measured, weighed and otherwise tested? Why should I…and why should others, take the trouble of persuading you to go through the process?…A comparison of the measures made from time to time will show whether the child maintains his former rank, or whether he is gaining on it or losing it.”
Coronary heart disease (CHD) has remained a major cause of morbidity and mortality in the United States, affecting nearly 13 million people and causing approximately one million deaths per year (Thom et al., 2006). Although the incidence of cardiovascular diseases (CVDs) has gradually declined since the 1960s in the U.S. (Cooper et al., 2004), it is reaching epidemic proportions in many countries of Europe and the developing world (Yusuf et al., 2001). In the 1940s CHD was recognized as the leading cause of mortality in the U.S., accounting for approximately half of all deaths (Kannel, 1990). Nonetheless, knowledge of the factors that predisposed individuals to CVDs was “virtually non existent” 60 years ago, and the disease was perceived to be an inevitable consequence of “aging or genetic predisposition” of individuals (Dawber and Kannel, 1999). Fortunately, the U.S. Public Health Service (USPHS at the time, later the NIH) recognized the necessity of understanding the causal factors of the epidemic and decided in 1947 to establish a prospective, longitudinal, observational epidemiological study in the town of Framingham, Massachusetts, in collaboration with the Massachusetts State Department of Health and Harvard Medical School. The “Framingham Study” was formally established in 1948 to identify factors that contribute to CVD (Dawber et al., 1951; Kagan et al., 1962; Levy and Brink, 2005).
The study, nearly six decades later and now known as the “Framingham Heart Study” (FHS), is the longest running, multigenerational longitudinal study in medical history (Butler, 1999). It has helped identify several “risk factors” and their cumulative influence on the manifestation of CVD. Indeed, the term ‘risk factor’ was coined by Framingham investigators (Kannel et al., 1961).
Information collected on the participants enrolled in the study has aided in correcting a number of long held misconceptions on the role of blood pressure, lipids, diabetes, obesity, proteinuria, left ventricular hypertrophy, atrial fibrillation, smoking and exercise in the manifestation of CVD. Framingham investigators have also elucidated the pathogenesis of atherosclerosis and thus have laid a firm foundation toward preventive cardiology (Kannel, 1990). Furthermore, the study has acquired an iconic status in public health and preventive cardiology and has been listed as the “fourth significant achievement in medicine” (after the development of antibiotic treatments, immunization against infectious diseases, and the understanding of the roles of vitamins; Anon., 1999), and the second greatest discovery (behind electrocardiography) in Cardiology (Mehta and Khan, 2002).
The investigators of the original protocol of the “Framingham Study” recognized a wide range of variation among individuals in human populations in response to “stresses and insults” (Gordon and Kannel, 1970). Instead of focusing on just one or a few independent causal factors that might influence CVD, they took an integrated approach and hypothesized that CVD may arise from “multiple causes which work slowly within the individual.” However, family history of CVD received the highest importance among the many variables selected for studying its manifestation among the participants (Dawber et al., 1951). In general, at least three major variables were assumed to contribute to the onset of CVD: constitutional (hereditary) factors, conditioning (environmental) factors, and the length of time taken by the conditioning factors to act on the constitutional factors, ultimately resulting in a clinically recognizable condition (Gordon and Kannel, 1970). Thus, the founders of the study were cognizant of the fact that the biological basis of CVD may be complex and may be modulated by the interaction of hereditary and environmental factors.
Although the role of hereditary factors in the development of CVD was acknowledged from the very beginning of the Framingham study, genetic studies did not receive much attention until the late 1980s. In the last twenty years, however, a number of investigators have utilized the rich resource available at the study and have attempted to understand the genetic basis of CVD using various approaches. In this review, we briefly discuss: a) some of the salient features of the Framingham Heart Study population, and b) approaches taken by the Framingham investigators toward identifying the genetic bases of CVD and some of its risk factors.
The Framingham Heart Study is composed largely of whites of European descent. Individuals of Italian, Irish and English ancestry are predominant in the sample. About 85 percent of the Original Cohort was born in the U.S. or Canada, including 19% born in Framingham and another 40% born in Massachusetts. Thirty-five percent identify themselves with ethnic origins in the British Isles, including 15% from Ireland; another 19% are of Italian ethnicity, 32% of other Western European ancestry, 5% Canadian and 6% Eastern European. Less than 4% are of non-European origins or of unknown ethnicity (Table 1).
The study was formally established between 1948 and 1953 in the town of Framingham, MA, located about 20 miles west of Boston. Of a total population of 28,000, approximately 10,000 individuals were aged 30–59 years. It was determined that if 6,000 individuals were invited into the study from the 10,000 in the target age range, about 5,000 individuals would not only be free of cardiovascular disease, but also provide sufficient sample size for the analysis of factors contributing to the development of CHD among the selected individuals over a period of twenty years. In such a time span, approximately 400, 900, and 2,150 would develop CHD by the end of the 5th, 10th and 20th year, respectively, from the initial examination. Therefore, all the households in Framingham listed in the town census were categorized by the number of eligible individuals, and every third household was excluded. Approximately 6,600 individuals were so selected. As expected, the number was diminished by losses, deaths and refusals. There were also 740 volunteers from the town of Framingham included. At the beginning of the study, 5,209 men and women joined from January 1948 through March 1953 (Kagan et al., 1962), and a total of 5,128 of these participants were found to be free of “overt coronary heart disease”. Thus, the group consisting of 5,209 participants constitutes the “Original cohort” of the study. The participants would undergo examinations every two years (Dawber et al., 1951). The Offspring and Third Generation cohorts, consisting of 5,124 and 4,095 individuals, respectively, were recruited in 1971–1975 and 2002–2005, respectively. The Offspring cohort was composed of children of the Original cohort and the children’s spouses and has been examined every 4 to 8 years. The Third Generation cohort was recruited from children of the Offspring.
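The sampling arithmetic in this paragraph can be checked with a short sketch. The figures are those quoted in the text; the variable names are illustrative, and the sketch assumes that excluding every third household removes roughly one third of the eligible individuals.

```python
# Figures quoted in the text; variable names are illustrative only.
eligible = 10_000             # residents aged 30-59 years
disease_free_target = 5_000   # participants expected to be free of CVD at entry

# Assumption: excluding every third household keeps about two thirds
# of the eligible individuals.
sampled = eligible * 2 / 3

# Projected numbers developing CHD by the end of years 5, 10 and 20.
projected_cases = {5: 400, 10: 900, 20: 2_150}
cumulative_incidence = {
    year: cases / disease_free_target
    for year, cases in projected_cases.items()
}

print(f"individuals sampled: about {sampled:.0f}")
for year, proportion in cumulative_incidence.items():
    print(f"year {year}: {proportion:.0%} projected to develop CHD")
```

The result of roughly 6,700 sampled individuals is in line with the "approximately 6,600" reported, and the projections correspond to 8%, 18% and 43% of the disease-free group.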
The participants of the Original, Offspring, and Third Generation cohorts have been examined 29, 5 and one time, respectively, for a large number of variables that may have a bearing on CVDs (Figures 1 and 2). Thus, most participants of the study are members of 754 extended pedigrees. These pedigrees are well-defined (parents, children, grandchildren, cousins, avuncular cousins, aunts, uncles etc.) and range in size from 3 to 230 individuals with a median of 9 (Figure 3); the third generation individuals represent 1828 nuclear families whose sibships vary from 1 to 9 individuals.
At the initiation of the study, a committee consisting of eleven physician-epidemiologists developed a list of criteria and measured variables that may have a “bearing on the development of (CV) disease” under the following six categories (see Dawber et al., 1951 for details).
This tradition of routine ascertainment of physical examination, life style and habits, medical history, laboratory analysis, non-invasive and end-point data has been applied to all three generations of participants (Table 2). The number of variables, however, has increased over time and varies from one exam to the next. For example, in the first examination, data were collected on 30 major variables. Over time, the diversity and complexity of phenotyping has expanded. For instance, in recent examinations the Offspring, aside from standard history and physical examination measures, have undergone additional testing including carotid ultrasound, echocardiography, brachial reactivity, arterial tonometry, 6 minute walk, pulmonary function testing, and subsets have received cardiac and brain magnetic resonance imaging, cardiac multidetector computed tomography, and bone densitometry (www.nhlbi.nih.gov/resources/deca/fhsc/docindex.htm). In a recent survey, approximately 1500 variables were found to have been measured on the FHS cohort. However, not all traits have been measured on all of the individuals; hence, the number of phenotypes measured varies among individuals, examination cycles and cohorts.
Cardiovascular diseases arise from multiple causes. The heterogeneous nature of the etiology of CVD was recognized at the start of the study in 1948. Several key factors were found to exert a disproportionate influence, either independently or cumulatively, on the development of CVD. These factors were designated as “risk factors.” The primary risk factors include: age, systolic blood pressure (SBP), body mass index (BMI), total/HDL cholesterol ratio, diabetes, and smoking (Dawber et al., 1959; Kannel et al., 1961). Additional risk factors and their components, including morphological (e.g. left ventricular hypertrophy; Kannel et al., 1969), physiological (e.g. fibrinogen; Kannel et al., 1987), and life style (e.g. Posner et al., 1993) factors, have been added over time. These are further categorized into modifiable, probably modifiable, and fixed risk factors (Table 3; Wilson, 1994). The distribution of various risk factors in all three cohorts, as well as between men and women, is provided in Table 4.
Biological variation may be understood at two levels: phenotypic and genetic. Many of the CVD risk factors such as high density lipoprotein cholesterol, total cholesterol, and blood pressure are quantitative traits. The phenotypic variation of a quantitative trait may be represented by $V_P = V_G + V_E + 2\,\mathrm{cov}_{GE}$, where $V_G$ and $V_E$ are the genetic and environmental variances, respectively, and $\mathrm{cov}_{GE}$ is the covariance between genotype and environment (Falconer and Mackay, 1996). An understanding of the genetic architecture of a quantitative trait requires knowledge of its inheritance pattern, association with other traits and molecular characterization of genes that underlie the phenotype (Mackay and Lyman, 2005). Complex diseases such as CVD may arise from multiple genes and their interaction with environmental factors. Hence, it is important to tease apart the components that contribute toward the development of these diseases using genetic approaches. The Framingham Heart Study provides a unique opportunity for understanding the genetic architecture of many human traits, including the CVD risk factors, using the detailed family structure, detailed phenotypic measurements, and information on physiological and molecular markers. Although the original protocol of the FHS recognized the role of heredity and environmental factors in CVD, systematic genetic analysis did not start until the mid 1980s. DNA collection on each of the participants from the Original and Offspring cohorts was initiated in the late 1980s, continued during the 1990s and was expanded to Third Generation participants at their first examination.
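The decomposition of phenotypic variance into genetic variance, environmental variance and twice their covariance can be checked numerically. The following is a minimal sketch on simulated values, not FHS data; the effect sizes, the sample size and the helper names `variance` and `covariance` are assumptions made for illustration.

```python
import random

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2_000

# Hypothetical genetic values, and environmental values that partly
# track genotype so that the covariance term is not zero.
G = [rng.gauss(0.0, 1.0) for _ in range(n)]
E = [0.3 * g + rng.gauss(0.0, 1.0) for g in G]
P = [g + e for g, e in zip(G, E)]   # phenotype = genetic + environmental value

V_G, V_E, cov_GE = variance(G), variance(E), covariance(G, E)
V_P = variance(P)

print(f"V_P                   = {V_P:.4f}")
print(f"V_G + V_E + 2 cov_GE  = {V_G + V_E + 2 * cov_GE:.4f}")
```

The two printed quantities agree up to floating-point error, because the identity holds for any sample when the phenotype is the sum of the two components.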
The fact that both morphological and disease traits cluster in families has been known to human geneticists for a long time (Galton, 1886; Garrod, 1902), and family history is a significant predictor of heart diseases (Friedlander et al., 1985). The Framingham investigators indeed recognized the fact that CHD “runs in families” (Kannel et al., 1979; Kannel and Stokes, 1985); yet the relative contribution of genetic factors and shared environment toward developing cardiovascular risk was debated, since “family members eat at the same table” (Kannel et al., 1979; Kannel and Stokes, 1985). Havlik et al., (1979), however, reported significant correlations between parents and offspring and between sibling pairs for blood pressure. Correlation between spouses was attributed to assortative marriages for age, body weight and habits such as smoking and alcohol consumption. Similarly, Myers et al., (1990) demonstrated that CVD in parents could be an independent risk factor. Similar studies have been carried out at the FHS for other traits such as coronary heart disease (Brand et al., 1992), lens opacities (Anon., 1994), stroke and hypertension (Reed et al., 2000), atrial fibrillation (Fox et al., 2004) and heart failure (Lee et al., 2006). Many of the risk factors discovered by the FHS investigators act cumulatively toward determining CVD risk between parents and offspring (Figure 4; Lloyd-Jones et al., 2004).
Family studies point toward the aggregation and inheritance of disease causing factors among individuals within families. They do not, however, indicate if the mode of genetic transmission from parent to offspring is simple or complex. Segregation analysis, on the other hand, provides insights on whether or not the inheritance is Mendelian (simple) or complex. For example, using the FHS family data, Felson et al., (1998) reported the presence of a major recessive gene and a multifactorial component for generalized arthritis. On the other hand, pulmonary function was found to be governed by a polygenic component (Givelber et al., 1998). Interestingly, a number of risk factors appear to differ among men and women (Table 4), which could ultimately contribute to their susceptibility to CVD (Figure 5; Hubert et al., 1983).
The relative contribution of genetic and environmental factors to the expression of quantitative traits is determined using the index known as heritability. Formally, heritability represents the proportion of phenotypic variance explained by genetic factors and is estimated as the ratio of genetic to phenotypic variance. Either broad sense (H²) or narrow sense (h²) estimates are used for this purpose (Sham, 1998). By definition, broad sense heritability includes all genetic variance (additive, dominance and their interaction), but narrow sense heritability considers only the additive portion of the genetic variance (Falconer and Mackay, 1996). Heritability serves two purposes: it provides an estimate of the level of genetic variation underlying a quantitative trait, including disease, and also indicates the evolutionary potential of the trait (Lynch and Walsh, 1998). In general, moderate to high heritability has been reported for most traits examined (Table 5), but the distribution of heritability for the many traits examined in this highly phenotyped cohort is unknown. Note that heritability is a population estimate, and therefore, it could vary across populations, between sexes, across environments, as well as at different stages in the life span (Lynch and Walsh, 1998). This instability of heritability estimates is also seen for various traits in the FHS sample (Table 5). For example, Brown et al., (2003) demonstrated a general decrease in estimated heritability in 70 versus 40 year old individuals (Figure 6). Furthermore, Atwood et al., (2005) indicated that heritability for white matter hyperintensity decreased in women, but increased slightly in men with advancing age (Figure 7).
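The distinction between broad and narrow sense heritability can be made concrete with a small sketch. It assumes that the variance components are already known and that there is no genotype-environment covariance; the function name and the numerical values are hypothetical and are not estimates from the FHS.

```python
def heritabilities(v_additive, v_dominance, v_interaction, v_environment):
    """Return (broad sense H2, narrow sense h2) from variance components.

    Assumes the phenotypic variance is the plain sum of the components,
    i.e. no covariance between genotype and environment.
    """
    v_genetic = v_additive + v_dominance + v_interaction
    v_phenotypic = v_genetic + v_environment
    if v_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    broad = v_genetic / v_phenotypic      # all genetic variance
    narrow = v_additive / v_phenotypic    # additive variance only
    return broad, narrow

# Hypothetical variance components (arbitrary units).
H2, h2 = heritabilities(v_additive=4.0, v_dominance=1.0,
                        v_interaction=1.0, v_environment=4.0)
print(f"H2 = {H2:.2f}, h2 = {h2:.2f}")
```

With these illustrative values the total genetic variance is 6 of 10 units and the additive variance is 4 of 10 units, so the narrow sense estimate can never exceed the broad sense estimate.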
A number of morphological and biochemical traits are correlated, and these associations may be attributed to three factors: genetic, developmental and environmental (Lynch and Walsh, 1998). Thus, any variation in the relations among traits, either due to environmental or age-related changes, may reflect the effects of underlying genes and common genetic precursors, developmental pathways as well as coordinated organism-wide signaling (Badyaev and Foresman, 2004). Genetic correlations among traits arise from pleiotropic effects of genes on multiple traits and/or linkage disequilibria among distinct loci (Cheverud, 2001). Genetic correlations could also reflect allelic complexes at multiple loci as well as coadaptation (Churchill, 2006). Moreover, genetic correlations might indicate widespread association among loci, due to linkage and/or pleiotropy at the genomic level, which in turn could govern the integration of both morphological traits and disease related traits (Churchill, op. cit.). Phenotypic, genetic and environmental correlations have been determined among five risk factors (cholesterol, high density lipoprotein, systolic blood pressure, triglycerides and body mass index) in the FHS (Table 6). The results indicate that the phenotypic and genetic correlations generally have similar magnitudes. Where the magnitudes differed, the direction of the correlation was conserved. Additionally, the concentrations of high density lipoprotein and triglycerides were affected by environment. These results largely agree with the conclusions reached by Cheverud (1988), who suggested that phenotypic correlations may reflect genetic correlations.
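The relation among phenotypic, genetic and environmental correlations can be illustrated with the standard decomposition given in quantitative genetics texts such as Falconer and Mackay (1996), in which the phenotypic correlation is a heritability-weighted combination of the other two. The sketch below assumes an additive model; the function name and the input values are hypothetical and are not taken from Table 6.

```python
import math

def phenotypic_correlation(h2_x, h2_y, r_genetic, r_environmental):
    """r_P = h_x * h_y * r_G + e_x * e_y * r_E, with e = sqrt(1 - h2)."""
    h_x, h_y = math.sqrt(h2_x), math.sqrt(h2_y)
    e_x, e_y = math.sqrt(1.0 - h2_x), math.sqrt(1.0 - h2_y)
    return h_x * h_y * r_genetic + e_x * e_y * r_environmental

# Hypothetical pair of traits, each with heritability 0.5.
r_p = phenotypic_correlation(h2_x=0.5, h2_y=0.5,
                             r_genetic=0.6, r_environmental=0.2)
print(f"phenotypic correlation = {r_p:.2f}")
```

When both heritabilities are high, the genetic term dominates and the phenotypic correlation tracks the genetic correlation closely, which is one way to read the agreement between the two reported above.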
Phenotypes are linked to genes via biochemical pathways, and therefore, biochemical (bio) markers or biological traits provide logical surrogates to establish the relations between disease phenotypes and genotypes. These molecules or traits, also called endophenotypes or risk factors, in turn reflect the action of underlying genes and their expression patterns (Rice et al. 2001). Hence, measuring informative biochemical markers to predict the behavior of phenotypes is often favored in CVD (Vasan, 2006), as they simultaneously provide an idea of the phenotypes, genes and the pathway. A number of biomarkers have been used to establish relations between biomarkers and risk for cardiovascular disease in the FHS population. For example, Seman et al., (1999) reported a positive association between lipoprotein (a) cholesterol concentrations and CHD in men but not in women. Keaney et al., (2004) determined that ICAM-1 concentrations were associated with age, female gender, total/high density cholesterol ratio, body mass index, blood glucose, smoking and prevalent CVD. Similarly, Wang et al., (2002) reported a close association between the concentrations of C-reactive protein, and carotid atherosclerosis, but the relationship was found only in women and not in men. High concentrations of total homocysteine have been implicated in cardiovascular disease (Arnesen et al., 1995) and dementia (Seshadri et al. 2002). Elias et al. (2005) reported an inverse relation between the concentrations of homocysteine and cognitive function, only among individuals over 60 years in the FHS population.
In humans, a number of other classes of molecular markers have been employed both to describe genetic variation and to discover the genetic basis of phenotypic traits, including complex diseases. These include: allozymes (Harris, 1966), restriction fragment length polymorphisms (RFLPs; Solomon and Bodmer, 1979; Botstein et al., 1980), variable number of tandem repeats (VNTRs; Jeffreys et al., 1985), microsatellites (Weber and May, 1989) and, more recently, single nucleotide polymorphisms (SNPs). Briefly, RFLPs are the products obtained by digesting DNA molecules with restriction enzymes; microsatellites are tandem repeats of short motifs, two [e.g. (CA)n] to five [e.g. (TTTTA)n] nucleotides long, that are distributed throughout the genome and are known to be highly polymorphic. SNPs arise from mutations at specific nucleotides in the DNA molecule and represent the most abundant class of polymorphisms in the human genome (see Strachan and Read, 2003, for details).
The Framingham investigators have utilized primarily three families of molecular markers - RFLPs, microsatellites and SNPs - to establish associations between molecular markers and cardiovascular risk factors. For example, Fabsitz et al., (1989) tested the association between human leukocyte antigen (HLA) and obesity in 348 individuals and found that the Bw35 allele was significantly associated with obesity. Similarly, RFLPs (for the restriction enzymes MspI, PstI, SstI, PvuII and XbaI) in the apolipoprotein A-I, C-III, and A-IV gene cluster were tested (Ordovas et al., 1991) in 202 patients with coronary artery disease and 145 normal individuals. They found that individuals with the SstI polymorphism had a 38 percent greater concentration of triglycerides than the referents. Wilson et al., (1994) examined the ε2, ε3 and ε4 alleles of the apolipoprotein E locus in relation to CHD among 1034 men and 916 women aged 40–70. They found that the ε4 allele was associated with elevated low density lipoprotein cholesterol concentrations, as well as with CHD, in both men and women.
The availability of detailed measurements on cardiovascular risk factors and other phenotypic information in the FHS has facilitated mapping complex traits using two well known approaches: linkage and association. Briefly, linkage arises if two loci physically occur on the same chromosome and are inherited as a unit. It is determined using information on the inheritance pattern between parents and offspring in pedigrees (see Terwilliger and Ott, 1994 for details).
Linkage methods are used to identify regions at various locations on chromosomes or the genome that influence a given trait. These regions are assumed to contain quantitative trait loci (QTL).
Discovery of QTLs has been accomplished using primarily two types of linkage analyses: model based (parametric) and model free (non-parametric). In the former, a number of parameters such as the mode of inheritance of the disease, frequency of the causal allele, and its penetrance must be specified a priori. The likelihood of genetic linkage between two loci is expressed as a LOD (logarithm of odds) score. In general, for a Mendelian disorder, a LOD score of >3.0 is considered evidence for linkage (Sham, 1998). Parametric approaches have been successfully used for identifying the genetic basis of simple Mendelian disorders. Cardiovascular disorders reveal complex or non-Mendelian inheritance patterns, which makes it difficult to specify a mode of inheritance. Therefore, model free analysis, which does not require a priori definition of allele frequencies or mode of inheritance, is used to map quantitative traits. This approach requires that the specific alleles or sets of linked alleles (haplotypes) that are shared among relatives be identified by means of identity-by-descent (IBD). In other words, IBD is central to model free linkage analysis. Model free approaches employing microsatellite markers have been used more extensively at the FHS to understand the genetic bases of quantitative traits.
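The LOD score described above compares the likelihood of the data at a given recombination fraction with the likelihood under free recombination (a recombination fraction of 0.5). The following is a minimal sketch for the simplest textbook case, assuming phase-known, fully informative meioses that can each be scored as recombinant or non-recombinant. The function name and the counts are hypothetical, and the sketch is not the pedigree likelihood software used in practice.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known meioses.

    L(theta) = theta**k * (1 - theta)**(n - k), where k is the number of
    recombinant meioses and n the total number of informative meioses.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical data: 1 recombinant among 12 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta = {theta:.2f}  LOD = {lod_score(1, 12, theta):.2f}")
```

With no recombinants in ten meioses the LOD approaches 10·log10(2) ≈ 3.01 as the recombination fraction approaches zero, which is why about ten fully informative meioses are the minimum needed to reach the conventional threshold of 3.0 in this idealized setting.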
Approximately 612 microsatellite markers have been typed on the largest 330 pedigrees consisting of 1702 individuals belonging to generations 1 and 2 of the FHS. These data have been used to map genes underlying several risk factors, including blood pressure, arterial stiffness, lipid traits, adiposity, glycemic traits, circulating biomarkers (e.g. inflammation, natriuretic peptide), pulmonary function, renal function, and bone traits (Table 7). A number of these locations have been confirmed using other populations as replicate samples. Recently, the third generation individuals have been typed with a comparable set of short tandem repeat (STR) markers. Upon completion, microsatellite markers will be available on about 7000 individuals, encompassing three generations, and linkage analyses will be extended to three generation pedigrees.
Besides identifying candidate loci for a number of risk factors, the availability of correlated traits and longitudinal data on families has enabled FHS researchers to ask additional interesting questions. For example, does age variation influence the magnitude of LOD scores? Or does it lead to a shift in the location of a candidate gene region? Also, are several distinct yet correlated phenotypes influenced by the gene(s) located in a specific region? For instance, it is known that high-density lipoprotein concentrations are inversely correlated with cardiovascular risk. Arya et al., (2003) mapped the region harboring genes that influence both BMI and HDL-C and thereby suggested pleiotropic effects. Similarly, Lin (2003) reported a common region, 6q24.3, to be influencing two inversely correlated traits, plasma triglycerides (TG) and high density lipoprotein cholesterol levels. Atwood et al., (2006), on the other hand, performed linkage analysis on body mass index across 28 years to determine the impact of measurement across age groups. The results indicated that although the magnitude of LOD scores varied across six measurements, ranging from 0.61 to 3.27, they all mapped to 11q14, suggesting that at least one QTL for BMI in this region is unlikely to be an artifact of measurement error.
Linkage studies have been employed to map numerous genes underlying Mendelian diseases. This approach, however, is less powerful for mapping complex disorders, as they are governed by many genes whose causal alleles generally have small effects. As noted earlier, parametric linkage approaches work best when the effect of the causal allele is large and least influenced by environmental factors. Complex traits, in contrast, are greatly affected by environmental factors, making it more difficult to use linkage analysis. Risch and Merikangas (1996) proposed an alternative solution to this problem. They conjectured that association studies using a large number of markers (in the neighborhood of a million) may be more useful than linkage studies for studying the genetic bases of complex disease. In association studies, allele frequencies are compared between individuals that carry the disease and healthy individuals, with the important assumption that the marker may be embedded in the causal gene or close to it. Additionally, association studies do not necessarily require pedigree information and can be performed using either unrelated or family-based samples. This approach has been made feasible by the discovery and deployment of the most abundant class of molecular markers, single nucleotide polymorphisms (SNPs), for association studies (see below).
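The comparison of alleles between affected and healthy individuals can be sketched as a chi-square test on a 2 × 2 table of allele counts. This is a minimal illustration for unrelated cases and controls, assuming one biallelic marker; the function name and the counts are hypothetical, and the test shown is the ordinary Pearson chi-square with one degree of freedom rather than any specific method used in the FHS.

```python
import math

def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df) comparing allele counts in cases vs controls.

    Returns (chi2, p_value). For one degree of freedom the upper-tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical allele counts: allele A is more common among cases.
chi2, p_value = allelic_chi_square(case_a=60, case_b=40,
                                   control_a=40, control_b=60)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A test of this kind treats the alleles as independent observations, which is reasonable for unrelated individuals but not for pedigree data such as the FHS, where methods that account for relatedness are required.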
Usually, two approaches are taken to establish an association between a putative causal site and a given phenotype: the site may lie within a known gene (the candidate gene approach) or at any unknown location in the entire genome (the genome-wide approach). Markers are placed at regular intervals along the length of the gene or across the genome, with the assumption that the markers so placed may be in linkage disequilibrium (LD) with the causal allele. In other words, how well a marker predicts the presence or absence of a disease causing allele or locus can be determined using a linkage disequilibrium approach. Briefly, linkage disequilibrium is an index of non-random association of two alleles on a chromosome in a population (Ardlie et al., 2002). If a new mutation occurs at any location of the genome, it is initially in complete linkage disequilibrium with the surrounding marker alleles. Among several measures proposed to quantify linkage disequilibrium (Devlin and Risch, 1995), two, D’ (Lewontin, 1964) and r² (Hill and Weir, 1994), are most frequently used. Accordingly, strong LD between the marker and a causal allele (>0.8) is used as an index toward identifying a functional allele. Both approaches have been used with FHS data and some of these results are presented below.
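The two linkage disequilibrium measures named above can be computed from haplotype and allele frequencies using their standard definitions. The sketch below assumes two biallelic loci with known frequencies; the function name and the input values are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b                       # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical new mutation A (frequency 0.1) that is found only on
# haplotypes carrying the common marker allele B (frequency 0.5).
d, d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.1, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

In this example D′ is 1 because the mutation occurs only on the B background, yet r² is only about 0.11 because the two alleles differ greatly in frequency. The two measures therefore answer different questions, and a marker in complete LD by D′ can still be a poor predictor of the causal allele.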
Causal polymorphisms within a number of candidate genes that affect the cardiovascular pathway have been described in the literature. FHS investigators have typed the same polymorphisms in FHS participants to confirm or refute the previously published associations. Examples include the association of two polymorphisms in the estrogen receptor-β gene with left ventricular mass and wall thickness in women with hypertension (Peter et al., 2005); of the L162 polymorphism of the peroxisome proliferator-activated receptor alpha (PPARA) with plasma lipids (Tai et al., 2002); and of ATP-binding cassette transporter-1 (ABCA1) polymorphisms with HDL concentrations (Brousseau et al., 2001). Additionally, SNPs in 200 genes of the cardiovascular pathway have been typed and a number of association studies have been performed with the following six echocardiographic phenotypes: left ventricular (LV) mass, LV internal dimension, LV wall thickness, left atrial dimension and aortic dimension; part of the results are presented in a grid form (http://cardiogenomics.med.harvard.edu/home; Levy et al., 2006). Occasionally, however, a single SNP may suggest weak or no association with a given phenotype, but several SNPs in linkage disequilibrium (also known as haplotypes) may improve the strength of association. For example, Kathiresan et al., (2006) found a triallelic haplotype containing C-T-A alleles of the C-reactive protein gene to be associated with serum C-reactive protein concentration.
Although linkage and candidate gene studies have revealed many potential regions and SNPs of interest, there have been relatively few successes in uncovering a comprehensive set of genetic variants responsible for common complex disease (Carlson et al., 2004). Meta-analyses of candidate gene studies suggest that only about one third of the reported associations are validated, and fewer than 100 reported genetic associations are considered to be definitive (Lohmueller et al., 2003; Ioannidis et al., 2003). A limitation of candidate gene studies is that they are constrained by existing, often incomplete knowledge of the pathophysiology of disease. Technological breakthroughs in high throughput genotyping using 100 to 500 thousand well characterized, informative single nucleotide polymorphism (SNP) markers, in combination with novel analytical techniques, have opened the possibility of conducting genome-wide association studies. These approaches have also received an additional impetus from the success of the HapMap project (Altshuler et al., 2005; http://www.hapmap.org). The discovery and replication of the association between the CFH (Complement Factor H) gene and age-related macular degeneration, using informative SNPs obtained through the HapMap, provided an early indication of the power of genome-wide association studies to accelerate gene discovery (Klein et al., 2005). The Framingham investigators have taken a two-tier approach to conduct genome-wide association studies using both 100,000 and 550,000 single nucleotide polymorphism chips provided by Affymetrix.
In 2005 an Affymetrix 116K SNP genome-wide scan was conducted in about 1350 family members of the Original and Offspring cohorts of the FHS. Herbert and colleagues identified a common genetic variant associated with BMI near the INSIG2 gene in Framingham participants; they replicated the finding in most of the other cohorts they tested (Herbert et al. 2006). The Framingham investigators have subsequently examined the association of the autosomal SNPs with about 1000 phenotypes using generalized estimating equations (GEE) and family-based association tests (FBAT; Lange et al. 2003). The generalized estimating equation approach is a population-based strategy that measures association in a regression model accounting for correlation among related individuals. The FBAT procedure, on the other hand, tests whether the transmission of an allele, conditional on phenotype, departs from the probability expected under Mendelian inheritance, and it uses the subsets of pedigrees that are informative for a SNP. Reflecting the complexity of the Framingham database, Framingham investigators have formed 17 phenotype-specific writing groups to examine these associations and publish the results. Plans are underway to replicate some of the findings, either using “in silico” approaches or by performing targeted association studies in other cohorts. Additionally, the Framingham investigators are collaborating with the National Center for Biotechnology Information (NCBI) to develop a web display of the unfiltered results to speed data sharing and the ability to replicate our findings [database of Genotype and Phenotype, dbGaP; http://www.ncbi.nlm.nih.gov/SNP/GaP.html].
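The idea behind family-based tests can be illustrated with the classical transmission disequilibrium test (TDT), which FBAT generalizes. The sketch below is not the FBAT software or the analysis used in Framingham; it is a minimal illustration that assumes parent-offspring trios with heterozygous parents, and the function name and the example counts are hypothetical.

```python
from math import erfc, sqrt
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Transmission disequilibrium test for one allele.

    transmitted   -- times heterozygous parents passed the allele
                     to an affected child
    untransmitted -- times heterozygous parents passed the other allele
    Under Mendelian inheritance both counts are expected to be equal.
    Returns the McNemar-type chi-square statistic (1 d.f.) and its p-value.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, chosen only to illustrate the calculation.
chi2, p = tdt(transmitted=60, untransmitted=40)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Only heterozygous parents contribute to the statistic, which is why such tests use the subset of pedigrees that is informative for a given SNP.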
Genome-wide association studies present many challenges. The Framingham 100K genome-wide association studies have provided a window of opportunity to examine the complexities in the organization and statistical analysis of these large data sets. Merely uploading, analyzing and synthesizing hundreds of thousands of associations requires extensive resources and time. Interpreting the results has also presented challenges. For example, should one require a minimum level of statistical significance (p-value) for the association between a SNP and a known phenotype? Should one use a complex phenotype or its components to perform association studies? In some instances different analytical approaches (genetic linkage, generalized estimating equations and family-based association testing) highlighted different SNPs and regions of interest. Distinguishing true from false positives in the context of 100,000 SNPs and hundreds of phenotypes has been daunting. The Framingham investigators have noted that most results are likely to be false positives and, conversely, that they may have failed to appreciate important true positives of modest statistical significance. Furthermore, these data provide additional raw material for understanding the role of gene-gene interaction (both pleiotropy and epistatic gene action) and gene-environment interaction in the human genome and health. From this perspective, the use of novel computational methods such as network analysis and other machine learning approaches is contemplated.
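The scale of the false-positive problem can be made concrete with simple arithmetic. The sketch below takes the round figures mentioned in the text (about 100,000 SNPs and about 1000 phenotypes) and assumes, purely for illustration, that every test is independent and that no SNP is truly associated with any phenotype. The function names are hypothetical, and Bonferroni correction is shown only as one familiar option, not as the method the Framingham investigators adopted.

```python
def expected_false_positives(p_cutoff: float, n_tests: int) -> float:
    """Expected number of tests passing p_cutoff when every null is true."""
    return p_cutoff * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 100_000        # round figure from the text
n_phenotypes = 1_000    # "about 1000 phenotypes"
n_tests = n_snps * n_phenotypes

print(f"tests performed:            {n_tests:,}")
print(f"expected hits at p < 0.05:  {expected_false_positives(0.05, n_tests):,.0f}")
print(f"Bonferroni cutoff (0.05):   {bonferroni_threshold(0.05, n_tests):.1e}")
```

Under these assumptions a nominal cutoff of 0.05 would yield millions of chance associations, whereas a cutoff strict enough to exclude them would also discard true effects of modest size, which is the trade-off described above.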
The technology for genome-wide association studies has advanced rapidly, posing new ethical as well as analytical challenges. Framingham investigators work closely with three panels that deal with the ethical dimensions of genome-wide association studies: a) the Study’s Observational Safety and Monitoring Board, b) the Boston University Medical Center Institutional Review Board, and c) the Framingham Ethics Advisory Board. For instance, all three panels have reviewed measures to protect participant confidentiality and guard against genetic discrimination (Greely 2005; Billings 2005; Morrison 2005). In addition, the three panels are addressing the circumstances under which it is appropriate to notify participants of the results of genetic testing (Bookman et al. 2006).
The National Heart, Lung, and Blood Institute (NHLBI) has embarked on an ambitious collaboration with Boston University and Affymetrix to conduct a 550K genome-wide association study of 10,000 Original, Offspring and Third Generation Cohort participants and to post the aggregate results on the NCBI “dbGaP” website (http://www.nih.gov/news/pr/dec2006/nlm-12.htm). Investigators around the world will be able to access the genotype and phenotype data collected over almost six decades after securing approval from the NHLBI and from their own Institutional Review Board and signing a data distribution agreement. The objective is to speed scientific discovery while protecting the confidentiality of Framingham participants.
The genome-wide association studies at Framingham represent unparalleled opportunities as well as challenges. The challenges include bioinformatics, logistical, and ethical concerns. However, the extensive genotypic and phenotypic characterization of Framingham participants represents a unique step toward the goal of achieving medical care that is “predictive, preemptive and personalized” (Nabel 2006).
The FHS has firmly established the role of environmental factors, such as the use of tobacco (Doyle et al., 1992) and other lifestyle factors (Posner et al., 1993), in cardiovascular phenotypes. Since genes are known to interact with various environmental factors, their interaction may be reflected in the magnitude or in the direction of an association. A number of polymorphisms in candidate genes have been evaluated to determine their interaction with environmental factors. Some examples include the interaction of dietary fatty acids with apolipoprotein A5 polymorphisms (Lai et al., 2006); fatty acid binding protein (FABP2) in relation to plasma lipids (Galluzzi et al., 2001); and apolipoprotein E polymorphisms and alcohol consumption (Corella et al., 2001). In an interesting study, Ordovas et al. (2002) evaluated the relations between dietary fat intake and the three genotypes of the C/T polymorphism of the hepatic lipase gene (LIPC). They found a dose-dependent association of the T allele with higher HDL-C in subjects consuming <30 percent of their energy from fat (Figure 8). Also, the slopes formed by the genotypes in relation to the gradient of energy intake followed the classical pattern of genotype × environment interaction (Lynch and Walsh, 1998). These studies are providing valuable insights toward designing other large studies (Manolio et al., 2006).
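In its simplest form, a genotype × environment interaction means that the slope of a trait on an environmental exposure differs among genotypes: a difference in the size of the slopes corresponds to a change in magnitude, and a difference in sign corresponds to a change in direction. The sketch below computes an ordinary least-squares slope per genotype. The genotype labels, exposure values and trait values are invented solely to make the slopes differ; they are not data from Ordovas et al. (2002) or from any Framingham analysis.

```python
from typing import Dict, List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented illustration only: a shared exposure gradient and one trait
# value per exposure level for three hypothetical genotypes.
exposure: List[float] = [20.0, 30.0, 40.0]
trait_by_genotype: Dict[str, List[float]] = {
    "AA": [50.0, 51.0, 52.0],   # trait rises with exposure
    "AB": [50.0, 50.0, 50.0],   # trait is flat
    "BB": [52.0, 50.0, 48.0],   # trait falls with exposure
}

slopes = {g: ols_slope(exposure, ys) for g, ys in trait_by_genotype.items()}
for genotype, slope in slopes.items():
    print(f"{genotype}: slope = {slope:+.2f} trait units per exposure unit")
```

Parallel lines would indicate no interaction. In a real analysis the difference in slopes would be tested formally, for example with an interaction term in a regression model, rather than read off point estimates.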
The Framingham Heart Study has made extraordinary contributions toward the discovery of cardiovascular risk factors and in turn has helped alleviate the cardiovascular burden both in the US and elsewhere in the world. The availability of family structure and a rich panel of phenotypic data related to cardiovascular health, as well as other ancillary traits, is providing many useful insights into the role of genetic variation in cardiovascular risk traits and its interaction with the environment. Interestingly, moderate to high heritability is common to many of the traits studied, suggesting a reservoir of genetic variation for the CV risk factors and other phenotypic traits. Also, heritability estimates vary over time or age and between the sexes. The longitudinal design and intensive phenotyping of the FHS participants increase the insights that may be obtained from the sample. For example, in this cohort, age can be matched and genetic variation can be measured over time to account for longitudinal changes in environmental factors affecting the trait of interest. Similarly, testing for the consistency of linkage peaks in relation to age, or understanding pleiotropic gene action on seemingly different traits, is facilitated by examining a sample such as the Framingham Heart Study population. Answers obtained on genotype-environment interactions using the FHS are already providing valuable insights toward designing additional studies and could further inform the development of personalized medications or interventions. Also, genome-wide association studies (e.g., with the Affymetrix 116K chip) have made it possible to examine the genetic basis of numerous correlated traits, to understand the challenges associated with such a large-scale association study, and to examine the role of pleiotropy in the genome. The study is poised to perform association studies using the Affymetrix 550K chip.
This effort should provide additional insights toward refining the locations of candidate and novel genes, and toward asking other questions relating to the functional aspects of the identified genes. Answers to these fundamental questions may hold promise for applying genetic and evolutionary principles both to public health and to the practice of medicine.
The investigators are deeply grateful to the three generations of Framingham Heart Study participants. DRG is thankful to Ms. Lynnel Lyons and Ms. Esta Shindler for their help in the literature survey. We also acknowledge the support of the core contract NO1-HC 25195 and RO1s HL076784 and AG 028321.
The terms “Framingham Heart Study population” and “Framingham Heart Study cohort” are used interchangeably.