Hum Genet. Author manuscript; available in PMC 2012 June 1.

Published in final edited form as:

Published online 2011 February 4. doi: 10.1007/s00439-011-0960-6

PMCID: PMC3103104

NIHMSID: NIHMS296383

Kiranmoy Das, Jiahan Li, Zhong Wang, Chunfa Tong, Guifang Fu, Yao Li, Meng Xu, Kwangmi Ahn, David Mauger, Runze Li, and Rongling Wu

Kiranmoy Das, Department of Statistics, The Pennsylvania State University, University Park, PA, USA;

Rongling Wu: rwu@hes.hmc.psu.edu

K. Das and J. Li contributed equally to this work.

Although genome-wide association studies (GWAS) are widely used to identify the genetic and environmental etiology of a trait, several key issues related to their statistical power and biological relevance have remained unexplored. Here, we describe a novel statistical approach, called functional GWAS or *f*GWAS, to analyze the genetic control of traits by integrating biological principles of trait formation into the GWAS framework through mathematical and statistical bridges. *f*GWAS can address many fundamental questions, such as the patterns of genetic control over development, the duration of genetic effects, and what causes developmental trajectories to change or stop changing. Statistically, *f*GWAS displays increased power for gene detection by capitalizing on cumulative phenotypic variation in a longitudinal trait over time, and increased robustness in handling sparse longitudinal data.

The past 3 years have witnessed a revolution in mapping the distribution of polygenes for human diseases and other complex traits by genome-wide association studies (GWAS) (Altshuler et al. 2008; Ikram et al. 2009; Psychiatric GCCC 2009; Hirschhorn 2009). This revolution has greatly inspired our hope that detailed genetic control mechanisms for complex phenotypes can be understood at the level of individual nucleotides or nucleotide combinations. Up until now, GWAS have reproducibly identified hundreds of loci, many of which affect the outcome of a disease or trait through its biochemical and metabolic pathways (Lettre and Rioux 2008; Mohlke et al. 2008; Hirschhorn and Lettre 2009; Shete et al. 2009; Styrkarsdottir et al. 2009; Turnbull et al. 2010). In the next few years, with ever-improving technologies for genotyping, GWAS methods will play a more important role in identifying genetic associations for complex traits and diseases, which may lead to new physiological or pathological hypotheses.

Despite the potential of GWAS in genetic studies, the limitations of current GWAS approaches for elucidating a comprehensive genetic atlas of complex phenotypes have been recognized. These limitations include the following: (1) most GWAS have found genes that explain only small proportions of the genetic variance of many traits; (2) GWAS are based on a simple genotype–phenotype analysis and are thus incapable of providing substantial knowledge about the biological and biochemical functions of significant genetic variants required for therapeutic applications; (3) GWAS in which phenotypes are assessed at a single time point cannot capitalize on the full information of phenotypic expression, particularly in trials in which phenotypic data are collected at irregular time intervals. Here, we argue that the power and biological relevance of GWAS can be enhanced by integrating the biological principle of trait formation into a general GWAS framework through mathematical or statistical functions. Such integration, leading to a new approach called functional GWAS (or *f*GWAS), is shown to address the limitations of classic GWAS methods.

The central tenet of *f*GWAS is founded on the fact that every trait or disease develops over a period of time. Ignoring this developmental process reduces the power of GWAS. Elaborating on this argument, Fig. 1 provides an example in which four individuals show a similar body height at adult age. Thus, if GWAS are used to detect genes contributing to final height using adult data of these individuals, no genes would be detected with the end point data. However, because these individuals use a different amount of time to reach the same final height, there is considerable inter-individual variation in growth rate. Therefore, compared to the final height at a single adult age, the use of growth trajectories as a phenotype to conduct GWAS will remarkably increase the power for gene detection. In general, human growth includes several distinct phases that children pass through from birth to adult, i.e., infancy, childhood, and puberty (Thompson and Thompson 2009). The difference in overall growth curves should be due to differences in one or more of these phases of development; thus, GWAS of growth curves allows one to identify specific stages in which genes play a central role in governing growth rates. Given that different developmental phases include a particular set of hormonal signals and physiological functions, GWAS are a powerful tool to identify and study genes that control growth-phase specific biochemical pathways.

Developmental genetics of complex traits is a science that has intrigued researchers for many decades. Traditional quantitative genetics can be integrated with developmental models to estimate the genetic variation of growth and development (Atchley and Zhu 1997). Meyer (2000) used random regression models to study the ontogenetic control of growth for animal breeding, whereas Kirkpatrick, Pletcher, and colleagues utilized the orthogonal, additive and universally practicable properties of Legendre polynomials to derive a series of genetic models for growth trajectories in the evolutionary context (Kirkpatrick et al. 1994a, b; Pletcher and Geyer 1999). These models have been instrumental in understanding the genetic control of growth by modeling the covariance matrix for growth traits measured at different time points. Ma et al. (2002) implemented a mixture-based method to map quantitative trait loci (QTLs) for developmental processes, greatly facilitating an understanding of the genetic and developmental regulation of trait formation (Cui et al. 2006; Wu and Lin 2006; He et al. 2010).

The integration of a biological process into GWAS will provide a useful means for studying developmental genetics by deciphering temporal mechanisms of genetic control of growth and development. Several fundamental questions in genetics can be addressed by *f*GWAS:

- How does a specific gene affect the pattern of development? In other words, what is the temporal pattern of expression and action of genes that control growth and development of dynamic traits?
- Are there different temporal patterns of genetic effects during development?
- Do genes interact with each other and with the environment in an interactive network to regulate growth and development?
- Can we identify genes that exert pleiotropic effects on multiple traits during development?

*f*GWAS can capture genotypic differences at the level of phenotypic curves accumulated over the entire process of growth, thereby increasing the statistical power of gene detection. The statistical advantage of *f*GWAS is strengthened when longitudinal data are measured irregularly, leading to data sparsity, a common phenomenon in clinical trials. This type of phenotypic data, shown in Table S1, is characterized by three features: (1) each subject is measured at a limited number of time points, (2) time intervals are unevenly spaced for each subject, and (3) measurement schedules differ among subjects. Traditional GWAS have inherent limitations in their ability to handle such sparse longitudinal data. First, because only a small fraction of the subjects has measurements at a given time point, traditional GWAS based on a single time point cannot make use of all subjects, leading to biased parameter estimation and reduced gene detection ability. Second, individual subjects are measured at only a few time points, limiting the fit of an informative curve for each subject.

In this manuscript, we describe the procedures we have implemented to evaluate the genetic, biological, and statistical merits of *f*GWAS using a GWAS data set from the Framingham Heart Study (FHS) (Dawber et al. 1951; Jaquish 2007; Fox et al. 2007). *f*GWAS is used to analyze and model the genetic control of age-specific body mass index (BMI), a highly heritable trait that serves as a typical surrogate of obesity (Frayling 2007; Loos et al. 2008; Frayling et al. 2007; Scuteri et al. 2007). We detect significant SNPs that affect the rate of change of BMI with age.

The Framingham Heart Study (FHS), a cardiovascular study based in Framingham, Massachusetts, is supported by the National Heart, Lung, and Blood Institute, in collaboration with Boston University. Beginning in 1948 with 5,209 healthy men and women of European descent aged 30–60, the FHS is now on its third generation of participants. This longitudinal project plays a central role in advancing our understanding of the epidemiological cause of hypertensive or arteriosclerotic cardiovascular disease and continues to exert important impacts on the study of molecular and genetic mechanisms for cardiovascular diseases. All subjects underwent a medical history and physical examination (including body height and weight), laboratory tests, and electrocardiography. Examinations have been repeated every two or more years, although different subjects may have different numbers and time intervals of measurements. Recently, 550,000 SNPs have been genotyped for the entire Framingham cohort (Jaquish 2007; Fox et al. 2007), from which we chose 977 subjects for *f*GWAS analysis and modeling using the highly heritable body mass index (BMI). As is standard practice, SNPs with a minor allele frequency < 10% were removed from the *f*GWAS analysis. The numbers and proportions of non-rare allele SNPs vary among chromosomes and range from 4,417 to 28,771 and from 0.64 to 0.72, respectively. All the data were downloaded, with permission, from the NIH webpage, from which ethics approval and informed consent from all participants can be obtained.
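The allele-frequency filter described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names and the genotype counts are hypothetical, and the minor allele frequency is computed directly from counts of the three genotypes at a biallelic SNP.

```python
def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Return the minor allele frequency from genotype counts at one SNP."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    if total_alleles == 0:
        raise ValueError("no genotyped subjects")
    # Each AA subject carries two copies of A, each Aa subject carries one.
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    return min(freq_A, 1.0 - freq_A)


def filter_snps(genotype_counts, maf_threshold=0.10):
    """Keep SNPs whose minor allele frequency is at least the threshold."""
    return [
        snp
        for snp, counts in genotype_counts.items()
        if minor_allele_frequency(*counts) >= maf_threshold
    ]


# Hypothetical (AA, Aa, aa) counts for 977 subjects; not taken from the FHS data.
counts = {
    "snp_common": (500, 400, 77),
    "snp_rare": (900, 70, 7),
}
kept = filter_snps(counts)
```

Here `snp_rare` has a minor allele frequency of 84/1954 (about 0.043) and is dropped, while `snp_common` is retained.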

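In the *f*GWAS of clinical data sets, longitudinal traits are measured at irregular and possibly subject-specific time points (see Table S1). Let **y**_{i} = (*y*_{i}(*t*_{i1}), …, *y*_{i}(*t*_{iT_i})) denote the vector of trait values for subject *i*, measured at *T*_{i} time points. For a SNP with three genotypes (*j* = 1, 2, 3), the phenotypic value of subject *i* at time *t*_{iτ} is expressed as

$$y_{i}(t_{i\tau})=\sum_{j=1}^{3}\xi_{ij}\,\mu_{j}(t_{i\tau})+\boldsymbol{\beta}^{\mathrm{T}}(t_{i\tau})\,\mathbf{x}_{i}+e_{i}(t_{i\tau})+\epsilon_{i}(t_{i\tau}),$$

(1)

where ξ_{ij} is an indicator that equals 1 if subject *i* carries genotype *j* and 0 otherwise, μ_{j}(*t*_{iτ}) is the mean value for genotype *j* at time *t*_{iτ}, **β** is a vector of regression coefficients of *p* covariates at time *t*_{iτ}, **x**_{i} is the *p* × 1 covariate vector for subject *i*, and *e*_{i}(*t*_{iτ}) and ε_{i}(*t*_{iτ}) are the permanent and random errors at time *t*_{iτ}, respectively.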

With time-dependent genotypic values, we can estimate the additive (*a*) and dominant effects (*d*) of the SNP in a time course, expressed as

$$a(t)=\frac{1}{2}[{\mu}_{1}(t)-{\mu}_{3}(t)],$$

(2)

$$d(t)={\mu}_{2}(t)-\frac{1}{2}[{\mu}_{1}(t)+{\mu}_{3}(t)].$$

(3)
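As a small worked illustration of Eqs. 2 and 3, the sketch below computes additive and dominance curves from genotypic means at a few ages. The function names and the numerical values are hypothetical and serve only to show the arithmetic; genotypes 1, 2, and 3 are taken to be *AA*, *Aa*, and *aa*.

```python
def additive_effect(mu1, mu3):
    """Eq. 2: half the difference between the two homozygote means."""
    return 0.5 * (mu1 - mu3)


def dominance_effect(mu1, mu2, mu3):
    """Eq. 3: heterozygote mean minus the midpoint of the homozygote means."""
    return mu2 - 0.5 * (mu1 + mu3)


# Hypothetical genotypic mean BMI values at three ages (illustration only).
ages = [40, 50, 60]
mu_AA = [27.0, 28.0, 28.5]
mu_Aa = [26.0, 27.5, 28.5]
mu_aa = [25.0, 26.0, 27.5]

a_curve = [additive_effect(m1, m3) for m1, m3 in zip(mu_AA, mu_aa)]
d_curve = [dominance_effect(m1, m2, m3) for m1, m2, m3 in zip(mu_AA, mu_Aa, mu_aa)]
```

With these values the additive effect shrinks from 1.0 to 0.5 by age 60, while the dominance effect rises from 0.0 to 0.5, which is the kind of age-dependent pattern the time-indexed definitions are meant to expose.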

If the residual errors of any two subjects are independent, the likelihood of the observed data is

$$L(\mathbf{y})=\prod_{i=1}^{n_{1}}f_{1}(\mathbf{y}_{i})\prod_{i=1}^{n_{2}}f_{2}(\mathbf{y}_{i})\prod_{i=1}^{n_{3}}f_{3}(\mathbf{y}_{i})$$

(4)

where *n*_{j} is the number of subjects carrying genotype *j*, and *f*_{j}(**y**_{i}) is a multivariate normal density for subject *i* carrying genotype *j*, with mean at time *t*_{iτ} given by

$$\begin{cases}\mu_{1}(t_{i\tau})+\boldsymbol{\beta}^{\mathrm{T}}(t_{i\tau})\,\mathbf{x}_{i}, & \text{for genotype } AA\\ \mu_{2}(t_{i\tau})+\boldsymbol{\beta}^{\mathrm{T}}(t_{i\tau})\,\mathbf{x}_{i}, & \text{for genotype } Aa\\ \mu_{3}(t_{i\tau})+\boldsymbol{\beta}^{\mathrm{T}}(t_{i\tau})\,\mathbf{x}_{i}, & \text{for genotype } aa\end{cases}$$

(5)

and subject-specific covariance matrix

$$\Sigma_{i}=\varphi\begin{bmatrix}\sigma_{t_{i1}}^{2} & \cdots & \sigma_{t_{i1}t_{iT_{i}}}\\ \vdots & \ddots & \vdots\\ \sigma_{t_{iT_{i}}t_{i1}} & \cdots & \sigma_{t_{iT_{i}}}^{2}\end{bmatrix}+(1-\varphi)\begin{bmatrix}\sigma_{t_{i1}}^{2} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \sigma_{t_{iT_{i}}}^{2}\end{bmatrix}\equiv\varphi\,\Sigma_{iP}+(1-\varphi)\,\Sigma_{iR}$$

(6)

In the covariance matrix (Eq. 6), the residual variance $\sigma_{t_{i\tau}}^{2}$ is composed of the permanent error variance due to the temporal pattern of longitudinal variables and the random error variance (also called the innovative variance) arising from random independent unpredictable errors. The relative magnitude of the permanent and innovative components is described by parameter ϕ. The covariance matrix (Σ_{iP}) due to the permanent errors contains an autocorrelation structure that can be modeled, whereas the random errors are often assumed to be independent among different time points so that the random covariance matrix Σ_{iR} is diagonal.
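The decomposition in Eq. 6 and the per-subject density in Eq. 4 can be sketched numerically as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a constant variance `sigma2` at all ages and a permanent-error correlation of `rho` raised to the absolute time gap, so that irregular, subject-specific schedules are handled naturally. The function names and the parameter values are hypothetical.

```python
import numpy as np


def subject_covariance(times, sigma2, rho, phi):
    """Sigma_i = phi * Sigma_iP + (1 - phi) * Sigma_iR for one subject.

    Assumption for this sketch: constant variance sigma2 and permanent-error
    correlation rho ** |t_a - t_b| between two measurement times.
    """
    t = np.asarray(times, dtype=float)
    gaps = np.abs(t[:, None] - t[None, :])
    sigma_p = sigma2 * rho ** gaps        # autocorrelated permanent part
    sigma_r = sigma2 * np.eye(len(t))     # independent random part (diagonal)
    return phi * sigma_p + (1.0 - phi) * sigma_r


def subject_loglik(y, mean, cov):
    """Multivariate normal log-density of one subject's measurements."""
    resid = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(resid) * np.log(2.0 * np.pi) + logdet + quad)


# Hypothetical subject measured at ages 40, 42 and 47.
cov_i = subject_covariance([40, 42, 47], sigma2=4.0, rho=0.9, phi=0.6)
ll_i = subject_loglik([25.1, 25.8, 27.0], [25.0, 25.5, 26.5], cov_i)
```

Summing `subject_loglik` over subjects, with each subject's mean taken from the curve of the genotype it carries, gives the logarithm of the likelihood in Eq. 4.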

The first task for *f*GWAS involves modeling the mean vector (Eq. 5) for different SNP genotypes in a biologically and statistically meaningful way and modeling the longitudinal structure of covariance matrix (Eq. 6) in a statistically efficient and robust manner. In analyzing the BMI longitudinal data from the FHS project, we used a nonparametric approach based on Legendre orthogonal polynomials (LOP) to model time-dependent genotypic values.

For the given sparse longitudinal BMI data, we implemented a nonparametric approach based on Legendre orthogonal polynomials (LOP) to model age-specific genotypic values for each SNP genotype. The LOP are solutions to a differential equation, the Legendre equation:

$$(1-x^{2})\frac{d^{2}z}{dx^{2}}-2x\frac{dz}{dx}+r(r+1)z=0.$$

Let *P*_{r}(*x*) denote the Legendre polynomial of order *r*, whose general form is

$${P}_{r}(x)={\displaystyle \sum _{k=0}^{K}}{(-1)}^{k}\frac{(2r-2k)!}{{2}^{r}k!(r-k)!(r-2k)!}{x}^{r-2k}$$

where *K* = *r*/2 or (*r* – 1)/2, whichever is an integer. This polynomial is defined over the interval [−1, 1] and is orthogonal in this interval in the sense that $\int_{-1}^{1}P_{r}(x)P_{s}(x)\,dx=0$ when *r* ≠ *s*. Therefore, the time points of the longitudinal phenotype should be rescaled to the range [−1, 1] by

$$t^{*}=-1+\frac{2(t-t_{\text{min}})}{t_{\text{max}}-t_{\text{min}}},$$

where *t*_{min} and *t*_{max} are the first and last time points, respectively. The application of Legendre orthogonal polynomials (LOP) in nonparametric regression is not new. Because these polynomials are mutually orthogonal on the interval [−1, 1] (the product of any two polynomials of different orders integrates to 0), they have been applied to nonparametric regression (Meyer 2000), with parameter estimates possessing favorable asymptotic properties (Huskova and Sen 1985; McKay 1997). In practice, the LOP have been used to model time-specific phenotypic or genetic variation for milk production (Meyer 2000) and plant growth traits (Cui et al. 2006; Lin and Wu 2006). An ordinary polynomial regression might also work in this case, but we incorporate LOP in the hope of capitalizing on their well-established effectiveness in nonparametric regression.
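The explicit sum for *P*_{r}(*x*) and the time rescaling can be checked with a few lines of code. The sketch below is a direct transcription of the two formulas; the function names and the age range 30 to 80 are hypothetical choices made for illustration.

```python
from math import factorial


def rescale_time(t, t_min, t_max):
    """Map a time point from [t_min, t_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre(r, x):
    """Evaluate P_r(x) from the explicit sum, with K = floor(r / 2)."""
    total = 0.0
    for k in range(r // 2 + 1):
        coef = factorial(2 * r - 2 * k) / (
            2 ** r * factorial(k) * factorial(r - k) * factorial(r - 2 * k)
        )
        total += (-1) ** k * coef * x ** (r - 2 * k)
    return total


# Hypothetical age range used only for illustration.
t_star = rescale_time(55, t_min=30, t_max=80)    # midpoint of the range
basis_at_mid = [legendre(r, t_star) for r in range(4)]
```

For example, the sum reproduces the familiar closed forms P_2(x) = (3x² − 1)/2 and P_3(x) = (5x³ − 3x)/2.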

Define a family of LOP with a particular order *r*,

$$\mathbf{P}_{r}(t^{*})=[P_{0}(t^{*}),P_{1}(t^{*}),\dots ,P_{r}(t^{*})]$$

and a vector of genotypic base values

$$\mathbf{u}_{r}=[u_{0},u_{1},\dots ,u_{r}].$$

Then time-dependent genotypic values in Eq. 5 can be described as a linear combination of **u**_{r} weighted by the family of LOP, i.e.,

$${\mu}_{j}({t}^{*})={\mathbf{P}}_{r}({t}^{*}){\mathbf{u}}_{r}^{\mathrm{T}}.$$

(7)
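Eq. 7 amounts to evaluating a Legendre series at rescaled times. A minimal sketch using NumPy's `numpy.polynomial.legendre.legval`, which computes the sum of coefficient times *P*_{k} over *k*, is given below; the base values, the age range, and the function name are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L


def genotypic_curve(ages, base_values, t_min, t_max):
    """Evaluate mu_j(t*) = sum_k u_k * P_k(t*) at the given ages (Eq. 7)."""
    t_star = -1.0 + 2.0 * (np.asarray(ages, dtype=float) - t_min) / (t_max - t_min)
    return L.legval(t_star, base_values)


# Hypothetical order-3 base values (u_0, ..., u_3) for one genotype.
u = [26.0, 1.5, -0.5, 0.2]
mu = genotypic_curve([30, 55, 80], u, t_min=30, t_max=80)
```

One vector of base values per genotype yields the three curves μ_1, μ_2, and μ_3, from which the effect curves of Eqs. 2 and 3 follow.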

By assuming different orders of genotypic base vectors, we can find the best order of LOP to fit the longitudinal data set using model selection criteria, such as BIC.
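Order selection by BIC can be sketched as follows. To keep the example short it makes a simplifying assumption that the article does not make: errors are treated as independent with constant variance, so each order is fitted by ordinary least squares and BIC reduces to n·ln(RSS/n) + k·ln(n) up to a constant. The function names and the simulated data are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L


def bic_for_order(t_star, y, order):
    """BIC of a least-squares Legendre fit (independent-error assumption)."""
    design = L.legvander(t_star, order)          # columns P_0 ... P_order
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    n, k = len(y), order + 1
    return n * np.log(rss / n) + k * np.log(n)


def select_order(t_star, y, max_order=6):
    """Return the order with the smallest BIC and all BIC scores."""
    scores = {r: bic_for_order(t_star, y, r) for r in range(1, max_order + 1)}
    return min(scores, key=scores.get), scores


# Hypothetical data: a cubic Legendre curve plus noise on rescaled times.
rng = np.random.default_rng(0)
t_star = np.linspace(-1.0, 1.0, 200)
y = L.legval(t_star, [26.0, 1.5, -0.5, 0.8]) + rng.normal(0.0, 0.3, t_star.size)
best_order, bic_scores = select_order(t_star, y)
```

In the full model, the residual sum of squares would be replaced by the maximized log-likelihood under the chosen covariance structure, with the parameter count including the covariance parameters.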

In practice, other nonparametric approaches, such as B-spline, can also be used in *f*GWAS (Yang et al. 2009). If the traits studied have an explicit mathematical function, such as logistic equations for plant and animal growth (West et al. 2001; von Bertalanffy 1957; Richards 1959; Guiot et al. 2003, 2006), triple-logistic equations for human body growth (Bock and Thissen 1976), then parametric approaches can be used to model the changes of genotypic values over time. Because these mathematical functions are derived from fundamental principles of biology, their incorporation into *f*GWAS will facilitate the biological interpretation of genetic results (Wu and Lin 2006). If trait formation includes multiple distinct stages, then it is crucial to derive a semiparametric model that combines the precision and biological relevance of parametric approaches and the flexibility of nonparametric approaches because some traits can be modeled parametrically in one phase, but not so in others. Such a semiparametric model was derived in Cui et al. (2006) and can be well implemented in the *f*GWAS framework.

Robust modeling of the longitudinal covariance structure (Eq. 6) is a prerequisite for appropriate statistical inference of genetic effects on longitudinal traits. From longitudinal BMI plots (Fig. S1), we found that interpersonal variation tends to be constant over age. Thus, a stationary autoregressive model was used for the covariance structure, for which only two parameters, variance and correlation, need to be estimated. If variance and correlation are not stationary, nonstationary approaches, such as the structured antedependence (SAD) model, can be used (Zimmerman and Núñez-Antón 2001; Zhao et al. 2005). More generally, autoregressive moving average models [e.g., ARMA(*p*,*q*)] are used because they are capable of handling complex covariance structures (Li et al. 2001). For such models, it is important to determine the optimal order; a model selection procedure has been established to determine the most parsimonious approach. We have implemented all these approaches in the *f*GWAS model, allowing geneticists to select an optimal approach for the covariance structure of their longitudinal data.

We can also implement nonparametric and semiparametric approaches for longitudinal covariance structure. For sparse longitudinal data, the efficient estimation of covariance structure is a significant concern for detecting the genes that control dynamic traits.

In Table S1’s data structure, subjects are measured at a number of time points (1–3), with intervals and number depending on each subject. But all subjects are projected in a space with a full measure schedule (10). Semiparametric approaches by Fan et al. (2007) and Fan and Wu (2008) will be a better choice for covariance structure modeling in *f*GWAS.

The significance test of the genetic effect of a SNP is a key for detecting significant genetic variants. This can be done by formulating the hypotheses as follows:

$$H_{0}:a(t_{i\tau})=d(t_{i\tau})=0\quad\text{versus}\quad H_{1}:\text{At least one equality in } H_{0} \text{ does not hold}.$$

(8)

The likelihoods under the null and alternative hypotheses are calculated, from which the log-likelihood ratio (LR) is computed. The LR statistic is assumed to be asymptotically χ^{2}-distributed, with degrees of freedom equal to the difference in the numbers of unknown parameters under H_{1} and H_{0}. The significance of individual SNPs is then adjusted for multiple comparisons with a standard approach such as the false discovery rate (FDR).
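The LR test can be sketched with the standard library alone. The sketch assumes, as one reading of the model, that H_1 fits three genotype curves of order *r* and H_0 a single common curve, so the degrees of freedom are 2(*r* + 1); because this number is even, the χ² tail probability has a closed form as a finite sum. The function names and the log-likelihood values are hypothetical.

```python
from math import exp


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


def lr_test(loglik_h0, loglik_h1, order):
    """Return the LR statistic, degrees of freedom, and p value."""
    lr = 2.0 * (loglik_h1 - loglik_h0)
    df = 2 * (order + 1)   # assumption: three curves under H1, one under H0
    return lr, df, chi2_sf_even(lr, df)


# Hypothetical maximized log-likelihoods for one SNP with LOP order 3.
lr, df, p_value = lr_test(loglik_h0=-1500.0, loglik_h1=-1480.0, order=3)
```

If covariates or covariance parameters also differ between the two hypotheses, the degrees of freedom change accordingly and a general χ² routine would be needed.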

We can also test the additive (H_{0}: *a*(*t*_{iτ}) = 0) and dominant effects (H_{0}: *d*(*t*_{iτ}) = 0) of a SNP after it is detected to be significant. Similarly, the LR values are calculated separately for each test and compared with critical values determined from the χ^{2}-distribution.

The phenotypic measurements of BMI for the FHS are collected on an irregular schedule. Although individual subjects have only a few repeated measurements (1–19), collectively they cover many time points (75). Figure 2 gives age-specific trajectories of BMI for five males and five females randomly drawn from the FHS data. We divided the population by sex and used the Legendre orthogonal polynomial (LOP)-based nonparametric approach (Meyer 2000; Lin and Wu 2006) to model age-specific changes of BMI. For this data set, in which variance tends to be constant over age (Fig. S1), a simple first-order autoregressive model was used for the longitudinal covariance structure within the *f*GWAS framework. Longitudinal curves were fitted most parsimoniously by the LOP of order 3 for both sexes (see Fig. S2 for the BIC plot). SNP loci with a significant effect on age-specific changes in BMI were detected on different chromosomes for the two sexes (Fig. S3). Because hundreds of thousands of SNPs are tested, a multiple testing procedure must be applied. We avoid Bonferroni’s adjustment because it is too conservative; instead, we control the false discovery rate (FDR) using the Benjamini-Hochberg (1995) algorithm. After adjusting for multiple comparisons with FDR, 8 and 4 SNPs were significant at *p* < 10^{−6} for males and females, respectively. Table 1 provides detailed information about the names of these significant SNPs and their chromosomal locations, alleles, and significance levels. Figure 3a, b shows the observed (black) versus expected (red) *p* values on a −log_{10} scale for the male and female populations, respectively. For both populations, most of the observed *p* values lie above the red line (expected *p* values). We observe 8 extreme points for males and 4 for females that lie far from the other points; these correspond to the eight SNPs detected as significant in males and the four in females. The trajectory curves of BMI for different genotypes at each significant SNP are illustrated in Fig. 4.
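The Benjamini-Hochberg step-up procedure mentioned above can be sketched in a few lines. This is the textbook form of the procedure, not code from the study; the function name and the *p* values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p values for ten SNPs (illustration only).
p_vals = [0.039, 0.001, 0.205, 0.008, 0.041, 0.042, 0.06, 0.074, 0.212, 0.216]
significant = benjamini_hochberg(p_vals, alpha=0.05)
```

In this example only the SNPs with *p* values 0.001 and 0.008 are declared significant, whereas Bonferroni at the same level (threshold 0.05/10 = 0.005) would retain only the first.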

Fig. 2. Plots of BMI measured at different ages for five males (**a**) and five females (**b**) randomly sampled from the Framingham Heart Study. The sparsity and irregularity of longitudinal data for BMI are reflected by subject-specific, unevenly spaced age intervals

Fig. 4. Age-specific trajectories of BMI in different sexes for three genotypes at each significant SNP detected from various chromosomes

Of all these significant SNPs detected, only one was found to affect BMI in both sexes, while the expression of all the others was sex-specific. Three genotypes at SNP rs4451518 on chromosome 1, rs2171168 on chromosome 3, and rs13124340 on chromosome 4 display strikingly different shapes of BMI curves in females, whereas genotypic curves at these SNPs are similar or overlap in males. There are notable discrepancies in BMI trajectories among three genotypes at SNP rs17782554 and rs11783045 on chromosome 8, rs7903156 on chromosome 10, rs7309679 on chromosome 12, rs9915696 on chromosome 17, rs948716 on chromosome 18, and rs747911 on chromosome 20 in males, but no differences were found in females. Such sex-specific expression should be one of the major causes of genotype × sex interactions for BMI trajectories. Unlike these SNPs, rs3903759 on chromosome 6 is expressed in both sexes but to different extents. The pattern of this sex-biased effect also contributes to genotype × sex interactions (Anholt and Mackay 2004).

The male-specific significant SNP rs948716 is located in the region of the *MC4R* (melanocortin-4 receptor) gene on chromosome 18. Previous GWAS work using adult and child BMI data (also of European descent) detected a common SNP, rs17782313, mapped 188 kb downstream from *MC4R* (Loos et al. 2008). The results of our *f*GWAS analysis agree with previous reports of common variants near *MC4R* that influence fat mass, weight, and obesity risk. In a recent review on gene identification for type 2 diabetes, nine regions have been confirmed to harbor significant association signals for this disease (Frayling 2007). These regions include two each on chromosomes 3 and 10 and one each on chromosomes 4, 6, 9, 12, and 16. We have noticed that a high proportion of SNP associations detected with *f*GWAS overlap with those detected on chromosomes 3, 4, 6, 10 and 12 by previous studies targeting type 2 diabetes. This is not surprising because BMI and type 2 diabetes are highly correlated (Frayling 2007). This observation also supports the effectiveness of *f*GWAS.

In other GWAS for BMI (Frayling et al. 2007; Scuteri et al. 2007), significant noncoding SNPs were detected in an intron of the *FTO* (fat mass and obesity associated) gene on chromosome 16. Although *f*GWAS did not find highly significant SNPs on this chromosome, it did detect some SNPs with borderline significance (*p* < 10^{−4}) in both sexes. *f*GWAS has identified several SNPs associated with BMI (Table 1) which have not been detected in previous studies, possibly showing unique power of this new approach. To show the biological relevance of this gained power, we searched for the presence of candidate genes within the 500 kb region in which SNPs associated with related traits have been reported by other groups (Table S2). The detailed biochemical functions of these genes are given in Table S3, many of which are related to energy intake, obesity, and cardiovascular diseases (http://www.ncbi.nlm.nih.gov/gene).

Apart from increased power for gene detection, *f*GWAS can produce more biologically relevant results by studying the interplay between gene actions and trait progression. To show how the significant SNPs affect the change of BMI with age, we drew age-specific BMI curves for different genotypes at each significant SNP (Fig. 4). SNP rs3903759 on chromosome 6 triggers a pronounced effect on BMI in mid-age males and females, with the magnitude of the effect reducing with age and tending to be zero when entering old ages (Fig. 4d). Some SNPs, such as rs4451518 on chromosome 1 (Fig. 4a), rs13124340 on chromosome 4 (Fig. 4c), and rs948716 on chromosome 18 (Fig. 4j), actively affect BMI changes in middle and old ages.

Based on three genotypes at each SNP, we drew the curves of additive and dominant genetic effects for age-specific changes of BMI (Fig. 5). The additive effect is defined as half the difference between the genotypic values of the homozygote for the common allele and the homozygote for the minor allele (Eq. 2). The dominance effect reflects the interaction between the common and minor alleles at the same SNP (Eq. 3). In general, SNPs control age-specific changes of BMI in different patterns. In males, the additive effects of all SNPs on BMI tend to decrease with age, but the age-specific change of dominant effects shows a complicated pattern, with some SNPs altering the direction of their effects. In females, dominant effects are more pronounced than additive effects, especially at middle ages. Overdominant effects were detected at middle ages. A quantitative genetic theory has been established to explain the molecular basis of dominance and overdominance in terms of metabolic pathways (Kacser and Burns 1981; Keightley and Kacser 1987).

We performed a cross-validation analysis to test the reproducibility of results obtained from *f*GWAS. By randomly splitting all individuals in each sex into two groups of roughly the same size and analyzing each group with *f*GWAS, we obtained the same set of significant SNPs from each of these two groups and complete samples, suggesting an excellent reproducibility of our results. We performed simulation studies to examine the power of *f*GWAS to detect significant genes for longitudinal traits and the false-positive rates of the new approach. The data were simulated by assuming a segregating SNP with different minor allele frequencies (MAF) which are associated with a longitudinal trait. The trait was assumed to follow a sparse structure characterized by the FHS.

As shown in Table S4, *f*GWAS provides good estimates of the parameters that model genotypic curves using Legendre orthogonal polynomials and the autoregressive covariance structure. In general, a sample size of 1,000 is adequate for precise estimates of parameters for all genotypic curves when the MAF is 0.3 or larger. For those with MAF < 0.3, a larger sample size, say 2,000, is required. The power of gene detection estimated from the simulated data is about 0.8 or greater (Table 2), whereas the false-positive rates of *f*GWAS are less than 0.10 for a sample size of 1,000. Because our simulation was conducted by mimicking the FHS, it is likely that our power and FPR analyses reflect an actual case.

Genome-wide association studies with single nucleotide polymorphisms have proven to be a powerful tool for elucidating the role genetics plays in human health and disease. By analyzing hundreds of thousands of genetic variants in a particular population, this approach can identify the chromosomal distribution and function of multiple genetic changes that are associated with polygenic traits and diseases. Indeed, in the last 3 years, we have seen successful applications of GWAS in the study of complex traits and diseases of major medical importance such as human height, obesity, diabetes, coronary artery disease, and cancer (Altshuler et al. 2008; Ikram et al. 2009; Psychiatric GCCC 2009; Hirschhorn 2009; Lettre and Rioux 2008; Mohlke et al. 2008; Hirschhorn and Lettre 2009; Shete et al. 2009; Styrkarsdottir et al. 2009; Turnbull et al. 2010).

The successes and potential of GWAS have not been explored when complex phenotypes arise as a function of time. In any case, a time function is more informative than a single point in describing the biological or clinical feature of a trait. By integrating GWAS and functional aspects of dynamic traits, a new analytical model, called functional GWAS (*f*GWAS), can be naturally derived, which provides an unprecedented opportunity to study the genetic control of developmental traits. *f*GWAS is not only able to identify genes that determine the final form of the trait but also has the power to study the temporal pattern of genetic control over a time course. From a statistical standpoint, *f*GWAS capitalizes on the full information provided by growth and development of complex traits over time, increasing the power of gene identification. In particular, *f*GWAS is robust in handling sparse longitudinal data in which no single time point has phenotypic data for all subjects, facilitating the application of GWAS to study the genetic architecture of hard-to-measure traits.

With the completion of the Human Genome Project, it has become possible to draw a comprehensive picture of genetic control mechanisms of complex traits and processes and, ultimately, integrate genetic information into routine clinical therapies for disease treatment and prevention. To achieve this goal, there is a pressing need to develop powerful statistical and computational algorithms for detecting genes that determine dynamic traits. Unlike static traits, dynamic traits are described by a series of developmental processes composed of a large number of variables. In this article, we describe *f*GWAS, derived by integrating mathematical models for the molecular mechanisms and functions of biological processes into a likelihood framework. Next, we will need to extend *f*GWAS to consider the impact of genetic imprinting (Luedi et al. 2005) and copy number variants (McCarroll et al. 2008) to recover so-called missing heritability in GWAS (Bogardus 2009; Manolio et al. 2009). By then, we will be in an excellent position to pose and test hypotheses about the interplay between genetics and developmental disorders.

This work was partially supported by grant DMS/NIGMS-0540745 to RW and by NIDA, NIH grants R21 DA024260 and R21 DA024266 to RL. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

**Electronic supplementary material** The online version of this article (doi:10.1007/s00439-011-0960-6) contains supplementary material, which is available to authorized users.

Kiranmoy Das, Department of Statistics, The Pennsylvania State University, University Park, PA, USA.

Jiahan Li, Department of Statistics, The Pennsylvania State University, University Park, PA, USA.

Zhong Wang, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

Chunfa Tong, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

Guifang Fu, Department of Statistics, The Pennsylvania State University, University Park, PA, USA.

Yao Li, Department of Statistics, West Virginia University, Morgantown, WV, USA.

Meng Xu, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

Kwangmi Ahn, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

David Mauger, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

Runze Li, Department of Statistics, The Pennsylvania State University, University Park, PA, USA. Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

Rongling Wu, Department of Statistics, The Pennsylvania State University, University Park, PA, USA. Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA, USA.

- Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008;322:881–888.
- Anholt RRH, Mackay TFC. Genetic analysis of complex behaviors in Drosophila. Nat Rev Genet. 2004;5:838–849.
- Atchley WR, Zhu J. Developmental quantitative genetics, conditional epigenetic variability and growth in mice. Genetics. 1997;147:765–776.
- Bock RD, Thissen D. Fitting multi-component models for growth in stature; Proceedings of the 9th international biometrics conference; 1976. pp. 431–442.
- Bogardus C. Missing heritability and GWAS utility. Obesity. 2009;17:209–210.
- Cui Y, Zhu J, Wu RL. Functional mapping for genetic control of programmed cell death. Physiol Genomics. 2006;25:458–469.
- Dawber TR, Meadors GF, Moore FE Jr. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health. 1951;41:279–286.
- Fan J, Wu Y. Semiparametric estimation of covariance matrixes for longitudinal data. J Am Stat Assoc. 2008;103:1520–1533.
- Fan J, Huang T, Li R. Analysis of longitudinal data with semiparametric estimation of covariance function. J Am Stat Assoc. 2007;102:632–641.
- Fox CS, Heard-Costa N, Cupples LA, Dupuis J, Vasan RS, et al. Genome-wide association to body mass index and waist circumference: the Framingham Heart Study 100K project. BMC Med Genet. 2007;8 Suppl 1:S18.
- Frayling TM. Genome-wide association studies provide new insights into type 2 diabetes aetiology. Nat Rev Genet. 2007;8:657–662.
- Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007;316:889–894.
- Guiot C, Degiorgis PG, Delsanto PP, Gabriele P, Deisboeck TS. Does tumor growth follow a “universal law”? J Theor Biol. 2003;225:147–151.
- Guiot C, Delsanto PP, Carpinteri A, Pugno N, Mansury Y, Deisboeck TS. The dynamic evolution of the power exponent in a universal growth model of tumors. J Theor Biol. 2006;240:459–463.
- He QL, Berg A, Li Y, Vallejos CE, Wu RL. Modeling genes for plant structure, development and evolution: functional mapping meets plant ontology. Trends Genet. 2010;26:39–46.
- Hirschhorn JN. Genomewide association studies—illuminating biologic pathways. New Engl J Med. 2009;360:1699–1701.
- Hirschhorn JN, Lettre G. Progress in genome-wide association studies of human height. Horm Res. 2009;71:5–13.
- Huskova M, Sen PK. On sequentially adaptive asymptotically efficient rank statistics. Seq Anal. 1985;4:125–151.
- Ikram MA, Seshadri S, Bis JC, Fornage M, DeStefano AL, et al. Genomewide association studies of stroke. New Engl J Med. 2009;360:1718–1728.
- Jaquish C. The Framingham Heart Study, on its way to becoming the gold standard for cardiovascular genetic epidemiology? BMC Med Genet. 2007;8:63.
- Kacser H, Burns JA. The molecular basis of dominance. Genetics. 1981;97:639–666.
- Keightley PD, Kacser H. Dominance, pleiotropy and metabolic structure. Genetics. 1987;117:319–329.
- Kirkpatrick M, Hill W, Thompson R. Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle. Genet Res. 1994a;64:57–69.
- Kirkpatrick M, Lofsvold D, Bulmer M. Analysis of the inheritance, selection and evolution of growth trajectories. Genetics. 1994b;124:979–993.
- Lettre G, Rioux JD. Autoimmune diseases: insights from genome-wide association studies. Hum Mol Genet. 2008;17:R116–R121.
- Li N, McMurry T, Berg A, Wang Z, Berceli SA, Wu RL. Functional clustering of periodic transcriptional profiles through ARMA(p,q). PLoS One. 2010;5(4):e9894.
- Lin M, Wu RL. A joint model for nonparametric functional mapping of longitudinal trajectories and time-to-events. BMC Bioinformatics. 2006;7(1):138.
- Loos RJ, Lindgren CM, Li S, Wheeler E, Zhao JH, et al. Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat Genet. 2008;40:768–775.
- Luedi PP, Hartemink AJ, Jirtle RL. Genome-wide prediction of imprinted murine genes. Genome Res. 2005;15:875–884.
- Ma CX, Casella G, Wu RL. Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics. 2002;161:1751–1762.
- Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.
- McCarroll SA, Kuruvilla FG, Korn JM, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet. 2008;40:1166–1174.
- Mckay MD. Non-parametric variance based methods for assessing uncertainty importance. Reliab Eng Syst Saf. 1997;57:267–279.
- Meyer K. Random regressions to model phenotypic variation in monthly weights of Australian beef cows. Livest Prod Sci. 2000;65:19–38.
- Mohlke KL, Boehnke M, Abecasis GR. Metabolic and cardiovascular traits: an abundance of recently identified common genetic variants. Hum Mol Genet. 2008;17:R102–R108.
- Pletcher SD, Geyer CJ. The genetic analysis of age-dependent traits: modeling the character process. Genetics. 1999;151:825–835.
- Psychiatric GCCC. Genomewide association studies: history, rationale, and prospects for psychiatric disorders. Am J Psychiatry. 2009;166:540–556.
- Richards FJ. A flexible growth function for empirical use. J Exp Bot. 1959;10:290–300.
- Scuteri A, Sanna S, Chen WM, Uda M, Albai G, et al. Genomewide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet. 2007;3(7):e115.
- Shete S, Hosking FJ, Robertson LB, et al. Genome-wide association study identifies five susceptibility loci for glioma. Nat Genet. 2009;41:899–904.
- Styrkarsdottir U, Halldorsson BV, Gretarsdottir S, Gudbjartsson DF, Walters GB, et al. New sequence variants associated with bone mineral density. Nat Genet. 2009;41:15–17.
- Thompson P, Thompson PJL. Introduction to coaching theory. UK: Meyer & Meyer Sport; 2009.
- Turnbull C, Ahmed S, Morrison J, Pernet D, Renwick A, et al. Genome-wide association study identifies five new breast cancer susceptibility loci. Nat Genet. 2010;42:504–507.
- von Bertalanffy L. Quantitative laws for metabolism and growth. Q Rev Biol. 1957;32:217–231.
- West GB, Brown JH, Enquist BJ. A general model for ontogenetic growth. Nature. 2001;413:628–631.
- Wu RL, Lin M. Functional mapping—how to study the genetic architecture of dynamic complex traits. Nat Rev Genet. 2006;7:229–237.
- Yang J, Wu RL, Casella G. Nonparametric functional mapping of quantitative trait loci. Biometrics. 2009;65:30–39.
- Zhao W, Chen YQ, Casella G, Cheverud JM, Wu R. A nonstationary model for functional mapping of complex traits. Bioinformatics. 2005;21:2469–2477.
- Zimmerman D, Núñez-Antón V. Parametric modelling of growth curve data: an overview (with discussions). Test. 2001;10:1–73.
