


PLoS Genet. 2010 February; 6(2): e1000864.

Published online 2010 February 26. doi: 10.1371/journal.pgen.1000864

PMCID: PMC2829056

Nicholas J. Schork, Editor

University of California San Diego and The Scripps Research Institute, United States of America

* E-mail: ua.ude.rmiq@yarW.imoaN

Conceived and designed the experiments: NRW. Performed the experiments: NRW. Analyzed the data: NRW. Contributed reagents/materials/analysis tools: NRW JY MEG PMV. Wrote the paper: NRW MEG PMV.

Received 2009 May 28; Accepted 2010 January 28.

Copyright Wray et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC=0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator.

Genome-wide association studies in human populations have facilitated the creation of genomic profiles that combine the effects of many associated genetic variants to predict risk of disease. However, the ability of genomic profiles to distinguish diseased from non-diseased individuals is inherently constrained by the genetic epidemiology of the disease. In this paper, we use a genetic interpretation to provide insight into the constraints on genomic profiles for risk prediction. We provide a strategy, available as an online calculator, to estimate the proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability.

Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. Genetic testing has long been available for Mendelian genetic diseases for which variants within one gene are directly responsible for the disease. In contrast, the etiology of complex genetic diseases, such as those listed in Table 1, comprises both genetic and environmental risk factors. Results from genome-wide association studies have provided empirical evidence that very few associated genetic variants with effect size greater than an odds ratio of 1.5 exist [1],[2]. Reconciling these effect sizes with the often sizeable estimates of heritability for many complex diseases (Table 1) means that we must expect there to be many (perhaps thousands of) genetic variants underlying complex disease if the effect size of any one variant is very small. It follows that each individual will carry a different, probably unique, portfolio of risk alleles. Whereas common risk variants have effect sizes too small to be used individually as risk predictors, profiles based on many associated genetic variants could provide useful predictions of genetic risk [3],[4]. We define genetic risk as the risk of disease given an individual's unique multi-locus genotype; genetic risk remains unchanged throughout an individual's lifetime and so could be predicted at birth prior to exposure to many environmental risk factors. Indeed, such risk predictions could be age specific, for example, risk of type 2 diabetes at 10 years, 20 years or 50 years, if genomic profile sets based on empirical data were available for these scenarios, which have age-specific genetic epidemiologies. As more variants are identified in the coming years, there will be increasing interest in the prospects of genomic profiling.
It has been argued that genomic profiles should be assessed in terms of their clinical validity as diagnostic classifiers [5],[6]. The receiver operator characteristic (ROC) curve [7] is a well established tool for determining the efficacy of clinical diagnostic and prognostic tests in correctly classifying diseased and non-diseased individuals and has been used in the context of genomic profiling, e.g. [6],[8],[9]. While the area under the ROC curve (AUC) is an important measure of clinical validity, it does not tell the whole story, as it does not differentiate between the accuracy with which the genomic profile predicts the true genetic risk of individuals and the accuracy with which true genetic risk predicts disease status, which is not under our control. We believe that the ability to differentiate between these components (i.e. the distinction between prediction of genotype and phenotype) is important for interpretation of the value of the genomic profile, particularly as the use of genomic profiles is very much in its infancy at present. Our knowledge of the genetic epidemiology of a disease means that we can know *a priori* that genomic profiles might not, on their own, be accurate diagnostic classifiers. For this reason, genomic profiles should be judged in the first instance on the basis of their analytic validity [10] as predictors of *genetic* rather than *absolute* risk. Of course, in the long term genomic profiles can be combined with environmental risk factors to predict absolute risk in the context of clinical utility. Genomic profiles should improve upon family history, which has long been used as a crude estimate of genetic risk (see Text S1).

In this paper, we provide insight into the genetic interpretation of AUC. We begin by considering quantitative traits for which the concepts of accuracy of risk prediction are well developed. For disease traits we differentiate between measures on the observed scale of disease versus the underlying scale of disease risk as we believe recognition of scale of measurement is often overlooked. We define *AUC _{max}* as the maximum AUC that could be achieved for a disease when the test classifier is a perfect predictor of genetic risk. We quantify the relationship between

For quantitative traits, in which phenotypic scores are (or can be transformed to be) normally distributed, the efficacy of a genomic profile is naturally expressed as the proportion of the genetic variance explained by the profile. The variance in phenotypes, *V _{P}*, can be partitioned into variance of genetic values,

For disease traits, the phenotype has two possible values, either affected or not affected. On this observed scale, the directly measurable genetic parameters are those of recurrence risks to relatives, *λ _{R}* for relatives of type

where Cov(*X*, *R*) is the covariance in disease status between diseased individuals *X* and their relatives on the observed disease risk scale [12]. For example, when the relatives are monozygous twins (*R*=*MZ*), Cov(*X*,*MZ*)= the genetic variance, with the subscript “01” denoting the all-or-none disease risk scale. On this scale, the majority of the genetic variance is non-additive, especially when disease prevalence is low [13],[14]. The broad sense heritability on this scale is =(*λ _{MZ}* -1)

The genetic properties of disease are much more easily understood by using the threshold liability model [11], in which risk of disease is transformed to a normally distributed liability scale *P ~N*(0, 1) and *P*=*A* + *E*, where *A*~*N*(0, *h _{L}^{2}*) are the genetic effects on the liability scale. On this scale the genetic effects combine in an additive way; *h _{L}^{2}* is the narrow sense heritability on the liability scale (or heritability of liability) and on this scale broad sense and narrow sense heritability are equal. *E* are independent environmental effects, *E~N*(0,1-*h _{L}^{2}*). The biological plausibility of an underlying normally distributed liability to disease is based on the assumption that complex traits are influenced by many variables; the central limit theorem states that the distribution of the sum of independent random variables approaches normality as the number of variables increases. Under the threshold liability model individuals are affected when *P* >*T*, where *T* is the threshold on the normal distribution above which lies the proportion of affected individuals, i.e. the disease prevalence *K*: *T*=Φ^{−1}(1-*K*), so that Φ(*T*)=1-*K*, where Φ(*T*) is the cumulative distribution function of the standard normal distribution evaluated at *T*, e.g. if *K*=0.05, *T*=1.645. The liability scale has more convenient properties than the observed disease scale and provides a framework for comparison of scenarios independent of disease prevalence. The relationship between heritability of liability and the directly estimable parameters of *K* and *λ _{S}* is

(1)

[16] with *i*=*z*/*K*, *z* the height of the standard normal curve at *T*, and *T*_{1}=Φ^{−1}(1- *λ _{S} K*), i.e. the threshold
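The calculation of heritability of liability from *K* and *λ _{S}* can be sketched numerically. The sketch below assumes that equation 1 takes the standard form of Reich et al. [16] for the correlation in liability between relatives, doubled because siblings share half of their additive genetic effects, with *i*=*z*/*K* taken as the mean liability of diseased individuals. The function and variable names are illustrative only and use just the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def liability_threshold(prevalence):
    """Threshold T such that a proportion K of liabilities lies above it."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def heritability_of_liability(prevalence, sib_recurrence_risk):
    """Heritability of liability from K and lambda_S.

    Assumed form (not copied from the typeset equation 1):
    h2 = 2 * [T - T1*sqrt(1 - (T^2 - T1^2)(1 - T/i))] / [i + T1^2 (i - T)]
    """
    K = prevalence
    T = liability_threshold(K)          # threshold in the general population
    z = STD_NORMAL.pdf(T)               # height of the normal curve at T
    i = z / K                           # assumed: mean liability of diseased
    # Threshold corresponding to the risk in siblings of diseased individuals.
    T1 = STD_NORMAL.inv_cdf(1.0 - sib_recurrence_risk * K)
    sib_correlation = (T - T1 * sqrt(1.0 - (T**2 - T1**2) * (1.0 - T / i))) / (
        i + T1**2 * (i - T)
    )
    # Siblings share half of the additive genetic variance.
    return 2.0 * sib_correlation


if __name__ == "__main__":
    print(liability_threshold(0.05))
    print(heritability_of_liability(0.118, 2.2))
```

With *K*=0.05 the first call reproduces the threshold *T*=1.645 quoted above, and with *K*=11.8% and *λ _{S}*=2.2 the second call gives a value close to the 0.68 quoted for age related macular degeneration, which supports the assumed form.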

The AUC is a statistic calculated on the observed disease scale and is a measure of the efficacy of prediction of phenotype using a test classifier. The ROC plots the true positive rate (TPR or sensitivity) against the false-positive rate (FPR or 1-specificity). TPR =probability (positive test result|diseased) and FPR = probability (positive test result|not diseased). Since these probabilities are conditional, they are not dependent on the number of cases or controls tested, except through the sampling variance associated with them. In genomic profiling the ROC is obtained by ranking a set of individuals with known disease status by their genomic profile from lowest estimated risk (i.e., profile score) to highest estimated risk and then assessing sensitivity and specificity assuming a cut-off after each rank (starting with the highest ranked individual). If *n _{d}* and

(2)

(see example in Figure S1). Equally, AUC can be calculated as AUC=0.5(1 + *D*) where *D* is the Somers' rank correlation [17] between risk profile and disease status (1= diseased, 0= not diseased). Another equivalent definition of AUC is the probability that a randomly selected pair of diseased (*d*) and non-diseased (*d'*) individuals is correctly ranked [18]. This probability is the same as the probability that the difference between the genetic liability of the *d* and *d'* individuals is greater than zero. This difference is approximately normally distributed with mean *μ _{d}* -

where *v*=-*iK/*(1 – *K*). The genetic liabilities of the *d* and *d'* groups are each approximately normally distributed, the approximation being less accurate for high heritabilities.

Therefore,

(3)
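As a rough numerical sketch of equation 3, the maximum AUC can be computed as the probability that the difference in genetic liability between a diseased and a non-diseased individual exceeds zero. The sketch assumes the usual truncated-normal results: mean genetic liabilities of *i* and *v* times the heritability of liability in the two groups, and variances reduced by the factors *i*(*i* − *T*) and *v*(*v* − *T*). These moments are an assumption of this sketch rather than a transcription of the typeset equation, and the names `auc_max`, `prevalence` and `h2_liability` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def auc_max(prevalence, h2_liability):
    """Approximate maximum AUC for a perfect predictor of genetic risk."""
    K, h2 = prevalence, h2_liability
    T = STD_NORMAL.inv_cdf(1.0 - K)   # liability threshold
    z = STD_NORMAL.pdf(T)             # height of the normal curve at T
    i = z / K                         # mean liability of diseased individuals
    v = -i * K / (1.0 - K)            # mean liability of non-diseased individuals
    # Assumed moments of genetic liability in each group (normal approximation).
    mean_difference = (i - v) * h2
    var_diseased = h2 * (1.0 - h2 * i * (i - T))
    var_non_diseased = h2 * (1.0 - h2 * v * (v - T))
    # P(difference > 0) for a normally distributed difference.
    return STD_NORMAL.cdf(mean_difference / sqrt(var_diseased + var_non_diseased))


if __name__ == "__main__":
    for K in (0.01, 0.05, 0.20):
        print(K, round(auc_max(K, 0.2), 3))
```

Running the loop shows the dependence on prevalence: for a fixed heritability of liability, the value returned differs between rare and common diseases.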

A useful property of AUC (as discussed above) is that for a given disease the estimated AUC is independent of the relative proportions of cases and controls in the sample being classified [7], i.e. the mean rank is approximately the same whether the ratio of cases to controls is *K*:(1-*K*) or 1:1. Equivalently, the probability of a randomly selected case and control being correctly ranked is independent (except for sampling) of the number of cases and controls measured. We can use equation 3 to estimate the variance on the liability scale explained by a genomic profile, *x*, by making *h _{L}^{2}* the subject of the equation, but renaming it as *h _{L[x]}^{2}*, recognising that it represents the proportion of variance explained by the profile. Then, from two measurable parameters, *K* and *AUC*, we can calculate *h _{L[x]}^{2}*,

(4)

where *Q*=Φ^{−1}(*AUC*). From this, we can calculate the proportion of the known genetic variance explained by the genomic profile

(5)

using the estimates of *K* and *λ _{S}* to calculate (equation 1). We can also calculate the proportion of the sibling risk explained by the profile, (

(6)

[19]. and (*λ _{S}*

We used simulation under the liability threshold model [11],[14] to check our derivations. We simulated 100,000 nuclear families sampling risk on the liability scale, *P*=*A* + *E*, *A ~ N*(0, ) for parents, and *A*=½*A _{dad}*+½
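A much simplified version of such a check can be sketched as follows. Unlike the simulation described above, this sketch draws unrelated individuals rather than nuclear families, uses the true genetic liability *A* as the classifier, and compares the empirical AUC (computed from ranks) with the normal-approximation formula assumed for equation 3. The sample size, seed and all function names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def auc_max(K, h2):
    """Approximate maximum AUC (assumed form of equation 3)."""
    T = STD_NORMAL.inv_cdf(1.0 - K)
    i = STD_NORMAL.pdf(T) / K
    v = -i * K / (1.0 - K)
    variance = h2 * (1.0 - h2 * i * (i - T)) + h2 * (1.0 - h2 * v * (v - T))
    return STD_NORMAL.cdf((i - v) * h2 / sqrt(variance))


def empirical_auc(scores, diseased):
    """Probability that a random diseased individual outranks a random
    non-diseased one, computed from the rank sum of the diseased group."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n_d = sum(diseased)
    n_nd = len(diseased) - n_d
    rank_sum = sum(rank for rank, idx in enumerate(order, start=1) if diseased[idx])
    return (rank_sum - n_d * (n_d + 1) / 2) / (n_d * n_nd)


def simulate_auc(K, h2, n=50_000, seed=1):
    """Simulate P = A + E under the liability threshold model and return the
    empirical AUC when individuals are ranked on true genetic liability A."""
    rng = random.Random(seed)
    T = STD_NORMAL.inv_cdf(1.0 - K)
    sd_a, sd_e = sqrt(h2), sqrt(1.0 - h2)
    genetic = [rng.gauss(0.0, sd_a) for _ in range(n)]
    diseased = [a + rng.gauss(0.0, sd_e) > T for a in genetic]
    return empirical_auc(genetic, diseased)


if __name__ == "__main__":
    print(simulate_auc(0.05, 0.2), auc_max(0.05, 0.2))
```

The two printed values should be close for moderate heritabilities; the text notes that the normal approximation becomes less accurate as heritability increases.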

In Figure 1A we consider two diseases both with heritability of liability, *h _{L}^{2}*=0.2, plotting probability of disease (*i.e. G _{01}*) vs genetic liability (

Figure 2 plots *AUC _{max}* vs *h _{L}^{2}*, for

Table 1 lists *AUC _{max}* for a range of complex genetic diseases calculated using equation 3, with *h _{L}^{2}* calculated using equation 1 from published estimates of

Using equations (4) and (5) we calculate the proportion of the known genetic variance explained by the profile for the diseases listed in Table 1 when AUC=0.75. The results (Table 1) show that the same AUC can correspond to quite different proportions of the known genetic variance captured by the genomic profile, ranging from 0.10 to 0.74. If half of the known genetic variance can be explained by identified risk variants, then genomic profiles for most complex genetic diseases (*AUC _{half}*, Table 1) will achieve some clinical validity, as AUC is >0.75 for all but bladder cancer among the examples provided.
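The reverse calculation can be sketched in the same way. The code below assumes that equation 4 is the algebraic inversion of the normal-approximation form of equation 3, with *Q*=Φ^{−1}(*AUC*), and that equation 5 is the ratio of the variance explained by the profile to the heritability of liability. Both forms, and the function names, are assumptions of this sketch.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def variance_explained_from_auc(auc, prevalence):
    """Variance on the liability scale explained by a profile with a given AUC.

    Assumed form of equation 4:
    2 Q^2 / [(v - i)^2 + Q^2 i (i - T) + Q^2 v (v - T)]
    """
    K = prevalence
    T = STD_NORMAL.inv_cdf(1.0 - K)   # liability threshold
    i = STD_NORMAL.pdf(T) / K         # mean liability of diseased individuals
    v = -i * K / (1.0 - K)            # mean liability of non-diseased individuals
    Q = STD_NORMAL.inv_cdf(auc)
    return 2.0 * Q**2 / ((v - i) ** 2 + Q**2 * i * (i - T) + Q**2 * v * (v - T))


def proportion_genetic_variance_explained(auc, prevalence, h2_liability):
    """Assumed form of equation 5: variance explained divided by heritability."""
    return variance_explained_from_auc(auc, prevalence) / h2_liability


if __name__ == "__main__":
    # Illustrative inputs: AUC = 0.75 with the prevalence and heritability
    # quoted in the text for age related macular degeneration.
    print(proportion_genetic_variance_explained(0.75, 0.118, 0.68))
```

For the illustrative inputs shown, the result falls inside the 0.10 to 0.74 range reported for the diseases in Table 1; the per-disease values in that table are not reproduced here.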

Consider the first listed example in Table 1, age related macular degeneration (AMD).

Based on the review of Scholl et al [21] and the large twin study of Seddon et al [22] we have used a prevalence of advanced AMD after 80 years of age of *K*=11.8% and a sibling recurrence risk representing the genetic contribution of *λ _{S}*=2.2, which correspond to a heritability on the liability scale of *h _{L}^{2}*=0.68 (equation 1). If the genetic test explains all the genetic variance, the maximum AUC that could be achieved by a genomic profile is

The AUC is a widely used statistic that summarises the clinical validity of a diagnostic or prognostic test. However, the AUC statistic of a genomic profile alone has an upper limit (*i.e. AUC _{max}*) which depends on the genetic epidemiology of the disease, namely the disease prevalence and heritability. It is important that in the first instance, particularly when genomic profiling is in its infancy, genomic profiles are judged on their ability to predict genetic risk (their analytic validity) rather than on the basis of clinical validity [10]. Since AUC is estimated as a function of a rank correlation, its genetic interpretation is not immediately obvious. Here we provide a genetic interpretation of the AUC expressed in terms of its genetic epidemiology parameters (equation 3). A relationship between

Initially, it may seem counter-intuitive that AUC depends on disease prevalence since for an individual disease TPR and FPR are independent of the proportion of cases and controls measured and therefore of the sample prevalence. However, as we have clearly shown (Figure 1A and 1B) the dependence on disease prevalence results from our ability to generalise across diseases in the context of a test classifier being a genomic profile.

In contrast to our results and those of Janssens et al [3], Clayton [24] provided an expression for ROC under a polygenic model which is independent of population disease prevalence. His derivation assumes that the effect of each locus is additive on the log risk scale [25]. Slatkin [26] and we [27] have found that this model allows probabilities of disease that exceed one, which although they occur with low frequency can have substantial impact on the estimates of recurrence risk and genetic variance. Under this model there is a relationship between recurrence risk to monozygotic twins and to siblings of *λ _{MZ}*/=1; this ratio is not achieved when probabilities of disease are constrained to their natural parameter space of a maximum of 1. Furthermore, empirical estimates of the ratio of

AUC is a useful measure because of its independence of the numbers of diseased and non-diseased individuals tested, but we advocate the reporting of an estimate of the proportion of the known genetic variance on the liability scale or the proportion of sibling risk accounted for by the profile, and we provide a method to do this using the estimated AUC, disease prevalence and heritability on the liability scale or sibling recurrence risk (equation 5). An AUC of 0.75 can imply anything from 0.10 to 0.74 of the genetic variance explained by the genomic profile for the complex diseases listed in Table 1. The correlation between predicted and true genetic risk has long been the benchmark in non-human genetics for the accuracy of genetic risk predictors. The proportion of genetic variance explained can be calculated from three measurable statistics: disease prevalence, sibling recurrence risk and AUC of the profile (using equations (1) and (4)). In this way, estimates of AUC can provide direct estimates of the proportion of ‘missing heritability’ [28], which takes into account the interdependence of identified associated variants.

Currently, the derivation of genomic profiles is very much in its infancy. As the sample size of genome-wide association studies increases, we can expect genomic profiles to include more and more validated associated variants. However, the proportion of genetic variance explained is constrained by the variance that could be detected by the markers that are genotyped, recognising that the current generation of genome-wide chips explain at most ~80% of the known variance in single nucleotide polymorphisms across the Caucasian genome [29]. This, in turn, may only be a fraction of the total genomic variance once structural variants such as copy number variants are included [30]. The actual variance explained by the profile depends on the sample size (i.e., power) of the studies from which associated genetic variants have been detected. It is likely that there are many variants which have such a small effect size that they will be impossible to detect even with very large samples. Although each such variant makes only a very small contribution to the genetic variance, there may be so many that a sizeable proportion of the variance will go undetected. Even if only a quarter of the genetic variance is detectable by our future genotyping technology, the AUC is still greater for the genomic profile than for family history (ignoring shared environmental risks of family members, Text S1).

In our derivations we have assumed the liability threshold model [11],[14]. Slatkin [26] demonstrated that the threshold model was one of several genetic models that provided the necessary steep increase in probability of disease with increasing load of genetic risk alleles. The main assumption of the liability threshold model is that the distribution of liability scores is unimodal, which should be achieved as long as there is no single unidentified genetic or environmental factor of very large effect [11]. The model accommodates any distribution of risk allele effect sizes and risk allele frequencies as long as there are sufficient (“more than one or a few” [11]) risk alleles in the population to create an approximately normal distribution of genetic liability scores. Since our simulation results of *AUC _{max}* vs (Figure 3) based on the liability threshold model agree with those of Janssens et al [3] who used a logit model to combine genetic risks from individual genetic variants, it is clear that the dependence of

We have also assumed that a genetic profile is applied in the same “average” environment as the genetic risks were estimated and we have assumed that all familiality is of genetic origin. The *AUC _{max}* will be lower than those derived here if any part of the sibling recurrence risk reflects co-variation of non-genetic origin. Using recurrence risks from different types of relatives, the importance of common environmental factors can be assessed and a

We have provided a genetic interpretation of and insight into the AUC statistic calculated under a genomic profile. Time will tell if genetic variants amenable to genotyping are able to reconstruct the known genetic variance in its totality. Even if it is possible to explain only a quarter of the known genetic variance, the genomic profile will be a more useful predictor of genetic risk than self-reported family history (in the absence of shared environmental risk factors), which is a commonly used measure for targeted screening programmes for complex genetic diseases. In practice, predictions of risk of disease will incorporate both genetic and environmental risk factors to produce the best predictions of absolute risk of disease. Here we provide a benchmark for the expected contribution from the genetic component of the prediction, illustrating that the same AUC estimated for different diseases can imply quite different proportions of genetic variance explained by the genomic profile, which is often overlooked (e.g. [5]). Ultimately, genomic profiles may be used without contributions from environmental risk factors, since the contribution from the genomic profile can be estimated perinatally, prior to exposure to many environmental risk factors and when limited family history of disease is available. Indeed, one purpose of a genetic risk predictor is to allow individuals to choose to modify their exposure to environmental risks. We provide a simple online calculator (http://gump.qimr.edu.au/genroc) to calculate i) the maximum AUC for a genomic profile of a disease given estimates of disease prevalence and sibling recurrence risk or heritability of liability, ii) the proportion of variance explained on the liability scale given an estimate of AUC from a risk predictor and disease prevalence and iii) the proportion of genetic variance or of sibling risk explained given an estimate of AUC, disease prevalence and sibling recurrence risk [2].

Example calculation of ROC curve for a genomic profile. An example of *n _{d}*=9 diseased (case) and

(0.07 MB TIF)


AUC related statistics for complex genetic diseases: Table 1 with added columns considering family history.

(0.10 MB PDF)


AUC based on family history as a prediction of genetic risk.

(0.07 MB PDF)


We would like to thank three reviewers for the constructive criticism on earlier versions of this manuscript.

The authors have declared that no competing interests exist.

This work was supported by the Australian National Health and Medical Research Council (grants 389892, 442915, and 496688), and by the Australian Research Council (grant DP0770096). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

1. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics. 2008;9:356–369. [PubMed]

2. Iles MM. What can genome-wide association studies tell us about the genetics of common disease? PLoS Genet. 2008;4:e33. doi: 10.1371/journal.pgen.0040033. [PubMed]

3. Janssens AC, Aulchenko YS, Elefante S, Borsboom GJ, Steyerberg EW, et al. Predictive testing for complex diseases using multiple genes: fact or fiction? Genet Med. 2006;8:395–400. [PubMed]

4. Wray NR, Goddard ME, Visscher PM. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res. 2007;17:1520–1528. [PubMed]

5. Kraft P, Wacholder S, Cornelis MC, Hu FB, Hayes RB, et al. Beyond odds ratios: communicating disease risk based on genetic profiles. Nature Reviews Genetics. 2009;10:264–269. [PubMed]

6. Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet. 2009;5:e1000337. doi: 10.1371/journal.pgen.1000337. [PMC free article] [PubMed]

7. Metz CE. Basic principles of ROC analysis. Seminars in Nuclear Medicine. 1978;8:283–298. [PubMed]

8. Lu Q, Elston RC. Using the optimal receiver operating characteristic curve to design a predictive genetic test, exemplified with type 2 diabetes. American Journal of Human Genetics. 2008;82:641–651. [PubMed]

9. van der Net JB, Janssens A, Defesche JC, Kastelein JJP, Sijbrands EJG, et al. Usefulness of Genetic Polymorphisms and Conventional Risk Factors to Predict Coronary Heart Disease in Patients With Familial Hypercholesterolemia. American Journal of Cardiology. 2009;103:375–380. [PubMed]

10. Grosse SD, Khoury MJ. What is the clinical utility of genetic testing? Genet Med. 2006;8:448–450. [PubMed]

11. Falconer D, Mackay T. Introduction to Quantitative Genetics. England: Longman; 1996.

12. James JW. Frequency in relatives for an all-or-none trait. Ann Hum Genet. 1971;35:47–49. [PubMed]

13. Dempster ER, Lerner IM. Heritability of Threshold Characters. Genetics. 1950;35:212–236. [PubMed]

14. Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Sunderland, Massachusetts: Sinauer Associates, Inc; 1998.

15. Robertson A, Lerner IM. The heritability of all-or-none traits - viability of poultry. Genetics. 1949;34:395–411. [PubMed]

16. Reich T, James JW, Morris CA. The use of multiple thresholds in determining the mode of transmission of semi-continuous traits. Ann Hum Genet. 1972;36:163–184. [PubMed]

17. Somers RH. A new asymmetric measure of association for ordinal variables. American Sociological Review. 1962;27:799–811.

18. Hanley J, McNeil B. The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve. Radiology. 1982;143 [PubMed]

19. Yang J, Visscher PM, Wray NR. Sporadic cases are the norm for common disease. European Journal of Human Genetics 2009 Oct 14. [Epub ahead of print] 2009 [PMC free article] [PubMed]

20. Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, et al. The impact of genotype frequencies on the clinical validity of genomic profiling for predicting common chronic diseases. Genet Med. 2007;9:528–535. [PubMed]

21. Scholl HPN, Fleckenstein M, Issa PC, Keilhauer C, Holz FG, et al. An update on the genetics of age-related macular degeneration. Molecular Vision. 2007;13:196–205. [PMC free article] [PubMed]

22. Seddon JM, Cote J, Page WF, Aggen SH, Neale MC. The US twin study of age-related macular degeneration - Relative roles of genetic and environmental influences. Archives of Ophthalmology. 2005;123:321–327. [PubMed]

23. Gu J, Pauer GJ, Yue X, Narendra U, Sturgill GM, et al. Assessing susceptibility to age-related macular degeneration with proteomic and genomic biomarkers. Mol Cell Proteomics. 2009;8:1338–1349. [PMC free article] [PubMed]

24. Clayton DG. Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genet. 2009;5:e1000540. doi: 10.1371/journal.pgen.1000540. [PMC free article] [PubMed]

25. Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet. 1990;46:222–228. [PubMed]

26. Slatkin M. Exchangeable models of complex inherited diseases. Genetics. 2008;179:2253–2261. [PubMed]

27. Wray NR, Goddard ME. Multi-locus models of genetic risk of disease. Genome Medicine In press 2010 [PMC free article] [PubMed]

28. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456:18–21. [PubMed]

29. Bhangale TR, Rieder MJ, Nickerson DA. Estimating coverage and power for genetic association studies using near-complete variation data. Nature Genetics. 2008;40:841–843. [PubMed]

30. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454. [PMC free article] [PubMed]

31. Youngson NA, Whitelaw E. Transgenerational epigenetic effects. Annual Review of Genomics and Human Genetics. 2008;9:233–257. [PubMed]

32. Baker SG, Cook NR, Vickers A, Kramer BS. Using relative utility curves to evaluate risk prediction. Journal of the Royal Statistical Society. 2009;172:729–748. [PMC free article] [PubMed]

33. Levinson DF. The genetics of depression: A review. Biological Psychiatry. 2006;60:84–92. [PubMed]

34. Sullivan PF, Neale MC, Kendler KS. Genetic epidemiology of major depression: Review and meta-analysis. American Journal of Psychiatry. 2000;157:1552–1562. [PubMed]

35. Marenberg ME, Risch N, Berkman LF, Floderus B, Defaire U. Genetic susceptibility to death from coronary heart disease in a study of twins. New England Journal of Medicine. 1994;330:1041–1046. [PubMed]

36. Risch N. The genetic epidemiology of cancer: interpreting family and twin studies and their implications for molecular genetic approaches. Cancer Epidemiol Biomarkers Prev. 2001;10:733–741. [PubMed]

37. Das SK, Elbein SC. The Genetic Basis of Type 2 Diabetes. Cellscience. 2006;2:100–131. [PMC free article] [PubMed]

38. Hemminki K, Li X, Sundquist K, Sundquist J. Familial risks for asthma among twins and other siblings based on hospitalizations in Sweden. Clinical and Experimental Allergy. 2007;37:1320–1325. [PubMed]

39. Craddock N, Khodel V, Van Eerdewegh P, Reich T. Mathematical limits of multilocus models: the genetic transmission of bipolar disorder. Am J Hum Genet. 1995;57:690–702. [PubMed]

40. Lichtenstein P, Yip BH, Bjork C, Pawitan Y, Cannon TD, et al. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. Lancet. 2009;373:234–239. [PubMed]

41. McGue M, Gottesman II, Rao DC. The transmission of schizophrenia under a multifactorial threshold model. American Journal of Human Genetics. 1983;35:1161–1178. [PubMed]

42. Harney S, Wordsworth BP. Genetic epidemiology of rheumatoid arthritis. Tissue Antigens. 2002;60:465–473. [PubMed]

43. Hyttinen V, Kaprio J, Kinnunen L, Koskenvuo M, Tuomilehto J. Genetic liability of type 1 diabetes and the onset age among 22,650 young Finnish twin pairs - A nationwide follow-up study. Diabetes. 2003;52:1052–1055. [PubMed]

44. WTCCC. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. [PMC free article] [PubMed]

45. Harley JB, Alarcon-Riquelme ME, Criswell LA, Jacob CO, Kimberly RP, et al. Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat Genet. 2008;40:204–210. [PubMed]

46. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21:3940–3941. [PubMed]
