New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a “genometype”. A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that: (i) for 23 of the 24 diseases, the majority of individuals will receive negative test results, (ii) these negative test results will, in general, not be very informative, as the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 - 80% of that in the general population, and (iii) on the positive side, in the best-case scenario more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy makers and consumers.
As a result of continuing advances in high-throughput sequencing technologies (1–4), whole-genome sequencing will soon become an affordable approach to identify all sequence variants in an individual human. Recent evidence suggests that each human genome has more than 3 million sequence variants, some common, some infrequent (5). To date, several thousand genomic variants have been associated with human diseases, either as rare variants in Mendelian disorders or as common SNPs in genome-wide association studies (GWAS) (6, 7). Whole-genome or whole-exome sequencing has recently been used to identify new disease predisposing variants in various familial disorders, such as familial pancreatic cancer (8) and Miller syndrome (9). However, the potential utility of genome-wide sequencing for personalized medicine in the general population is unclear. Suppose, for example, that sequencing becomes sufficiently inexpensive that all individuals, at birth, could have their genomes sequenced at negligible cost. What fraction of the population would benefit from such sequencing? “Benefit” in this context is defined as receiving information indicating that the risk of disease is increased or decreased to a degree that would alter an individual's lifestyle or medical management.
On the surface, it might seem impossible to answer this question at present, as there are millions of genetic variants in every individual and the contribution of nearly all of these variants to any disease is unknown. However, there is one group of individuals in which this question can be immediately addressed: monozygotic twin pairs. If one twin of the pair has a disease, then the probability of the other twin developing that disease is dependent on the genome whenever that disease has some genetic component. We show below that when this logic is applied to large numbers of twins, estimates of the potential benefits of genome-wide sequencing in the general (non-twin) population can be made.
The key to our analysis is the concept of a “genometype”. We do not know the genomic sequences of the twin pairs analyzed in the studies described herein, but we do know that each twin pair shares a nearly identical genome (10) and that a genome confers a particular genetic risk to every disease. For each disease, we group genomes that confer identical genetic risks into genometypes. For example, genometypes could be grouped into 20 bins, with genometypes in bin 1 conferring zero genetic risk, genometypes in bin 2 conferring 3% genetic risk, genometypes in bin 3 conferring 10% genetic risk, etc. We can then estimate what distributions of genometypes in the population best reflect the observed monozygotic twin concordancy and discordancy for any given disease.
In twin studies on diseases, heritability (defined in Box 1) is generally based on the difference in the incidence of a disease in monozygotic versus dizygotic twins (11, 12). Heritability reflects the average genetic contribution to disease in a twin population. We are interested in the distribution of genetic risks rather than the average. For example, a 30% average risk could reflect a small fraction of twin-pairs with genometypes conferring high genetic risk or a larger fraction of twin-pairs with genometypes conferring a moderate genetic risk. Among all the distributions of genometypes that are compatible with the twin epidemiologic data, we wished to find the distributions that maximized or minimized the potential clinical utility of identifying those genometypes by genomic sequencing.
Whole-genome sequencing-based tests, like any genetic test, can be informative in two ways: negative and positive tests would indicate a substantially lower or higher risk, respectively, than that of the general population. The challenge is to define “substantially” in clinically meaningful and quantitative terms. An example might help put this challenge into perspective. Suppose a woman receives a whole-genome test result indicating that she has a 90% lifetime risk (the total risk over her entire life) of developing breast cancer. She may decide to have a prophylactic double mastectomy to prevent this outcome. Similarly, if the test indicated an 80% or even a 50% lifetime risk of developing breast cancer, she may consider mastectomy. On the other hand, if the test indicated only a 14% risk of developing breast cancer, then mastectomies would be considered by very few women, given that most women today do not opt for prophylactic mastectomies even though the lifetime risk of developing breast cancer in the general population is 12%.
This example illustrates that the risk threshold required for clinical utility represents a balance between the risk reduction afforded by an intervention and its negative consequences. A precedent exists for defining this threshold, in that the decision to implement genetic tests is often based on a positive predictive value (PPV) of at least 10%, implying that more than 1 in 10 patients with a positive test result are expected to develop disease (13). While the choice of this threshold will depend on the specific intervention and should ideally be left to the individual, we use this 10% threshold for our population-level analyses of 20 of the 24 diseases analyzed (table S1). In the other four diseases (chronic fatigue syndrome, gastro-esophageal reflux disorder, coronary heart disease-related death and general dystocia), which occur at relatively high frequency in the population, this 10% threshold is inadequate to distinguish individuals with a significantly increased genetic risk from the rest of the population. For these four diseases (table S1), a more appropriate threshold corresponds to one conferring a genetic risk that is at least as great as that of the non-genetic component. Individuals with genometypes conferring this degree of genetic risk would therefore have a total risk at least twice as large as those without any genetic predisposing factors. This 2x threshold in relative risk is similar to those widely used as clinical benchmarks for common diseases (14–18).
For whole-genome testing in healthy individuals, we thereby defined a threshold at which a positive test result would be clinically meaningful as follows. If the non-genetic risk was <5%, then the threshold was set at 10%. If the non-genetic risk was >5%, then the threshold was set at 2x the non-genetic contribution. Though we have used these particular thresholds in most of the examples described below, we also describe how these results varied when other thresholds were considered.
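The threshold rule described above can be sketched in a few lines of Python. The function names are hypothetical, and the handling of a non-genetic risk of exactly 5% (a case the text leaves open) is an assumption made for illustration.

```python
def clinical_threshold(non_genetic_risk: float) -> float:
    """Total-risk threshold at which a positive test is treated as clinically meaningful.

    Assumption: a non-genetic risk of exactly 5% falls under the 2x rule,
    which also gives 10%, so the two branches agree at the boundary.
    """
    if not 0.0 <= non_genetic_risk <= 1.0:
        raise ValueError("non_genetic_risk must be a probability in [0, 1]")
    if non_genetic_risk < 0.05:
        return 0.10                    # fixed 10% total-risk threshold
    return 2.0 * non_genetic_risk      # twice the non-genetic contribution


def genetic_risk_threshold(non_genetic_risk: float) -> float:
    """Genetic risk needed to reach the threshold (threshold minus non-genetic risk)."""
    return clinical_threshold(non_genetic_risk) - non_genetic_risk
```

Under this sketch, a disease with a 2% non-genetic risk keeps the 10% threshold, whereas a disease with an 8% non-genetic risk would require a 16% total risk.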
We collated monozygotic twin pair data from the Swedish Twin Registry, Danish Twin Registry, Finnish Twin Cohort, Norwegian National Birth Registry and the National Academy of Science – National Research World War II Veteran Twins Registry (19–31) (Table 1). From these registries, we selected data representing 24 diseases of diverse etiologies including autoimmune diseases, cancer, cardiovascular diseases, genitourinary diseases, neurological diseases and obesity-associated diseases. Three of these conditions (coronary heart disease, cancer and stroke) represent the leading causes of mortality in the United States, accounted for 54.2% of total deaths in 2007, and are therefore of major public health importance (32). The thresholds for a clinically meaningful test result, as defined above, were calculated from disease prevalence and non-genetic risks in the populations from which the twins were drawn (19–31) (Materials and Methods, Table 1 and table S2).
We then developed computational methods to evaluate possible frequency (f) and genetic risk (r) combinations for a population containing 20 genometypes. Genometype frequency is defined as the proportion of twin pairs in the population that have a given genometype (Box 1). Genometype genetic risk is defined, for each disease, as the absolute increment in risk that an individual with that genometype will face compared to someone with no genetic risk at all (Box 1). For any combination of genometypes, each with a certain frequency and genetic risk, we obtain an expected distribution of disease-affected individuals among a monozygotic twin cohort. Many different combinations of genometype frequencies and genetic risks match the observed distributions in monozygotic twins; we are interested in those combinations (distributions) that maximize or minimize clinical utility, thus putting bounds on the expectations from whole-genome sequencing. The mathematical framework for our study, and associated statistical and technical issues, are detailed in the Materials and Methods.
These analyses allowed us to address various measures of potential clinical utility. First, for each disease, what is the maximum and minimum fraction of patients with the disease that would receive a positive test, i.e., a result indicating that they have a substantially increased risk of that disease? The answers to this question are graphically shown in Fig. 1 for each of the 24 diseases (for three diseases, we present different answers for males and females, resulting in a total of 27 disease categories). As can be seen from Fig. 1, the fraction of patients that would receive a positive test varies widely from disease to disease. The majority of patients (>50%) who would ultimately develop 13 of the 27 disease categories would not test positive, even in the best-case scenario. On the other hand, there were four disease categories - thyroid autoimmunity, type I diabetes, Alzheimer's disease, and coronary heart disease-related deaths in males - for which genetic tests might identify more than 75% of the patients who ultimately develop the disease. Genometype risk and frequency distributions for all diseases are shown in table S3 and graphically for representative diseases in fig. S1.
We could also determine the maximum and minimum fraction of individuals in the population (rather than the fraction of patients with disease) who would receive positive test results for each disease. As shown in Fig. 2, this fraction is generally small, as expected, because the incidence of most diseases is relatively low. Do these negative tests, which would be received by the great majority of individuals for most diseases, have value? Negative tests could be valuable to individual patients if they indicated a considerably lower total risk than would be assumed in the absence of testing. As can be seen from Fig. 3, though, negative tests are generally not very informative in the case of whole-genome sequencing as they are limited by the non-genetic component of risk. For 22 of the 27 disease categories studied, a negative test would not indicate a risk that is less than half that in the general population, even in the best-case scenario. This level of risk reduction is probably not sufficient to warrant changes of behavior, lifestyle, or preventative medical practices for these individuals (33, 34). On the other hand, there was one disease category (Alzheimer's disease, Fig. 3) in which a negative test result might indicate as little as a ~12% relative risk of disease compared to the entire twin cohort, at least in the best-case scenario. Knowledge of such a reduced risk might be comforting and relieve anxiety, particularly to those with a family history of Alzheimer's disease.
What is the maximum fraction of individuals that could receive at least one positive test result, i.e., a report indicating that s/he is at risk for at least one of the 24 diseases assessed? From the data depicted in Fig. 2, we estimate that >95% of men and >90% of women could receive at least one positive test result if the risk alleles were actually distributed in the way that produced maximal sensitivity in our model. We assumed that the risk alleles for these 24 diseases were independent in these estimates; if they were not independent, then these figures represent overestimates. On the other hand, these frequencies may represent underestimates as there are a number of additional diseases with hereditary components that have not yet been studied in monozygotic twins or included in our analyses. At the very least, if we consider only distinct disease categories whose pathogenesis is unlikely to be shared, our analyses suggest that, in the best-case scenario, the majority of tested individuals might be alerted to a clinically meaningful risk by whole-genome sequencing.
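The independence assumption mentioned above implies that the chance of at least one positive result is one minus the product of the per-disease chances of testing negative. A minimal Python sketch follows; the function name and the example fractions are hypothetical and are not the values plotted in Fig. 2.

```python
from math import prod


def prob_at_least_one_positive(positive_fractions):
    """Probability of testing positive for at least one disease,
    assuming the per-disease test results are independent."""
    for p in positive_fractions:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each fraction must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in positive_fractions)


# Hypothetical per-disease fractions of the population testing positive
# (illustrative values only, not taken from Fig. 2).
example = [0.10, 0.05, 0.20]
print(prob_at_least_one_positive(example))  # equals 1 - 0.90 * 0.95 * 0.80
```

If risk alleles for different diseases were positively correlated, the same individuals would tend to test positive repeatedly and the true figure would be lower than this product formula suggests.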
It was of interest to determine how the results described above varied with the threshold chosen for the analysis. For example, it might be argued that a threshold of 10% was too low for true clinical utility. Our analyses show that the maximum fraction of affected cases testing positive, as well as the maximum fraction of the total population that tests positive, is not changed much when the thresholds are changed to 20% (tables S4 and S5). With very high thresholds, however, both these measures of sensitivity decrease significantly (tables S4 and S5). Moreover, the maximum predictive value of a negative test drops precipitously at higher thresholds (table S6).
The general public does not appear to be aware that, despite their very similar height and appearance, monozygotic twins in general do not always develop or die from the same maladies (35, 36). This basic observation, that monozygotic twins of a pair are not always afflicted by the same maladies, combined with extensive epidemiologic studies of twins and statistical modeling, allows us to estimate upper- and lower-bounds of the predictive value of whole-genome sequencing.
On the negative side, our results show that the majority of tested individuals would receive negative tests for most diseases (Fig. 2). Moreover, the predictive value of these negative tests would generally be small, as the total risk for acquiring the disease in an individual testing negative would be similar to that of the general population (Fig. 3). On the positive side, our results show that, at least in the best-case scenario, the majority of patients might be alerted to a clinically meaningful risk for at least one disease through whole-genome sequencing.
These conclusions are consistent with what is now known about risk allele loci from genome-wide association studies (GWAS) (37). In general, GWAS have shown that many loci can predispose to disease and that each risk allele confers a relatively small effect (38, 39). For example, a recent analysis of large cohorts of individuals with colorectal cancer showed that only ~1.3% of phenotypic variance could be accounted for by the 10 loci discovered through GWAS (40). However, it could be argued that the relatively low level of utility that might be inferred from such studies is misleading. In particular, it is possible that a more complete knowledge of disease-associated variants and their epistatic relationships would be able to reliably predict who will and who will not develop disease in the general population. Our results allow us to estimate the maximum possible reliability of such tests.
Several of our conclusions are based on the genometype frequency and risk distributions that would maximize the clinical utility of genetic testing, i.e., are best-case scenarios. The actual frequency and risk distributions of genometypes in the population are not likely to be distributed in this way. Indeed, other distributions are also consistent with the monozygotic twin data on which our maxima are determined and all other distributions yield less clinical utility than those of the maxima, as shown in Figs. 1 to 3. Moreover, in the real world, it is unlikely that the biomedical correlates of every genetic variant and the epistatic relationships among these variants will ever be completely known, or that the analytic validity of genetic testing will be perfect - as we assume in our ideal scenario. Thus, our conclusions purposely overestimate the value of whole-genome sequencing that will be achieved - they represent an absolute upper bound that cannot be improved by improvements in technology or genetic knowledge. As a practical example of this principle, we estimate that a negative whole-genome sequencing-based test could indicate a ~ two-fold decrease in risk for prostate cancer in men and a similar two-fold decrease for urinary incontinence in women. But this two-fold decrease would only apply in a world in which the risk alleles are distributed in a fashion that maximizes the sensitivity of whole genome testing (Fig. 3). In the real world, the risk alleles are not likely to be distributed in this ideal fashion, and omniscience about every variant is not likely to be realized. Thus, the risk of these diseases in patients who test negative will likely be even more similar to that of the general population. For diseases with a lower heritable component, such as most forms of cancer, whole-genome based genetic tests will be even less informative.
Thus, our results suggest that genetic testing, at its best, will not be the dominant determinant of patient care and will not be a substitute for preventative medicine strategies incorporating routine checkups and risk management based on the history, physical status and lifestyle of the patient.
It is important to point out that our study focused on testing relatively common diseases in the general population and did not address the utility of whole-genome sequencing to identify the genetic basis of rare monogenic diseases. In such unusual cases, it has already been shown that whole-genome sequencing can prove highly informative (8, 9).
As with any model-based study, our conclusions have a number of caveats. Our analyses are based on data from twin studies and the assumptions made therein (11). Specifically, we do not model gene-environment interactions and rely on the prevalence of disease in the twin cohorts; this prevalence, as well as the operative non-genetic contributions, may differ from that in the general population. Though twins are likely to be representative of the general population, the estimates provided by our model could be improved through analyses of larger twin cohorts as these become available, as well as through a more complete phenotypic evaluation of twins of varying ethnicities. Another caveat is that our conclusions about potential utility are based on thresholds that represent a complex balance of personal choices, demographic influences, disease characteristics and the clinical intervention(s) available. We have used a minimum 10% total risk and a minimum relative risk of 2 as the threshold in our analyses. Other thresholds may be more appropriate and meaningful for given situations, though the data in table S4 to table S6 show that our major conclusions are not altered much by the choice of threshold.
In sum, no result, including ours, can or should be used to conclude that whole-genome sequencing will be either useful or useless in an absolute sense. This utility will depend on the results of testing, the individual tested, and the perspectives of individuals and societies. What we hoped to accomplish with this study is to put the debate about the value of such sequencing in a mathematical framework so that the potential merits and limitations of whole-genome sequencing, for any disease, can be quantitatively assessed. Recognition of these merits and limits can be useful to consumers, researchers, and industry, as they can minimize unrealistic expectations and foster fruitful investigations.
We used data from twin studies arising from population-based twin registries to investigate the distribution of disease risk within the population (19–31). The registries in our study included the Swedish Twin Registry, Danish Twin Registry, Finnish Twin Cohort, Norwegian National Birth Registry and the National Academy of Science – National Research Council World War II Veteran Twins Registry. Traits were chosen that represented diverse etiologies or were conditions of significant public health importance. We evaluated diseases in the following categories: autoimmune (T1D, thyroid auto-antibodies), neoplastic (breast, colorectal and prostate cancer), cerebrovascular (coronary heart disease-related death and stroke-related death), genitourinary (general dystocia, pelvic organ prolapse, and urinary incontinence), unknown etiology (irritable bowel syndrome, chronic fatigue), neurological (Parkinson disease, Alzheimer's disease and dementia) and obesity-associated (T2D, gallstone disease).
To be included in our analyses, the following data had to be available for each twin study:
Using the data from population-based twin studies, we define cohort risk (CR) - the fraction of people in the cohort that had the disease - as follows:
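One way to write this definition, assuming that n_c, n_d and n_h denote the numbers of disease-concordant, discordant and healthy-concordant twin pairs (the notation used later in this section) and that each pair contributes two individuals, is:

```latex
\mathrm{CR} \;=\; \frac{2\,n_c + n_d}{2\,\left(n_c + n_d + n_h\right)}
```

The numerator counts affected individuals (two per concordant pair, one per discordant pair) and the denominator counts all individuals in the cohort.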
We define the following generative model that characterizes the joint distribution of an individual having a pre-specified disease and a particular genometype. Each individual is characterized by: (i) a binary (Bernoulli) random variable, Z, specifying whether or not s/he has the disease, and (ii) a categorical random variable, G, indicating the genometype of the individual. This means that of the d assumed extant genometypes, each individual can have only one of them. The joint distribution of both the disease and genometype for an individual is given by P(Z, G). This joint distribution decomposes into a product of the likelihood of getting the disease given the genometype, P(Z | G), and the prior probability of having the genometype, P(G)
Thus, to proceed, we specify both the likelihood function, P(Z | G), and the prior, P(G). As mentioned above, G is a categorical random variable taking values g1,g2,...,gd, each with some probability. Therefore we have:
for all i=1,2,...,d. In words, a person can have one of the d assumed extant genometypes, and the probability of having genometype i is given by fi.
The probability of having the disease given a genometype is qi=P(Z= 1|G=gi). Assume that qi is a sum of a non-genetic risk, e, which is assumed to be constant for the whole population, and genetic risk, ri, that is, qi=e+ri (note that 0 ≤ qi ≤ 1). Non-genetic risk ( e ) is the proportion of people in the population that would get the disease if all had the most favorable genometype possible. Non-genetic risk includes all factors that are not inherited, including environmental exposures (e.g., diet, carcinogens), epigenetic alterations and stochastic influences. We estimated it as: e = CR(1– HER) (see below). This model assumes that all risks are either non-genetic or genetic, i.e., no interactions. We require that the unknown parameters, ri, must be between 0 and 1 - e, for all i. Therefore, for a given genometype, the likelihood term for genometype i is given by:
Thus, the joint distribution of disease and genometype can be written as:
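Under the definitions given in the preceding paragraphs (prior probability f_i for genometype g_i, and a Bernoulli disease indicator with success probability q_i = e + r_i), the prior, likelihood and joint distribution can be sketched as follows. This is a reconstruction from the verbal definitions and may differ in notation from the published equations.

```latex
\begin{aligned}
P(G = g_i) &= f_i, \qquad \sum_{i=1}^{d} f_i = 1,\\
P(Z = z \mid G = g_i) &= (e + r_i)^{z}\,(1 - e - r_i)^{1 - z}, \qquad z \in \{0, 1\},\\
P(Z = z,\, G = g_i) &= f_i\,(e + r_i)^{z}\,(1 - e - r_i)^{1 - z}.
\end{aligned}
```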
If the available data included the genometype and disease status of each individual, then inferring estimates of the parameters, r =(r1,...,rd), and f = (f1, ...,fd), would be relatively straightforward. However, the available data include only the disease status of monozygotic twins. These represent observations of disease status in two individuals with identical genometypes. Therefore, we can describe a joint distribution for monozygotic twins having a disease or not. Let Zj = Z(Xj) be the Bernoulli random variable indicating whether a particular individual has disease and let Zk = Z(Xk) be the Bernoulli random variable for the co-twin. Similarly, let Gj = G(Xj) and Gk = G(Xk) be categorical random variables indicating whether twin j or k of a pair has some particular genometype. The distribution of disease within monozygotic twins can be divided into three distinct groups, namely: disease concordant, discordant, and healthy concordant pairs.
The probability of disease concordant monozygotic twins is given by:
Similarly, the probability of healthy concordant monozygotic twin pairs is given by:
And the probability of monozygotic twin pairs discordant for disease is given by:
For each disease, let nc, nh and nd correspond to the number of concordant disease, healthy and discordant twin pairs. Assuming that there are d genometypes, the expected number of twin pairs of each of the three types is simply the total number of twin pairs times the probability of being each kind of twin pair:
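The three pair probabilities and the corresponding expected counts can be sketched in Python. The sketch assumes that, given their shared genometype, the two twins of a pair develop disease independently, each with probability q_i = e + r_i; the function names and example values are illustrative and not taken from the original analysis.

```python
def pair_probabilities(f, r, e):
    """Return (P_concordant, P_healthy, P_discordant) for a monozygotic pair.

    f: genometype frequencies (sum to 1); r: genetic risks; e: non-genetic risk.
    Assumption: twins are conditionally independent given their shared genometype.
    """
    if len(f) != len(r):
        raise ValueError("f and r must have the same length")
    if abs(sum(f) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    q = [e + ri for ri in r]                      # total risk per genometype
    if any(qi < 0.0 or qi > 1.0 for qi in q):
        raise ValueError("e + r_i must lie in [0, 1]")
    p_c = sum(fi * qi * qi for fi, qi in zip(f, q))             # both twins affected
    p_h = sum(fi * (1 - qi) ** 2 for fi, qi in zip(f, q))       # neither twin affected
    p_d = sum(fi * 2 * qi * (1 - qi) for fi, qi in zip(f, q))   # exactly one affected
    return p_c, p_h, p_d


def expected_counts(n_pairs, f, r, e):
    """Expected numbers of concordant, healthy and discordant pairs."""
    return tuple(n_pairs * p for p in pair_probabilities(f, r, e))
```

For instance, with two hypothetical genometypes of frequency 0.8 and 0.2, genetic risks 0 and 0.3, and a non-genetic risk of 0.05, a cohort of 2000 pairs would be expected to contain 53 concordant, 1613 healthy and 334 discordant pairs.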
Because we are interested in the limits of utility of genetic testing, we search for a parameter set that maximizes or minimizes the fraction of patients that will receive a positive test result, given certain constraints. Formally, we define the positive fraction (PF) as the proportion, among twin pairs with at least one disease case, that possess a genometype sufficient to change clinical action. In our notation:
where t is the genetic risk required for a person to be at the threshold required for clinical utility and d is the maximum number of genometypes under consideration. The thresholds for each disease are provided in table S2, and for each disease, t is defined as this threshold minus e.
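One reading of this verbal definition of PF can be sketched in Python: the weight of each genometype is the probability that a pair carrying it has at least one affected twin, and PF is the share of that weight carried by genometypes with r_i at or above t. The exact published form of the equation is not reproduced here, so the expression below is an assumption, as are the function name and the conditional independence of twins given their genometype.

```python
def positive_fraction(f, r, e, t):
    """Share of pairs with at least one case whose genometype has r_i >= t.

    Assumption: P(at least one affected twin | genometype i) = 1 - (1 - q_i)^2,
    with q_i = e + r_i and twins independent given the shared genometype.
    """
    affected = [fi * (1.0 - (1.0 - (e + ri)) ** 2) for fi, ri in zip(f, r)]
    total = sum(affected)
    if total == 0.0:
        raise ValueError("no affected pairs expected under these parameters")
    positive = sum(a for a, ri in zip(affected, r) if ri >= t)
    return positive / total
```

With hypothetical frequencies (0.8, 0.2), genetic risks (0, 0.3), e = 0.05 and t = 0.05, only the second genometype counts as a positive test, and it accounts for roughly 60% of the pairs with at least one case.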
We therefore seek to solve the following optimization problem, for each disease:
where Eq. (14) enforces that none of the residual errors can be larger than 0.5. The parameter nx is the estimated number of twin pairs of each type obtained by plugging the estimated parameters into Eqs. (9) – (11). This is therefore a quadratically constrained nonlinear optimization problem. We utilize the following algorithm to obtain a local optimum.
For d’ = 2, i.e., starting with d’ = 2 genometypes, we implement a grid search over the parameter space and select the parameters that maximize the likelihood over a constrained search space. Let θ = (f,r) and Θ be the set of all θ's under consideration, as defined by the feasible region specified in Eq. (14). We then discretize this space into nine bins for each element of f and 100 bins for each element of r and denote P(Z|G) by Pθ(Z|G) to emphasize the dependence of the joint distribution on the parameter. Thus, we aim to solve the following optimization problem:
where θ̂(d’) = (f̂(d’), r̂(d’)) is the parameter estimate assuming only d’ genometypes. For each d’ = 3,...,20, we seek to solve the above optimization problem. To initialize, we pad the previous solution with zeros, yielding f̂(d’+1) = (f̂(d’), 0), and similarly for r̂(d’+1). Then we use MATLAB's fmincon to find a local maximum of PF given the constraints. If no improvement in PF is obtained for d’ +1 genometypes using the default “padded” initialization, we try randomly initializing. We stop trying random initializations if any of the following criteria are met: (i) if we find an improvement in PF with the constraints satisfied, (ii) if we reach 100% PF, or (iii) if we reach 15 random initializations. If criterion (i) is met, we denote the parameters achieving the improvement θ̂(d’+1) and then increment d’ and continue. If criterion (ii) is met, we stop incrementing d’, as we have achieved the maximum possible PF, so adding additional genometypes cannot increase it further. If criterion (iii) is met, we let θ̂(d’+1) = (θ̂(d’), 0); that is, we let our final estimate for d’ +1 simply be our estimate for d’ padded with a zero. We then increment d’.
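The initial grid-search step for d’ = 2 can be illustrated with a small Python stand-in. This is not the original MATLAB code: it covers only the grid search, not the fmincon refinement or the padding loop, and it assumes a multinomial log-likelihood over the three pair types and conditional independence of twins given their genometype. All function names and the example counts are hypothetical.

```python
import itertools
import math


def pair_probs(f, r, e):
    """(P_concordant, P_healthy, P_discordant), twins independent given genometype."""
    q = [e + ri for ri in r]
    p_c = sum(fi * qi ** 2 for fi, qi in zip(f, q))
    p_h = sum(fi * (1 - qi) ** 2 for fi, qi in zip(f, q))
    p_d = sum(fi * 2 * qi * (1 - qi) for fi, qi in zip(f, q))
    return p_c, p_h, p_d


def grid_search_d2(n_c, n_h, n_d, e, tol=0.5):
    """Grid search over two genometypes.

    Keeps grid points whose expected pair counts are within `tol` of the
    observed counts, and returns the one with the highest log-likelihood
    as (loglik, f, r), or None if no grid point is feasible.
    """
    n = n_c + n_h + n_d
    observed = (n_c, n_h, n_d)
    f_grid = [k / 10 for k in range(1, 10)]            # nine bins for f_1; f_2 = 1 - f_1
    r_grid = [k * (1 - e) / 100 for k in range(100)]   # 100 bins for each r_i in [0, 1 - e)
    best = None
    for f1, r1, r2 in itertools.product(f_grid, r_grid, r_grid):
        f, r = (f1, 1 - f1), (r1, r2)
        probs = pair_probs(f, r, e)
        if min(probs) <= 0.0:
            continue                                   # log-likelihood undefined
        if any(abs(n * p - o) > tol for p, o in zip(probs, observed)):
            continue                                   # violates the residual constraint
        loglik = sum(o * math.log(p) for p, o in zip(probs, observed))
        if best is None or loglik > best[0]:
            best = (loglik, f, r)
    return best
```

Calling `grid_search_d2(3538, 72738, 23724, e=0.1)` on these synthetic counts (generated from frequencies 0.8 and 0.2 with genetic risks 0 and 0.27) returns a feasible grid point; with real data and a coarse grid the feasible set may be empty, which is one reason a continuous optimizer is used for the later steps.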
We repeat the above approach for each disease. The parameters that we determined using this approach to maximize PF were then used to estimate the percentage of the population testing positive for a given disease, as well as the relative risk of disease for those individuals testing negative, as defined below. We apply this approach separately for each disease, thus assuming independence. To find the minimum PFs compatible with the twin data, we used a similar procedure.
We determined the relative risk of disease of individuals whose whole-genome sequencing tests were negative after maximizing or minimizing the sensitivity (PF) of the test. Disease risk in the population testing negative (DRneg) is the ratio of the number of disease cases testing negative to the number of individuals in the population testing negative:
To determine the relative risk of disease if testing negative (RRneg), we calculated the ratio of disease risk of individuals testing negative to the disease risk in the twin cohort (CR):
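Both quantities can be sketched in Python from a set of genometype frequencies and risks. The sketch treats a negative test as a genometype with r_i below t and, as a simplifying assumption, uses the model-implied cohort risk (the frequency-weighted mean of e + r_i) in place of the observed CR; the function name is hypothetical.

```python
def negative_test_risks(f, r, e, t):
    """Return (DR_neg, RR_neg) for individuals whose genometype has r_i < t.

    Assumption: cohort risk is taken as the model-implied value sum_i f_i * (e + r_i).
    """
    q = [e + ri for ri in r]
    cohort_risk = sum(fi * qi for fi, qi in zip(f, q))
    neg_freq = sum(fi for fi, ri in zip(f, r) if ri < t)          # share testing negative
    if neg_freq == 0.0 or cohort_risk == 0.0:
        raise ValueError("undefined: nobody tests negative or cohort risk is zero")
    neg_cases = sum(fi * qi for fi, qi, ri in zip(f, q, r) if ri < t)
    dr_neg = neg_cases / neg_freq                                 # risk among test-negatives
    return dr_neg, dr_neg / cohort_risk
```

With hypothetical frequencies (0.8, 0.2), genetic risks (0, 0.3), e = 0.05 and t = 0.05, those testing negative carry only the 5% non-genetic risk, which is about 45% of the 11% cohort risk.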
We defined relative risk (RR) in table S2 as the minimum total risk of individuals with genometypes carrying a given genetic risk compared to the total risk of individuals with genometypes carrying a genetic risk of 0% (i.e., determined solely by non-genetic factors). The minimum total risk was determined using the standard 10% risk threshold described in the text as well as others (tables S4 to S6). In all cases,
Equation (14) enforces that none of the residual errors can be larger than 0.5, such that upon rounding we obtain a perfect fit. Changing this parameter from 0.5 to 0.01 did not alter the PF's depicted in Fig. 1 for any disease.
Instead of maximizing PF's, we also determined the distributions of genometype risks (ri) and frequencies (fi) that would minimize the relative risk of disease of individuals whose whole-genome sequencing tests were negative. This independent optimization yielded results nearly identical to those reported in Fig. 1, Fig. 2, and Fig. 3.
As noted above, we estimated the non-genetic risk as e = CR(1– HER). This risk is somewhat higher than that derived from the standard liability threshold (LT) model. However, it has recently been shown that the LT model underestimates the non-genetic contribution to disease because it does not take into account synergistic interactions among genes (41). The model described herein does not make any assumptions about the nature of the interactions between genes, such as additivity. However, the LT model can also be used to approximate the maximum capacity of whole genome sequencing to detect individuals at pre-defined risks under certain simplifying assumptions about the distribution of risk alleles in the population. The PF predictions from the LT model employing 10% thresholds are provided in table S4 and can be compared to the results of the current model with 10% thresholds (table S4).
Finally, our model can be used to calculate the potential clinical utility of whole-genome sequencing under any assumption about the proportion of non-genetic contributions to disease risk, or estimates thereof. Representative values for each disease, with non-genetic contributions ranging from 10% to 90%, are provided in table S7.
|Genometype||A set of genomes that confer a specific genetic risk for a given disease|
|Genometype genetic risk (r)||The genetic risk conferred by a given genometype|
|Genometype frequency (f)||The frequency of a given genometype in the general population|
|Threshold||Minimum risk for a given disease considered to be clinically meaningful|
|Heritability (HER)||Proportion of phenotypic variance associated with genetic factors|
|Cohort risk (CR)||Risk of disease in the relevant twin cohort|
|Non-genetic risk (e)||Proportion of cohort risk due to non-genetic factors|
|Total risk||Sum of genetic risk conferred by a given genometype plus non-genetic risk|
|Relative risk||Ratio of total risk associated with a given genometype to cohort risk|
We thank Naomi Wray and Donald Geman for critical comments regarding the manuscript, and Katie Kinzler for technical assistance. Funding: The project was supported by The Lustgarten Foundation for Pancreatic Cancer Research, The Virginia and D. K. Ludwig Fund for Cancer Research, AACR Stand Up To Cancer-Dream Team Translational Cancer Research Grant, The Dr. Miriam and Sheldon G. Adelson Medical Research Foundation, The European Community's Seventh Framework Programme, NIH grants CA43460, CA57345, CA62924, CA121113, and NCI contract N01-CN-43302.
Author contributions: N.J.R., J.T.V., G.P., K.W.K., B.V. and V.E.V. designed the study; N.J.R., J.T.V. and V.E.V. generated and analyzed data; N.J.R., J.T.V. and B.V. wrote the manuscript.
Competing Interests: B.V., K.W.K. and V.E.V. are co-founders of Inostics and Personal Genome Diagnostics and are members of their Scientific Advisory Boards. K.W.K., B.V., and V.E.V. own Inostics and Personal Genome Diagnostics stock, which is subject to certain restrictions under University policy. The terms of these arrangements are managed by the Johns Hopkins University in accordance with its conflict-of-interest policies. G.P. is on the scientific advisory board of Counsyl.
Citation: N. J. Roberts, J. T. Vogelstein, G. Parmigiani, K. W. Kinzler, B. Vogelstein, V. E. Velculescu, The Predictive Capacity of Personal Genome Sequencing. Sci. Transl. Med. 10.1126/scitranslmed.3003380 (2012).