Genetic research has identified a number of genes for which inherited mutations confer a significantly increased risk of disease (Weiss 1993, Jimenez-Sanchez et al. 2001). These mutations give rise to specific syndromes and to clustering of disease phenotypes within families. Examples of major public health interest occur in cancer (Foulkes and Hodgson 1998). The Breast-Ovarian Cancer Syndrome, which accounts for 5% to 10% of breast cancer in the US (Weber 1998), is caused by germline mutations of the BRCA1 or BRCA2 genes. Hereditary Nonpolyposis Colorectal Cancer (HNPCC) (Vogelstein and Kinzler 1998, Lynch and de la Chapelle 1999), which accounts for up to 5% of all diagnoses of colorectal cancer in the US, can be caused by a germline mutation of any one of a set of known DNA mismatch repair (MMR) genes including MSH2, MLH1 and others.
In relation to these syndromes, a common question concerns the probability that an individual carries a deleterious germline mutation of a disease gene, given a certain pattern of disease diagnoses in the individual’s family history. This calculation is referred to here as carrier status prediction, or risk prediction. Its applications are in two areas. In clinical counseling of concerned individuals, risk prediction provides important support to decision making about genetic testing, disease prophylaxis, family planning and other issues. In research it provides a flexible approach to modeling and analyzing family data in situations in which testing is impractical but extensive family history is available.
Carrier status prediction in genetic counseling concerns inference on the genotype of an individual (the counselee) conditional on information about his/her disease history and his/her relatives’ disease and genotype history (a pedigree). Two broad classes of modeling approaches have been used so far. Empirical approaches model the conditional distribution of genotype given phenotype directly, by applying statistical or artificial intelligence techniques to pedigree data for tested individuals. In contrast, Mendelian models are built upon the conditional distributions of phenotypes given genotype (penetrance) and the marginal distributions of genotypes (prevalence). The probabilities required for counseling are then derived from these using Bayes’ rule and Mendel’s laws (Murphy and Mutalik 1969, Elston and Stewart 1971, Szolovits and Pauker 1992, Offit and Brown 1994, Parmigiani, Berry and Aguilar 1998, Antoniou et al. 2000).
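As a sketch of how penetrance, prevalence, Bayes’ rule and Mendel’s laws combine, the counselee’s carrier probability can be written as follows. The notation is introduced here for illustration only and is not taken from the cited works: g_0 is the counselee’s genotype, g_1, ..., g_n are the relatives’ genotypes, and h_i is the disease history of individual i, with h = (h_0, ..., h_n). The second identity assumes that phenotypes are conditionally independent given genotypes.

```latex
\[
P(g_0 \mid h) \;=\;
  \frac{P(g_0)\, P(h \mid g_0)}
       {\sum_{g_0'} P(g_0')\, P(h \mid g_0')},
\qquad
P(h \mid g_0) \;=\;
  \sum_{g_1,\dots,g_n} P(g_1,\dots,g_n \mid g_0)\,
  \prod_{i=0}^{n} P(h_i \mid g_i).
\]
```

Here P(g_0) comes from the prevalence, each P(h_i | g_i) is a penetrance term, and P(g_1, ..., g_n | g_0) follows from Mendelian transmission within the pedigree together with the prevalence for individuals whose parents are not in the pedigree.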
Mendelian risk prediction models exploit domain knowledge of Mendelian inheritance and other biological characteristics of susceptibility genes and thus can incorporate pedigree features at higher resolution, provide intuitive parameterization in terms of penetrance and prevalence, and can be extended easily to arbitrary pedigrees. Validation studies in cancer models indicate that Mendelian models provide a well-founded approach to genetic counseling, and improved predictive performance compared to empirical approaches (Berry et al. 2002, Marroni et al. 2004).
In cancer a widely used Mendelian model is BRCAPRO, which assesses the probability that an individual carries a germline deleterious mutation of the BRCA1 and BRCA2 genes, based on his or her family’s history of breast and ovarian cancer (Berry et al. 1997, Parmigiani, Berry, Iversen, Müller, Schildkraut and Winer 1998, Parmigiani, Berry and Aguilar 1998, Iversen et al. 2000). BRCAPRO assumes autosomal dominant inheritance, which is supported extensively by previous analyses (Newman et al. 1988). Following the template of BRCAPRO, CRCAPRO was later developed for the genes MSH2 and MLH1, involved in the HNPCC syndrome. CRCAPRO has a similar structure to BRCAPRO and uses information about colorectal and endometrial cancer, as well as microsatellite instability.
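To make the autosomal dominant setting concrete, the following toy sketch computes the probability that a counselee carries a mutation given only that her mother is affected. It is not the BRCAPRO or CRCAPRO algorithm: it is written in Python for illustration, covers a single parent-child trio, ignores age and the counselee’s own phenotype, and uses hypothetical values for the allele frequency and penetrance. All function names are made up for this example.

```python
from itertools import product

# Hypothetical inputs, chosen only for illustration (not BRCAPRO parameters).
ALLELE_FREQ = 0.01                         # frequency q of the deleterious allele
PENETRANCE = {0: 0.05, 1: 0.60, 2: 0.60}   # P(affected | number of mutant copies)


def genotype_prior(q):
    """Hardy-Weinberg prior over 0, 1 or 2 copies of the mutant allele."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def transmit(g):
    """Probability that a parent with g mutant copies passes on a mutant allele."""
    return g / 2


def child_genotype_dist(g_mother, g_father):
    """Mendelian distribution of the child's genotype given both parents."""
    pm, pf = transmit(g_mother), transmit(g_father)
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }


def carrier_prob_given_affected_mother(q=ALLELE_FREQ, pen=PENETRANCE):
    """P(child carries at least one mutant allele | mother affected).

    The father's and the child's phenotypes are treated as unobserved.
    """
    prior = genotype_prior(q)
    joint_carrier = 0.0   # P(child is a carrier, mother affected)
    evidence = 0.0        # P(mother affected)
    for g_mother, g_father in product(prior, prior):
        weight = prior[g_mother] * prior[g_father] * pen[g_mother]
        child = child_genotype_dist(g_mother, g_father)
        evidence += weight
        joint_carrier += weight * (child[1] + child[2])
    return joint_carrier / evidence


if __name__ == "__main__":
    print(carrier_prob_given_affected_mother())
```

The sum over the parents’ genotypes is the same marginalization a Mendelian model performs over every untested relative. In a larger pedigree the sum is organized recursively instead of enumerated directly.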
Here, expanding on the principles of BRCAPRO and CRCAPRO, we introduce BayesMendel, a generic tool for building Mendelian risk prediction models for autosomal dominant genes. Currently, the development of models merging Mendelian principles and state-of-the-art statistical techniques requires substantial statistical and computational expertise. Our tool is designed to enable genetic epidemiologists to flexibly implement their own Mendelian models for novel syndromes and local subpopulations, without reprogramming complex statistical analyses and prediction tools. It will also allow other groups to contribute to our models by developing variants and input data for specific applications, countries, and so on. We expect BayesMendel to increase the impact and usefulness of Mendelian models in cancer prevention. Applications of this tool will extend to inherited familial syndromes beyond cancer.
BayesMendel is distributed as a library under the open source environment R (Ihaka and Gentleman 1996), an intuitive, highly functional and extensible programming language. R provides users with a comprehensive, state-of-the-art statistical analysis toolbox based entirely on free and open source code.
In the remainder of this article, Section 2 reviews the theoretical basis of the Mendelian risk prediction approach. Section 3 presents the functionality and object-oriented structure of the library. Finally, Section 4 discusses current limitations and possible future extensions.