Stat Appl Genet Mol Biol. Author manuscript; available in PMC 2008 March 24.
Published in final edited form as:
PMCID: PMC2274007

BayesMendel: an R Environment for Mendelian Risk Prediction*


Several important syndromes are caused by deleterious germline mutations of individual genes. In both clinical and research applications it is useful to evaluate the probability that an individual carries an inherited genetic variant of these genes, and to predict the risk of disease for that individual, using information on his/her family history. Mendelian risk prediction models accomplish these goals by integrating Mendelian principles and state-of-the-art statistical models to describe phenotype/genotype relationships. Here we introduce an R library called BayesMendel that allows implementation of Mendelian models in research and counseling settings. BayesMendel is implemented in an object-oriented structure in the language R and distributed freely as an open source library. In its first release, it includes two major cancer syndromes: the breast-ovarian cancer syndrome and the hereditary non-polyposis colorectal cancer syndrome, along with up-to-date estimates of penetrance and prevalence for the corresponding genes. Input genetic parameters can be easily modified by users. BayesMendel can also serve as a generic tool for genetic epidemiologists to flexibly implement their own Mendelian models for novel syndromes and local subpopulations, without reprogramming complex statistical analyses and prediction tools.

1 Introduction

Genetic research has identified a number of genes for which inherited mutations confer a significantly increased risk of disease (Weiss 1993, Jimenez-Sanchez et al. 2001). These mutations give rise to specific syndromes and to clustering of disease phenotypes within families. Examples of major public health interest occur in cancer (Foulkes and Hodgson 1998). The Breast-Ovarian Cancer Syndrome, which accounts for 5% to 10% of breast cancer in the US (Weber 1998), is caused by germline mutations of the BRCA1 or BRCA2 genes. Hereditary Nonpolyposis Colorectal Cancer (HNPCC) (Vogelstein and Kinzler 1998, Lynch and de la Chapelle 1999), which accounts for up to 5% of all diagnoses of colorectal cancer in the US, can be caused by a germline mutation of any one of a set of known DNA mismatch repair (MMR) genes including MSH2, MLH1 and others.

In relation to these syndromes, a common question concerns the probability that an individual carries a deleterious germline mutation of a disease gene, given a certain pattern of disease diagnoses in the individual’s family history. This calculation is referred to here as carrier status prediction, or risk prediction. Its applications are in two areas. In clinical counseling of concerned individuals, risk prediction provides important support to decision making about genetic testing, disease prophylaxis, family planning and other issues. In research it provides a flexible approach to modeling and analyzing family data in situations in which testing is impractical but extensive family history is available.

Carrier status prediction in genetic counseling concerns inference on the genotype of an individual (the counselee) conditional on information about his/her disease history and his/her relatives’ disease and genotype history (a pedigree). Two broad classes of modeling approaches have been used so far. Empirical approaches model the conditional distribution of genotype given phenotype directly, by applying statistical or artificial intelligence techniques to pedigree data for tested individuals. In contrast, Mendelian models are built upon the conditional distributions of phenotypes given genotype (penetrance), and the marginal distributions of genotypes (prevalence). The probabilities required for counseling are then derived from these using Bayes’ rule and Mendel’s laws (Murphy and Mutalik 1969, Elston and Stewart 1971, Szolovits and Pauker 1992, Offit and Brown 1994, Parmigiani, Berry and Aguilar 1998, Antoniou et al. 2000).

Mendelian risk prediction models exploit domain knowledge of Mendelian inheritance and other biological characteristics of susceptibility genes and thus can incorporate pedigree features at higher resolution, provide intuitive parameterization in terms of penetrance and prevalence, and can be extended easily to arbitrary pedigrees. Validation studies in cancer models indicate that Mendelian models provide a well founded approach to genetic counseling, and improved predictive performance compared to empirical approaches (Berry et al. 2002, Marroni et al. 2004).

In cancer a widely used Mendelian model is BRCAPRO, which assesses the probability that an individual carries a germline deleterious mutation of the BRCA1 and BRCA2 genes, based on his or her family’s history of breast and ovarian cancer (Berry et al. 1997, Parmigiani, Berry, Iversen, Müller, Schildkraut and Winer 1998, Parmigiani, Berry and Aguilar 1998, Iversen et al. 2000). BRCAPRO assumes autosomal dominant inheritance, which is supported extensively by previous analyses (Newman et al. 1988). Following the template of BRCAPRO, CRCAPRO was later developed for the genes MSH2 and MLH1, involved in the HNPCC syndrome. CRCAPRO has a similar structure to BRCAPRO and uses information about colorectal and endometrial cancer, as well as microsatellite instability.

Here, expanding on the principles of BRCAPRO and CRCAPRO, we introduce BayesMendel, a generic tool for building Mendelian risk prediction models for autosomal dominant genes. Currently, the development of models merging Mendelian principles and state-of-the-art statistical techniques requires substantial statistical and computational expertise. Our tool is designed to enable genetic epidemiologists to flexibly implement their own Mendelian models for novel syndromes and local subpopulations, without reprogramming complex statistical analyses and prediction tools. It will also allow other groups to contribute to our models by developing variants and input data for specific applications, countries, and so on. We expect BayesMendel to increase the impact and usefulness of Mendelian models in cancer prevention. Applications of this tool will extend to inherited familial syndromes beyond cancer.

BayesMendel is distributed as a library under the open source environment R (Ihaka and Gentleman 1996), an intuitive, highly functional and extensible programming language. R provides users with a comprehensive, state-of-the-art statistical analysis toolbox based entirely on free and open source code.

In the remainder of this article, Section 2 reviews the theoretical basis of the Mendelian risk prediction approach. Section 3 presents the functionality and object-oriented structure of the library. Finally, Section 4 discusses current limitations and possible future extensions.

2 Methods

2.1 Theory

Risk prediction calculations performed by the BayesMendel library are based on a general approach, which can be described as follows. Let γ0 be the vector of genotypes of the counselee at each of the genes considered by the model. Each coordinate represents a different locus. In the current formulation, all deleterious variants are assumed to have the same phenotypic implications. Each coordinate of γ0 is 0, 1, or 2 depending on whether the individual is a non-carrier, heterozygous carrier, or homozygous carrier of a deleterious mutation at the locus. Let R be the number of relatives of the counselee, let r index a relative in the family, and let γr, for r = 1, …, R, be the corresponding genotype vectors. Similarly, we denote by h0, h1, …, hR the relevant phenotypes and ages of onset of the counselee and relatives. For example in BRCAPRO, for each relative r, the vector hr includes information on affected status for all relevant cancer sites, with age of onset if affected, or current age or age at death if unaffected. In addition to this information, other individual-specific covariates, such as being of Ashkenazi Jewish origin, are provided by Xr, r = 0, 1, …, R, and the exact relationship of each relative to the counselee is known.

Carrier Probability Calculation

Our goal is to obtain the probability distribution of the counselee genotype given the family history, covariates and pedigree structure, that is

$$p(\gamma_0 \mid h_0, h_1, \ldots, h_R, X_0, X_1, \ldots, X_R). \qquad (1)$$


We suppress the notation of X0, X1, …, XR later on to keep the mathematical expressions concise. However conditioning on the covariates is implied in all calculations.

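The genotype distribution of Expression (1) can be obtained using a two-step process: an updating step and an integration step via the law of total probability. The updating step is based on the mathematical identity

$$p(\gamma_0 \mid h_0, h_1, \ldots, h_R) = \frac{p(\gamma_0)\, p(h_0, h_1, \ldots, h_R \mid \gamma_0)}{\sum_{\gamma_0'} p(\gamma_0')\, p(h_0, h_1, \ldots, h_R \mid \gamma_0')}, \qquad (2)$$

which is an instance of Bayes’ rule. The unconditional carrier probability p(γ0) (or prevalence) is updated to incorporate information from the pedigree. The term p(h0, h1, …, hR | γ0) is the probability of the phenotypes for the whole pedigree given the genotype of the counselee. This is complex to evaluate directly, but it can be simplified using the law of total probability:

$$p(h_0, h_1, \ldots, h_R \mid \gamma_0) = \sum_{\gamma_1, \ldots, \gamma_R} p(h_0, h_1, \ldots, h_R \mid \gamma_0, \gamma_1, \ldots, \gamma_R)\, p(\gamma_1, \ldots, \gamma_R \mid \gamma_0), \qquad (3)$$

which considers explicitly the unobserved genotypes of the relatives. In the current approach we make the additional assumption that individual histories are conditionally independent given the genotypes, and obtain:

$$p(h_0, h_1, \ldots, h_R \mid \gamma_0) = p(h_0 \mid \gamma_0) \sum_{\gamma_1, \ldots, \gamma_R} \left[ \prod_{r=1}^{R} p(h_r \mid \gamma_r) \right] p(\gamma_1, \ldots, \gamma_R \mid \gamma_0). \qquad (4)$$

The term p(γ1, …, γR | γ0) is known for all genotype configurations from Mendel’s laws, as long as the mode of inheritance is known. This set of relationships connects the carrier probability with penetrance and prevalence information that can be abstracted from the literature or estimated from cohort data, or both. This approach applies to arbitrary pedigree sizes.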
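The updating and integration steps can be sketched by brute-force enumeration on a very small pedigree. The Python sketch below is not the MARGENE algorithm used by the library; it assumes a single locus, a counselee whose only relatives are the two parents, and Hardy-Weinberg genotype frequencies for the parents. All function names are hypothetical. It sums the joint genotype probability times the individual phenotype likelihoods over the parents' genotypes, which is algebraically the same as combining the updating and integration steps.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of deleterious alleles carried at the locus


def hardy_weinberg(f):
    """Genotype probabilities of a founder for deleterious allele frequency f."""
    return {0: (1 - f) ** 2, 1: 2 * f * (1 - f), 2: f ** 2}


def child_given_parents(child, mother, father):
    """Mendelian probability of the child's genotype given both parents."""
    m, p = mother / 2.0, father / 2.0  # chance each parent transmits a deleterious allele
    return {0: (1 - m) * (1 - p),
            1: m * (1 - p) + (1 - m) * p,
            2: m * p}[child]


def carrier_posterior(f, lik_counselee, lik_mother, lik_father):
    """Posterior distribution of the counselee genotype given the family history.

    Each lik_* argument maps a genotype to p(h_r | gamma_r) for that person.
    """
    prior = hardy_weinberg(f)
    unnorm = {g: 0.0 for g in GENOTYPES}
    for g0, g1, g2 in product(GENOTYPES, repeat=3):
        joint = prior[g1] * prior[g2] * child_given_parents(g0, g1, g2)
        unnorm[g0] += joint * lik_counselee[g0] * lik_mother[g1] * lik_father[g2]
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


# With uninformative histories the posterior reduces to the prevalence.
flat = {0: 1.0, 1: 1.0, 2: 1.0}
print(carrier_posterior(0.013, flat, flat, flat))
```

Enumeration of this kind is only practical for a handful of relatives, which is why a dedicated algorithm is needed for realistic pedigrees.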

BayesMendel is designed specifically to address the situation in which penetrance is incomplete and age at onset varies across individuals in both wild-type and mutants. Imagine we could observe the time to the development of a certain phenotype if there were no death or censoring. We call this the latent time to phenotype and we call its probability distribution net penetrance. One minus the Kaplan-Meier estimator is a consistent estimator of the net penetrance when the development of the phenotype is independent of other competing risks and censoring (Tsiatis 1998).

In the calculation of p(hr | γr), we assume that both the censoring process and deaths from causes unrelated to the syndrome are independent of the latent time to the phenotype. We also assume that censoring and deaths from other causes are non-informative, that is, the distribution of the latent time to death from other causes or to censoring is the same for wild-types and mutation carriers. If the mutation affects deaths from other causes, then these assumptions do not hold. For details on the impact of censoring and competing risks on Mendelian models, see Katki et al. (2004).

Then p(hr | γr) can be written as a product of two terms: one that depends on the phenotype-specific net penetrance and another that depends on the latent time to death. Because the latter is the same for all genotypes, it cancels out in the evaluation of Expression (2) and thus does not need to be considered. We code age in discrete one-year intervals. If we write the net penetrance of genotype γ by age t as F(t; γ), then the likelihood contribution of a case individual diagnosed in the age interval [t, t + 1) is proportional to f(t; γ) = F(t + 1; γ) − F(t; γ), while the likelihood contribution of an asymptomatic individual of age t is proportional to 1 − F(t; γ).
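As a minimal illustration of these two likelihood contributions, the Python sketch below stores a cumulative net penetrance as a list indexed by age in years. The function names and the penetrance values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def case_contribution(F, t):
    """Contribution of a case diagnosed in [t, t + 1): F(t + 1) - F(t)."""
    return F[t + 1] - F[t]


def unaffected_contribution(F, t):
    """Contribution of an individual still asymptomatic at age t: 1 - F(t)."""
    return 1.0 - F[t]


# Toy cumulative net penetrance by age 0, 1, 2, 3 (illustrative values only).
F_toy = [0.0, 0.1, 0.3, 0.6]

print(case_contribution(F_toy, 1))        # diagnosed between ages 1 and 2
print(unaffected_contribution(F_toy, 2))  # disease free at age 2
```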

Cancer Risk Prediction

Once the genotype distribution has been calculated for an asymptomatic counselee, we can predict the future risk that the counselee develops the phenotype by age t. There are two quantities that an asymptomatic counselee may be interested in. The first quantity of interest is the “net” probability that he/she develops a particular phenotype by a future age t, that is, the probability that one will develop the phenotype if death and other phenotypes are removed. The counselee may be interested in this quantity if he/she is solely concerned with reducing the net risk without considering other competing risk factors. This net probability is a weighted average of the genotype-specific net penetrances with the weights being the genotype probabilities, that is

$$F(t \mid h_0, h_1, \ldots, h_R) = \sum_{\gamma_0} F(t; \gamma_0)\, p(\gamma_0 \mid h_0, h_1, \ldots, h_R). \qquad (5)$$


For more realistic purposes, we also provide a second quantity, that is, the probability that the counselee will develop the specific phenotype first at age t, surviving other causes and death. We call this the “crude” probability. We consider a case where the syndrome contains two phenotypes S1 and S2, and denote the latent time to diseases by TS1 and TS2 and latent time to death of other causes TD. The calculation can be extended to syndromes containing more than two diseases. The net probability distributions of TS1, TS2 and TD are denoted by F1(t; γ0), F2(t; γ0) and FD(t; γ0). Then the “crude” genotype-specific probability of developing disease S1 at age t is


The FD(t; γ0) in Expression (7) is the probability distribution of the latent time to dying of causes unrelated to the syndrome. The hazard corresponding to this net distribution can be derived from public domain data by the following procedure:

Let $\Lambda_{\mathrm{all}}(t)$ denote the cumulative mortality incidence rate (deaths of all causes) by age t in the population, and $\Lambda_i(t)$ the cumulative incidence rate of death due to disease i. Then the cumulative incidence of death due to unrelated causes is $\Lambda_D = \Lambda_{\mathrm{all}} - \Lambda_1 - \Lambda_2$. Then by assuming independence between the two competing risks, the net hazard of death from unrelated causes is equal to the cause-specific hazard $\lambda_D = \frac{d\Lambda_D}{dt} \big/ (1 - \Lambda_{\mathrm{all}})$.

We convert the net hazard to FD, then use Equation (7) to obtain the crude probability of developing disease S1.
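The procedure above can be sketched in discrete one-year steps. The Python sketch below is an assumption-laden illustration rather than the library's implementation: it approximates the derivative by a one-year difference, converts the hazard to a distribution by multiplying yearly survival probabilities, and uses a hypothetical function name and toy inputs.

```python
def net_death_distribution(cum_all, cum_d1, cum_d2):
    """Net distribution of time to death from causes unrelated to the syndrome.

    cum_all: cumulative incidence of death from all causes, indexed by age.
    cum_d1, cum_d2: cumulative incidence of death due to disease 1 and disease 2.
    Returns F_D indexed by age, with F_D[0] = 0.
    """
    # Cumulative incidence of death from unrelated causes.
    cum_unrelated = [a - b - c for a, b, c in zip(cum_all, cum_d1, cum_d2)]
    survival = 1.0
    F_D = [0.0]
    for t in range(len(cum_all) - 1):
        # Yearly increment divided by the fraction still alive at age t.
        hazard = (cum_unrelated[t + 1] - cum_unrelated[t]) / (1.0 - cum_all[t])
        survival *= 1.0 - hazard
        F_D.append(1.0 - survival)
    return F_D


# Toy inputs (illustrative values only): no deaths from the two diseases,
# so the net distribution reproduces the all-cause cumulative incidence.
cum_all = [0.0, 0.1, 0.25, 0.5]
zeros = [0.0, 0.0, 0.0, 0.0]
print(net_death_distribution(cum_all, zeros, zeros))
```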

2.2 A Simple Illustration

To illustrate the above calculations with a concrete example, we consider the case of a woman seeking counseling because of her mother’s breast cancer history. Let us assume that the woman being counseled is of Ashkenazi ethnic origin, 40 years old and cancer free. Her mother was diagnosed with breast cancer at age 40, and subsequently died at 55 without additional cancer diagnoses. To simplify the exposition, we consider a single hypothetical gene called BRCA. For this gene, we will denote the woman’s genotype with the variable γ0, where γ0 = 1 when the woman is a heterozygous carrier of any deleterious BRCA allele, γ0 = 2 when she is a homozygous carrier, and γ0 = 0 otherwise. We denote the woman’s breast cancer history with h0, and her mother’s with h1. In our example, h0 = {breast and ovarian cancer free at 40}, h1 = {breast cancer diagnosed at 40, ovarian cancer free at 55}.

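Our quantity of interest is the a posteriori probability that the counselee carries at least one deleterious BRCA mutation given her family history, that is

$$p(\gamma_0 \neq 0 \mid h_0, h_1) = 1 - p(\gamma_0 = 0 \mid h_0, h_1).$$

By Equation (2),

$$\begin{aligned}
p(\gamma_0 = 0 \mid h_0, h_1) &= \frac{p(\gamma_0 = 0)\, p(h_0, h_1 \mid \gamma_0 = 0)}{\sum_{g=0}^{2} p(\gamma_0 = g)\, p(h_0, h_1 \mid \gamma_0 = g)} \\
&= \frac{p(\gamma_0 = 0)\, p(h_0, h_1 \mid \gamma_0 = 0)}{p(\gamma_0 = 0)\, p(h_0, h_1 \mid \gamma_0 = 0) + p(\gamma_0 \neq 0)\, p(h_0, h_1 \mid \gamma_0 \neq 0)}. \qquad (8)
\end{aligned}$$

The bottom equation assumes that homozygous and heterozygous carriers have the same penetrance, an assumption that is currently made by BayesMendel. In the numerator of Equation (8), p(γ0 = 0) is the a priori probability that the counselee is a wild-type. In this illustration, we assume the allele frequency in the Ashkenazi Jewish population to be f = 0.013. Then p(γ0 = 0) = (1 − f)² = 0.974. To calculate the probability of the observed family history conditional on the woman being a noncarrier, that is p(h0, h1 | γ0 = 0), we need to integrate out the genotype of the mother, denoted by γ1, as is done in Expressions (3) and (4). That is,

$$p(h_0, h_1 \mid \gamma_0 = 0) = \left[ p(h_0 \mid \gamma_0 = 0)\, p(h_1 \mid \gamma_1 = 0) \right] p(\gamma_1 = 0 \mid \gamma_0 = 0) + \left[ p(h_0 \mid \gamma_0 = 0)\, p(h_1 \mid \gamma_1 \neq 0) \right] p(\gamma_1 \neq 0 \mid \gamma_0 = 0)$$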

The phenotype probabilities in Expression (8) can be taken directly from the known penetrance, as follows. We denote the net probability of getting breast cancer within the age interval [t,t + 1) for a mutation carrier by fb(t; γ = 1), and for a noncarrier, or wild-type, by fb(t; γ = 0). The net cumulative probabilities are denoted by Fb(t; γ = 1) and Fb(t; γ = 0). The corresponding probabilities for ovarian cancer are denoted by fo(t; γ = 1), fo(t; γ = 0), Fo(t; γ = 1) and Fo(t; γ = 0). In this illustration, we use the default penetrance for BRCA1 in BayesMendel. Here we make two further assumptions: conditional independence between the cancer sites given genotype, and independence of prognosis and genotype, once a given cancer type is diagnosed. Using these assumptions,

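$$\begin{aligned}
p(h_1 \mid \gamma_1) &= p(\text{breast cancer at 40, ovarian cancer free at 55} \mid \gamma_1) \\
&\propto f_b(40; \gamma_1)\, [1 - F_o(55; \gamma_1)] \\
&= \begin{cases} 0.00054 \times 0.998 = 0.00053, & \text{when } \gamma_1 = 0 \\ 0.024 \times 0.817 = 0.020, & \text{when } \gamma_1 \neq 0 \end{cases} \\
p(h_0 \mid \gamma_0 = 0) &= p(\text{breast and ovarian cancer free at 40} \mid \gamma_0 = 0) \\
&\propto [1 - F_b(40; \gamma_0 = 0)]\, [1 - F_o(40; \gamma_0 = 0)] \\
&= 0.998 \times 0.996 \approx 0.996
\end{aligned}$$

The mother’s genotype given the daughter’s genotype can be derived by using Mendel’s Law and then applying Bayes’ Theorem. The calculations involve integrating out the genotype of the counselee’s father and lead to

$$\begin{aligned}
p(\gamma_1 = 0 \mid \gamma_0 = 0) &= \frac{p(\gamma_0 = 0 \mid \gamma_1 = 0)\, p(\gamma_1 = 0)}{p(\gamma_0 = 0 \mid \gamma_1 = 0)\, p(\gamma_1 = 0) + p(\gamma_0 = 0 \mid \gamma_1 \neq 0)\, p(\gamma_1 \neq 0)} = 1 - f = 0.987 \\
p(\gamma_1 \neq 0 \mid \gamma_0 = 0) &= 1 - p(\gamma_1 = 0 \mid \gamma_0 = 0) = f = 0.013
\end{aligned}$$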

Inserting the results into Expression (8) we get


Following a similar procedure, we can get the family history likelihood when the counselee is a carrier, that is,


Now we have all the pieces for Expression (8) and we get the a posteriori probability that the counselee is a noncarrier as


The probability that the counselee carries any deleterious BRCA mutation is then 1 − 0.78 = 0.22.
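The noncarrier branch of this illustration can be checked numerically from the figures quoted in the text. The Python sketch below is a hedged reconstruction, not the library's code: it recovers the mother's genotype distribution given a noncarrier daughter by integrating out the father under Hardy-Weinberg frequencies, then combines it with the quoted phenotype probabilities. Function and variable names are hypothetical. The carrier-side likelihood is not quoted in the text, so it is left as an argument of `posterior_noncarrier`.

```python
f = 0.013  # allele frequency assumed in the illustration


def hardy_weinberg(f):
    """Genotype probabilities for deleterious allele frequency f."""
    return {0: (1 - f) ** 2, 1: 2 * f * (1 - f), 2: f ** 2}


def mother_given_noncarrier_child(f):
    """p(mother genotype | child is a noncarrier), father integrated out."""
    prior = hardy_weinberg(f)
    pass_normal = {0: 1.0, 1: 0.5, 2: 0.0}  # chance a parent transmits a normal allele
    father_term = sum(prior[g] * pass_normal[g] for g in prior)
    weights = {g: prior[g] * pass_normal[g] * father_term for g in prior}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


p_mother_noncarrier = mother_given_noncarrier_child(f)[0]  # equals 1 - f
p_mother_carrier = 1.0 - p_mother_noncarrier               # equals f

# Phenotype probabilities quoted in the illustration.
lik_h0_noncarrier = 0.996
lik_h1_noncarrier, lik_h1_carrier = 0.00053, 0.020

# Family history likelihood when the counselee is a noncarrier (about 0.00078).
lik_family_noncarrier = lik_h0_noncarrier * (
    lik_h1_noncarrier * p_mother_noncarrier + lik_h1_carrier * p_mother_carrier
)


def posterior_noncarrier(prior_noncarrier, lik_noncarrier, lik_carrier):
    """Bayes' rule with heterozygous and homozygous carriers pooled."""
    num = prior_noncarrier * lik_noncarrier
    return num / (num + (1.0 - prior_noncarrier) * lik_carrier)


print(p_mother_noncarrier, lik_family_noncarrier)
```

Supplying the carrier-side family history likelihood to `posterior_noncarrier`, together with the prior (1 − f)², gives the posterior noncarrier probability.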

3 Software

The BayesMendel R library defines an object-oriented environment for Mendelian risk prediction. It provides functionality to a) evaluate Expression (4) for arbitrary syndromes, b) evaluate carrier probabilities for the breast-ovarian and HNPCC syndromes according to the BRCAPRO and CRCAPRO models, and c) process and check pedigree data.

The most challenging computational aspect of the approach of Section 2 is the evaluation of Expression (4). The number of terms in the summation is $3^{GR}$, where G is the number of genes and R is the number of untested relatives of the counselee plus the number of negative relatives if the genetic test has sensitivity less than one. In BayesMendel, this calculation is performed in C by a subroutine called MARGENE, which serves as the computational engine for all risk prediction models, including BRCAPRO and CRCAPRO. Details of the algorithm are given in Parmigiani, Berry and Aguilar (1998). In BayesMendel, the R function aveG serves as an interface to MARGENE. It takes as input the terms p(hr | γr) and p(γ0) in Expression (4). The current implementation of Expression (4) in the software considers two diseases and two genes, referred to as Disease1, Disease2 and Gene1, Gene2 in the text that follows. The two genes are assumed to be in Hardy-Weinberg and linkage equilibrium in the population. The pedigree may extend to first- and second-degree relatives of the counselee.
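To give a sense of why naive evaluation of this summation is impractical, the short Python sketch below counts the terms for a few pedigree sizes. The function name is hypothetical and the pedigree sizes are arbitrary examples.

```python
def n_summation_terms(n_genes, n_relatives):
    """Number of genotype configurations: 3 to the power (genes x relatives)."""
    return 3 ** (n_genes * n_relatives)


# Two genes, as in the current implementation, and a growing number of relatives.
for n_relatives in (2, 5, 10):
    print(n_relatives, n_summation_terms(2, n_relatives))
```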

There are three major object classes in BayesMendel: pedigree objects, penetrance objects, and prediction objects.

Pedigree Objects

A pedigree object includes the pedigree structure and phenotype information for the counselee’s family in matrix form. The definition of the variables is the same as in the original BRCAPRO and in CancerGene (Euhus 2001). The object is a matrix with one row for each family member and 12 or more columns with information about that member, as shown in Table 1.

Table 1
Column Codes for Pedigree Objects

We explain how to prepare the pedigree object using the example family shown in Figure 1, which is suspected to have the HNPCC syndrome. Alternatively, the CancerGene package offers a user-friendly interface for forming these input files.

Figure 1
An example suspected HNPCC syndrome pedigree. The arrow indicates the individual to be counseled (counselee). The counselee can be either a man or a woman. A man is indicated by a square and a woman by a circle. For each relative, cancer of the colorectum ...

Computing carrier probabilities for a different counselee within the same family requires creating a different input file. The input file corresponding to the family of Figure 1 is shown in Figure 2.

Figure 2
Pedigree information matrix corresponding to the family of Figure 1.

Let’s now consider family member 1 in detail. Family member 1 is the counselee. She was diagnosed with colorectal cancer at age 47. She is alive and 57 years old. She has not undergone genetic testing. Her tumor was tested for microsatellite instability (MSI) and the result was microsatellite instable. We fill in her row as follows:

  • we enter 1 in the member identifier column;
  • we enter 1 in the relation column, using Table 2;
  • we enter 0 in the sex column;
  • we enter 3 in the father’s identifier number column, which constrains us to input the father’s information in the third row;
  • we enter 2 in the mother’s identifier number column, which constrains us to input the mother’s information in the second row;
  • we enter 1 in the colorectal cancer status column;
  • we enter 0 in the endometrial cancer status column;
  • we enter 47 in the age column for colorectal cancer;
  • we enter 57, the current age, in the age column for endometrial cancer;
  • we enter 0 in the age column that is not applicable to the HNPCC syndrome;
  • we enter 0 in the MLH1 test result column;
  • we enter 0 in the MSH2 test result column;
  • in the last column we enter 1 for MSI.

Note that all of the counselee’s affected first-degree relatives had their tumor samples tested for microsatellite instability. The father’s tumor tested microsatellite stable, while those of the sister and mother tested microsatellite instable. Thus we enter a 2 in the last column of the 3rd row and a 1 in the last column of the 2nd and 7th rows.
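The counselee's row described above can be written out as a small data structure. The Python sketch below is only an illustration of the encoding: the column names are hypothetical labels introduced here (the official names are those of Table 1), and the values are the ones listed in the text.

```python
# Hypothetical column labels, in the order the entries are described in the text.
COLUMNS = [
    "id", "relation", "sex", "father_id", "mother_id",
    "colorectal_status", "endometrial_status",
    "colorectal_age", "endometrial_age", "unused_age",
    "mlh1_test", "msh2_test", "msi",
]

# Family member 1, the counselee: colorectal cancer at 47, currently 57,
# no germline testing, tumor microsatellite instable.
counselee_row = [1, 1, 0, 3, 2, 1, 0, 47, 57, 0, 0, 0, 1]

counselee = dict(zip(COLUMNS, counselee_row))

# The parent identifiers point at the rows that must hold the parents' records.
print("father in row", counselee["father_id"], "mother in row", counselee["mother_id"])
```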

Table 2
Relation codes

There are some additional rules and restrictions that one should be aware of when preparing a pedigree information matrix:


The only restriction on the order of the family members is that the counselee’s husband, if applicable, must be entered immediately after the counselee. The same applies to the husbands of the brothers and sisters. If you are entering pedigrees by hand, we suggest that you begin by creating the first three columns for all the individuals, and then create the father’s and mother’s identifier columns.

Missing Information

In general, if information about a family member, other than the counselee, is missing entirely, the member can be omitted without affecting the calculations. In our example, we have no information about family member 6, the counselee’s brother’s wife. There are two ways to represent this missing information: first, we could omit the row in the family history matrix that corresponds to member 6; alternatively, we could treat that member’s breast cancer status as being left censored at age 1, and thus enter a 0 in the cancer status column and a 1 in the age at onset column. Both representations result in a likelihood contribution p(hr | γr) of 1 for all possible genotypes γr of that member in the full likelihood shown in Expression (4). When the breast cancer column is not 0, the age at onset of breast cancer column must be specified. The same applies to ovarian cancer.

Unaffected relatives are very important in the calculation: they provide information as long as their current age, age at death, or age at last contact is known. Effort should be made to incorporate such information. For example, if an aunt is known to be breast cancer free until age 40, the time when she was last in touch, but nothing is known about her ovarian cancer status, then a 0 should be entered in her breast cancer status column and a 40 in her breast cancer age column, while a 0 should be entered in her ovarian cancer status column and a 1 in her ovarian cancer age column.


There is one exception to the rule above. If there is information about a niece of the counselee, it is necessary to include a record (that is, a row in the matrix) for the counselee’s sibling who is a parent of the niece in question. This will come naturally in most cases, but it must be done even when there is no information about that sibling of the counselee; otherwise the family structure may not be unique.
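The effect of the missing-information encoding can be illustrated with a toy calculation. The Python sketch below assumes a hypothetical net penetrance that is zero through age 1, and assumes conditional independence between cancer sites given genotype; the function names and rates are illustrative only. Under these assumptions, a record coded as status 0 at age 1 contributes exactly 1 for every genotype, while the aunt who is breast cancer free at 40 contributes 1 − F(40) for the breast site only.

```python
def toy_net_penetrance(annual_rate):
    """Hypothetical cumulative net penetrance: zero through age 1, then linear."""
    def F(t):
        return min(1.0, annual_rate * max(0, t - 1))
    return F


def site_contribution(status, age, F):
    """1 - F(age) if unaffected (status 0), F(age + 1) - F(age) if affected."""
    if status == 0:
        return 1.0 - F(age)
    return F(age + 1) - F(age)


def member_likelihood(record, penetrance):
    """Product of site contributions for one family member and one genotype."""
    value = 1.0
    for site, (status, age) in record.items():
        value *= site_contribution(status, age, penetrance[site])
    return value


penetrance = {
    "carrier": {"breast": toy_net_penetrance(0.01), "ovarian": toy_net_penetrance(0.005)},
    "noncarrier": {"breast": toy_net_penetrance(0.001), "ovarian": toy_net_penetrance(0.0002)},
}

no_information = {"breast": (0, 1), "ovarian": (0, 1)}  # (status, age) per site
aunt = {"breast": (0, 40), "ovarian": (0, 1)}

for genotype in penetrance:
    print(genotype,
          member_likelihood(no_information, penetrance[genotype]),
          member_likelihood(aunt, penetrance[genotype]))
```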

Penetrance Objects

Penetrance objects include the net penetrance by age, gender, phenotype, and mutation status (wildtype at both loci, mutation on Gene1, mutation on Gene2, mutation on both).

The default penetrance objects used by the current version of brcapro are BRCApenet.nonAJ.2004 and BRCApenet.AJ.2004, containing the most up-to-date penetrance estimates based on Struewing et al. (1997), Antoniou et al. (2003) and King et al. (2003) and updated with the family data from the Cancer Genetics Network validation study (Parmigiani et al. 2004). The penetrances for the Ashkenazi Jewish (AJ) population and the non-AJ population are estimated separately. Details of the penetrance estimation are discussed in Chen, Iversen Jr., Friebel, Finkelstein, Weber, Eisen, Peterson et al. (2004).

The user can choose to use an older version of the penetrance objects: BRCApenet.nonAJ.2001 and BRCApenet.AJ.2001, which combined estimates from Ford et al. (1998) and Struewing et al. (1997) as described in Iversen et al. (2000).

In both versions of the penetrances, the incidence rates for wild-types are derived by subtracting the carrier incidence times the population carrier prevalence from the population incidence (National Cancer Institute: Surveillance, Epidemiology, and End Results (SEER) Program 1997). In the older version, the Ashkenazi Jewish population has the same carrier penetrance but a different phenocopy rate, due to its significantly higher carrier prevalence compared to the non-AJ population.

The default penetrance object used by crcapro is called HNPCCpenet.2004, for the risk for colorectal and endometrial cancer of MLH1 and MSH2 by age and gender. The penetrance is derived from the literature (Lin et al. 1998, Vasen et al. 1996, Vasen et al. 2001). For noncarriers, incidence and mortality rates of colorectal and endometrial cancer by age groups are from the 1973–1995 (National Cancer Institute: Surveillance, Epidemiology, and End Results (SEER) Program 1997) Cancer Statistics Review and overall mortality rates from the function in the survival package in R (Thernau 1996).

Prediction Objects

Prediction objects include the joint probability that the counselee carries an inherited deleterious mutation on the two genes in a 3 by 3 matrix. The three rows/columns signify the three genotypes at Gene1/Gene2: homozygous carrier, heterozygous carrier and wild-type. The prediction objects also include net and crude cumulative risk of developing disease in the future if the counselee is unaffected.
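The joint 3 by 3 matrix can be reduced to the summaries usually quoted in counseling. The Python sketch below assumes the row and column order stated above (homozygous carrier, heterozygous carrier, wild-type), with rows for Gene1 and columns for Gene2; the matrix entries and function name are hypothetical.

```python
# Rows: Gene1 genotype; columns: Gene2 genotype.
# Order in both directions: homozygous carrier, heterozygous carrier, wild-type.
joint = [
    [0.00, 0.00, 0.00],
    [0.00, 0.01, 0.14],
    [0.00, 0.07, 0.78],
]  # illustrative values only


def carrier_summaries(joint):
    """Marginal carrier probabilities for each gene and for either gene."""
    gene1 = sum(sum(row) for row in joint[:2])      # Gene1 not wild-type
    gene2 = sum(row[0] + row[1] for row in joint)   # Gene2 not wild-type
    either = 1.0 - joint[2][2]                      # not wild-type at both loci
    return {"gene1": gene1, "gene2": gene2, "either": either}


print(carrier_summaries(joint))
```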


List of functions

  • ReadCaGeneFam: reads an external pedigree file in CancerGene format and converts it into a BayesMendel pedigree object;
  • CheckFamStructure: checks the pedigree object for errors and corrects them when possible; prints warnings and error messages if appropriate;
  • aveG: the R interface to the C routine MARGENE for evaluating the joint posterior probability of an individual (the counselee) carrying a phenotype-altering variant (mutation) of two autosomal dominant genes. Inputs are the conditional probability of phenotype(s) for each of the individual’s first and second degree relatives, the variants’ allelic frequencies, and the pedigree structure;
  • brcapro: calculates the joint probability that an individual (the counselee) carries an inherited deleterious mutation of the BRCA1 and BRCA2 breast cancer susceptibility genes. Inputs are the corresponding penetrance object, the pedigree object of the family history of breast and ovarian cancers, and genetic testing results on BRCA1 and/or BRCA2. It calls aveG for its core computation;
  • crcapro: similar to brcapro except the genes involved are MSH2 and MLH1, and the diseases involved are cancers of the colorectum and endometrium. Additionally, the probabilities are modified by MSI test results if provided, through individual contributions to the likelihood. Adjustments are made using literature-based sensitivity and specificity (Chen, Watson and Parmigiani 2004);
  • disease.risk: takes a pedigree object, uses one of the carrier probability models (e.g., brcapro) to determine the joint carrier probabilities and the cumulative risks of developing the phenotype for the counselee (the prediction is only meaningful if the counselee is asymptomatic), and stores the results in a prediction object. The cumulative risks of developing the phenotype are calculated by taking the average of genotype-specific penetrances weighted by the marginal probability of each genotype, as shown in Expression (5);
  • summary.disease.risk: takes a prediction object and prints to the R terminal or an external file the summary of the probability of carrying mutations in susceptibility genes and the probability of developing the corresponding phenotypes.

Software is distributed from

4 Discussion

BayesMendel is designed to provide the genetic counseling and genetic epidemiology communities with a flexible tool for genetic susceptibility prediction of hereditary syndromes. It currently includes the BRCAPRO model for the breast-ovarian cancer syndrome and the CRCAPRO model for the hereditary non-polyposis colorectal cancer syndrome, and allows for easy modification of input genetic parameters of those models. BayesMendel can also serve as a generic tool for genetic epidemiologists to flexibly implement their own Mendelian models for novel syndromes, without reprogramming complex statistical analyses and prediction tools.

BayesMendel provides the backbone of the upcoming new release of the genetic counseling package CancerGene (Euhus 2001), a user-friendly environment for cancer risk prediction in breast and colorectal cancer that is currently licensed, free of charge, to over one thousand users. In addition to Mendelian model functionality, it includes a wide range of algorithms for risk prediction, and graphical interfaces for pedigree entry and updating.

Current limitations of the BayesMendel package reside primarily in the structure of the core computing engine MARGENE, which is at the moment confined to two unlinked autosomal dominant genes, to families without half-siblings or loops, and to pedigrees including first and second degree relatives. This is adequate in a wide spectrum of genetic counseling situations, in which the pedigree information is ascertained from a single counselee and information on third degree relatives is not generally highly reliable. However, in controlled research studies that collect large pedigrees, important information can be lost. The theory behind the MARGENE calculation is conceptually extendible to arbitrary pedigrees.

Several studies have evaluated the errors associated with BayesMendel (Gilpin et al. 2000, Iversen et al. 1998, Berry et al. 2002). For example, Berry et al. (2002) compared the genetic test results for deleterious mutations of BRCA1 and BRCA2 to BayesMendel predictions for 301 individuals recruited from several breast cancer clinics and by self-referral. In this group, BRCAPRO predicted an average probability of 0.29 for the 150 probands with the smallest predicted probabilities, and 0.952 for the 151 with the largest probabilities. The actual proportions of test positives in the two groups were 0.327 and 0.788. The authors found BRCAPRO to be adequately calibrated and to have better discrimination than its empirical counterparts. However, its ability to discriminate between a BRCA1 mutation and a BRCA2 mutation remains limited outside families with male breast cancer.

The release of BayesMendel described in this article is 1.2-1. The BayesMendel laboratory at Johns Hopkins University plans a series of upgrades over the next several years. These upgrades will also be made available at the lab’s website. Penetrance and prevalence parameters for the major syndromes covered will be periodically updated to reflect major new publications. The functionality of the software will be expanded to generate prediction intervals based on a probabilistic sensitivity analysis approach, and to provide exceedance probabilities, that is, the probability that a counselee’s chance of carrying a gene exceeds a given threshold. Future versions of the software will also migrate towards a more object-oriented structure by incorporating, in the pedigree object, information about the type of syndrome, the loci and cancers potentially involved, and the corresponding penetrance, thus freeing users from specifying the penetrance object to use and the prediction model to call.


*The work of Sining Chen, Wenyi Wang and Giovanni Parmigiani was supported by the NCI under grants P30CA06973 (Hopkins Regional Oncology Research Center) and P50CA62924 (Hopkins GI SPORE). The authors thank David Euhus of the University of Texas, Southwestern and Fabio Marroni of the University of Pisa for useful feedback.

Publisher's Disclaimer: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepress, which has been given certain exclusive rights by the author. Statistical Applications in Genetics and Molecular Biology is produced by The Berkeley Electronic Press (bepress).


  • Antoniou AC, Gayther SA, Stratton JF, Ponder BA, Easton DF. Risk models for familial ovarian and breast cancer. Genet. Epidemiol. 2000;18(2):173–190. [PubMed]
  • Antoniou A, Pharoah PDP, Narod S, Risch HA, Eyfjord JE, Hopper JL, Loman N, Olsson H, Johannsson O, Borg Å, Pasini B, Radice P, Manoukian S, Eccles DM, Tang N, Olah E, Anton-Culver H, Warner E, Lubinski J, Gronwald J, Gorski B, Tulinius H, Thorlacius S, Eerola H, Nevanlinna H, Syrjäkoski K, Kallioniemi O-P, Thompson D, Evans C, Peto J, Lalloo F, Evans DG, Easton DF. Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case series unselected for family history: A combined analysis of 22 studies. Am. J. Hum. Genet. 2003;72:1117–1130. [PubMed]
  • Berry DA, Iversen ESJ, Gudbjartsson DF, Hiller E, Garber J, Peshkin B, Lerman C, Watson P, Lynch H, Hilsenbeck SRS, Hughes K, Parmigiani G. Validation of BRCAPRO, sensitivity of genetic testing of BRCA1 and BRCA2, and implications for the existence of other breast cancer susceptibility genes. J. Clin. Oncol. 2002;20:2701–2712. [PubMed]
  • Berry DA, Parmigiani G, Sanchez J, Schildkraut J, Winer E. Probability of carrying a mutation of breast-ovarian cancer gene BRCA1 based on family history. J Natl Cancer Inst. 1997;89:227–238. [PubMed]
  • Chen S, Iversen ES, Jr, Friebel T, Finkelstein D, Weber B, Eisen A, Peterson LE, et al. Comprehensive evaluation of breast and ovarian cancer risks associated with BRCA1 and BRCA2 mutations. manuscript
  • Chen S, Watson P, Parmigiani G. Accuracy of MSI testing in predicting germline mutations of MSH2 and MLH1: a case study in Bayesian meta-analysis of diagnostic tests without a gold standard. Technical report. Johns Hopkins University, Department of Biostatistics; 2004. [PMC free article] [PubMed]
  • Elston RC, Stewart J. A general model for the genetic analysis of pedigree data. Hum. Hered. 1971;21:523–542. [PubMed]
  • Euhus DM. Understanding mathematical models for breast cancer risk assessment and counseling. Breast J. 2001;7(4):224–232. [PubMed]
  • Ford D, Easton DF, Stratton M, Narod S, Goldgar D, Devilee P, Bishop DT, et al. Genetic heterogeneity and penetrance analysis of the BRCA1 and BRCA2 genes in breast cancer families. Am. J. Hum. Genet. 1998;62:676–689. [PubMed]
  • Foulkes WD, Hodgson SV, editors. Inherited Susceptibility to Cancer: Clinical, Predictive and Ethical Perspectives. Cambridge, UK: Cambridge University Press; 1998.
  • Gilpin CA, Carson N, Hunter AG. A preliminary validation of a family history assessment form to select women at risk for breast or ovarian cancer for referral to a genetics center. Clin Genet. 2000;58(4):299–308. [PubMed]
  • Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5:299–314.
  • Iversen ES, Jr, Parmigiani G, Berry D. Validating Bayesian prediction models: a case study in genetic susceptibility to breast cancer. Case Studies In Bayesian Statistics. 1998;Vol. IV:321–338.
  • Iversen ES, Jr, Parmigiani G, Berry DA, Schildkraut J. Genetic susceptibility and survival: Application to breast cancer. Journal of the American Statistical Association. 2000;95:28–42.
  • Jimenez-Sanchez G, Childs B, Valle D. Human disease genes. Nature. 2001;409:853–855. [PubMed]
  • Katki H, Chen S, Parmigiani G. Censoring and competing risks in Mendelian mutation prediction models. manuscript. 2004
  • King MC, Marks JH, Mandell JB. New York Breast Cancer Study Group. Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science. 2003;302(5645):643–646. [PubMed]
  • Lin KM, Shashidharan M, Thorson AG, Ternent CA, Blatchford GF, Christensen MA, et al. Cumulative incidence of colorectal and extracolonic cancers in MLH1 and MSH2 mutation carriers of hereditary colorectal cancer. J Gastrointest Surg. 1998;2:67–71. [PubMed]
  • Lynch HT, de la Chapelle A. Genetic susceptibility to non-polyposis colorectal cancer. J Med Genet. 1999;36(11):801–818. [PMC free article] [PubMed]
  • Marroni F, Aretini P, D’Andrea E, Caligo MA, Cortesi L, Viel A, Ricevuto E, Montagna M, Cipollini G, Ferrari S, Santarosa M, Bisegna R, Bailey-Wilson JE, Bevilacqua G, Parmigiani G, Presciuttini S. Evaluation of widely used BRCA1/2-mutation-predicting models. J Med Genet. 2004;41(4):278–285. [PMC free article] [PubMed]
  • Murphy EA, Mutalik GS. The application of Bayesian methods in genetic counseling. Hum. Hered. 1969;19:126–151.
  • National Cancer Institute: Surveillance, Epidemiology, and End Results (SEER) Program. SEER homepage. 1997.
  • Newman B, Austin MA, Lee M, et al. Inheritance of human breast cancer: evidence for autosomal dominant transmission in high-risk families. Proc Natl Acad Sci USA. 1988;85:3044–3048. [PubMed]
  • Offit K, Brown K. Quantitating familial cancer risk: a resource for clinical oncologists. J Clin Oncol. 1994;12:1724–1736. [PubMed]
  • Parmigiani G, Berry DA, Aguilar O. Determining carrier probabilities for breast cancer susceptibility genes BRCA1 and BRCA2. American Journal of Human Genetics. 1998;62:145–158. [PubMed]
  • Parmigiani G, Berry D, Iversen ES, Jr, Müller P, Schildkraut J, Winer E. Modeling risk of breast cancer and decisions about genetic testing. In: Gatsonis C, et al., editors. Case Studies In Bayesian Statistics. Vol. IV. Springer; 1998. pp. 173–268.
  • Parmigiani G, Friebel T, Iversen ES, Chen S, Finkelstein D, Anton-Culver H, Ziogas A, et al. Validity of models for prediction of BRCA1 and BRCA2 mutations: the cancer genetics network experience. manuscript. 2004
  • Struewing JP, Hartge P, Wacholder S, Baker SM, Berlin M, McAdams M, Timmerman MM, Brody LC, Tucker MA. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. New England Journal of Medicine. 1997;336:1401. [PubMed]
  • Szolovits P, Pauker S. In: MEDINFO-92 Proceedings of the Seventh Conference on Medical Informatics. Lun KC, Degoulet P, Piemme TE, Rienhoff O, editors. New York: Elsevier; 1992. pp. 679–683.
  • Therneau TM. A package for survival analysis in S. Rochester, MN: Mayo Foundation; 1996.
  • Tsiatis A. Encyclopedia of Biostatistics. New York: John Wiley and Sons; 1998. pp. 824–834.
  • Vasen HFA, Wijnen JT, Menko FH, Kleibeuker J, Taal B, Griffioen G, Nagengast F, Meijers-Heijboer E, Bertario L, Varesco L, Bisgaard M-L, Mohr J, Fodde R, Khan P. Cancer risk in families with hereditary nonpolyposis colorectal cancer diagnosed by mutation analysis. Gastroenterology. 1996;110:1020–1027. [PubMed]
  • Vasen HF, Stormorken A, Menko FH, Nagengast F, Kleibeuker JH, Griffioen G, Taal BG, Moller P, Wijnen JT. MSH2 mutation carriers are at higher risk of cancer than MLH1 mutation carriers: a study of hereditary nonpolyposis colorectal cancer families. J Clin Oncol. 2001;19(20):4074–4080. [PubMed]
  • Vogelstein B, Kinzler K. The genetic basis of human cancer. New York: McGraw-Hill; 1998.
  • Weber BL. Update on breast cancer susceptibility genes. ASCO Educational Book. 1998
  • Weiss KM. Genetic Variation and Human Disease. Cambridge: Cambridge University Press; 1993.