Mutations in known breast cancer susceptibility genes account for a minority of the familial aggregation of the disease. To search for further breast cancer susceptibility genes, we performed a combined analysis of four genome-wide linkage screens, which included a total of 149 multiple case breast cancer families. All families included at least three cases of breast cancer diagnosed below age 60 years, at least one of whom had been tested and found not to carry a BRCA1 or BRCA2 mutation. Evidence for linkage was assessed using parametric linkage analysis, assuming both a dominant and a recessive mode of inheritance, and using nonparametric methods. The highest LOD score obtained in any analysis of the combined data was 1.80 under the dominant model, in a region on chromosome 4 close to marker D4S392. Three further LOD scores over 1 were identified in the parametric analyses and two in the nonparametric analyses. A maximum LOD score of 2.40 was found on chromosome arm 2p in families with four or more cases of breast cancer diagnosed below age 50 years. The number of linkage peaks did not differ from the number expected by chance. These results suggest regions that may harbor novel breast cancer susceptibility genes. They also indicate that no single gene is likely to account for a large fraction of the familial aggregation of breast cancer that is not due to mutations in BRCA1 or BRCA2.
Breast cancer aggregates in families, with the disease being approximately twice as common in the first-degree relatives of cases as in the general population (Collaborative Group on Hormonal Factors in Breast Cancer, 2001). The higher risk to monozygotic twins of breast cancer cases than to dizygotic twins of cases suggests that most of this familial clustering is likely to have a genetic basis (Peto and Mack, 2000). However, although several important breast cancer susceptibility genes have now been identified, most of the familial aggregation of breast cancer remains unexplained.
In the 1990s, two important breast cancer susceptibility genes, BRCA1 (MIM 113705) and BRCA2 (MIM 600185), were identified by linkage studies in multiple case families (Miki et al., 1994; Wooster et al., 1995). Germline mutations in these genes confer high lifetime risks of breast cancer and ovarian cancer, together with smaller risks of some other cancer types (Antoniou et al., 2003; Thompson and Easton, 2004). Mutations in these genes are common in families with multiple cases of breast or ovarian cancer, and are present in most families with six or more cases (Ford et al., 1998). Population-based studies have estimated that BRCA1 and BRCA2 mutations account for ~15% of the excess familial risk of breast cancer (Peto et al., 1999; Anglian Breast Study, 2000; Dite et al., 2003). Mutations in two other genes, TP53 and PTEN, also confer high risks of breast cancer, but only in the context of rare syndromes. Mutations in the ATM and CHEK2 genes confer more moderate (approximately twofold) risks of breast cancer (CHEK2 Case-Control Consortium, 2004; Thompson et al., 2005), although some mutations in ATM may confer higher risks. In total, the known susceptibility genes have been estimated to account for no more than 25% of the familial aggregation of breast cancer (Easton, 1999), suggesting strongly that other susceptibility genes remain to be identified.
BRCA1 or BRCA2 mutations are found in the majority of families with six or more cases of breast cancer consistent with dominant inheritance (Ford et al., 1998). This suggests strongly that further susceptibility genes are likely to confer smaller risks than BRCA1 and BRCA2 mutations, but the number and characteristics of such genes remain unknown. One model, suggested by a recent segregation analysis (Antoniou et al., 2004), proposes that there are a large number of such genes, each conferring only small risks of the disease. If true, such loci could not be identified through linkage studies. However, it is also possible that there are further loci conferring more substantial risks that could be detected by linkage. To evaluate this possibility, we have conducted a genome-wide linkage analysis in multiple case breast cancer families that are unlikely to be segregating BRCA1 or BRCA2 mutations.
As a basis for this linkage study, we sought to identify informative families with a low probability that they contained mutations in BRCA1 or BRCA2. Families were collected independently by four groups, principally through family cancer clinics or epidemiological studies of breast cancer. All families were of Caucasian ancestry. The recruitment of the families used in the study took place over the last 15 years, but all families were regularly updated with regard to their cancer status. All groups obtained appropriate Institutional Review Board approvals. Specific sources of recruitment were as follows:
Australia: Families were identified through the Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer (kConFab), which is a national multidisciplinary consortium for research on familial breast cancer (GJ Mann, unpublished). Several families were initially ascertained through the Australian Breast Cancer Family Registry (ABCFS); these kindreds were recruited as part of a population-based case-control-family study and all were recruited via a diagnosis of breast cancer in the proband under the age of 40 years (Hopper et al., 1999).
IARC: Families were ascertained by a collaborative group of investigators from the USA, Canada, Australia, and France.
Netherlands: The Dutch families were ascertained through the Clinical Genetic Centers in Leiden and Rotterdam, and through the Netherlands Foundation for the Detection of Hereditary Tumors (STOET).
United Kingdom: All but 17 of the families were ascertained through clinical genetics centers in the United Kingdom. Two families were initially ascertained in the Netherlands, six from centers in the USA, and nine from Heidelberg, Germany.
Initially, all families had to satisfy the following criteria: (1) at least three women diagnosed with breast cancer below age 60 years, all of whom were related such that they could share a single allele identically by descent, (2) no case of ovarian cancer or male breast cancer in a blood relative (since these phenotypes are strongly predictive of the presence of a BRCA1 or BRCA2 mutation), and (3) DNA samples available for genotyping from at least three women affected with breast cancer, or from children of affected women such that the genotypes of at least three affected women might be inferred (in the latter case, at least two children of an affected woman needed to be available). In addition, to minimize the probability that the family segregated a BRCA1 or BRCA2 mutation, DNA from at least one affected individual was screened for mutations across both genes, by a method that examined the entire coding sequence and splice junctions. Whenever possible, for families with five or more cases of breast cancer, a second affected individual was screened. Subsequently, we collected detailed information on the method of mutation screening for each family, as well as genotype data on at least three microsatellite markers flanking the BRCA1 and BRCA2 loci. Families with insufficient mutation screening (14 families) or linkage data (a further 6 families) were not included in further analyses. Finally, we estimated the residual probability that the index-affected individual carried a BRCA1 or BRCA2 mutation, based on the assumed mutation detection sensitivity, the family history, and linkage data at BRCA1 and BRCA2 (see statistical methods). Thirteen families, where this probability exceeded 15%, were excluded from all analyses presented here. Characteristics of the 149 families included in the analysis are summarized in Table 1.
To evaluate linkage to BRCA1 and BRCA2, the following markers were used in various combinations in the four family sets: D17S800, D17S855, D17S951, D17S1322, D17S250 (for BRCA1); D13S260, D13S171, D13S1700, D13S267 (for BRCA2). At least three markers were analyzed at each locus in each family.
The entire coding sequences of BRCA1 and BRCA2 in each family were screened for mutations using several methods at the different centers. These included conformation sensitive gel electrophoresis, single strand conformational analysis, protein truncation test, DNA sequencing, and denaturing gradient gel electrophoresis. All of the Netherlands and United Kingdom families and three of the IARC families were additionally screened for large deletions and insertions using deletion junction-PCR, multiplex ligation-dependent probe amplification (MLPA), or Southern analysis.
For the genome-wide linkage search, the Applied Biosystems Linkage Mapping Set MD10 was analyzed on ABI 3700 DNA sequencers, either on contract at the Australian Genome Research Facility (Australian families) or at the Wellcome Trust Sanger Institute (IARC, Netherlands and United Kingdom families). Genotypes were called automatically using Genotyper or Genemapper software and were then checked manually by at least one individual. Additional markers were used to investigate potential regions of interest in subgroups of the family set.
To compute the residual probability that the index case carried a BRCA1 or BRCA2 mutation, we first used the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) model (Antoniou et al., 2004) to calculate carrier probabilities based on the pedigree and the mutation testing that had been performed. This model allows for the effects of BRCA1 and BRCA2 and the combined effects of other genes in a polygenic component, and is implemented in MENDEL (Lange et al., 1988). For this purpose, the sensitivity of mutation screening was assumed to be 70% for BRCA1 and 80% for BRCA2 (mutation sensitivities estimated from linked families in the Breast Cancer Linkage Consortium dataset; D. Easton, unpublished data). For samples that had been fully screened for large-scale rearrangements by MLPA, the BRCA1 sensitivity was assumed to be 80%. The carrier probabilities were then adjusted to allow for linkage data at the BRCA1 and BRCA2 loci. Multipoint LOD scores were computed using Fastlink (Cottingham et al., 1993), based on at least three markers tightly linked to each locus. The residual BRCA1 carrier probability was then given by
and similarly for BRCA2, where p1 and p2 are the BRCA1 and BRCA2 probabilities generated by BOADICEA and LOD1 and LOD2 are the multipoint LOD scores at the BRCA1 and BRCA2 loci.
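The expression referred to above is not reproduced in this text. As a hedged sketch only, the function below assumes the usual Bayesian update, in which each prior carrier probability is weighted by the likelihood ratio 10^LOD at its locus and the non-carrier hypothesis has a likelihood ratio of 1. The function name and the example values are illustrative assumptions, not taken from the study.

```python
def residual_carrier_probabilities(p1, p2, lod1, lod2):
    """Return posterior (BRCA1, BRCA2) carrier probabilities.

    Assumed form (not confirmed by the source text):
        post1 = p1 * 10**lod1 / (p1 * 10**lod1 + p2 * 10**lod2 + 1 - p1 - p2)
    and symmetrically for post2.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0 and p1 + p2 <= 1.0):
        raise ValueError("p1 and p2 must be probabilities with p1 + p2 <= 1")
    w1 = p1 * 10.0 ** lod1      # prior x likelihood ratio at the BRCA1 locus
    w2 = p2 * 10.0 ** lod2      # prior x likelihood ratio at the BRCA2 locus
    w0 = 1.0 - p1 - p2          # non-carrier hypothesis, likelihood ratio 1
    total = w1 + w2 + w0
    return w1 / total, w2 / total


# Hypothetical family: 10% prior for each gene, LOD of -1 at BRCA1, 0 at BRCA2.
post1, post2 = residual_carrier_probabilities(0.10, 0.10, -1.0, 0.0)
```

Under this assumed form, a negative LOD score at a locus shrinks the corresponding carrier probability, while a LOD score of zero at both loci leaves the BOADICEA probabilities unchanged.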
To conduct a combined linkage analysis, we first constructed a single linkage map incorporating all markers typed at any center. This map was based on the sex-averaged linkage map generated by deCODE (Kong et al., 2002). For markers that were not present on the deCODE map, we interpolated their position between flanking markers, either using estimates from other linkage maps or based on their physical position in the human genome sequence relative to flanking markers. Allele frequencies for each marker were estimated by averaging over all typed individuals, separately for each center.
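The interpolation step for markers absent from the deCODE map can be illustrated with a minimal sketch. It assumes simple linear interpolation on physical position between the two flanking markers; the text does not specify the exact procedure, and the function name and coordinates below are hypothetical.

```python
def interpolate_genetic_position(phys_pos, left, right):
    """Estimate a marker's genetic position (cM) from its physical position.

    left and right are (physical_bp, genetic_cM) tuples for the flanking
    markers that are present on the reference map.
    """
    (bp_l, cm_l), (bp_r, cm_r) = left, right
    if not (bp_l < bp_r and bp_l <= phys_pos <= bp_r):
        raise ValueError("marker must lie between its flanking markers")
    # Fraction of the physical interval covered, applied to the genetic interval.
    fraction = (phys_pos - bp_l) / (bp_r - bp_l)
    return cm_l + fraction * (cm_r - cm_l)


# Hypothetical flanking markers at 10 Mb (20 cM) and 20 Mb (30 cM).
position_cm = interpolate_genetic_position(15_000_000, (10_000_000, 20.0), (20_000_000, 30.0))
```

Linear interpolation treats the recombination rate as constant across the interval, which is only an approximation.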
Evidence for linkage was assessed using both parametric and nonparametric (allele sharing approach) analyses. For the parametric analysis, we first assumed a model in which susceptibility to breast cancer is conferred by a dominant susceptibility allele with population frequency 0.003 that confers a cumulative breast cancer risk of 80% by age 80, compared with 8% in noncarriers. This model is based on that derived from the segregation analysis of Claus et al. (1991) and has been used in most previous breast cancer linkage analyses. As in previous analyses, risks were modeled in seven age-categories (<30, 30-39, 40-49, 50-59, 60-69, 70-79, 80+) and implemented by using 14 liability classes, with separate classes for affected and unaffected individuals (Easton et al., 1993). Since this model would have reduced power to detect a recessive susceptibility allele, we also analyzed the data under a recessive model. Under this model, the risks to carriers and noncarriers were identical to those under the dominant model, but the allele frequency was assumed to be 0.08. All analyses were carried out in the program Genehunter (Kruglyak et al., 1996), except for one large family (EUR60) that could not be run because the number of individuals exceeded the limits of the program, and where pruning the family would have lost a significant amount of information. For this family, analyses were run in Vitesse (O’Connell and Weeks, 1995) for autosomes and Fastlink (Cottingham et al., 1993) for the X chromosome. This family separates into two distantly related branches, and these were treated as two distinct families (EUR60a and EUR60b) in the analysis.
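As a rough illustration of the two allele frequencies quoted above, the sketch below computes the population frequency of the high-risk genotypes under each model. It assumes Hardy-Weinberg proportions, which the text does not state explicitly, and the function name is hypothetical. The source does not say why 0.08 was chosen; the comparison merely shows that the two models imply similar frequencies of at-risk genotypes.

```python
def susceptible_genotype_frequency(allele_freq, mode):
    """Frequency of the high-risk genotype(s), assuming Hardy-Weinberg proportions."""
    q = allele_freq
    if mode == "dominant":
        # Heterozygotes and homozygotes for the susceptibility allele.
        return 1.0 - (1.0 - q) ** 2
    if mode == "recessive":
        # Only homozygotes for the susceptibility allele.
        return q ** 2
    raise ValueError("mode must be 'dominant' or 'recessive'")


dominant_freq = susceptible_genotype_frequency(0.003, "dominant")    # about 0.0060
recessive_freq = susceptible_genotype_frequency(0.08, "recessive")   # about 0.0064
```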
For the Genehunter analyses, multipoint LOD scores were calculated for locations at 1 cM intervals along each chromosome, using all markers for that chromosome. For the Vitesse and Fastlink analyses, multipoint LOD scores based on every pair of adjacent markers and the disease locus were calculated. The LOD score for each family at each position was based on an average of the LOD scores from all analyses relevant to that position. The multipoint LOD scores for each family at each position were then used to generate heterogeneity LOD scores (HLODs) based on the standard admixture model under which a certain proportion of families (α) are assumed to be segregating a susceptibility allele at that locus (Ott, 1983).
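The admixture calculation can be sketched as follows. This is a minimal illustration of the standard admixture likelihood, in which the LOD at a given position for a proportion α of linked families is the sum over families of log10(α·10^LOD_i + 1 − α), and the HLOD is its maximum over α. The grid search, the function names, and the example LOD scores are assumptions for illustration, not the software used in the study.

```python
import math


def admixture_lod(family_lods, alpha):
    """LOD at one position if a proportion alpha of families is linked.

    Assumes every per-family LOD score is finite.
    """
    return sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha)) for lod in family_lods)


def hlod(family_lods, steps=1000):
    """Maximize the admixture LOD over alpha in [0, 1] by a simple grid search.

    Returns (HLOD, alpha at the maximum). The HLOD is never negative because
    alpha = 0 always gives a LOD of exactly zero.
    """
    best_value, best_alpha = 0.0, 0.0
    for i in range(steps + 1):
        alpha = i / steps
        value = admixture_lod(family_lods, alpha)
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha


# Hypothetical per-family LOD scores at a single map position.
score, alpha_hat = hlod([2.0, -1.0])
```

With one strongly positive and one negative family, the maximum falls at an intermediate α, which is the behavior that lets the HLOD detect linkage present in only a subset of families.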
The nonparametric (allele sharing) analyses were conducted using the program Genehunter-plus (Kong and Cox, 1997), using the “all” scoring function (Whittemore and Halpern, 1994). Analyses were conducted separately for each of the four centers and the results files combined. Nonparametric LOD scores were then generated using the program ASM, using the exponential scoring option and equal weighting of families.
Table 2 summarizes all linkage peaks with LOD scores greater than 1 in the combined dataset, for the whole family set and for analyses restricted to families with four or more breast cancer cases diagnosed below age 50 years. Tables 3-5 give the highest LOD scores for each chromosome by group, for each of the three analyses. Figure 1 gives the maximum scores for all chromosomal locations for the three analyses in the combined dataset.
In the parametric analysis under the dominant model, the highest HLOD was 1.80 on chromosome arm 4q, close to D4S392. Positive scores were obtained at this location in the Australian, IARC, and United Kingdom series, but not in the Dutch dataset. Two other HLODs over 1 were found, on 2p (1.20, close to marker D2S2211) and on chromosome 22 (1.15, between D22S278 and D22S283). The latter result is predominantly due to a single family, EUR60, which includes 18 breast cancer cases and is the most informative family in the dataset. One branch of this family (EUR60b) generates a LOD score of 2.62. Seven women with breast cancer in this family, all belonging to branch EUR60b, carry the CHEK2 1100delC variant (Meijers-Heijboer et al., 2002). When both branches of this family were removed, the maximum HLOD on chromosome 22 reduced to 0.06.
When analyses were restricted to families with at least four cases of breast cancer diagnosed below age 50, the maximum HLOD on 2p rose to 2.38. HLODs over 1 in this subset were also found on chromosomes 4 and 22 close to the peaks in the overall analysis, and a further peak on chromosome 10 (HLOD 1.12) was also identified.
In addition to the aforementioned loci, an HLOD of 2.06 was found on 20p (at 8 cM) in the United Kingdom family set. There was, however, no evidence of linkage in the families from the other groups. LOD scores greater than 1.5 in individual families are summarized in Table 6. Of the eight scores, three contribute to the linkage peaks on chromosomes 2 and 4 found in the overall dataset. In addition, two families showed linkage on chromosome 11. These peaks were, however, separated by over 40 cM, and there was no evidence of linkage to this region in the overall analysis.
In analyses under a recessive model, only one locus reached an HLOD greater than 1 (1.04 on 5q). In the nonparametric analysis, the highest peak was on chromosome 14 (LOD 1.56 at position 43). The only other LOD over 1 was on chromosome 2 (LOD 1.10, position 16), almost coincident with the peak in the analysis under the dominant model.
The analyses of 149 families reported here represent by far the largest genome-wide linkage screen for breast cancer susceptibility loci. The only other report since the identification of BRCA1 and BRCA2 was that by Huusko et al. (2004), who studied 14 BRCA1/2 negative breast cancer families from Finland. Other reports have examined specific loci on chromosome arms 6q, 8p, and 13q (Zuppan et al., 1991; Kerangueven et al., 1995; Seitz et al., 1997; Kainu et al., 2000; Rahman et al., 2000; Thompson et al., 2002).
The rationale for the genome-wide linkage searches is that there exist further breast cancer genes in which alleles confer high risks. The pattern of familial risks indicates that such alleles are likely to be dominant, and we therefore considered the parametric analysis assuming a dominant model to be the primary analysis. To provide some protection against model misspecification, we also conducted analyses under a recessive model and using an allele sharing approach. These approaches, however, identified no further strong linkage signals.
Under the dominant model, we found three regions with HLODs in excess of 1, but none with HLODs over 2. Of these linkage peaks, one on chromosome 22 is explained entirely by a single family (EUR60). This family is the most informative in the study, containing 18 breast cancer cases. Seven cases of breast cancer have been shown to carry the CHEK2 variant 1100delC (Meijers-Heijboer et al., 2002). Since CHEK2 is located on chromosome 22, one might hypothesize that the linkage signal is a reflection of the segregation of this variant. However, the breast cancer risk conferred by CHEK2 1100delC is only twofold, and this would not be expected to generate strong linkage evidence. Furthermore, the LOD score in the larger branch of EUR60 at CHEK2 itself is only 0.3. Thus, it remains unclear whether the linkage signal on chromosome 22 reflects the effect of CHEK2 1100delC together with chance segregation, or whether there is an additional susceptibility allele segregating in this family. If the latter is true, given the lack of any linkage evidence from other families, susceptibility alleles at this other locus must be rare.
The strongest linkage signal in our set was found on the long arm of chromosome 4. This score was also, in part, due to EUR60 (LOD score 1.91 in the larger branch), although some evidence of linkage remained when EUR60 was excluded. The third linkage peak was on 2p (HLOD 1.2). This evidence increased (HLOD 2.4) when analyses were restricted to families with at least four cases of breast cancer diagnosed below age 50 years.
Huusko et al. (2004) reported evidence for linkage to markers on 2q32 in 14 Finnish breast cancer families, with a maximum LOD score of 3.20 close to D2S2262. We found no evidence of linkage in this region (maximum HLOD under the dominant model 0.0, α = 0.0; nonparametric LOD = 0.05). Huusko et al. (2004) found one other LOD score over 1 under a dominant model, at D9S283 (1.12). Again, we found no evidence for linkage in this region. Similarly, Zuppan et al. (1991) found evidence of linkage to the estrogen receptor gene on 6q in two families. In our study, we found no evidence of linkage to this region (HLOD = 0 for both the dominant and recessive models).
Theoretical calculations indicate that, for a fully informative marker map, the expected number of regions with LOD scores of greater than 1 and 1.5 will be ~5 and 2, respectively (Lander and Kruglyak, 1995). These predictions are not strictly comparable to our analyses, since our marker sets are not fully informative. Nevertheless, they indicate that the number of linkage peaks is not clearly in excess of the number that might be expected by chance and, therefore, that the observed peaks may reflect the play of chance rather than true susceptibility loci.
Under the admixture model, the estimated proportions of families linked to the loci are 0.18, 0.18, and 0.06 for chromosomes 2, 4, and 22, respectively. Such estimates can be misleading, since they are highly dependent on the genetic model that is assumed, and the true model is unknown. However, they indicate that, even if one or more of these linkage peaks is ultimately shown to harbor a true susceptibility locus, its contribution to the familial aggregation of breast cancer is likely to be modest. Moreover, under the assumed parametric dominant model, 87% of the genome achieved an HLOD of -1 or lower if the proportion of linked families was assumed to be 0.3, and 66% of the genome achieved an HLOD < -2, indicating that such a locus was unlikely to have been missed elsewhere in the genome.
The failure to detect strong linkage signals might reflect extensive locus heterogeneity, whereby the disease is only linked to a particular locus in a small proportion of families. Under this scenario, greater power might be achievable by considering subsets of families from more homogeneous populations where genetic heterogeneity might be reduced. We were able to examine this to a limited extent by performing separate analyses of the families in each of the four study sets. Since the Australian families were largely of British and Irish origin, the Australian and United Kingdom sets might be considered comparable. The Dutch population exhibits distinct founder mutations for many diseases and this group is, to an extent, genetically distinct, while the IARC families originated from many sources and are genetically heterogeneous. In the event, no strong linkage signals were observed either in the Dutch set or in the combined United Kingdom/Australian set. In particular, the linkage peaks identified in family EUR60 were not supported by linkage evidence in other Dutch families. The linkage peak on chromosome 2 did, however, become somewhat stronger when the Dutch families were excluded.
The failure to detect strong evidence for linkage may also reflect disease heterogeneity. Recent studies have demonstrated that breast tumors can be categorized into groups on the basis of CGH profiles and expression patterns, and that these patterns differ between BRCA1, BRCA2, and non-BRCA1/2 familial breast cancer (Hedenfalk et al., 2001, 2003; Gronwald et al., 2005; Macguire et al., 2005). These observations raise the possibility that mutations in other breast cancer susceptibility genes are associated with distinct tumor profiles. If so, incorporating tumor characteristics into the analyses could identify linkage signals that are not evident using breast cancer as a whole as the disease end point.
The positive signals found in this study indicate the most promising locations for further high-risk susceptibility genes, and would be worth following up in further families. Our results also indicate, however, that many genes are likely to be involved in breast cancer predisposition, with no gene accounting for a large fraction of the familial aggregation, and that alternative strategies will probably be necessary to identify them.
The authors would like to thank all of the families for their participation in this study. For the Australian study, the authors would like to thank the kConFab research nurses and staff for data collection, Heather Thorne, Lynda Williams, and Dani Surace for DNA preparation, Jan Groves for establishment of the LCLs, Eveline Niedermayr and Sandra Picken for supplying data, and the staff of the Familial Cancer Clinics for their support of kConFab. B.A. Oostra helped out with the second phase genotyping of the Dutch study. The Breast Cancer Susceptibility Collaboration (United Kingdom) consists of the following contributors: A. Ardern-Jones, J. Berg, A. Brady, C. Brewer, G. Brice, B. Bullman, R. Cetnarsryj, C. Chapman, C. Chu, N. Coates, T. Cole, R. Davidson, A. Donaldson, H. Dorkins, F. Douglas, D. Eccles, R. Eeles, F. Elmslie, D.G. Evans, S. Goff, D. Goudie, J. Gray, L. Greenhalgh, H. Gregory, N. Haites, S.V. Hodgson, T. Homfray, R.S. Houlston, L. Izatt, L. Jeffers, V. Johnson-Roffey, F. Lalloo, M. Longmuir, J. Mackay, A. Magee, S. Mansour, Dr. Zosia Miedzybrodzka, J. Miller, P. Morrison, V. Murday, J. Paterson, M. Porteous, N. Rahman, K. Redman, M. Rogers, S. Rowe, A. Saggar, A. Schofield, L. Side, M. Steel. Anita Hall and Elizabeth Mackie supported the genotyping of the United Kingdom set. Families from the IARC group were contributed in part through the Breast Cancer Family Registry (BCFR), which is supported through collaborative agreements with The University of Melbourne, Cancer Care Ontario, the Huntsman Cancer Institute and Columbia University. DFE is a Principal Research Fellow of Cancer Research UK.
Supported by: Cancer Research UK, Canadian Institute of Health Research (INHERIT program), the Kathleen Cuningham Foundation, National Breast Cancer Foundation, National Health and Medical Research Council (NHMRC), Cancer Council of Victoria, Cancer Council of South Australia, Queensland Cancer Fund, Cancer Council of New South Wales, Cancer Foundation of Western Australia, Cancer Council of Tasmania, Victorian Health Promotion Foundation, the NSW Cancer Council, the Breast Cancer Research Foundation (BCRF), the Dutch Cancer Society, grant number: RUL1999-2021; the National Cancer Institute, grant number: RFA #CA-95-003.