US Latinas have a lower incidence of breast cancer compared to non-Latina White women. This difference is partially explained by differences in the prevalence of known risk factors. Genetic factors may also contribute to this difference in incidence. Latinas are an admixed population with most of their genetic ancestry from Europeans and Indigenous Americans. We used genetic markers to estimate the ancestry of Latina breast cancer cases and controls and assessed the association with genetic ancestry, adjusting for reproductive and other risk factors. We typed a set of 106 ancestry informative markers (AIMs) in 440 Latina women with breast cancer and 597 Latina controls from the San Francisco (SF) Bay area and estimated genetic ancestry using a maximum likelihood method. Odds ratios (OR) and 95% confidence intervals (CI) for ancestry modeled as a continuous variable were estimated using logistic regression with known risk factors included as covariates. Higher European ancestry was associated with increased breast cancer risk. The odds ratio for a 25% increase in European ancestry was 1.79 (95% CI: 1.28–2.79, p<0.001). When known risk factors and place of birth were adjusted for, the association with European ancestry was attenuated but remained statistically significant (OR=1.39; 95% CI: 1.06–2.11, p=0.013). Further work is needed to determine if the association is due to genetic differences between populations or possibly due to environmental factors not measured.
Breast cancer incidence varies across populations in the US. Data from the Surveillance, Epidemiology and End Results program show that the age-adjusted incidence (per 100,000) of breast cancer (from 1998 to 2002) is highest in White women (141.0), followed by African American (119.4), Asian American (96.6), and Latina (89.9) women, with the lowest incidence in Indigenous American women (54.8) 1. Variation in exposure to known risk factors may explain some of these differences in incidence 2–5 but not all 6–8. The residual difference among populations may be due to incomplete assessment of known risk factors or to risk factors not yet identified. It could also be partly due to differences between populations in the allele frequencies of predisposing genetic variants.
Women of mixed descent, like US Latinas, present both a challenge and a unique opportunity in genetic association studies 9–11. On one hand, studies in Latinos may be confounded due to the potentially underlying dissimilarity between cases and controls in terms of genetic ancestry 12, 13. On the other hand, populations of mixed ancestry provide an opportunity for examining the role of genetic and environmental factors in explaining observed differences in incidence between populations, and eventually for locating alleles that contribute to dissimilarities in disease risk. This can be achieved by means of admixture mapping, an approach that is based on the idea that if a marker increases the risk of disease and is found at a much higher frequency in one population, then that marker will also be found more commonly among cases and will be strongly associated with other ancestry specific markers across large stretches of the genome 14. Breast cancer among Latinas presents a particularly interesting case since the main ancestral components of the Latino population (European and Indigenous American) have the highest and lowest breast cancer incidence 1.
We have previously investigated the association between genetic ancestry and breast cancer risk factors among Latinas in the San Francisco Bay Area using 44 ancestry informative markers 7. Here we use DNA samples from our previous study (167 cases and 286 controls) and DNA samples for an additional 273 cases and 311 controls to test the association between breast cancer risk and genetic ancestry among Latinas. We used 106 ancestry informative markers (AIMs) to determine the genetic ancestry in all of the women and compared ancestry between cases and controls, adjusting for known breast cancer risk factors in an effort to identify a genetic ancestry component to breast cancer risk. We also investigated the use of genetic ancestry as a covariate in genetic association studies for breast cancer among Latinas.
Analyses were performed using DNA and data from two population-based studies conducted in the San Francisco Bay Area: a case-control study of breast cancer and a family registry for breast cancer.
The San Francisco Bay Area Breast Cancer Study, described elsewhere 8, 15, is a multiethnic population-based case-control study of breast cancer initiated in 1995, and with biospecimen collection added for cases diagnosed between April 1, 1997 and April 30, 2002 and matching controls. Depending on the study protocol, study participants were invited to provide a blood or buccal sample. Women aged 35–79 years residing in San Francisco, San Mateo, Alameda, Contra Costa, or Santa Clara counties and newly diagnosed with a first primary invasive breast cancer were identified through the Greater Bay Area Cancer Registry which ascertains all incident cancers as part of the Surveillance, Epidemiology and End Results (SEER) program and the California Cancer Registry. A brief telephone screening interview that assessed study eligibility and self-reported race/ethnicity (89% response among those contacted) identified 873 eligible Latina cases. Of these, 798 (91%) completed an in-person interview and 747 (86%) provided a biospecimen sample. Control women aged 35–79 years residing in the same 5 Bay Area counties were ascertained by random digit dialing (RDD). They were frequency-matched to cases by race/ethnicity and expected 5-year age group. The telephone screening interview, completed by 93% of women selected as controls, identified 1,126 eligible Latina controls without a personal history of breast cancer. Of these, 999 (89%) completed the in-person interview and 911 (81%) provided a biospecimen sample.
The present analysis includes only cases and controls who donated a blood sample. Sixty-three of the cases that participated in the current case-control study also participated in the Northern California site of the Breast Cancer Family Registry 16 and donated a blood sample as part of that study, which was obtained for this analysis.
The total number of blood samples available for the study was 503 cases and 679 controls. Individuals that did not provide information about country of birth (n=9) or that were born in Europe (n=6), Hawaii (n=2), Philippines (n=1) or in a country that was represented only by one individual (Brazil, Dominican Republic) were excluded from the present analysis (11 cases and 9 controls). The final number of samples genotyped was 492 cases and 670 controls.
All participants provided written informed consent and the research protocols were approved by the respective Institutional Review Boards at UCSF and the Northern California Cancer Center.
Data on age, demographic background (education in years, country of birth, age at migration if not US born, country of birth of parents and grandparents), and known or suspected breast cancer risk factors (age at menarche, parity, age at first full term pregnancy, breast feeding, use of oral contraceptives, use of hormone replacement therapy, daily alcohol intake, family history of breast cancer and benign breast disease) were collected by in-person interview using a structured questionnaire 7. Dietary intake during the reference year (defined as the year before diagnosis for cases and the year before selection into the study for controls) was assessed using a modified version of the Block Food Frequency Questionnaire. Standing height and weight were measured by the interviewers. Body mass index was calculated as measured weight (kg) divided by measured height (m) squared. For participants (13 cases and 21 controls) who declined the measurements, the body mass index was based on self-reported height and weight during the reference year.
Tumor grade, stage, histological type and hormone receptor status were obtained from the SEER Cancer Registry records. Estrogen and progesterone receptor status were dichotomized (positive, negative) based on categories reported in pathology records. Information on human epidermal growth factor receptor 2 (Her2) status was not routinely obtained by the cancer registry for cases diagnosed prior to 2002. Therefore, we did not include Her2 status in the present analysis.
A set of 106 SNPs that can separate Indigenous American, African and European ancestry was used to estimate the proportion of genetic ancestry in the sample of US Latinas. Simulation studies have demonstrated that ~100 ancestry informative markers with allele frequency differences similar to the ones we used are required to achieve a correlation coefficient of >0.9 with true ancestry 17; thus, we genotyped 112 markers with the goal of successfully typing >100 markers. The ancestry informative markers (AIMs) used in this study were bi-allelic SNPs selected from the Affymetrix 100K SNP chip (Affymetrix, Santa Clara, CA). AIM selection was based on calculations of allele frequency differences between Europeans, West Africans and Indigenous Americans. The SNPs chosen maximize information for more than one ancestral population pairing, with a large difference in allele frequency between ancestral populations (>0.5). The AIMs are widely spaced throughout the genome and have a well-balanced distribution across all 22 autosomal chromosomes. The average distance between markers is about 2.4 × 10^7 bp. The parental population samples that were genotyped on the Affymetrix 100K SNP chip included 42 Europeans (Coriell’s North American Caucasian panel), 37 West Africans (non-admixed Africans living in London, UK and South Carolina) and 30 Indigenous Americans (15 Mayans and 15 Nahuas). (More detailed information on the AIMs is available from the authors upon request).
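The allele frequency difference criterion described above can be illustrated with a minimal sketch. The population labels, the function names and the example frequencies are assumptions made for illustration; the authors' actual selection procedure is not given in the text beyond the >0.5 threshold.

```python
from itertools import combinations


def pairwise_deltas(freqs):
    """Absolute allele frequency difference for every pair of populations.

    freqs maps a (hypothetical) population label to the frequency of the
    same reference allele in that parental population.
    """
    return {
        (a, b): abs(freqs[a] - freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


def is_informative(freqs, min_delta=0.5, min_pairs=1):
    """True if at least min_pairs population pairings exceed min_delta."""
    deltas = pairwise_deltas(freqs)
    return sum(d > min_delta for d in deltas.values()) >= min_pairs


# Made-up frequencies for one candidate marker (not from the study).
candidate = {"EUR": 0.9, "AFR": 0.1, "IA": 0.2}
print(pairwise_deltas(candidate))
print(is_informative(candidate, min_pairs=2))
```

Setting `min_pairs=2` mirrors the stated preference for markers that are informative for more than one ancestral pairing, although the exact rule used in the study may have differed.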
Genotyping of the 106 AIMs was performed by Dr. Kenneth Beckman at the Children’s Hospital Oakland Research Institute. Quality control was performed on all DNA using a two-part procedure. Quantitative quality control (part 1) involved non-allelic quantitative real-time PCR using a single Taqman probe, in order to ensure amplifiability of DNA samples. Qualitative quality control (part 2) involved genotyping using a balanced polymorphism present in most human populations (rs3818), in order to ensure that cross-contamination of samples has not occurred. Genotyping was performed using iPLEX reagents and protocols for multiplex PCR, single base primer extension and generation of mass spectra, as per the manufacturer’s instructions (for complete details see iPLEX Application Note, Sequenom, San Diego). It involved four multiplexed assays containing 29, 29, 28, and 26 SNPs, respectively, for a total of 112 candidate AIMs. Of these 112 markers, 106 robustly generated call rates at 90% of samples or higher, with typical call rates in excess of 99% of samples. Only those 106 markers were used in the study. Multiplexed PCR was performed in 5-µl reactions on 384-well plates containing 5 ng of genomic DNA. Reactions contained 0.5 U HotStar Taq polymerase (QIAGEN), 100 nM primers, 1.25X HotStar Taq buffer, 1.625 mM MgCl2, and 500 µM dNTPs. Following enzyme activation at 94 °C for 15 min, DNA was amplified with 45 cycles of 94 °C × 20 sec, 56 °C × 30 sec, 72 °C × 1 min, followed by a 3-min extension at 72 °C. Unincorporated dNTPs were removed using shrimp alkaline phosphatase (0.3 U, Sequenom). Single-base extension was carried out by addition of single base primers at concentrations from 0.625 µM (low MW primers) to 1.25 µM (high MW primers) using iPLEX enzyme and buffers (Sequenom, San Diego) in 9-µl reactions. 
Reactions were desalted and single base primer products measured using the MassARRAY Compact system, and mass spectra analyzed using TYPER software (Sequenom, San Diego), in order to generate genotype calls and allele frequencies.
There was insufficient DNA available from 574 individuals in the study. Therefore DNA from these samples was amplified using a commercially available whole genome amplification (WGA) kit (Qiagen REPLI-g Midi Kit). From the original set of samples that went through amplification, 92 yielded low quality DNA and were excluded from the genotyping phase. A total of 1,070 samples (462 cases and 608 controls) were genotyped. Quality control measures were high for the WGA samples and the non-amplified ones. For WGA samples the average AIM success rate was 98.5%, compared to 99% for the non-amplified samples. The average sample call rate was 95.6% for the WGA samples and 97.4% for the non-amplified samples. Samples with call rate smaller than 75% were excluded from the analysis (22 cases and 11 controls).
Three of the AIMs deviated significantly from Hardy-Weinberg equilibrium (p<0.0005), all of them showing excess homozygosity, which is expected in the presence of population substructure 18.
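The text does not say which test was used to assess Hardy-Weinberg equilibrium. As a minimal sketch, assuming a one-degree-of-freedom chi-square goodness-of-fit test, the following computes the test statistic, its p value, and the heterozygote deficit F (positive F indicates excess homozygosity). The genotype counts are invented for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one bi-allelic SNP.

    Returns (chi2, p_value, f), where f = 1 - observed/expected
    heterozygosity; f > 0 means fewer heterozygotes than expected.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    f = 1.0 - n_ab / expected[1] if expected[1] > 0 else 0.0
    return chi2, p_value, f


# Invented counts showing a deficit of heterozygotes.
chi2, p_value, f = hwe_chi_square(40, 20, 40)
print(chi2, p_value, f)
```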
Genotype and phenotype information were available for a total of 1,037 individuals (440 cases and 597 controls).
Estimates of each individual’s genetic ancestry were derived using a maximum likelihood approach 19, 20. The maximum likelihood model infers ancestry of each individual as a function of the probability of the genotypes observed at each locus based on the ancestral allele frequencies (Java script available from the authors upon request). We used t tests (for continuous variables) and Fisher’s exact tests for two by two frequency tables (for categorical variables) to determine if there were significant differences in characteristics between cases and controls. Mean genetic ancestry was estimated as the average of the individual genetic ancestry estimates within a group.
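The authors' program is not reproduced here. The following is only a minimal sketch of the general idea, assuming that each genotype is a binomial draw of two alleles whose frequency is the ancestry-weighted average of the parental frequencies, and that the likelihood is maximized with an EM iteration. The function names, the toy frequencies and the choice of EM are all assumptions for illustration.

```python
import math


def log_likelihood(q, genotypes, freqs):
    """Log-likelihood of ancestry proportions q for one individual.

    genotypes: copies (0, 1, 2) of the reference allele per marker,
               or None if the genotype is missing.
    freqs:     per marker, the reference-allele frequency in each
               parental population (same order as q).
    """
    ll = 0.0
    for g, p in zip(genotypes, freqs):
        if g is None:
            continue
        f = sum(qk * pk for qk, pk in zip(q, p))
        f = min(max(f, 1e-12), 1.0 - 1e-12)
        ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


def estimate_ancestry(genotypes, freqs, n_iter=500, tol=1e-10):
    """Maximum likelihood ancestry proportions via EM (illustrative)."""
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * k
        for g, p in zip(genotypes, freqs):
            if g is None:
                continue
            f = sum(qk * pk for qk, pk in zip(q, p))
            f = min(max(f, 1e-12), 1.0 - 1e-12)
            for j in range(k):
                # Expected number of this individual's alleles at the
                # marker that derive from population j.
                new_q[j] += g * q[j] * p[j] / f
                new_q[j] += (2 - g) * q[j] * (1.0 - p[j]) / (1.0 - f)
        total = sum(new_q)
        if total == 0:
            raise ValueError("no non-missing genotypes")
        new_q = [v / total for v in new_q]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


# Toy parental frequencies for 6 markers and 3 populations (made up).
toy_freqs = [
    (0.9, 0.1, 0.2), (0.8, 0.2, 0.1), (0.1, 0.9, 0.3),
    (0.2, 0.8, 0.1), (0.1, 0.2, 0.9), (0.3, 0.1, 0.8),
]
toy_genotypes = [2, 2, 0, 0, 0, 0]
print(estimate_ancestry(toy_genotypes, toy_freqs))
```

Averaging the returned proportions over the individuals in a group gives the group mean ancestry described above.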
Associations between breast cancer risk and genetic ancestry were assessed using logistic regression models. Genetic ancestry was modeled as a continuous variable (with each unit change representing a 25% increase in European or African ancestry). The multivariate adjusted models included European ancestry, age (continuous), family history of breast cancer in first-degree relatives (yes, no), place of birth (US-born, foreign-born), personal history of benign breast disease (yes, no), age at menarche, number of full-term pregnancies, months of breast feeding per child, use of hormone replacement therapy (yes, no), daily alcohol intake (≤10 grams vs >10 grams) and daily caloric intake (log transformed) during the reference year, and education (elementary school, middle school, high school and college). Individuals with missing data were dropped from the multivariate analysis (32 cases and 25 controls). We evaluated models including both European and African ancestry (continuous) and using parent/grandparent European origin instead of genetic ancestry. The association with each AIM was evaluated with a logistic regression model with and without inclusion of genetic ancestry as a covariate in order to compare the distribution of z statistics before and after correction for population substructure.
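To make the per-25% scaling concrete, here is a minimal sketch of a logistic regression with ancestry as the only covariate, fitted by Newton-Raphson, with a Wald 95% confidence interval. It is a simplification: the models in the study included many additional covariates, the method used to compute the confidence intervals is not stated, and the data below are invented.

```python
import math


def fit_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson.

    Returns (a, b, se_b), with se_b taken from the inverse of the
    information matrix evaluated in the final iteration.
    """
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += xi * (yi - p)   # score for the slope
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a += da
        b += db
        if max(abs(da), abs(db)) < tol:
            break
    se_b = math.sqrt(i00 / det)
    return a, b, se_b


def odds_ratio_per_25pct(ancestry, is_case):
    """OR and Wald 95% CI for a 25% increase in ancestry proportion."""
    units = [q / 0.25 for q in ancestry]  # one unit = 25% ancestry
    _, b, se = fit_logistic(units, is_case)
    return math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)


# Invented data: ancestry proportions and case (1) / control (0) status.
ancestry = [0.25] * 40 + [0.50] * 50 + [0.75] * 70
status = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 30 + [1] * 40 + [0] * 30
print(odds_ratio_per_25pct(ancestry, status))
```

Dividing the ancestry proportion by 0.25 before fitting is what makes the exponentiated coefficient read directly as the odds ratio for a 25% increase.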
Characteristics of breast cancer cases and controls are presented in table 1. Cases had a mean age of 55 years at diagnosis, which was not significantly different from that of controls. In bivariate analyses, cases had significantly more full-term pregnancies than controls, were less likely to breast feed, and were more likely to report a personal history of benign breast disease, a family history of breast cancer, earlier menarche, higher alcohol intake and higher daily caloric intake. Cases also reported a significantly higher level of education and were more likely to have been born in the US. They had more European and less Indigenous American ancestry than controls. There were no significant differences between cases and controls in use of hormone replacement therapy or oral contraceptives, age at first full-term pregnancy and body mass index.
In unadjusted models, we found a strong association between genetic ancestry (continuous) and breast cancer risk. Higher European ancestry was associated with increased risk, with an OR of 1.79 (95%CI: 1.28–2.79, P<0.001) for every 25% increase in European ancestry. When known risk factors and place of birth were adjusted for (Table 2), the association with European ancestry was somewhat attenuated but remained statistically significant (OR=1.39; 95% CI: 1.06–2.11, P=0.013). When African ancestry was included in the adjusted model, the association with European ancestry became stronger (OR for European ancestry=1.54; 95% CI: 1.11–2.52, P=0.004 and OR for African ancestry=2.05; 95% CI: 1.00–7.56, P=0.055). In all models, the associations between breast cancer and alcohol consumption, parity, family history, age at menarche, and history of breast feeding were in the expected direction (Table 2). To ensure that there was no confounding due to differences in place of birth between cases and controls the same analysis was stratified by place of birth (USA, Mexico, South America and Central America) with all results showing the same trend as the global analysis (OR for the association with ancestry varied from 1.10 to 1.82) (data not shown). We observed a significant association between the number of European-born parents/grandparents and breast cancer risk, with higher number of European ancestors being associated with increased risk (OR=1.21; 95% CI: 1.02–1.44, P=0.025, adjusted model).
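Because ancestry enters the model as a continuous covariate, an odds ratio reported per 25% increase can be rescaled to another increment, assuming the log-odds remain linear in ancestry over that range. The symbols β (the coefficient per 25% unit) and Δ (the ancestry increment as a proportion) are introduced here for illustration only. For the adjusted estimate of 1.39, a 50% increase would correspond to:

```latex
\[
\mathrm{OR}(\Delta) = \exp\!\left(\beta \cdot \frac{\Delta}{0.25}\right)
                    = \mathrm{OR}_{25\%}^{\,\Delta/0.25},
\qquad
\mathrm{OR}(0.50) = 1.39^{2} \approx 1.93 .
\]
```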
We found no evidence that associations with genetic ancestry differed by tumor characteristics such as hormone receptor status, stage or grade (Table 3). However, there were interesting trends. For example, there was a trend towards higher Indigenous American ancestry for cases with mucinous adenocarcinoma, and a trend towards higher European ancestry for cases with mixed ductal/lobular histology compared to the estimated mean ancestry for cases. Cases diagnosed at a more advanced stage had a trend towards higher Indigenous American ancestry.
We examined the effect of adjustment for genetic ancestry on the association between risk of breast cancer and each of the 106 AIMs. Without adjustment for ancestry, 20 of 106 markers were nominally associated with breast cancer risk. After adjustment for ancestry, only 4 markers had p values smaller than 0.05 (rs1398829, p=0.005; rs10498919, p=0.018; rs7535375, p=0.018; rs1470524, p=0.018), and none of these remained significant after adjustment for multiple testing.
Adjustment for place of birth (US-born vs. foreign-born) and number of European-born ancestors was not as effective as genetic ancestry in eliminating the excess number of AIMs associated with risk of breast cancer. In models that included these factors but did not include genetic ancestry 13 of 106 markers were nominally associated with breast cancer.
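The marker counts reported above can be put in context with a short calculation. This is a sketch under stated assumptions: the 106 tests are treated as independent, and a Bonferroni correction is used as the multiple-testing adjustment, although the text does not say which correction the authors applied. The function names are illustrative.

```python
from math import comb

N_MARKERS = 106
ALPHA = 0.05


def expected_false_positives(n_tests, alpha):
    """Number of tests expected to reach p < alpha if no marker is associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), assuming independent tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# p values reported for the 4 markers after adjustment for ancestry.
adjusted_p = {
    "rs1398829": 0.005,
    "rs10498919": 0.018,
    "rs7535375": 0.018,
    "rs1470524": 0.018,
}

print(expected_false_positives(N_MARKERS, ALPHA))
threshold = bonferroni_threshold(N_MARKERS, ALPHA)
print(threshold, [rs for rs, p in adjusted_p.items() if p < threshold])

# How surprising are 20, 13 and 4 nominal hits among 106 null tests?
for hits in (20, 13, 4):
    print(hits, binomial_tail(hits, N_MARKERS, ALPHA))
```

Under these assumptions roughly five markers would be expected to reach p<0.05 by chance alone, so 4 nominal hits after ancestry adjustment are unremarkable, whereas 20 and 13 are well above chance expectation.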
We estimated individual ancestry with and without the 3 AIMs that were not in Hardy-Weinberg equilibrium. Estimates were very similar and the associations remained significant.
The incidence of breast cancer among Latinas is up to 40% lower than the incidence among US Caucasian women. Genetic factors may contribute to this difference. We have investigated the association between genetic ancestry and breast cancer risk among Latina women. In analyses not adjusted for known risk factors, such as reproductive and lifestyle factors, we found a strong association between European genetic ancestry and breast cancer risk. This association was somewhat attenuated after adjustment for known risk factors as expected 7, but it remained significant. When African ancestry was included in the model, the effect of European ancestry was enhanced possibly due to the concomitant decrease in Indigenous American ancestry.
The association between European genetic ancestry and breast cancer needs to be interpreted with caution. There may be unmeasured or unknown risk factors for breast cancer that underlie the association that we observed. The present and previous studies 6, 8 found that breast cancer risk is higher among US-born Latinas, which suggests the influence of important unmeasured confounders. For example, place of birth (US-born vs. foreign-born) is significantly associated with breast cancer risk in our multivariate model and is likely to be a marker of some other more proximate risk factor. Similarly, genetic ancestry may be associated with other, unmeasured, non-genetic factors that underlie breast cancer risk. Alternatively, our results suggest that there might be genetic variants with different frequencies in Indigenous American and European populations that influence risk for breast cancer. The only way to directly test this is to identify the genetic factors that underlie breast cancer susceptibility among Latinas. Such work is currently under way in a larger Latina population.
An important caveat in interpreting our results is that Indigenous American populations in the US are diverse and may have some systematic genetic (as well as obvious non-genetic) differences compared with Indigenous American populations in Mexico, Central and South America. Wang et al 23 recently explored the population genetics in Amerindian populations from North, Central and South America. They found substantial genetic differences among populations in the Americas compared to the differences among Asian or European populations. This may be due to repeated founder effects that occurred during the settlement of the Americas. Thus, even if the association we found is due to genetic factors, it may not be applicable to all indigenous populations in the Americas.
We found no evidence that associations with genetic ancestry differed by tumor characteristics such as hormone receptor status, stage or grade. However, since sample sizes for most of the tumor subtypes were small, further work will be needed to explore observed trends.
A related question that our study addresses is whether the variation in genetic ancestry among Latina women acts as a confounding factor in genetic association studies of breast cancer. Our results demonstrate that such studies may be confounded by genetic ancestry. Without adjustment for genetic ancestry, there was a dramatic deviation from the null hypothesis when testing the association between specific AIMs and breast cancer risk. However, there was no deviation after adjusting for ancestry differences, as expected based on theoretical results 24–30 and previous empirical studies 11, 12, 17, 29, 31–33. It is important to note that the AIMs we tested are among the markers that are most likely to be falsely associated with disease precisely because they are strongly correlated with genetic ancestry. However, the bias due to stratification may affect even less informative markers as the sample size increases 28.
We observed a strong association between the number of European-born parents and grandparents and breast cancer risk. This implies that the information provided by Latina women about place of birth of parents and grandparents could be an adequate approximation to genetic ancestry for risk assessment purposes. However, using the number of European parents and grandparents to adjust the association of individual markers with breast cancer risk left 13 of 106 markers significant at p<0.05, compared to 4 of 106 markers when genetic ancestry was adjusted for. Thus, use of genetic ancestry in recently admixed populations may provide information above that of grandparents’ origin. The 4 SNPs that had p values of less than 0.05 after adjustment for ancestry are likely to be false positives, since they did not achieve significance when the significance threshold was corrected for multiple testing.
In summary, European genetic ancestry in US Latinas residing in the San Francisco Bay area was associated with increased breast cancer risk after adjustment for known risk factors. Further work is needed to evaluate if the observed association is solely due to differences in non-genetic risk factors not included in the model or to genetic differences between populations.
Funding: Department of Defense Breast Cancer Research Program (BC030551 to EZ); National Cancer Institute (K22 CA10935 to EZ); National Cancer Institute (R01 CA120120 to EZ); UCSF Clinical and Translational Sciences Institute (Career Development Award to LF); Prevent Cancer Foundation (Postdoctoral Fellowship to LF); National Cancer Institute (R01 CA63446 and R01 CA77305 to EMJ); Department of Defense Breast Cancer Research Program (DAMD17-96-6071 to EMJ); California Breast Cancer Research Program (7PB-0068 to EMJ); National Institutes of Health (R01 HL078885); Tobacco-Related Disease Research Program New Investigator Award (15KT-0008 to SC); National Cancer Institute (Redes En Acción, U01-CA86117); National Cancer Institute (Cooperative agreement U01CA069417, RFA#CA-95-011 to The Northern California site of the Breast Cancer Family Registry (Breast CFR)). The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the Breast CFR, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government or the Breast CFR.
Notes: The authors would like to thank the study participants.