Background A previous Australian population-based breast cancer case-control study found indirect evidence that control participation, although high, was not random. We hypothesized that unaffected sisters may provide a more appropriate comparison group than unrelated population controls.
Methods Three population-based case-control-family studies of breast cancer in women of white European origin were carried out by the Australian, Ontario and Northern California sites of the Breast Cancer Family Registry. We compared risk factors between 3643 cases, 2444 of their unaffected sisters and 2877 population controls and conducted separate case-control analyses based on population and sister controls using unconditional multivariable logistic regression.
Results Compared with sister controls, population controls were more highly educated, had an earlier age at menarche, fewer births, their first birth at a later age and their last birth more recently. The established breast cancer associations detected using sister controls, but not detected using population controls, were decreasing risk with each of later age at menarche, more births, younger age at first birth and greater time since last birth.
Conclusions Since participation of population controls might be unintentionally related to some risk factors, we hypothesize that sister controls could provide more valid relative risk estimates and be recruited at lower cost. Given declining study participation by population controls, this contention is highly relevant to epidemiologic research.
Case-control studies have been central to the identification of risk factors for breast cancer and other diseases. The design has traditionally involved the recruitment of incident cases and a sample of unaffected controls from the same population. With few exceptions, at least with respect to breast cancer, controls have been unrelated to the cases. Population-based recruitment of cases and controls has long been considered best practice, provided participation is high.
As detailed in a previous report, between 1992 and 1995 we recruited population-based breast cancer cases diagnosed before the age of 40 years and population controls frequency matched for age, in Melbourne and Sydney, Australia.1 Participation was 73% for cases and 64% for controls. Cases were identified from the Victorian and New South Wales cancer registries, half within 9 months of their diagnosis. Controls were sampled from the government electoral rolls (adult registration for voting is compulsory in Australia).
Standard interpretation of multivariable logistic regression analyses would have suggested that higher educational attainment, being never married and being Australian-born were each independently associated with lower risk of breast cancer (i.e. these factors were more common for controls than for cases). This is contrary to the literature which has reported, at least historically, that higher educational attainment, never being married and being born in a Western country are all associated with increased breast cancer risk.2,3
A possible explanation for our earlier results is that breast cancer risk factors differ for early-onset disease. There is some evidence for different strengths, if not directions, of associations with a number of risk factors for early-onset compared with later-onset breast cancer; a difference in the association of breast cancer risk with body mass index (BMI) for pre- and post-menopausal women is well established4 and associations with family history characteristics are generally stronger for earlier-onset disease.5,6
It could also be that the population controls who participated in our earlier study were of higher educational attainment and more likely to be never married and Australian-born than those who did not participate. That is, these factors could be related to study participation rather than breast cancer risk. If that were the case, other groups of women, such as unaffected sisters of cases, might provide a more appropriate comparison group than unrelated population controls.
In the present analysis, we assessed this hypothesis by pooling data for women of white European origin from three population-based case-control-family studies carried out in Australia, Canada and the USA as part of the Breast Cancer Family Registry.7 These studies recruited incident breast cancer cases, population controls and unaffected sisters of cases. We first compared the risk factor profiles of population controls and sister controls. We then estimated relative risks for established breast cancer risk factors by comparing population-based cases with: (i) population controls and (ii) sister controls, both with and without adjustment for variables potentially related to study participation and other breast cancer risk factors.
Breast cancer cases, population controls and sister controls were women of white European origin participating in three population-based studies carried out in Melbourne and Sydney (Australia), Ontario (Canada) and the San Francisco Bay Area (USA) by the Australian, Ontario and Northern California sites, respectively, of the Breast Cancer Family Registry.8–11 All participants gave written informed consent and all studies received approval from their local ethics boards.
Cases were women with incident, invasive breast cancer identified through population-based cancer registries. In both Melbourne and Sydney, Australia, eligible cases were diagnosed with first primary breast cancer before the age of 60 years between 1996 and 1999. We selected all cases aged 18–39 years at diagnosis and 35 and 27% random samples of those aged 40–49 years and 50–59 years, respectively. Adding these breast cancer cases to those identified in our previous study1 resulted in a total of 2303 eligible cases. Physician permission to contact cases was received for 2043 (89%) and 1578 (77% of those approached) completed risk factor and family history questionnaires during an in-person interview. Of these, 1465 self-reported a white European origin and were included in the present analysis.
In the studies carried out in Ontario, Canada and the San Francisco Bay Area, USA, a two-stage sampling procedure was used to over-sample breast cancer cases likely to have been at increased familial risk of breast cancer. Newly diagnosed cases were identified from the local population-based cancer registries covering the Province of Ontario and the Greater San Francisco Bay Area and then screened for the indicators of increased genetic susceptibility described below. All cases with these indicators, and a random sample of cases without, were invited to participate.7
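The two-stage selection described above can be sketched as follows. This is a minimal illustration only: the record layout, the `has_indicator` field and the 25% fraction passed in the example are assumptions made for the sketch, not the registries' actual procedure.

```python
import random

def two_stage_sample(cases, sampling_fraction, seed=0):
    """Invite every case with a familial-risk indicator and a random
    fraction of the remaining cases.

    Returns (case, selection_probability) pairs so that the sampling
    fraction can be carried forward into the analysis.
    """
    rng = random.Random(seed)
    selected = []
    for case in cases:
        if case["has_indicator"]:
            selected.append((case, 1.0))
        elif rng.random() < sampling_fraction:
            selected.append((case, sampling_fraction))
    return selected

# Hypothetical screened cases: every fifth case carries an indicator.
screened = [{"id": i, "has_indicator": i % 5 == 0} for i in range(1000)]
invited = two_stage_sample(screened, sampling_fraction=0.25)

n_with = sum(1 for case, _ in invited if case["has_indicator"])
n_without = len(invited) - n_with
print(n_with, n_without)
```

Keeping the selection probability alongside each invited case is a design choice of the sketch; it makes the later correction for over-sampling straightforward.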
In Ontario, cases aged 18–69 years were eligible if they met any of the following criteria: Ashkenazi Jewish; diagnosed before the age of 36 years; previous ovarian or breast diagnosis; one or more first-degree relatives, or two or more second-degree relatives with breast or ovarian cancer; one or more second- or third-degree relatives with either breast cancer diagnosed before the age of 36 years, ovarian cancer diagnosed before the age of 61 years, multiple breast or breast and ovarian primaries, or male breast cancer; three or more first-degree relatives with any combination of breast, ovarian, colon, prostate or pancreatic cancer or sarcoma, with at least one diagnosis before the age of 51 years. In addition, 25% of the cases who did not meet these criteria were randomly selected. A total of 8143 first primary breast cases (all women aged <55 years and a 35% random sample of women aged 55–69 years) diagnosed from 1996 to 1998 were identified and physician permission to contact them was obtained for 7384 (91%). A mailed screening questionnaire on family history and race/ethnicity was completed by 4760 (64%). Of these, 2390 were invited to participate after sampling as described above and 1704 (71%) completed a detailed telephone interview on family history and a mailed risk factor questionnaire, including 1584 of white European origin.
In San Francisco, cases aged 18–64 years were eligible if they met any of the following criteria: diagnosed before the age of 35 years; bilateral breast cancer with the first diagnosis before the age of 50 years; previous diagnosis of ovarian or childhood cancer and one or more first-degree relatives with breast, ovarian or childhood cancer. In addition, 2.5% of the non-Hispanic white cases and 15% of the African American, Hispanic and Asian American cases who did not meet the above criteria were randomly selected. A total of 7359 cases of any race/ethnicity diagnosed with invasive breast cancer from 1995 to 1998 were identified from the cancer registry. Of these, 7247 (98.5%) did not have any physician-reported contraindications. A screening telephone interview that inquired about family history of breast, ovarian and childhood cancer as well as self-reported race/ethnicity was completed for 85% of cases. Of these, 1594 cases meeting the above eligibility and selection criteria were invited to participate and 1209 (76%) completed a detailed family history questionnaire by telephone and a risk factor questionnaire by in-person interview, including 655 of white European origin. After exclusion of cases with previous breast cancer, 594 cases were included in this analysis.
Each study recruited women without any reported invasive or in situ breast cancer from the general populations from which the cases were ascertained and during the same time period as the case recruitment. These population controls were frequency matched by 5-year age group to the expected age distribution of cases at diagnosis. In San Francisco, population controls were also frequency matched by race/ethnicity.
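Frequency matching by 5-year age group amounts to setting a recruitment quota per age stratum in proportion to the expected case distribution. The sketch below assumes a hypothetical list of case ages and a hypothetical target of 20 controls; the function names are illustrative.

```python
from collections import Counter

def age_group(age, width=5):
    """Label the 5-year age group containing `age`, e.g. 42 -> '40-44'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def control_quota(case_ages, n_controls):
    """Number of controls to recruit in each age group so that the
    control age distribution mirrors that of the cases."""
    counts = Counter(age_group(a) for a in case_ages)
    total = sum(counts.values())
    return {g: round(n_controls * c / total) for g, c in sorted(counts.items())}

# Hypothetical ages at diagnosis for ten cases.
case_ages = [32, 34, 36, 38, 41, 43, 44, 47, 52, 58]
quota = control_quota(case_ages, n_controls=20)
print(quota)
```

Because each quota is rounded independently, the quotas need not sum exactly to the target in general; a real protocol would have to decide how to handle the remainder.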
In Melbourne and Sydney, population controls were identified from electoral rolls. Of the 1531 population controls identified from 1992 to 1999, 1021 (67%) completed the in-person interview and 903 of those were of white European origin. In Ontario, 2688 population controls were identified by calling randomly selected residential telephone numbers. Of these, 1713 (64%) returned the mailed risk factor and family history questionnaires and 1589 of those were of white European origin. In San Francisco, population controls were identified by random-digit dialling and of 1041 women selected, 627 completed the family history questionnaire by telephone and the risk factor questionnaire by in-person interview, including 385 (67% response) of white European origin who were included in this analysis.
Each study recruited sisters of cases during the same time period as the corresponding case recruitment. Permission to contact them was given by the respective cases. The present study included living full sisters of enrolled cases with no reported history of invasive or in situ breast cancer. In Melbourne and Sydney, 2039 sisters of all eligible cases diagnosed from 1992 to 1999 were identified, 1390 (68%) completed the in-person interview and 1225 of those were of white European origin. In Ontario, 2077 sisters of all cases were identified, 1484 were contacted and 939 (63% of those contacted, 45% overall) returned the mailed risk factor questionnaire. Of these, 868 were of white European origin. In San Francisco, 501 sisters of included cases diagnosed from 1995 to 1998 and of self-reported white European origin were identified, 416 were contacted and 351 (85% of those contacted, 70% overall) completed the in-person interview. These were the sisters of 771 (53%), 610 (39%) and 231 (39%) cases from Melbourne and Sydney, Ontario and San Francisco, respectively.
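The two participation percentages quoted for each site use different denominators: sisters contacted and sisters identified. The short sketch below reproduces the Ontario figures from the counts given above; the function name is illustrative.

```python
def participation(identified, contacted, completed):
    """Participation as a proportion of those contacted and of all
    those identified."""
    return {
        "of_contacted": completed / contacted,
        "overall": completed / identified,
    }

# Ontario sister counts as reported in the text.
ontario = participation(identified=2077, contacted=1484, completed=939)
print(f"{ontario['of_contacted']:.0%} of those contacted, "
      f"{ontario['overall']:.0%} overall")
```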
Each study used the same questionnaire that captured information concerning established and suspected risk factors for breast cancer (https://cfrisc.georgetown.edu/isc/dd.questionnaires.do). For cases, questions typically asked about exposures and lifestyle factors up to 1 year prior to diagnosis. For sisters and population controls of the cases, questions were asked about exposures and lifestyle factors up to the date of interview.
Each woman was assigned a reference age defined as her age 1 year prior to diagnosis for cases, and as her age at questionnaire completion for controls. Descriptive variables considered were age (5-year categories and continuous), educational attainment (incomplete or no secondary school, secondary school completed, university degree), country of birth (born outside the country in which the corresponding study was conducted; yes or no) and marital status (ever or never) and risk factors considered were age at menarche (<12, 12, 13, ≥14 years and continuous), ever parous (ever had a birth; yes, no), number of births (1, 2, 3, ≥4 and continuous), age at first birth (<20, 20–24, 25–29, ≥30 and continuous), years since last birth (<10, 10–19, 20–29, ≥30 and continuous), ever breastfed (parous women only; yes, no), lifetime duration of breastfeeding (parous women only; 0, 1–6, 7–18, >18 months and continuous), age at menopause (<50, ≥50 and continuous) and height (quartiles and continuous). For all parity-related variables, the term ‘birth’ refers to full-term pregnancies that resulted in live or still births. All variables were defined in relation to the reference age.
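As an illustration of the definitions above, the following sketch assigns a reference age and derives two of the categorical variables. The function and argument names are assumptions made for the example, and ages are taken to be whole years.

```python
def reference_age(is_case, age_at_diagnosis=None, age_at_questionnaire=None):
    """Age 1 year before diagnosis for cases; age at questionnaire
    completion for controls."""
    if is_case:
        return age_at_diagnosis - 1
    return age_at_questionnaire

def five_year_category(age):
    """5-year age category, e.g. 44 -> '40-44'."""
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"

def menarche_category(age_at_menarche):
    """Categories used in the text: <12, 12, 13 and >=14 years."""
    if age_at_menarche < 12:
        return "<12"
    if age_at_menarche >= 14:
        return ">=14"
    return str(age_at_menarche)

case_ref = reference_age(True, age_at_diagnosis=45)
control_ref = reference_age(False, age_at_questionnaire=45)
print(case_ref, five_year_category(case_ref))
print(control_ref, five_year_category(control_ref))
print(menarche_category(13))
```

A case diagnosed at 45 and a control interviewed at 45 therefore fall into different 5-year categories, which is one reason the reference age rather than the age at interview is used for all variables.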
Crude differences in the distributions of descriptive variables and established risk factors between groups of participants were assessed using chi-square tests for categorical variables and analysis of variance for continuous variables. Adjusted differences between pair-wise combinations of the three groups of participants were assessed by unconditional logistic regression, including subject group as the dependent variable.
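For readers unfamiliar with the crude comparison, a Pearson chi-square statistic for a table of counts can be computed as sketched below. The counts are hypothetical and are not the study data; the P-value would be obtained from the chi-square distribution with the stated degrees of freedom, a step that is omitted here.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows are control groups, columns are the three
# levels of educational attainment.
counts = [
    [300, 500, 400],  # population controls
    [350, 520, 330],  # sister controls
]
statistic, dof = pearson_chi_square(counts)
print(f"chi-square = {statistic:.2f} on {dof} degrees of freedom")
```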
Case-control analyses were conducted using unconditional logistic regression to generate odds ratio (OR) and 95% confidence interval (95% CI) estimates. Robust estimates of variance were generated for the analysis using cases and sister controls to account for possible correlations in exposure variables within families.12 We accounted for the over-sampling based on putative genetic risk criteria used by the two studies by including an offset term in all logistic regression models.13 Offsets for cases from Ontario and San Francisco were set to the natural log of the sampling fraction. For all other groups that were not selected for familial characteristics (i.e. controls from all studies and cases from Melbourne and Sydney), the offset was set to zero. Trends in associations with increasing education were assessed by including educational attainment [1=incomplete or no secondary (high) school, 2=secondary (high) school completed, 3=university degree] as a continuous independent variable.
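The mechanics of an offset term can be sketched as follows: the offset enters the linear predictor with its coefficient fixed at 1, so that logit P(case) = offset + b0 + b1·x. The example is a minimal Newton-Raphson fit for a single binary exposure using hypothetical toy data and a hypothetical 25% sampling fraction. It is not the authors' Stata code and omits the robust variance estimation and the covariates described in the text.

```python
import math

def fit_logistic_with_offset(x, y, offset, n_iter=25):
    """Newton-Raphson fit of logit P(y=1) = offset + b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, oi in zip(x, y, offset):
            p = 1.0 / (1.0 + math.exp(-(oi + b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the exposure
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: (exposure, case status, sampling fraction).
rows = (
    [(1, 1, 1.0)] * 40 + [(0, 1, 1.0)] * 20      # cases with an indicator
    + [(1, 1, 0.25)] * 15 + [(0, 1, 0.25)] * 25  # cases sampled at 25%
    + [(1, 0, 1.0)] * 40 + [(0, 0, 1.0)] * 60    # controls, not sampled on risk
)
x = [r[0] for r in rows]
y = [r[1] for r in rows]
# Offset is the natural log of the sampling fraction; ln(1) = 0 elsewhere.
offset = [math.log(r[2]) for r in rows]

b0, b1 = fit_logistic_with_offset(x, y, offset)
b0_plain, b1_plain = fit_logistic_with_offset(x, y, [0.0] * len(rows))
print(f"OR with offset: {math.exp(b1):.2f}; without: {math.exp(b1_plain):.2f}")
```

Comparing the two fits shows how much the estimated odds ratio depends on whether the over-sampling is taken into account in this toy setting.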
Two levels of multivariable analysis were considered. The first level included as covariates demographic variables potentially related to study participation: study (Australia, Ontario, San Francisco), age in 5-year categories, educational attainment, marital status and country of birth. Interaction terms for age-by-study and educational-attainment-by-study were also included to account for potential differences by country. The second level included as covariates these demographic variables and interaction terms plus potential confounding factors: ever parous, number of births (continuous), age at menarche (continuous) and family history (having a first-degree relative with breast cancer; yes or no). The family history variable was not included in analyses of cases compared with sister controls. All analyses were carried out using Stata: Release 10.14
Table 1 gives the distributions of reference age, education, country of birth and marital status for the cases and two control groups, by study. The mean reference age for cases was 1.0 year [standard error (SE)=0.3] earlier than for population controls (P<0.001) but not different from that of sister controls (P=0.2). The difference in mean reference age between population and sister controls was 0.6 years (SE=0.3, P=0.03). There was no difference in mean reference age between the three groups in Australia (P=0.2) but in both Ontario and San Francisco, the reference age for sister controls was greater than for cases by 1.9 years (SE=0.4, P<0.001) and 1.6 years (SE=0.7, P=0.03), respectively. The age distributions differed between the three groups overall (all P<0.001). That is, overall the three groups were reasonably comparable for average reference age but the distributions differed in a somewhat non-systematic manner, and this pattern was not the same for all three groups.
Perhaps of most significance for this study, the overall distribution of educational attainment differed between the three groups (P<0.001), with population controls more highly educated than cases (P=0.005) and cases more highly educated than sister controls (P=0.006). In all studies, population controls were more highly educated than sister controls (all P≤0.02). Overall, both types of controls were less likely than cases to be born outside the country where the corresponding study was conducted (both P<0.001) and this pattern was seen consistently in all studies; there was no difference between the two control groups (P=0.7). There were no differences in marital status between the three groups (P=0.6).
Table 2 shows that, both before and after adjusting for age (in categories) by study, the population controls were more likely than the sister controls to have higher educational attainment (P<0.001) and earlier age at menarche (P=0.001). They were less likely to be married (P=0.01), more likely to be foreign born (P=0.04) and were shorter (P=0.04) than sister controls. Considering only parous women, population controls had fewer births (P=0.002), were more likely to be older at first birth (P<0.001) and to have a shorter interval between last birth and reference age (P<0.001). Results by study are provided in Supplementary Table S1.
The absolute difference between the two control groups in mean age at menarche was 0.20 years (SE=0.04) overall; 0.13 (SE=0.07), 0.18 (SE=0.07) and 0.11 (SE=0.11) in Australia, Ontario and San Francisco, respectively. This overall difference reduced to 0.16 years (SE=0.04) after adjusting for age by study and to 0.15 years (SE=0.04) after also adjusting for educational attainment by study.
The absolute difference between the control groups in mean height was 0.29cm (SE=0.19) overall; 0.72 (SE=0.32), 0.10 (SE=0.29) and 0.43 (SE=0.49) in Australia, Ontario and San Francisco, respectively. This overall difference increased to 0.42cm (SE=0.20) after adjusting for age by study and to 0.58cm (SE=0.20) after also adjusting for educational attainment by study.
For parous women, the absolute difference between the two control groups in mean number of births was 0.13 (SE=0.03) overall; 0.08 (SE=0.05), 0.09 (SE=0.05) and 0.41 (SE=0.10) in Australia, Ontario and San Francisco, respectively. This overall difference became 0.10 (SE=0.03) after adjusting for age by study and 0.08 (SE=0.03) after also adjusting for educational attainment by study. The absolute difference between the control groups in mean age at first birth was 0.56 years (SE=0.15) overall; 0.79 (SE=0.24), 0.43 (SE=0.22) and 1.49 (SE=0.47) in Australia, Ontario and San Francisco, respectively. This overall difference became 0.66 years (SE=0.15) after adjusting for age by study and 0.40 years (SE=0.14) after also adjusting for educational attainment by study. The absolute difference between the control groups in mean time since last birth was −0.23 years (SE=0.35) overall; 1.17 (SE=0.48), 1.61 (SE=0.51) and 1.48 (SE=1.00) in Australia, Ontario and San Francisco, respectively. This difference became 0.63 years (SE=0.16) after adjusting for age by study and 0.44 years (SE=0.16) after also adjusting for educational attainment by study.
Further adjustment for country of birth by study and exclusion of women aged <40 years made no substantial difference to the results reported above.
Table 3 shows the results of case-control analyses using population controls. After adjusting for age by study, higher educational attainment and never being married were associated with ‘decreased’ risk (P<0.001 and P=0.02, respectively), whereas being foreign born was associated with ‘increased’ risk (P<0.001). All subsequent analyses adjusted for these three variables. Age at menarche was ‘not’ associated with breast cancer risk (P=0.4). The only pregnancy-related variables found to be associated with risk were ever having had a birth and having had a longer total duration of breastfeeding; both associated with reduced risk (both P<0.001). Being taller and, for post-menopausal women, having later age at menopause were both associated with increased risk (both P<0.001). Adjustment for number of births and family history and the exclusion of women aged <40 years had no substantive influence on these results. Results by study are provided in Supplementary Table S2.
Table 4 shows the results of case-control analyses using the sisters of cases as controls. There was no evidence that educational attainment or marital status were associated with breast cancer risk, but being foreign born was associated with ‘increased’ risk (P<0.001). After adjusting for demographic characteristics and potential confounders, there was an estimated decreased risk of 8% per year (95% CI 5–12%, P<0.001) associated with each 1-year increase in age at menarche and this association was consistent across all studies (OR=0.91, 0.92 and 0.91 for Australia, Ontario and San Francisco, respectively). Being parous was associated with a 19% reduction in risk (95% CI 3–32%, P=0.02).
For parous women, there was an estimated 10% (95% CI 4–16%, P=0.002) decreased risk associated with each birth, a 3% (95% CI 1–4%, P=0.003) increased risk associated with each increasing year in age at first birth, a 3% (95% CI 2–5%, P<0.001) decreased risk per year since last birth and a 3% (95% CI 1–7%, P=0.04) decreased risk associated with each additional 6 months of cumulative lifetime breastfeeding. For post-menopausal women, each 1-year increase in age at menopause was associated with a 6% (95% CI 3–9%, P<0.001) increased risk of breast cancer. These associations were in general consistent across studies (Supplementary Table S3), and were maintained after excluding women under the age of 40 years.
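The percentage changes in risk quoted above are re-expressions of odds ratios: an OR of 0.92 per year corresponds to an 8% decrease per year. The sketch below shows the conversion, together with the usual construction of an OR and its 95% CI from a logistic regression coefficient and its standard error. The standard error used in the example is a hypothetical value, not one reported by the study.

```python
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def percent_change(odds_ratio):
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

# An OR of 0.92 per 1-year increase corresponds to an 8% decrease.
print(f"{percent_change(0.92):.0f}%")

# Hypothetical coefficient and standard error, for illustration only.
beta, se = math.log(0.92), 0.02
odds_ratio, lower, upper = or_with_ci(beta, se)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```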
A more comprehensive multivariable model was fitted which included all second-level covariates, plus age at first birth, time since last birth and lifetime duration of breastfeeding. This predicted a 51% (P=0.3) increased risk of breast cancer associated with a first birth which decreased by 9% (95% CI 1–16%, P=0.03) with each additional birth, by 3% (95% CI 2–4%, P<0.001) with each additional year since last birth and by 5% (95% CI 2–8%, P=0.002) with each additional 6 months of lifetime cumulative breastfeeding. After including these variables in the model, there was no association with age at first birth (P=0.8), but the corresponding regression parameters were correlated, particularly those for age at first birth and time since last birth (r=0.58). The inverse association with age at menarche was maintained in this more comprehensive model (OR 0.92; 95% CI 0.88–0.95, P<0.001) as was that with age at menopause (OR=1.05 per 1-year increment in age, 95% CI 1.03–1.18, P<0.001) for post-menopausal women.
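Because logistic regression is multiplicative on the odds scale, the point estimates above can be combined by multiplication. The sketch below is an illustrative reading of the quoted figures only. It assumes that the per-unit odds ratios are 0.91 per additional birth, 0.97 per year since last birth and 0.95 per 6 months of breastfeeding, and it ignores the other covariates and the uncertainty in each estimate (the estimate for a first birth, in particular, had P=0.3).

```python
def relative_odds(n_births, years_since_last_birth, breastfeeding_months):
    """Odds of breast cancer for a parous woman relative to a nulliparous
    woman, using the rounded point estimates quoted in the text."""
    if n_births < 1:
        return 1.0  # nulliparous reference
    return (
        1.51                                   # first birth
        * 0.91 ** (n_births - 1)               # each additional birth
        * 0.97 ** years_since_last_birth       # each year since last birth
        * 0.95 ** (breastfeeding_months / 6)   # each 6 months of breastfeeding
    )

# Hypothetical profiles, for illustration only.
print(f"{relative_odds(1, 0, 0):.2f}")
print(f"{relative_odds(3, 10, 12):.2f}")
```

Under these assumptions, the initial increase associated with a first birth is offset as births, time since last birth and breastfeeding accumulate.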
We conducted three case-control-family studies using identical questionnaires with what would be generally considered acceptable participation from population-based cases, population controls and unaffected sisters of cases (as an additional control group). We found that, despite similar and reasonably high participation at each stage of sampling for each group, the two control groups differed for some key variables including highest educational attainment, age at menarche and, for parous women, number of births, age at first birth and the interval between last birth and reference age.
When we attempted to replicate results for some well-established risk factors for breast cancer, we found that several key associations were not evident when cases were compared with population controls, but were observed when cases were compared with sister controls. This could have been due to differential participation across the groups.
Contrary to established associations, and consistent with results from our previous Australian study of women aged <40 years,1 we found that population controls were more highly educated, less likely to be married and less likely to be foreign born than cases. These differences remained both when women from the previous Australian study were excluded and when all women aged <40 years were excluded. As these observations are for women up to 69 years of age, they suggest that these factors are not differentially associated with early-onset disease, but rather that they are related to study participation. They also did not appear to depend on the proportion of eligible women in the different control groups who participated in each study as the results were consistent across studies. The present observations of differential participation are consistent with findings from a number of published studies.15–17 It appears that the use of population controls results in a selection bias affecting the estimation of relative risks associated with some key reproductive risk factors, even after adjusting for educational attainment, marital status and country of birth. This bias does not appear to exist when cases are compared with sister controls.
We observed a reduced risk associated with increasing age at menarche only when using sister controls, even though age at menarche was correlated between sister pairs (r=0.25). All three studies found that the sisters had a later age at menarche than did the population controls. Similarly, when cases were compared with sister controls, pregnancy-related variables were strongly associated with breast cancer risk in the expected direction based on the literature, with a decreasing risk associated with increasing number of births, increasing time since last birth and earlier age at first birth, although the latter association was not maintained when all these correlated factors were included in a multivariable analysis. None of these established associations were observed when population controls were used. Population controls were not more likely to be parous than sister controls, but those who were parous had fewer children than parous sister controls, had their first child at a later age and their last child more recently. Again, this was consistent across all studies.
These findings for pregnancy-related variables are consistent with there being greater participation by population controls with a higher socio-economic status. Educational attainment can be considered a surrogate indicator of socio-economic status. The recruited population controls had a socio-economic status and reproductive history more similar to that of the breast cancer cases than did sister controls.
It is perhaps not so clear why this would influence inference in relation to age at menarche. One possible explanation is that the majority of our study participants went through puberty prior to the 1960s when nutritional status was positively associated with parental income and socio-economic status, which in turn could have influenced age at menarche via the early attainment of body mass.18
One established association we replicated when using population controls, but not sister controls, was an increasing risk with increasing height. It might be that, whereas there is no differential participation with respect to height for population controls, the moderately strong correlation in height (and familial factors associated with height) between cases and sisters (r=0.47) meant that the latter analysis had inadequate power to detect a real association.
We observed some established associations when using population controls and when using sister controls. Later age at menopause was associated with an increased risk of the same magnitude when using either control group. For parous women, longer lifetime duration of breastfeeding was associated with decreased risk. This association was numerically stronger when population controls were used, possibly because of correlations in breastfeeding behaviour within families.
We have compared analytically the use of sister controls versus population controls for breast cancer case-control association studies with respect to selected demographic, reproductive and other factors known to be associated with risk. Further studies will be required to assess whether our findings are generalizable to other exposures or lifestyle factors, or to men. Within-family correlations in exposures, especially those related to early life, mean that studies using sister controls would have less statistical power to detect associations than those using the same number of population controls. This could be more important for genetic association studies or studies investigating early-life risk factors. This loss of power is referred to as ‘statistical inefficiency’. It does not necessarily argue against using a sibling control design, nor does it mean that the sibling control design is inefficient in terms of the time and money spent on resource collection, a point that is not always well understood.
We found that the cost of recruiting a sister control was substantially less than that of recruiting an independent population control. Population control recruitment requires the organization of a separate and appropriate recruitment strategy and process, which might not even be possible for some studies. On the other hand, sibling control recruitment can be readily merged with case recruitment, as we have demonstrated. Once we were given permission by the case to approach their sibling, the sibling was approached using identical study protocols in terms of informed consent, administration of questionnaires and bio-specimen collection. In this way, recruiting sibling controls could be a much more cost-effective strategy, despite the typically small within-family correlations that might influence the calculation of standard errors for risk estimates derived from sibship comparisons of some covariates. Further, although not all cases have sisters, or give permission to contact their sisters, we have pooled all cases and all sister controls so that no cases were excluded from the analysis. Other sources of controls could be considered, such as other relatives, friends or neighbours, although each has disadvantages.19
Our study suggests that, for association studies, the recruitment of population controls that are representative in terms of all relevant demographic and reproductive variables might no longer be possible, even when participation appears to be acceptable. At least with respect to women, it appears there is differential participation by socio-economic status and hence correlated risk factors. This could be expected to become even more pronounced for future case-control studies, should levels of participation in population-based epidemiologic research continue to decline, particularly for population controls. Given this growing and widespread problem, we suggest that recruitment of siblings as controls might be a valid (i.e. unbiased) and cost-effective alternative.
Supplementary Data are available at IJE online.
The National Cancer Institute; National Institutes of Health under RFA (grant no. CA-06-503); the Breast Cancer Family Registry (BCFR); the University of Melbourne (grant no. U01 CA69638); Cancer Care Ontario (grant no. U01 CA69467); the Northern California Cancer Center (grant no. U01 CA69417); National Health and Medical Research Council of Australia (to the Australian Breast Cancer Family Study); the New South Wales Cancer Council; the Victorian Health Promotion Foundation; the National Institutes of Health Grant (grant no. U01CA 71966 to the Northern California Cancer Center); the Canadian Breast Cancer Research Initiative (Ontario).
The authors thank the following for their contributions to the study: Margaret McCredie, Maggie Angelakos and Judith Maskiell from the Australian Breast Cancer Family Registry; Gord Glendon, Elaine Maloney and Nayana Weerasooriya from Cancer Care Ontario; members of the Ontario Cancer Genetics Network; and Enid Satariano, Connie Cady and Jocelyn Koo from the Cancer Prevention Institute of California. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centres in the BCFR, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government or the BCFR. J.L.H. is an Australia Fellow of the NHMRC and both he and M.C.S. are Victorian Breast Cancer Research Consortium (VBCRC) Group Leaders.
Conflict of interest: None declared.