The methods of our retrospective cohort study are published, and key details on linkage are also given in appendix 1 [20, 23]. We followed a strict protocol that preserved anonymity and kept personal data from the Census and NHS separate from clinical data (see also ethics below). We used computerised matching of names, addresses and dates of birth to link the Census 2001 for Scotland to the Scottish Community Health Index (CHI), which is a register of patients using the NHS. The Census provided ethnic group, as reported by either the individual or the householder completing the form in response to a question followed by a choice of 14 categories (appendix 1, table A1, which also provides linkage by ethnic group), together with other demographic and socio-economic variables. We then matched, using the CHI number, to an already linked database of deaths in the community and in hospital and of cancer registration records (SMR06).
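As an illustration only, the idea of matching on names, addresses and dates of birth can be sketched as a deterministic exact match on normalised fields. This is not the study's linkage algorithm (which is described in appendix 1 and the cited methods papers); the function names, field names and records are hypothetical, and real linkage would usually need to tolerate spelling and address variation.

```python
import re
from datetime import date


def match_key(name, address, dob):
    """Build a comparison key from normalised name, address and date of birth."""
    def norm(text):
        # Lower-case and drop everything except letters and digits.
        return re.sub(r"[^a-z0-9]", "", text.lower())
    return (norm(name), norm(address), dob.isoformat())


def link(census_rows, chi_rows):
    """Return {census_id: chi} for census records with exactly one CHI match."""
    index = {}
    for row in chi_rows:
        key = match_key(row["name"], row["address"], row["dob"])
        index.setdefault(key, []).append(row["chi"])
    links = {}
    for row in census_rows:
        hits = index.get(match_key(row["name"], row["address"], row["dob"]), [])
        if len(hits) == 1:  # ambiguous or missing matches stay unlinked
            links[row["census_id"]] = hits[0]
    return links


# Hypothetical records, for illustration only.
census = [
    {"census_id": "c1", "name": "A. Example", "address": "1 High St", "dob": date(1960, 2, 3)},
    {"census_id": "c2", "name": "B. Sample", "address": "2 Low Rd", "dob": date(1975, 7, 9)},
]
chi = [
    {"chi": "chi-001", "name": "a example", "address": "1 High St.", "dob": date(1960, 2, 3)},
]
linked = link(census, chi)
```

Only the resulting identifier pairs would then need to travel with the clinical data, which is consistent with keeping personal and clinical data separate.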
Ethnic group is a legally required field that was well completed (95.8%) and, after imputation (4.3%), available for 100% of those completing the census form (which is also a legal obligation). (For details see: http://www.gro-scotland.gov.uk/census/censushm/censcr02/data-quality/census-variables/results-and-conclusions/appendix-d-person-items-reports-and-tables-p10-to-p-17.html; accessed 26 April 2012). About 95% of the people participating in the 2001 census (4.9 million), that is 4.65 million, were linked as above to health records, with 85% or more linked in every ethnic group [20] (see appendix 1). The total estimated Scottish population was 5.06 million, so our cohort of 4.65 million includes about 92% of the 2001 population. Although the identities of those not completing a census form are unknown, census validity studies estimate that a higher proportion of non-White than White groups were non-completers, for example, an estimated 10.2% of Pakistanis and 3.8% of White Scottish.
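The two coverage figures quoted above follow directly from the three population counts. A minimal arithmetic check, using only the rounded numbers given in the text (the variable names are ours):

```python
# Rounded counts as quoted in the text (millions converted to persons).
census_participants = 4.9e6    # people participating in the 2001 census
linked_cohort = 4.65e6         # census records linked to health records
estimated_population = 5.06e6  # total estimated Scottish population, 2001

linkage_rate = linked_cohort / census_participants          # share of census records linked
population_coverage = linked_cohort / estimated_population  # share of the population in the cohort

print(f"linked: {linkage_rate:.1%}, coverage: {population_coverage:.1%}")
```

Because the inputs are rounded, the results agree with the quoted "about 95%" and "about 92%" only to the nearest percentage point.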
The ethnic group categories (and labels) follow those of the Scottish Census 2011, given in appendix 1 [20]. Because of small numbers we grouped Bangladeshis with other South Asians, and combined Caribbean, African and Black Scottish or other Black into one ‘African origin’ group. Further grouping was sometimes necessary because of small numbers in the analysis of specific cancers, as described in the results. Ethnic groups were also sometimes omitted, mostly in keeping with our analytical strategy, to avoid potential disclosure of identity.
About 90% of the cases were obtained from the cancer registry and 10% from mortality files. Cancers are registered at diagnosis, so mortality data add cases where the diagnosis was first made outside Scotland, which is especially important for mobile ethnic minority groups. The registry contains a date of embarkation field, but we did not consider it reliable enough in relation to non-UK migration to use it to adjust denominators. More than 90% of the Scottish Cancer Registry records for 2001–2008 were linked to our census-extract file. We excluded non-melanoma skin cancer. The ICD codes used are in box 2. Other, non-cancer health outcomes were excluded from the analysis file for reasons given in the ethics section below.
Box 2. ICD codes used in the study

Up to 31 December 1996, ICD9 codes were used by the Cancer Registry (needed for the 10-year look-back):
Lung cancer: ICD9 162
Breast cancer: ICD9 174
Prostate cancer: ICD9 185
Colorectal cancer: ICD9 153–154
All cancers: ICD9 140–208
All cancers without non-melanoma skin cancers: ICD9 140–172 and 174–208

From 1 January 1997 in the Cancer Registry and from 1 January 2000 in mortality data, ICD10 codes were used:
Lung cancer: ICD10 C33–C34
Breast cancer: ICD10 C50
Prostate cancer: ICD10 C61
Colorectal cancer: ICD10 C18–C21
All cancers: ICD10 C00–C97*
All cancers without non-melanoma skin cancers: ICD10 C00–C43 and C45–C97

*C97 (multiple cancer sites) was used in mortality data only.
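The code ranges in box 2 can be expressed as a small lookup. The sketch below is our own illustration, not the study's code: the function names and return format are hypothetical, and it works only on the three-character category of each code. The non-melanoma skin cancer categories (ICD9 173, ICD10 C44) are inferred from the gaps in the "without non-melanoma skin cancers" ranges.

```python
import re

# Site ranges transcribed from box 2 (three-character categories).
ICD9_SITES = {"lung": (162, 162), "breast": (174, 174),
              "prostate": (185, 185), "colorectal": (153, 154)}
ICD10_SITES = {"lung": (33, 34), "breast": (50, 50),
               "prostate": (61, 61), "colorectal": (18, 21)}


def _category(code, version):
    """Return the numeric three-character category, or None if not parseable."""
    code = code.strip().upper()
    pattern = r"C(\d{2})(\.?\d+)?" if version == 10 else r"(\d{3})(\.?\d+)?"
    match = re.fullmatch(pattern, code)
    return int(match.group(1)) if match else None


def classify(code, version):
    """Classify one ICD9 or ICD10 code; return None if it is not a cancer code."""
    cat = _category(code, version)
    low, high, nmsc = (0, 97, 44) if version == 10 else (140, 208, 173)
    if cat is None or not low <= cat <= high:
        return None
    sites = ICD10_SITES if version == 10 else ICD9_SITES
    site = next((name for name, (a, b) in sites.items() if a <= cat <= b), "other")
    return {"site": site, "non_melanoma_skin": cat == nmsc}
```

For example, `classify("C34.1", 10)` is assigned to lung cancer, while `classify("C44.9", 10)` is flagged as non-melanoma skin cancer and would be dropped from the analysis.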
To minimise the number of age/sex cells with no cases, which creates instability in the analysis, we restricted the analysis by age as follows: ≥20 years for all cancer and for breast cancer; ≥30 years for lung and colorectal cancer; and ≥40 years for prostate cancer. This led to few omissions, ranging from 0.1% to 1.9% depending on the specific diagnosis.
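The age restrictions amount to a per-outcome minimum age. A minimal sketch, with hypothetical names; the reference date at which age is measured is not stated in this section, so `age` is simply whatever age the analysis file records:

```python
# Minimum ages (in years) as listed in the text above.
MIN_AGE = {
    "all cancer": 20,
    "lung": 30,
    "breast": 20,
    "colorectal": 30,
    "prostate": 40,
}


def eligible(outcome, age):
    """Return True if a person of this age is included in the analysis of `outcome`."""
    return age >= MIN_AGE[outcome]
```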
We analysed only first events, that is, newly diagnosed cancers occurring between 2001 and 2008. First event meant that there was no record of the cancer diagnosis under study in the preceding 10 years in the mortality and cancer registration (SMR06) linked file. The cancer registry collects data from a range of sources including pathology laboratories, so our cases are likely to be new ones.
For first cancers, for all cancers and for each specific cancer, we calculated by sex: directly age standardised cumulative incidence rates (DASRs) per 100 000/year using 10-year age groups; DASR ratios (DASRRs); risk ratios (RRs) using Poisson regression with robust variance, adjusting for age and country of birth; and 95% CIs around summary measures. To assess the effects of out-migration we calculated RRs using moving averages over 3-year time periods (2001–2004, 2002–2005, etc). In appendix 2, we provide details of our approach to calculating rates and RRs, including details of the Poisson modelling. The standard reference population was the White Scottish population. For ease of interpretation we multiplied ratios by 100 to get whole numbers interpretable as percentages. We adjusted the RRs for country of birth, classified as Scotland or outside Scotland. Relatively few cases in ethnic minority populations were born in Scotland; for example, for all cancers excepting non-melanoma skin cancer, the proportion was 5.1% in other White British, 11.2% in Indian, 18.5% in Pakistani, 8% in Chinese and 36% in African origin groups. In the small any mixed background group, 64.7% were born in Scotland. For this reason, that is, to preserve statistical precision, the analysis is not stratified by country of birth.
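Direct age standardisation weights each age-specific rate by the size of the corresponding age group in the standard population. The sketch below illustrates the calculation of a DASR and a DASRR (multiplied by 100) in general terms; it is not the authors' implementation (see appendix 2), it assumes person-years as the denominator, it omits confidence intervals and the Poisson modelling, and the three age bands and all counts are invented toy numbers.

```python
def dasr(cases, person_years, standard_pop, per=100_000):
    """Directly age standardised rate per `per` person-years.

    Each argument is a list with one entry per age group, in the same order.
    """
    weighted = sum(w * c / py for c, py, w in zip(cases, person_years, standard_pop))
    return per * weighted / sum(standard_pop)


def dasrr(rate, reference_rate):
    """Ratio of two DASRs, multiplied by 100 so that the reference equals 100."""
    return 100 * rate / reference_rate


# Toy data (invented): three age bands, reference group used as the standard.
ref_cases, ref_py = [10, 40, 150], [50_000, 40_000, 30_000]
grp_cases, grp_py = [2, 6, 9], [10_000, 5_000, 2_000]
standard = ref_py

ref_rate = dasr(ref_cases, ref_py, standard)  # equals the reference crude rate
grp_rate = dasr(grp_cases, grp_py, standard)
ratio = dasrr(grp_rate, ref_rate)
```

When the reference group's own age structure is used as the standard, its DASR equals its crude rate, so the ratio compares each group with what the reference population experienced at the same age distribution.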
We examined, in each ethnic group, whether there was an association between eight indicators of socioeconomic position and all cancer rates (at all ages), and hence whether any were potentially valid confounding factors across all our ethnic groups. The indicators were: (1) the postcode (zipcode)-based Scottish Index of Multiple Deprivation, (2) car ownership, (3) highest qualification of the individual, (4) highest qualification in the household, (5) individual-level and (6) household-level National Statistics Socio-economic Classification, (7) household tenure and (8) economic activity in the week before the Census completion date.
Data were analysed using SAS V.9 (SAS Institute Inc, Cary, North Carolina, USA) and Stata 11 (StataCorp 2009; Statistical Software: Release V.11.0; College Station, Texas, USA).
In the Results section we provide both absolute (DASRs) and ratio (DASRRs and RRs) measures and describe findings where the 95% CI does not include 100, the value for the reference White Scottish population.
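The reporting rule above reduces to checking whether the reference value of 100 lies inside the 95% CI. A minimal sketch with a hypothetical function name; treating a bound exactly equal to 100 as including the reference is our assumption:

```python
def differs_from_reference(ci_lower, ci_upper, reference=100.0):
    """Return True if the interval [ci_lower, ci_upper] excludes the reference value."""
    return not (ci_lower <= reference <= ci_upper)
```

For instance, a ratio with a hypothetical 95% CI of 62 to 91 would be described as differing from the White Scottish population, whereas one with a CI of 88 to 104 would not.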