The study was conducted in the Lombardy region of Italy, which includes approximately 9,600,000 people and is served by a network of modern hospitals, medical schools, and a regional health service. The catchment area includes 5 cities (Milan, Monza, Brescia, Pavia, and Varese) and surrounding towns and villages, for a total of 216 municipalities, encompassing over 3,000,000 people. The theoretical ideal is random collection of cases from the population, e.g., through a cancer registry. However, random selection from hundreds of large and small hospitals would have made collection of biospecimens (particularly fresh frozen tissue samples) and detailed epidemiological and clinical data unfeasible. Thus, we designed EAGLE to be as close as possible to a really comprehensive population-based study through enrollment of cases in a defined set of hospitals, which examine approximately 80% of all lung cancer cases from the catchment area. These hospitals were selected based on a review of the hospital admission/discharge records from the years 1997–2000.
EAGLE includes 2101 verified, incident, primary lung cancer cases of any histologic type, with the exception of carcinoids, and 2120 healthy population-based controls. Participants are men and women born in Italy, of Italian nationality, with official residence in the 216 selected municipalities, aged 35 to 79 years at diagnosis (cases) or at enrollment for interview (controls), who signed an informed consent form to participate in the study. The study was approved by the Institutional Review Board (IRB) of each participating hospital and university in Italy and by the National Cancer Institute, Bethesda, MD.
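The inclusion criteria listed above can be read as a single eligibility predicate. The following is a minimal sketch only: the record layout, field names, and the five-city stand-in for the 216 municipalities are hypothetical and are not taken from the EAGLE database.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the 216 catchment municipalities (illustration only).
CATCHMENT_SAMPLE = {"Milan", "Monza", "Brescia", "Pavia", "Varese"}


@dataclass
class Candidate:
    # Field names are illustrative assumptions, not EAGLE's actual schema.
    age: int                 # age at diagnosis (cases) or at enrollment (controls)
    born_in_italy: bool
    italian_national: bool
    municipality: str        # municipality of official residence
    consented: bool          # signed the informed consent form


def is_eligible(c: Candidate, catchment: set) -> bool:
    """Return True when every stated inclusion criterion is met."""
    return (
        35 <= c.age <= 79
        and c.born_in_italy
        and c.italian_national
        and c.municipality in catchment
        and c.consented
    )


if __name__ == "__main__":
    print(is_eligible(Candidate(62, True, True, "Pavia", True), CATCHMENT_SAMPLE))
```

Diagnostic criteria for cases (verified, incident, primary, non-carcinoid) are not modeled here; they would be applied after diagnosis review.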
EAGLE's study size is powered to detect small increases in risk for factors with moderate frequency; for example, the power is at least 80% to detect an association between a given genotype and lung cancer risk with an OR of 1.4 for at-risk genotype frequency between 10% and 90%. Under a multiplicative gene-environment interaction model, the study is large enough to reject at a 0.05-level with 80% power an interaction of 0.5 between the highest smoking category relative to the non-smoking category when the at-risk genotype frequency is > 13%, and of 0.2, if the at-risk genotype frequency is 5% or higher and the distributions of smoking and the gene are independent [7].
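As a rough plausibility check of the main-effect power statement, the sketch below uses a simple two-sided normal approximation for the difference in at-risk genotype frequency between 2101 cases and 2120 controls. This is an assumption made for illustration: it ignores matching, covariate adjustment, and any multiple-testing correction, and it is not necessarily the method used for the published calculation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p0, odds_ratio, n_cases=2101, n_controls=2120, alpha=0.05):
    """Approximate two-sided power to detect a given OR.

    p0 is the at-risk genotype frequency in controls; the frequency in
    cases follows from the odds ratio.
    """
    odds_cases = odds_ratio * p0 / (1.0 - p0)
    p1 = odds_cases / (1.0 + odds_cases)
    # Standard error of the difference in proportions (unpooled).
    se = sqrt(p1 * (1.0 - p1) / n_cases + p0 * (1.0 - p0) / n_controls)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


if __name__ == "__main__":
    for freq in (0.10, 0.50, 0.90):
        print(f"genotype frequency {freq:.2f}: power ~ {approx_power(freq, 1.4):.3f}")
```

Under these simplifying assumptions the approximate power stays above 80% at both ends of the 10% to 90% frequency range for an OR of 1.4.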
Lung cancer cases
We recruited cases from thirteen hospitals. A detailed description and link to the respective hospitals is available on the EAGLE website [8]. The first diagnosis of all lung cancer cases occurred in the period between April 22nd, 2002 and February 28th, 2005, and enrollment continued until June 30th, 2005. Two research physicians per hospital reviewed daily hospital admission logs in different departments, identified cases of suspected lung cancer in the specified age range and from the catchment area, and arranged for collection of blood specimens and for subject interview. Rapid communication with the central institute from each hospital was performed using a web-based case-registry connected through dedicated Integrated Services Digital Network (ISDN) lines. Reasons for non-eligibility were recorded for all subjects. Subjects who declined to participate were asked to answer a few questions on smoking and demographic characteristics that allowed us to obtain a more comprehensive picture of the lung cancer cases of the area within the study period.
The diagnosis of lung cancer was established based on clinical criteria and confirmed by pathology reports from surgery, biopsy or cytology samples in approximately 95% of cases, and on clinical history and imaging for the remaining 5%. The date of diagnosis was defined as the date of the first clinical study to report a suspicious lesion (for example, chest X-ray or CT-scan) that led directly to diagnosis. To verify the diagnosis, we examined the clinical history, bronchoscopy and biopsy results, and x-ray and thorax CT scans (and MRI or PET scans when available) and hospital discharge letter for each case. In addition, we reviewed surgery descriptions and pathology reports for the surgical cases, and biopsies and/or cytology reports from brushing, broncho-alveolar lavage, sputum, bronchoaspirate, or pleural or pericardial effusion for the non-surgical cases. All available imaging documenting lymph node and/or distant metastases or other functional/clinical conditions that excluded surgery were also assessed. Tumor histology was coded according to the WHO Histological Typing of Lung and Pleural Tumors (1999); clinical and/or post-surgical staging was performed according to the International System for Staging Lung Cancer adopted by the American Joint Committee on Cancer and the Union Internationale Contre le Cancer [9]. To verify extra-thorax metastases, we reviewed abdominal CT scans and ultrasounds, brain CT scans or MRI, and bone scintigraphy scans. To standardize diagnostic criteria across hospitals we reviewed clinical documentation and when necessary made changes to the original diagnosis/staging; in these instances the reason and the specific changes made were documented in a decision log. Diagnoses from approximately 10% of cases were reviewed and confirmed by an experienced independent pulmonary pathologist from the National Cancer Institute, NIH (Dr. Ilona Linnoila).
At study completion, we had screened 4630 subjects of whom over 2706 were eligible based on the inclusion criteria. We enrolled 2343 cases (86.6% of the eligible cases), of whom 179 (7.6%) were determined not to have lung cancer after diagnosis review; an additional 63 subjects had an uncertain diagnosis or were determined not to fit the inclusion criteria. Thus, subjects with confirmed diagnosis and characteristics fitting the inclusion and exclusion criteria were 2101. Epidemiological data and DNA specimens were collected from 98.4% and 97.3% of cases, respectively (Table ).
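The case accrual figures in this paragraph can be reproduced with simple arithmetic. The short sketch below uses only the counts reported above, taking the number of eligible subjects as 2706; the variable names are illustrative.

```python
# Counts as reported in the text (eligible taken as 2706).
screened = 4630
eligible = 2706
enrolled = 2343
not_lung_cancer = 179          # excluded after diagnosis review
uncertain_or_ineligible = 63   # uncertain diagnosis or not meeting criteria

enrollment_rate = enrolled / eligible            # share of eligible cases enrolled
review_exclusion_rate = not_lung_cancer / enrolled
confirmed_cases = enrolled - not_lung_cancer - uncertain_or_ineligible

if __name__ == "__main__":
    print(f"enrolled / eligible: {100 * enrollment_rate:.1f}%")
    print(f"not lung cancer / enrolled: {100 * review_exclusion_rate:.1f}%")
    print(f"confirmed cases: {confirmed_cases}")
```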
Distribution of questionnaire data and biological samples in cases and controls from EAGLE
The distribution of EAGLE lung cancer cases by area of residence, gender and age (matching variables) is reported in Table ; the distribution by cigarette smoking, histology and stage is reported in Tables and for females and males, respectively.
Population in the catchment area* and distribution of EAGLE subjects by gender, age, and residence (matching variables)
Distribution of EAGLE lung cancer cases by cigarette smoking*, histology and stage in females and males.
The population with official residence in the catchment area represented the study pool from which controls were sampled. The Regional Health Service (RHS) database contains information on subjects' demographics and on the family physician for virtually all Italians. We sampled population controls from updated population databases obtained periodically (twice a year) from the Lombardy Region; the age of controls was calculated as of pre-specified dates, i.e., July 22nd and January 22nd of each year. Controls were selected randomly within 90 cells (see below) to yield a set of controls with a distribution that initially approximated the case distribution based on year 2000 lung cancer admissions, and subsequently was based on enrolled EAGLE lung cancer cases, for 3 key variables: residence (5 areas: Brescia, Milan, Monza, Pavia, Varese), gender, and five-year age classes in the range 35–79 years (5 × 2 × 9 combinations of residence-gender-age categories). To select controls, a random number was assigned to each subject using statistical software, and the records were sorted based on this random number. Each subject with a number below the target enrollment number for the individual's cell was selected to be invited into the study. The family physicians for the potential study subjects were identified. The selected physicians were then asked to provide information about eligibility of the potential study subjects and, if eligible, to contact the selected controls to inform them about the study. Eligible controls were contacted by letter from the study personnel, followed by a phone call. When the phone number for the selected individuals could not be found, we searched for the phone numbers of other members of the family identified through contact with the corresponding municipalities, or sent pre-stamped return-cards requesting contact information.
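The cell-based random selection described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual software: the roster layout, the `targets` mapping, and the function names are hypothetical, and "a number below the target enrollment number" is interpreted here as the subject's rank after sorting by the random number within the cell.

```python
import random
from collections import defaultdict

AREAS = ["Brescia", "Milan", "Monza", "Pavia", "Varese"]
GENDERS = ["F", "M"]
# Nine five-year age classes covering 35-79 years: (35, 39), ..., (75, 79).
AGE_CLASSES = [(lo, lo + 4) for lo in range(35, 80, 5)]


def cell_of(person):
    """Map a roster record to its residence-gender-age cell."""
    lo = 35 + 5 * ((person["age"] - 35) // 5)
    return (person["area"], person["gender"], (lo, lo + 4))


def select_controls(roster, targets, seed=0):
    """Assign a random number to each subject, sort within each cell,
    and invite the first targets[cell] subjects of that cell."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for person in roster:
        if 35 <= person["age"] <= 79:
            by_cell[cell_of(person)].append((rng.random(), person))
    invited = []
    for cell, members in by_cell.items():
        members.sort(key=lambda pair: pair[0])   # sort by the random number only
        invited.extend(p for _, p in members[: targets.get(cell, 0)])
    return invited


if __name__ == "__main__":
    # Toy roster: one subject per year of age in every area-gender combination.
    roster = [{"age": a, "area": ar, "gender": g}
              for ar in AREAS for g in GENDERS for a in range(35, 80)]
    targets = {(ar, g, ac): 2 for ar in AREAS for g in GENDERS for ac in AGE_CLASSES}
    print(len(targets), "cells;", len(select_controls(roster, targets)), "subjects invited")
```

In the study the per-cell targets were first based on year 2000 lung cancer admissions and later on enrolled cases; here they are simply passed in as a dictionary.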
At study completion, we had sent invitation letters followed by a phone call to 3314 potential controls. Traced eligible subjects numbered 2774, of whom 2012 agreed to participate. The completion rate was 60.7% (subjects who agreed to participate/contacted subjects) and the participation rate was 72.5% (subjects who agreed to participate/eligible subjects). In addition, we sent pre-stamped return-cards to 393 subjects whom we were unable to trace by phone. Of these, 155 were eligible, and 108 agreed to participate (completion rate = 27.5%, participation rate = 69.7%). Overall, we enrolled 2120 controls, with an overall participation rate of 72.4%. Epidemiological data and DNA samples were collected from 99.8% and 99.9% of controls, respectively (Table ).
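The completion and participation rates for controls follow directly from the reported counts. The sketch below recomputes them; the function and variable names are illustrative, and the overall rate is taken as all enrolled controls over all eligible subjects from both contact routes.

```python
def rates(contacted, eligible, accepted):
    """Return (completion rate, participation rate) as fractions."""
    return accepted / contacted, accepted / eligible


# Counts as reported in the text.
phone = {"contacted": 3314, "eligible": 2774, "accepted": 2012}
cards = {"contacted": 393, "eligible": 155, "accepted": 108}

phone_completion, phone_participation = rates(**phone)
cards_completion, cards_participation = rates(**cards)

total_enrolled = phone["accepted"] + cards["accepted"]
overall_participation = total_enrolled / (phone["eligible"] + cards["eligible"])

if __name__ == "__main__":
    print(f"letter + phone: {100 * phone_completion:.1f}% / {100 * phone_participation:.1f}%")
    print(f"return-cards:   {100 * cards_completion:.1f}% / {100 * cards_participation:.1f}%")
    print(f"overall: {total_enrolled} controls, {100 * overall_participation:.1f}% participation")
```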
The distribution of EAGLE controls by area of residence, gender and age (matching variables) is reported in Table .
Strategies to improve subjects' participation rate
The strategies we followed for subjects' enrollment described above were derived from a series of pilot studies we conducted before officially beginning the EAGLE field activities. Because determinants of participation in cases (medical condition, performance status) are different from the determinants in controls (altruism, time), we were concerned that lack of participation might affect estimates of genetic or environmental main effects and interactions on lung cancer risk [10]. We made concerted efforts to maximize participation of both cases and controls, and used several strategies and incentives to increase the participation rate in controls.
First, to verify the feasibility of a population-based study in the Lombardy region of Italy and study the characteristics related to potential subjects' participation, we conducted a phone survey of 1053 healthy subjects from the catchment area, selected from the rosters of the Regional Health Service, to have the age, gender and geographic distribution expected in the lung cancer cases. We asked the contacted subjects whether they would agree to participate in a study of lung cancer that would require an interview and donation of a blood sample. Only 320 (30%) of the subjects responded "Yes". There was only modest variation in response by municipality, age, gender, tobacco use, and educational level. From this effort, we obtained valuable information about preferred location for blood drawing, days of the week, and times most convenient for participation. This information was used in a series of pilot studies in which we first contacted the selected individuals by mail with follow-up by telephone, offering participation in the study at the closest hospitals of the catchment area. Then, we advertised the study on the local TV and in newspapers, and added gas coupons as reimbursement for time lost. Subsequently, we proposed conducting the interview in the subjects' homes, and added a letter endorsing the study signed by the family physician. These efforts achieved a response rate of 48.9%. To increase the participation rate further, we consulted with one of the largest market research companies in Italy, and implemented the following procedures: we altered the layout of the invitation letter, established a toll-free phone number through which potential participants could obtain study information, added to our invitation a letter from the mayor of Milan supporting our research project, and requested that family physicians call the subjects directly to inform them about the seriousness and scientific value of the study. 
We also provided a token of gratitude (gas coupon) to the physicians. With these measures, we achieved an acceptable response rate of 72%. Overall, in the pilot studies, we collected data on approximately 300 subjects. This level of response remained constant through the course of the full-scale study.
Impact of incentives on participation rate and enrolled subjects' characteristics
After study completion, we assessed the impact of family physician involvement on the controls' response rate. Controls contacted by their family physician had a much higher participation rate (80.1%) than those not contacted (49.3%).
To further evaluate the impact of incentives on study participation, we explored the socio-demographic differences between control subjects who were enrolled with few or no incentives (~49% response rate) and those who were enrolled with the improved procedures (~73% response rate). We included in these analyses all controls recruited during the pilot studies and those recruited in the main study up to March 2003 (N = 748). We found some suggestive associations: the high incentive group exhibited an increased family history of lung cancer (p = 0.03); in addition, borderline associations were observed for: awareness of the link between smoking and lung cancer (lower, p = 0.07), anxiety score (lower, p = 0.08) and depression score (higher, p = 0.13) as measured by the Hospital Anxiety and Depression Scale (HADS), intention to quit smoking (lower, p = 0.10), history of quit attempts (lower, p = 0.11), military service (lower frequency, p = 0.11), and percentage attending college (lower, p = 0.12). Adjustment for age, gender, and, when appropriate, smoking, did not substantially alter these findings. We also found small non-significant differences by incentive group in a panel of 15 short tandem repeat (STR) loci used to identify genetic differences between samples for quality control of sample handling and processing [13]. These analyses suggest that in studies with low response rates, estimates may be influenced by factors such as family history, education or behavioral characteristics.
In the pilot studies, we could not verify the efficacy of each incentive or procedure separately because different types of incentives were often offered together. We did, however, ask participants to rank the factors that influenced their participation. Among the most influential factors reported by subjects recruited through December 2003, "desire to help medical research" (78%), "reassured by the family physician" (53%), and "possibility to participate from home" (44%) were the most frequent "very high" scores. "Receiving compensation", "obtaining information by calling a toll-free number", and "receiving the letter from the mayor of Milan" were the factors with the most frequent, "very low" scores (61%, 47%, and 40%, respectively). The majority of subjects were not aware of the advertisements about the study that appeared on the local TV or newspapers. These data are relevant to future studies, with the caveat that they are based on self-reporting in one cultural setting, and need to be evaluated by direct comparison.
Epidemiological and clinical data collection
Extensive epidemiological data have been collected through both a Computer Assisted Personal Interview (CAPI) to capture the major risk factors for lung cancer and a self-administered questionnaire to address behavioral aspects possibly associated with smoking persistence and diet (questionnaires are available on the EAGLE website). In particular, data on tobacco smoking included information on number of cigarettes, cigars, pipes, and cigarillos per day averaged over lifetime and in the last year, age at initiation/quit, quitting attempts and time between attempts, inhalation habits, cigarette/cigar brand, passive smoking during childhood, at workplace and at home during adulthood. Moreover, we collected data on tobacco smoking in first-degree relatives. To explore the determinants of smoking persistence we also added key behavioral rating scales including the Fagerström Test for Nicotine Dependence (FTND) [14], nicotine withdrawal [15], knowledge about smoking effects, Beck's Depression Inventory, Hospital Anxiety and Depression Scale (HADS) [16], alcohol dependence, Attention Deficit Disorder (ADD), and the Short-Form Revised Eysenck Personality Questionnaire [17]. A limited food frequency questionnaire evaluated diet for specific variables of interest: vegetables, fresh and dry fruit, ham, salami, and other processed meats, red and white meat consumption (with questions about meat cooking practices), pizza, pasta, alcohol, and vitamin/mineral supplements. Additionally, subjects were asked whether they were on special diets and for what reason. From each lung cancer case we also collected extensive clinical data, including histology and grading (ICD-O codes), TNM/stage (clinical and surgical, AJCC and UICC), imaging and pathology (surgery, biopsy, and cytology) reports, blood count and serum tumor markers, chemotherapy or radiation therapy for previous tumors, blood transfusions, and previous lung diseases with spirometry indexes. From approximately 10% of the cases, histology slides were scanned and digital images stored in a large database for archival, research, and educational purposes.
One of our goals is to integrate genomic and epidemiological data with clinical data on therapy outcome and survival in order to identify genetic factors that affect these outcomes. We are currently collecting data from cases on surgical procedures, chemotherapy (type, doses, duration, cycles, and breaks), radiation therapy (type, duration, dose, equipment, and breaks), major toxicities, ECOG performance status, recurrence, smoking after lung cancer diagnosis, vital status through the Vital Statistic Office, and death certificates through the Local Health Units (causes of death are coded following the ICD IX).
Specific laboratory Standard Operating Procedures were developed (and updated as warranted) within EAGLE to ensure quality control of every step involved in biospecimen collection, processing, transportation, tracking, shipping, and eventual long term storage. Approximately 90% of cases and 87% of controls donated a blood sample, and 7% of cases and 13% of controls donated buccal rinse samples (Table ). Blood samples were transported from each hospital to the central laboratory within four hours of phlebotomy by a transportation team established ad hoc for EAGLE. Blood specimens were processed to obtain cryopreserved lymphocytes, RBC, granulocytes, DNA, RNA, whole blood, buffy coat, serum, plasma, and blood cards. For RNA collection and extraction, we also used PAX tubes (Paxgene Blood RNA System), which contain a solution that inhibits RNA degradation and gene induction as blood is drawn into the tube. Buccal cells obtained by mouthwash were processed to obtain DNA.
Lung tissue paraffin blocks and slides were collected from cases that underwent surgery, biopsy, or cytological examination of the lung tumor (Table ). Multiple fresh "normal" lung tissue (adjacent and distant from the malignant lesion) and tumor samples, frozen in liquid nitrogen within 20 minutes from excision at surgery, were also collected from 436 cases (about 46% of the surgical cases). All biospecimens and accompanying forms were labeled using 2-D bar codes. Biospecimens were shipped according to international regulations on alternate weeks following different procedures based on biospecimen type, and tracked through a database [18] that stores detailed information on sample descriptions, dates, sample transfers, aliquoting, freezer locations, and material type that is linked to the repository for easy access and exchange of laboratory information.
The study coordination center for EAGLE was established at the Epidemiology Research Center (EPOCA) of the University of Milan. Ancillary facilities included: 1) a Study Document Center for the collection and completeness verification of the computer-assisted questionnaire, and the scanning and verification of the optical readable forms and self-administered questionnaires; 2) a Storage and Processing Laboratory for the collection, processing, storage, and shipping of biospecimens; 3) a Data Processing Center, for the collection of all data in a central relational database (MS SQLServer). Data on subjects' accrual and data/biospecimen collection were regularly transmitted to the principal investigators at the National Cancer Institute (NCI), Bethesda, MD through automatically generated weekly reports. We routinely validated received data by: comparing information from different sources; assessing variable range and distribution; evaluating the quality of biospecimens through specific analyses conducted at the NCI laboratories on random samples; comparing numbers of cases accrued and those reported in the discharge records of the hospitals during the same time period to ensure that all consecutive cases were approached for the study; and verifying completeness of the database through multiple queries. Upon study completion, we developed a portal [19] for exchange of data, documents, timelines, meeting minutes, procedures, and draft manuscripts among investigators involved in the EAGLE data analyses, and a website [8] for public access to study design, collaborators, descriptive statistics, and publications.
Epidemiological analyses of these and other risk factors are ongoing, exploiting the richness of molecular data and the integrative approach described above. For example, the first analysis of gene expression changes due to tobacco smoking was recently completed [20].