Germline polymorphisms may confer susceptibility to lung cancer in never smokers, but studies in the US have been limited by the low number of cases seen at single institutions. We hypothesized that we could use the Internet to bolster accrual of appropriate patients.
We established an Internet-based protocol to collect blood and information from patients throughout the U.S. To illustrate the power of this approach, we used these samples, plus additional cases and age-matched controls from Memorial Sloan-Kettering Cancer Center (New York, NY) and the Aichi Cancer Center (Nagoya, Japan), to analyze germline DNA for genetic variants reportedly associated with lung cancer susceptibility. The genotypes for the polymorphisms rs763317 (intron 1) and T790M (exon 20) in the EGFR gene were determined by direct sequencing, and CHRNA3 nicotinic acetylcholine receptor SNPs (rs8034191 and rs1051730) were genotyped as part of a pilot genome-wide association study.
We successfully analyzed germline DNA from 369 cases, including 45 obtained via the Internet, and 342 controls. A germline EGFR T790M variant was identified in 2 (0.54%, 95% CI: 0.21%–1.29%) of the 369 cases, and in none of the 292 controls (p=0.208). No difference was observed in EGFR rs763317 frequency between cases and controls. Similarly, neither CHRNA3 rs8034191 nor rs1051730 was associated with lung cancer risk.
The Internet provides a way to recruit patients throughout the country for minimal risk studies. This approach could be used to facilitate studies of germline polymorphisms in specific groups of patients with cancer.
Lung cancer is the leading cause of cancer-related death worldwide. Cigarette smoking causes lung cancer in the majority of patients. However, the disease also arises in patients who are lifelong never smokers, i.e., individuals who smoked fewer than 100 cigarettes in their lifetime. In the US, an estimated 10% of lung cancers occur in never smokers. Higher percentages of never smokers, up to 30%, have been observed in East Asian countries. Compared to other lung cancer patients, “never smokers” with lung cancer have a unique clinical course [3, 4]: they have a better prognosis, their tumors are more likely to harbor somatic mutations in the gene encoding the epidermal growth factor receptor (EGFR), and their tumors are more likely to respond to EGFR tyrosine kinase inhibitors [5, 6].
The etiology of lung cancer in never smokers is poorly understood. Second-hand smoke and radon may account for as many as 50% of cases [7, 8]. Outdoor air pollution, cooking oil fumes, coal fumes, and asbestos may also contribute as risk factors. Given the lack of direct exposure to known carcinogens, never smokers with lung cancer may represent a subgroup for which predisposing genetic factors may be prominent and distinct from those of smokers.
Several reports, including recent genome-wide association studies (GWAS), have identified germline genetic variants associated with an increased risk of lung cancer [10–15]. Most variants occur in genes that encode key proteins involved in the metabolism of tobacco-related products, including cytochrome P4501A1, glutathione-S-transferase, myeloperoxidase, or nicotinic acetylcholine receptor (nAchR) subunits [13–15]. The influence of cigarette smoking in these studies has been inconsistent, in part because fewer never smokers were studied. Collection of appropriate samples is limited by the relatively small number of never smokers seen in the U.S. at single institutions.
In the US, there are 210,000 new cases per year of lung cancer [1, 9]. At an incidence of about 10%, there are annually ~21,000 cases of lung cancer in never smokers. We hypothesized that we could facilitate collection of appropriate samples by using an IRB-approved protocol to collect clinical information and blood from never smokers with non-small cell lung cancer (NSCLC) across the US, making use of the Internet for patient recruitment and questionnaire delivery. To illustrate the power of this approach, we used these samples, plus additional samples from sources at two academic institutions, to assess the frequency in never smokers of three germline variants reported to be associated with genetic susceptibility to lung cancer in never smokers: the allele that leads to substitution of methionine for threonine at position 790 (T790M) in EGFR; a recently reported single nucleotide polymorphism (SNP) in intron 1 of EGFR (rs763317) implicated in female never smokers with lung cancer; and the risk alleles rs8034191 and rs1051730 in the nAchR subunits [13–15].
Patient samples from cases were obtained on IRB-approved protocols from three different sources: 1) an Internet-based protocol that collected samples from patients throughout the U.S., 2) Memorial Sloan-Kettering Cancer Center (MSKCC) in New York, NY, and 3) Aichi Cancer Center (ACC) in Nagoya, Japan. DNA specimens from age-matched never-smoker controls without lung cancer (controls) were collected under various IRB-approved protocols at MSKCC (120 cases and 95 controls) or at ACC (125 cases and 247 controls) (Table 1).
A prospective study to collect clinical data and biological specimens from never smokers with lung cancer was approved by the MSKCC Institutional Review Board and recorded in the ClinicalTrials.gov database (NCT00745160). The study opened in September 2008.
Eligible patients 1) were ≥ 18 years old, 2) had histologically and/or cytologically proven NSCLC (adenocarcinoma, squamous cell carcinoma, or large cell carcinoma), and 3) were never smokers. Patients had to complete a screening questionnaire and give written informed consent for participation. Patients were excluded if they had any previous history of invasive malignancy (other than lung cancer), lived outside the United States, or were not English speaking (as protocol documents and consents are only written in English).
Patients at MSKCC were allowed to enroll on the same protocol directly in clinic. A retrospective review of the clinical and pathological characteristics of these patients, including results of routine genotyping for the most common types of EGFR activating mutations, was done with IRB approval.
Patients read about the study via an IRB-approved website (Figure 1) and then provided contact information either through the website or by sending an e-mail requesting information (step 1). A screening kit was sent directly to patients by e-mail or by regular mail. This included a detailed information packet, a screening questionnaire, consent forms, and a letter for release of pathology records (step 2). The questionnaire aimed at determining study eligibility and consisted of a detailed smoking questionnaire, based on the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System Survey Questionnaire, and additional questions regarding diagnosis, sex, age, ethnicity, mutational status of the tumor, treatment history, environmental exposure, and history of previous malignancies. A toll-free number was available for questions about the study. Documents were shipped back to study investigators using a prepaid shipping label (step 3).
Eligible patients were shipped materials for blood collection, including a letter to a local laboratory testing provider, 2 sodium heparinized 10 mL tubes, biohazard bags, and a pre-paid overnight delivery bubble envelope (step 4). For patients who did not have regularly scheduled blood tests, saliva self-collection kits (Oragene OG-250, DNA Genotek, Ottawa, ON) were sent. To ensure patient confidentiality, collection materials were labeled with unique identifiers before kits were sent to patients. Files linking patients’ names and protocol ID codes were accessible to only one person, who was not directly involved in the processing of the samples or subsequently in the data analysis. Blood was drawn as part of a routine venipuncture (step 5), returned to study investigators, and stored at −80°C (step 6).
DNA was extracted from blood and saliva specimens using the Puregene DNA blood kit (Qiagen, Germantown, MD) or standard methods. EGFR intron 1 and exon 20 were analyzed by direct sequencing, using the following primers: ex1-F 5’-AGGGCTGAGAAAGAGAGACA-3’ and ex1-R 5’-TGGGGAGAAAGTTAAAGCTA-3’, and ex20-F 5’-CTCCACAGCCCCAGTGTC-3’ and ex20-R 5’-GGCCAGTGCTGTCTCTAAGG-3’, respectively.
SNPs in the nAchR were genotyped as part of a pilot GWAS. We genotyped 672 DNA samples using the Illumina 610-Quadv1.0 genotyping system. We first removed random samples used to assess the quality of DNA derived from our protocol (see below). Following evaluation of call rate and presence of monomorphs, we removed samples with a call rate below 98%, with 50% identity-by-descent to another sample in the study, and/or with a gender discordance (X homozygosity). SNPs with more than 5% missing genotypes and monomorphic SNPs were also removed. This left us with 624 individuals genotyped at 582,871 SNPs each. We used principal components analysis, as implemented in the EIGENSOFT software package, separately on samples of Asian and European ancestry to remove population outliers and to determine significant principal components to correct for residual population stratification. Among the Asian ancestry samples, all non-outlier samples came from Japan. Three significant principal components were found in the Asian ancestry samples, while five significant principal components were found in the European ancestry samples.
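The call-rate and missingness filters described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names, the toy genotype matrix, and the relaxed thresholds in the demonstration are hypothetical, and the identity-by-descent and gender-discordance checks are omitted.

```python
from typing import List, Optional, Tuple

# One genotype call: 0, 1, or 2 copies of the minor allele; None means no call.
Genotype = Optional[int]


def sample_call_rate(row: List[Genotype]) -> float:
    """Fraction of SNPs with a non-missing genotype for one sample."""
    return sum(g is not None for g in row) / len(row)


def snp_missing_rate(matrix: List[List[Genotype]], j: int) -> float:
    """Fraction of samples with a missing genotype at SNP column j."""
    return sum(row[j] is None for row in matrix) / len(matrix)


def is_monomorphic(matrix: List[List[Genotype]], j: int) -> bool:
    """True when every called genotype at SNP column j is identical."""
    return len({row[j] for row in matrix if row[j] is not None}) <= 1


def qc_filter(
    matrix: List[List[Genotype]],
    min_sample_call_rate: float = 0.98,  # threshold quoted in the text
    max_snp_missing: float = 0.05,       # threshold quoted in the text
) -> Tuple[List[int], List[int]]:
    """Return indices of the samples and SNPs that pass both filters."""
    kept_samples = [
        i for i, row in enumerate(matrix)
        if sample_call_rate(row) >= min_sample_call_rate
    ]
    filtered = [matrix[i] for i in kept_samples]
    if not filtered:
        return kept_samples, []
    kept_snps = [
        j for j in range(len(matrix[0]))
        if snp_missing_rate(filtered, j) <= max_snp_missing
        and not is_monomorphic(filtered, j)
    ]
    return kept_samples, kept_snps


# Hypothetical toy data: 4 samples x 5 SNPs. The thresholds are relaxed here
# because a 98% call rate cannot be shown meaningfully with only 5 SNPs.
toy = [
    [0, 1, 2, 0, 1],
    [0, 2, None, 0, 1],
    [None, None, 1, 0, None],  # low call rate, dropped
    [1, 0, 2, 0, 2],
]
kept_samples, kept_snps = qc_filter(toy, min_sample_call_rate=0.8, max_snp_missing=0.34)
print(kept_samples, kept_snps)
```

In this toy run the third sample is dropped for its low call rate, after which the third and fourth SNPs are dropped as monomorphic among the remaining samples.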
For EGFR exon 20, categorical variables were compared using Fisher’s exact test. Results were considered significant at the 0.05 level. Statistical analyses were performed using SPSS (Chicago, IL), version 17.0.
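For readers unfamiliar with Fisher’s exact test, a minimal sketch of the two-sided version for a 2×2 table follows. It is not the SPSS procedure used in the study; the function name is hypothetical, the table in the demonstration is a made-up example rather than study data, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical table, NOT study data: rows are groups, columns are
# variant present / variant absent.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

With very sparse tables, such as a rare variant seen in only a handful of individuals, the exact test is preferred because the large-sample approximation behind the chi-square test is unreliable.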
For the other SNPs, logistic regression was used to assess allele frequency differences between cases and controls as implemented in the software package PLINK. For the nAchR SNPs, significant principal components were included as covariates to account for population substructure.
To determine the feasibility of extracting high quality DNA from blood stored at room temperature for varying periods of time, we tested blood samples from 10 volunteers processed 24, 48, or 72 hours after venipuncture. In all cases, processing of extracted DNA on Human 610-Quadv1.0 arrays (Illumina, San Diego, CA) resulted in call rates higher than 99.99%.
We allowed blood collection from patients at any time during their treatment course. To ensure that this approach was valid, we demonstrated that SNP discordance was lower than 10⁻⁵ pre- and post-chemotherapy in 4 patients.
The protocol website went online on September 15th 2008. As of May 15th 2009, 122 patients expressed interest in the study via the Internet (n=119) or the phone (n=3). Patients learned about the program through Internet searches (n=49), blogs (n=48), personal doctors (n=15), or non-profit organization newsletters (n=10). A substantial number of patients contacted us after the website was highlighted on cancer-related websites; subsequently, accrual was quite steady. Five patients resided outside the U.S. and were excluded. Five individuals sent email messages only to share personal experiences. Overall, screening documents were sent to 111 patients.
Preferred initial contact with study investigators was email (n=89), regular mail (n=11), and phone call (n=11). Patients were from 28 different states, with the highest representations from Florida (n=19) and California (n=11) (Figure 2).
Sixty-three (57%) patients returned the screening documents. Eligibility was confirmed for 55 (87%) patients. Main reasons for exclusion were 1) a previous history of invasive cancer (n=6), and 2) a histological type other than NSCLC (carcinoid tumor in 1 case, small-cell lung cancer in 1 case). The majority of patients (n=52) had a regular blood test, and saliva collection kits were sent to only 3 patients. As of May 15th 2009, a total of 45 patients returned a blood/saliva specimen. Median time between first email contact and receipt of blood was 33 days (range: 9–200 days).
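The step-by-step retention in this recruitment funnel can be recomputed from the counts given above. The sketch below is only a convenience for checking the arithmetic; the variable and function names are hypothetical, and the last two percentages (specimens returned relative to eligible patients and relative to all kits sent) are derived here from the stated counts rather than quoted from the text.

```python
# Counts taken from the text, in the order patients moved through the protocol.
funnel = [
    ("screening documents sent", 111),
    ("screening documents returned", 63),
    ("eligibility confirmed", 55),
    ("blood/saliva specimen returned", 45),
]


def step_rates(steps):
    """Percentage of each step relative to the step immediately before it."""
    return [
        (name, count, round(100 * count / previous, 1))
        for (_, previous), (name, count) in zip(steps, steps[1:])
    ]


rates = step_rates(funnel)
for name, count, pct in rates:
    print(f"{name}: {count} ({pct}% of previous step)")

# Overall yield: specimens received per screening kit sent.
overall_pct = round(100 * funnel[-1][1] / funnel[0][1], 1)
print(f"overall: {overall_pct}% of patients sent screening documents")
```

The largest single drop occurs between sending and return of the screening documents, which matches the observation in the text that most patients who completed screening went on to provide a specimen.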
During the same period, an additional 79 patients were recruited directly in MSKCC clinics. Clinical characteristics of all 124 patients whose biospecimens were collected on this protocol are shown in Table 1. Of these, 100 were women and 24 were men. Most were white and had tumors with adenocarcinoma histology (98% of cases). Tumor EGFR mutational status was known for 67 patients and consisted of exon 19 deletions in 41 (80%) cases.
Genotyping for the germline EGFR T790M variant (C→T at nucleotide 2369 in exon 20) was performed on DNA from 661 never smokers, including 369 patients with NSCLC, mostly of adenocarcinoma subtype (99% of cases), and 292 controls, mostly of Asian ethnicity (83% of controls). Fifty controls from our cohort (45 white and 5 Asian) were not included in these analyses because too little DNA remained after processing on SNP arrays. Median age was not significantly different in the 2 groups (61 and 62 years; p=0.822, Fisher’s exact test). Results from routine tumor genotyping for EGFR somatic mutations were available for 226 cases, 131 (58%) of whom had an activating EGFR mutation. The T790M variant was identified in 2 (0.54%, 95% CI: 0.21%–1.29%) of the 369 patients with NSCLC, and in none of the 292 controls (p=0.208 by Fisher’s exact test).
SNP rs763317 in intron 1 was investigated for association with lung cancer, adjusting for ethnicity. Due to the limited number of individuals of black or Hispanic ancestry, these individuals were excluded from the analysis. No difference was observed in allele frequency between cases and controls (OR=0.93; 95% CI=0.70–1.2; p=0.58). Similar results were found after removal of the 40 males from the dataset (OR=0.94; 95% CI=0.71–1.2; p=0.67). After restricting the cases to individuals known to have an EGFR mutant tumor, still no association was observed (OR=0.94; 95% CI=0.65–1.4; p=0.73). Similar negative results were found with dominant and recessive models.
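As a reminder of how an odds ratio and its 95% confidence interval relate to allele counts, here is a minimal sketch of the unadjusted allelic odds ratio with a Wald interval. This is a simpler stand-in, not the ethnicity-adjusted logistic regression run in PLINK for the study; the function name is hypothetical and the allele counts are invented for illustration.

```python
from math import exp, log, sqrt
from typing import Tuple


def allelic_odds_ratio(
    case_minor: int, case_major: int, control_minor: int, control_major: int
) -> Tuple[float, float, float]:
    """Unadjusted allelic odds ratio with a Wald 95% confidence interval."""
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = sqrt(
        1 / case_minor + 1 / case_major + 1 / control_minor + 1 / control_major
    )
    z = 1.96  # normal quantile for a two-sided 95% interval
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, NOT study data (each individual contributes
# two alleles).
odds_ratio, ci_low, ci_high = allelic_odds_ratio(120, 280, 100, 300)
print(f"OR = {odds_ratio:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

An interval that contains 1, as in this invented example and in the results reported above, is consistent with no difference in allele frequency between cases and controls.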
Data for two SNPs in the nAchR subunit gene CHRNA3 on chromosome 15q25 (rs8034191 and rs1051730) were extracted from an ongoing pilot GWAS, including 217 of the cases and 342 of the controls of the cohort. This allowed us to consider population substructure within those two ethnic groups, using other data from SNP arrays. The nAchR SNPs were tested for association with lung cancer in the white and Japanese cohorts separately, adjusting for significant principal components in each cohort. Neither rs8034191 nor rs1051730 was associated with lung cancer risk (Table 2).
The first patient with a germline EGFR T790M mutation is a 66-year-old woman of Asian descent (from India) who presented with a 2-cm ground-glass nodule in the left lower lobe, associated with multiple bilateral sub-centimeter ground-glass opacities. The patient’s mother had a history of NSCLC (Figure 3A). Left lower lobectomy revealed mixed adenocarcinoma with acinar and bronchioloalveolar features, harboring a somatic EGFR L858R mutation (Supplemental Figure 1).
The second patient is a 58-year-old man of Eastern European ancestry who was diagnosed with NSCLC with bone and liver metastases. Family history was assessed by an expert genetic counselor (M.R.). The patient had a significant family history of NSCLC; most family members were smokers (Figure 3B). Core needle biopsy of a liver lesion revealed poorly differentiated acinar and solid adenocarcinoma, which was found to contain the EGFR L858R mutation at routine genotyping (Supplemental Figure 1). Unfortunately, tumor histology and mutational status could not be determined for any of the family members with NSCLC.
The identification of genetic risk variants that may predispose certain patients to disease requires analysis of germline DNA from affected cases. In this study, we demonstrate that the Internet can provide a secure, confidential, and convenient way to bolster accrual of target patients from across the country.
Our protocol had a similar design to the Harvard Myeloproliferative Disorders Study. That protocol used the Internet to collect blood from 345 participants over a 1-year period (i.e. 0.41%/month of the 7,000 newly diagnosed cases of myeloproliferative disorders in the U.S.). We had hoped to accrue patients at a similar rate. However, our recruitment was substantially lower, as we only recruited 45 patients over an 8-month period (i.e. 0.04%/month of the 14,000 estimated newly diagnosed cases over the 8-month period). Several factors may have limited enrollment. First, patients with lung cancer may be older at onset and may differ in socio-economic characteristics from those who develop myeloproliferative diseases; these factors may ultimately reduce these patients’ familiarity and comfort with, or access to, the Internet. Socioeconomic factors have previously been reported to significantly influence the completion of an Internet-based behavioral study in smokers and may also exist in never smokers. Second, disease-free survival in NSCLC is far shorter than in myeloproliferative disorders, limiting the extra time and effort patients are willing to allot to research. This reason was given by several patients who received screening documents but did not return them. Interestingly, the majority of patients who signed the consent form ultimately completed the questionnaire, had their blood drawn, and sent the blood specimens. This success rate suggests that study participation was not overly complicated and that a more significant limitation of the study was awareness. Thus, in the future, we will expand efforts to connect with individuals appropriate to this study, through patients’ groups, development of collaboration with other institutions, and presentation of the study at oncology meetings.
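The monthly accrual rates compared above follow from the enrollment counts, the study durations, and the estimated numbers of newly diagnosed cases. A minimal sketch of that arithmetic is given below; the function and variable names are hypothetical, and the eligible population for this study is taken as 10% of 210,000 annual US lung cancer cases, prorated to 8 months, as estimated earlier in the text.

```python
def monthly_accrual_pct(enrolled: int, months: int, eligible_in_period: float) -> float:
    """Percent of newly diagnosed eligible cases enrolled per month."""
    return 100 * enrolled / eligible_in_period / months


# Harvard Myeloproliferative Disorders Study: 345 participants in 12 months,
# out of about 7,000 newly diagnosed cases per year.
harvard_rate = monthly_accrual_pct(345, 12, 7_000)

# This study: 45 patients in 8 months. Eligible cases are estimated as 10% of
# 210,000 annual lung cancer cases, prorated to the 8-month window.
eligible_8_months = 210_000 * 0.10 * 8 / 12
this_study_rate = monthly_accrual_pct(45, 8, eligible_8_months)

print(f"eligible never-smoker cases in 8 months: {eligible_8_months:,.0f}")
print(f"Harvard study: {harvard_rate:.2f}%/month")
print(f"this study:    {this_study_rate:.2f}%/month")
```

Expressing both studies per month and per eligible case makes the roughly tenfold gap in accrual directly comparable despite the different study durations and disease incidences.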
While somatic EGFR T790M mutations are common in patients with acquired resistance to EGFR inhibitors, germline EGFR T790M mutations are rare. This variant was initially reported in 2005 in a family of European descent, in which 6 family members in 3 generations had lung cancer. We found the germline EGFR T790M variant in 2 of 369 cases (0.54%) of never smokers with NSCLC; both patients had family histories significant for lung cancer. In a separate study, no germline mutation was identified in a series of 237 individuals with 3 or more first-degree relatives with lung cancer and 32 patients with bronchioloalveolar carcinoma. One germline EGFR T790M mutation was identified in a cohort of 240 patients with previously untreated lung adenocarcinoma. Including our 2 patients, a total of 5 patients have now been reported to have the germline EGFR T790M variant (Table 3). In these patients, lung cancer was diagnosed from age 50 to 72 years and was metastatic to the lung or other locations at the time of diagnosis. At least three patients were never smokers. Histology was adenocarcinoma and/or bronchioloalveolar carcinoma in all cases. EGFR was genotyped in all 5 germline EGFR T790M cases. Somatic EGFR activating mutations, which are known to be more frequent in never smokers with lung adenocarcinoma, were identified in four of the five patients. Collectively, these data indicate that a germline T790M alteration is a genetic risk variant for lung cancer in never smokers. However, tumors develop only after a relatively long latency, suggesting that other genetic alterations, such as additional EGFR mutations, may collaborate with T790M.
We were unable to confirm a reported association between rs763317 in the EGFR gene and lung cancer. In the previous report, female never smokers with lung adenocarcinoma heterozygous at the locus had a 1.2-fold increased risk of lung cancer relative to individuals homozygous for the G allele. Individuals homozygous for the A allele were 3.6 times more likely to develop lung adenocarcinoma. Based on this effect size and the frequency of this SNP in our control population, we estimated 82% power to detect this association in our Asian cohort, and 96% power to detect the association in our white cohort. Therefore, our failure to replicate this association is unlikely to be due to low power. One possible explanation is that this association was a false positive. Alternatively, the initial report could have over-estimated the effect size at this locus, due to the "winner's curse," in which case our power to replicate the association would be lower than we calculated.
We were also unable to replicate the previously reported associations between SNPs rs8034191 and rs1051730 in CHRNA3 and lung cancer risk [13–15]. Unlike the association reported for EGFR rs763317, these associations are much more modest. Moreover, the minor allele frequency in Asian populations at these SNPs is much lower than in the European populations in which the association was first discovered. Our power to detect an association in our cohort, therefore, was limited. However, as it has been suggested that these SNPs affect smoking behavior as well as lung cancer risk, and as we focused on never smokers with lung cancer, we cannot discount the possibility that these SNPs only have an effect in smokers (or influence smoking behavior) and are truly not associated with lung cancer in never smokers.
To conclude, this study shows the feasibility of bolstering accrual for germline predisposition studies in never smokers with lung cancer by collecting clinical information and blood sample specimens from appropriate patients throughout the nation using an Internet-based protocol. We plan in future studies to conduct genome-wide association studies with more cases and controls to identify genetic risk variants in this disease population.
This study examines 3 genetic variants reported to be associated with genetic susceptibility to lung cancer in US and Japanese never smokers with the disease. We collected germline DNA from a large cohort of American and Japanese never smokers with or without lung cancer. We used a dedicated Internet-based protocol to bolster accrual from patients throughout the U.S. Contrary to previous studies including smokers, no differences were observed in EGFR rs763317 or CHRNA3 (rs8034191/rs1051730) frequencies between cases and controls. The CHRNA3 loci are likely to be associated with smoking addiction rather than the development of lung cancer. We did identify a germline EGFR T790M variant in 2 of 369 cases (0.54%). This study validates the EGFR T790M as a rare lung cancer risk variant in never smokers and illustrates the usefulness of the Internet in recruiting patients throughout the country for minimal risk studies.
We thank R. Levine for protocol assistance and for reviewing the manuscript; K. Liao for processing specimens on the Illumina platform; T. Aliff for referring patients; H. West for discussing the study on his lung cancer blog; and S. Mantel for discussing this study in the monthly newsletter of Joan's Legacy: Uniting Against Lung Cancer. We are grateful to all the patients who participated.
Financial support: This study was supported by the HOPP Lung Cancer Research Fund (WP), the Rosalind Warren Memorial Fund (WP), the MSKCC Geoffrey Beene Cancer Research Center (WP), the NIH/NCI R01-CA121210 (WP), the Society of MSKCC (IO), Steps for Breath (IO), and the Labrecque Foundation (IO). Services provided by the Genomics Core Facility were partially supported by an NCI CCSG award (P30-CA008748). NG is a recipient of travel grants from the College des Professeurs de Pneumologie (CEP)/AstraZeneca, the National Federation of French Comprehensive Cancer Centers/Fondation de France, and the Philippe Foundation.