A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF<0.1) non-synonymous SNPs (nsSNPs) associated with “mechanistic phenotypes”, comprised of collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach whereby nsSNPs were first sorted by the strength of their association with a phenotype. We tested associations using two reverse genetic models and standard additive and recessive models. In the second step, we employed a hypothesis-free ontological enrichment analysis using the sorted nsSNPs to identify functional mechanisms underlying the diagnoses comprising the mechanistic phenotypes. The thrombosis phenotype was solely associated with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2×10−5, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L) while the additive model showed enrichment related to chromatid segregation (p = 4×10−6, FDR p = 0.005) (KIF25, PINX1). We were able to replicate nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
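The two-step approach described above (rank nsSNPs by association strength, then test the top-ranked set for ontological enrichment) can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test computed from the hypergeometric distribution and one p-value per gene; the function names, gene labels, and p-values are hypothetical and are not taken from the study.

```python
from math import comb

def fisher_enrichment_p(top_in_term, top_total, bg_in_term, bg_total):
    """One-sided Fisher's exact p-value: the probability of observing at
    least `top_in_term` annotated genes among `top_total` top-ranked genes,
    when `bg_in_term` of the `bg_total` tested genes carry the annotation."""
    max_k = min(top_total, bg_in_term)
    tail = sum(
        comb(bg_in_term, k) * comb(bg_total - bg_in_term, top_total - k)
        for k in range(top_in_term, max_k + 1)
    )
    return tail / comb(bg_total, top_total)

def rank_then_enrich(assoc_p, term_genes, top_n):
    """Step 1: sort genes by association p-value (strongest first).
    Step 2: test the top_n genes for enrichment in one ontology term."""
    ranked = sorted(assoc_p, key=assoc_p.get)
    top = set(ranked[:top_n])
    background_hits = len(term_genes & set(assoc_p))
    hits = len(top & term_genes)
    return hits, fisher_enrichment_p(hits, len(top), background_hits, len(assoc_p))

# Hypothetical association p-values for six genes (illustration only).
toy_assoc = {"GENE_A": 0.001, "GENE_B": 0.002, "GENE_C": 0.5,
             "GENE_D": 0.6, "GENE_E": 0.7, "GENE_F": 0.9}
toy_term = {"GENE_A", "GENE_B"}  # hypothetical ontology term membership
hits, p_value = rank_then_enrich(toy_assoc, toy_term, top_n=2)
```

In a real analysis this test would be repeated over many ontology terms, which is why the abstract reports FDR-adjusted p-values alongside the raw Fisher's p-values.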
To evaluate the association between variants in the prostaglandin F2α receptor (PTGFR) and solute carrier organic anion transporter family 2A1 (SLCO2A1) genes and intraocular pressure (IOP) response to prostaglandin analogs.
The medical records of subjects with previously diagnosed open angle glaucoma or ocular hypertension were searched for intraocular pressure measurements before and after prescriptions of prostaglandin analogs. Stored DNA samples were genotyped for the following SNPs: rs3753380 (promoter region) and rs3766355 (intronic region) of the prostaglandin F2α receptor gene, and rs34550074 (Ala396Thr) of SLCO2A1. The mean change in IOP by genotype was measured.
Prostaglandin analogs were prescribed to 267 subjects; 242 (204 right eyes, 205 left eyes) met the inclusion/exclusion criteria for the current study. There was no significant association between genotype and IOP response to prostaglandin analogs (p=0.48, p=0.54, p=0.90).
In summary, we found no indication for an association between SNPs in the prostaglandin F2α receptor gene or SLCO2A1 and IOP response to prostaglandin analogs in a population of European descent.
glaucoma; pharmacogenetics; intraocular pressure; prostaglandin analog
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in the development of methods and best-practices for utilizing the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year, its second funding cycle and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network since its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research
There is increasing interest in adding common genetic variants
identified through genome wide association studies (GWAS) to breast cancer
risk prediction models. First results from such models showed modest
benefits in terms of risk discrimination. Heterogeneity of breast cancer as
defined by hormone-receptor status has not been considered in this context.
In this study we investigated the predictive capacity of 32 GWAS-detected
common variants for breast cancer risk, alone and in combination with
classical risk factors, and for tumors with different hormone receptor
Material and Methods
Within the Breast and Prostate Cancer Cohort Consortium (BPC3), we
analyzed 6009 invasive breast cancer cases and 7827 matched controls of
European ancestry, with data on classical breast cancer risk factors and 32
common gene variants identified through GWAS. Discriminatory ability with
respect to breast cancer of specific hormone receptor-status was assessed
with the age- and cohort-adjusted concordance statistic
(AUROCa). Absolute risk scores were
calculated with external reference data. Integrated discrimination
improvement (IDI) was used to measure improvements in risk prediction.
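
As an illustration of the last measure, the sketch below computes the integrated discrimination improvement in its commonly used form: the mean change in predicted risk among cases minus the mean change among controls. The function name and the risk values are hypothetical, and the study's own computation (including any adjustment) may differ.

```python
from statistics import mean

def integrated_discrimination_improvement(old_risk, new_risk, is_case):
    """IDI = (mean risk change in cases) - (mean risk change in controls).
    A positive value means the new model separates cases from controls
    better than the old model does."""
    case_gain = mean(n - o for o, n, c in zip(old_risk, new_risk, is_case) if c)
    control_gain = mean(n - o for o, n, c in zip(old_risk, new_risk, is_case) if not c)
    return case_gain - control_gain

# Hypothetical predicted risks for two cases and two controls.
old = [0.20, 0.30, 0.20, 0.10]   # model with classical risk factors only
new = [0.30, 0.40, 0.15, 0.10]   # model with genetic variants added
case_flags = [True, True, False, False]
idi_value = integrated_discrimination_improvement(old, new, case_flags)
```
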
We found a small but steady increase in discriminatory ability with
increasing numbers of genetic variants included in the model (difference in
AUROCa going from 2.7 to 4%). Discriminatory ability
for all models varied strongly by hormone receptor status.
Discussion and Conclusion
Adding information on common polymorphisms provides small but
statistically significant improvements in the quality of breast cancer risk
prediction models. We consistently observed better performance for receptor
positive cases, but the gain in discriminatory quality is not sufficient for
breast cancer; risk prediction; genetic factors; hormone receptor status
Only one LDL-C GWAS has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets; one dataset having median LDL-C calculated after the exclusion of some lab values based on co-morbidities and medication (n = 618) and another dataset having median LDL-C calculated without any exclusions (n = 1249). Rs7412 in APOE was strongly associated with LDL-C at levels of GWAS significance in both datasets (p < 5 X 10−8). In the dataset with exclusions, a decrease of 20.0 mg/dl per minor allele was observed. The effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has not been well detected in large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL-C extracted from EMR after exclusions for medications and co-morbidities increased the percentage of trait variance explained by genetic variation.
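The two analytic datasets differ only in whether flagged lab values are dropped before the per-subject median is taken. A minimal sketch of that step follows, assuming each measurement carries a boolean flag marking it as affected by a co-morbidity or medication; the flagging rule itself, the function name, and the values are hypothetical.

```python
from statistics import median

def median_ldl(measurements, apply_exclusions):
    """Per-subject median LDL-C in mg/dl.

    measurements: list of (ldl_mg_dl, flagged) pairs, where `flagged` marks a
    value taken under a condition or medication that would exclude it.
    Returns None when no values remain after exclusions."""
    values = [v for v, flagged in measurements if not (apply_exclusions and flagged)]
    return median(values) if values else None

# Hypothetical lab history for one subject: two values are flagged.
labs = [(130, False), (95, True), (140, False), (90, True)]
with_exclusions = median_ldl(labs, apply_exclusions=True)
without_exclusions = median_ldl(labs, apply_exclusions=False)
```

A subject whose values are all flagged returns None and would drop out of the dataset with exclusions, which is consistent with that dataset being smaller (n = 618 versus n = 1249).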
GWAS; LDL; electronic medical records
Statin use can be accompanied by a variety of musculoskeletal complaints. We describe the clinical characteristics of case subjects experiencing adverse statin-induced musculoskeletal symptoms within a large, population based cohort in Central Wisconsin. Case status was determined based upon elevated serum creatine kinase (CK) levels and the presence of at least one physician note reflecting an increased index of suspicion for statin intolerance. From the medical records of nearly 2 million unique patients, we identified more than 20,000 potential study subjects (∼1%) having CK data and at least one exposure to a statin drug. Manual screening was completed on 2,227 subjects with CK levels in the upper 10th percentile. Of those screened, 267 met inclusion criteria (12.0% eligibility), and 218 agreed to participate in a retrospective study characterizing the risk determinants of statin-induced muscle toxicity. Three categorical pain variables were graded retrospectively (distribution, location, and severity of pain). The presenting complaints of these case subjects were extremely heterogeneous. The number of subjects with a compelling pain syndrome (diffuse, proximal muscle pain of high intensity) increased at higher serum CK levels; the number of subjects with indeterminate pain variables decreased at higher serum CK levels. The lines reflecting these relationships cross at a CK level of approximately 1,175 U/l, approximately half the threshold level needed to make a clinical diagnosis of “myopathy” (i.e., CK > 10-fold upper limit).
Return of individual genetic results to research participants, including participants in archives and biorepositories, is receiving increased attention. However, few groups have deliberated on specific results or weighed deliberations against relevant local contextual factors.
The Electronic Medical Records and GEnomics (eMERGE) network, which includes five biorepositories conducting genome-wide association studies, convened a Return of Results Oversight Committee (RROC) to identify potentially returnable results. Network-wide deliberations were then brought to local constituencies for final decision-making.
Defining results that should be considered for return required input from clinicians with relevant expertise and much deliberation. The RROC identified two sex chromosomal anomalies, Klinefelter Syndrome and Turner Syndrome, as well as homozygosity for Factor V Leiden, as findings that could warrant reporting. Views about returning HFE gene mutations associated with hemochromatosis were mixed due to low penetrance. Review of EMRs suggested that most participants with detected abnormalities were unaware of these findings. Local considerations relevant to return varied and, to date, four sites have elected not to return findings (return was not possible at one site).
The eMERGE experience reveals the complexity of return of results decision-making and provides a potential deliberative model for adoption in other collaborative contexts.
Result return; biorepository; electronic medical records; deliberation; context
The feasibility of using imperfectly phenotyped “silver standard” samples identified from electronic medical record diagnoses is considered in genetic association studies when these samples might be combined with an existing set of samples phenotyped with a gold standard technique. An analytic expression is derived for the power of a chi-square test of independence using either research-quality case/control samples alone, or augmented with silver standard data. The subset of the parameter space where inclusion of silver standard samples increases statistical power is identified. A case study of dementia subjects identified from electronic medical records from the Electronic Medical Records and Genomics (eMERGE) network, combined with subjects from two studies specifically targeting dementia, verifies these results.
Although the penetration of electronic health records is increasing rapidly, much of the historical medical record is only available in handwritten notes and forms, which require labor-intensive, human chart abstraction for some clinical research. The few previous studies on automated extraction of data from these handwritten notes have focused on monolithic, custom-developed recognition systems or third-party systems that require proprietary forms.
We present an optical character recognition processing pipeline, which leverages the capabilities of existing third-party optical character recognition engines, and provides the flexibility offered by a modular custom-developed system. The system was configured and run on a selected set of form fields extracted from a corpus of handwritten ophthalmology forms.
The processing pipeline allowed multiple configurations to be run, with the optimal configuration consisting of the Nuance and LEADTOOLS engines running in parallel with a positive predictive value of 94.6% and a sensitivity of 13.5%.
While limitations exist, preliminary experience from this project yielded insights on the generalizability and applicability of integrating multiple, inexpensive general-purpose third-party optical character recognition engines in a modular pipeline.
We tested the hypothesis that genes involved in the alcohol oxidation pathway modify the association between alcohol intake and breast cancer.
Subjects were women aged 55–74 at baseline from the screening arm of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Incident breast cancers were identified through annual health surveys. Controls were frequency matched to cases by age and year of entry into the trial. A self-administered food frequency questionnaire queried frequency and usual serving size of beer, wine or wine coolers and liquor. Three SNPs in genes in the alcohol metabolism pathway were genotyped: alcohol dehydrogenase 2, alcohol dehydrogenase 3 and CYP2E1.
The study included 1041 incident breast cancer cases and 1070 controls. In comparison to non-drinkers, the intake of any alcohol significantly increased the risk of breast cancer, and this risk increased with each category of daily alcohol intake (OR=2.01, 95% CL=1.14, 3.53 for women who drank three or more standard drinks per day). Stratification by genotype revealed significant gene/environment interactions. For the ADH1B gene, there were statistically significant associations between all levels of alcohol intake and risk of breast cancer (all OR>1.34 and all lower CL >1.01), while for women with the GA or AA genotype, there were no significant associations between alcohol intake and risk of breast cancer.
Alcohol intake, genes involved in alcohol metabolism and their interaction increase the risk of breast cancer in post-menopausal women.
This information could be useful for primary care providers to personalize information about breast cancer risk reduction.
breast cancer; alcohol; metabolizing enzyme; genetics; risk factors
Feature screening is a useful feature selection approach for high-dimensional data when the goal is to identify all the features relevant to the response variable. However, common feature screening methods do not take into account the correlation structure of the covariate space. We propose the concept of a feature relevance network, a binary Markov random field to represent the relevance of each individual feature by potentials on the nodes, and represent the correlation structure by potentials on the edges. By performing inference on the feature relevance network, we can accordingly select relevant features. Our algorithm does not yield sparsity, which is different from the particular popular family of feature selection approaches based on penalized least squares or penalized pseudo-likelihood. We give one concrete algorithm under this framework and show its superior performance over common feature selection methods in terms of prediction error and recovery of the truly relevant features on real-world data and synthetic data.
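To make the idea of a feature relevance network concrete, the toy sketch below builds a binary Markov random field in which node potentials encode each feature's individual relevance and edge potentials reward correlated features for sharing the same relevance state. It is not the algorithm proposed in the paper: the potential form is an assumption, and inference is done by exhaustive enumeration, which is only feasible for a handful of features.

```python
from itertools import product
from math import exp

def relevance_marginals(node_score, edge_weight):
    """Marginal probability that each feature is relevant (state 1).

    node_score[i]: log-potential added when feature i is relevant.
    edge_weight[(i, j)]: log-potential added when features i and j agree.
    Enumerates all 2**n configurations, so use only for small n."""
    n = len(node_score)
    total = 0.0
    relevant_mass = [0.0] * n
    for x in product((0, 1), repeat=n):
        score = sum(node_score[i] * x[i] for i in range(n))
        score += sum(w for (i, j), w in edge_weight.items() if x[i] == x[j])
        weight = exp(score)
        total += weight
        for i in range(n):
            if x[i]:
                relevant_mass[i] += weight
    return [m / total for m in relevant_mass]

# Feature 0 has strong individual evidence; feature 1 has none of its own.
independent = relevance_marginals([2.0, 0.0], {})
correlated = relevance_marginals([2.0, 0.0], {(0, 1): 1.5})
```

In this toy setting, feature 1 has a marginal of 0.5 when treated independently, and the edge to the strongly relevant feature 0 raises it. That is the sense in which the correlation structure informs screening.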
To determine whether polymorphisms in the sulfonamide detoxification genes, CYB5A (encoding cytochrome b5), CYB5R3 (encoding cytochrome b5 reductase), or NAT2 (encoding N-acetyltransferase 2) were over-represented in patients with delayed sulfonamide drug hypersensitivity, compared to control patients who tolerated a therapeutic course of trimethoprim-sulfamethoxazole without an adverse event.
DNA samples from 99 non-immunocompromised patients with sulfonamide hypersensitivity who were identified from the Personalized Medicine Research Project at the Marshfield Clinic, and from 99 age-, race-, and gender-matched drug-tolerant controls, were genotyped for four CYB5A and five CYB5R3 polymorphisms, and for all coding NAT2 SNPs.
CYB5A and CYB5R3 SNPs were found at low allele frequencies (less than 3–4%), which did not differ between hypersensitive and tolerant patients. NAT2 allele and haplotype frequencies, as well as inferred NAT2 phenotypes, also did not differ between groups (60% vs. 59% slow acetylators). Finally, no difference in NAT2 status was found in a subset of patients with more severe hypersensitivity signs (drug reaction with eosinophilia and systemic symptoms; DRESS) compared to tolerant patients.
We found no evidence for a substantial involvement of these 9 CYB5A or CYB5R3 polymorphisms in sulfonamide hypersensitivity risk, although minor effects cannot be completely ruled out. Despite careful medical record review and full re-sequencing of the NAT2 coding region, we found no association of NAT2 coding alleles with sulfonamide hypersensitivity (predominantly cutaneous eruptions) in this adult Caucasian population.
sulfamethoxazole; potentiated sulfonamides; drug hypersensitivity; N-acetyltransferase; cytochrome b5; hydroxylamine
Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using the Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10−4 associated with cataract status. 
Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
Cancer Research Network (CRN) sites use administrative data to populate their Virtual Data Warehouse (VDW). However, information on VDW chemotherapy data validity is limited. The purpose of this study was to assess the validity of VDW chemotherapy data.
This was a retrospective, cohort study of women ≥18 years with incident, invasive breast cancer diagnosed between January 1999 and December 2007. Pharmacy and procedure chemotherapy data were extracted from each site’s VDW. Random samples of 50 patients stratified on trastuzumab, anthracyclines, and no chemotherapy exposure were selected from each site for detailed chart abstraction. Weighted sensitivities and specificities of VDW compared to abstracted data were calculated. Cumulative doses calculated from VDW data were compared to doses obtained from the medical chart review.
The cohort included 13497 patients with 6456 (48%) chart-review eligible. Patients in the sample (N=400) had a mean age of 65 years. Trastuzumab, anthracycline, and other chemotherapy weighted sensitivities were 95%, 97%, and 100%, respectively; specificities were 99%, 99%, and 93%, respectively; positive predictive values were 96%, 99%, and 55%, respectively; and negative predictive values were 99%, 96%, and 100%, respectively. Trastuzumab and anthracyclines VDW mean doses were 873 mg and 386 mg, respectively, while abstracted mean doses were 1734 mg and 369 mg, respectively (R²=0.14, p<0.01 and R²=0.05, p=0.03, respectively).
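The four validity measures reported above all derive from a 2×2 table comparing VDW data with chart abstraction. The sketch below shows the unweighted definitions only; the study reports weighted estimates that account for stratified sampling, which this sketch does not reproduce. The function name and all counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Unweighted validity measures from a 2x2 table, with chart
    abstraction as the reference standard.

    tp: exposure in both sources; fp: in administrative data only;
    fn: in chart only; tn: in neither."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts for a rarely recorded exposure.
rare_exposure = diagnostic_metrics(tp=10, fp=8, fn=0, tn=100)
```

With these hypothetical counts, sensitivity is 1.0 and specificity is about 0.93, yet the positive predictive value is only about 0.56. This shows how a category can combine high sensitivity and specificity with a low positive predictive value when true exposures are few.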
Sensitivities and specificities for CRN chemotherapy VDW data were high and dosages were correlated with chart information.
The findings support the use of CRN data in evaluating chemotherapy exposures and related outcomes.
chemotherapy; sensitivity and specificity; data retrieval; data quality; breast cancer
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient re-use of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute (NHGRI)-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of fourteen phenotypes for extraction of study samples from each site’s DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research (CIDR) using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample quality, marker quality, and various batch effects. Upon completion of the genotyping and QC analyses for each site’s primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset re-entered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to the eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II and also serve as a starting point for investigators merging multiple genotype data sets accessible through the National Center for Biotechnology Information (NCBI) in the database of Genotypes and Phenotypes (dbGaP). Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
quality control; genome-wide association (GWAS); eMERGE; dbGaP; merging datasets
There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts.
Materials and methods
We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions.
An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy.
A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents.
We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.
Cataract; electronic health record; intelligent character recognition; natural language processing; phenotyping; bioinformatics; NLP; information systems; software engineering; clinical research informatics; natural-language processing; linking the genotype and phenotype; improving the education and skills training of health professionals; translational research; application of biological knowledge to clinical care; genomics; pharmacogenomics; genome wide association studies; clinical phenotyping; medical informatics; infection control
To establish a well-defined cohort for genetic epidemiology studies of endometriosis and conduct a pilot study to confirm validity using existing data associated with endometriosis.
Between January and May 2010, a nested cohort within a population-based biobank was established in Marshfield, Wisconsin, USA. The inclusion criteria were women who had laparoscopy or hysterectomy. Fifty-one pleiotropic genetic polymorphisms and other established risk factors, such as smoking status and body mass index, were compared between endometriosis cases and controls.
From the existing biobank, 796 cases and 501 controls were identified, and 259 women with endometriosis were enrolled specifically for the nested cohort within this biobank. A single nucleotide polymorphism in the MMP1 gene significantly differed between cases and controls only when stratified by smoking status. Minor allele frequency was higher in control women who smoked than in women with endometriosis who smoked (55.5% versus 45.5%, χ2=8.2, P=0.017); the inverse relationship was found in non-smoker control women.
Women with endometriosis were successfully recruited to participate in a general biobank, and a novel gene–environment interaction was identified. The findings suggest that important potential genetic associations may be missed if gene–environment interactions with known epidemiologic risk factors are not considered.
Biobank; Endometriosis; Epidemiology; Genetics; Smoking
Hypovitaminosis D may be associated with diabetes, hypertension and coronary heart disease (CHD). However, because studies examining associations of all three chronic conditions with circulating 25(OH)D and 1,25(OH)2D are limited, we examined these associations in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial (n=2,465).
Research Design and Methods
Caucasian PLCO participants selected as controls in previous nested case-control studies of 25(OH)D and 1,25(OH)2D were included in this analysis. Diabetes, CHD and hypertension prevalence, risk factors for these conditions, and intake of vitamin D and calcium were collected from a base-line questionnaire.
Serum levels of 25(OH)D were low (<50 nmol/L) in 29% and very low (<37 nmol/L) in 11% of subjects. The prevalences of diabetes, hypertension and CHD were 7%, 30% and 10%, respectively. After adjustment for confounding by gender, geographical location, educational level, smoking history, body mass index (BMI), physical activity, total dietary energy and vitamin D and Ca intake, only diabetes was significantly associated with lower 25(OH)D and 1,25(OH)2D levels. Caucasians who had 25(OH)D ≥80 nmol/L were half as likely to have diabetes [odds ratio (OR) =0.5 (95% CI=0.3-0.9)] compared to those who had 25(OH)D < 37 nmol/L. Those in the highest quartile of 1,25(OH)2D (≥103 pmol/L) were less than half as likely to have diabetes [OR= 0.3 (95% CI=0.1-0.7)] than those in the lowest quartile (< 72 pmol/L).
The independent association of 25(OH)D and 1,25(OH)2D with diabetes prevalence in a large population is a new finding that warrants confirmation in larger, prospective studies.
Diabetes; Vitamin D status; 25(OH)D; 1,25(OH)2D
Common polymorphisms in the N-acetyltransferase 2 gene (NAT2) modify the association between cigarette smoking and bladder cancer and have been hypothesized to determine whether active cigarette smoking increases breast cancer risk. The authors sought to replicate the latter hypothesis in a prospective analysis of 6,900 breast cancer cases and 9,903 matched controls drawn from 6 cohorts (1989–2006) in the National Cancer Institute’s Breast and Prostate Cancer Cohort Consortium. Standardized methods were used to genotype the 3 most common polymorphisms that define NAT2 acetylation phenotype (rs1799930, rs1799931, and rs1801280). In unconditional logistic regression analyses, breast cancer risk was higher in women with more than 20 pack-years of active cigarette smoking than in never smokers (odds ratio (OR) = 1.28, 95% confidence interval (CI): 1.17, 1.39), after controlling for established risk factors other than alcohol consumption and physical inactivity. However, associations were similar for the slow (OR = 1.25, 95% CI: 1.11, 1.39) and rapid/intermediate (OR = 1.24, 95% CI: 1.08, 1.42) acetylation phenotypes, with no evidence of interaction (P = 0.87). These results provide some support for the hypothesis that long-term cigarette smoking may be causally associated with breast cancer risk but underscore the need for caution when interpreting sparse data on gene-environment interactions.
arylamine N-acetyltransferase; breast neoplasms; NAT2 protein, human; polymorphism, single nucleotide; smoking
We conducted a genome-wide association study (GWAS) of breast cancer by genotyping 528,173 single nucleotide polymorphisms (SNPs) in 1,145 cases of invasive breast cancer among postmenopausal white women, and 1,142 controls. We identified a set of four SNPs in intron 2 of FGFR2, a tyrosine kinase receptor previously shown to be amplified and/or over-expressed in some breast cancers, as highly associated with breast cancer and we confirmed this association in 1,776 cases and 2,072 controls from three additional studies. In both association testing and ancestral recombination graph analysis, FGFR2 haplotypes were associated with risk of breast cancer. Across the four studies the association with all four SNPs was highly statistically significant (Ptrend for the most strongly associated SNP, rs1219648 = 1.1 × 10−10; population attributable risk = 16%). Four SNPs at other chromosomal loci most strongly associated with breast cancer in the initial GWAS were not associated with risk in the three replication studies. Our summary results from the GWAS are freely available online in a form that should speed the identification of additional loci conferring risk.
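For readers unfamiliar with population attributable risk, the sketch below uses Levin's formula for a single binary exposure, PAR = p(RR − 1) / (1 + p(RR − 1)), where p is the exposure prevalence and RR the relative risk. This is an assumption for illustration: the abstract does not state how its 16% figure was derived, and a genotype-based calculation would typically combine heterozygote and homozygote risks. The inputs shown are hypothetical.

```python
def levin_par(exposure_prevalence, relative_risk):
    """Population attributable risk by Levin's formula for one binary exposure."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical inputs: 25% of the population exposed, relative risk of 2.
par_example = levin_par(0.25, 2.0)
```
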
To identify common genetic variants influencing red blood cell (RBC) traits.
Patients and Methods
We performed a genomewide association study from June 2008 through July 2011 of hemoglobin, hematocrit, RBC count, mean corpuscular volume, mean corpuscular hemoglobin, and mean corpuscular hemoglobin concentration in 12,486 patients of European ancestry from the electronic MEdical Records and GEnomics (eMERGE) network. We developed an electronic medical record–based algorithm that included individuals who had RBC measurements obtained for clinical care and excluded values measured in the setting of hematopoietic disorders, comorbid conditions, or medications known to affect RBC production or a recent history of blood loss.
We identified 4 new genetic loci and replicated 11 loci previously reported to be associated with one or more RBC traits in individuals of European ancestry. Notably, genes present in 3 of the 4 newly identified loci (THRB, PTPLAD1, CDT1) and in 6 of the 11 replicated loci (KLF1, ALDH8A1, CCND3, SPTA1, FBXO7, TFR2/EPO) are implicated in erythroid differentiation and regulation of cell cycle in hematopoietic stem cells.
Genes in the erythroid differentiation and cell cycle regulation pathways influence interindividual variation in RBC indices. Our results provide insights into the molecular basis underlying variation in RBC traits.
eMERGE, electronic MEdical Records and GEnomics; EMMAX, mixed-model association-expedited; EMR, electronic medical record; eQTL, expression quantitative trait locus; GHC, Group Health Cooperative–University of Washington; GWAS, genomewide association study; HCT, hematocrit; HGB, hemoglobin; IBS, identity-by-state; LD, linkage disequilibrium; MC, Marshfield Clinic; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MCV, mean corpuscular volume; MIM, Mendelian Inheritance in Man; NU, Northwestern University; RBC, red blood cell; SNP, single-nucleotide polymorphism; VUMC, Vanderbilt University Medical Center
Clinical trials demonstrated that women treated for breast cancer with anthracycline or trastuzumab are at increased risk for heart failure and/or cardiomyopathy (HF/CM), but the generalizability of these findings is unknown. We estimated real-world adjuvant anthracycline and trastuzumab use and their associations with incident HF/CM.
We conducted a population-based, retrospective cohort study of 12 500 women diagnosed with incident, invasive breast cancer from January 1, 1999 through December 31, 2007, at eight integrated Cancer Research Network health systems. Using administrative procedure and pharmacy codes, we identified anthracycline, trastuzumab, and other chemotherapy use. We identified incident HF/CM following chemotherapy initiation and assessed risk of HF/CM with time-varying chemotherapy exposures vs no chemotherapy. Multivariable Cox proportional hazards regression models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) with adjustment for age at diagnosis, stage, Cancer Research Network site, year of diagnosis, radiation therapy, and comorbidities.
Among 12 500 women (mean age = 60 years, range = 22–99 years), 29.6% received anthracycline alone, 0.9% received trastuzumab alone, 3.5% received anthracycline plus trastuzumab, 19.5% received other chemotherapy, and 46.5% received no chemotherapy. Anthracycline and trastuzumab recipients were younger, with fewer comorbidities than recipients of other chemotherapy or none. Compared with no chemotherapy, the risk of HF/CM was higher in patients treated with anthracycline alone (adjusted HR = 1.40, 95% CI = 1.11 to 1.76), although the increased risk was similar to other chemotherapy (adjusted HR = 1.49, 95% CI = 1.25 to 1.77); the risk was highly increased in patients treated with trastuzumab alone (adjusted HR = 4.12, 95% CI = 2.30 to 7.42) or anthracycline plus trastuzumab (adjusted HR = 7.19, 95% CI = 5.00 to 10.35).
Anthracycline and trastuzumab were primarily used in younger, healthier women and associated with increased HF/CM risk compared with no chemotherapy. This population-based observational study complements findings from clinical trials on cancer treatment safety.
A number of organizations have employed a consultative process with the vision community to engage relevant parties in identifying needs and opportunities for vision research. The National Eye Institute in the US and the European Commission are currently undergoing consultation to develop priorities for vision research. Once these priorities have been established, the challenge will be to identify the resources to advance these research agendas. Success rates for Federal funding for research have decreased recently in the USA, UK, and Australia. Researchers should consider various potential funding sources for their research. The universal consideration for funding is that the reason for funding should align with the mission of the funding organization. In addition to Federal research organizations that fund investigator-initiated research, other potential funding sources include nongovernmental organizations, for-profit companies, individual philanthropy, and service organizations. In addition to aligning with organizational funding priorities, researchers need to consider turn-around time and total funds available including whether an organization will cover institutional indirect costs. Websites are useful tools to find information about organizations that fund research, including grant deadlines. Collaboration is encouraged.
Funding; vision research priorities; peer review; research
Recently, several genome-wide association studies have identified various genetic susceptibility loci for breast cancer. Relatively little is known about the possible interactions between these loci and the established risk factors for breast cancer.
To assess interactions between single-nucleotide polymorphisms (SNPs) and established risk factors, we prospectively collected DNA samples and questionnaire data from 8576 breast cancer case subjects and 11 892 control subjects nested within the National Cancer Institute’s Breast and Prostate Cancer Cohort Consortium (BPC3). We genotyped 17 germline SNPs (FGFR2-rs2981582, FGFR2-rs3750817, TNRC9-rs3803662, 2q35-rs13387042, MAP3K1-rs889312, 8q24-rs13281615, CASP8-rs1045485, LSP1-rs3817198, COL1A1-rs2075555, COX11-rs6504950, RNF146-rs2180341, 6q25-rs2046210, SLC4A7-rs4973768, NOTCH2-rs11249433, 5p12-rs4415084, 5p12-rs10941679, RAD51L1-rs999737), and odds ratios were estimated by logistic regression to confirm previously reported associations with breast cancer risk. We performed likelihood ratio tests to assess interactions between the 17 SNPs and nine established risk factors (age at menarche, parity, age at menopause, use of hormone replacement therapy, family history, height, body mass index, smoking status, and alcohol consumption), and applied a correction for multiple testing across the 153 tests (adjusted P value threshold = .05/153 = 3 × 10−4). Case–case comparisons were performed for possible differential associations of polymorphisms by subgroups of tumor stage, estrogen and progesterone receptor status, and age at diagnosis. All statistical tests were two-sided.
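The multiple-testing threshold quoted above follows from a Bonferroni-style division of the significance level by the number of SNP-by-risk-factor tests. The short sketch below reproduces that arithmetic; the function name and the choice of a 0.05 family-wise level are taken as assumptions for illustration, and the abstract does not state which software the authors used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_SNPS = 17          # germline SNPs genotyped
N_RISK_FACTORS = 9   # established risk factors examined

n_tests = N_SNPS * N_RISK_FACTORS            # one interaction test per pair
threshold = bonferroni_threshold(0.05, n_tests)

print(f"{n_tests} tests, per-test threshold = {threshold:.2e}")
```

An interaction P value would need to fall below this per-test threshold (about 3 × 10−4) to be declared statistically significant after correction.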
We confirmed the association of 14 SNPs with breast cancer risk (Ptrend = 2.57 × 10−3 to 3.96 × 10−19). Three SNPs (LSP1-rs3817198, COL1A1-rs2075555, and RNF146-rs2180341) did not show association with breast cancer risk. After accounting for multiple testing, no statistically significant interactions were detected between the 17 SNPs and the nine risk factors. We also confirmed that SNPs in FGFR2 and TNRC9 were associated with greater risk of estrogen receptor–positive than estrogen receptor–negative breast cancer (Pheterogeneity = .0016 for FGFR2-rs2981582 and Pheterogeneity = .0053 for TNRC9-rs3803662). SNP 5p12-rs10941679 was statistically significantly associated with greater risk of progesterone receptor–positive than progesterone receptor–negative breast cancer (Pheterogeneity = .0028).
This study does not support the hypothesis that known common breast cancer susceptibility loci strongly modify the associations between established risk factors and breast cancer.
Nonverbal and verbal communication elements enhance and reinforce the consent form in the informed consent process and need to be transferred appropriately to multimedia formats using interaction design when re-designing the process.
Observational, question asking behavior, and content analyses were used to analyze nonverbal and verbal elements of an informed consent process.
A variety of gestures, interruptions, and communication styles were observed.
In converting a verbal conversation about a textual document to multimedia formats, all aspects of the original process, including verbal and nonverbal variation, should be one part of an interaction community-centered design approach.
Informed consent; gestures; user-computer interface; multimedia