1.  Generalization of Variants Identified by Genome-wide Association Studies for Electrocardiographic Traits in African Americans 
Annals of human genetics  2013;77(4):321-332.
Electrocardiographic (ECG) measurements vary by ancestry. Genome-wide association studies (GWAS) have identified loci that contribute to ECG measurements; however, most were performed in Europeans drawn from population-based cohorts or surveys. The strongest associations reported are in NOS1AP with QT interval and SCN10A with PR and QRS durations. The extent to which these associations can be generalized to African Americans has yet to be determined. Using electronic medical records, PR and QT intervals, QRS duration, and heart rate were determined in 455 African Americans as part of the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project. We tested for an association between these ECG traits and >930K SNPs. We identified a total of 36 novel associations with PR interval, QRS duration, QT interval, and heart rate at p < 1.0 × 10⁻⁶. Using published GWAS data, we compared our results with those previously identified in other populations. Five associations originally identified in other populations generalized with respect to statistical significance and direction of effect. A total of 43 associations have a consistent direction of effect with European and/or Asian populations. This work provides a catalogue of generalized versus non-generalized associations, a necessary step in prioritizing GWAS-identified regions for further fine-mapping in diverse populations.
PMCID: PMC3743946  PMID: 23534349
Electrocardiography; African Americans; GWAS; Generalization; Electronic Medical Records
2.  Admixture Mapping and Subsequent Fine-Mapping Suggests a Biologically Relevant and Novel Association on Chromosome 11 for Type 2 Diabetes in African Americans 
PLoS ONE  2014;9(3):e86931.
Type 2 diabetes (T2D) is a complex metabolic disease that disproportionately affects African Americans. Genome-wide association studies (GWAS) have identified several loci that contribute to T2D in European Americans, but few studies have been performed in admixed populations. We first performed a GWAS of 1,563 African Americans from the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project as part of the electronic Medical Records and Genomics (eMERGE) network. We successfully replicated an association in TCF7L2, previously identified by GWAS, in this African American dataset. We were unable to identify novel associations at p < 5.0 × 10⁻⁸ by GWAS. Using admixture mapping as an alternative method for discovery, we performed a genome-wide admixture scan that suggests multiple candidate genes associated with T2D. One finding, TCIRG1, is a T-cell immune regulator expressed in the pancreas and liver that has not been previously implicated in T2D. We performed subsequent fine-mapping to further assess the association between TCIRG1 and T2D in >5,000 African Americans. We identified 13 independent associations between TCIRG1, CHKA, and ALDH3B1 genes on chromosome 11 and T2D. Our results suggest a novel region on chromosome 11 identified by admixture mapping is associated with T2D in African Americans.
PMCID: PMC3940426  PMID: 24595071
3.  Electronic medical records and genomics (eMERGE) network exploration in cataract: Several new potential susceptibility loci 
Molecular Vision  2014;20:1281-1295.
Cataract is the leading cause of blindness in the world, and in the United States accounts for approximately 60% of Medicare costs related to vision. The purpose of this study was to identify genetic markers for age-related cataract through a genome-wide association study (GWAS).
In the electronic medical records and genomics (eMERGE) network, we ran an electronic phenotyping algorithm on individuals in each of five sites with electronic medical records linked to DNA biobanks. We performed a GWAS using 530,101 SNPs from the Illumina 660W-Quad in a total of 7,397 individuals (5,503 cases and 1,894 controls). We also performed an age-at-diagnosis case-only analysis.
We identified several statistically significant associations with age-related cataract (45 SNPs) as well as age at diagnosis (44 SNPs). The 45 SNPs associated with cataract at p < 1 × 10⁻⁵ are in several interesting genes, including ALDOB, MAP3K1, and MEF2C. All have potential biologic relationships with cataracts.
This is the first genome-wide association study of age-related cataract, and several regions of interest have been identified. The eMERGE network has pioneered the exploration of genomic associations in biobanks linked to electronic health records, and this study is another example of the utility of such resources. Explorations of age-related cataract including validation and replication of the association results identified herein are needed in future studies.
PMCID: PMC4168835  PMID: 25352737
4.  Nonmuscle myosin IIA with a GFP fused to the N-terminus of the regulatory light chain is regulated normally 
Nonmuscle myosin II plays a crucial role in a variety of cellular processes (e.g., polarity formation, cell motility, and cytokinesis). It is composed of two heavy chains, two regulatory light chains and two essential light chains. The ATPase activity of the myosin II motor domain is regulated through phosphorylation of the regulatory light chain (RLC) by myosin light chain kinase. To study myosin function and localization in cellular processes, GFP-fused RLCs are widely used; however, the exact kinetic properties of myosins with bound GFP-RLC are poorly described. More importantly, it has not been shown that a regulatory light chain fused at its N-terminus with GFP can maintain the normal phosphorylation-dependent regulation of nonmuscle myosin or serve as a substrate for myosin light chain kinase. We coexpressed N-terminal GFP-RLC with a heavy meromyosin (HMM)-like fragment of nonmuscle myosin IIA and essential light chain to characterize the phosphorylation dynamics and in vitro kinetic properties of the resulting HMM. Myosin light chain kinase phosphorylates the GFP-RLC bound to HMM IIA with the same Vmax as it does the wild-type RLC bound to HMM IIA, but the Km is about twofold higher for the GFP fusion protein, meaning that it is a somewhat poorer substrate. The steady-state actin-activated MgATPase activity of the GFP-RLC HMM is very low in the absence of phosphorylation, demonstrating that the GFP moiety does not prevent formation of the off state. The actin-activated MgATPase activity of phosphorylated GFP-RLC-HMM is about half that of wild-type phosphorylated HMM. The ability of phosphorylated GFP-RLC-HMM to move actin filaments in the actin gliding assay is also slightly compromised. These data indicate that, despite some kinetic differences, the N-terminal GFP fusion to the regulatory light chain is a reasonable model system for studying myosin function in vivo.
PMCID: PMC3786345  PMID: 20711642
GFP; Nonmuscle myosin; Regulatory light chain; Enzymatic activity; In vitro motility
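The kinase result in the abstract above (same Vmax, roughly twofold higher Km) can be made concrete with a short worked equation. This is only a sketch: it assumes simple Michaelis-Menten kinetics for myosin light chain kinase and takes the Km ratio to be exactly 2, whereas the abstract says "about two fold". The symbols [S] (concentration of RLC-bound HMM substrate), v_WT, and v_GFP are introduced here for illustration and do not appear in the original. The `cases` environment requires the amsmath package.

```latex
% Assumed Michaelis-Menten rates with a shared V_max and K_m doubled for the GFP fusion
\[
v_{\mathrm{WT}} = \frac{V_{\max}\,[S]}{K_m + [S]},
\qquad
v_{\mathrm{GFP}} = \frac{V_{\max}\,[S]}{2K_m + [S]}
\]
% Ratio of the two rates and its limiting behaviour
\[
\frac{v_{\mathrm{GFP}}}{v_{\mathrm{WT}}} = \frac{K_m + [S]}{2K_m + [S]}
\;\longrightarrow\;
\begin{cases}
1/2, & [S] \ll K_m \\
1,   & [S] \gg K_m
\end{cases}
\]
```

Under these assumptions the fusion protein is phosphorylated at half the wild-type rate when substrate is scarce and at the same rate when substrate is saturating (at [S] = Km the ratio is 2/3), which is one way to read "a somewhat poorer substrate".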
5.  Return of Individual Research Results from Genome-wide Association Studies: Experience of the Electronic Medical Records & Genomics (eMERGE) Network 
Return of individual genetic results to research participants, including participants in archives and biorepositories, is receiving increased attention. However, few groups have deliberated on specific results or weighed deliberations against relevant local contextual factors.
The Electronic Medical Records and GEnomics (eMERGE) network, which includes five biorepositories conducting genome-wide association studies, convened a Return of Results Oversight Committee (RROC) to identify potentially returnable results. Network-wide deliberations were then brought to local constituencies for final decision-making.
Defining results that should be considered for return required input from clinicians with relevant expertise and much deliberation. The RROC identified two sex chromosomal anomalies, Klinefelter Syndrome and Turner Syndrome, as well as homozygosity for Factor V Leiden, as findings that could warrant reporting. Views about returning HFE gene mutations associated with hemochromatosis were mixed due to low penetrance. Review of EMRs suggested that most participants with detected abnormalities were unaware of these findings. Local considerations relevant to return varied and, to date, four sites have elected not to return findings (return was not possible at one site).
The eMERGE experience reveals the complexity of return of results decision-making and provides a potential deliberative model for adoption in other collaborative contexts.
PMCID: PMC3723451  PMID: 22361898
Result return; biorepository; electronic medical records; deliberation; context
6.  Development of a Scalable Pharmacogenomic Clinical Decision Support Service 
Advances in sequencing technology are making genomic data more accessible within the healthcare environment. Published pharmacogenetic guidelines attempt to provide a clinical context for specific genomic variants; however, converting genomic data into a clinical report integrated within an electronic medical record system remains a major challenge for any hospital. We created a two-part solution that integrates with the medical record system and converts genetic variant results into an interpreted clinical report based on published guidelines. We successfully developed a scalable infrastructure to support TPMT genetic testing and are currently testing approximately two individuals per week in our production version. We plan to release an online variant-to-clinical-interpretation reporting system in order to facilitate translation of pharmacogenetic information into clinical practice.
PMCID: PMC3814487  PMID: 24303299
7.  Managing Incidental Findings and Research Results in Genomic Research Involving Biobanks & Archived Datasets 
Biobanks and archived datasets collecting samples and data have become crucial engines of genetic and genomic research. Unresolved, however, is what responsibilities biobanks should shoulder to manage incidental findings (IFs) and individual research results (IRRs) of potential health, reproductive, or personal importance to individual contributors (using “biobank” here to refer to both collections of samples and collections of data). This paper reports recommendations from a 2-year, NIH-funded project. The authors analyze responsibilities to manage return of IFs and IRRs in a biobank research system (primary research or collection sites, the biobank itself, and secondary research sites). They suggest that biobanks shoulder significant responsibility for seeing that the biobank research system addresses the return question explicitly. When re-identification of individual contributors is possible, the biobank should work to enable the biobank research system to discharge four core responsibilities: to (1) clarify the criteria for evaluating findings and roster of returnable findings, (2) analyze a particular finding in relation to this, (3) re-identify the individual contributor, and (4) recontact the contributor to offer the finding. The authors suggest that findings that are analytically valid, reveal an established and substantial risk of a serious health condition, and that are clinically actionable should generally be offered to consenting contributors. The paper specifies 10 concrete recommendations, addressing new biobanks and biobanks already in existence.
PMCID: PMC3597341  PMID: 22436882
incidental findings; return of results; biobanks; research ethics; bioethics; genetics; genomics
8.  Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study 
Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype–phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems.
Materials and Methods
An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.
The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.
By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS.
An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic study across multiple institutions.
PMCID: PMC3277617  PMID: 22101970
Analytics; application of biological knowledge to clinical care; bioinformatics; biomedical informatics; clinical phenotyping; controlled terminologies and vocabularies; data mining; EHR; EMR secondary and meaningful use; genetic epidemiology; genetics; genome-wide association studies; genomics; HIT data standards; improving the education and skills training of health professionals; infection control; information retrieval; knowledge representations; linking the genotype and phenotype; medical informatics; modeling; natural-language processing; ontologies; pharmacogenomics; phenotyping; reuseability; translational research
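The positive predictive values reported for the T2D phenotyping algorithm above can be illustrated with a short calculation. This is a minimal sketch, not the study's code: the function name and the chart-review counts are hypothetical, chosen only so that the arithmetic reproduces the reported 98% and 100%. The abstract does not give the underlying counts.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of algorithm-flagged subjects that clinician review confirms."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no subjects were flagged")
    return true_positives / flagged


# Hypothetical review counts (assumptions, not taken from the paper):
# 50 algorithm-identified cases reviewed, 49 confirmed by clinicians.
case_ppv = positive_predictive_value(true_positives=49, false_positives=1)
# 50 algorithm-identified controls reviewed, all 50 confirmed.
control_ppv = positive_predictive_value(true_positives=50, false_positives=0)

print(f"case PPV: {case_ppv:.0%}, control PPV: {control_ppv:.0%}")
```

PPV depends only on the subjects the algorithm flags, so it measures how trustworthy the selected cases and controls are; it does not indicate how many true cases the algorithm missed.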
9.  Variation of osteocyte lacunae size within the tetrapod skeleton: implications for palaeogenomics 
Biology Letters  2011;7(5):751-754.
Recent studies have emphasized the ability to reconstruct genome sizes (C-values) of extinct organisms such as dinosaurs, using correlations between known genome sizes and bone cell (osteocyte lacunae) volumes. Because of the established positive relationship between cell size and genome size in extant vertebrates, osteocyte lacunae volume is a viable proxy for reconstructing C-values in the absence of any viable genetic material. However, intra-skeletal osteocyte lacunae size variation, which could cause error in genome size estimation, has remained unexplored. Here, 11 skeletal elements of one individual from each of four major clades (Mammalia, Amphibia, Aves, Reptilia) were examined histologically. Skeletal elements in all four clades exhibit significant differences in the average sizes of their lacunae. This variation, however, generally does not cause a significant difference in the estimated genome size when common phylogenetic estimation methods are employed. On the other hand, the spread of the estimations illustrates that this method may not be precise. High variance in genome size estimations remains an outstanding problem. Additionally, a suite of new methods is introduced to further automate the measurement of bone cells and other microstructural features on histological thin sections.
PMCID: PMC3169053  PMID: 21411450
osteocyte lacunae; genome size; palaeogenomics; bone histology
12.  The eMERGE Network: A consortium of biorepositories linked to electronic medical records data for conducting genomic studies 
BMC Medical Genomics  2011;4:13.
The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors.
The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics; (2) Informatics; (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE-funded sites; and (4) the Return of Results Oversight Committee. The Steering Committee, composed of site PIs and representatives and NHGRI staff, meets three times per year, once per year with the External Scientific Panel.
Current progress
The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically-derived data, developing approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site.
Future activities
Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care.
By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
PMCID: PMC3038887  PMID: 21269473
13.  A fluorescent resonant energy transfer–based biosensor reveals transient and regional myosin light chain kinase activation in lamella and cleavage furrows 
The Journal of Cell Biology  2002;156(3):543-553.
Approaches with high spatial and temporal resolution are required to understand the regulation of nonmuscle myosin II in vivo. Using fluorescence resonance energy transfer we have produced a novel biosensor allowing simultaneous determination of myosin light chain kinase (MLCK) localization and its [Ca2+]4/calmodulin-binding state in living cells. We observe transient recruitment of diffuse MLCK to stress fibers and its in situ activation before contraction. MLCK is highly active in the lamella of migrating cells, but not at the retracting tail. This unexpected result highlights a potential role for MLCK-mediated myosin contractility in the lamella as a driving force for migration. During cytokinesis, MLCK was enriched at the spindle equator during late metaphase, and was maximally activated just before cleavage furrow constriction. As furrow contraction was completed, active MLCK was redistributed to the poles of the daughter cells. These results show that MLCK is a myosin regulator in the lamella and contractile ring, and pinpoint sites where myosin function may be mediated by other kinases.
PMCID: PMC2173328  PMID: 11815633
myosin light chain kinase; myosin light chains; phosphorylation; cell division; FRET

