Search tips
Search criteria

Results 1-3 (3)

Clipboard (0)
more »
Year of Publication
Document Types
1.  Validation of PhenX measures in the personalized medicine research project for use in gene/environment studies 
The purpose of this paper is to describe the data collection efforts and validation of PhenX measures in the Personalized Medicine Research Project (PMRP) cohort.
Thirty-six measures were chosen from the PhenX Toolkit within the following domains: demographics; anthropometrics; alcohol, tobacco and other substances; cardiovascular; environmental exposures; cancer; psychiatric; neurology; and physical activity and physical fitness. Eligibility criteria for the current study included: living PMRP subjects with known addresses who consented to future contact and were not currently living in a nursing home, available GWAS data from eMERGE I for subjects where age-related cataract, HDL, dementia and resistant hypertension were the primary phenotypes, thus biasing the sample to the older PMRP participants. The questionnaires were mailed twice. Data from the PhenX measures were compared with information from PMRP questionnaires and data from Marshfield Clinic electronic medical records.
Completed PhenX questionnaires were returned by 2271 subjects for a final response rate of 70%. The mean age reported on the PhenX questionnaire (73.1 years) was greater than the PMRP questionnaire (64.8 years) because the data were collected at different time points. The mean self-reported weight, and subsequently calculated BMI, were less on the PhenX survey than the measured values at the time of enrollment into PMRP (PhenX means 173.5 pounds and BMI 28.2 kg/m2 versus PMRP 182.9 pounds and BMI 29.6 kg/m2). There was 95.3% agreement between the two questionnaires about having ever smoked at least 100 cigarettes. 139 (6.2%) of subjects indicated on the PhenX questionnaire that they had been told they had a stroke. Of them, only 15 (10.8%) had no electronic indication of a prior stroke or TIA. All of the age-and gender-specific 95% confidence limits around point estimates for major depressive episodes overlap and show that 31% of women aged 50–64 reported symptoms associated with a major depressive episode.
The approach employed resulted in a high response rate and valuable data for future gene/environment analyses. These results and high response rate highlight the utility of the PhenX Toolkit to collect valid phenotypic data that can be shared across groups to facilitate gene/environment studies.
PMCID: PMC3896802  PMID: 24423110
2.  BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge 
BMC Medical Genomics  2013;6(Suppl 2):S6.
With the recent decreasing cost of genome sequence data, there has been increasing interest in rare variants and methods to detect their association to disease. We developed BioBin, a flexible collapsing method inspired by biological knowledge that can be used to automate the binning of low frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases, which contains resources such as: dbSNP and gene Entrez database information from the National Center for Biotechnology (NCBI), pathway information from Gene Ontology (GO), Protein families database (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath - signal transduction pathways, Open Regulatory Annotation Database (ORegAnno), Biological General Repository for Interaction Datasets (BioGrid), Pharmacogenomics Knowledge Base (PharmGKB), Molecular INTeraction database (MINT), and evolutionary conserved regions (ECRs) from UCSC Genome Browser. The novelty of BioBin is access to comprehensive knowledge-guided multi-level binning. For example, bin boundaries can be formed using genomic locations from: functional regions, evolutionary conserved regions, genes, and/or pathways.
We tested BioBin using simulated data and 1000 Genomes Project low coverage data to test our method with simulated causative variants and a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project Kabuki dataset, a congenital disorder affecting multiple organs and often intellectual disability, contrasted with Complete Genomics data as controls.
The results from our simulation studies indicate type I error rate is controlled, however, power falls quickly for small sample sizes using variants with modest effect sizes. Using BioBin, we were able to find simulated variants in genes with less than 20 loci, but found the sensitivity to be much less in large bins. We also highlighted the scale of population stratification between two 1000 Genomes Project data, CEU and YRI populations. Lastly, we were able to apply BioBin to natural biological data from dbGaP and identify an interesting candidate gene for further study.
We have established that BioBin will be a very practical and flexible tool to analyze sequence data and potentially uncover novel associations between low frequency variants and complex disease.
PMCID: PMC3654874  PMID: 23819467
3.  The eMERGE Network: A consortium of biorepositories linked to electronic medical records data for conducting genomic studies 
BMC Medical Genomics  2011;4:13.
The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors.
The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics, (2) Informatics, and (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE funded sites, and (4) Return of Results Oversight Committee. The Steering Committee, comprised of site PIs and representatives and NHGRI staff, meet three times per year, once per year with the External Scientific Panel.
Current progress
The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically-derived data, developing approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site.
Future activities
Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care.
By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
PMCID: PMC3038887  PMID: 21269473

Results 1-3 (3)