Results 1-4 (4)

1.  The Drosophila phenotype ontology 
Phenotype ontologies are queryable classifications of phenotypes. They provide a widely used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions.
We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable.
The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
PMCID: PMC3816596  PMID: 24138933
Drosophila; Phenotype; Ontology; OWL; OBO; Gene ontology; FlyBase
2.  FlyBase: enhancing Drosophila Gene Ontology annotations 
Nucleic Acids Research  2008;37(Database issue):D555-D559.
FlyBase is a database of Drosophila genetic and genomic information. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. This article describes recent changes to the FlyBase GO annotation strategy that are improving the quality of the GO annotation data. Many of these changes stem from our participation in the GO Reference Genome Annotation Project, a multi-database collaboration producing comprehensive GO annotation sets for 12 diverse species.
PMCID: PMC2686450  PMID: 18948289
3.  The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports 
BMC Medical Genetics  2007;8(Suppl 1):S1.
The Framingham Heart Study (FHS), founded in 1948 to examine the epidemiology of cardiovascular disease, is among the most comprehensively characterized multi-generational studies in the world. Many collected phenotypes have substantial genetic contributors; yet most genetic determinants remain to be identified. Using single nucleotide polymorphisms (SNPs) from a 100K genome-wide scan, we examine the associations of common polymorphisms with phenotypic variation in this community-based cohort and provide a full-disclosure, web-based resource of results for future replication studies.
Adult participants (n = 1345) of the largest 310 pedigrees in the FHS, many biologically related, were genotyped with the 100K Affymetrix GeneChip. These genotypes were used to assess their contribution to 987 phenotypes collected in FHS over 56 years of follow up, including: cardiovascular risk factors and biomarkers; subclinical and clinical cardiovascular disease; cancer and longevity traits; and traits in pulmonary, sleep, neurology, renal, and bone domains. We conducted genome-wide variance components linkage and population-based and family-based association tests.
The participants were white of European descent and from the FHS Original and Offspring Cohorts (examination 1 Offspring mean age 32 ± 9 years, 54% women). This overview summarizes the methods, selected findings and limitations of the results presented in the accompanying series of 17 manuscripts. The presented association results are based on 70,897 autosomal SNPs meeting the following criteria: minor allele frequency ≥ 10%, genotype call rate ≥ 80%, Hardy-Weinberg equilibrium p-value ≥ 0.001, and satisfying Mendelian consistency. Linkage analyses are based on 11,200 SNPs and short-tandem repeats. Results of phenotype-genotype linkages and associations for all autosomal SNPs are posted on the NCBI dbGaP website.
We have created a full-disclosure resource of results, posted on the dbGaP website, from a genome-wide association study in the FHS. Because we used three analytical approaches to examine the association and linkage of 987 phenotypes with thousands of SNPs, our results must be considered hypothesis-generating and need to be replicated. Results from the FHS 100K project with NCBI web posting provide a resource for investigators to identify high priority findings for replication.
PMCID: PMC1995613  PMID: 17903291
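The SNP inclusion criteria quoted in this abstract (minor allele frequency ≥ 10%, call rate ≥ 80%, Hardy-Weinberg p-value ≥ 0.001, Mendelian consistency) can be read as a simple per-SNP filter. The sketch below is illustrative only and is not the authors' pipeline: the function names, the genotype-count inputs, the `mendel_errors` counter, and the use of a 1-degree-of-freedom chi-square approximation for the Hardy-Weinberg test are all assumptions made for this example.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg proportions.

    Assumption: a chi-square approximation is used here for brevity;
    the abstract does not say which HWE test the study applied.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing, mendel_errors=0,
              min_maf=0.10, min_call_rate=0.80, min_hwe_p=0.001):
    """Apply the four thresholds quoted in the abstract to one SNP.

    `mendel_errors` is a hypothetical count of Mendelian inconsistencies;
    "satisfying Mendelian consistency" is modelled here as zero errors.
    """
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0:
        return False
    call_rate = called / total
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (maf >= min_maf
            and call_rate >= min_call_rate
            and hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p
            and mendel_errors == 0)
```

For instance, `passes_qc(360, 480, 160, 0)` describes a fully genotyped SNP with allele frequencies 0.6 and 0.4 in exact Hardy-Weinberg proportions, so it is retained, whereas a SNP with no heterozygotes at all, such as `passes_qc(500, 0, 500, 0)`, fails the Hardy-Weinberg threshold.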
4.  Mapping complex traits using Random Forests 
BMC Genetics  2003;4(Suppl 1):S64.
Random Forest is a prediction technique based on growing trees on bootstrap samples of data, in conjunction with a random selection of explanatory variables to define the best split at each node. In the case of a quantitative outcome, the tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with the sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With the knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
PMCID: PMC1866502  PMID: 14975132
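The technique described in this abstract, trees grown on bootstrap samples with a random subset of explanatory variables considered at each node, can be sketched in a few lines for a quantitative outcome. This is a toy illustration under stated assumptions, not the authors' analysis: all function names are hypothetical, the data are synthetic (sharing proportions of 0, 0.5 or 1 at six made-up loci, with only locus 0 affecting the outcome), and variable importance is measured as the total decrease in squared error, a simpler stand-in for the permutation importance commonly used with Random Forest.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, features):
    """Best (feature, threshold, gain) among candidate features, or None."""
    base, best = sse(targets), None
    for f in features:
        # Dropping the largest value keeps both child nodes non-empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            gain = base - sse(left) - sse(right)
            if gain > 1e-12 and (best is None or gain > best[2]):
                best = (f, thr, gain)
    return best


def grow_tree(rows, targets, n_try, rng, min_split=10):
    """A leaf is a float; an internal node is (feature, thr, left, right, gain)."""
    if len(rows) < min_split:
        return mean(targets)
    # Random subset of explanatory variables considered at this node.
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return mean(targets)
    f, thr, gain = split
    lo = [i for i, r in enumerate(rows) if r[f] <= thr]
    hi = [i for i, r in enumerate(rows) if r[f] > thr]
    return (f, thr,
            grow_tree([rows[i] for i in lo], [targets[i] for i in lo], n_try, rng, min_split),
            grow_tree([rows[i] for i in hi], [targets[i] for i in hi], n_try, rng, min_split),
            gain)


def grow_forest(rows, targets, n_trees=50, n_try=None, seed=0):
    rng = random.Random(seed)
    n_try = n_try or max(1, len(rows[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Average the numerical predictions of all trees."""
    preds = []
    for node in forest:
        while isinstance(node, tuple):
            f, thr, left, right, _ = node
            node = left if row[f] <= thr else right
        preds.append(node)
    return mean(preds)


def importance(forest, n_features):
    """Total squared-error decrease credited to each feature (an assumption)."""
    totals = [0.0] * n_features

    def walk(node):
        if isinstance(node, tuple):
            totals[node[0]] += node[4]
            walk(node[2])
            walk(node[3])

    for tree in forest:
        walk(tree)
    return totals


def toy_data(n=200, n_loci=6, seed=0):
    """Synthetic data: only locus 0 influences the outcome."""
    rng = random.Random(seed)
    rows = [[rng.choice([0.0, 0.5, 0.5, 1.0]) for _ in range(n_loci)] for _ in range(n)]
    targets = [2.0 * r[0] + rng.gauss(0.0, 0.5) for r in rows]
    return rows, targets
```

Calling `grow_forest(*toy_data())` and then `importance(forest, 6)` gives one importance score per locus. Ranking loci by such a score is the kind of multivariate screening the abstract describes, although the study's actual units of analysis and outcome definitions are not reproduced here.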
