2.  Chado use case: storing genomic, genetic and breeding data of Rosaceae and Gossypium crops in Chado 
The Genome Database for Rosaceae (GDR) and CottonGen are comprehensive online data repositories that provide access to integrated genomic, genetic and breeding data through search, visualization and analysis tools for Rosaceae crops and Gossypium (cotton). These online databases use Chado, an open-source, generic and ontology-driven database schema for biological data, as the primary data storage platform. Chado is highly normalized and uses ontologies to indicate the ‘types’ of data. This design makes Chado flexible enough to house the genomic, genetic and breeding data of GDR and CottonGen. These data include whole genome sequence and annotation, transcripts, molecular markers, genetic maps, Quantitative Trait Loci, Mendelian Trait Loci, traits, germplasm, pedigrees, large scale phenotypic and genotypic data, ontologies and publications. We provide information about how to store these types of data in Chado, using GDR and CottonGen as example sites that were converted from an older legacy infrastructure.
Database URL: GDR (, CottonGen (
PMCID: PMC4795932  PMID: 26989146
3.  The Genome Database for Rosaceae (GDR): year 10 update 
Nucleic Acids Research  2013;42(Database issue):D1237-D1244.
The Genome Database for Rosaceae (GDR, http:/, the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and pathway data are available through the FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making.
PMCID: PMC3965003  PMID: 24225320
4.  CottonGen: a genomics, genetics and breeding database for cotton research 
Nucleic Acids Research  2013;42(Database issue):D1229-D1236.
CottonGen ( is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST.
PMCID: PMC3964939  PMID: 24203703
5.  Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases 
Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at
Database URL:
PMCID: PMC3808541  PMID: 24163125
6.  Chestnut resistance to the blight disease: insights from transcriptome analysis 
BMC Plant Biology  2012;12:38.
A century ago, Chestnut Blight Disease (CBD) devastated the American chestnut. Backcross breeding has been underway to introgress resistance from Chinese chestnut into surviving American chestnut genotypes. Development of genomic resources for the family Fagaceae has focused in this project on Castanea mollissima Blume (Chinese chestnut) and Castanea dentata (Marsh.) Borkh (American chestnut) to aid in the backcross breeding effort and in the eventual identification of blight resistance genes through genomic sequencing and map based cloning. A previous study reported partial characterization of the transcriptomes from these two species. Here, further analyses of a larger dataset and assemblies including both 454 and capillary sequences were performed, and defense related genes with differential transcript abundance (GDTA) in canker versus healthy stem tissues were identified.
Over one and a half million cDNA reads were assembled into 34,800 transcript contigs from American chestnut and 48,335 transcript contigs from Chinese chestnut. Chestnut cDNA showed higher coding sequence similarity to genes in other woody plants than in herbaceous species. The number of genes tagged, the length of coding sequences, and the numbers of tagged members within gene families showed that the cDNA dataset provides a good resource for studying the American and Chinese chestnut transcriptomes. In silico analysis of transcript abundance identified hundreds of GDTA in canker versus healthy stem tissues. A significant number of additional DTA genes involved in the defense response not reported in a previous study were identified here. These DTA genes belong to various pathways involving cell wall biosynthesis, reactive oxygen species (ROS), salicylic acid (SA), ethylene, jasmonic acid (JA), abscisic acid (ABA), and hormone signalling. DTA genes were also identified in the hypersensitive response and programmed cell death (PCD) pathways. These DTA genes are candidates for host resistance to the chestnut blight fungus, Cryphonectria parasitica.
Our data allowed the identification of many genes and gene network candidates for host resistance to the chestnut blight fungus, Cryphonectria parasitica. The similar set of GDTAs in American chestnut and Chinese chestnut suggests that the variation in sensitivity to this pathogen between these species may be the result of different timing and amplitude of the response of the two to the pathogen infection. Resources developed in this study are useful for functional genomics, comparative genomics, resistance breeding and phylogenetics in the Fagaceae.
PMCID: PMC3376029  PMID: 22429310
8.  Tripal: a construction toolkit for online genome databases 
As the availability, affordability and magnitude of genomics and genetics research increase, so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire of many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System, with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, and web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at
PMCID: PMC3263599  PMID: 21959868
9.  Candidate Agtr2 influenced genes and pathways identified by expression profiling in the developing brain of Agtr2−/y mice 
Genomics  2009;94(3):188-195.
Intellectual disability (ID) is a common developmental disability observed in one to three percent of the human population. A possible role for the Angiotensin II type 2 receptor (AGTR2) in brain function, affecting learning, memory, and behavior, has been suggested in humans and rodents. Mice lacking the Agtr2 gene (Agtr2−/y) showed significant impairment in their spatial memory and exhibited abnormal dendritic spine morphology. To identify Agtr2 influenced genes and pathways, we performed whole genome microarray analysis on RNA isolated from brains of Agtr2−/y and control male mice at embryonic day 15 (E15) and postnatal day one (P1). The gene expression profiles of the Agtr2−/y brain samples were significantly different when compared to profiles of the age-matched control brains. We identified 62 differentially expressed genes (p ≤ 0.005) at E15 and in P1 brains of the Agtr2−/y mice. We verified the differential expression of several of these genes in brain samples using quantitative RT-PCR. Differentially expressed genes encode molecules involved in multiple cellular processes including microtubule functions associated with dendritic spine morphology. This study provides insight into Agtr2 influenced candidate genes and suggests that expression dysregulation of these genes may modulate Agtr2 actions in the brain that influence learning and memory.
PMCID: PMC3164574  PMID: 19501643
Learning and memory; Intellectual Disability; Dendritic spine; Expression profiling; Agtr2
10.  Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes 
BMC Genomics  2011;12:379.
BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library.
This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight.
Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
PMCID: PMC3154204  PMID: 21794110
next-generation sequencing; QTL sequencing; fungal disease resistance; chocolate
11.  Regulation of Cellular Metabolism and Cytokines by the Medicinal Herb Feverfew in the Human Monocytic THP-1 Cells 
The herb feverfew is a folk remedy for various symptoms including inflammation. Inflammation has recently been implicated in the genesis of many diseases including cancers, atherosclerosis and rheumatoid arthritis. The mechanisms of action of feverfew in the human body are largely unknown. To determine the cellular targets of feverfew extracts, we have utilized oligo microarrays to study the gene expression profiles elicited by feverfew extracts in human monocytic THP-1 cells. We have identified 400 genes that are consistently regulated by feverfew extracts. Most of the genes are involved in cellular metabolism. However, the genes undergoing the highest degree of change by feverfew treatment are involved in other pathways including chemokine function, water homeostasis and heme-mediated signaling. Our results also suggest that feverfew extracts effectively reduce Lipopolysaccharides (LPS)-mediated TNF-α and CCL2 (MCP-1) releases by THP-1 cells. We hypothesize that feverfew components mediate metabolism, cell migration and cytokine production in human monocytes/macrophages.
PMCID: PMC2644270  PMID: 18955216
feverfew; herbal medicine; immune response; microarray; monocyte
12.  Expressed sequence tags from Peromyscus testis and placenta tissue: Analysis, annotation, and utility for mapping 
BMC Genomics  2008;9:300.
Mice of the genus Peromyscus are found in nearly every habitat from Alaska to Central America and from the Atlantic to the Pacific. They provide an evolutionary outgroup to the Mus/Rattus lineage and serve as an intermediary between that lineage and humans. Although Peromyscus has been studied extensively under both field and laboratory conditions, research has been limited by the lack of molecular resources. Genes associated with reproduction typically evolve rapidly and thus are excellent sources of evolutionary information. In this study we describe the generation of two cDNA libraries, one from placenta and one from testis, characterize the resulting ESTs, and describe their utility for mapping the Peromyscus genome.
The 5' ends of 1,510 placenta and 4,798 testis clones were sequenced. Low quality sequences were removed and, after clustering and contig assembly, 904 unique placenta and 2,002 unique testis sequences remained. Average lengths of placenta and testis ESTs were 711 bp and 826 bp, respectively. Approximately 82% of all ESTs were identified using the BLASTX algorithm to Mus and Rattus, and 34–54% of all ESTs could be assigned to a biological process gene ontology category in either Mus or Rattus. Because the Peromyscus genome organization resembles the Rattus genome more closely than Mus, we examined the distribution of the Peromyscus ESTs across the rat genome, finding markers on all rat chromosomes except the Y. Approximately 40% of all ESTs were specific to only one location in the Mus genome and spanned introns of an appropriate size for sequencing and SNP detection. Of the primers that were tried, 54% provided useful assays for genotyping on interspecific backcross and whole-genome radiation hybrid cell panels.
The 2,906 Peromyscus placenta and testis ESTs described here significantly expand the molecular resources available for the genus. These ESTs allow for specific PCR amplification and broad coverage across the genome, creating an excellent genetic marker resource for the generation of a medium-density genomic map. Thus, this resource will significantly aid research of a genus that is uniquely well-suited to both laboratory and field research.
PMCID: PMC2443383  PMID: 18577228

