1.  Proteomic analysis of honeybee worker (Apis mellifera) hypopharyngeal gland development 
BMC Genomics  2009;10:645.
Hypopharyngeal glands (HG) of honeybee workers play an important role in honeybee nutrition and caste differentiation. Previous research has mainly focused on age-dependent morphological, physiological, biochemical and genomic characteristics of the HG. Here, proteomics and biochemical network analysis were used to follow protein changes during HG development.
A total of 87, 76, 85, 74, 71, and 55 proteins were unambiguously identified on days 1, 3, 6, 12, 15 and 20, respectively. These proteins included major royal jelly proteins (MRJPs) and proteins involved in carbohydrate, lipid and protein metabolism, cytoskeleton, developmental regulation, antioxidant activity, molecular transport, regulation of transcription/translation, and protein folding. Most notably, MRJPs were detected in the HG of newly emerged worker bees. MRJP expression peaked between days 6 and 12, as validated by western blot analysis of MRJP1, 2 and 3. Moreover, 35 key node proteins were found in the biochemical networks of the HG.
The HG secretes royal jelly (RJ) at peak levels between days 6 and 12, but worker bees can secrete RJ from the day they emerge, which is a new finding. Several key node proteins play an important role in the biochemical networks of the developing HG. These proteins provide potential targets for the genetic manipulation of honeybees.
PMCID: PMC2810308  PMID: 20043834
2.  A genomic glimpse of aminoacyl-tRNA synthetases in malaria parasite Plasmodium falciparum 
BMC Genomics  2009;10:644.
Plasmodium parasites are the causative agents of malaria, which affects >500 million people and claims ~2 million lives annually. The completion of Plasmodium genome sequencing and the availability of the PlasmoDB database have provided a platform for systematic study of the parasite genome. Aminoacyl-tRNA synthetases (aaRSs) are pivotal enzymes for protein translation and other vital cellular processes. We report an extensive analysis of the Plasmodium falciparum genome to identify and classify aaRSs in this organism.
Using various computational and bioinformatics tools, we have identified 37 aaRSs in P. falciparum. Our key observations are: (i) the fraction of the proteome dedicated to aaRSs in P. falciparum is very high compared to many other organisms; (ii) 23 out of 37 Pf-aaRS sequences contain signal peptides, possibly directing them to different cellular organelles; (iii) expression profiles of Pf-aaRSs vary considerably across the life cycle stages of the parasite; (iv) several Pf-aaRSs possess very unusual domain architectures; (v) phylogenetic analyses reveal evolutionary relatedness of several parasite aaRSs to bacterial and plant aaRSs; (vi) three-dimensional structural modelling has provided insights that could be exploited in inhibitor discovery against parasite aaRSs.
We have identified 37 Pf-aaRSs based on our bioinformatics analysis. Our data reveal several unique attributes in this protein family. We have annotated all 37 Pf-aaRSs based on predicted localization, phylogenetics, domain architectures and their overall protein expression profiles. The sets of distinct features elaborated in this work will provide a platform for experimental dissection of this family of enzymes, possibly for the discovery of novel drugs against malaria.
PMCID: PMC2813244  PMID: 20042123
3.  Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing 
BMC Genomics  2009;10:646.
The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although complete sequencing of affected individuals' genomes would be the most powerful approach to finding such variants, the cost of such efforts makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture, which relies on the ability to create a custom in-situ synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. This approach is used routinely by our group and others, and we are continuing to improve its performance.
Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility by capturing all predicted protein-coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques that increase capture efficiency: introducing a step to block cross-hybridization mediated by the common adapter sequences used in sequencing library construction, and repeating the hybridization capture step. These improvements can boost targeting efficiency to the point where over 85% of mapped sequence reads fall within 100 bases of the targeted regions.
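The targeting-efficiency metric quoted above (the fraction of mapped reads falling within 100 bases of a targeted region) can be computed along the lines of the minimal sketch below. It assumes each read is summarized by a single mapped position and that targets are half-open (start, end) intervals on one chromosome; the function name and inputs are hypothetical and this is not the authors' analysis pipeline.

```python
from bisect import bisect_right


def on_target_fraction(read_positions, targets, flank=100):
    """Fraction of reads whose mapped position lies within `flank` bases of a target.

    read_positions: mapped positions (one chromosome, 0-based).
    targets: half-open (start, end) target intervals on the same chromosome.
    """
    if not read_positions:
        return 0.0
    # Pad every target by the flank on both sides, then merge overlapping intervals.
    padded = sorted((max(0, s - flank), e + flank) for s, e in targets)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    starts = [s for s, _ in merged]
    hits = 0
    for pos in read_positions:
        # Last merged interval starting at or before this read position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(read_positions)
```

For example, with targets `[(1000, 1060), (5000, 5060)]` and reads at `[950, 1030, 1150, 3000]`, three of four reads fall inside the padded window [900, 1160), giving 0.75. Merging the padded intervals first keeps each lookup a single binary search.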
The complete protocol introduced in this paper enables researchers to perform practical capture experiments, and includes two novel methods for increasing the targeting efficiency. Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
PMCID: PMC2808330  PMID: 20043857
4.  Transcriptional adaptations following exercise in Thoroughbred horse skeletal muscle highlights molecular mechanisms that lead to muscle hypertrophy 
BMC Genomics  2009;10:638.
Selection for exercise-adapted phenotypes in the Thoroughbred racehorse has provided a valuable model system to understand molecular responses to exercise in skeletal muscle. Exercise stimulates immediate early molecular responses as well as delayed responses during recovery, resulting in a return to homeostasis and enabling long term adaptation. Global mRNA expression during the immediate-response period has not previously been reported in skeletal muscle following exercise in any species. Also, global gene expression changes in equine skeletal muscle following exercise have not been reported. Therefore, to identify novel genes and key regulatory pathways responsible for exercise adaptation we have used equine-specific cDNA microarrays to examine global mRNA expression in skeletal muscle from a cohort of Thoroughbred horses (n = 8) at three time points (before exercise, immediately post-exercise, and four hours post-exercise) following a single bout of treadmill exercise.
Skeletal muscle biopsies were taken from the gluteus medius before (T0), immediately after (T1) and four hours after (T2) exercise. Statistically significant differences in mRNA abundance between time points (T0 vs T1 and T0 vs T2) were determined using the empirical Bayes moderated t-test in the Bioconductor package Linear Models for Microarray Data (LIMMA), and the expression of a select panel of genes was validated using real-time quantitative reverse transcription PCR (qRT-PCR). While only two genes had increased expression at T1 (P < 0.05), by T2, 932 genes had increased (P < 0.05) and 562 genes had decreased expression (P < 0.05). Functional analysis of genes differentially expressed during the recovery phase (T2) revealed an over-representation of genes localized to the actin cytoskeleton and with functions in the MAPK signaling, focal adhesion, insulin signaling, mTOR signaling, p53 signaling and Type II diabetes mellitus pathways. At T1, using a less stringent statistical approach, we observed an over-representation of genes involved in the stress response, metabolism and intracellular signaling. These findings suggest that protein synthesis, mechanosensation and muscle remodeling contribute to skeletal muscle adaptation towards improved integrity and hypertrophy.
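The empirical Bayes moderated t-test in LIMMA shrinks each gene's residual variance towards a common prior before forming the t-statistic: the posterior variance is (d0·s0² + d·s²)/(d0 + d), and the statistic is the estimated coefficient divided by the square root of that variance times the coefficient's unscaled variance, with d0 + d degrees of freedom. The sketch below shows only that per-gene step and assumes the prior degrees of freedom `d0` and prior variance `s0_sq` are already known; LIMMA estimates them from all genes, which is omitted here, and the function name is hypothetical.

```python
import math


def moderated_t(coef, s2, df, unscaled_var, d0, s0_sq):
    """Moderated t-statistic for one gene (LIMMA-style sketch).

    coef: estimated log-fold change for a contrast (e.g. T0 vs T2).
    s2: gene-wise residual variance, estimated with `df` residual degrees of freedom.
    unscaled_var: unscaled variance of coef from the design matrix.
    d0, s0_sq: prior degrees of freedom and prior variance (assumed known here).
    Returns (moderated t, total degrees of freedom).
    """
    # Posterior variance: a df-weighted average of the prior and gene-wise variances.
    s2_post = (d0 * s0_sq + df * s2) / (d0 + df)
    t_mod = coef / math.sqrt(s2_post * unscaled_var)
    return t_mod, d0 + df
```

With `d0 = 0` the statistic reduces to the ordinary t-statistic; a larger `d0` pulls noisy per-gene variances towards `s0_sq`, which stabilizes inference when, as here, only a handful of animals are sampled.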
This is the first study to characterize global mRNA expression profiles in equine skeletal muscle using an equine-specific microarray platform. Here we reveal novel genes and mechanisms that are temporally expressed following exercise providing new knowledge about the early and late molecular responses to exercise in the equine skeletal muscle transcriptome.
PMCID: PMC2812474  PMID: 20042072
5.  Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs 
BMC Genomics  2009;10:641.
Identification of specific genes and gene expression patterns important for bacterial survival, transmission and pathogenesis is critically needed to enable development of more effective pathogen control strategies. The stationary phase stress response transcriptome, including many σB-dependent genes, was defined for the human bacterial pathogen Listeria monocytogenes using RNA sequencing (RNA-Seq) with the Illumina Genome Analyzer. Specifically, bacterial transcriptomes were compared between stationary phase cells of L. monocytogenes 10403S and an otherwise isogenic ΔsigB mutant, which does not express the alternative σ factor σB, a major regulator of genes contributing to stress response, including stresses encountered upon entry into stationary phase.
Overall, 83% of all L. monocytogenes genes were transcribed in stationary phase cells; 42% of currently annotated L. monocytogenes genes showed medium to high transcript levels under these conditions. A total of 96 genes had significantly higher transcript levels in 10403S than in ΔsigB, indicating σB-dependent transcription of these genes. RNA-Seq analyses indicate that a total of 67 noncoding RNA molecules (ncRNAs) are transcribed in stationary phase L. monocytogenes, including 7 previously unrecognized putative ncRNAs. Application of a dynamically trained Hidden Markov Model, in combination with RNA-Seq data, identified 65 putative σB promoters upstream of 82 of the 96 σB-dependent genes and upstream of the one σB-dependent ncRNA. The RNA-Seq data also enabled annotation of putative operons as well as visualization of 5'- and 3'-UTR regions.
The results from these studies provide powerful evidence that RNA-Seq data combined with appropriate bioinformatics tools allow quantitative characterization of prokaryotic transcriptomes, thus providing exciting new strategies for exploring transcriptional regulatory networks in bacteria.
See minireview.
PMCID: PMC2813243  PMID: 20042087
6.  Molecular evolution of the hyperthermophilic archaea of the Pyrococcus genus: analysis of adaptation to different environmental conditions 
BMC Genomics  2009;10:639.
Prokaryotic microorganisms are able to survive and proliferate in severe environmental conditions. The increasing number of complete sequences of prokaryotic genomes has provided the basis for studying the molecular mechanisms of their adaptation at the genomic level. We apply here a computer-based approach to compare the genomes and proteomes of P. furiosus, P. horikoshii, and P. abyssi to identify features of their molecular evolution related to their adaptation strategies to diverse environmental conditions.
Phylogenetic analysis of rRNA genes from 26 Pyrococcus strains suggested that the divergence of P. furiosus, P. horikoshii and P. abyssi might have occurred from ancestral deep-sea organisms. It was demonstrated that the function of genes that have been subject to positive Darwinian selection is closely related to the abiotic and biotic conditions to which these archaea became adapted. Divergence of P. furiosus might have been due to the loss of some genes involved in cell motility or signal transduction, and/or to evolution under positive selection of the genes for the translation machinery. In the course of P. horikoshii divergence, positive selection was found to operate mainly on the transcription machinery; divergence of P. abyssi was related to positive selection on genes mainly involved in inorganic ion transport. Analysis of the rate of radical amino acid replacement in evolving P. furiosus, P. horikoshii and P. abyssi showed that the fixation rate was higher for substitutions that are radical with respect to amino acid side-chain volume.
The current results give due credit to the important role of hydrostatic pressure as a cause of variability in the P. furiosus, P. horikoshii and P. abyssi genomes evolving in different habitats. Nevertheless, adaptation to pressure does not appear to be the sole factor ensuring adaptation to environment. For example, at the stage of the divergence of P. horikoshii and P. abyssi, an essential evolutionary role may be assigned to changes in the trophic chain, namely, acquisition of a consumer status at a high (P. horikoshii) or low level (P. abyssi).
PMCID: PMC2816203  PMID: 20042074
7.  Quantitative analysis of replication-related mutation and selection pressures in bacterial chromosomes and plasmids using generalised GC skew index 
BMC Genomics  2009;10:640.
Due to their bi-directional replication machinery starting from a single finite origin, bacterial genomes show a characteristic nucleotide compositional bias between the two replichores, which can be visualised through the GC skew, (C-G)/(C+G). Although this polarisation is used for computational prediction of replication origins in many bacterial genomes, the degree of GC skew visibility varies widely among species, necessitating a quantitative measurement of GC skew strength in order to provide confidence measures for GC skew-based predictions of replication origins.
Here we discuss a quantitative index for the measurement of GC skew strength, named the generalised GC skew index (gGCSI), which is applicable to genomes of any length, including bacterial chromosomes and plasmids. We demonstrate that gGCSI is independent of the window size and can thus be used to compare genomes of different sizes, such as bacterial chromosomes and plasmids. It can suggest the existence of different replication mechanisms in archaea and of rolling-circle replication in plasmids. Correlation of gGCSI values between plasmids and their corresponding host chromosomes suggests that, within the same strain, these replicons have replicated using the same replication machinery and thus exhibit similar strengths of replication strand skew.
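The GC skew defined above is usually computed in sliding or adjacent windows and often accumulated along the sequence, with the two extremes of the cumulative curve commonly read as candidate origin and terminus positions. The sketch below computes only the windowed and cumulative skew as defined in the text; the gGCSI itself is not specified in this abstract, so it is not reproduced, and the function names and default window size are assumptions.

```python
def gc_skew(seq, window=1000, step=None):
    """Windowed GC skew, (C - G) / (C + G), as defined in the text.

    Returns a list of (window_start, skew); windows containing no G or C get 0.0.
    If the sequence is shorter than one window, the whole sequence is one window.
    """
    step = step or window
    seq = seq.upper()
    result = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        c, g = chunk.count("C"), chunk.count("G")
        result.append((start, (c - g) / (c + g) if c + g else 0.0))
    return result


def cumulative_skew(skews):
    """Running sum of windowed skew values along the sequence."""
    total, curve = 0.0, []
    for _, value in skews:
        total += value
        curve.append(total)
    return curve
```

For example, `gc_skew("G" * 10 + "C" * 10, window=10)` gives `[(0, -1.0), (10, 1.0)]` and the cumulative curve `[-1.0, 0.0]`. Because raw skew values depend on window size, a window-independent summary such as the gGCSI described below is needed to compare replicons of different lengths.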
gGCSI can be applied to genomes of any length and thus allows comparative study of replication-related mutation and selection pressures in genomes of different lengths such as bacterial chromosomes and plasmids. Using gGCSI, we showed that replication-related mutation or selection pressure is similar for replicons with similar machinery.
PMCID: PMC2804667  PMID: 20042086
8.  Genome-wide dissection of globally emergent multi-drug resistant serotype 19A Streptococcus pneumoniae 
BMC Genomics  2009;10:642.
Emergence of multi-drug resistant (MDR) serotype 19A Streptococcus pneumoniae (SPN) is well documented, but the causal factors remain unclear. Canadian SPN isolates (1993-2008, n = 11,083) were serotyped and tested for in vitro susceptibility. A subset of MDR 19A isolates were typed by multi-locus sequence typing (MLST), and the whole genomes of representative isolates were sequenced.
MDR 19A increased in the post-PCV7 era while 19F, 6B, and 23F concurrently declined. MLST of MDR 19A (n = 97) revealed that sequence type (ST) 320 predominated. ST320 was unique amongst MDR 19A in that its minimum inhibitory concentration (MIC) values for penicillin, amoxicillin, ceftriaxone, and erythromycin were higher than those of other STs present amongst post-PCV7 MDR 19A. DNA sequencing revealed that alleles at the key drug resistance loci pbp2a, pbp2x, pbp2b, ermB, mefA/E, and tetM were conserved between pre-PCV7 ST320 19F and post-PCV7 ST320 19A, most likely due to a capsule switch recombination event. A genome-wide comparison of MDR 19A ST320 with MDR 19F ST320 identified 822 unique SNPs in 19A, 61 of which were present in antimicrobial resistance genes and 100 in virulence factors.
Our results suggest a complex genetic picture where high-level drug resistance, vaccine selection pressure, and SPN mutational events have created a "perfect storm" for the emergence of MDR 19A.
PMCID: PMC2807444  PMID: 20042094
9.  MicroRNA and tasiRNA diversity in mature pollen of Arabidopsis thaliana 
BMC Genomics  2009;10:643.
Next-generation sequencing technology has allowed investigation of the small RNA populations of flowering plants at great depth. However, little is known about small RNAs in their reproductive cells, especially in post-meiotic cells of the gametophyte generation. Pollen, the male gametophyte, is the specialised haploid structure that generates and delivers the sperm cells to the female gametes at fertilisation. Whether development and differentiation of the male gametophyte depend on the action of microRNAs and trans-acting siRNAs guiding changes in gene expression is largely unknown. Here we have used 454 sequencing to survey the various small RNA populations present in mature pollen of Arabidopsis thaliana.
In this study we detected the presence of 33 different microRNA families in mature pollen and validated the expression levels of 17 selected miRNAs by Q-RT-PCR. The majority of the selected miRNAs showed pollen-enriched expression compared with leaves. Furthermore, we report for the first time the presence of trans-acting siRNAs in pollen. In addition to describing new patterns of expression for known small RNAs in each of these classes, we identified 7 putative novel microRNAs. One of these, ath-MIR2939, targets a pollen-specific F-box transcript and we demonstrate cleavage of its target mRNA in mature pollen.
Despite the apparent simplicity of the male gametophyte, comprising just two different cell types, pollen not only utilises many miRNAs and trans-acting siRNAs expressed in the somatic tissues but also expresses novel miRNAs.
PMCID: PMC2808329  PMID: 20042113
10.  Genome-wide loss-of-function analysis of deubiquitylating enzymes for zebrafish development 
BMC Genomics  2009;10:637.
Deconjugation of ubiquitin and/or ubiquitin-like modified protein substrates is essential to modulate protein-protein interactions and, thus, signaling processes in cells. Although deubiquitylating (deubiquitinating) enzymes (DUBs) play a key role in this process, their function and regulation remain insufficiently understood. Loss-of-function phenotype studies can provide important information to elucidate gene function, and zebrafish is an excellent model for this goal.
From an in silico genome-wide search, we found more than 90 putative DUBs encoded in the zebrafish genome, belonging to six different subclasses. Of these, 85 from the five classical subclasses were tested in morpholino (MO) knockdown experiments, and 57 were found to be important in early zebrafish development. These DUB morphants displayed a complex and pleiotropic phenotype that, regardless of the gene targeted, always affected the notochord. Based on huC neuronal marker expression, we grouped them into five sets (groups I to V). Group I DUBs (otud7b, uchl3 and bap1) appear to be involved in the Notch signaling pathway based on the neuronal hyperplasia, while group IV DUBs (otud4, usp5, usp15 and usp25) play a critical role in dorsoventral patterning through the BMP pathway.
We have identified an exhaustive list of genes in the zebrafish genome belonging to the five established classes of DUBs. Additionally, we performed the corresponding MO knockdown experiments in zebrafish as well as functional studies for a subset of the predicted DUB genes. The screen results in this work will stimulate functional follow-up studies of potential DUB genes using the zebrafish model system.
PMCID: PMC2809080  PMID: 20040115
11.  3PD: Rapid design of optimal primers for chromosome conformation capture assays 
BMC Genomics  2009;10:635.
Higher eukaryotes control the expression of their genes by mechanisms that we are just beginning to understand. A complex layer of control is the dynamic spatial organization of the nucleus.
We present a bioinformatics solution (3PD) to support the experimentalist in detecting long-range intra- or inter-chromosomal contacts by chromosome conformation capture (3C) assays. 3C assays take a snapshot of chromosomal contacts by a fixation step and quantify them by PCR. Our contribution is to rapidly design an optimal primer set for the crucial PCR step. Our primer design reduces the level of experimental error, as primers are highly similar in terms of physical properties and amplicon length. All 3C primers are compatible with multiplex PCR reactions. Primer uniqueness is checked genome-wide with a suitable index structure.
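The two primer criteria mentioned above, genome-wide uniqueness and similar physical properties, can be illustrated with the toy sketch below. It is not 3PD's implementation: the abstract does not name its index structure or scoring, so a hash-based k-mer count over both strands and the Wallace-rule melting temperature (2 °C per A/T, 4 °C per G/C) are swapped in purely for illustration, amplicon length is ignored, and all function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(genome, k):
    """Count every k-mer on both strands (a toy stand-in for a genome-wide index)."""
    genome = genome.upper()
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    rc = revcomp(genome)
    counts.update(rc[i:i + k] for i in range(len(rc) - k + 1))
    return counts


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def select_primers(candidates, index, tm_target, tm_tol=2):
    """Keep length-k primers that occur exactly once in the genome with Tm near tm_target.

    Palindromic k-mers are counted once per strand and are therefore rejected.
    """
    return [p for p in candidates
            if index[p.upper()] == 1 and abs(wallace_tm(p) - tm_target) <= tm_tol]
```

For example, with `index = build_kmer_index("AAAACCCC", 4)`, `select_primers(["AAAC", "AAAA", "CCCC", "ACGT"], index, tm_target=10)` keeps `["AAAC", "AAAA"]`: `CCCC` is unique but too hot, and `ACGT` does not occur. A real tool would use nearest-neighbour Tm models and a compressed index rather than a dictionary of all k-mers.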
In summary, our software 3PD facilitates genome-wide primer design for 3C experiments in a matter of seconds. Our software is available as a web server at:
PMCID: PMC2811132  PMID: 20040085
12.  Comprehensive in silico prediction and analysis of chlamydial outer membrane proteins reflects evolution and life style of the Chlamydiae 
BMC Genomics  2009;10:634.
Chlamydiae are obligate intracellular bacteria comprising some of the most important bacterial pathogens of animals and humans. Although chlamydial outer membrane proteins play a key role in attachment to and entry into host cells, only a few have been described so far. We developed a comprehensive, multiphasic in silico approach, including the calculation of clusters of orthologues, to predict outer membrane proteins using conservative criteria. We tested this approach using Escherichia coli (positive control) and Bacillus subtilis (negative control), and applied it to five chlamydial species: Chlamydia trachomatis, Chlamydia muridarum, Chlamydia (a.k.a. Chlamydophila) pneumoniae, Chlamydia (a.k.a. Chlamydophila) caviae, and Protochlamydia amoebophila.
In total, 312 chlamydial outer membrane proteins and lipoproteins in 88 orthologous clusters were identified, including 238 proteins not previously recognized to be located in the outer membrane. Analysis of their taxonomic distribution revealed an evolutionary conservation among Chlamydiae, Verrucomicrobia, Lentisphaerae and Planctomycetes as well as lifestyle-dependent conservation of the chlamydial outer membrane protein composition.
This analysis suggested a correlation between the outer membrane protein composition and the host range of chlamydiae and revealed a common set of outer membrane proteins shared by these intracellular bacteria. The collection of predicted chlamydial outer membrane proteins is available at the online database pCOMP and might provide future guidance in the quest for anti-chlamydial vaccines.
PMCID: PMC2811131  PMID: 20040079
13.  A bi-dimensional genome scan for prolificacy traits in pigs shows the existence of multiple epistatic QTL 
BMC Genomics  2009;10:636.
Prolificacy is the most important trait influencing the reproductive efficiency of pig production systems. The low heritability and sex-limited expression of prolificacy have hindered to some extent the improvement of this trait through artificial selection. Moreover, the relative contributions of additive, dominant and epistatic QTL to the genetic variance of pig prolificacy remain to be defined. In this work, we have addressed this issue by performing one-dimensional and bi-dimensional genome scans for number of piglets born alive (NBA) and total number of piglets born (TNB) in a three-generation Iberian by Meishan F2 intercross.
The one-dimensional genome scan for NBA and TNB revealed the existence of two genome-wide highly significant QTL located on SSC13 (P < 0.001) and SSC17 (P < 0.01) with effects on both traits. This relative paucity of significant results contrasted very strongly with the wide array of highly significant epistatic QTL that emerged in the bi-dimensional genome-wide scan analysis. As many as 18 epistatic QTL were found for NBA (four at P < 0.01 and five at P < 0.05) and TNB (three at P < 0.01 and six at P < 0.05). These epistatic QTL were distributed across multiple genomic regions, covering 13 of the 18 pig autosomes, and they had small individual effects, ranging from 3 to 4% of the phenotypic variance. Different patterns of interaction (a × a, a × d, d × a and d × d) were found amongst the epistatic QTL pairs identified in the current work.
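The interaction types a × a, a × d, d × a and d × d refer to products of additive (a) and dominance (d) terms at two loci. The sketch below shows one common F2 coding (additive: 1, 0, -1 for QQ, Qq, qq; dominance: 0, 1, 0) and the resulting design-matrix entries for a locus pair. It assumes genotypes are known exactly, whereas F2 QTL scans usually work with genotype probabilities inferred from flanking markers, and other codings (e.g. orthogonal ones) are also used; the study's exact model is not given here, and the names are hypothetical.

```python
# One common F2 coding (an assumption; the study's own coding may differ).
ADD = {"QQ": 1, "Qq": 0, "qq": -1}  # additive indicator
DOM = {"QQ": 0, "Qq": 1, "qq": 0}   # dominance indicator


def epistatic_design_row(g1, g2):
    """Design-matrix entries for a pair of loci: main effects plus the four epistatic terms.

    "axd" is the additive term at locus 1 times the dominance term at locus 2,
    and "dxa" is the reverse.
    """
    a1, d1 = ADD[g1], DOM[g1]
    a2, d2 = ADD[g2], DOM[g2]
    return {
        "a1": a1, "d1": d1, "a2": a2, "d2": d2,
        "axa": a1 * a2, "axd": a1 * d2, "dxa": d1 * a2, "dxd": d1 * d2,
    }
```

Each animal contributes one such row; regressing NBA or TNB on these columns (plus fixed effects) and testing the four product terms jointly is one way a bi-dimensional scan can detect interactions that a one-dimensional scan of main effects misses.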
The complex inheritance of prolificacy traits in pigs has been evidenced by identifying multiple additive (SSC13 and SSC17), dominant and epistatic QTL in an Iberian × Meishan F2 intercross. Our results demonstrate that a significant fraction of the phenotypic variance of swine prolificacy traits can be attributed to first-order gene-by-gene interactions emphasizing that the phenotypic effects of alleles might be strongly modulated by the genetic background where they segregate.
PMCID: PMC2812473  PMID: 20040109
14.  Sympatric ecological speciation meets pyrosequencing: sampling the transcriptome of the apple maggot Rhagoletis pomonella 
BMC Genomics  2009;10:633.
The full power of modern genetics has been applied to the study of speciation in only a small handful of genetic model species - all of which speciated allopatrically. Here we report the first large expressed sequence tag (EST) study of a candidate for ecological sympatric speciation, the apple maggot Rhagoletis pomonella, using massively parallel pyrosequencing on the Roche 454-FLX platform. To maximize transcript diversity we created and sequenced separate libraries from larvae, pupae, adult heads, and headless adult bodies.
We obtained 239,531 sequences which assembled into 24,373 contigs. A total of 6810 unique protein coding genes were identified among the contigs and long singletons, corresponding to 48% of all known Drosophila melanogaster protein-coding genes. Their distribution across GO classes suggests that we have obtained a representative sample of the transcriptome. Among these sequences are many candidates for potential R. pomonella "speciation genes" (or "barrier genes") such as those controlling chemosensory and life-history timing processes. Furthermore, we identified important marker loci including more than 40,000 single nucleotide polymorphisms (SNPs) and over 100 microsatellites. An initial search for SNPs at which the apple and hawthorn host races differ suggested at least 75 loci warranting further work. We also determined that developmental expression differences remained even after normalization; transcripts expected to show different expression levels between larvae and pupae in D. melanogaster also did so in R. pomonella. Preliminary comparative analysis of transcript presences and absences revealed evidence of gene loss in Drosophila and gain in the higher dipteran clade Schizophora.
These data provide a much needed resource for exploring mechanisms of divergence in this important model for sympatric ecological speciation. Our description of ESTs from a substantial portion of the R. pomonella transcriptome will facilitate future functional studies of candidate genes for olfaction and diapause-related life history timing, and will enable large scale expression studies. Similarly, the identification of new SNP and microsatellite markers will facilitate future population and quantitative genetic studies of divergence between the apple and hawthorn-infesting host races.
PMCID: PMC2807884  PMID: 20035631
15.  Large-scale identification of odorant-binding proteins and chemosensory proteins from expressed sequence tags in insects 
BMC Genomics  2009;10:632.
Insect odorant binding proteins (OBPs) and chemosensory proteins (CSPs) play an important role in chemical communication of insects. Gene discovery of these proteins is a time-consuming task. In recent years, expressed sequence tags (ESTs) of many insect species have accumulated, thus providing a useful resource for gene discovery.
We have developed a computational pipeline to identify OBP and CSP genes from insect ESTs. In total, 752,841 insect ESTs were examined from 54 species covering eight Orders of Insecta. From these ESTs, 142 OBPs and 177 CSPs were identified, of which 117 OBPs and 129 CSPs are new. The complete open reading frames (ORFs) of 88 OBPs and 123 CSPs were obtained by electronic elongation. We randomly chose 26 OBPs from eight insect species and 21 CSPs from four species for RT-PCR validation. Twenty-two OBPs and 16 CSPs were confirmed by RT-PCR, supporting the efficiency and reliability of the algorithm. Together with all family members obtained from NCBI (OBPs) or UniProtKB (CSPs), 850 OBPs and 237 CSPs were analyzed for their structural characteristics and evolutionary relationships.
A large number of new OBPs and CSPs were found, providing the basis for a deeper understanding of these proteins. In addition, the conserved motif and evolutionary analyses provide new insights into the evolution of insect OBPs and CSPs. Motif patterns may fine-tune the functions of OBPs and CSPs, leading to minor differences in the binding of sex pheromones or plant volatiles among insect Orders.
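OBPs and CSPs are commonly recognized by conserved cysteine spacing (six cysteines in classic OBPs, four in CSPs). The sketch below illustrates only that motif-screening step on already translated sequences, using regular expressions; the spacing ranges are illustrative assumptions of the kind reported for these families, not the study's exact motifs, and the pipeline described above also involves EST assembly, ORF prediction and other filters that are not shown. The function and pattern names are hypothetical.

```python
import re

# Illustrative cysteine-spacing patterns (assumed ranges, not the study's exact motifs).
OBP_PATTERN = re.compile(r"C.{22,32}C.{3}C.{36,43}C.{8,14}C.{8}C")  # six cysteines
CSP_PATTERN = re.compile(r"C.{6,8}C.{16,21}C.{2}C")                 # four cysteines


def classify_by_cysteines(protein):
    """Return 'OBP-like', 'CSP-like', or None, based on cysteine spacing alone.

    The six-cysteine OBP pattern is tested first because a sequence matching it
    could also contain a shorter four-cysteine match.
    """
    seq = protein.upper()
    if OBP_PATTERN.search(seq):
        return "OBP-like"
    if CSP_PATTERN.search(seq):
        return "CSP-like"
    return None
```

A motif hit like this is only a candidate filter; family assignment would normally be confirmed with homology or profile searches, and here with RT-PCR.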
PMCID: PMC2808328  PMID: 20034407
16.  The repertoire of equine intestinal α-defensins 
BMC Genomics  2009;10:631.
Defensins represent an important class of antimicrobial peptides. These effector molecules of the innate immune system act as endogenous antibiotics to protect the organism against infections with pathogenic microorganisms. Mammalian defensins are classified into three distinct sub-families (α-, β- and θ-defensins) according to their specific intramolecular disulfide-bond pattern. The peptides exhibit antimicrobial activity against a broad spectrum of microorganisms, including bacteria and fungi. Alpha-defensins are primarily synthesised in neutrophils and intestinal Paneth cells. They play a role in the pathogenesis of intestinal diseases and may regulate the flora of the intestinal tract. An equine intestinal α-defensin (DEFA1), the first characterised in the Laurasiatheria, shows a broad antimicrobial spectrum against human and equine pathogens. Here we report a first investigation of the repertoire of equine intestinal α-defensins. The equine genome was screened for putative α-defensin genes using known α-defensin sequences as templates. Based on the sequence information obtained, a set of oligonucleotides specific to the α-defensin gene family was designed. The products generated by reverse-transcriptase PCR with cDNA from the small intestine as template were sub-cloned, and numerous clones were sequenced.
Thirty-eight equine intestinal α-defensin transcripts were determined. After translation it became evident that at least 20 of them may code for functional peptides. Ten transcripts lacked matching genomic sequences and for 14 α-defensin genes apparently present in the genome no appropriate transcript could be verified. In other cases the same genomic exons were found in different transcripts.
The large repertoire of equine α-defensins found in this study points to a particular importance of these peptides for animal health and protection from infectious diseases. Moreover, these findings make the horse an excellent species in which to study the biological properties of α-defensins. Interestingly, these peptides have not been found in other species of the Laurasiatheria to date. Comparison of the obtained transcripts with the genomic sequences in the current assembly of the horse genome (EquCab2.0) indicates that the assembly is not yet complete and/or is to some extent incorrectly assembled.
PMCID: PMC2803202  PMID: 20030839
17.  Canine tumor cross-species genomics uncovers targets linked to osteosarcoma progression 
BMC Genomics  2009;10:625.
Pulmonary metastasis continues to be the most common cause of death in osteosarcoma. Indeed, the 5-year survival for newly diagnosed osteosarcoma patients has not significantly changed in over 20 years. Further understanding of the mechanisms of metastasis and resistance for this aggressive pediatric cancer is necessary. Pet dogs naturally develop osteosarcoma providing a novel opportunity to model metastasis development and progression. Given the accelerated biology of canine osteosarcoma, we hypothesized that a direct comparison of canine and pediatric osteosarcoma expression profiles may help identify novel metastasis-associated tumor targets that have been missed through the study of the human cancer alone.
Using parallel oligonucleotide array platforms, orthologues shared between the species were identified and normalized. Hierarchical clustering of the osteosarcoma expression signatures could not distinguish the canine from the human disease. Cross-species target mining identified two genes, interleukin-8 (IL-8) and solute carrier family 1 (glial high affinity glutamate transporter), member 3 (SLC1A3), which were uniformly expressed in dog samples but not in all pediatric osteosarcoma patient samples. Expression of these genes in an independent population of pediatric osteosarcoma patients was associated with poor outcome (p = 0.020 and p = 0.026, respectively). Validation of IL-8 and SLC1A3 protein expression in pediatric osteosarcoma tissues further supported the potential value of these novel targets. Ongoing evaluation will assess the biological significance of these targets and their associated pathways.
Collectively, these data support the strong similarities between human and canine osteosarcoma and underline the opportunities provided by a comparative oncology approach as a means to improve our understanding of cancer biology and therapies.
PMCID: PMC2803201  PMID: 20028558
18.  Generation and analysis of expression sequence tags from haustoria of the wheat stripe rust fungus Puccinia striiformis f. sp. Tritici 
BMC Genomics  2009;10:626.
Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat (Triticum aestivum L.) worldwide. In spite of its agricultural importance, the genomics and genetics of the pathogen are poorly characterized. Pst transcripts from urediniospores and germinated urediniospores have been examined previously, but little is known about genes expressed during host infection. Some genes involved in virulence in other rust fungi have been found to be specifically expressed in haustoria. Therefore, the objective of this study was to generate a cDNA library to characterize genes expressed in haustoria of Pst.
A total of 5,126 high-quality ESTs were generated from haustoria of Pst, from which 287 contigs and 847 singletons were derived. Approximately 10% and 26% of the 1,134 unique sequences were homologous to proteins with known functions and to hypothetical proteins, respectively. The remaining 64% of the unique sequences had no significant similarity to sequences in GenBank. Fifteen genes were predicted to encode proteins secreted from Pst haustoria. Quantitative RT-PCR analysis of ten genes, including six secreted-protein genes, revealed changes in transcript levels across different developmental and infection stages of the pathogen.
The haustorial cDNA library was useful in identifying genes of the stripe rust fungus expressed during the infection process. From the library, we identified 15 genes encoding putative secreted proteins and six genes induced during the infection process. These genes are candidates for further studies to determine their functions in wheat-Pst interactions.
PMCID: PMC2805700  PMID: 20028560
19.  Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.) 
BMC Genomics  2009;10:629.
Expressed sequence tags (ESTs) are an important source of gene-based markers such as those based on insertion-deletions (Indels) or single-nucleotide polymorphisms (SNPs). Several gel-based methods have been reported for the detection of sequence variants; however, they have not been widely exploited in common bean, an important legume crop of the developing world. The objectives of this project were to develop and map EST-based markers using analysis of single strand conformation polymorphisms (SSCPs), to create a transcript map for common bean, and to compare synteny of the common bean map with sequenced chromosomes of other legumes.
A set of 418 EST-based amplicons was evaluated for parental polymorphisms using the SSCP technique, and 26% of these presented a clear conformational or size polymorphism between Andean and Mesoamerican genotypes. The amplicon-based markers were then used for genetic mapping, with segregation analysis performed in the DOR364 × G19833 recombinant inbred line (RIL) population. A total of 118 new marker loci were placed into an integrated molecular map for common bean consisting of 288 markers. Of these, 218 were used for synteny analysis, and 186 showed homology with segments of the soybean genome with an e-value lower than 7 × 10⁻¹². The synteny analysis with soybean showed a mosaic pattern of syntenic blocks, with most segments of any one common bean linkage group associated with two soybean chromosomes. The analysis with Medicago truncatula and Lotus japonicus revealed fewer syntenic regions, consistent with the more distant phylogenetic relationship between the galegoid and phaseoloid legumes.
The SSCP technique is a useful and inexpensive alternative to other SNP or Indel detection techniques for saturating the common bean genetic map with functional markers that may be useful in marker-assisted selection. In addition, the EST-based genetic markers allowed the construction of a transcript map and, given their high conservation between species, allowed synteny comparisons to be made with sequenced genomes. This synteny analysis may support positional cloning of target genes in common bean through the use of genomic information from these other legumes.
PMCID: PMC2806352  PMID: 20030833
20.  Effects of temperature on gene expression in embryos of the coral Montastraea faveolata 
BMC Genomics  2009;10:627.
Coral reefs are expected to be severely impacted by rising seawater temperatures associated with climate change. This study used cDNA microarrays to investigate transcriptional effects of thermal stress in embryos of the coral Montastraea faveolata. Embryos were exposed to 27.5°C, 29.0°C, and 31.5°C directly after fertilization. Differences in gene expression were measured after 12 and 48 hours.
Analysis of differentially expressed genes indicated that increased temperatures may lead to oxidative stress, apoptosis, and a structural reconfiguration of the cytoskeletal network. Metabolic processes were downregulated, and histones and zinc finger-containing proteins may have played a role in long-term regulation under heat stress.
Embryos responded differently depending on exposure time and temperature. Embryos already expressed stress-related genes at 29.0°C but seemed able to counteract the initial response over time. By contrast, embryos at 31.5°C displayed continuous expression of stress genes. The genes involved in the response to elevated temperatures included both highly conserved and coral-specific genes. These genes might serve as a basis for research into coral-specific adaptations in stress responses and to global climate change.
PMCID: PMC2807443  PMID: 20030803
21.  Identification of mammalian orthologs using local synteny 
BMC Genomics  2009;10:630.
Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny and sequence based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals.
We test a simple method for measuring and using gene order (local synteny) to identify mammalian orthologs by investigating the agreement between coding-sequence-based orthology (Inparanoid) and local-synteny-based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were concordant between the two orthology methods, illustrating that local synteny is a robust substitute for coding sequence in identifying orthologs. However, 7% of pairs were discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements.
By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.
PMCID: PMC2807883  PMID: 20030836
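The idea of scoring a candidate ortholog pair by its genomic context, as described in entry 21, can be illustrated with a small Python sketch. This is not the authors' implementation: the genome representation (chromosome name mapped to an ordered gene list), the `anchors` set of previously accepted ortholog pairs, the window size `k`, and the function names `neighbourhood`, `local_synteny_score` and `best_syntenic_partner` are all hypothetical. The score simply counts anchor pairs linking the flanking genes of the two candidates, so a parent gene sitting in its conserved neighbourhood outscores a retrocopy inserted elsewhere.

```python
"""Hypothetical sketch of a local-synteny score for candidate ortholog pairs.

Assumptions (not from the paper): each genome is a dict mapping a chromosome
name to an ordered list of gene IDs, and `anchors` is a set of
(gene_in_A, gene_in_B) pairs already accepted as orthologous.
"""


def neighbourhood(genome, gene, k):
    """Return up to k genes on each side of `gene`, excluding the gene itself."""
    for genes in genome.values():
        if gene in genes:
            i = genes.index(gene)
            return genes[max(0, i - k):i] + genes[i + 1:i + 1 + k]
    raise KeyError(gene)


def local_synteny_score(genome_a, genome_b, gene_a, gene_b, anchors, k=5):
    """Count anchor pairs linking the two neighbourhoods (higher = more syntenic)."""
    left = neighbourhood(genome_a, gene_a, k)
    right = set(neighbourhood(genome_b, gene_b, k))
    return sum(1 for g in left for h in right if (g, h) in anchors)


def best_syntenic_partner(genome_a, genome_b, gene_a, candidates, anchors, k=5):
    """Pick the candidate in genome B whose neighbourhood best matches gene_a's."""
    return max(candidates,
               key=lambda c: local_synteny_score(genome_a, genome_b, gene_a, c, anchors, k))


if __name__ == "__main__":
    # Toy data: b3 sits in the conserved block; r3 is a retrocopy on another chromosome.
    genome_a = {"chr1": ["a1", "a2", "a3", "a4", "a5"]}
    genome_b = {"chr1": ["b1", "b2", "b3", "b4", "b5"], "chr7": ["x1", "r3", "x2"]}
    anchors = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}
    print(best_syntenic_partner(genome_a, genome_b, "a3", ["b3", "r3"], anchors, k=2))  # prints b3
```

In the toy data, `b3` shares all four flanking anchors with `a3` while the retrocopy `r3` shares none, which mirrors the abstract's point that local synteny can separate true orthologs from recent retrogenes. A real analysis would also need rules for ties and for gene families with many candidates.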
22.  Conservation of DNA-binding specificity and oligomerisation properties within the p53 family 
BMC Genomics  2009;10:628.
Transcription factors activate their target genes by binding to specific response elements. Many transcription factor families evolved from a common ancestor by gene duplication and subsequent divergent evolution. Members of the p53 family, which play key roles in cell-cycle control and development, share conserved DNA binding and oligomerisation domains but exhibit distinct functions. In this study, the molecular basis of the functional divergence of related transcription factors was investigated.
We characterised the DNA-binding specificity and oligomerisation properties of human p53, p63 and p73, as well as p53 from other organisms, using novel biophysical approaches. All p53 family members bound DNA cooperatively as tetramers with high affinity. Despite structural differences in the oligomerisation domain, the dissociation constants of the tetramers were in the low nanomolar range for all family members, indicating that the strength of tetramerisation was evolutionarily conserved. However, small differences in the oligomerisation properties were observed, which may play a regulatory role. Intriguingly, the DNA-binding specificity of p53 family members was highly conserved even for evolutionarily distant species. Additionally, DNA recognition was only weakly affected by CpG methylation. Prediction of p53/p63/p73 binding sites in the genome showed almost complete overlap between the different homologs.
The diversity of biological functions among p53 family members is not reflected in differences in sequence-specific DNA binding. Hence, additional specificity factors must exist that allowed the acquisition of novel functions during evolution while preserving the original roles.
PMCID: PMC2807882  PMID: 20030809
23.  Functional diversity of human protein kinase splice variants marks significant expansion of human kinome 
BMC Genomics  2009;10:622.
Protein kinases are involved in a diverse spectrum of cellular processes. The availability of the draft human genome sequence in 2001 enabled recognition of the repertoire of protein kinases. Since then, the human genomic data have been continually refined, and the current release has helped us to recognize a larger repertoire of over 900 human protein kinases, represented mainly by splice variants.
Many of these identified protein kinases are alternatively spliced products. Interestingly, some of the human kinase splice variants appear to have diverged significantly in their functional properties, as reflected by the presence or absence of one or more domains. Many sets of protein kinase splice variants have substantially different domain organizations, and in a few sets of splice variants the kinase domains belong to different kinase subfamilies, suggesting potential participation in different signal transduction pathways.
Addition or deletion of a domain between splice variants of multi-domain kinases appears to be a means of generating differences in the functional features of otherwise similar kinases. It is intriguing that marked sequence diversity within the catalytic regions of some splice-variant kinases results in kinases belonging to different subfamilies. These human kinase splice variants with different functions might contribute to the diversity of eukaryotic cellular signaling.
PMCID: PMC2805699  PMID: 20028505
24.  Potential impact of stress activated retrotransposons on genome evolution in a marine diatom 
BMC Genomics  2009;10:624.
Transposable elements (TEs) are mobile DNA sequences present in the genomes of most organisms. They have been extensively studied in animals, fungi and plants, and have been shown to have important functions in genome dynamics and species evolution. Recent genomic data now make it possible to extend the identification and study of TEs to other branches of the eukaryotic tree of life. Diatoms, which belong to the heterokont group, are unicellular eukaryotic algae responsible for around 40% of marine primary productivity. The genomes of a centric diatom, Thalassiosira pseudonana, and a pennate diatom, Phaeodactylum tricornutum, which likely diverged around 90 Mya, have recently become available.
In the present work, we establish that LTR retrotransposons (LTR-RTs) are the most abundant TEs inhabiting these genomes, with a much higher presence in the P. tricornutum genome. We show that the LTR-RTs found in diatoms form two new phylogenetic lineages that appear to be diatom specific and are also found in environmental samples taken from different oceans. Comparative expression analysis of P. tricornutum cells cultured under 16 different conditions demonstrates high levels of transcriptional activity of LTR retrotransposons in response to nitrate limitation and upon exposure to diatom-derived reactive aldehydes, which are known to induce stress responses and cell death. Regulatory aspects of P. tricornutum retrotransposon transcription also include nitrate-limitation-sensitive cis-regulatory components within LTR elements and cytosine methylation dynamics. Differential insertion patterns in P. tricornutum accessions isolated from around the world suggest a role for LTR-RTs in generating intraspecific genetic variability.
Based on these findings we propose that LTR-RTs may have been important for promoting genome rearrangements in diatoms.
PMCID: PMC2806351  PMID: 20028555
25.  A new measurement of sequence conservation 
BMC Genomics  2009;10:623.
Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation over every position in contiguous regions. As a result, many functional regions that contain conserved segments separated by relatively long divergent segments are missed. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously and discontiguously conserved regions can be detected. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified threshold with their homologous regions in other species. Conserved regions are therefore good candidates for functional regions but are not always functional. Moreover, conserved regions may contain long, divergent segments.
To identify both discontiguously and contiguously conserved regions, we propose a new measurement of sequence conservation that measures sequence similarity based only on the conserved segments within the regions. Defining conserved segments with the local alignment tool CHAOS, we used the new measurement to analyze the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation of at least 11% of these functional regions could be missed by current conservation analysis methods. We also found that 72% of the mouse homologous regions identified with the new measurement are more similar to the human functional sequences than are the aligned mouse sequences from the UCSC genome browser. We further compared our method with BLAST and discontiguous MegaBLAST and found that it picks up many more conserved segments in these regions than either tool.
It is critical to have a new measurement of sequence conservation that is based only on the conserved segments within a region. Such a measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes [1].
PMCID: PMC2807881  PMID: 20028539
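The difference between a whole-region conservation score and a score based only on conserved segments, as described in entry 25, can be illustrated with a small Python sketch. It does not reproduce the paper's exact formula, which the abstract does not give: the segment representation as `(start, end, matches)` tuples, the function names, and the `threshold` and `min_coverage` values are hypothetical. The example shows why a region made of two well-conserved blocks separated by a long divergent stretch looks weakly conserved under a position-by-position measure but strongly conserved under a segment-only one.

```python
"""Hypothetical sketch: whole-region identity versus segment-only identity.

Assumptions (not from the paper): conserved segments are given as
(start, end, matches) tuples on the query region, half-open and
non-overlapping, e.g. as produced by a local aligner such as CHAOS.
"""
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (start, end, number of matching positions)


def contiguous_identity(region_length: int, segments: List[Segment]) -> float:
    """Whole-region measure: matches divided by every position in the region.

    Matches outside the conserved segments are ignored for simplicity.
    """
    return sum(m for _, _, m in segments) / region_length


def segment_identity(segments: List[Segment]) -> float:
    """Segment-only measure: matches divided by positions inside conserved segments."""
    aligned = sum(end - start for start, end, _ in segments)
    return sum(m for _, _, m in segments) / aligned if aligned else 0.0


def is_conserved(region_length: int, segments: List[Segment],
                 threshold: float = 0.7, min_coverage: float = 0.1) -> bool:
    """Call a region conserved if its segments are similar enough and cover enough of it.

    `threshold` and `min_coverage` are illustrative values, not the paper's.
    """
    covered = sum(end - start for start, end, _ in segments) / region_length
    return segment_identity(segments) >= threshold and covered >= min_coverage


if __name__ == "__main__":
    # A 1,000-bp region: two 100-bp blocks with 90 matches each, 800 bp of divergence between.
    segs = [(0, 100, 90), (900, 1000, 90)]
    print(contiguous_identity(1000, segs), segment_identity(segs), is_conserved(1000, segs))
    # prints 0.18 0.9 True
```

The coverage check is included because a segment-only score on its own would rate a region with a single short matching segment as highly conserved; how the published method guards against this is not stated in the abstract.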

