
Results 1-25 (47)

1.  Genome Sequences of Vibrio navarrensis, a Potential Human Pathogen 
Genome Announcements  2014;2(6):e01188-14.
Vibrio navarrensis is an aquatic bacterium recently shown to be associated with human illness. We report the first genome sequences of three V. navarrensis strains obtained from clinical and environmental sources. Preliminary analyses of the sequences reveal that V. navarrensis contains genes commonly associated with virulence in other human pathogens.
PMCID: PMC4239357  PMID: 25414502
2.  Genome Sequence-Based Discriminator for Vancomycin-Intermediate Staphylococcus aureus 
Journal of Bacteriology  2014;196(5):940-948.
Vancomycin is the mainstay of treatment for patients with Staphylococcus aureus infections, and reduced susceptibility to vancomycin is becoming increasingly common. Accordingly, the development of rapid and accurate assays for the diagnosis of vancomycin-intermediate S. aureus (VISA) will be critical. We developed and applied a genome-based machine-learning approach for discrimination between VISA and vancomycin-susceptible S. aureus (VSSA) using 25 whole-genome sequences. The resulting machine-learning model, based on 14 gene parameters, including 3 molecular typing markers and 11 genes implicated in reduced vancomycin susceptibility, is able to unambiguously distinguish between the VISA and VSSA isolates analyzed here despite the fact that they do not form evolutionarily distinct groups. As such, the model is able to discriminate based on specific genomic markers of antibiotic susceptibility rather than overall sequence relatedness. Subsequent evaluation of the model using leave-one-out validation yielded a classification accuracy of 84%. The machine-learning approach described here provides a generalized framework for the application of genome sequence analysis to the classification of bacteria that differ with respect to clinically relevant phenotypes and should be particularly useful in defining the genomic features that underlie antibiotic resistance.
PMCID: PMC3957707  PMID: 24363339
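The leave-one-out validation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' model: the abstract does not name the learning algorithm, so the sketch swaps in a simple 1-nearest-neighbour classifier with Hamming distance and assumes each isolate is encoded as a hypothetical binary vector of gene-parameter states.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_predict(train_x: List[Sequence[int]], train_y: List[str],
                             query: Sequence[int]) -> str:
    """Return the label of the closest training vector (first one wins on ties)."""
    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(xs: List[Sequence[int]], ys: List[str]) -> float:
    """Hold out each isolate in turn, fit on the rest, and score the held-out one."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two labelled isolates")
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: 3 binary gene parameters per isolate.
isolates = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
labels = ["VISA", "VISA", "VSSA", "VSSA"]
print(leave_one_out_accuracy(isolates, labels))  # prints 1.0
```

Because every prediction is made without the held-out isolate, the accuracy estimates performance on unseen isolates; with only 25 genomes, each misclassification shifts the accuracy by 4 percentage points.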
3.  Epigenetics Components of Aging in the Central Nervous System 
Neurotherapeutics  2013;10(4):647-663.
This review highlights recent discoveries that have shaped the emerging viewpoints in the field of epigenetic influences in the central nervous system (CNS), focusing on the following questions: i) How is the CNS shaped during development when precursor cells transition into morphologically and molecularly distinct cell types, and is this event driven by epigenetic alterations?; ii) How do epigenetic pathways control CNS function?; iii) What happens to “epigenetic memory” during aging processes, and do these alterations cause CNS dysfunction?; iv) Can one restore normal CNS function by manipulating the epigenome using pharmacologic agents, and will this ameliorate aging-related neurodegeneration? These and other still unanswered questions remain critical to understanding the impact of the multifaceted epigenetic machinery on the age-related dysfunction of the CNS.
Electronic supplementary material
The online version of this article (doi:10.1007/s13311-013-0229-y) contains supplementary material, which is available to authorized users.
PMCID: PMC3805869  PMID: 24132650
Epigenetics; CNS; chromatin; neurodegeneration; aging; histone code; HDAC; DNA methylation
4.  Origin and evolution of the cystic fibrosis transmembrane regulator protein R domain 
Gene  2013;523(2):137-146.
The Cystic Fibrosis Transmembrane Conductance Regulator protein (CFTR) is a member of the ABC transporter superfamily. CFTR is distinguished from all other members of this superfamily by its status as an ion channel as well as the presence of its unique regulatory (R) domain. We investigated the origin and subsequent evolution of the R domain along the CFTR evolutionary lineage. The R domain protein coding sequence originated via the loss of a splice donor site at the 3′ end of exon 14, leading to the subsequent read-through and capture of formerly intronic sequence as novel coding sequence. Inclusion of the remaining part of the R domain coding sequence in the CFTR transcript involved a lineage-specific gain of exonic sequence with no homology to protein coding sequences outside of CFTR and loss of two exons conserved among ABC family members. These events occurred at the base of the Gnathostome evolutionary lineage ~550–650 million years ago. The apparent origination of the R domain de novo from previously non-coding sequence is consistent with its lack of sequence similarity to other domains as well as its intrinsically disordered structure, which has important implications for its function. In particular, this lack of structure may provide for a dynamic and inducible regulatory activity based on transient physical interactions with more structured domains of the protein. Since its acquisition along the CFTR evolutionary lineage, the R domain has evolved more rapidly than any other CFTR domain; however, there is no evidence for positive (adaptive) selection in the evolution of the domain. The R domain does show a distinct pattern of relative evolutionary rates compared to other CFTR domains, which sheds additional light on the connection between its function and evolution. 
The regulatory function of the R domain is dependent upon a fairly small number of sites that are subject to phosphorylation, and these sites were fixed very early in R domain evolution and have remained largely invariant since that time. In contrast, the rest of the R domain has been free to drift in sequence space leading to a more star-like phylogeny than seen for the other CFTR domains. The case of the R domain suggests that domain acquisition via the de novo creation of coding sequence, and the novel functional utility that such an event would seemingly entail, can be one route by which neo-functionalization is favored to occur.
PMCID: PMC3793851  PMID: 23578801
Cystic fibrosis; R domain; Molecular evolution; Coding sequence; Neo-functionalization
5.  Flow-dependent epigenetic DNA methylation regulates endothelial gene expression and atherosclerosis 
The Journal of Clinical Investigation  2014;124(7):3187-3199.
In atherosclerosis, plaques preferentially develop in arterial regions of disturbed blood flow (d-flow), which alters endothelial gene expression and function. Here, we determined that d-flow regulates genome-wide DNA methylation patterns in a DNA methyltransferase–dependent (DNMT-dependent) manner. Induction of d-flow by partial carotid ligation surgery in a murine model induced DNMT1 in arterial endothelium. In cultured endothelial cells, DNMT1 was enhanced by oscillatory shear stress (OS), and reduction of DNMT with either the inhibitor 5-aza-2′-deoxycytidine (5Aza) or siRNA markedly reduced OS-induced endothelial inflammation. Moreover, administration of 5Aza reduced lesion formation in 2 mouse models of atherosclerosis. Using both reduced representation bisulfite sequencing (RRBS) and microarray, we determined that d-flow in the carotid artery resulted in hypermethylation within the promoters of 11 mechanosensitive genes and that 5Aza treatment restored normal methylation patterns. Of the identified genes, HoxA5 and Klf3 encode transcription factors that contain cAMP response elements, suggesting that the methylation status of these loci could serve as a mechanosensitive master switch in gene expression. Together, our results demonstrate that d-flow controls epigenomic DNA methylation patterns in a DNMT-dependent manner, which in turn alters endothelial gene expression and induces atherosclerosis.
PMCID: PMC4071393  PMID: 24865430
6.  Mammalian-wide interspersed repeat (MIR)-derived enhancers and the regulation of human gene expression 
Mobile DNA  2014;5:14.
Mammalian-wide interspersed repeats (MIRs) are the most ancient family of transposable elements (TEs) in the human genome. The deep conservation of MIRs initially suggested that they had been exapted to play functional roles for their host genomes. MIRs also happen to be the only TEs whose presence in and around human genes is positively correlated with tissue-specific gene expression. Similar associations between enhancer prevalence within genes and tissue-specific expression, along with the previous implication of MIRs as a source of regulatory sequences, suggested a possible link between MIRs and enhancers.
To test the possibility that MIRs contribute functional enhancers to the human genome, we evaluated the relationship between MIRs and human tissue-specific enhancers in terms of genomic location, chromatin environment, regulatory function, and mechanistic attributes. This analysis revealed MIRs to be highly concentrated in enhancers of the K562 and HeLa human cell types. Significantly more enhancers were found to be linked to MIRs than would be expected by chance, and putative MIR-derived enhancers are characterized by a chromatin environment highly similar to that of canonical enhancers. MIR-derived enhancers show strong associations with gene expression levels, tissue-specific gene expression and tissue-specific cellular functions, including a number of biological processes related to erythropoiesis. MIR-derived enhancers were found to be a rich source of transcription factor binding sites, underscoring one possible mechanistic route for the co-option of MIR sequences as enhancers. There is also tentative evidence to suggest that MIR-enhancer function is related to the transcriptional activity of non-coding RNAs.
Taken together, these data reveal enhancers to be an important cis-regulatory platform from which MIRs can exercise a regulatory function in the human genome and help to resolve a long-standing conundrum as to the reason for MIRs’ deep evolutionary conservation.
PMCID: PMC4090950  PMID: 25018785
7.  Genomic Basis of a Polyagglutinating Isolate of Neisseria meningitidis 
Journal of Bacteriology  2012;194(20):5649-5656.
Containment strategies for outbreaks of invasive Neisseria meningitidis disease are informed by serogroup assays that characterize the polysaccharide capsule. We sought to uncover the genomic basis of conflicting serogroup assay results for an isolate (M16917) from a patient with acute meningococcal disease. To this end, we characterized the complete genome sequence of the M16917 isolate and performed a variety of comparative sequence analyses against N. meningitidis reference genome sequences of known serogroups. Multilocus sequence typing and whole-genome sequence comparison revealed that M16917 is a member of the ST-11 sequence group, which is most often associated with serogroup C. However, sequence similarity comparisons and phylogenetic analysis showed that the serogroup-diagnostic capsule polymerase gene (synD) of M16917 belongs to serogroup B. These results suggest that a capsule-switching event occurred via homologous recombination at or around the capsule locus of M16917. Detailed analysis of this locus uncovered the locations of the recombination breakpoints in the M16917 genome sequence; this recombination introduced an ∼2-kb serogroup B sequence cassette into the serogroup C genomic background. Since there is no currently available vaccine for serogroup B strains of N. meningitidis, this kind of capsule-switching event could have public health relevance by producing a vaccine escape mutant.
PMCID: PMC3458693  PMID: 22904290
8.  Inhibition of activated pericentromeric SINE/Alu repeat transcription in senescent human adult stem cells reinstates self-renewal 
Cell Cycle  2011;10(17):3016-3030.
Cellular aging is linked to deficiencies in efficient repair of DNA double-strand breaks and authentic genome maintenance at the chromatin level. Aging poses a significant threat to adult stem cell function by triggering persistent DNA damage and ultimately cellular senescence. Senescence is often considered to be an irreversible process. Moreover, the critical genomic regions in which persistent DNA damage accumulates are unknown. Here we report that 65% of naturally occurring repairable DNA damage in self-renewing adult stem cells occurs within transposable elements. Upregulation of Alu retrotransposon transcription upon ex vivo aging causes nuclear cytotoxicity associated with the formation of persistent DNA damage foci and loss of efficient DNA repair in pericentric chromatin. This occurs owing to a failure to recruit condensin I and cohesin complexes. Our results demonstrate that the cytotoxicity of induced Alu repeats is functionally relevant to human adult stem cell aging. Stable suppression of Alu transcription can reverse the senescent phenotype, reinstating the cells' self-renewing properties and increasing their plasticity by altering so-called “master” pluripotency regulators.
PMCID: PMC3218602  PMID: 21862875
adult stem cells; senescence; SINE/Alu transposons; DNA damage; H2AX; ChIP-seq; cohesin; condensin; PML body; induced pluripotency
9.  Transcriptional Activity, Chromosomal Distribution and Expression Effects of Transposable Elements in Coffea Genomes 
PLoS ONE  2013;8(11):e78931.
Plant genomes are massively invaded by transposable elements (TEs), many of which are located near host genes and can thus impact gene expression. In flowering plants, TE expression can be activated (de-repressed) under certain stressful conditions, both biotic and abiotic, as well as by genome stress caused by hybridization. In this study, we examined the effects of these stress agents on TE expression in two diploid species of coffee, Coffea canephora and C. eugenioides, and their allotetraploid hybrid C. arabica. We also explored the relationship of TE repression mechanisms to host gene regulation via the effects of exonized TE sequences. Similar to what has been seen for other plants, overall TE expression levels are low in Coffea plant cultivars, consistent with the existence of effective TE repression mechanisms. TE expression patterns are highly dynamic across the species and conditions assayed here and are unrelated to the classification of the TEs at the class or family level. In contrast to previous results, cell culture conditions per se do not lead to the de-repression of TE expression in C. arabica. Results obtained here indicate that differing plant drought stress levels relate strongly to TE repression mechanisms. In drought-tolerant cultivars, TEs tend to be expressed at significantly higher levels in non-irrigated samples, whereas drought-sensitive cultivars show the opposite pattern, with irrigated samples showing significantly higher TE expression. Thus, TE genome repression mechanisms may be finely tuned to the ideal growth and/or regulatory conditions of the specific plant cultivars in which they are active. Analysis of TE expression levels in cell culture conditions underscored the importance of nonsense-mediated mRNA decay (NMD) pathways in the repression of Coffea TEs. These same NMD mechanisms can also regulate plant host gene expression via the repression of genes that bear exonized TE sequences.
PMCID: PMC3823963  PMID: 24244387
10.  On the presence and role of human gene-body DNA methylation 
Oncotarget  2012;3(4):462-474.
DNA methylation of promoter sequences is a repressive epigenetic mark that down-regulates gene expression. However, DNA methylation is more prevalent within gene-bodies than seen for promoters, and gene-body methylation has been observed to be positively correlated with gene expression levels. This paradox remains unexplained, and accordingly the role of DNA methylation in gene-bodies is poorly understood. We addressed the presence and role of human gene-body DNA methylation using a meta-analysis of human genome-wide methylation, expression and chromatin data sets. Methylation is associated with transcribed regions as genic sequences have higher levels of methylation than intergenic or promoter sequences. We also find that the relationship between gene-body DNA methylation and expression levels is non-monotonic and bell-shaped. Mid-level expressed genes have the highest levels of gene-body methylation, whereas the most lowly and highly expressed sets of genes both have low levels of methylation. While gene-body methylation can be seen to efficiently repress the initiation of intragenic transcription, the vast majority of methylated sites within genes are not associated with intragenic promoters. In fact, highly expressed genes initiate the most intragenic transcription, which is inconsistent with the previously held notion that gene-body methylation serves to repress spurious intragenic transcription to allow for efficient transcriptional elongation. These observations lead us to propose a model to explain the presence of human gene-body methylation. This model holds that the repression of intragenic transcription by gene-body methylation is largely epiphenomenal, and suggests that gene-body methylation levels are predominantly shaped via the accessibility of the DNA to methylating enzyme complexes.
PMCID: PMC3380580  PMID: 22577155
genome-wide methylation; epigenetic mark; intragenic transcription; methylating enzyme complexes
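The non-monotonic, bell-shaped relationship described in this abstract can be checked by ranking genes by expression, splitting them into bins, and averaging gene-body methylation per bin. A minimal sketch, assuming hypothetical per-gene expression values and methylation fractions between 0 and 1; the bin count and the "peak in an interior bin" test are illustrative choices, not the authors' procedure.

```python
from typing import List, Sequence


def binned_means(expression: Sequence[float], methylation: Sequence[float],
                 n_bins: int) -> List[float]:
    """Sort genes by expression, split them into n_bins contiguous groups of
    near-equal size, and return the mean gene-body methylation of each group."""
    if len(expression) != len(methylation):
        raise ValueError("expression and methylation must be the same length")
    pairs = sorted(zip(expression, methylation))
    n = len(pairs)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need 1 <= n_bins <= number of genes")
    means = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(sum(m for _, m in chunk) / len(chunk))
    return means


def peak_is_interior(means: Sequence[float]) -> bool:
    """True if the most methylated bin is neither the lowest- nor the
    highest-expression bin, i.e. methylation peaks at mid-level expression."""
    peak = max(range(len(means)), key=lambda i: means[i])
    return 0 < peak < len(means) - 1


# Hypothetical values for nine genes.
expr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
meth = [0.1, 0.2, 0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.3]
means = binned_means(expr, meth, 3)
print(means, peak_is_interior(means))
```

A monotonic relationship would put the peak in the first or last bin, so an interior peak is a crude first indicator of the bell shape; real data would also need a test of whether the interior bins differ significantly from the extremes.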
11.  Do human transposable element small RNAs serve primarily as genome defenders or genome regulators? 
Mobile Genetic Elements  2012;2(1):19-25.
It is currently thought that small RNA (sRNA) based repression mechanisms are primarily employed to mitigate the mutagenic threat posed by the activity of transposable elements (TEs). This can be achieved by the sRNA guided processing of TE transcripts via Dicer-dependent (e.g., siRNA) or Dicer-independent (e.g., piRNA) mechanisms. For example, potentially active human L1 elements are silenced by mRNA cleavage induced by element encoded siRNAs, leading to a negative correlation between element mRNA and siRNA levels. On the other hand, there is emerging evidence that TE derived sRNAs can also be used to regulate the host genome. Here, we evaluated these two hypotheses for human TEs by comparing the levels of TE derived mRNA and TE sRNA across six tissues. The genome defense hypothesis predicts a negative correlation between TE mRNA and TE sRNA levels, whereas the genome regulatory hypothesis predicts a positive correlation. On average, TE mRNA and TE sRNA levels are positively correlated across human tissues. These correlations are higher than seen for human genes or for randomly permuted control data sets. Overall, Alu subfamilies show the highest positive correlations of element mRNA and sRNA levels across tissues, although a few of the youngest, and potentially most active, Alu subfamilies do show negative correlations. Thus, Alu derived sRNAs may be related to both genome regulation and genome defense. These results are inconsistent with a simple model whereby TE derived sRNAs reduce levels of standing TE mRNA via transcript cleavage, and suggest that human cells efficiently process TE transcripts into sRNA based on the available message levels. This may point to a widespread role for processed TE transcripts in genome regulation or to alternative roles of TE-to-sRNA processing including the mitigation of TE transcript cytotoxicity.
PMCID: PMC3383446  PMID: 22754749
RNA interference; RNA processing; gene expression; genome regulation; small RNA
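The test described in this abstract rests on the sign of the correlation between TE mRNA and TE sRNA levels across tissues, compared against randomly permuted controls. A minimal sketch, assuming hypothetical per-tissue expression values for a single TE subfamily; Pearson correlation and a one-sided label-permutation test are used here for illustration, and the abstract does not state which correlation statistic or permutation scheme the authors used.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value for a positive correlation: how often a random
    reassignment of y across tissues correlates with x at least as strongly."""
    observed = pearson(x, y)
    rng = random.Random(seed)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(x, shuffled) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


def favored_hypothesis(r: float) -> str:
    """Map the sign of the mRNA/sRNA correlation onto the two hypotheses."""
    if r > 0:
        return "genome regulation"
    if r < 0:
        return "genome defense"
    return "undecided"


# Hypothetical levels for one TE subfamily across six tissues.
te_mrna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
te_srna = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
r = pearson(te_mrna, te_srna)
print(r, favored_hypothesis(r), permutation_p_value(te_mrna, te_srna))
```

With only six tissues there are just 720 possible orderings, so the smallest attainable p-value is limited; this is one reason comparisons against permuted controls and across many subfamilies are informative.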
12.  Repetitive DNA elements, nucleosome binding and human gene expression 
Gene  2009;436(1-2):12-22.
We evaluated the epigenetic contributions of repetitive DNA elements to human gene regulation. Human proximal promoter sequences show distinct distributions of transposable elements (TEs) and simple sequence repeats (SSRs). TEs are enriched distal from transcriptional start sites (TSSs), and their frequency decreases closer to TSSs, such that they are largely absent from the core promoter region. SSRs, on the other hand, are found at low frequency distal to the TSS and then increase in frequency starting ∼150 bp upstream of the TSS. The peak of SSR density is centered around the −35 bp position, where the basal transcriptional machinery assembles. These trends in repetitive sequence distribution are strongly correlated, positively for TEs and negatively for SSRs, with relative nucleosome binding affinities along the promoters. Nucleosomes bind with highest probability distal from the TSS, and nucleosome binding affinity steadily decreases, reaching its nadir just upstream of the TSS at the same point where SSR frequency is highest. Promoters that are enriched for TEs are more highly and broadly expressed, on average, than promoters that are devoid of TEs. In addition, promoters that have similar repetitive DNA profiles regulate genes that have more similar expression patterns and encode proteins with more similar functions than promoters that differ with respect to their repetitive DNA. Furthermore, distinct repetitive DNA promoter profiles are correlated with tissue-specific patterns of expression. These observations indicate that repetitive DNA elements mediate chromatin accessibility in proximal promoter regions and that the repeat content of promoters is relevant to both gene expression and function.
PMCID: PMC2921533  PMID: 19393174
13.  Genome Sequences for Six Rhodanobacter Strains, Isolated from Soils and the Terrestrial Subsurface, with Variable Denitrification Capabilities 
Journal of Bacteriology  2012;194(16):4461-4462.
We report the first genome sequences for six strains of Rhodanobacter species isolated from a variety of soil and subsurface environments. Three of these strains are capable of complete denitrification and three others are not. However, all six strains contain most of the genes required for the respiration of nitrate to gaseous nitrogen. The nondenitrifying members of the genus lack only the gene for nitrate reduction, the first step in the full denitrification pathway. The data suggest that the environmental role of bacteria from the genus Rhodanobacter should be reevaluated.
PMCID: PMC3416251  PMID: 22843592
14.  Co-evolutionary Rates of Functionally Related Yeast Genes 
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN & ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
PMCID: PMC2674680  PMID: 18345352
Functional inference; Co-evolution; natural selection; genome evolution; gene ontology
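The comparison described in this abstract pairs, for every pair of genes, the difference in their substitution rates (ΔdN or ΔdS) with their GO-based functional similarity (sGO). A minimal sketch, assuming hypothetical dN values and precomputed sGO scores keyed by gene pair; how sGO is derived from GO annotations is not shown. The abstract reports a significant ΔdN–sGO correlation without stating its sign; a negative one (more similar function, smaller rate difference) is what the co-evolution interpretation would predict, and that expectation is an inference, not a reported result.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rate_difference_vs_similarity(
        rates: Dict[str, float],
        similarity: Dict[FrozenSet[str], float]) -> Tuple[List[float], List[float]]:
    """For each gene pair with an sGO score, return |rate_a - rate_b| alongside sGO."""
    deltas, sims = [], []
    for a, b in combinations(sorted(rates), 2):
        key = frozenset((a, b))
        if key in similarity:  # skip pairs lacking a GO-based similarity score
            deltas.append(abs(rates[a] - rates[b]))
            sims.append(similarity[key])
    return deltas, sims


# Hypothetical dN values and sGO scores for three genes.
dn = {"A": 0.10, "B": 0.12, "C": 0.50}
sgo = {frozenset("AB"): 0.9, frozenset("AC"): 0.1, frozenset("BC"): 0.2}
delta_dn, s = rate_difference_vs_similarity(dn, sgo)
print(pearson(delta_dn, s))
```

The same function applied to dS values gives the ΔdS comparison, for which the abstract reports no apparent relationship.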
15.  Depletion of nuclear histone H2A variants is associated with chronic DNA damage signaling upon drug-evoked senescence of human somatic cells 
Aging (Albany NY)  2012;4(11):823-842.
Cellular senescence is associated with global chromatin changes, altered gene expression, and activation of chronic DNA damage signaling. These events ultimately lead to morphological and physiological transformations in primary cells. In this study, we show that chronic DNA damage signals caused by genotoxic stress impact the expression of histone H2A family members and lead to their depletion in the nuclei of senescent human fibroblasts. Our data reinforce the hypothesis that progressive chromatin destabilization may lead to the loss of epigenetic information and the impaired cellular function associated with chronic DNA damage upon drug-evoked senescence. We propose that changes in histone biosynthesis and chromatin assembly may directly contribute to cellular aging. In addition, we outline a method that allows quantitative and unbiased measurement of these changes.
PMCID: PMC3560435  PMID: 23235539
γH2A.X; DNA damage; senescence; LC-MS analysis; quantitative proteomics; SRM; histone H2A family; chromatin; DNA repair; HCA2 primary fibroblasts; epigenetics
16.  Cell type-specific termination of transcription by transposable element sequences 
Mobile DNA  2012;3:15.
Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question.
Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide across all human cell types as were characterized here. The local chromatin environment of these TE-TTS is similar to that seen for canonical 3′ UTR TTS and distinct from the chromatin environment of other intragenic TE sequences. However, TE-TTS located within the introns of human genes were found to be far more cell type-specific than canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family, and TE-TTS in the sense orientation terminate transcription more efficiently than those in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS.
TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are evolutionarily young. The extent of transcription termination by TEs seen here, along with the preference for sense-oriented TE insertions to provide TTS, is consistent with the observed antisense orientation bias of human TEs.
PMCID: PMC3517506  PMID: 23020800
Polyadenylation; Transcription termination; Orientation bias; Gene regulation
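The rarefaction analysis mentioned in this abstract asks how many distinct sites would be observed if only k of the N sampled cell types had been assayed. A minimal sketch of sample-based rarefaction, assuming hypothetical site IDs per cell type: a site present in n_i of N samples is missed by a random k-subset with probability C(N − n_i, k) / C(N, k). The estimate that there may be twice as many sites requires extrapolating the curve beyond the eight observed cell types; the extrapolation model is not given in the abstract, and this sketch only interpolates.

```python
from collections import Counter
from math import comb
from typing import List, Set


def rarefaction_curve(incidence: List[Set[str]]) -> List[float]:
    """Expected number of distinct sites observed when k of the N samples are
    drawn without replacement, for k = 1..N (sample-based rarefaction)."""
    n_samples = len(incidence)
    if n_samples == 0:
        raise ValueError("need at least one sample")
    # occupancy[site] = number of samples (cell types) in which the site occurs
    occupancy = Counter(site for sample in incidence for site in sample)
    curve = []
    for k in range(1, n_samples + 1):
        subsets = comb(n_samples, k)
        # comb(a, k) is 0 when k > a, so sites present in many samples are never missed
        expected = sum(1 - comb(n_samples - n_i, k) / subsets
                       for n_i in occupancy.values())
        curve.append(expected)
    return curve


# Hypothetical termination sites detected in two cell types.
cell_types = [{"tts1", "tts2"}, {"tts2", "tts3"}]
print(rarefaction_curve(cell_types))  # prints [2.0, 3.0]
```

A curve that is still rising steeply at the last sample, as implied for the TE-TTS, indicates that additional cell types would continue to reveal new sites.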
17.  Chromatin signature discovery via histone modification profile alignments 
Nucleic Acids Research  2012;40(21):10642-10656.
We report on the development of an unsupervised algorithm for the genome-wide discovery and analysis of chromatin signatures. Our Chromatin-profile Alignment followed by Tree-clustering algorithm (ChAT) applies dynamic programming to combinatorial histone modification profiles to identify locally similar chromatin sub-regions and provides complementary utility with respect to existing methods. We applied ChAT to genomic maps of 39 histone modifications in human CD4+ T cells to identify both known and novel chromatin signatures. ChAT was able to detect chromatin signatures previously associated with transcription start sites and enhancers as well as novel signatures associated with a variety of regulatory elements. Promoter-associated signatures discovered with ChAT indicate that complex chromatin signatures, made up of numerous co-located histone modifications, facilitate cell-type-specific gene expression. The discovery of novel L1 retrotransposon-associated bivalent chromatin signatures suggests that these elements influence the mono-allelic expression of human genes by shaping the chromatin environment of imprinted genomic regions. Analysis of long gene-associated chromatin signatures points to a role for the H4K20me1 and H3K79me3 histone modifications in transcriptional pause release. The novel chromatin signatures and functional associations uncovered by ChAT underscore the ability of the algorithm to yield novel insight into chromatin-based regulatory mechanisms.
PMCID: PMC3505981  PMID: 22989711
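The alignment step of ChAT, dynamic programming over combinatorial histone modification profiles to find locally similar sub-regions, can be illustrated with a Smith-Waterman-style local alignment. This is a minimal sketch under stated assumptions, not ChAT itself: each genomic bin is assumed to be a hypothetical set of histone marks present, the bin-to-bin score (shared marks minus differing marks) and the linear gap penalty are illustrative choices, and the tree-clustering stage is not shown.

```python
from typing import FrozenSet, List, Tuple

Bin = FrozenSet[str]  # histone marks called as present in one genomic bin


def bin_score(a: Bin, b: Bin) -> int:
    """Hypothetical similarity: marks shared by both bins minus marks in only one."""
    return len(a & b) - len(a ^ b)


def local_align(p: List[Bin], q: List[Bin], gap: int = 1) -> Tuple[int, Tuple[int, int]]:
    """Smith-Waterman local alignment of two chromatin profiles.
    Returns the best local score and the (i, j) cell where it ends."""
    rows, cols = len(p) + 1, len(q) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,                                         # start anew
                          h[i - 1][j - 1] + bin_score(p[i - 1], q[j - 1]),  # align bins
                          h[i - 1][j] - gap,                         # gap in q
                          h[i][j - 1] - gap)                         # gap in p
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


# Hypothetical profiles: a promoter-like bin followed by an H3K4me1 bin.
p = [frozenset({"H3K4me3", "H3K27ac"}), frozenset({"H3K4me1"})]
q = [frozenset({"H3K36me3"}), frozenset({"H3K4me3", "H3K27ac"}), frozenset({"H3K4me1"})]
print(local_align(p, q))  # prints (3, (2, 3))
```

The zero floor in the recurrence is what makes the alignment local: poorly matching flanks reset the score, so only the shared two-bin sub-region contributes.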
18.  Relating the Disease Mutation Spectrum to the Evolution of the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) 
PLoS ONE  2012;7(8):e42336.
Cystic fibrosis (CF) is the most common genetic disease among Caucasians, and accordingly the cystic fibrosis transmembrane conductance regulator (CFTR) protein has perhaps the best characterized disease mutation spectrum, with more than 1,500 causative mutations having been identified. In this study, we took advantage of that wealth of mutational information in an effort to relate site-specific evolutionary parameters to the propensity and severity of CFTR disease-causing mutations. To do this, we devised a scoring scheme for known CFTR disease-causing mutations based on the Grantham amino acid chemical difference matrix. CFTR site-specific evolutionary constraint values were then computed for seven different evolutionary metrics across a range of increasing evolutionary depths. The CFTR mutational scores and the various site-specific evolutionary constraint values were compared in order to evaluate which evolutionary measures best reflect the disease-causing mutation spectrum. Site-specific evolutionary constraint values from the widely used comparative method PolyPhen2 show the best correlation with the CFTR mutation score spectrum, whereas more straightforward conservation-based measures (ConSurf and ScoreCons) show the greatest ability to predict individual CFTR disease-causing mutations. Although far greater than expected by chance alone, both the fraction of the variability in mutation scores explained by the PolyPhen2 metric (3.6%) and the best paired sensitivity (58%) and specificity (60%) values for the prediction of disease-causing residues were marginal. These data indicate that evolutionary constraint levels are informative but far from determinant with respect to disease-causing mutations in CFTR.
Nevertheless, this work shows that, when combined with additional lines of evidence, information on site-specific evolutionary conservation can and should be used to guide site-directed mutagenesis experiments by more narrowly defining the set of target residues, resulting in a potential savings of both time and money.
PMCID: PMC3413703  PMID: 22879944
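The paired sensitivity and specificity values quoted in this abstract come from thresholding a site-specific conservation score to predict disease-causing residues. A minimal sketch of that calculation, assuming hypothetical per-residue scores, labels and threshold; the Grantham-based mutation scoring and the specific metrics (PolyPhen2, ConSurf, ScoreCons) are not reproduced here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], is_disease: Sequence[bool],
                            threshold: float) -> Tuple[float, float]:
    """Call a residue disease-causing when its constraint score >= threshold,
    then return (sensitivity, specificity) against the known labels."""
    tp = fp = tn = fn = 0
    for score, disease in zip(scores, is_disease):
        predicted = score >= threshold
        if predicted and disease:
            tp += 1
        elif predicted:
            fp += 1
        elif disease:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one disease and one non-disease residue")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical constraint scores for six residues (first three disease-causing).
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [True, True, True, False, False, False]
print(sensitivity_specificity(scores, labels, threshold=0.5))
```

Sweeping the threshold trades sensitivity against specificity; reporting the best paired values, as the abstract does, picks one point on that curve.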
19.  Co-evolutionary Rates of Functionally Related Yeast Genes 
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN & ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
PMCID: PMC2674680  PMID: 18345352
Functional inference; Co-evolution; natural selection; genome evolution; gene ontology
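The pairwise comparison described in this abstract can be sketched as follows. The sketch assumes per-gene dN values and a hypothetical lookup of GO-based functional similarity (sGO) for each gene pair. Pearson correlation is used purely for illustration, because the abstract does not name the statistic the study used. Passing dS values in place of dN gives the corresponding ΔdS comparison.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rate_difference_vs_similarity(rates, sgo):
    """Correlate |rate_i - rate_j| with functional similarity over all gene pairs.

    rates: dict gene -> substitution rate (dN, or dS for the control comparison)
    sgo:   dict frozenset({gene_i, gene_j}) -> GO-based similarity (hypothetical)
    """
    deltas, sims = [], []
    for g1, g2 in itertools.combinations(sorted(rates), 2):
        deltas.append(abs(rates[g1] - rates[g2]))  # pairwise rate difference
        sims.append(sgo[frozenset((g1, g2))])
    return pearson(deltas, sims)


# Toy values invented for illustration. A negative r means that gene pairs with
# more similar rates tend to have more similar functions.
dn = {"A": 0.10, "B": 0.12, "C": 0.50}
sgo = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "C")): 0.2,
}
print(rate_difference_vs_similarity(dn, sgo))
```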
20.  Genome Sequences for Five Strains of the Emerging Pathogen Haemophilus haemolyticus 
Journal of Bacteriology  2011;193(20):5879-5880.
We report the first whole-genome sequences for five strains, two carried and three pathogenic, of the emerging pathogen Haemophilus haemolyticus. Preliminary analyses indicate that these genome sequences encode markers that distinguish H. haemolyticus from its closest Haemophilus relatives and provide clues to the identity of its virulence factors.
PMCID: PMC3187195  PMID: 21952546
21.  Genome Sequence of the Mycobacterium colombiense Type Strain, CECT 3035 
Journal of Bacteriology  2011;193(20):5866-5867.
We report the first whole-genome sequence of the Mycobacterium colombiense type strain, CECT 3035, which was initially isolated from Colombian HIV-positive patients and causes respiratory and disseminated infections. Preliminary comparative analyses indicate that the M. colombiense lineage has experienced a substantial genome expansion, possibly contributing to its distinct pathogenic capacity.
PMCID: PMC3187203  PMID: 21952541
22.  Using Single-Nucleotide Polymorphisms To Discriminate Disease-Associated from Carried Genomes of Neisseria meningitidis
Journal of Bacteriology  2011;193(14):3633-3641.
Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single-nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carried genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category “symbiosis, encompassing mutualism through parasitism.”
PMCID: PMC3133314  PMID: 21622743
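The search for SNPs with mutually exclusive patterns between two groups can be sketched as follows. The sketch assumes the genomes are already aligned to equal length. A column is reported when the disease-associated and carried sets share no allele, and columns containing gaps are skipped. The label-permutation baseline is an illustrative swap-in for estimating how many such sites chance alone would yield. It is not the chance-expectation procedure reported in the paper.

```python
import random


def discriminating_sites(disease_seqs, carried_seqs):
    """0-based alignment columns where the two groups share no allele."""
    all_seqs = list(disease_seqs) + list(carried_seqs)
    length = len(all_seqs[0])
    if any(len(s) != length for s in all_seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        alleles_disease = {s[i] for s in disease_seqs}
        alleles_carried = {s[i] for s in carried_seqs}
        if "-" in alleles_disease or "-" in alleles_carried:
            continue  # skip columns containing alignment gaps
        if alleles_disease.isdisjoint(alleles_carried):
            sites.append(i)
    return sites


def permutation_null(disease_seqs, carried_seqs, n_perm=1000, seed=0):
    """Mean number of discriminating sites when group labels are shuffled."""
    rng = random.Random(seed)
    pooled = list(disease_seqs) + list(carried_seqs)
    k = len(disease_seqs)
    total = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of sequences to the two groups
        total += len(discriminating_sites(pooled[:k], pooled[k:]))
    return total / n_perm


# Toy alignment invented for illustration only.
disease = ["ACGTA", "ACGTT", "ACCTA"]
carried = ["AGGAA", "AGGAT"]
print(discriminating_sites(disease, carried))  # [1, 3]
print(permutation_null(disease, carried, n_perm=200))
```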
23.  Protein interactions with piALU RNA indicates putative participation of retroRNA in the cell cycle, DNA repair and chromatin assembly 
Mobile Genetic Elements  2012;2(1):26-35.
Recent analyses suggest that transposable element-derived transcripts are processed to yield a variety of small RNA species that play critical functional roles in gene regulation and chromatin organization as well as genome stability and maintenance. Here we report a mass spectrometry analysis of an RNA-affinity complex isolation using a piRNA-homologous sequence derived from Alu retrotransposal RNA. Our data point to potential roles for piALU RNAs in DNA repair, the cell cycle and chromatin regulation.
PMCID: PMC3383447  PMID: 22754750
Alu; DNA repair; PIWI; SINE; TE (transposable element); cell cycle; centromere; chromatin modifiers; histone modifiers; kinetochore assembly; microtubule dynamics; piRNA; transcriptional factors
24.  Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins 
Genome Biology  2001;2(12):research0053.1-research0053.9.
Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviate from this pattern and instead show variation consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the cell periphery and to mediate the cell's interaction with the environment.
Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.
PMCID: PMC64838  PMID: 11790256
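The sketch below illustrates what a relative rate test does. It implements Tajima's classic simple relative rate test for two aligned ingroup sequences and an outgroup. It is offered only as a stand-in to show the idea and is not claimed to be the test developed in this paper. The test counts the sites where exactly one ingroup sequence differs from the other two. It then assesses the asymmetry between the two lineages with a 1-df chi-square statistic. A large statistic flags a lineage-specific rate acceleration, which is the signal the abstract associates with possible functional diversification.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima-style relative rate test on three aligned sequences.

    m1 counts sites where seq1 alone differs (seq2 == outgroup != seq1);
    m2 counts sites where seq2 alone differs (seq1 == outgroup != seq2).
    Sites where all three differ, or where any sequence has a gap, are ignored.
    Returns (m1, m2, chi2, p_value) with chi2 = (m1 - m2)^2 / (m1 + m2), 1 df.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue
        if a != b and b == c:
            m1 += 1  # change unique to the seq1 lineage
        elif a != b and a == c:
            m2 += 1  # change unique to the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites, so no evidence of a rate difference
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


# Toy protein fragments invented for illustration: seq1 carries 8 unique changes.
print(tajima_relative_rate("WWWWWWWWKL", "ACDEFGHIKL", "ACDEFGHIKL"))
```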
25.  Transcription factor binding sites are highly enriched within microRNA precursor sequences 
Biology Direct  2011;6:61.
Transcription factors are thought to regulate the transcription of microRNA genes in a manner similar to that of protein-coding genes; that is, by binding to conventional transcription factor binding site DNA sequences located in or near promoter regions that lie upstream of the microRNA genes. However, in the course of analyzing the genomics of human microRNA genes, we noticed that annotated transcription factor binding sites commonly lie within 70- to 110-nt long microRNA small hairpin precursor sequences.
We report that about 45% of all human small hairpin microRNA (pre-miR) sequences contain at least one predicted transcription factor binding site motif that is conserved across human, mouse and rat, and this rises to over 75% if one excludes primate-specific pre-miRs. The association is robust and has extremely strong statistical significance; it affects both intergenic and intronic pre-miRs and both isolated and clustered microRNA genes. We also confirmed and extended this finding using a separate analysis that examined all human pre-miR sequences regardless of conservation across species.
The transcription factor binding sites localized within small hairpin microRNA precursor sequences may regulate their transcription. Transcription factors may also bind directly to nascent primary microRNA gene transcripts or small hairpin microRNA precursors and regulate their processing.
This article was reviewed by Guillaume Bourque (nominated by Jerzy Jurka), Dmitri Pervouchine (nominated by Mikhail Gelfand), and Yuriy Gusev.
PMCID: PMC3240832  PMID: 22136256
Transcription factors; microRNA biogenesis; drosha
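An analysis of this kind combines two steps, sketched below. The first scans a precursor sequence on both strands for an IUPAC-coded motif. The second computes a one-sided binomial tail for the number of sequences that carry a hit, given an assumed background hit rate. The motif, the hairpin sequences and the background rate are all hypothetical. The binomial test is an illustrative choice, because the abstract does not state which motif predictions or statistical test the study used.

```python
import math

# IUPAC nucleotide codes mapped to the bases they match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(seq, motif):
    """True if an IUPAC motif occurs on either strand of a DNA/RNA sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA precursors as DNA
    motif = motif.upper()
    k = len(motif)
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            window = strand[i:i + k]
            if all(base in IUPAC[code] for base, code in zip(window, motif)):
                return True
    return False


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more sequences with a hit."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical hairpins, motif and background rate, for illustration only.
hairpins = ["GGUGACUCAGGA", "CCCCAAAUUU", "AUGAGUCAUU"]
motif = "TGASTCA"
hits = sum(has_motif(h, motif) for h in hairpins)
print(hits, binomial_upper_tail(hits, len(hairpins), 0.1))
```

In a real analysis, the background rate would have to come from a comparable set of control sequences, such as shuffled or flanking sequences. Choosing that control set is the main design decision, and it is left open here.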
