Results 1-25 (42)
1.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2013;42(Database issue):D7-D17.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page.
doi:10.1093/nar/gkt1146
PMCID: PMC3965057  PMID: 24259429
2.  HMGN1 Modulates Nucleosome Occupancy and DNase I Hypersensitivity at the CpG Island Promoters of Embryonic Stem Cells 
Molecular and Cellular Biology  2013;33(16):3377-3389.
Chromatin structure plays a key role in regulating gene expression and embryonic differentiation; however, the factors that determine the organization of chromatin around regulatory sites are not fully known. Here we show that HMGN1, a nucleosome-binding protein ubiquitously expressed in vertebrate cells, preferentially binds to CpG island-containing promoters and affects the organization of nucleosomes, DNase I hypersensitivity, and the transcriptional profile of mouse embryonic stem cells and neural progenitors. Loss of HMGN1 alters the organization of an unstable nucleosome at transcription start sites, reduces the number of DNase I-hypersensitive sites genome wide, and decreases the number of nestin-positive neural progenitors in the subventricular zone (SVZ) region of mouse brain. Thus, architectural chromatin-binding proteins affect the transcription profile and chromatin structure during embryonic stem cell differentiation.
doi:10.1128/MCB.00435-13
PMCID: PMC3753902  PMID: 23775126
3.  Identification of Immunity Related Genes to Study the Physalis peruviana – Fusarium oxysporum Pathosystem 
PLoS ONE  2013;8(7):e68500.
The Cape gooseberry (Physalis peruviana L.) is an Andean exotic fruit with high nutritional value and appealing medicinal properties. However, its cultivation faces important phytosanitary problems mainly due to pathogens like Fusarium oxysporum, Cercospora physalidis and Alternaria spp. Here we used the Cape gooseberry foliar transcriptome to search for proteins that encode conserved domains related to plant immunity including: NBS (Nucleotide Binding Site), CC (Coiled-Coil), TIR (Toll/Interleukin-1 Receptor). We identified 74 immunity related gene candidates in P. peruviana which have the typical resistance gene (R-gene) architecture, 17 receptor-like kinase (RLK) candidates related to PAMP-Triggered Immunity (PTI), eight (TIR-NBS-LRR, or TNL) and nine (CC–NBS-LRR, or CNL) candidates related to Effector-Triggered Immunity (ETI) genes among others. These candidate genes were categorized by molecular function (98%), biological process (85%) and cellular component (79%) using gene ontology. Some of the most interesting predicted roles were those associated with binding and transferase activity. We designed 94 primer pairs from the 74 immunity-related genes (IRGs) to amplify the corresponding genomic regions on six genotypes that included resistant and susceptible materials. From these, we selected 17 single-band amplicons and sequenced them in 14 F. oxysporum resistant and susceptible genotypes. Sequence polymorphisms were analyzed through preliminary candidate gene association, which allowed the detection of one SNP at the PpIRG-63 marker revealing a nonsynonymous mutation in the predicted LRR domain, suggesting functional roles for resistance.
doi:10.1371/journal.pone.0068500
PMCID: PMC3701084  PMID: 23844210
4.  In silico identification and characterization of the ion transport specificity for P-type ATPases in the Mycobacterium tuberculosis complex 
Background
P-type ATPases hydrolyze ATP and release energy that is used in the transport of ions against electrochemical gradients across plasma membranes, making these proteins essential for cell viability. Currently, the distribution and function of these ion transporters in mycobacteria are poorly understood.
Results
In this study, probabilistic profiles were constructed based on hidden Markov models to identify and classify P-type ATPases in the Mycobacterium tuberculosis complex (MTBC) according to the type of ion transported across the plasma membrane. Topology, hydrophobicity profiles and conserved motifs were analyzed to correlate amino acid sequences of P-type ATPases and ion transport specificity. Twelve candidate P-type ATPases annotated in the M. tuberculosis H37Rv proteome were identified in all members of the MTBC, and probabilistic profiles classified them into one of the following three groups: heavy metal cation transporters, alkaline and alkaline earth metal cation transporters, and the beta subunit of a prokaryotic potassium pump. Interestingly, counterparts of the non-catalytic beta subunits of Hydrogen/Potassium and Sodium/Potassium P-type ATPases were not found.
Conclusions
The high content of heavy metal transporters found in the MTBC suggests that they could play an important role in the ability of M. tuberculosis to survive inside macrophages, where tubercle bacilli face high levels of toxic metals. Finally, the results obtained in this work provide a starting point for experimental studies that may elucidate the ion specificity of the MTBC P-type ATPases and their role in mycobacterial infections.
doi:10.1186/1472-6807-12-25
PMCID: PMC3573892  PMID: 23031689
Tuberculosis; Mycobacterium tuberculosis complex; P-type ATPases; Ion transport; Conserved motifs
5.  Differences in local genomic context of bound and unbound motifs 
Gene  2012;506(1):125-134.
Understanding gene regulation is a major objective in molecular biology research. Frequently, transcription is driven by transcription factors (TFs) that bind to specific DNA sequences. These motifs are usually short and degenerate, so multiple copies are likely to occur throughout the genome by random chance. Despite this, TFs only bind to a small subset of sites, thus prompting our investigation into the differences between motifs that are bound by TFs and those that remain unbound. Here we constructed vectors representing various chromatin- and sequence-based features for a published set of bound and unbound motifs representing nine TFs in the budding yeast Saccharomyces cerevisiae. Using a machine learning approach, we identified a set of features that can be used to discriminate between bound and unbound motifs. We also discovered that some TFs bind most or all of their strong motifs in intergenic regions. Our data demonstrate that local sequence context can be strikingly different around motifs that are bound compared to motifs that are unbound. We concluded that there are multiple combinations of genomic features that characterize bound or unbound motifs.
doi:10.1016/j.gene.2012.06.005
PMCID: PMC3412921  PMID: 22692006
Gene regulation; yeast; transcription factors; genomic features; machine learning
6.  A unified phylogeny-based nomenclature for histone variants 
Histone variants are non-allelic protein isoforms that play key roles in diversifying chromatin structure. The known number of such variants has greatly increased in recent years, but the lack of naming conventions for them has led to a variety of naming styles, multiple synonyms and misleading homographs that obscure variant relationships and complicate database searches. We propose here a unified nomenclature for variants of all five classes of histones that uses consistent but flexible naming conventions to produce names that are informative and readily searchable. The nomenclature builds on historical usage and incorporates phylogenetic relationships, which are strong predictors of structure and function. A key feature is the consistent use of punctuation to represent phylogenetic divergence, making explicit the relationships among variant subtypes that have previously been implicit or unclear. We recommend that by default new histone variants be named with organism-specific paralog-number suffixes that lack phylogenetic implication, while letter suffixes be reserved for structurally distinct clades of variants. For clarity and searchability, we encourage the use of descriptors that are separate from the phylogeny-based variant name to indicate developmental and other properties of variants that may be independent of structure.
doi:10.1186/1756-8935-5-7
PMCID: PMC3380720  PMID: 22650316
7.  The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction 
BMC Genomics  2012;13:151.
Background
Physalis peruviana, commonly known as Cape gooseberry, is a member of the Solanaceae family that is increasingly popular due to its nutritional and medicinal value. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry.
Results
We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembly was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs.
Conclusions
We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to have branched off phylogenetically before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp.
doi:10.1186/1471-2164-13-151
PMCID: PMC3488962  PMID: 22533342
P. peruviana; Solanaceae; ESTs; Functional annotation; Gene model; Phylogenetics
8.  Genome Sequence of the Mycobacterium colombiense Type Strain, CECT 3035 
Journal of Bacteriology  2011;193(20):5866-5867.
We report the first whole-genome sequence of the Mycobacterium colombiense type strain, CECT 3035, which was initially isolated from Colombian HIV-positive patients and causes respiratory and disseminated infections. Preliminary comparative analyses indicate that the M. colombiense lineage has experienced a substantial genome expansion, possibly contributing to its distinct pathogenic capacity.
doi:10.1128/JB.05928-11
PMCID: PMC3187203  PMID: 21952541
9.  Analysis of Biological Features Associated with Meiotic Recombination Hot and Cold Spots in Saccharomyces cerevisiae 
PLoS ONE  2011;6(12):e29711.
Meiotic recombination is not distributed uniformly throughout the genome. There are regions of high and low recombination rates called hot and cold spots, respectively. The recombination rate parallels the frequency of DNA double-strand breaks (DSBs) that initiate meiotic recombination. The aim is to identify biological features associated with DSB frequency. We constructed vectors representing various chromatin and sequence-based features for 1179 DSB hot spots and 1028 DSB cold spots. Using a feature selection approach, we have identified five features that distinguish hot from cold spots in Saccharomyces cerevisiae with high accuracy, namely the histone marks H3K4me3, H3K14ac, H3K36me3, and H3K79me3; and GC content. Previous studies have associated H3K4me3, H3K36me3, and GC content with areas of mitotic recombination. H3K14ac and H3K79me3 are novel predictions and thus represent good candidates for further experimental study. We also show nucleosome occupancy maps produced using next generation sequencing exhibit a bias at DSB hot spots and this bias is strong enough to obscure biologically relevant information. A computational approach using feature selection can productively be used to identify promising biological associations. H3K14ac and H3K79me3 are novel predictions of chromatin marks associated with meiotic DSBs. Next generation sequencing can exhibit a bias that is strong enough to lead to incorrect conclusions. Care must be taken when interpreting high throughput sequencing data where systematic biases have been documented.
doi:10.1371/journal.pone.0029711
PMCID: PMC3248464  PMID: 22242140
10.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2011;40(Database issue):D13-D25.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkr1184
PMCID: PMC3245031  PMID: 22140104
12.  The Histone Database: an integrated resource for histones and histone fold-containing proteins 
Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins.
Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/.
doi:10.1093/database/bar048
PMCID: PMC3199919  PMID: 22025671
14.  Effects of HMGN variants on the cellular transcription profile 
Nucleic Acids Research  2011;39(10):4076-4087.
High mobility group N (HMGN) is a family of intrinsically disordered nuclear proteins that bind to nucleosomes, alter the structure of chromatin and affect transcription. A major unresolved question is the extent of functional specificity, or redundancy, between the various members of the HMGN protein family. Here, we analyze the transcriptional profile of cells in which the expression of various HMGN proteins has been either deleted or doubled. We find that both up- and downregulation of HMGN expression altered the cellular transcription profile. Most, but not all, of the changes were variant specific, suggesting limited redundancy in transcriptional regulation. Analysis of point and swap HMGN mutants revealed that the transcriptional specificity is determined by a unique combination of a functional nucleosome-binding domain and C-terminal domain. Doubling the amount of HMGN had a significantly larger effect on the transcription profile than total deletion, suggesting that the intrinsically disordered structure of HMGN proteins plays an important role in their function. The results reveal an HMGN-variant-specific effect on the fidelity of the cellular transcription profile, indicating that functionally the various HMGN subtypes are not fully redundant.
doi:10.1093/nar/gkq1343
PMCID: PMC3105402  PMID: 21278158
15.  Towards BioDBcore: a community-defined information specification for biological databases 
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
doi:10.1093/database/baq027
PMCID: PMC3017395  PMID: 21205783
16.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2010;39(Database issue):D38-D51.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkq1172
PMCID: PMC3013733  PMID: 21097890
17.  Towards BioDBcore: a community-defined information specification for biological databases 
Nucleic Acids Research  2010;39(Database issue):D7-D10.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
doi:10.1093/nar/gkq1173
PMCID: PMC3013734  PMID: 21097465
18.  Repetitive DNA elements, nucleosome binding and human gene expression 
Gene  2009;436(1-2):12-22.
We evaluated the epigenetic contributions of repetitive DNA elements to human gene regulation. Human proximal promoter sequences show distinct distributions of transposable elements (TEs) and simple sequence repeats (SSRs). TEs are enriched distal from transcriptional start sites (TSSs) and their frequency decreases closer to TSSs, where they are largely absent from the core promoter region. SSRs, on the other hand, are found at low frequency distal to the TSS and then increase in frequency starting ∼150 bp upstream of the TSS. The peak of SSR density is centered around the -35 bp position where the basal transcriptional machinery assembles. These trends in repetitive sequence distribution are strongly correlated, positively for TEs and negatively for SSRs, with relative nucleosome binding affinities along the promoters. Nucleosomes bind with highest probability distal from the TSS and the nucleosome binding affinity steadily decreases, reaching its nadir just upstream of the TSS at the same point where SSR frequency is at its highest. Promoters that are enriched for TEs are more highly and broadly expressed, on average, than promoters that are devoid of TEs. In addition, promoters that have similar repetitive DNA profiles regulate genes that have more similar expression patterns and encode proteins with more similar functions than promoters that differ with respect to their repetitive DNA. Furthermore, distinct repetitive DNA promoter profiles are correlated with tissue-specific patterns of expression. These observations indicate that repetitive DNA elements mediate chromatin accessibility in proximal promoter regions and the repeat content of promoters is relevant to both gene expression and function.
doi:10.1016/j.gene.2009.01.013
PMCID: PMC2921533  PMID: 19393174
19.  Many sequence-specific chromatin modifying protein-binding motifs show strong positional preferences for potential regulatory regions in the Saccharomyces cerevisiae genome 
Nucleic Acids Research  2010;38(6):1772-1779.
Initiation and regulation of gene expression are critically dependent on the binding of transcriptional regulators, which is often temporal and position specific. Many transcriptional regulators recognize and bind specific DNA motifs. The length and degeneracy of these motifs result in their frequent occurrence within the genome, with only a small subset serving as actual binding sites. By occupying potential binding sites, nucleosome placement can specify which sequence motif is available for DNA-binding regulatory factors. Therefore, the specification of nucleosome placement to allow access to transcriptional regulators whenever and wherever required is critical. We show that many DNA-binding motifs in Saccharomyces cerevisiae show a strong positional preference to occur only in potential regulatory regions. Furthermore, using gene ontology enrichment tools, we demonstrate that proteins with binding motifs that show the strongest positional preference also have a tendency to have chromatin-modifying properties and functions. This suggests that some DNA-binding proteins may depend on the distribution of their binding motifs across the genome to assist in the determination of specificity. Since many of these DNA-binding proteins have chromatin remodeling properties, they can alter the local nucleosome structure to a more permissive and/or restrictive state, thereby assisting in determining DNA-binding protein specificity.
doi:10.1093/nar/gkp1195
PMCID: PMC2847247  PMID: 20047965
20.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2009;38(Database issue):D5-D16.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkp967
PMCID: PMC2808881  PMID: 19910364
21.  Identification of cis-Regulatory Elements in Gene Co-expression Networks Using A-GLAM 
Reliable identification and assignment of cis-regulatory elements in promoter regions is a challenging problem in biology. The sophistication of transcriptional regulation in higher eukaryotes, particularly in metazoans, could be an important factor contributing to their organismal complexity. Here we present an integrated approach where networks of co-expressed genes are combined with gene ontology–derived functional networks to discover clusters of genes that share both similar expression patterns and functions. Regulatory elements are identified in the promoter regions of these gene clusters using a Gibbs sampling algorithm implemented in the A-GLAM software package. Using this approach, we analyze the cell-cycle co-expression network of the yeast Saccharomyces cerevisiae, showing that this approach correctly identifies cis-regulatory elements present in clusters of co-expressed genes.
doi:10.1007/978-1-59745-243-4_1
PMCID: PMC2719765  PMID: 19381547
Promoter sequences; transcription factor–binding sites; co-expression; networks; gene ontology; Gibbs sampling
22.  Promoter Analysis: Gene Regulatory Motif Identification with A-GLAM 
Reliable detection of cis-regulatory elements in promoter regions is a difficult and unsolved problem in computational biology. The intricacy of transcriptional regulation in higher eukaryotes, primarily in metazoans, could be a major driving force of organismal complexity. Eukaryotic genome annotations have improved greatly due to large-scale characterization of full-length cDNAs, transcriptional start sites (TSSs), and comparative genomics. Regulatory elements are identified in promoter regions using a variety of enumerative or alignment-based methods. Here we present a survey of recent computational methods for eukaryotic promoter analysis and describe the use of an alignment-based method implemented in the A-GLAM program.
doi:10.1007/978-1-59745-251-9_13
PMCID: PMC2702474  PMID: 19378149
Promoter regions; transcription factor binding sites; enumerative methods; promoter comparison
25.  Expression Patterns of Protein Kinases Correlate with Gene Architecture and Evolutionary Rates 
PLoS ONE  2008;3(10):e3599.
Background
Protein kinase (PK) genes comprise the third largest superfamily, which occupies ∼2% of the human genome. They encode regulatory enzymes that control a vast variety of cellular processes through phosphorylation of their protein substrates. Expression of PK genes is subject to complex transcriptional regulation which is not fully understood.
Principal Findings
Our comparative analysis demonstrates that the genomic organization of regulatory PK genes differs from the organization of other protein coding genes. PK genes occupy larger genomic loci, have longer introns and spacer regions, and encode larger proteins. The primary transcript length of PK genes, similar to other protein coding genes, inversely correlates with gene expression level and expression breadth, which is likely due to the necessity to reduce metabolic costs of transcription for abundant messages. On average, PK genes evolve slower than other protein coding genes. Breadth of PK expression negatively correlates with the rate of non-synonymous substitutions in protein coding regions. This rate is lower for high expression and ubiquitous PKs, relative to low expression PKs, and correlates with divergence in untranslated regions. Conversely, the rate of silent mutations is uniform in different PK groups, indicating that differing rates of non-synonymous substitutions reflect variations in selective pressure. Brain and testis employ a considerable number of tissue-specific PKs, indicating high complexity of the phosphorylation-dependent regulatory network in these organs. There are considerable differences in genomic organization between PKs up-regulated in the testis and brain. PK genes up-regulated in the highly proliferative testicular tissue are fast evolving and small, with short introns and transcribed regions. In contrast, genes up-regulated in the minimally proliferative nervous tissue carry long introns, extended transcribed regions, and evolve slowly.
Conclusions/Significance
PK genomic architecture, the size of gene functional domains and evolutionary rates correlate with the pattern of gene expression. Structure and evolutionary divergence of tissue-specific PK genes is related to the proliferative activity of the tissue where these genes are predominantly expressed. Our data provide evidence that physiological requirements for transcription intensity, ubiquitous expression, and tissue-specific regulation shape gene structure and affect rates of evolution.
doi:10.1371/journal.pone.0003599
PMCID: PMC2572838  PMID: 18974838
