Related Articles

1.  AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana 
Nucleic Acids Research  2005;33(Web Server issue):W397-W402.
The AthaMap database generates a map of cis-regulatory elements for the Arabidopsis thaliana genome. AthaMap contains more than 7.4 × 10^6 putative binding sites for 36 transcription factors (TFs) from 16 different TF families. A newly implemented functionality allows the display of subsets of more highly conserved transcription factor binding sites (TFBSs). Furthermore, a web tool was developed that permits a user-defined search for co-localizing cis-regulatory elements. The user can specify individually the level of conservation for each TFBS and a spacer range between them. This web tool was employed for the identification of co-localizing sites of known interacting TFs and TFs containing two DNA-binding domains. More than 1.8 × 10^5 combinatorial elements were annotated in the AthaMap database. These elements can also be used to identify more complex co-localizing elements consisting of up to four TFBSs. The AthaMap database and the connected web tools are a valuable resource for the analysis and the prediction of gene expression regulation at .
doi:10.1093/nar/gki395
PMCID: PMC1160156  PMID: 15980498
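The co-localization search described in the abstract above (two TFBSs, a per-site conservation level, and a spacer range between them) can be illustrated with a minimal Python sketch. This is not AthaMap's implementation: the `Site` record, the `colocalized` function, and the use of a numeric score as the "level of conservation" are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """Hypothetical record for one predicted binding site."""
    factor: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    score: float  # stand-in for the site's conservation level


def colocalized(
    sites_a: List[Site],
    sites_b: List[Site],
    min_spacer: int,
    max_spacer: int,
    min_score_a: float = 0.0,
    min_score_b: float = 0.0,
) -> List[Tuple[Site, Site, int]]:
    """Return (site_a, site_b, spacer) for non-overlapping pairs whose gap
    lies within [min_spacer, max_spacer], in either order on the sequence."""
    pairs = []
    for a in sites_a:
        if a.score < min_score_a:
            continue
        for b in sites_b:
            if b.score < min_score_b:
                continue
            if b.start >= a.end:      # b lies downstream of a
                spacer = b.start - a.end
            elif a.start >= b.end:    # a lies downstream of b
                spacer = a.start - b.end
            else:                     # overlapping sites are skipped
                continue
            if min_spacer <= spacer <= max_spacer:
                pairs.append((a, b, spacer))
    return pairs
```

The nested loop is quadratic in the number of sites, which is adequate for a single promoter; a genome-wide search would more plausibly sort sites by position and use a sliding window.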
2.  AthaMap web tools for the analysis and identification of co-regulated genes 
Nucleic Acids Research  2006;35(Database issue):D857-D862.
The AthaMap database generates a map of cis-regulatory elements for the whole Arabidopsis thaliana genome. This database has been extended by new tools to identify common cis-regulatory elements in specific regions of user-provided gene sets. A resulting table displays all cis-regulatory elements annotated in AthaMap including positional information relative to the respective gene. Further tables show overviews with the number of individual transcription factor binding sites (TFBS) present and TFBS common to the whole set of genes. Overrepresented cis-elements are easily identified. These features were used to detect specific enrichment of drought-responsive elements in cold-induced genes. For identification of co-regulated genes, the output table of the colocalization function was extended to show the closest genes and their relative distances to the colocalizing TFBS. Gene sets determined by this function can be used for a co-regulation analysis in microarray gene expression databases such as Genevestigator or PathoPlant. Additional improvements of AthaMap include display of the gene structure in the sequence window and a significant data increase. AthaMap is freely available at .
doi:10.1093/nar/gkl1006
PMCID: PMC1761422  PMID: 17148485
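The overview tables mentioned in the abstract above (number of individual TFBS present, and TFBS common to the whole gene set) amount to simple counting. The following Python sketch shows one way to compute them; the function name, the input layout, and the gene and factor labels in the usage note are hypothetical and do not reflect AthaMap's internal data model.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_tfbs(
    gene_to_sites: Dict[str, List[Tuple[str, int]]]
) -> Tuple[Counter, Counter, List[str]]:
    """Summarize binding sites over a gene set.

    gene_to_sites maps a gene identifier to a list of
    (factor, position_relative_to_gene) tuples.
    Returns (sites per factor, genes per factor, factors found in every gene).
    """
    sites_per_factor: Counter = Counter()
    genes_per_factor: Counter = Counter()
    for sites in gene_to_sites.values():
        factors = [factor for factor, _ in sites]
        sites_per_factor.update(factors)       # every site counts
        genes_per_factor.update(set(factors))  # each gene counts once per factor
    n_genes = len(gene_to_sites)
    common = sorted(f for f, n in genes_per_factor.items() if n == n_genes)
    return sites_per_factor, genes_per_factor, common
```

For a toy input such as `{"gene1": [("DRE", -120), ("MYB", -300)], "gene2": [("DRE", -80)]}`, only `"DRE"` would be reported as common to the whole set. Judging overrepresentation would additionally require a genome-wide background frequency, which this sketch leaves out.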
3.  PathoPlant®: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses 
Nucleic Acids Research  2006;35(Database issue):D841-D845.
Plants react to pathogen attack by expressing specific proteins directed toward the infecting pathogens. This involves the transcriptional activation of specific gene sets. PathoPlant®, a database on plant–pathogen interactions and signal transduction reactions, has now been complemented by microarray gene expression data from Arabidopsis thaliana subjected to pathogen infection and elicitor treatment. New web tools enable identification of plant genes regulated by specific stimuli. Sets of genes co-regulated by multiple stimuli can be displayed as well. A user-friendly web interface was created for the submission of gene sets to be analyzed. This results in a table, listing the stimuli that act either inducing or repressing on the respective genes. The search can be restricted to certain induction factors to identify, e.g. strongly up- or down-regulated genes. Up to three stimuli can be combined with the option of induction factor restriction to determine similarly regulated genes. To identify common cis-regulatory elements in co-regulated genes, a resulting gene list can directly be exported to the AthaMap database for analysis. PathoPlant is freely accessible at .
doi:10.1093/nar/gkl835
PMCID: PMC1669748  PMID: 17099232
4.  ‘MicroRNA Targets’, a new AthaMap web-tool for genome-wide identification of miRNA targets in Arabidopsis thaliana 
BioData Mining  2012;5:7.
Background
The AthaMap database generates a genome-wide map for putative transcription factor binding sites for A. thaliana. When analyzing transcriptional regulation using AthaMap it may be important to learn which genes are also post-transcriptionally regulated by inhibitory RNAs. Therefore, a unified database for transcriptional and post-transcriptional regulation will be highly useful for the analysis of gene expression regulation.
Methods
To identify putative microRNA target sites in the genome of A. thaliana, processed mature miRNAs from 243 annotated miRNA genes were used for screening with the psRNATarget web server. Positional information, target genes and the psRNATarget score for each target site were annotated to the AthaMap database. Furthermore, putative target sites for small RNAs from seven small RNA transcriptome datasets were used to determine small RNA target sites within the A. thaliana genome.
Results
A total of 41,965 putative genome-wide miRNA target sites and 10,442 miRNA target genes were identified in the A. thaliana genome. Taken together with genes targeted by small RNAs from small RNA transcriptome datasets, a total of 16,600 A. thaliana genes are putatively regulated by inhibitory RNAs. A novel web-tool, ‘MicroRNA Targets’, was integrated into AthaMap which permits the identification of genes predicted to be regulated by selected miRNAs. The predicted target genes are displayed with positional information and the psRNATarget score of the target site. Furthermore, putative target sites of small RNAs from selected tissue datasets can be identified with the new ‘Small RNA Targets’ web-tool.
Conclusions
The integration of predicted miRNA and small RNA target sites with transcription factor binding sites will be useful for AthaMap-assisted gene expression analysis. URL: http://www.athamap.de/
doi:10.1186/1756-0381-5-7
PMCID: PMC3410767  PMID: 22800758
Arabidopsis thaliana; AthaMap; MicroRNAs; Small RNAs; Post-transcriptional regulation
5.  Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes 
BMC Genomics  2014;15:317.
Background
Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes.
Results
Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions.
Conclusions
The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.
doi:10.1186/1471-2164-15-317
PMCID: PMC4234446  PMID: 24773781
Databases; Arabidopsis thaliana; Physcomitrella patens; Yeast one-hybrid; Microarray; Transcription factor; cis-element
6.  AthaMap, integrating transcriptional and post-transcriptional data 
Nucleic Acids Research  2008;37(Database issue):D983-D986.
The AthaMap database generates a map of predicted transcription factor binding sites (TFBS) for the whole Arabidopsis thaliana genome. AthaMap has now been extended to include data on post-transcriptional regulation. A total of 403 173 genomic positions of small RNAs have been mapped in the A. thaliana genome. These identify 5772 putative post-transcriptionally regulated target genes. AthaMap tools have been modified to improve the identification of common TFBS in co-regulated genes by subtracting post-transcriptionally regulated genes from such analyses. Furthermore, AthaMap was updated to the TAIR7 genome annotation, a graphic display of gene analysis results was implemented, and the TFBS data content was increased. AthaMap is freely available at http://www.athamap.de/.
doi:10.1093/nar/gkn709
PMCID: PMC2686474  PMID: 18842622
7.  AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome 
Nucleic Acids Research  2004;32(Database issue):D368-D372.
Gene expression is controlled mainly by the binding of transcription factors to regulatory sequences. To generate a genomic map for regulatory sequences, the Arabidopsis thaliana genome was screened for putative transcription factor binding sites. Using publicly available data from the TRANSFAC database and from publications, alignment matrices for 23 transcription factors of 13 different factor families were used with the pattern search program Patser to determine the genomic positions of more than 2.4 × 10^6 putative binding sites. Due to the dense clustering of genes and the observation that regulatory sequences are not restricted to upstream regions, the prediction of binding sites was performed for the whole genome. The genomic positions and the underlying data were imported into the newly developed AthaMap database. This data can be accessed by positional information or the Arabidopsis Genome Initiative identification number. Putative binding sites are displayed in the defined region. Data on the matrices used and on the thresholds applied in these screens are given in the database. Considering the high density of sites it will be a valuable resource for generating models on gene expression regulation. The data are available at http://www.athamap.de.
doi:10.1093/nar/gkh017
PMCID: PMC308752  PMID: 14681436
8.  AthaMap-assisted transcription factor target gene identification in Arabidopsis thaliana 
The AthaMap database generates a map of potential transcription factor binding sites (TFBS) and small RNA target sites in the Arabidopsis thaliana genome. The database contains sites for 115 different transcription factors (TFs). TFBS were identified with positional weight matrices (PWMs) or with single binding sites. With the new web tool ‘Gene Identification’, it is possible to identify potential target genes for selected TFs. For these analyses, the user can define a region of interest of up to 6000 bp in all annotated genes. For TFBS determined with PWMs, the search can be restricted to high-quality TFBS. The results are displayed in tables that identify the gene, position of the TFBS and, if applicable, individual score of the TFBS. In addition, data files can be downloaded that harbour positional information of TFBS of all TFs in a region between −2000 and +2000 bp relative to the transcription or translation start site. Also, data content of AthaMap was increased and the database was updated to the TAIR8 genome release.
Database URL: http://www.athamap.de/gene_ident.php
doi:10.1093/database/baq034
PMCID: PMC3011983  PMID: 21177332
9.  AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors 
BMC Bioinformatics  2003;4:25.
Background
The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002).
Results
AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) direct target genes for a specific transcription factor, or (iv) a regulatory network that consists of transcription factors and their target genes.
Conclusion
AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledonous plant Arabidopsis thaliana. AGRIS can be accessed from .
doi:10.1186/1471-2105-4-25
PMCID: PMC166152  PMID: 12820902
10.  ‘In silico expression analysis’, a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences 
Using bioinformatics, putative cis-regulatory sequences can be easily identified using pattern recognition programs on promoters of specific gene sets. The abundance of predicted cis-sequences is a major challenge to associate these sequences with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated ‘in silico expression analysis’ was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. When testing the performance of the web tool, known cis-regulatory sequences were submitted to the ‘in silico expression analysis’ resulting in the correct identification of the associated stress conditions. When using a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the ‘in silico expression analysis’ predicts that genes harbouring this sequence in their promoter are most likely Botrytis cinerea induced. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in the promoter is observed with B. cinerea in transgenic A. thaliana.
Database URL: http://www.pathoplant.de/expression_analysis.php.
doi:10.1093/database/bau030
PMCID: PMC3983564  PMID: 24727366
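The workflow in the abstract above (find genes harbouring a submitted cis-sequence in their promoter, then compare their expression across stress conditions) can be sketched in Python as follows. The abstract does not state how the tool scores conditions, so ranking by mean induction value is an assumption, as are the function names and the input layout; the sketch is not the PathoPlant implementation.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def genes_with_motif(promoters: Dict[str, str], motif: str) -> List[str]:
    """Return genes whose promoter contains the motif on either strand."""
    rc = reverse_complement(motif)
    return sorted(
        gene for gene, seq in promoters.items() if motif in seq or rc in seq
    )


def rank_conditions(
    genes: List[str], expression: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Rank conditions by the mean induction value of the given genes.

    expression maps condition -> {gene: induction value (e.g. log2 ratio)}.
    Conditions with no measurement for any of the genes are omitted.
    """
    ranking = []
    for condition, values in expression.items():
        measured = [values[g] for g in genes if g in values]
        if measured:
            ranking.append((condition, sum(measured) / len(measured)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

A call such as `genes_with_motif(promoters, "CGACTTTT")` would select the genes carrying the WT-box mentioned in the abstract, and `rank_conditions` would then order the conditions in a user-supplied expression table for those genes.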
11.  AtPAN: an integrated system for reconstructing transcriptional regulatory networks in Arabidopsis thaliana 
BMC Genomics  2012;13:85.
Background
Construction of transcriptional regulatory networks (TRNs) is of priority concern in systems biology. Numerous high-throughput approaches, including microarray and next-generation sequencing, are extensively adopted to examine transcriptional expression patterns on the whole-genome scale; those data are helpful in reconstructing TRNs. Identifying transcription factor binding sites (TFBSs) in a gene promoter is the initial step in elucidating the transcriptional regulation mechanism. Because transcription factors usually co-regulate a common group of genes by forming regulatory modules with similar TFBSs, the combinatorial interactions of transcription factors must be modeled to reconstruct the gene regulatory networks.
Description
For systems biology applications, this work develops a novel database called Arabidopsis thaliana Promoter Analysis Net (AtPAN), capable of detecting TFBSs and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, according to the microarray expression data and literature, the co-expressed TFs and their target genes can be retrieved from AtPAN. Additionally, proteins interacting with the co-expressed TFs are also incorporated to reconstruct co-expressed TRNs. Moreover, combinatorial TFs can be detected by the frequency of TFBSs co-occurrence in a group of gene promoters. In addition, TFBSs in the conserved regions between the two input sequences or homologous genes in Arabidopsis and rice are also provided in AtPAN. The output results also suggest conducting wet experiments in the future.
Conclusions
AtPAN has a user-friendly input/output interface and provides a graphical view of the TRNs. This novel and creative resource is freely available online at http://AtPAN.itps.ncku.edu.tw/.
doi:10.1186/1471-2164-13-85
PMCID: PMC3314555  PMID: 22397531
12.  Targeted interactomics reveals a complex core cell cycle machinery in Arabidopsis thaliana 
A protein interactome focused towards cell proliferation was mapped comprising 857 interactions among 393 proteins, leading to many new insights in plant cell cycle regulation. A comprehensive view on heterodimeric cyclin-dependent kinase (CDK)/cyclin complexes in plants is obtained, in relation with their regulators. Over 100 new candidate cell cycle proteins were predicted.
The basic underlying mechanisms that govern the cell cycle are conserved among all eukaryotes. Peculiar for plants, however, is that their genome contains a collection of cell cycle regulatory genes that is intriguingly large (Vandepoele et al, 2002; Menges et al, 2005) compared to other eukaryotes. Arabidopsis thaliana (Arabidopsis) encodes 71 genes in five regulatory classes versus only 15 in yeast and 23 in human.
Despite the discovery of numerous cell cycle genes, little is known about the protein complex machinery that steers plant cell division. Therefore, we applied tandem affinity purification (TAP) approach coupled with mass spectrometry (MS) on Arabidopsis cell suspension cultures to isolate and analyze protein complexes involved in the cell cycle. This approach allowed us to successfully map a first draft of the basic cell cycle complex machinery of Arabidopsis, providing many new insights into plant cell division.
To map the interactome, we relied on a streamlined platform comprising generic Gateway-based vectors with high cloning flexibility, the fast generation of transgenic suspension cultures, TAP adapted for plant cells, and matrix-assisted laser desorption ionization (MALDI) tandem-MS for the identification of purified proteins (Van Leene et al, 2007, 2008). Complexes for 102 cell cycle proteins were analyzed using this approach, leading to a non-redundant data set of 857 interactions among 393 proteins (Figure 1A). Two subspaces were identified in this data set, domain I1, containing interactions confirmed in at least two independent experimental repeats or in the reciprocal purification experiment, and domain I2 consisting of uniquely observed interactions.
Several observations underlined the quality of both domains. All tested reverse purifications found the original interaction, and 150 known or predicted interactions were confirmed, meaning that also a huge stack of new interactions was revealed. An in-depth computational analysis revealed enrichment for many cell cycle-related features among the proteins of the network (Figure 1B), and many protein pairs were coregulated at the transcriptional level (Figure 1C). Through integration of known cell cycle-related features, more than 100 new candidate cell cycle proteins were predicted (Figure 1D). Besides common qualities of both interactome domains, their real significance appeared through mutual differences exposing two subspaces in the cell cycle interactome: a central regulatory network of stable complexes that are repeatedly isolated and represent core regulatory units, and a peripheral network comprising transient interactions identified less frequently, which are involved in other aspects of the process, such as crosstalk between core complexes or connections with other pathways. To evaluate the biological relevance of the cell cycle interactome in plants, we validated interactions from both domains by a transient split-luciferase assay in Arabidopsis plants (Marion et al, 2008), further sustaining the hypothesis-generating power of the data set to understand plant growth.
With respect to insights into the cell cycle physiology, the interactome was subdivided according to the functional classes of the baits and core protein complexes were extracted, covering cyclin-dependent kinase (CDK)/cyclin core complexes together with their positive and negative regulation networks, DNA replication complexes, the anaphase-promoting complex, and spindle checkpoint complexes. The data imply that mitotic A- and B-type cyclins exclusively form heterodimeric complexes with the plant-specific B-type CDKs and not with CDKA;1, whereas D-type cyclins seem to associate with CDKA;1. Besides the extraction of complexes previously shown in other organisms, our data also suggested many new functional links; for example, the link coupling cell division with the regulation of transcript splicing. The association of negative regulators of CDK/cyclin complexes with transcription factors suggests that their role in reallocation is not solely targeted to CDK/cyclin complexes. New members of the Siamese-related inhibitory proteins were identified, and for the first time potential inhibitors of plant-specific mitotic B-type CDKs have been found in plants. New evidence that the E2F–DP–RBR network is not only active at G1-to-S, but also at the G2-to-M transition is provided and many complexes involved in DNA replication or repair were isolated. For the first time, a plant APC has been isolated biochemically, identifying three potential new plant-specific APC interactors, and finally, complexes involved in the spindle checkpoint were isolated mapping many new but specific interactions.
Finally, to get a general view on the complex machinery, modules of interacting cyclins and core cell cycle regulators were ranked along the cell cycle phases according to the transcript expression peak of the cyclins, showing an assorted set of CDK–cyclin complexes with high regulatory differentiation (Figure 4). Even within the same subfamily (e.g. cyclin A3, B1, B2, D3, and D4), cyclins differ not only in their functional time frame but also in the type and number of CDKs, inhibitors, and scaffolding proteins they bind, further indicating their functional diversification. According to our interaction data, at least 92 different variants of CDK–cyclin complexes are found in Arabidopsis.
In conclusion, these results reflect how several rounds of gene duplication (Sterck et al, 2007) led to the evolution of a large set of cyclin paralogs and a myriad of regulators, resulting in a significant jump in the complexity of the cell cycle machinery that could accommodate unique plant-specific features such as an indeterminate mode of postembryonic development. Through their extensive regulation and connection with a myriad of up- and downstream pathways, the core cell cycle complexes might offer the plant a flexible toolkit to fine-tune cell proliferation in response to an ever-changing environment.
Cell proliferation is the main driving force for plant growth. Although genome sequence analysis revealed a high number of cell cycle genes in plants, little is known about the molecular complexes steering cell division. In a targeted proteomics approach, we mapped the core complex machinery at the heart of the Arabidopsis thaliana cell cycle control. Besides a central regulatory network of core complexes, we distinguished a peripheral network that links the core machinery to up- and downstream pathways. Over 100 new candidate cell cycle proteins were predicted and an in-depth biological interpretation demonstrated the hypothesis-generating power of the interaction data. The data set provided a comprehensive view on heterodimeric cyclin-dependent kinase (CDK)–cyclin complexes in plants. For the first time, inhibitory proteins of plant-specific B-type CDKs were discovered and the anaphase-promoting complex was characterized and extended. Important conclusions were that mitotic A- and B-type cyclins form complexes with the plant-specific B-type CDKs and not with CDKA;1, and that D-type cyclins and S-phase-specific A-type cyclins seem to be associated exclusively with CDKA;1. Furthermore, we could show that plants have evolved a combinatorial toolkit consisting of at least 92 different CDK–cyclin complex variants, which strongly underscores the functional diversification among the large family of cyclins and reflects the pivotal role of cell cycle regulation in the developmental plasticity of plants.
doi:10.1038/msb.2010.53
PMCID: PMC2950081  PMID: 20706207
Arabidopsis thaliana; cell cycle; interactome; protein complex; protein interactions
13.  TOBFAC: the database of tobacco transcription factors 
BMC Bioinformatics  2008;9:53.
Background
Regulation of gene expression at the level of transcription is a major control point in many biological processes. Transcription factors (TFs) can activate and/or repress the transcriptional rate of target genes and vascular plant genomes devote approximately 7% of their coding capacity to TFs. Global analysis of TFs has only been performed for three complete higher plant genomes – Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa) and rice (Oryza sativa). Presently, no large-scale analysis of TFs has been made from a member of the Solanaceae, one of the most important families of vascular plants. To fill this void, we have analysed tobacco (Nicotiana tabacum) TFs using a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering of the tobacco genome. An analytical pipeline was developed to isolate TF sequences from the GSR data set. This involved multiple (typically 10–15) independent searches with different versions of the TF family-defining domain(s) (normally the DNA-binding domain) followed by assembly into contigs and verification. Our analysis revealed that tobacco contains a minimum of 2,513 TFs representing all of the 64 well-characterised plant TF families. The number of TFs in tobacco is higher than previously reported for Arabidopsis and rice.
Results
TOBFAC: the database of tobacco transcription factors, is an integrative database that provides a portal to sequence and phylogeny data for the identified TFs, together with a large quantity of other data concerning TFs in tobacco. The database contains an individual page dedicated to each of the 64 TF families. These contain background information, domain architecture via Pfam links, a list of all sequences and an assessment of the minimum number of TFs in this family in tobacco. Downloadable phylogenetic trees of the major families are provided along with detailed information on the bioinformatic pipeline that was used to find all family members. TOBFAC also contains EST data, a list of published tobacco TFs and a list of papers concerning tobacco TFs. The sequences and annotation data are stored in relational tables using a PostgreSQL relational database management system. The data processing and analysis pipelines used the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The computationally intensive data processing and analysis pipelines were run on an Apple XServe cluster with more than 20 nodes.
Conclusion
TOBFAC is an expandable knowledgebase of tobacco TFs with data currently available for over 2,513 TFs from 64 gene families. TOBFAC integrates available sequence information, phylogenetic analysis, and EST data with published reports on tobacco TF function. The database provides a major resource for the study of gene expression in tobacco and the Solanaceae and helps to fill a current gap in studies of TF families across the plant kingdom. TOBFAC is publicly accessible at .
doi:10.1186/1471-2105-9-53
PMCID: PMC2246155  PMID: 18221524
14.  PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups 
BMC Genomics  2008;9:561.
Background
The elucidation of transcriptional regulation in plant genes is an important area of research for plant scientists, following the mapping of various plant genomes, such as A. thaliana, O. sativa and Z. mays. A variety of bioinformatic servers or databases of plant promoters have been established, although most have been focused only on annotating transcription factor binding sites in a single gene and have neglected some important regulatory elements (tandem repeats and CpG/CpNpG islands) in promoter regions. Additionally, the combinatorial interaction of transcription factors (TFs) is important in regulating the gene group that is associated with the same expression pattern. Therefore, a tool for detecting the co-regulation of transcription factors in a group of gene promoters is required.
Results
This study develops a database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes. The system collects the plant transcription factor binding profiles from the PLACE, TRANSFAC (public release 7.0), AGRIS, and JASPAR databases and allows users to input a group of gene IDs or promoter sequences, enabling the co-occurrence of combinatorial transcription factor binding sites (TFBSs) within a defined distance (20 bp to 200 bp) to be identified. Furthermore, the new resource enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed. The regulatory elements in the conserved regions of the promoters across homologous genes are detected and presented.
Conclusion
In addition to providing a user-friendly input/output interface, PlantPAN has numerous advantages in the analysis of a plant promoter. Several case studies have established the effectiveness of PlantPAN. This novel analytical resource is now freely available at .
doi:10.1186/1471-2164-9-561
PMCID: PMC2633311  PMID: 19036138
15.  TTS Mapping: integrative WEB tool for analysis of triplex formation target DNA Sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome 
BMC Genomics  2009;10(Suppl 3):S9.
Background
DNA triplexes can naturally occur, co-localize and interact with many other regulatory DNA elements (e.g. G-quadruplex (G4) DNA motifs), specific DNA-binding proteins (e.g. transcription factors (TFs)), and micro-RNA (miRNA) precursors. Specific genome localizations of triplex target DNA sites (TTSs) may cause abnormalities in a double-helix DNA structure and can be directly involved in some human diseases. However, genome localization of specific TTSs, their interconnection with regulatory DNA elements and physiological roles in a cell are poorly defined. Therefore, it is important to identify a comprehensive and reliable catalogue of specific potential TTSs (pTTSs) and their co-localization patterns with other regulatory DNA elements in the human genome.
Results
"TTS mapping" database is a web-based search engine developed here, which is aimed to find and annotate pTTSs within a region of interest of the human genome. The engine provides descriptive statistics of pTTSs in a given region and its sequence context. Different annotation tracks of TTS-overlapping gene region(s), G4 motifs, CpG Island, miRNA precursors, miRNA targets, transcription factor binding sites (TFBSs), Single Nucleotide Polymorphisms (SNPs), small nucleolar RNAs (snoRNA), and repeat elements are also mapped based onto a sequence location provided by UCSC genome browser, G4 database http://www.quadruplex.org and several other datasets. The results pages provide links to UCSC genome browser annotation tracks and relative DBs. BLASTN program was included to check the uniqueness of a given pTTS in the human genome. Recombination- and mutation-prone genes (e.g. EVI-1, MYC) were found to be significantly enriched by TTSs and multiple co-occurring with our regulatory DNA elements. TTS mapping reveals that a high-complementary and evolutionarily conserved polypurine and polypyrimidine DNA sequence pair linked by a non-conserved short DNA sequence can form miR-483 transcribed from intron 2 of IGF2 gene and bound double-strand nucleic acid TTSs forming natural triplex structures.
Conclusion
TTS mapping provides comprehensive visual and analytical tools to help users find pTTSs, G-quadruplets and other regulatory DNA elements in various genome regions. TTS Mapping not only provides sequence visualization and statistical information, but also integrates knowledge about the co-localization of TTSs with various DNA elements and facilitates analysis of those data. In particular, TTS Mapping reveals a complex structural-functional regulatory module of the IGF2 gene, including a binding site for the TF MZF1 and the ncRNA precursor mir-483, formed by the highly complementary and evolutionarily conserved polypurine- and polypyrimidine-rich DNA pair. Such ncRNAs are capable of forming helical triplex structures with the polypurine strand of a nucleic acid duplex (DNA or RNA) via Hoogsteen or reverse Hoogsteen hydrogen bonds. Our web tool could be used to discover biologically meaningful genome modules and to optimize the experimental design of anti-gene treatments.
doi:10.1186/1471-2164-10-S3-S9
PMCID: PMC2788396  PMID: 19958507
16.  CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences 
BMC Bioinformatics  2007;8:129.
Background
Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology and providing annotation and analysis of the sequence data.
Description
CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the potential domains on annotated GSS were analyzed using the HMMER package against the Pfam database. The annotated GSS were also assigned Gene Ontology annotation terms and integrated with 228 curated plant metabolic pathways from the Arabidopsis Information Resource (TAIR) knowledge base. The UniProtKB-Swiss-Prot ENZYME database was used to assign putative enzymatic function to each GSS. Each GSS was also analyzed with the Tandem Repeat Finder (TRF) program in order to identify potential SSRs for molecular marker discovery. The raw sequence data, processed annotation, and SSR results were stored in relational tables designed in key-value pair fashion using a PostgreSQL relational database management system. The biological knowledge derived from the sequence data and processed results is represented as views or materialized views in the relational database management system.
All materialized views are indexed for quick data access and retrieval. Data processing and analysis pipelines were implemented using the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The CPU intensive data processing and analysis pipelines were run on a computer cluster of more than 30 dual-processor Apple XServes. A job management system called Vela was created as a robust way to submit large numbers of jobs to the Portable Batch System (PBS).
Conclusion
CGKB is an integrated and annotated resource for cowpea GSS with features of homology-based and HMM-based annotations, enzyme and pathway annotations, GO term annotation, toolkits, and a large number of other facilities to perform complex queries. The cowpea GSS, chloroplast sequences, mitochondrial sequences, retroelements, and SSR sequences are available as FASTA formatted files and downloadable at CGKB. This database and web interface are publicly accessible at .
doi:10.1186/1471-2105-8-129
PMCID: PMC1868039  PMID: 17445272
17.  Boolean modeling of transcriptome data reveals novel modes of heterotrimeric G-protein action 
Classical mechanisms of heterotrimeric G-protein signaling are observed to function in regulation of the transcriptome. Conversely, many theoretical regulatory modes of the G-protein are not manifested in the transcriptomes we investigate. A new mechanism of G-protein signaling is revealed, in which the β subunit regulates gene expression identically in the presence or absence of the α subunit. We find evidence of cross-talk between G-protein-mediated and hormone-mediated transcriptional regulation. We find evidence of system specificity in G-protein signaling.
Heterotrimeric G-proteins, composed of α, β, and γ subunits, participate in a wide range of signaling pathways in eukaryotes (Morris and Malbon, 1999). According to the typical, mammalian paradigm, in its inactive state, the G-protein exists as an associated heterotrimer. G-protein signaling begins with ligand binding that results in a conformational change in a G-protein-coupled receptor (GPCR). Once activated by the GPCR, the Gα separates from the associated Gβγ dimer and the freed Gα and Gβγ proteins can then interact with downstream effector molecules, alone or in combination, to transduce the signal. Subsequent to signal propagation, Gα re-associates with the Gβγ dimer to reform the G-protein complex.
There are several classical routes for signal propagation through heterotrimeric G-proteins that have been categorized in mammalian systems (Marrari et al, 2007; Dupre et al, 2009). One route, which we designate classical I, requires the presence of both subunits, and can invoke one of two distinct mechanisms. In one mechanism, on GPCR activation, freed Gα and Gβγ each interact with downstream effectors to elicit the downstream response. In a related mechanism, Gα but not Gβγ interacts with downstream effectors, but the Gβγ dimer is nevertheless required to facilitate coupling of Gα with the relevant GPCR (Marrari et al, 2007). In a second route, which we designate classical II, it is solely the Gβγ dimer that interacts with downstream effectors; in this case, sequestration of Gβγ within the heterotrimer prevents signal propagation. In addition, a few non-classical G-protein regulatory modes have also been implicated in some systems, for example signaling by the intact heterotrimer in yeast (Klein et al, 2000; Frank et al, 2005). Observations such as these lead to a fundamental question, namely, which of all the theoretical regulatory modes of G-protein signaling are realized biologically. Our study answers this question in the context of the model plant Arabidopsis thaliana, and in addition analyzes the manner in which G-protein signaling couples with signaling by the plant hormone abscisic acid. The Arabidopsis genome encodes only one canonical Gα subunit, GPA1, and one canonical Gβ subunit, AGB1, and knockout mutants are available for each of these, allowing clear dissection of Gα- and Gβ-related phenotypes.
Abscisic acid (ABA) is a major plant hormone, which inhibits growth and promotes tolerance of abiotic stresses such as drought, salinity, and cold. ABA signaling is known to interact with heterotrimeric G-protein signaling in both developmental and stress responses in a complex manner, causing, for example, ABA hyposensitivity of guard cell stomatal opening in gpa1 and agb1 single mutants as well as agb1 gpa1 double mutants (Fan et al, 2008), but ABA hypersensitivity of the inhibition of seed germination and post-germination seedling development in the same mutants (Pandey et al, 2006). These experimental observations implicate G-proteins as one of the components of ABA signaling, but to date no systematic study has been conducted in either plant or metazoan systems to define the co-regulatory modes of a G-protein and a hormone.
In this study, we conduct genome-wide gene expression profiling in G-protein subunit mutants of A. thaliana guard cells and leaves, with or without treatment with ABA. By introducing one or more mediators acting downstream of the G-protein and ABA to control transcript levels, we propose nine G-protein/ABA signaling pathways including ABA-independent G-protein signaling pathways, G-protein-independent ABA signaling pathways, and seven distinct ABA–G-protein-coupled signaling pathways (Figure 1). We develop a Boolean modeling framework to systematically enumerate 14 possible theoretical regulatory modes of the G-protein and 142 co-regulatory modes of the G-protein and ABA, and then use a pattern matching approach to associate target genes with theoretical regulatory modes.
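The enumerate-then-match idea in the preceding paragraph can be illustrated with a small sketch. It assumes that a regulatory mode is a non-constant on/off expression pattern across the four genotypes (wild type, gpa1, agb1, and the agb1 gpa1 double mutant). That assumption yields 16 - 2 = 14 patterns, which coincides with the count quoted above, but whether the authors derived their 14 modes this way is not stated here. The function names, the binarization threshold, and the expression values are hypothetical, and the 142 co-regulatory modes with ABA are not reproduced.

```python
from itertools import product

# Four genotypes, encoded as (G-alpha present, G-beta present).
GENOTYPES = {
    "wild type": (1, 1),
    "gpa1": (0, 1),
    "agb1": (1, 0),
    "agb1 gpa1": (0, 0),
}

def enumerate_modes():
    """List every non-constant on/off pattern across the genotypes."""
    names = list(GENOTYPES)
    modes = []
    for bits in product((0, 1), repeat=len(names)):
        if len(set(bits)) > 1:  # skip all-off and all-on (no regulation)
            modes.append(dict(zip(names, bits)))
    return modes

def match_mode(expression, threshold, modes):
    """Binarize an expression profile and return the matching mode, if any."""
    observed = {g: int(expression[g] >= threshold) for g in GENOTYPES}
    for mode in modes:
        if mode == observed:
            return mode
    return None  # constant profile: not regulated by the G-protein

MODES = enumerate_modes()
print(len(MODES))

# Hypothetical gene expressed only when both subunits are present.
profile = {"wild type": 8.2, "gpa1": 3.1, "agb1": 2.9, "agb1 gpa1": 3.0}
print(match_mode(profile, threshold=5.0, modes=MODES))
```

A hard threshold is the simplest way to binarize; real microarray data would need a statistical test for differential expression before a profile is assigned to a mode.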
Our analysis shows that the G-protein regulatory mode that requires the presence of both Gα and Gβγ subunits (consistent with classical I mechanisms), is well represented in both guard cells and leaves. The G-protein regulatory mode that requires a freed Gβγ subunit (classical II G-protein regulatory mechanism) is well supported in guard cells and somewhat less so in leaves. In addition, a G-protein regulatory mode representing a non-classical regulatory mechanism is prevalent in guard cells but less so in leaves (Figure 5). In this regulatory mode, signaling by Gβ(γ) occurs, and this signaling is not regulated in any way by Gα.
By relating the target genes with the nine proposed G-protein/ABA signaling pathways, we are able to gauge the plausibility of regulatory modes of the G-protein and ABA at the pathway level. We find that G-protein-independent ABA signaling pathways are prevalent in both guard cells and leaves. The existence of an ABA-independent regulatory activity of the G-protein is well supported in guard cells, but not supported in leaves. Additive regulation by G-protein signaling plus G-protein-independent ABA signaling is rare in both guard cells and leaves. In addition, combinatorial cross-talk between G-protein signaling and ABA signaling and additive cross-talk between ABA–G-protein signaling and G-protein-independent ABA signaling are observed in both guard cells and leaves. Our transcriptome analysis indicates that in some cases, ABA definitely does not influence G-protein signaling, though it may do so in some other cases.
To investigate whether previously observed hypersensitivity or hyposensitivity of developmental and dynamic transient responses to ABA in G-protein mutants is recapitulated at the level of transcriptional regulation, we compare gene regulation by ABA in guard cells and leaves of the G-protein mutants versus wild type. We find that in guard cells, equal ABA hyposensitivity of all mutants combined is significant, although hyposensitivity in individual mutants is not. There is also a separate group of genes in guard cells that show ABA hypersensitivity in the gpa1 mutant, suggesting complex interactions between ABA and G-protein signaling in gene regulation in this cell type. In leaves, ABA hyposensitivity of gene expression in the three individual mutants and equal hyposensitivity in all mutants are strongly supported. In addition, several of the functional categories identified by our analysis of G-protein regulatory modes have been implicated in previous physiological analyses of G-protein mutants, providing validation to the biological interpretation of our results.
In summary, by conducting a genome-wide gene expression profiling study in G-protein subunit mutants of A. thaliana guard cells and leaves and developing a Boolean modeling framework, we systematically evaluate the biological utilization of mechanisms of G-protein regulatory action and reveal novel regulatory modes of the G-protein. The results generate empirical evidence and insights regarding molecular events of G-protein signaling and response at the physiological level in both plants and mammals.
Heterotrimeric G-proteins mediate crucial and diverse signaling pathways in eukaryotes. Here, we generate and analyze microarray data from guard cells and leaves of G-protein subunit mutants of the model plant Arabidopsis thaliana, with or without treatment with the stress hormone, abscisic acid. Although G-protein control of the transcriptome has received little attention to date in any system, transcriptome analysis allows us to search for potentially uncommon yet significant signaling mechanisms. We describe the theoretical Boolean mechanisms of G-protein × hormone regulation, and then apply a pattern matching approach to associate gene expression profiles with Boolean models. We find that (1) classical mechanisms of G-protein signaling are well represented, whereas some theoretical regulatory modes of the G-protein are not supported; (2) a new mechanism of G-protein signaling is revealed, in which Gβ regulates gene expression identically in the presence or absence of Gα; (3) guard cells and leaves favor different G-protein modes in transcriptome regulation, supporting system specificity of G-protein signaling. Our method holds significant promise for analyzing analogous ‘switch-like’ signal transduction events in any organism.
doi:10.1038/msb.2010.28
PMCID: PMC2913393  PMID: 20531402
abscisic acid; Arabidopsis thaliana; Boolean modeling; heterotrimeric G-protein; transcriptome
18.  miRFANs: an integrated database for Arabidopsis thaliana microRNA function annotations 
BMC Plant Biology  2012;12:68.
Background
Plant microRNAs (miRNAs) have been shown to play important roles in developmental control, hormone secretion, cell differentiation and proliferation, and response to environmental stresses. However, our knowledge about the regulatory mechanisms and functions of miRNAs remains very limited. The main difficulties lie in two aspects. On one hand, the number of experimentally validated miRNA targets is very limited and the predicted targets often include many false positives, which limits our ability to reveal the functions of miRNAs. On the other hand, the regulation of miRNAs is known to be spatio-temporally specific, which makes the regulatory mechanisms of miRNAs harder to understand.
Description
In this paper we present miRFANs, an online database for Arabidopsis thaliana miRNA function annotations. We integrated various types of datasets, including miRNA-target interactions, transcription factors (TFs) and their targets, expression profiles, genomic annotations and pathways, into a comprehensive database, and developed various statistical and mining tools, together with a user-friendly web interface. For each miRNA target predicted by psRNATarget, TargetAlign and UEA target-finder, or recorded in TarBase and miRTarBase, the effect of its up-regulated or down-regulated miRNA on the expression level of the target gene is evaluated by carrying out differential expression analysis of both miRNA and target expression profiles acquired under the same (or similar) experimental condition and in the same tissue. Moreover, each miRNA target is associated with gene ontology and pathway terms, together with the target site information and regulating miRNAs predicted by different computational methods. These associated terms may provide valuable insight into the functions of each miRNA.
Conclusion
First, a comprehensive collection of miRNA targets for Arabidopsis thaliana provides valuable information about the functions of plant miRNAs. Second, a highly informative miRNA-mediated genetic regulatory network is extracted from our integrative database. Third, a set of statistical and mining tools is provided for analyzing and mining the database. Fourth, a user-friendly web interface is developed to facilitate the browsing and analysis of the collected data.
doi:10.1186/1471-2229-12-68
PMCID: PMC3489716  PMID: 22583976
19.  MEME Suite: tools for motif discovery and searching 
Nucleic Acids Research  2009;37(Web Server issue):W202-W208.
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.
doi:10.1093/nar/gkp335
PMCID: PMC2703892  PMID: 19458158
20.  MotifCombinator: a web-based tool to search for combinations of cis-regulatory motifs 
BMC Bioinformatics  2007;8:100.
Background
A combination of multiple types of transcription factors and cis-regulatory elements is often required for gene expression in eukaryotes, and the combinatorial regulation confers specific gene expression to tissues or environments. To reveal the combinatorial regulation, computational methods are developed that efficiently infer combinations of cis-regulatory motifs that are important for gene expression as measured by DNA microarrays. One promising type of computational method is to utilize regression analysis between expression levels and scores of motifs in input sequences. This type takes full advantage of information on expression levels because it does not require that the expression level of each gene be dichotomized according to whether or not it reaches a certain threshold level. However, there is no web-based tool that employs regression methods to systematically search for motif combinations and that practically handles combinations of more than two or three motifs.
Results
Here we introduce MotifCombinator, an online tool with a user-friendly interface, to systematically search for combinations composed of any number of motifs based on regression methods. The tool utilizes well-known regression methods (multivariate linear regression, multivariate adaptive regression splines or MARS, and multivariate logistic regression) for this purpose, and uses a genetic algorithm to search for combinations composed of any desired number of motifs. The visualization systems in this tool help users to intuitively grasp the process of the combination search, and the backup system allows users to easily stop and restart calculations that are expected to require a large amount of computation time. This tool also provides the preparatory steps needed for a systematic combination search, i.e., selecting single motifs to constitute combinations and cutting out redundant similar motifs based on clustering analysis.
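The regression idea behind this kind of search can be shown with a deliberately simplified sketch. It is not MotifCombinator's method: in place of multivariate regression and a genetic algorithm, it enumerates every motif pair exhaustively and scores each pair with a single-variable least-squares fit of expression on the product of the two motif scores. The function names, motif names, and all data values are hypothetical.

```python
from itertools import combinations

def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant predictor or response explains nothing
    return (sxy * sxy) / (sxx * syy)

def rank_motif_pairs(scores, expression):
    """Rank motif pairs by how well the product of their scores
    explains expression; best pair first."""
    ranked = []
    for m1, m2 in combinations(sorted(scores), 2):
        combined = [a * b for a, b in zip(scores[m1], scores[m2])]
        ranked.append((r_squared(combined, expression), m1, m2))
    return sorted(ranked, reverse=True)

# Hypothetical motif scores for five promoters, with expression levels.
scores = {
    "motifA": [1.0, 0.0, 1.0, 0.0, 1.0],
    "motifB": [1.0, 1.0, 0.0, 0.0, 1.0],
    "motifC": [0.0, 1.0, 1.0, 1.0, 0.0],
}
expression = [9.0, 2.0, 2.5, 1.5, 8.5]

best = rank_motif_pairs(scores, expression)[0]
print(best[1], best[2])
```

Exhaustive enumeration is only practical for pairs or triples; the number of combinations grows too quickly beyond that, which is the situation a heuristic search such as a genetic algorithm is meant to handle.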
Conclusion
MotifCombinator helps users to systematically search for motif combinations that play an important role in gene expression as measured by microarrays.
doi:10.1186/1471-2105-8-100
PMCID: PMC1838919  PMID: 17378935
21.  Gene coexpression clusters and putative regulatory elements underlying seed storage reserve accumulation in Arabidopsis 
BMC Genomics  2011;12:286.
Background
In Arabidopsis, a large number of genes involved in the accumulation of seed storage reserves during seed development have been characterized, but the relationship of gene expression and regulation underlying this physiological process remains poorly understood. A more holistic view of this molecular interplay will help in the further study of the regulatory mechanisms controlling seed storage compound accumulation.
Results
We identified gene coexpression networks in the transcriptome of developing Arabidopsis (Arabidopsis thaliana) seeds from the globular to mature embryo stages by analyzing publicly accessible microarray datasets. Genes encoding the known enzymes in the fatty acid biosynthesis pathway were found in one coexpression subnetwork (or cluster), while genes encoding oleosins and seed storage proteins were identified in another subnetwork with a distinct expression profile. In the triacylglycerol assembly pathway, only the genes encoding diacylglycerol acyltransferase 1 (DGAT1) and a putative cytosolic "type 3" DGAT exhibited a similar expression pattern with genes encoding oleosins. We also detected a large number of putative cis-acting regulatory elements in the promoter regions of these genes, and promoter motifs for LEC1 (LEAFY COTYLEDON 1), DOF (DNA-binding-with-One-Finger), GATA, and MYB transcription factors (TF), as well as SORLIP5 (Sequences Over-Represented in Light-Induced Promoters 5), are overrepresented in the promoter regions of fatty acid biosynthetic genes. The conserved CCAAT motifs for B3-domain TFs and binding sites for bZIP (basic-leucine zipper) TFs are enriched in the promoters of genes encoding oleosins and seed storage proteins.
Conclusions
Genes involved in the accumulation of seed storage reserves are expressed in distinct patterns and regulated by different TFs. The gene coexpression clusters and putative regulatory elements presented here provide a useful resource for further experimental characterization of protein interactions and regulatory networks in this process.
doi:10.1186/1471-2164-12-286
PMCID: PMC3126783  PMID: 21635767
22.  The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community 
Nucleic Acids Research  2003;31(1):224-228.
Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms including alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system.
PMCID: PMC165523  PMID: 12519987
23.  Mutations in a Novel, Cryptic Exon of the Luteinizing Hormone/Chorionic Gonadotropin Receptor Gene Cause Male Pseudohermaphroditism 
PLoS Medicine  2008;5(4):e88.
Background
Male pseudohermaphroditism, or Leydig cell hypoplasia (LCH), is an autosomal recessive disorder in individuals with a 46,XY karyotype, characterized by a predominantly female phenotype, a blind-ending vagina, absence of breast development, primary amenorrhea, and the presence of testicular structures. It is caused by mutations in the luteinizing hormone/chorionic gonadotropin receptor gene (LHCGR), which impair either LH/CG binding or signal transduction. However, molecular analysis has revealed that the LHCGR is apparently normal in about 50% of patients with the full clinical phenotype of LCH. We therefore searched the LHCGR for novel genomic elements causative for LCH.
Methods and Findings
In the present study we have identified a novel, primate-specific bona fide exon (exon 6A) within the LHCGR gene. It displays composite characteristics of an internal/terminal exon and possesses stop codons triggering nonsense-mediated mRNA decay (NMD) in LHCGR. Transcripts including exon 6A are physiologically highly expressed in human testes and granulosa cells, and result in an intracellular, truncated LHCGR protein of 209 amino acids. We sequenced exon 6A in 16 patients with unexplained LCH and detected mutations in three patients. Functional studies revealed a dramatic increase in the expression of the mutated internal exon 6A transcripts, indicating aberrant NMD. These altered ratios of LHCGR transcripts result in the generation of predominantly nonfunctional LHCGR isoforms, thereby preventing proper expression and functioning.
Conclusions
The identification and characterization of this novel exon not only identifies a new regulatory element within the genomic organization of LHCGR, but also points toward a complex network of receptor regulation, including events at the transcriptional level. These findings add to the molecular diagnostic tools for LCH and extend our understanding of the endocrine regulation of sexual differentiation.
Joerg Gromoll and colleagues describe the identification and characterization of a novel exon that appears to be a new regulatory element within the luteinizing hormone/chorionic gonadotropin receptor gene of three individuals with Leydig cell hypoplasia.
Editors' Summary
Background.
A person's sex is determined by their complement of X and Y (sex) chromosomes. Someone who has two X chromosomes is genetically female and usually has ovaries and female external sex organs. Someone who has an X and a Y chromosome is genetically male and has testes and male external sex organs. Sometimes, though, the development of the reproductive organs proceeds abnormally, resulting in a person with an “intersex” condition whose chromosomes, gonads (ovaries or testes), and external sex organs do not correspond. Leydig cell hypoplasia (LCH; also called male pseudohermaphroditism or a disorder of sex development) is an XY female intersex condition. People with this inherited condition develop testes but also have a vagina (which is not connected to a womb), and they do not develop breasts or have periods. This mixture of sexual characteristics arises because the Leydig cells in the testes are underdeveloped. Leydig cells normally secrete testosterone, the hormone that promotes the development and maintenance of male sex characteristics. Before birth, chorionic gonadotropin (CG; a hormone made by the placenta) stimulates Leydig cell development and testosterone production; after birth, luteinizing hormone (LH), which is made by the pituitary gland, stimulates testosterone production. Both hormones bind to the LH/CG receptor, a protein on the surface of Leydig cells. In LCH, this receptor either does not bind CG and LH or fails to tell the Leydig cells to make testosterone.
Why Was This Study Done?
The gene that encodes the LH/CG receptor is called LHCGR. Several mutations (genetic changes) that inactivate the LH/CG receptor have been identified in people with LCH. However, the LHCGR gene is apparently normal in 50% of people with this intersex condition. In this study, the researchers examine the LHCGR gene in detail to try to find the underlying genetic defect in these individuals.
What Did the Researchers Do and Find?
The researchers used several molecular biology techniques to identify a new exon—exon 6A—within the human LHCGR gene. (Exons are DNA sequences that contain the information for making proteins; introns are DNA sequences that interrupt the coding sequence of a gene. Both introns and exons are transcribed into messenger RNA [mRNA] and the exons are then “spliced” together to make the mature mRNA, which is translated into protein.) The researchers identify several differently spliced LHCGR mRNA transcripts that contain exon 6A—a terminal exon 6A mRNA that contains exons 1–6 and exon 6A, and two internal exon 6A mRNAs that also contain exons 7–11. The researchers report that human testes express high levels of the terminal exon 6A transcript, which is translated into a short version of LHCGR protein that remains within the cell (full-length LHCGR moves to the cell surface). By contrast, testes contain low levels of the internal exon 6A mRNAs. This is because exon 6A contains two premature stop codons (DNA sequences that mark the end of a protein), which trigger “nonsense-mediated decay” (NMD), a cellular surveillance mechanism that regulates protein synthesis by degrading mRNAs that contain internal stop codons. When the researchers screened 16 people with LCH but without known mutations in the LHCGR gene, three had mutations in exon 6A. Laboratory experiments show that these mutations greatly increased the amounts of the internal exon 6A transcripts present in cells and interfered with the cells' normal response to chorionic gonadotropin.
What Do These Findings Mean?
These findings identify a new, functional exon in the LHCGR gene and show that mutations in this exon cause some cases of LCH. This is the first time that a human disease has been associated with mutations in an exon that is a target for NMD. In addition, these findings provide important insights into how the LHCGR is regulated. The researchers speculate that a complex network that involves the exon 6A-containing transcripts and NMD normally tightly regulates the production of functional LHCGR already at the transcriptional level. When mutations are present in exon 6A, they suggest, NMD is the predominant pathway for all the exon 6A-containing transcripts, thereby drastically decreasing the amount of functional LHCGR.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050088.
The MedlinePlus Encyclopedia has a page on intersex conditions (in English and Spanish)
Wikipedia has pages on intersexuality and on the LH/CG receptor (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Intersex Society of North America provides information and support for the parents of children with intersex conditions
The Androgen Insensitivity Syndrome Support Group also provides some general information about intersex conditions, including information about LCH and other XY female conditions (in several languages)
Sequence-Structure-Function-Analysis (SSFA), run by a group of researchers in Germany (Leibniz-Institut für Molekulare Pharmakologie; Humboldt-Universität zu Berlin), is a database dealing with the sequence, structure, and function of glycoprotein hormone receptors
Glycoprotein-hormone Receptors Information System (GRIS), from Université Libre de Bruxelles and Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire, is a database giving structural information on the LHCGR
doi:10.1371/journal.pmed.0050088
PMCID: PMC2323302  PMID: 18433292
24.  Differential motif enrichment analysis of paired ChIP-seq experiments 
BMC Genomics  2014;15(1):752.
Background
Motif enrichment analysis of transcription factor ChIP-seq data can help identify transcription factors that cooperate or compete. Previously, little attention has been given to comparative motif enrichment analysis of pairs of ChIP-seq experiments, where the binding of the same transcription factor is assayed under different conditions. Such comparative analysis could potentially identify the distinct regulatory partners/competitors of the assayed transcription factor under different conditions or at different stages of development.
Results
We describe a new methodology for identifying sequence motifs that are differentially enriched in one set of DNA or RNA sequences relative to another set, and apply it to paired ChIP-seq experiments. We show that, using paired ChIP-seq data for a single transcription factor, differential motif enrichment analysis identifies all the known key transcription factors involved in the transformation of non-cancerous immortalized breast cells (MCF10A-ER-Src cells) into cancer stem cells whereas non-differential motif enrichment analysis does not. We also show that differential motif enrichment analysis identifies regulatory motifs that are significantly enriched at constrained locations within the bound promoters, and that these motifs are not identified by non-differential motif enrichment analysis. Our methodology differs from other approaches in that it leverages both comparative enrichment and positional enrichment of motifs in ChIP-seq peak regions or in the promoters of genes bound by the transcription factor.
Conclusions
We show that differential motif enrichment analysis of paired ChIP-seq experiments offers biological insights not available from non-differential analysis. In contrast to previous approaches, our method detects motifs that are enriched in a constrained region in one set of sequences, but not enriched in the same region in the comparative set. We have enhanced the web-based CentriMo algorithm to allow it to perform the constrained differential motif enrichment analysis described in this paper, and CentriMo’s on-line interface (http://meme.ebi.edu.au) provides dozens of databases of DNA- and RNA-binding motifs from a full range of organisms. All data and output files presented here are available at http://research.imb.uq.edu.au/t.bailey/supplementary_data/Lesluyes2014.
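The core idea of differential enrichment, a motif that is over-represented in one sequence set relative to a comparative set, can be illustrated with a minimal sketch. This is not the CentriMo algorithm: it assumes an exact string match instead of a position weight matrix, ignores positional constraints, and uses a one-sided Fisher exact test as a stand-in for the statistic used in the paper. All function names and sequences below are hypothetical.

```python
from math import comb


def count_hits(sequences, motif):
    """Count sequences that contain the motif at least once (exact match)."""
    motif = motif.upper()
    return sum(1 for seq in sequences if motif in seq.upper())


def fisher_one_sided(hits_a, n_a, hits_b, n_b):
    """One-sided Fisher exact test (assumed stand-in statistic).

    Returns P(X >= hits_a) under the hypergeometric null in which motif
    hits are spread across both sets regardless of set membership.
    """
    total_hits = hits_a + hits_b
    total = n_a + n_b
    upper = min(total_hits, n_a)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_a - k)
        for k in range(hits_a, upper + 1)
    )
    return tail / comb(total, n_a)


def differential_enrichment(primary, control, motif):
    """p-value for the motif being enriched in `primary` relative to `control`."""
    return fisher_one_sided(
        count_hits(primary, motif), len(primary),
        count_hits(control, motif), len(control),
    )


# Toy, made-up sequences standing in for peak regions from two conditions.
condition_1 = ["TTCACGTG", "CACGTGAA", "GGGGCCCC", "ACGTGACC"]
condition_2 = ["AAAATTTT", "CCCCGGGG", "ACGTACGT", "TTTTAAAA"]
p_value = differential_enrichment(condition_1, condition_2, "CACGTG")
```

A small p-value suggests the motif occurs more often in the first set than chance would explain. A realistic analysis would score motifs with weight matrices, restrict matches to a constrained region, and correct for testing many motifs.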
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-752) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-752
PMCID: PMC4167127  PMID: 25179504
Comparative ChIP-seq analysis; Constrained differential motif enrichment analysis; MCF10A-ER-Src cells; ChIP-seq; Regulation of transcription; Gene expression
25.  SyStemCell: A Database Populated with Multiple Levels of Experimental Data from Stem Cell Differentiation Research 
PLoS ONE  2012;7(7):e35230.
Elucidation of the mechanisms of stem cell differentiation is of great scientific interest. Increasing evidence suggests that stem cell differentiation involves changes at multiple levels of biological regulation, which together orchestrate the complex differentiation process; many related studies have been performed to investigate the various levels of regulation. The resulting valuable data, however, remain scattered. Most of the current stem cell-relevant databases focus on a single level of regulation (mRNA expression) from limited stem cell types; thus, a unifying resource that compiles the multiple levels of available research data would be of great value. Here we present a database for this purpose, SyStemCell, populated with multi-level experimental data from stem cell research. The database currently covers seven levels of stem cell differentiation-associated regulatory mechanisms, including DNA CpG 5-hydroxymethylcytosine/methylation, histone modification, transcript products, microRNA-based regulation, protein products, phosphorylation proteins and transcription factor regulation, all of which have been curated from 285 peer-reviewed publications selected from PubMed. The database contains 43,434 genes, recorded as 942,221 gene entries, for four organisms (Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta) and various stem cell sources (e.g., embryonic stem cells, neural stem cells and induced pluripotent stem cells). Data in SyStemCell can be queried by Entrez gene ID, symbol, or alias, or browsed by specific stem cell type at each level of genetic regulation. An online analysis tool is integrated to help researchers mine potential relationships among the different levels of regulation, and the potential usage of the database is demonstrated by three case studies. SyStemCell is the first database to bridge multi-level experimental information from stem cell studies, and it can become an important reference resource for stem cell researchers.
The database is available at http://lifecenter.sgst.cn/SyStemCell/.
doi:10.1371/journal.pone.0035230
PMCID: PMC3396617  PMID: 22807998
