Results 1-14 (14)
 

1.  Exon resequencing of H3K9 methyltransferase complex genes, EHMT1, EHTM2 and WIZ, in Japanese autism subjects 
Molecular Autism  2014;5(1):49.
Background
Histone H3 methylation at lysine 9 (H3K9) is a conserved epigenetic signal, mediating heterochromatin formation by trimethylation, and transcriptional silencing by dimethylation. Defective GLP (Ehmt1) and G9a (Ehmt2) histone lysine methyltransferases, involved in mono and dimethylation of H3K9, confer autistic phenotypes and behavioral abnormalities in animal models. Moreover, EHMT1 loss of function results in Kleefstra syndrome, characterized by severe intellectual disability, developmental delays and psychiatric disorders. We examined the possible role of histone methyltransferases in the etiology of autism spectrum disorders (ASD) and suggest that rare functional variants in these genes that regulate H3K9 methylation may be associated with ASD.
Methods
Since G9a-GLP-Wiz forms a heteromeric methyltransferase complex, all the protein-coding regions and exon/intron boundaries of EHMT1, EHMT2 and WIZ were sequenced in Japanese ASD subjects. The detected variants were prioritized based on novelty and functionality. The expression levels of these genes were tested in blood cells and postmortem brain samples from ASD and control subjects. Expression of EHMT1 and EHMT2 isoforms was determined by digital PCR.
Results
We identified six nonsynonymous variants: three in EHMT1, two in EHMT2 and one in WIZ. Two variants, the EHMT1 ankyrin repeat domain (Lys968Arg) and EHMT2 SET domain (Thr961Ile) variants, were present exclusively in cases but showed no statistically significant association with ASD. EHMT2 transcript expression was significantly elevated in the peripheral blood cells of ASD subjects compared with control samples, whereas EHMT1 and WIZ expression was not. Gene expression levels of EHMT1, EHMT2 and WIZ in Brodmann area (BA) 9, BA21, BA40 and the dorsal raphe nucleus (DoRN) regions from postmortem brain samples showed no significant changes between ASD and control subjects. Nor did expression levels of EHMT1 and EHMT2 isoforms in the prefrontal cortex differ significantly between ASD and control groups.
Conclusions
We identified two novel rare missense variants in the EHMT1 and EHMT2 genes of ASD patients. We surmise that these variants alone may not be sufficient to exert a significant effect on ASD pathogenesis. The elevated expression of EHMT2 in the peripheral blood cells may support the notion of a restrictive chromatin state in ASD, similar to schizophrenia.
Electronic supplementary material
The online version of this article (doi:10.1186/2040-2392-5-49) contains supplementary material, which is available to authorized users.
doi:10.1186/2040-2392-5-49
PMCID: PMC4233047  PMID: 25400900
Autism; Rare variant; GLP; G9a; Wiz; Histone methyltransferase; H3K9
2.  An assignment of intrinsically disordered regions of proteins based on NMR structures 
Journal of structural biology  2012;181(1):29-36.
Intrinsically disordered proteins (IDPs) do not adopt stable three-dimensional structures in physiological conditions, yet these proteins play crucial roles in biological phenomena. In most cases, intrinsic disorder manifests itself in segments or domains of an IDP, called intrinsically disordered regions (IDRs), but fully disordered IDPs also exist. Although IDRs can be detected as missing residues in protein structures determined by X-ray crystallography, no protocol has been developed to identify IDRs from structures obtained by Nuclear Magnetic Resonance (NMR). Here, we propose a computational method to assign IDRs based on NMR structures. We compared missing residues of X-ray structures with residue-wise deviations of NMR structures for identical proteins, and derived a threshold deviation that gives the best correlation of ordered and disordered regions of both structures. The obtained threshold of 3.2 Å was applied to proteins whose structures were only determined by NMR, and the resulting IDRs were analyzed and compared to those of X-ray structures with no NMR counterpart in terms of sequence length, IDR fraction, protein function, cellular location, and amino acid composition, all of which suggest distinct characteristics. The structural knowledge of IDPs is still inadequate compared with that of structured proteins. Our method can collect and utilize IDRs from structures determined by NMR, potentially enhancing the understanding of IDPs.
doi:10.1016/j.jsb.2012.10.017
PMCID: PMC3529935  PMID: 23142703
intrinsically disordered proteins; deviations; missing residues; Matthews’s correlation coefficient; Nuclear Magnetic Resonance
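The thresholding step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' protocol: it assumes the NMR models are already superposed, that one representative atom (for example Cα) is used per residue, and that "deviation" means the root-mean-square distance from the mean position across models. Only the 3.2 Å threshold comes from the abstract; the function names are hypothetical.

```python
import math

# Threshold quoted in the abstract (in angstroms).
THRESHOLD_ANGSTROM = 3.2


def residue_deviations(models):
    """Per-residue RMS deviation from the mean position across models.

    models: list of models; each model is a list of (x, y, z) tuples,
    one per residue, assumed to be already superposed (an assumption of
    this sketch, not something stated in the abstract).
    """
    n_models = len(models)
    n_residues = len(models[0])
    deviations = []
    for i in range(n_residues):
        # Mean position of residue i over all models.
        mean = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        # Mean squared distance of residue i from that mean position.
        msd = sum(
            sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        deviations.append(math.sqrt(msd))
    return deviations


def assign_disorder(models, threshold=THRESHOLD_ANGSTROM):
    """Flag residues whose deviation exceeds the threshold as disordered."""
    return [d > threshold for d in residue_deviations(models)]


if __name__ == "__main__":
    # Two toy models with two residues each: the first residue is fixed,
    # the second moves by 10 angstroms between models.
    toy_models = [
        [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    ]
    print(assign_disorder(toy_models))
```

In the toy input the second residue deviates by 5 Å from its mean position and is therefore flagged, while the first is not.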
3.  IDEAL in 2014 illustrates interaction networks composed of intrinsically disordered proteins and their binding partners 
Nucleic Acids Research  2013;42(Database issue):D320-D325.
IDEAL (Intrinsically Disordered proteins with Extensive Annotations and Literature, http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/) is a collection of intrinsically disordered proteins (IDPs) that cannot adopt stable globular structures under physiological conditions. Since its previous publication in 2012, the number of entries in IDEAL has almost tripled (120 to 340). In addition to the increase in quantity, the quality of IDEAL has been significantly improved. The new IDEAL incorporates the interactions of IDPs and their binding partners more explicitly, and illustrates the protein–protein interaction (PPI) networks and the structures of protein complexes. Redundant experimental data are arranged based on the clustering of Protein Data Bank entries, and similar sequences with the same binding mode are grouped. As a result, the new IDEAL presents more concise and informative experimental data. Nuclear magnetic resonance (NMR) disorder is annotated in a systematic manner, by identifying the regions with large deviations among the NMR models. The ordered/disordered and new domain predictions by DICHOT are available, as well as the domain assignments by HMMER. Some examples of the PPI networks and the highly deviated regions derived from NMR models will be described, together with other advances. These enhancements will facilitate deeper understanding of IDPs, in terms of their flexibility, plasticity and promiscuity.
doi:10.1093/nar/gkt1010
PMCID: PMC3965115  PMID: 24178034
4.  Influence of Structural Symmetry on Protein Dynamics 
PLoS ONE  2012;7(11):e50011.
Structural symmetry in homooligomeric proteins has intrigued many researchers over the past several decades. However, the implication of protein symmetry is still not well understood. In this study, we performed molecular dynamics (MD) simulations of two forms of trp RNA binding attenuation protein (TRAP), the wild-type 11-mer and an engineered 12-mer, having two different levels of circular symmetry. The results of the simulations showed that the inter-subunit fluctuations in the 11-mer TRAP were significantly smaller than the fluctuations in the 12-mer TRAP while the internal fluctuations were larger in the 11-mer than in the 12-mer. These differences in thermal fluctuations were interpreted by normal mode analysis and group theory. For the 12-mer TRAP, the wave nodes of the normal modes existed at the flexible interface between the subunits, while the 11-mer TRAP had its nodes within the subunits. The principal components derived from the MD simulations showed similar mode structures. These results demonstrated that the structural symmetry was an important determinant of protein dynamics in circularly symmetric homooligomeric proteins.
doi:10.1371/journal.pone.0050011
PMCID: PMC3506605  PMID: 23189176
5.  A novel biclustering approach with iterative optimization to analyze gene expression data 
Objective
With the dramatic increase in microarray data, biclustering has become a promising tool for gene expression analysis. Biclustering has been proven to be superior to clustering in identifying multifunctional genes and searching for co-expressed genes under a few specific conditions; that is, a subgroup of all conditions. Biclustering based on a genetic algorithm (GA) has shown better performance than greedy algorithms, but the overlap state for biclusters must be treated more systematically.
Results
We developed a new biclustering algorithm (binary-iterative genetic algorithm [BIGA]), based on an iterative GA, by introducing a novel, ternary-digit chromosome encoding function. BIGA searches for a set of biclusters by iterative binary divisions that allow the overlap state to be explicitly considered. In addition, the average Pearson’s correlation coefficient was employed to measure the relationship of genes within a bicluster, instead of the mean square residual, the popular classical index. Compared with six existing algorithms, BIGA found highly correlated biclusters, with large gene coverage and reasonable gene overlap. The gene ontology (GO) enrichment showed that most of the biclusters are significant, with at least one GO term overrepresented.
Conclusion
BIGA is a powerful tool to analyze large amounts of gene expression data, and will facilitate the elucidation of the underlying functional mechanisms in living organisms.
doi:10.2147/AABC.S32622
PMCID: PMC3459542  PMID: 23055751
biclustering; microarray data; genetic algorithm; Pearson’s correlation coefficient
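The bicluster quality index mentioned in this abstract, the average Pearson's correlation coefficient among genes, can be sketched as below. This is an illustrative reading only: it assumes the average is taken over all gene pairs restricted to the bicluster's conditions, and it uses signed correlations. Whether BIGA uses signed or absolute values is not stated in the abstract, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_pearson(expression, genes, conditions):
    """Mean pairwise Pearson correlation of the selected genes,
    computed only over the selected conditions (the bicluster).

    expression: list of rows (genes), each a list of values (conditions).
    genes, conditions: index lists defining the bicluster; at least two
    genes are required.
    """
    rows = [[expression[g][c] for c in conditions] for g in genes]
    pairs = list(combinations(rows, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy matrix: genes 0 and 1 are perfectly co-expressed under
    # conditions 0-2 but not under condition 3.
    toy = [
        [1.0, 2.0, 3.0, 9.0],
        [2.0, 4.0, 6.0, 0.0],
        [3.0, 2.0, 1.0, 5.0],
    ]
    print(average_pearson(toy, genes=[0, 1], conditions=[0, 1, 2]))
```

Restricting the correlation to a subset of conditions is what distinguishes this index from an ordinary clustering score computed over all conditions.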
6.  PSCDB: a database for protein structural change upon ligand binding 
Nucleic Acids Research  2011;40(Database issue):D554-D558.
Proteins are flexible molecules that undergo structural changes to function. The Protein Data Bank contains multiple entries for identical proteins determined under different conditions, e.g. with and without a ligand molecule, which provides important information for understanding the structural changes related to protein functions. We gathered 839 protein structural pairs of ligand-free and ligand-bound states from monomeric or homo-dimeric proteins, and constructed the Protein Structural Change DataBase (PSCDB). In the database, we focused on whether the motions were coupled with ligand binding. As a result, the protein structural changes were classified into seven classes, i.e. coupled domain motion (59 structural changes), independent domain motion (70), coupled local motion (125), independent local motion (135), burying ligand motion (104), no significant motion (311) and other type motion (35). PSCDB provides lists of each class. On each entry page, users can view detailed information about the motion, accompanied by a morphing animation of the structural changes. PSCDB is available at http://idp1.force.cs.is.nagoya-u.ac.jp/pscdb/.
doi:10.1093/nar/gkr966
PMCID: PMC3245091  PMID: 22080505
7.  IDEAL: Intrinsically Disordered proteins with Extensive Annotations and Literature 
Nucleic Acids Research  2011;40(Database issue):D507-D511.
IDEAL, Intrinsically Disordered proteins with Extensive Annotations and Literature (http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/), is a collection of knowledge on experimentally verified intrinsically disordered proteins. IDEAL contains manual annotations by curators on intrinsically disordered regions, interaction regions to other molecules, post-translational modification sites, references and structural domain assignments. In particular, IDEAL explicitly describes protean segments that can be transformed from a disordered state to an ordered state. Since in most cases they can act as molecular recognition elements upon binding of partner proteins, IDEAL provides a data resource for functional regions of intrinsically disordered proteins. The information in IDEAL is provided on a user-friendly graphical view and in a computer-friendly XML format.
doi:10.1093/nar/gkr884
PMCID: PMC3245138  PMID: 22067451
8.  SAHG, a comprehensive database of predicted structures of all human proteins 
Nucleic Acids Research  2010;39(Database issue):D487-D493.
Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), global–local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.
doi:10.1093/nar/gkq1057
PMCID: PMC3013665  PMID: 21051360
9.  Two Distinct Mechanisms for Actin Capping Protein Regulation—Steric and Allosteric Inhibition 
PLoS Biology  2010;8(7):e1000416.
A crystallographic study reveals the structural basis for regulation by two different inhibitors of the actin capping protein, a critical factor controlling actin-driven cell motility.
The actin capping protein (CP) tightly binds to the barbed end of actin filaments, thus playing a key role in actin-based lamellipodial dynamics. V-1 and CARMIL proteins directly bind to CP and inhibit the filament capping activity of CP. V-1 completely inhibits CP from interacting with the barbed end, whereas CARMIL proteins act on the barbed end-bound CP and facilitate its dissociation from the filament (called uncapping activity). Previous studies have revealed the striking functional differences between the two regulators. However, the molecular mechanisms describing how these proteins inhibit CP remain poorly understood. Here we present the crystal structures of CP complexed with V-1 and with peptides derived from the CP-binding motif of CARMIL proteins (CARMIL, CD2AP, and CKIP-1). V-1 directly interacts with the primary actin binding surface of CP, the C-terminal region of the α-subunit. Unexpectedly, the structures clearly revealed the conformational flexibility of CP, which can be attributed to a twisting movement between the two domains. CARMIL peptides in an extended conformation interact simultaneously with the two CP domains. In contrast to V-1, the peptides do not directly compete with the barbed end for the binding surface on CP. Biochemical assays revealed that the peptides suppress the interaction between CP and V-1, despite the two inhibitors not competing for the same binding site on CP. Furthermore, a computational analysis using the elastic network model indicates that the interaction of the peptides alters the intrinsic fluctuations of CP. Our results demonstrate that V-1 completely sequesters CP from the barbed end by simple steric hindrance. By contrast, CARMIL proteins allosterically inhibit CP, which appears to be a prerequisite for the uncapping activity. Our data suggest that CARMIL proteins down-regulate CP by affecting its conformational dynamics. This conceptually new mechanism of CP inhibition provides a structural basis for the regulation of the barbed end elongation in cells.
Author Summary
Actin is a ubiquitous eukaryotic protein that polymerizes into bidirectional filaments and plays essential roles in a variety of biological processes, including cell division, muscle contraction, neuronal development, and cell motility. The actin capping protein (CP) tightly binds to the fast-growing end of the filament (the barbed end) to block monomer association and dissociation at this end, thus acting as an important regulator of actin filament dynamics in cells. Using X-ray crystallography, we present the atomic structures of CP in complex with fragments of two inhibitory proteins, V-1 and CARMIL, to compare the modes of action of these two regulators. The structures demonstrate that V-1 directly blocks the actin-binding site of CP, thereby preventing filament capping, whereas CARMIL functions in a very different manner. Detailed comparison of several CP structures revealed that CP has two stable domains that are continuously twisting relative to each other. CARMIL peptides were found to bind across the two domains of CP on a surface distinct from its actin binding sites. We propose that CARMIL peptides attenuate the binding of CP to actin filaments by suppressing the twisting movement required for tight barbed end capping. Our comparative structural studies therefore have revealed substantial insights in the variety of mechanisms by which different actin regulatory factors function.
doi:10.1371/journal.pbio.1000416
PMCID: PMC2897767  PMID: 20625546
10.  An evaluation of minimal cellular functions to sustain a bacterial cell 
BMC Systems Biology  2009;3:111.
Background
Both computational and experimental approaches have been used to determine the minimal gene set required to sustain a bacterial cell. Such studies have provided clues to the minimal cellular-function set needed for life. We evaluate a minimal cellular-function set directly, instead of a gene set.
Results
We estimated the essentialities of KEGG pathway maps as the entities of cellular functions, based on comparative genomics and metabolic network analyses. The former examined the evolutionary conservation of each pathway map by homology searches, and detected "conserved pathway maps". The latter identified "organism-specific pathway maps" that supply compounds required for the conserved pathway maps. We defined both pathway maps as "autonomous pathway maps". Among the set of autonomous pathway maps, the one that could synthesize all of the biomass components (the essential constituents for the cellular component of Escherichia coli/Bacillus subtilis), and that was composed of a minimal number of pathway maps, was determined for each of E. coli and B. subtilis, as "minimal pathway maps". We consider that they correspond to a minimal cellular-function set. The network of minimal pathway maps, composed of 20 conserved pathway maps and 21 organism-specific pathway maps for E. coli, starts a sequence of catabolic processes from carbohydrate metabolism. The catabolized compounds are used for anabolism, thus creating materials for cell components and for genetic information processing.
Conclusion
Our analyses of these pathway maps revealed that those functioning in "genetic information processing" are likely to be conserved, but those for catabolism are not, reflecting an evolutionary aspect of cellular functions. Minimal pathway maps were compared with a systematic gene knockout experiment, other computational results and parasitic genomes, and showed qualitative agreement, with some reasonable exceptions due to the experimental conditions or differences of computational methods. Our method provides an alternative way to explore the minimal cellular function set.
doi:10.1186/1752-0509-3-111
PMCID: PMC2789071  PMID: 19943949
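The selection of "minimal pathway maps" described in this abstract resembles a smallest-cover search: choose the fewest pathway maps that together produce every required biomass component. The sketch below is a simplified illustration under that assumption. It ignores the metabolic network dependencies the authors analyse, uses exhaustive search (practical only for small inputs), and all names and toy data are hypothetical.

```python
from itertools import combinations


def minimal_cover(pathway_products, required):
    """Return a smallest set of pathway names whose combined products
    include every required component, or None if no such set exists.

    pathway_products: dict mapping a pathway name to the set of
    components it can synthesize (toy stand-in for a pathway map).
    required: set of components that must all be produced.
    """
    names = sorted(pathway_products)
    # Try subsets in order of increasing size so the first hit is minimal.
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            produced = set()
            for name in subset:
                produced |= pathway_products[name]
            if required <= produced:
                return set(subset)
    return None


if __name__ == "__main__":
    # Hypothetical pathways and components, for illustration only.
    toy_pathways = {
        "A": {"x", "y"},
        "B": {"y"},
        "C": {"z"},
        "D": {"x", "z"},
    }
    print(minimal_cover(toy_pathways, {"x", "y", "z"}))
```

Several covers of the same size may exist; this sketch returns the first one found in alphabetical order, which is an arbitrary tie-breaking choice.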
11.  An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories 
BMC Bioinformatics  2008;9:344.
Background
Understanding how proteins fold is essential to our quest in discovering how life works at the molecular level. Current computation power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to be able to interpret and identify novel folding features from them.
Results
In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high dimensional curves coming from folding trajectories. A detailed case study on miniprotein Trp-cage [1] demonstrates that our algorithm can detect similarities at rather low level, and extract biologically meaningful folding events.
Conclusion
The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For users' convenience, we provide a web server for the algorithm at .
doi:10.1186/1471-2105-9-344
PMCID: PMC2571979  PMID: 18710565
12.  GTOP: a database of protein structures predicted from genome sequences 
Nucleic Acids Research  2002;30(1):294-298.
Large-scale genome projects generate an unprecedented number of protein sequences, most of which are experimentally uncharacterized. Predicting the 3D structures of sequences provides important clues as to their functions. We constructed the Genomes TO Protein structures and functions (GTOP) database, containing protein fold predictions of a huge number of sequences. Predictions are mainly carried out with the homology search program PSI-BLAST, currently the most popular among high-sensitivity profile search methods. GTOP also includes the results of other analyses, e.g. homology and motif search, detection of transmembrane helices and repetitive sequences. We have completed analyzing the sequences of 41 organisms, with the number of proteins exceeding 120 000 in total. GTOP uses a graphical viewer to present the analytical results of each ORF in one page in a ‘color-bar’ format. The assigned 3D structures are presented with the Chime plug-in or RasMol. The binding sites of ligands are also included, providing functional information. The GTOP server is available at http://spock.genes.nig.ac.jp/~genome/gtop.html.
PMCID: PMC99104  PMID: 11752318
13.  DNA Data Bank of Japan (DDBJ) in collaboration with mass sequencing teams 
Nucleic Acids Research  2000;28(1):24-26.
We at DDBJ (http://www.ddbj.nig.ac.jp) process and publicise the massive amounts of data submitted mainly by Japanese genome projects and sequencing teams. It is emphasised that the collaboration between data producing teams and the data bank is crucial in carrying out these processes smoothly. The amount of data submitted in 1999 is so large that it alone exceeds the total amount submitted in the preceding 10 years. To cope with this situation, we have developed tools not only for processing such massive amounts of data but also for efficiently retrieving data on demand.
PMCID: PMC102400  PMID: 10592172
14.  Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones 
Imanishi, Tadashi | Itoh, Takeshi | Suzuki, Yutaka | O'Donovan, Claire | Fukuchi, Satoshi | Koyanagi, Kanako O | Barrero, Roberto A | Tamura, Takuro | Yamaguchi-Kabata, Yumi | Tanino, Motohiko | Yura, Kei | Miyazaki, Satoru | Ikeo, Kazuho | Homma, Keiichi | Kasprzyk, Arek | Nishikawa, Tetsuo | Hirakawa, Mika | Thierry-Mieg, Jean | Thierry-Mieg, Danielle | Ashurst, Jennifer | Jia, Libin | Nakao, Mitsuteru | Thomas, Michael A | Mulder, Nicola | Karavidopoulou, Youla | Jin, Lihua | Kim, Sangsoo | Yasuda, Tomohiro | Lenhard, Boris | Eveno, Eric | Suzuki, Yoshiyuki | Yamasaki, Chisato | Takeda, Jun-ichi | Gough, Craig | Hilton, Phillip | Fujii, Yasuyuki | Sakai, Hiroaki | Tanaka, Susumu | Amid, Clara | Bellgard, Matthew | de Fatima Bonaldo, Maria | Bono, Hidemasa | Bromberg, Susan K | Brookes, Anthony J | Bruford, Elspeth | Carninci, Piero | Chelala, Claude | Couillault, Christine | de Souza, Sandro J. | Debily, Marie-Anne | Devignes, Marie-Dominique | Dubchak, Inna | Endo, Toshinori | Estreicher, Anne | Eyras, Eduardo | Fukami-Kobayashi, Kaoru | R. Gopinath, Gopal | Graudens, Esther | Hahn, Yoonsoo | Han, Michael | Han, Ze-Guang | Hanada, Kousuke | Hanaoka, Hideki | Harada, Erimi | Hashimoto, Katsuyuki | Hinz, Ursula | Hirai, Momoki | Hishiki, Teruyoshi | Hopkinson, Ian | Imbeaud, Sandrine | Inoko, Hidetoshi | Kanapin, Alexander | Kaneko, Yayoi | Kasukawa, Takeya | Kelso, Janet | Kersey, Paul | Kikuno, Reiko | Kimura, Kouichi | Korn, Bernhard | Kuryshev, Vladimir | Makalowska, Izabela | Makino, Takashi | Mano, Shuhei | Mariage-Samson, Regine | Mashima, Jun | Matsuda, Hideo | Mewes, Hans-Werner | Minoshima, Shinsei | Nagai, Keiichi | Nagasaki, Hideki | Nagata, Naoki | Nigam, Rajni | Ogasawara, Osamu | Ohara, Osamu | Ohtsubo, Masafumi | Okada, Norihiro | Okido, Toshihisa | Oota, Satoshi | Ota, Motonori | Ota, Toshio | Otsuki, Tetsuji | Piatier-Tonneau, Dominique | Poustka, Annemarie | Ren, Shuang-Xi | Saitou, Naruya | Sakai, Katsunaga | Sakamoto, Shigetaka | Sakate, Ryuichi | Schupp, Ingo | Servant, Florence | Sherry, Stephen | Shiba, Rie | Shimizu, Nobuyoshi | Shimoyama, Mary | Simpson, Andrew J | Soares, Bento | Steward, Charles | Suwa, Makiko | Suzuki, Mami | Takahashi, Aiko | Tamiya, Gen | Tanaka, Hiroshi | Taylor, Todd | Terwilliger, Joseph D | Unneberg, Per | Veeramachaneni, Vamsi | Watanabe, Shinya | Wilming, Laurens | Yasuda, Norikazu | Yoo, Hyang-Sook | Stodolsky, Marvin | Makalowski, Wojciech | Go, Mitiko | Nakai, Kenta | Takagi, Toshihisa | Kanehisa, Minoru | Sakaki, Yoshiyuki | Quackenbush, John | Okazaki, Yasushi | Hayashizaki, Yoshihide | Hide, Winston | Chakraborty, Ranajit | Nishikawa, Ken | Sugawara, Hideaki | Tateno, Yoshio | Chen, Zhu | Oishi, Michio | Tonellato, Peter | Apweiler, Rolf | Okubo, Kousaku | Wagner, Lukas | Wiemann, Stefan | Strausberg, Robert L | Isogai, Takao | Auffray, Charles | Nomura, Nobuo | Gojobori, Takashi | Sugano, Sumio
PLoS Biology  2004;2(6):e162.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.
doi:10.1371/journal.pbio.0020162
PMCID: PMC393292  PMID: 15103394
