Results 1-5 (5)

1.  RefSeq: an update on mammalian reference sequences 
Nucleic Acids Research  2013;42(Database issue):D756-D763.
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration. We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.
PMCID: PMC3965018  PMID: 24259432
2.  Current status and new features of the Consensus Coding Sequence database  
Nucleic Acids Research  2013;42(Database issue):D865-D872.
The Consensus Coding Sequence (CCDS) project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
PMCID: PMC3965069  PMID: 24217909
3.  Tracking and coordinating an international curation effort for the CCDS Project 
The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines.
Database URL:
PMCID: PMC3308164  PMID: 22434842
4.  A Complex Chromatin Landscape Revealed by Patterns of Nuclease Sensitivity and Histone Modification within the Mouse β-Globin Locus 
Molecular and Cellular Biology  2003;23(15):5234-5244.
In order to create an extended map of chromatin features within a mammalian multigene locus, we have determined the extent of nuclease sensitivity and the pattern of histone modifications associated with the mouse β-globin genes in adult erythroid tissue. We show that the nuclease-sensitive domain encompasses the β-globin genes along with several flanking olfactory receptor genes that are inactive in erythroid cells. We describe enhancer-blocking or boundary elements on either side of the locus that are bound in vivo by the transcription factor CTCF, but we found that they do not coincide with transitions in nuclease sensitivity flanking the locus or with patterns of histone modifications within it. In addition, histone hyperacetylation and dimethylation of histone H3 K4 are not uniform features of the nuclease-sensitive mouse β-globin domain but rather define distinct subdomains within it. Our results reveal a complex chromatin landscape for the active β-globin locus and illustrate the complexity of broad structural changes that accompany gene activation.
PMCID: PMC165715  PMID: 12861010
5.  Conserved CTCF Insulator Elements Flank the Mouse and Human β-Globin Loci 
Molecular and Cellular Biology  2002;22(11):3820-3831.
A binding site for the transcription factor CTCF is responsible for enhancer-blocking activity in a variety of vertebrate insulators, including the insulators at the 5′ and 3′ chromatin boundaries of the chicken β-globin locus. To date, no functional domain boundaries have been defined at mammalian β-globin loci, which are embedded within arrays of functional olfactory receptor genes. In an attempt to define boundary elements that could separate these gene clusters, CTCF-binding sites were searched for at the most distal DNase I-hypersensitive sites (HSs) of the mouse and human β-globin loci. Conserved CTCF sites were found at 5′HS5 and 3′HS1 of both loci. All of these sites could bind to CTCF in vitro. The sites also functioned as insulators in enhancer-blocking assays at levels correlating with CTCF-binding affinity, although enhancer-blocking activity was weak with the mouse 5′HS5 site. These results show that with respect to enhancer-blocking elements, the architecture of the mouse and human β-globin loci is similar to that found previously for the chicken β-globin locus. Unlike the chicken locus, the mouse and human β-globin loci do not have nearby transitions in chromatin structure but the data suggest that 3′HS1 and 5′HS5 may function as insulators that prevent inappropriate interactions between β-globin regulatory elements and those of neighboring domains or subdomains, many of which possess strong enhancers.
PMCID: PMC133827  PMID: 11997516
