Results 1-19 (19)

1.  Sequencing strategies and characterization of 721 vervet monkey genomes for future genetic analyses of medically relevant traits 
BMC Biology  2015;13:41.
We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available.
We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices.
The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0152-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4494155  PMID: 26092298
Vervet; Non-human primate; Whole genome sequencing; SNP; Linkage; Association
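The abstract above describes pruning SNPs by the degree of linkage disequilibrium between them. As a rough illustration of what such a step can look like, here is a minimal sketch of greedy LD pruning on genotype vectors. It is not the authors' pipeline: the function names, the r^2 threshold of 0.5, the window of 50 SNPs, and the toy genotypes are all assumptions made for the example.

```python
from typing import List, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic site: correlation undefined, treated here as unlinked
    return cov * cov / (var_x * var_y)


def ld_prune(genotypes: List[Sequence[int]],
             threshold: float = 0.5,   # hypothetical cutoff, not taken from the paper
             window: int = 50) -> List[int]:
    """Keep a SNP only if its r^2 with each of the last `window` kept SNPs is below `threshold`."""
    kept: List[int] = []
    for i, g in enumerate(genotypes):
        recent = kept[-window:]
        if all(r_squared(g, genotypes[j]) < threshold for j in recent):
            kept.append(i)
    return kept


# Toy data: four SNPs genotyped in six animals (illustrative values only).
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # identical to SNP 0, so r^2 = 1 and it is pruned
    [2, 0, 1, 0, 2, 1],  # r^2 = 0.25 with SNP 0, so it is kept
    [1, 1, 0, 2, 2, 0],  # r^2 = 0.5625 with SNP 0, so it is pruned
]
print(ld_prune(snps))  # → [0, 2]
```

A stricter threshold yields a sparser panel, which is consistent with the abstract's linkage panel being smaller than its association panel, although the cutoffs actually used are not stated in the abstract.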
2.  Genomic Comparison of Non-Typhoidal Salmonella enterica Serovars Typhimurium, Enteritidis, Heidelberg, Hadar and Kentucky Isolates from Broiler Chickens 
PLoS ONE  2015;10(6):e0128773.
Non-typhoidal Salmonella enterica serovars, associated with different foods including poultry products, are important causes of bacterial gastroenteritis worldwide. The colonization of the chicken gut by S. enterica could result in the contamination of the environment and food chain. The aim of this study was to compare the genomes of 25 S. enterica serovars isolated from broiler chicken farms to assess their intra- and inter-genetic variability, with a focus on virulence and antibiotic resistance characteristics.
Methodology/Principal Finding
The genomes of 25 S. enterica isolates covering five serovars (ten Typhimurium including three monophasic 4,[5],12:i:-, four Enteritidis, three Hadar, four Heidelberg and four Kentucky) were sequenced. Most serovars were clustered in strongly supported phylogenetic clades, except for isolates of serovar Enteritidis that were scattered throughout the tree. Plasmids of varying sizes were detected in several isolates independently of serovars. Genes associated with the IncF plasmid and the IncI1 plasmid were identified in twelve and four isolates, respectively, while genes associated with the IncQ plasmid were found in one isolate. The presence of numerous genes associated with Salmonella pathogenicity islands (SPIs) was also confirmed. Components of the type III and IV secretion systems (T3SS and T4SS) varied in different isolates, which could explain, in part, differences in their pathogenicity in humans and/or persistence in broilers. Conserved clusters of genes in the T3SS were detected that could be used in designing effective strategies (diagnostic, vaccination or treatments) to combat Salmonella. Antibiotic resistance genes (CMY, aadA, ampC, florR, sul1, sulI, tetAB, and srtA) and class I integrons were detected in resistant isolates while all isolates carried multidrug efflux pump systems regardless of their antibiotic susceptibility profile.
This study showed that the predominant Salmonella serovars in broiler chickens harbor genes encoding adhesins, flagellar proteins, T3SS, iron acquisition systems, and antibiotic and metal resistance genes that may explain their pathogenicity, colonization ability and persistence in chicken. The existence of mobile genetic elements indicates that isolates from a given serovar could acquire and transfer genetic material. Conserved genes in the T3SS and T4SS that we have identified are promising candidates for identification of diagnostic, antimicrobial or vaccine targets for the control of Salmonella in broiler chickens.
PMCID: PMC4470630  PMID: 26083489
3.  Complete Genome Sequence of Streptococcus thermophilus SMQ-301, a Model Strain for Phage-Host Interactions 
Genome Announcements  2015;3(3):e00480-15.
Streptococcus thermophilus is used by the dairy industry to manufacture yogurt and several cheeses. Using PacBio and Illumina platforms, we sequenced the genome of S. thermophilus SMQ-301, the host of several virulent phages. The genome is composed of 1,861,792 bp and contains 2,037 genes, 67 tRNAs, and 18 rRNAs.
PMCID: PMC4440953  PMID: 25999573
4.  Excretion of Host DNA in Feces Is Associated with Risk of Clostridium difficile Infection 
Journal of Immunology Research  2015;2015:246203.
Clostridium difficile infection (CDI) is intricately linked to the health of the gastrointestinal tract and its indigenous microbiota. In this study, we assessed whether fecal excretion of host DNA is associated with CDI development. Assuming that shedding of epithelial cells increases in the inflamed intestine, we used human DNA excretion as a marker of intestinal insult. Whole-genome shotgun sequencing was employed to quantify host DNA excretion and evaluate bacterial content in fecal samples collected from patients with incipient CDI, hospitalized controls, and healthy subjects. Human DNA excretion was significantly increased in patients admitted to the hospital for a gastrointestinal ailment, as well as prior to an episode of CDI. In multivariable analyses, human read abundance was independently associated with CDI development. Host DNA proportions were negatively correlated with intestinal microbiota diversity. Enterococcus and Escherichia were enriched in patients excreting high quantities of human DNA, while Ruminococcus and Odoribacter were depleted. These findings suggest that intestinal inflammation can occur prior to CDI development and may influence patient susceptibility to CDI. The quantification of human DNA in feces could serve as a simple and noninvasive approach to assess bowel inflammation and identify patients at risk of CDI.
PMCID: PMC4451987  PMID: 26090486
5.  Population Structure and Antimicrobial Resistance of Invasive Serotype IV Group B Streptococcus, Toronto, Ontario, Canada 
Emerging Infectious Diseases  2015;21(4):585-591.
Conjugate vaccines should include polysaccharide or virulence proteins of this serotype to provide complete protection.
We recently showed that 37/600 (6.2%) invasive infections with group B Streptococcus (GBS) in Toronto, Ontario, Canada, were caused by serotype IV strains. We report a relatively high level of genetic diversity in 37 invasive strains of this emerging GBS serotype. Multilocus sequence typing identified 6 sequence types (STs) that belonged to 3 clonal complexes. Most isolates were ST-459 (19/37, 51%) and ST-452 (11/37, 30%), but we also identified ST-291, ST-3, ST-196, and a novel ST-682. We detected further diversity by performing whole-genome single-nucleotide polymorphism analysis and found evidence of recombination events contributing to variation in some serotype IV GBS strains. We also evaluated antimicrobial drug resistance and found that ST-459 strains were resistant to clindamycin and erythromycin, whereas strains of other STs were, for the most part, susceptible to these antimicrobial drugs.
PMCID: PMC4378482  PMID: 25811284
bacterial infection; invasive bacterial disease; group B Streptococcus; streptococci; Streptococcus agalactiae; bacteria; serotype IV; multilocus sequence typing; whole-genome sequencing; antimicrobial resistance; population structure; Toronto; Canada
6.  First Complete Genome Sequence of Staphylococcus xylosus, a Meat Starter Culture and a Host to Propagate Staphylococcus aureus Phages 
Genome Announcements  2014;2(4):e00671-14.
Staphylococcus xylosus is a bacterial species used in meat fermentation and a commensal microorganism found on animals. We present the first complete circular genome from this species. The genome is composed of 2,757,557 bp, with a G+C content of 32.9%, and contains 2,514 genes and 79 structural RNAs.
PMCID: PMC4110768  PMID: 25013142
7.  Systems Biology of the Vervet Monkey 
ILAR Journal  2013;54(2):122-143.
Nonhuman primates (NHP) provide crucial biomedical model systems intermediate between rodents and humans. The vervet monkey (also called the African green monkey) is a widely used NHP model that has unique value for genetic and genomic investigations of traits relevant to human diseases. This article describes the phylogeny and population history of the vervet monkey and summarizes the use of both captive and wild vervet monkeys in biomedical research. It also discusses the effort of an international collaboration to develop the vervet monkey as the most comprehensively phenotypically and genomically characterized NHP, a process that will enable the scientific community to employ this model for systems biology investigations.
PMCID: PMC3814400  PMID: 24174437
African green monkey; genetics; genomics; phenomics; simian immunodeficiency virus [SIV]; systems biology; transcriptomics; vervet
8.  The MAPT H1 haplotype is associated with tangle-predominant dementia 
Acta neuropathologica  2012;124(5):693-704.
Tangle-predominant dementia (TPD) patients exhibit cognitive decline that is clinically similar to early to moderate-stage Alzheimer disease (AD), yet autopsy reveals neurofibrillary tangles in the medial temporal lobe composed of the microtubule-associated protein tau without significant amyloid-beta (Aβ)-positive plaques. We performed a series of neuropathological, biochemical and genetic studies using autopsy brain tissue drawn from a cohort of 34 TPD, 50 AD and 56 control subjects to identify molecular and genetic signatures of this entity. Biochemical analysis demonstrates a similar tau protein isoform composition in TPD and AD, which is compatible with previous histological and ultrastructural studies. Further, biochemical analysis fails to uncover elevation of soluble Aβ in TPD frontal cortex and hippocampus compared to control subjects, demonstrating that non-plaque-associated Aβ is not a contributing factor. Unexpectedly, we also observed high levels of secretory amyloid precursor protein α (sAPPα) in the frontal cortex of some TPD patients compared to AD and control subjects, suggesting differences in APP processing. Finally, we tested whether TPD is associated with changes in the tau gene (MAPT). Haplotype analysis demonstrates a strong association between TPD and the MAPT H1 haplotype, a genomic inversion associated with some tauopathies and Parkinson disease (PD), when compared to age-matched control subjects with mild degenerative changes, i.e., successful cerebral aging. Next-generation resequencing of MAPT followed by association analysis shows an association between TPD and two polymorphisms in the MAPT 3′ untranslated region (UTR). These results support the hypothesis that haplotype-specific variation in the MAPT 3′ UTR underlies an Aβ-independent mechanism for neurodegeneration in TPD.
PMCID: PMC3608475  PMID: 22802095
Dementia; Neurofibrillary tangle; Tau; Amyloid; MAPT; 3′ Untranslated region; Aging; Alzheimer’s disease; sAPPα
9.  A non-human primate system for large-scale genetic studies of complex traits 
Human Molecular Genetics  2012;21(15):3307-3316.
Non-human primates provide genetic model systems biologically intermediate between humans and other mammalian model organisms. Populations of Caribbean vervet monkeys (Chlorocebus aethiops sabaeus) are genetically homogeneous and large enough to permit well-powered genetic mapping studies of quantitative traits relevant to human health, including expression quantitative trait loci (eQTL). Previous transcriptome-wide investigation in an extended vervet pedigree identified 29 heritable transcripts for which levels of expression in peripheral blood correlate strongly with expression levels in the brain. Quantitative trait linkage analysis using 261 microsatellite markers identified significant (n = 8) and suggestive (n = 4) linkages for 12 of these transcripts, including both cis- and trans-eQTL. Seven transcripts, located on different chromosomes, showed maximum linkage to markers in a single region of vervet chromosome 9; this observation suggests the possibility of a master trans-regulator locus in this region. For one cis-eQTL (at B3GALTL, beta-1,3-glucosyltransferase), we conducted follow-up single nucleotide polymorphism genotyping and fine-scale association analysis in a sample of unrelated Caribbean vervets, localizing this eQTL to a region of <200 kb. These results suggest the value of pedigree and population samples of the Caribbean vervet for linkage and association mapping studies of quantitative traits. The imminent whole genome sequencing of many of these vervet samples will enhance the power of such investigations by providing a comprehensive catalog of genetic variation.
PMCID: PMC3392106  PMID: 22556363
10.  Reductions in intestinal Clostridiales precede the development of nosocomial Clostridium difficile infection 
Microbiome  2013;1:18.
Antimicrobial use is thought to suppress the intestinal microbiota, thereby impairing colonization resistance and allowing Clostridium difficile to infect the gut. Additional risk factors such as proton-pump inhibitors may also alter the intestinal microbiota and predispose patients to Clostridium difficile infection (CDI). This comparative metagenomic study investigates the relationship between epidemiologic exposures, intestinal bacterial populations and subsequent development of CDI in hospitalized patients. We performed a nested case–control study including 25 CDI cases and 25 matched controls. Fecal specimens collected prior to disease onset were evaluated by 16S rRNA gene amplification and pyrosequencing to determine the composition of the intestinal microbiota during the at-risk period.
The diversity of the intestinal microbiota was significantly reduced prior to an episode of CDI. Sequences corresponding to the phylum Bacteroidetes and to the families Bacteroidaceae and Clostridiales Incertae Sedis XI were depleted in CDI patients compared to controls, whereas sequences corresponding to the family Enterococcaceae were enriched. In multivariable analyses, cephalosporin and fluoroquinolone use, as well as a decrease in the abundance of Clostridiales Incertae Sedis XI were significantly and independently associated with CDI development.
This study shows that a reduction in the abundance of a specific bacterial family - Clostridiales Incertae Sedis XI - is associated with risk of nosocomial CDI and may represent a target for novel strategies to prevent this life-threatening infection.
PMCID: PMC3971611  PMID: 24450844
Intestinal microbiota; Clostridium difficile infection; 16S rRNA gene sequencing; Clostridiales Incertae Sedis XI
11.  Sequencing of the Dutch Elm Disease Fungus Genome Using the Roche/454 GS-FLX Titanium System in a Comparison of Multiple Genomics Core Facilities 
As part of the DNA Sequencing Research Group of the Association of Biomolecular Resource Facilities, we have tested the reproducibility of the Roche/454 GS-FLX Titanium System at five core facilities. Experience with the Roche/454 system ranged from <10 to >340 sequencing runs performed. All participating sites were supplied with an aliquot of a common DNA preparation and were requested to conduct sequencing at a common loading condition. Sequencing yield and accuracy metrics were evaluated at a single site. The study was conducted using a laboratory strain of the Dutch elm disease fungus Ophiostoma novo-ulmi strain H327, an ascomycete, vegetatively haploid fungus with an estimated genome size of 30–50 Mb. We show that the Titanium System is reproducible, with some variation detected in loading conditions, sequencing yield, and homopolymer length accuracy. We demonstrate that reads shorter than the theoretical minimum length are of lower overall quality and not simply truncated reads. The O. novo-ulmi H327 genome assembly is 31.8 Mb and comprises eight chromosome-length linear scaffolds, a circular mitochondrial contig of 66.4 kb, and a putative 4.2-kb linear plasmid. We estimate that the nuclear genome encodes 8613 protein coding genes, and the mitochondrion encodes 15 genes and 26 tRNAs.
PMCID: PMC3526337  PMID: 23542132
massively parallel DNA sequencing; fungal genomics; Ophiostoma novo-ulmi
13.  A founder mutation in the PEX6 gene is responsible for increased incidence of Zellweger syndrome in a French Canadian population 
BMC Medical Genetics  2012;13:72.
Zellweger syndrome (ZS) is a peroxisome biogenesis disorder due to mutations in any one of 13 PEX genes. Increased incidence of ZS has been suspected in French-Canadians of the Saguenay-Lac-St-Jean region (SLSJ) of Quebec, but this remains unsolved.
We identified 5 ZS patients from SLSJ diagnosed by peroxisome dysfunction between 1990 and 2010 and sequenced all coding exons of known PEX genes in one patient using Next Generation Sequencing (NGS) for diagnostic confirmation.
A homozygous mutation (c.802_815del, p.[Val207_Gln294del, Val76_Gln294del]) in PEX6 was identified and then shown in 4 other patients. Parental heterozygosity was confirmed in all. Incidence of ZS was estimated at 1 in 12,191 live births, with a carrier frequency of 1 in 55. In addition, we present data suggesting that this mutation abolishes a SF2/ASF splice enhancer binding site, resulting in the use of two alternative cryptic donor splice sites and predicted to encode an internally deleted in-frame protein.
We report increased incidence of ZS in French-Canadians of SLSJ caused by a PEX6 founder mutation. To our knowledge, this is the highest reported incidence of ZS worldwide. These findings have implications for carrier screening and support the utility of NGS for molecular confirmation of peroxisomal disorders.
PMCID: PMC3483250  PMID: 22894767
Zellweger syndrome; Founder effect; Peroxisome biogenesis disorders; Next generation sequencing
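The incidence and carrier frequency quoted in the abstract above are related, for an autosomal recessive disorder, by the Hardy-Weinberg relation: incidence is q^2 and carrier frequency is 2pq. The abstract does not say how the carrier frequency was derived, so the following is only a consistency check under the assumption of Hardy-Weinberg equilibrium; the function name is hypothetical.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Carrier frequency 2pq for a recessive disorder, assuming Hardy-Weinberg equilibrium."""
    q = math.sqrt(incidence)  # frequency of the disease allele (incidence = q^2)
    p = 1.0 - q               # frequency of the normal allele
    return 2.0 * p * q


incidence = 1 / 12191         # reported incidence of ZS in live births
carriers = carrier_frequency(incidence)
print(f"carrier frequency is about 1 in {1 / carriers:.1f}")
```

Under this assumption the result comes out a little under 1 in 56, in line with the reported carrier frequency of 1 in 55.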
14.  Fourteen-Genome Comparison Identifies DNA Markers for Severe-Disease-Associated Strains of Clostridium difficile 
Journal of Clinical Microbiology  2011;49(6):2230-2238.
Clostridium difficile is a common cause of infectious diarrhea in hospitalized patients. A severe and increased incidence of C. difficile infection (CDI) is associated predominantly with the NAP1 strain; however, the existence of other severe-disease-associated (SDA) strains and the extensive genetic diversity across C. difficile complicate reliable detection and diagnosis. Comparative genome analysis of 14 sequenced genomes, including those of a subset of NAP1 isolates, allowed the assessment of genetic diversity within and between strain types to identify DNA markers that are associated with severe disease. Comparative genome analysis of 14 isolates, including five publicly available strains, revealed that C. difficile has a core genome of 3.4 Mb, comprising ∼3,000 genes. Analysis of the core genome identified candidate DNA markers that were subsequently evaluated using a multistrain panel of 177 isolates, representing more than 50 pulsovars and 8 toxinotypes. A subset of 117 isolates from the panel had associated patient data that allowed assessment of an association between the DNA markers and severe CDI. We identified 20 candidate DNA markers for species-wide detection and 10,683 single nucleotide polymorphisms (SNPs) associated with the predominant SDA strain (NAP1). A species-wide detection candidate marker, the sspA gene, was found to be the same across 177 sequenced isolates and lacked significant similarity to those of other species. Candidate SNPs in genes CD1269 and CD1265 were found to associate more closely with disease severity than currently used diagnostic markers, as they were also present in the toxin A-negative and B-positive (A-B+) strain types. The genetic markers identified illustrate the potential of comparative genomics for the discovery of diagnostic DNA-based targets that are species specific or associated with multiple SDA strains.
PMCID: PMC3122728  PMID: 21508155
15.  Combining Computational Prediction of Cis-Regulatory Elements with a New Enhancer Assay to Efficiently Label Neuronal Structures in the Medaka Fish 
PLoS ONE  2011;6(5):e19747.
The developing vertebrate nervous system contains a remarkable array of neural cells organized into complex, evolutionarily conserved structures. The labeling of living cells in these structures is key for the understanding of brain development and function, yet the generation of stable lines expressing reporter genes in specific spatio-temporal patterns remains a limiting step. In this study we present a fast and reliable pipeline to efficiently generate a set of stable lines expressing a reporter gene in multiple neuronal structures in the developing nervous system in medaka. The pipeline combines both the accurate computational genome-wide prediction of neuronal specific cis-regulatory modules (CRMs) and a newly developed experimental setup to rapidly obtain transgenic lines in a cost-effective and highly reproducible manner. 95% of the CRMs tested in our experimental setup show enhancer activity in various and numerous neuronal structures belonging to all major brain subdivisions. This pipeline represents a significant step towards the dissection of embryonic neuronal development in vertebrates.
PMCID: PMC3103512  PMID: 21637758
16.  Rfx6 Directs Islet Formation and Insulin Production in Mice and Humans 
Nature  2010;463(7282):775-780.
Insulin from the β-cells of the pancreatic islets of Langerhans controls energy homeostasis in vertebrates, and its deficiency causes diabetes mellitus. During embryonic development, the transcription factor Neurogenin3 initiates the differentiation of the β-cells and other islet cell types from pancreatic endoderm, but the genetic program that subsequently completes this differentiation remains incompletely understood. Here we show that the transcription factor Rfx6 directs islet cell differentiation downstream of Neurogenin3. Mice lacking Rfx6 failed to generate any of the normal islet cell types except for pancreatic-polypeptide-producing cells. In human infants with a similar autosomal recessive syndrome of neonatal diabetes, genetic mapping and subsequent sequencing identified mutations in the human RFX6 gene. These studies demonstrate a unique position for Rfx6 in the hierarchy of factors that coordinate pancreatic islet development in both mice and humans. Rfx6 could prove useful in efforts to generate β-cells for patients with diabetes.
PMCID: PMC2896718  PMID: 20148032
17.  Long-range regulation is a major driving force in maintaining genome integrity 
The availability of newly sequenced vertebrate genomes, along with more efficient and accurate alignment algorithms, has enabled the expansion of the field of comparative genomics. Large-scale genome rearrangement events modify the order of genes and non-coding conserved regions on chromosomes. While certain large genomic regions have remained intact over much of vertebrate evolution, others appear to be hotspots for genomic breakpoints. The cause of the non-uniformity of breakpoints that occurred during vertebrate evolution is poorly understood.
We describe a machine learning method to distinguish genomic regions where breakpoints would be expected to have deleterious effects (called breakpoint-refractory regions) from those where they are expected to be neutral (called breakpoint-susceptible regions). Our predictor is trained using breakpoints that took place along the human lineage since amniote divergence. Based on our predictions, refractory and susceptible regions have very distinctive features. Refractory regions are significantly enriched for conserved non-coding elements as well as for genes involved in development, whereas susceptible regions are enriched for housekeeping genes, likely to have simpler transcriptional regulation.
We postulate that long-range transcriptional regulation strongly influences chromosome break fixation. In many regions, the fitness cost of altering the spatial association between long-range regulatory regions and their target genes may be so high that rearrangements are not allowed. Consequently, only a limited, identifiable fraction of the genome is susceptible to genome rearrangements.
PMCID: PMC2741452  PMID: 19682388
18.  DNA sequence of human chromosome 17 and analysis of rearrangement in the human lineage 
Zody, Michael C. | Garber, Manuel | Adams, David J. | Sharpe, Ted | Harrow, Jennifer | Lupski, James R. | Nicholson, Christine | Searle, Steven M. | Wilming, Laurens | Young, Sarah K. | Abouelleil, Amr | Allen, Nicole R. | Bi, Weimin | Bloom, Toby | Borowsky, Mark L. | Bugalter, Boris E. | Butler, Jonathan | Chang, Jean L. | Chen, Chao-Kung | Cook, April | Corum, Benjamin | Cuomo, Christina A. | de Jong, Pieter J. | DeCaprio, David | Dewar, Ken | FitzGerald, Michael | Gilbert, James | Gibson, Richard | Gnerre, Sante | Goldstein, Steven | Grafham, Darren V. | Grocock, Russell | Hafez, Nabil | Hagopian, Daniel S. | Hart, Elizabeth | Norman, Catherine Hosage | Humphray, Sean | Jaffe, David B. | Jones, Matt | Kamal, Michael | Khodiyar, Varsha K. | LaButti, Kurt | Laird, Gavin | Lehoczky, Jessica | Liu, Xiaohong | Lokyitsang, Tashi | Loveland, Jane | Lui, Annie | Macdonald, Pendexter | Major, John E. | Matthews, Lucy | Mauceli, Evan | McCarroll, Steven A. | Mihalev, Atanas H. | Mudge, Jonathan | Nguyen, Cindy | Nicol, Robert | O'Leary, Sinéad B. | Osoegawa, Kazutoyo | Schwartz, David C. | Shaw-Smith, Charles | Stankiewicz, Pawel | Steward, Charles | Swarbreck, David | Venkataraman, Vijay | Whittaker, Charles A. | Yang, Xiaoping | Zimmer, Andrew R. | Bradley, Allan | Hubbard, Tim | Birren, Bruce W. | Rogers, Jane | Lander, Eric S. | Nusbaum, Chad
Nature  2006;440(7087):1045-1049.
Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome [1], mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome [2,3]. It is also enriched in segmental duplications, ranking third in density among the autosomes [4]. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution [5,6], the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.
PMCID: PMC2610434  PMID: 16625196
19.  Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations 
Genome Biology  2008;9(10):R152.
An improved assembly of the Ciona intestinalis genome reveals that it contains non-canonical introns and that about 20% of Ciona genes reside in operons.
The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models due in part to intrinsic limitations in gene prediction programs and in part to the fragmented nature of the assembly.
We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive currently available EST data. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half, 5'-ends were precisely mapped using 5'-full-length ESTs, an important refinement even in otherwise unchanged models.
Using these new resources, we identify a population of non-canonical (non-GT-AG) introns and also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than ever.
PMCID: PMC2760879  PMID: 18854010
