Supergenes are tight clusters of loci that facilitate the co-segregation of adaptive variation, providing integrated control of complex adaptive phenotypes1. Polymorphic supergenes, in which specific combinations of traits are maintained within a single population, were first described for ‘pin’ and ‘thrum’ floral types in Primula1 and Fagopyrum2, but classic examples are also found in insect mimicry3–5 and snail morphology6. Understanding the evolutionary mechanisms that generate these co-adapted gene sets, as well as the mode of limiting the production of unfit recombinant forms, remains a substantial challenge7–10. Here we show that individual wing-pattern morphs in the polymorphic mimetic butterfly Heliconius numata are associated with different genomic rearrangements at the supergene locus P. These rearrangements tighten the genetic linkage between at least two colour-pattern loci that are known to recombine in closely related species9–11, with complete suppression of recombination being observed in experimental crosses across a 400-kilobase interval containing at least 18 genes. In natural populations, notable patterns of linkage disequilibrium (LD) are observed across the entire P region. The resulting divergent haplotype clades and inversion breakpoints are found in complete association with wing-pattern morphs. Our results indicate that allelic combinations at known wing-patterning loci have become locked together in a polymorphic rearrangement at the P locus, forming a supergene that acts as a simple switch between complex adaptive phenotypes found in sympatry. These findings highlight how genomic rearrangements can have a central role in the coexistence of adaptive phenotypes involving several genes acting in concert, by locally limiting recombination and gene flow.
Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D, having been bred to develop the disease spontaneously in a process similar to that in humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes that are major contributors to T1D susceptibility or resistance were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 megabase pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes-sensitive NOD mouse and 765 genes in homologous regions of the diabetes-resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next-generation sequencing and assembly techniques. Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility.
The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining.
meat quality QTL; pig chromosome 17; integrated analysis
Current models of schizophrenia and bipolar disorder implicate multiple genes;
however, their biological relationships remain elusive. To test the genetic role
of glutamate receptors and their interacting scaffold proteins, the exons of ten
glutamatergic ‘hub’ genes were re-sequenced in 1304 individuals drawn from
case and control samples. No significant difference in the overall number of
non-synonymous single nucleotide polymorphisms (nsSNPs) was observed between
cases and controls. However, cluster analysis of nsSNPs identified two exons
encoding the cysteine-rich domain and first transmembrane helix of GRM1 as a
risk locus with five mutations highly enriched within these domains. A new
splice variant lacking the transmembrane GPCR domain of GRM1 was discovered in
the human brain and the GRM1 mutation cluster could perturb the regulation of
this variant. The predicted effect on individuals harbouring multiple mutations
distributed in their ten hub genes was also examined. Diseased individuals
possessed an increased load of deleteriousness from multiple concurrent rare and
common coding variants. Together, these data suggest a disease model in which
the interplay of compound genetic coding variants, distributed among glutamate
receptors and their interacting proteins, contributes to the pathogenesis of
schizophrenia and bipolar disorders.
DNA methylation constitutes the most stable type of epigenetic modifications modulating the transcriptional plasticity of mammalian genomes. Using bisulfite DNA sequencing, we report high-resolution methylation reference profiles of human chromosomes 6, 20 and 22, providing a resource of about 1.9 million CpG methylation values derived from 12 different tissues. Analysis of 6 annotation categories revealed evolutionarily conserved regions to be the predominant sites for differential DNA methylation and a core region surrounding the transcriptional start site as an informative surrogate for promoter methylation. We find 17% of the 873 analyzed genes differentially methylated in their 5′-untranslated regions (5′-UTR) and about one third of the differentially methylated 5′-UTRs to be inversely correlated with transcription. While our study was controlled for factors reported to affect DNA methylation such as sex and age, we did not find any significant attributable effects. Our data suggest DNA methylation to be ontogenetically more stable than previously thought.
We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^−7 to P = 4 × 10^−14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.
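The attenuation described above depends on how strongly a genotyped marker is correlated with the causal variant. As a minimal sketch (not the study's own analysis), the standard pairwise LD measures D and r² can be computed from two-locus haplotype counts as shown below. The function name `ld_stats` and the example counts are hypothetical and chosen only for illustration; a commonly used rule of thumb is that the association signal at a marker scales roughly with its r² to the causal variant.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci.

    Arguments are counts of the four two-locus haplotypes
    (hypothetical input format, assumed for this sketch).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n                  # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n          # allele frequency of A at locus 1
    p_B = (n_AB + n_aB) / n          # allele frequency of B at locus 2
    d = p_AB - p_A * p_B             # coefficient of linkage disequilibrium
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Perfectly correlated loci: only A-B and a-b haplotypes are seen.
    print(ld_stats(50, 0, 0, 50))
    # Independent loci: all four haplotypes equally frequent.
    print(ld_stats(25, 25, 25, 25))
```

With low r² between array markers and the causal variant, as reported for the Gambian samples, a tag-based scan would be expected to recover only a fraction of the true signal, which is the motivation for imputing the causal variant directly.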
The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing.
Assemblies of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30× genome coverage. The WGS data, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the data have been released into public sequence repositories (GenBank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication.
In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for analysis of the pig genome sequence, and for the application and publication of the results.
The cyclin-dependent kinase (CDK) inhibitors p15, p16, p21 and p27 are frequently deleted, silenced or downregulated in many malignancies. Inactivation of CDK inhibitors predisposes mice to tumor development, demonstrating that these genes can act as tumor suppressors. Here we describe high-throughput murine leukemia virus (MuLV) insertional mutagenesis screens in mice deficient for one or a combination of two CDK inhibitors. We retrieved 9117 retroviral insertions from 476 lymphomas and found hundreds of loci that are mutated significantly more frequently than expected by chance. Many of these are skewed toward a specific genetic context of predisposing germline and somatic mutations. We also find associations between these loci and gender, age of tumor onset and lymphocyte lineage (B or T cell). Comparison of retroviral insertion sites with SNPs associated with chronic lymphocytic leukemia (CLL) reveals significant overlap between these datasets. Together these data highlight the importance of genetic context within large-scale mutation detection studies and demonstrate a novel use for insertional mutagenesis data in prioritization of disease-associated genes resulting from genome-wide association studies.
CDK inhibitors; insertional mutagenesis; lymphoma; CLL; Down syndrome
Autoimmune diseases are thought to result from imbalances in normal immune physiology and regulation. Here, we show that autoimmune disease susceptibility and resistance alleles on mouse chromosome 3 (Idd3) correlate with differential expression of the key immunoregulatory cytokine interleukin-2 (IL-2). In order to test directly that an approximately two-fold reduction in IL-2 underpins the Idd3-linked destabilization of immune homeostasis, we demonstrate that engineered haplodeficiency of IL-2 gene expression not only reduces T cell IL-2 production by two-fold but also mimics the autoimmune dysregulatory effects of the naturally-occurring susceptibility alleles of IL-2. Reduced IL-2 production achieved by both genetic mechanisms correlates with fewer and less functional CD4+CD25+ regulatory T cells, which are critical for maintaining immune homeostasis.
Non-obese diabetic (NOD) mice spontaneously develop type 1 diabetes (T1D) due to the progressive loss of insulin-secreting β-cells by an autoimmune driven process. NOD mice represent a valuable tool for studying the genetics of T1D and for evaluating therapeutic interventions. Here we describe the development and characterization by end-sequencing of bacterial artificial chromosome (BAC) libraries derived from NOD/MrkTac (DIL NOD) and NOD/ShiLtJ (CHORI-29), two commonly used NOD substrains. The DIL NOD library is composed of 196,032 BACs and the CHORI-29 library is composed of 110,976 BACs. The average depth of genome coverage of the DIL NOD library, estimated from mapping the BAC end-sequences to the reference mouse genome sequence, was 7.1-fold across the autosomes and 6.6-fold across the X chromosome. Clones from this library have an average insert size of 150 kb and map to over 95.6% of the reference mouse genome assembly (NCBIm37), covering 98.8% of Ensembl mouse genes. By the same metric, the CHORI-29 library has an average depth over the autosomes of 5.0-fold and 2.8-fold coverage of the X chromosome, the reduced X chromosome coverage being due to the use of a male donor for this library. Clones from this library have an average insert size of 205 kb and map to 93.9% of the reference mouse genome assembly, covering 95.7% of Ensembl genes. We have identified and validated 191,841 single nucleotide polymorphisms (SNPs) for DIL NOD and 114,380 SNPs for CHORI-29. In total we generated 229,736,133 bp of sequence for the DIL NOD and 121,963,211 bp for the CHORI-29. These BAC libraries represent a powerful resource for functional studies, such as gene targeting in NOD embryonic stem (ES) cell lines, and for sequencing and mapping experiments.
Bacterial artificial chromosome; NOD/MrkTac; NOD/ShiLtJ; Mouse genome; Non-obese diabetic (NOD); Type 1 diabetes; T1D; Insulin-dependent diabetes; IDD
An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated in an N-ethyl-N-nitrosourea (ENU) mutation screen targeted to mouse Chromosome 11. Fifty-nine sequence variants were identified in 55 genes from 31 mutant lines. 39% of the lesions lie in coding sequences and create primarily missense mutations. The other 61% lie in noncoding regions, many of them in highly conserved sequences. A lesion in the perinatal lethal line l11Jus13 alters a consensus splice site of nucleoredoxin (Nxn), inserting 10 amino acids into the resulting protein. We conclude that point mutations can be accurately and sensitively recovered by large-scale sequencing, and that conserved noncoding regions should be included for disease mutation identification. Only seven of the candidate genes we report have been previously targeted by mutation in mice or rats, showing that despite ongoing efforts to functionally annotate genes in the mammalian genome, an enormous gap remains between phenotype and function. Our data show that the classical positional mapping approach of disease mutation identification can be extended to large target regions using high-throughput sequencing.
Here we show that tiny DNA lesions can be found in huge amounts of DNA sequence data, similar to finding a needle in a haystack. These lesions identify many new candidates for disease genes associated with birth defects, infertility, and growth. Further, our data suggest that we know very little about what mammalian genes do. Sequencing methods are becoming cheaper and faster. Therefore, our strategy, shown here for the first time, will become commonplace.
Gibbon species have accumulated an unusually high number of chromosomal changes since diverging from the common hominoid ancestor 15–18 million years ago. The cause of this increased rate of chromosomal rearrangements is not known, nor is it known if genome architecture has a role. To address this question, we analyzed sequences spanning 57 breaks of synteny between northern white-cheeked gibbons (Nomascus l. leucogenys) and humans. We find that the breakpoint regions are enriched in segmental duplications and repeats, with Alu elements being the most abundant. Alus located near the gibbon breakpoints (<150 bp) have a higher CpG content than other Alus. Bisulphite allelic sequencing reveals that these gibbon Alus have a lower average density of methylated cytosine than their human orthologues. The finding of higher CpG content and lower average CpG methylation suggests that the gibbon Alu elements are epigenetically distinct from their human orthologues. The association between undermethylation and chromosomal rearrangement in gibbons suggests a correlation between epigenetic state and structural genome variation in evolution.
Mammalian genomes are remarkably stable (with few exceptions). In humans, aberrant recombination events occur quite rarely, manifesting themselves in genomic disorders or cancer. On exceptional occasions, the rate of genome evolution has been accelerated by genome-wide reshuffling events giving rise to some highly derivative karyotypes. The genomes of gibbon species (Hylobatidae) are an example of accelerated genome structural evolution; gibbons display a rate of chromosome evolution 10–20 fold higher than the default rate found in mammals (one chromosome change every 4 million years). As we are interested in investigating the possible genetic causes of this phenomenon, we sequenced a considerable number of chromosomal breakpoints in the northern white-cheeked gibbon genome and analyzed the genomic features of these sites. We observe that the gibbon breakpoints are mostly associated with endogenous retrotransposons called Alus, which are normally abundant in the genomes of primates. Furthermore, our analysis revealed that gibbon Alus have a lower content of methylated CpG when compared to the orthologous human Alus. In mammals, CpG methylation is known to be responsible for keeping retrotransposons in a repressed state and protecting genome integrity. We therefore suggest that a glitch in the methylation apparatus might have driven the higher genome recombination in gibbons.
Since Haldane first noticed an excess of paternally derived mutations, it has
been considered that most mutations derive from errors during germ line
replication. Miyata et al. (1987) proposed that differences in the rate of
neutral evolution on X, Y, and autosomes can be employed to measure the extent of
this male bias. This commonly applied method assumes replication to be the sole
source of between-chromosome variation in substitution rates. We propose a
simple test of this assumption: If true, estimates of the male bias should be
independent of which two chromosomal classes are compared. Prior evidence from
rodents suggested that this might not be true, but conclusions were limited by a
lack of rat Y-linked sequence. We therefore sequenced two rat Y-linked bacterial
artificial chromosomes and determined evolutionary rate by comparison with
mouse. For estimation of rates we consider both introns and synonymous rates.
Surprisingly, for both data sets the prediction of congruent estimates of
α is strongly rejected. Indeed, some comparisons suggest a female bias
with autosomes evolving faster than Y-linked sequence. We conclude that the
method of Miyata et al. (1987) has the potential to provide incorrect estimates.
Correcting the method requires understanding of the other causes of substitution
that might differ between chromosomal classes. One possible cause is
recombination-associated substitution bias for which we find some evidence. We
note that if, as some suggest, this association is dominantly owing to male
recombination, the high estimates of α seen in birds are to be expected
as Z chromosomes recombine in males.
male-mutation bias; male-driven evolution; mutation; recombination; introns; rodents
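The congruence test described in this abstract can be made concrete with the standard expectations that follow from the replication-only assumption. If α is the male-to-female mutation rate ratio, a Y chromosome is always transmitted through males, an X spends two thirds of its time in females, and an autosome spends half its time in each sex, which gives Y/A = 2α/(1 + α), X/A = (2/3)(2 + α)/(1 + α) and Y/X = 3α/(2 + α). The sketch below inverts each ratio to recover α; under the assumption being tested, all three estimates should agree. The function names and the example value of α are illustrative assumptions, not values from the study.

```python
def expected_ratios(alpha):
    """Expected substitution-rate ratios for a given male bias alpha,
    assuming replication errors are the only source of rate differences."""
    y_over_a = 2 * alpha / (1 + alpha)
    x_over_a = (2 / 3) * (2 + alpha) / (1 + alpha)
    y_over_x = 3 * alpha / (2 + alpha)
    return y_over_a, x_over_a, y_over_x


def alpha_from_y_over_a(r):
    # Invert Y/A = 2a/(1+a); valid for 0 <= r < 2.
    return r / (2 - r)


def alpha_from_x_over_a(r):
    # Invert X/A = (2/3)(2+a)/(1+a); valid for 2/3 < r <= 4/3.
    return (4 - 3 * r) / (3 * r - 2)


def alpha_from_y_over_x(r):
    # Invert Y/X = 3a/(2+a); valid for 0 <= r < 3.
    return 2 * r / (3 - r)


if __name__ == "__main__":
    # Round trip with a hypothetical alpha: the three estimates coincide.
    ya, xa, yx = expected_ratios(3.0)
    print(alpha_from_y_over_a(ya),
          alpha_from_x_over_a(xa),
          alpha_from_y_over_x(yx))
```

With observed rates, the three estimates will only coincide if the replication-only assumption holds. A Y/A ratio below 1, for instance, yields α below 1, which corresponds to the apparent female bias mentioned above.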
The centromeric and telomeric heterochromatin of eukaryotic chromosomes is mainly composed of middle-repetitive elements, such as transposable elements and tandemly repeated DNA sequences. Because of this repetitive nature, Whole Genome Shotgun Projects have failed to sequence these regions. We describe a novel kind of transposon-based approach for sequencing highly repetitive DNA sequences in BAC clones. The key to this strategy is physically mapping the precise position of each transposon insertion, which enables the correct assembly of the repeated DNA. We have applied this strategy to a clone from the centromeric region of the Y chromosome of Drosophila melanogaster. The analysis of the complete sequence of this clone has allowed us to prove that this centromeric region evolved from a telomere, possibly after a pericentric inversion of an ancestral telocentric chromosome. Our results confirm that the use of transposon-mediated sequencing, including positional mapping information, improves current finishing strategies. The strategy we describe could be a universal approach to resolving the heterochromatic regions of eukaryotic genomes.
A comprehensive, domain-wide comparative analysis of genomic imprinting between mammals that imprint and those that do not can provide valuable information about how and why imprinting evolved. The imprinting status, DNA methylation, and genomic landscape of the Dlk1-Dio3 cluster were determined in eutherian, metatherian, and prototherian mammals including tammar wallaby and platypus. Imprinting across the whole domain evolved after the divergence of eutherian from marsupial mammals and in eutherians is under strong purifying selection. The marsupial locus, at 1.6 megabases, is double the size of that in eutherians due to the accumulation of LINE repeats. Comparative sequence analysis of the domain in seven vertebrates determined evolutionarily conserved regions common to particular sub-groups and to all vertebrates. The emergence of Dlk1-Dio3 imprinting in eutherians has occurred on the maternally inherited chromosome and is associated with region-specific resistance to expansion by repetitive elements and the local introduction of noncoding transcripts including microRNAs and C/D small nucleolar RNAs. A recent mammal-specific retrotransposition event led to the formation of a completely new gene only in the eutherian domain, which may have driven imprinting at the cluster.
Mammals have two copies of each gene in their somatic cells, and most of these gene pairs are regulated and expressed simultaneously. A fraction of mammalian genes, however, is subject to imprinting—a chemical modification that marks a gene according to its parental origin, so that one parent's copy is expressed while the other parent's copy is silenced. How and why this process evolved is the subject of much speculation. Here we have shown that all the genes in one genomic region, Dlk1-Dio3, which are imprinted in placental mammals such as mouse and human, are not imprinted in marsupial (wallaby) or monotreme (platypus) mammals. This is in contrast to a small number of other imprinted genes that are imprinted in marsupials and other therian mammals and indicates that imprinting arose at each genomic domain at different stages of mammalian evolution. We have compared the sequence of the Dlk1-Dio3 region between seven vertebrate species and identified sequences that are differentially represented in mammals that imprint compared to those that do not. Our data indicate that once imprinted gene regulation is acquired in a domain, it becomes evolutionarily constrained to remain unchanged.
A comparative analysis of genomic imprinting between mammals that imprint and those that don't has provided insights into how and why imprinting evolved.
p53 and p19ARF are tumor suppressors frequently mutated in human tumors. In a high-throughput screen in mice for mutations collaborating with either p53 or p19ARF deficiency, we identified 10,806 retroviral insertion sites, implicating over 300 loci in tumorigenesis. This dataset reveals 20 genes that are specifically mutated in either p19ARF-deficient, p53-deficient or wild-type mice (including Flt3, mmu-mir-106a-363, Smg6, and Ccnd3), as well as networks of significant collaborative and mutually exclusive interactions between cancer genes. Furthermore, we found candidate tumor suppressor genes, as well as distinct clusters of insertions within genes like Flt3 and Notch1 that induce mutants with different spectra of genetic interactions. Cross species comparative analysis with aCGH data of human cancer cell lines revealed known and candidate oncogenes (Mmp13, Slamf6, and Rreb1) and tumor suppressors (Wwox and Arfrp2). This dataset should prove to be a rich resource for the study of genetic interactions that underlie tumorigenesis.
SYSBIO; SIGNALING; HUMDISEASE
A combination of approaches was used to close 8 of the 11 gaps in the original sequence of human chromosome 22, and to generate a total of 1.018 Mb of new sequence.
Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps of which 308 lay in an estimated approximately 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences.
We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.
Thus, we have made significant progress towards completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
Major histocompatibility complex; Haplotype; Polymorphism; Retroelement; Genetic predisposition to disease; Population genetics
CpG islands (CGIs) are dense clusters of CpG sequences that punctuate the CpG-deficient human genome and associate with many gene promoters. As CGIs also differ from bulk chromosomal DNA by their frequent lack of cytosine methylation, we devised a CGI enrichment method based on nonmethylated CpG affinity chromatography. The resulting library was sequenced to define a novel human blood CGI set that includes many that are not detected by current algorithms. Approximately half of CGIs were associated with annotated gene transcription start sites, the remainder being intra- or intergenic. Using an array representing over 17,000 CGIs, we established that 6%–8% of CGIs are methylated in genomic DNA of human blood, brain, muscle, and spleen. Inter- and intragenic CGIs are preferentially susceptible to methylation. CGIs showing tissue-specific methylation were overrepresented at numerous genetic loci that are essential for development, including HOX and PAX family members. The findings enable a comprehensive analysis of the roles played by CGI methylation in normal and diseased human tissues.
The human genome contains about 22,000 genes, each encoding one of the proteins required for human life. A particular cell type (e.g., blood, skin, etc.) expresses a specific subset of protein genes and silences the remainder. To shed light on the mechanisms that cause genes to be activated or shut down, we studied DNA sequences called “CpG islands” (CGIs). These sequences are found at over half of all human genes and can exist in either the active or silent state depending on the presence or absence of methyl groups on the DNA. We devised a method for purifying all CGIs and showed that, unexpectedly, only half occur at the beginning of genes near the promoter, the rest occurring within or between genes. Notably, methylation of CGIs causes stable gene silencing. We tested 17,000 CGIs in four human tissues and found that 6%–8% were methylated in each. Genes whose protein products play an essential role during embryonic development were preferentially methylated, suggesting that gene expression during development could be regulated by CGI methylation.
CpG island methylation, an epigenetic phenomenon usually associated with abnormality in disease, is little characterised in the context of "normal" human cells. Here we highlight tissue-specific CpG island methylation, which frequently associates with developmental genes.
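For contrast with the affinity-based purification described above, sequence-based CGI prediction typically relies on base composition alone. The sketch below applies the widely cited Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). Treating these as representative of the "current algorithms" mentioned in the abstract is an assumption, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_content, cpg_observed_over_expected) for a DNA string."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")              # CpG dinucleotides on this strand
    gc_content = (c + g) / n
    expected = c * g / n             # CpGs expected from base composition
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_content, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Composition-only test; it ignores methylation state entirely."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(looks_like_cgi("CG" * 100))   # CpG-dense toy sequence
    print(looks_like_cgi("AT" * 100))   # CpG-free toy sequence
```

Because such a filter sees only sequence composition, it cannot report whether a candidate island is methylated in a given tissue, which is the information the experimental approach above is designed to capture.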
The sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17 allows the coverage and quality of the pig genome sequencing project to be assessed.
We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage.
Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs.
We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS.
A new physical map of the bovine genome has been constructed by integrating data from genetic and radiation hybrid maps, and a new bovine BAC map, with the bovine genome draft assembly.
The domestic pig is being increasingly exploited as a system for modeling human disease. It also has substantial economic importance for meat-based protein production. Physical clone maps have underpinned large-scale genomic sequencing and enabled focused cloning efforts for many genomes. Comparative genetic maps indicate that there is more structural similarity between pig and human than, for example, mouse and human, and we have used this close relationship between human and pig as a way of facilitating map construction.
Here we report the construction of the most highly continuous bacterial artificial chromosome (BAC) map of any mammalian genome, for the pig (Sus scrofa domestica) genome. The map provides a template for the generation and assembly of high-quality anchored sequence across the genome. The physical map integrates previous landmark maps with restriction fingerprints and BAC end sequences from over 260,000 BACs derived from 4 BAC libraries and takes advantage of alignments to the human genome to improve the continuity and local ordering of the clone contigs. We estimate that over 98% of the euchromatin of the 18 pig autosomes and the X chromosome along with localized coverage on Y is represented in 172 contigs, with chromosome 13 (218 Mb) represented by a single contig. The map is accessible through pre-Ensembl, where links to marker and sequence data can be found.
The map will enable immediate electronic positional cloning of genes, benefiting the pig research community and further facilitating use of the pig as an alternative animal model for human disease. The clone map and BAC end sequence data can also help to support the assembly of maps and genome sequences of other artiodactyls.
The zebrafish (Danio rerio) is an important vertebrate model organism for biomedical research. The syntenic conservation between the zebrafish and human genomes allows one to investigate the function of human genes using the zebrafish model. To facilitate analysis of the zebrafish genome, genetic maps have been constructed and sequence annotation of a reference zebrafish genome is ongoing. However, the duplicative nature of teleost genomes, including the zebrafish, complicates accurate assembly and annotation of a representative genome sequence. Cytogenetic approaches provide "anchors" that can be integrated with accumulating genomic data.
Here, we cytogenetically define the zebrafish genome by first estimating the size of each linkage group (LG) chromosome using flow cytometry, followed by the cytogenetic mapping of 575 bacterial artificial chromosome (BAC) clones onto metaphase chromosomes. Of the 575 BAC clones, 544 localized to apparently unique chromosomal locations. Of these, 93.8% were assigned to a specific LG chromosome location using fluorescence in situ hybridization (FISH), and the assignments were compared with those reported in the zebrafish genome databases. Thirty-one BAC clones localized to multiple chromosomal locations in several different hybridization patterns. From these data, a refined second generation probe panel for each LG chromosome was also constructed.
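The clone counts reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the number of assigned clones is back-calculated from the rounded 93.8% figure, so it is an approximation rather than a reported value.

```python
# Clone counts as reported in the text.
total_clones = 575
unique_clones = 544      # localized to apparently unique chromosomal locations
multi_site_clones = 31   # localized to multiple chromosomal locations

# The two categories should account for every mapped clone.
assert unique_clones + multi_site_clones == total_clones

# Approximate number of unique clones assigned to a specific LG chromosome
# location (back-calculated from a rounded percentage, hence approximate).
assigned_fraction = 0.938
approx_assigned = round(unique_clones * assigned_fraction)

# Share of all mapped clones that hybridized to a single location.
unique_share = unique_clones / total_clones

print(f"unique-location share: {unique_share:.1%}")
print(f"approx. clones assigned to an LG location: {approx_assigned}")
```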
The chromosomal mapping of the 575 large-insert DNA clones allows for these clones to be integrated into existing zebrafish mapping data. An accurately annotated zebrafish reference genome serves as a valuable resource for investigating the molecular basis of human diseases using zebrafish mutant models.
In an effort to locate susceptibility genes for type 1 diabetes (T1D), several genome-wide linkage scans have been undertaken. A chromosomal region designated IDDM10 retained genome-wide significance in a combined analysis of the main linkage scans. Here, we studied sequence polymorphisms in 23 Mb on chromosome 10p12-q11, including the putative IDDM10 region, to identify genes associated with T1D.
Initially, we resequenced the functional candidate genes, CREM and SDF1, located in this region, genotyped 13 tag single nucleotide polymorphisms (SNPs) and found no association with T1D. We then undertook analysis of the whole 23 Mb region. We constructed and sequenced a contig tile path from two bacterial artificial chromosome (BAC) clone libraries. By comparison with a clone library from an unrelated person used in the Human Genome Project, we identified 12,058 SNPs. We genotyped 303 SNPs and 25 polymorphic microsatellite markers in 765 multiplex T1D families and followed up 22 associated polymorphisms in up to 2,857 families. We found nominal evidence of association in six loci (P = 0.05–0.0026), located near the PAPD1 gene. Therefore, we resequenced 38.8 kb in this region, found 147 SNPs and genotyped 84 of them in the T1D families. We also tested 13 polymorphisms in the PAPD1 gene and in five other loci in 1,612 T1D patients and 1,828 controls from the UK. Overall, only the D10S193 microsatellite marker located 28 kb downstream of PAPD1 showed nominal evidence of association in both T1D families and in the case-control sample (P = 0.037 and 0.03, respectively).
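To see why these associations are described as nominal, the smallest reported P value can be compared with a multiple-testing threshold. The sketch below assumes a Bonferroni correction over the 328 markers genotyped in the initial scan. That correction is our illustrative choice and is not stated in the text, and it is conservative because neighbouring markers are often correlated.

```python
# Marker counts reported in the text for the initial scan.
n_snps = 303
n_microsatellites = 25
n_tests = n_snps + n_microsatellites

# Assumption: Bonferroni correction at a family-wise alpha of 0.05.
alpha = 0.05
bonferroni_threshold = alpha / n_tests

# Smallest P value reported for the six loci near PAPD1.
smallest_reported_p = 0.0026
survives_correction = smallest_reported_p < bonferroni_threshold

print(f"tests: {n_tests}, threshold: {bonferroni_threshold:.2e}")
print(f"smallest P survives correction: {survives_correction}")
```

Under this assumption none of the reported P values would pass the corrected threshold, which fits the cautious wording used for the PAPD1 signal.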
We conclude that polymorphisms in the CREM and SDF1 genes have no major effect on T1D. The weak T1D association that we detected in the association scan near the PAPD1 gene may be either false or due to a small genuine effect, and cannot explain linkage at the IDDM10 region.