The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, which are thought to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b, which lies in palindrome P4. By comparing observed data with data simulated under an SMM within haplogroups, we show that observed heteroallelic combinations in which the modal repeat-number difference between copies was large can give rise to homoallelic combinations with zero repeat difference, a change equivalent to many single-step mutations. Such changes are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat-number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes but increasing overall haplotype diversity among chromosomes within related groups.
Y chromosome; gene conversion; palindrome; microsatellite; stepwise mutation model; DYS385
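The argument that a large intercopy difference cannot collapse to zero under a strict SMM can be illustrated with a toy simulation. This is a minimal sketch, not the simulation used in the study: it assumes each event either changes one randomly chosen copy by a single repeat (SMM) or, with a hypothetical probability `conversion_rate`, copies one allele over the other (gene conversion). The repeat numbers used below are hypothetical.

```python
import random


def min_single_steps(a, b):
    """Minimum number of +/-1 repeat mutations needed to make two copies equal."""
    return abs(a - b)


def simulate_pair(a, b, n_events, rng, conversion_rate=0.0):
    """Apply n_events events to the two copies (a, b) of a duplicated microsatellite.

    With probability conversion_rate an event is a gene conversion that overwrites
    one randomly chosen copy with the other; otherwise one randomly chosen copy
    gains or loses a single repeat (strict stepwise mutation model).
    """
    for _ in range(n_events):
        if rng.random() < conversion_rate:
            if rng.random() < 0.5:
                b = a  # copy a converts copy b
            else:
                a = b  # copy b converts copy a
        elif rng.random() < 0.5:
            a += rng.choice((-1, 1))
        else:
            b += rng.choice((-1, 1))
    return a, b


def fraction_homoallelic(a, b, n_events, conversion_rate=0.0, trials=10_000, seed=1):
    """Fraction of simulated histories that end with zero intercopy difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = simulate_pair(a, b, n_events, rng, conversion_rate)
        hits += x == y
    return hits / trials
```

Starting from a hypothetical 11,18 combination (difference of 7), `fraction_homoallelic(11, 18, 3)` is 0.0, because at least seven single-step mutations would be required, whereas with `conversion_rate=1.0` every history ends homoallelic after a single event.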
The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome and/or mitochondrial DNA polymorphisms, suggested substantially reduced gene flow between populations belonging to these two phyla. These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study gene flow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan, based on autosomal microsatellite markers, and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas seems to be much higher than between populations in the neighbouring countries. We also observe remarkable genetic differentiation between Tibeto-Burman-speaking populations on the one hand and Indo-European-speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.
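The abstract does not state which differentiation statistic was used. As one common example, Nei's G_ST for a single microsatellite locus compares mean within-population expected heterozygosity (H_S) with total expected heterozygosity (H_T), G_ST = (H_T − H_S) / H_T. The sketch below assumes equal population weights and omits sample-size corrections; allele calls are hypothetical repeat numbers.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele call in one population sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def expected_heterozygosity(freqs):
    """Gene diversity 1 - sum(p_i^2) for a dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(populations):
    """Nei's G_ST at one locus; populations is a list of allele-call lists."""
    pop_freqs = [allele_frequencies(p) for p in populations]
    k = len(pop_freqs)
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    alleles = set().union(*pop_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_t = expected_heterozygosity(mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

Two populations fixed for different alleles give G_ST = 1, and two populations with identical frequencies give G_ST = 0; real multilocus estimates average or combine over loci.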
Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.
The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and eight gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited accelerated evolution, compared with 4.1% across the entire genome.
This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes and indicates that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.
Immune response; Porcine; Genome annotation; Co-expression network; Phylogenetic analysis; Accelerated evolution
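The co-expression clustering step can be sketched in its simplest form: link two genes when the Pearson correlation of their expression profiles across samples exceeds a threshold, then take connected components of the resulting graph as clusters. This is an illustrative sketch only; the abstract does not describe the clustering algorithm or threshold used, and the gene names, values and `threshold=0.9` below are hypothetical.

```python
import math
from collections import deque


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def coexpression_clusters(profiles, threshold=0.9):
    """Group genes (dict gene -> expression list) into connected components
    of the graph whose edges join pairs with correlation >= threshold."""
    genes = list(profiles)
    neighbours = {g: [] for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                neighbours[g].append(h)
                neighbours[h].append(g)
    seen, clusters = set(), []
    for g in genes:
        if g in seen:
            continue
        seen.add(g)
        queue, component = deque([g]), []
        while queue:  # breadth-first search over the correlation graph
            current = queue.popleft()
            component.append(current)
            for nb in neighbours[current]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters
```

For example, genes whose expression rises together after infection fall into one cluster, while an anticorrelated gene stays separate. Production analyses typically use more robust network clustering on thousands of genes.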
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
Genome sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2,951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in non-essential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes, and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
We have analysed Y-chromosomal data from Indian caste, Indian tribal and East Asian populations in order to investigate the impact of the caste system on male genetic variation. We find that variation within populations is lower in India than in East Asia, while variation between populations is overall higher. This observation can be explained by greater subdivision within the Indian population, leading to more genetic drift. However, the effect is most marked in the tribal populations, and the level of variation between caste populations is similar to the level between Chinese populations. The caste system has therefore had a detectable impact on Y-chromosomal variation, but this has been less strong than the influence of the tribal system, perhaps because of larger population sizes in the castes, more gene flow or a shorter period of time.
Y chromosome; genetic variation; Indian caste system; endogamy; population substructure
The Y-STR DYS19 is firmly established in the repertoire of Y-chromosomal markers used in forensic analysis yet is poorly understood at the molecular level, lying in a complex genomic environment and exhibiting null alleles, as well as duplications and occasional triplications in population samples. Here, we analyse three null alleles and 51 duplications and show that DYS19 can also be involved in inversion events, so that even its location within the short arm of the Y chromosome is uncertain. Deletion mapping in the three chromosomes carrying null alleles shows that their deletions are less than ~300 kb in size. Haplotypic analysis with binary markers shows that they belong to three different haplogroups and so represent independent events. In contrast, a collection of 51 DYS19 duplication chromosomes belong to only four haplogroups: two are singletons and may represent somatic mutation in lymphoblastoid cell lines, but two, in haplogroups G and C3c, represent founder lineages that have spread widely in Central Europe/West Asia and East Asia, respectively. Consideration of candidate mechanisms underlying both deletions and duplications provides no evidence for the involvement of non-allelic homologous recombination, and they are likely to represent sporadic events with low mutation rates. Understanding the basis and population distribution of these DYS19 alleles will aid in the utilisation and interpretation of profiles that contain them.
Y chromosome; Y-STR; DYS19; Duplication; Deletion; Inversion
The human Y chromosome shows frequent structural variants, some of which are selectively neutral, while others cause impaired fertility due to the loss of spermatogenic genes. The large-scale use of multiple Y-chromosomal microsatellites in forensic and population genetic studies can reveal such variants, through the absence or duplication of specific markers in haplotypes. We describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the microsatellite DYS448, which lies in the proximal part of the azoospermia factor c (AZFc) region, important in spermatogenesis, and made up of “ampliconic” repeats that act as substrates for nonallelic homologous recombination (NAHR). Physical mapping in 26 DYS448 deletion chromosomes reveals that only three cases belong to a previously described class, representing independent occurrences of an ~1.5-Mb deletion mediated by recombination between the b1 and b3 repeat units. The remainder belong to five novel classes; none appears to be mediated through homologous recombination, and all remove some genes, but are likely to be compatible with normal fertility. A combination of deletion analysis with binary-marker and microsatellite haplotyping shows that the 26 deletions represent nine independent events. Nine DYS448 duplication chromosomes can be explained by four independent events. Some lineages have risen to high frequency in particular populations, notably a deletion within haplogroup (hg) C*(xC3a,C3c) found in 18 Asian males. The nonrandom phylogenetic distribution of duplication and deletion events suggests possible structural predisposition to such mutations in hgs C and G. Hum Mutat 29(10), 1171–1180, 2008.
Y chromosome; AZFc; microsatellite; deletion; duplication
Malaria is perhaps the most important parasitic infection and strongest known force for selection in the recent evolutionary history of the human genome. Genetically-determined resistance to malaria has been well-documented in some populations, mainly from Africa. The disease is also endemic in South Asia, the world’s second most populous region, where resistance to malaria has also been observed, for example in Nepal. The biological basis of this resistance, however, remains unclear. We have therefore investigated whether known African resistance alleles also confer resistance in Asia. We typed seven single nucleotide polymorphisms (SNPs) from the genes HBB, FY, G6PD, TNFSF5, TNF, NOS2 and FCGR2A in 928 healthy individuals from Nepal. Five loci were found to be fixed for the non-resistant allele (HBB, FY, G6PD, TNFSF5 and NOS2). The remaining two (rs1800629 and rs1801274) showed the presence of the resistant allele at a frequency of 93% and 27% in TNF and FCGR2A, respectively. However, the frequencies of these alleles did not differ significantly between highland (susceptible) and lowland (resistant) populations. The observed differences in allele and genotype frequencies in Nepalese populations therefore seem to reflect demographic processes or other selective forces in the Himalayan region, rather than malaria selection pressure acting on these alleles.
Malaria; Himalayas; Nepal; single nucleotide polymorphisms; selection; resistance
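Testing whether an allele frequency differs between highland and lowland samples can be illustrated with a 2×2 Pearson chi-square test on allele counts. This is a sketch under stated assumptions: the abstract does not name the test used, alleles are treated as independent draws, no continuity correction is applied, and the counts below are hypothetical. For one degree of freedom the p-value equals erfc(√(χ²/2)).

```python
import math


def allele_count_chi2(a_high, b_high, a_low, b_low):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts: rows are highland/lowland, columns are alleles A/B.
    Returns (chi2, p_value)."""
    table = [[a_high, b_high], [a_low, b_low]]
    row_totals = [sum(r) for r in table]
    col_totals = [a_high + a_low, b_high + b_low]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With identical hypothetical counts in both groups, for example `allele_count_chi2(27, 73, 27, 73)`, the statistic is 0 and the p-value is 1.0. Genotype-level tests or exact tests are alternatives when counts are small.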
Arab forces conquered the Indus Delta region in 711 A.D. and, although a Muslim state was established there, their influence was barely felt in the rest of South Asia at that time. By the end of the tenth century, Central Asian Muslims moved into India from the northwest and expanded throughout the subcontinent. Muslims now form the largest religious minority in India, numbering more than 138 million people in a predominantly Hindu population of over one billion. It is unclear whether the Muslim expansion in India was a purely cultural phenomenon or had a genetic impact on the local population. To address this question from a male perspective, we typed eight microsatellite loci and 16 binary markers from the Y chromosome in 246 Muslims from Andhra Pradesh, and compared them to published data on 4,204 males from China, Central Asia, other parts of India, Sri Lanka, Pakistan, Iran, the Middle East, Turkey, Egypt and Morocco. We find that the Muslim populations in general are genetically closer to their non-Muslim geographical neighbors than to other Muslims in India, and that there is a highly significant correlation between genetics and geography (but not religion). Our findings indicate that, despite the documented practice of marriage between Muslim men and Hindu women, Islamization in India did not involve large-scale replacement of Hindu Y chromosomes. The Muslim expansion in India was predominantly a cultural change and was not accompanied by significant gene flow, as seen in other places, such as China and Central Asia.
Y-chromosomal polymorphism; India; Muslim; Hindu
Structural polymorphism is increasingly recognised as a major form of human genome variation, and is particularly prevalent on the Y chromosome. Assay of the Amelogenin Y gene (AMELY) on Yp is widely used in DNA-based sex testing, and sometimes reveals males who have interstitial deletions. In a collection of 45 deletion males from 12 populations, we used a combination of STS (sequence-tagged site) mapping with binary-marker and Y-STR (short tandem repeat) haplotyping to understand the structural basis of this variation. Of these, 41 males carry indistinguishable deletions 3.0–3.8 Mb in size. Breakpoint mapping strongly implicates a mechanism of non-allelic homologous recombination between the proximal major array of TSPY-gene-containing repeats and a single distal copy of TSPY; this is supported by estimation of TSPY copy number in deleted and non-deleted males. The remaining four males carry three distinct non-recurrent deletions (2.5–4.0 Mb), which may be due to non-homologous mechanisms. Haplotyping shows that TSPY-mediated deletions have arisen seven times independently in the sample. One instance, represented by 30 chromosomes mostly of Indian origin within haplogroup J2e1*/M241, has a time-to-most-recent-common-ancestor of ∼7700 ± 1300 years. In addition to AMELY, deletion males all lack the genes PRKY and TBL1Y, and the rarer deletion classes also lack PCDH11Y. The persistence and expansion of deletion lineages, together with direct phenotypic evidence, suggests that the absence of these genes has no major deleterious effects.
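A time to the most recent common ancestor for a Y-STR lineage can be estimated in several ways, and the abstract does not say which was used. One simple approach is the rho statistic: the mean number of single-step repeat differences between each chromosome and an assumed founder (here taken as given, e.g. the modal haplotype), divided by the summed per-generation mutation rate over loci. The sketch below assumes a star-like phylogeny; the haplotypes, mutation rate and 25-year generation time are hypothetical and are not the values behind the published estimate.

```python
def rho_statistic(haplotypes, founder):
    """Mean number of repeat-unit differences from the founder haplotype,
    summed over loci (the rho statistic, assuming a star-like phylogeny)."""
    total = sum(
        sum(abs(allele - f) for allele, f in zip(hap, founder)) for hap in haplotypes
    )
    return total / len(haplotypes)


def tmrca_years(haplotypes, founder, mu_per_locus, generation_years=25):
    """TMRCA in years = rho / (per-locus rate x number of loci) x generation time."""
    rho = rho_statistic(haplotypes, founder)
    generations = rho / (mu_per_locus * len(founder))
    return generations * generation_years
```

For four hypothetical three-locus haplotypes with three single-step differences in total from the founder, rho = 0.75; with an assumed rate of 0.0025 per locus per generation this gives 100 generations, or about 2,500 years. Coalescent-based methods also provide the confidence intervals that a point estimate like this lacks.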
We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.8% and 0.5%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9 and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
Malaria is perhaps the most important parasitic infection and strongest known force for selection in the recent evolutionary history of the human genome. Genetically-determined resistance to malaria has been well-documented in some populations, mainly from Africa. The disease is also endemic in South Asia, the world's second most populous region, where resistance to malaria has also been observed, for example in Nepal. The biological basis of this resistance, however, remains unclear. We have therefore investigated whether known African resistance alleles also confer resistance in Asia. We typed seven single nucleotide polymorphisms (SNPs) from the genes HBB, FY, G6PD, TNFSF5, TNF, NOS2 and FCGR2A in 928 healthy individuals from Nepal. Five loci were found to be fixed for the non-resistant allele (HBB, FY, G6PD, TNFSF5 and NOS2). The remaining two (rs1800629 and rs1801274) showed the presence of the resistant allele at a frequency of 93% and 27% in TNF and FCGR2A, respectively. However, the frequencies of these alleles did not differ significantly between highland (susceptible) and lowland (resistant) populations. The observed differences in allele and genotype frequencies in Nepalese populations therefore seem to reflect demographic processes or other selective forces in the Himalayan region, rather than malaria selection pressure acting on these alleles.
Malaria; Himalayas; Nepal; single nucleotide polymorphisms; selection; resistance