In order to explore the diversity and selective signatures of duplication and deletion human copy number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures than deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single nucleotide variant base pairs is greater among non-Africans than it is among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history as opposed to differences in CNV load.
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP comprises dense genotypes from 1,481 individuals and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible.
Amplified gene families on sex chromosomes can harbour genes with important biological functions, especially relating to fertility. The Y-linked heat shock transcription factor (HSFY) family has become amplified on the Y chromosome of the domestic pig (Sus scrofa), in an apparently independent event to an HSFY expansion on the Y chromosome of cattle (Bos taurus). Although the biological functions of HSFY genes are poorly understood, they appear to be involved in gametogenesis in a number of mammalian species, and, in cattle, HSFY gene copy number may correlate with levels of fertility.
We have investigated the HSFY family in domestic pig, and other suid species including warthog, bushpig, babirusa and peccaries. The domestic pig contains at least two amplified variants of HSFY, distinguished predominantly by presence or absence of a SINE within the intron. Both these variants are expressed in testis, and both are present in approximately 50 copies each in a single cluster on the short arm of the Y. The longer form has multiple nonsense mutations rendering it likely non-functional, but many of the shorter forms still have coding potential. Other suid species also have these two variants of HSFY, and estimates of copy number suggest the HSFY family may have amplified independently twice during suid evolution.
The HSFY genes have become amplified in multiple species lineages independently. HSFY is predominantly expressed in testis in domestic pig, a pattern conserved with cattle, in which HSFY may play a role in fertility. Further investigation of the potential associations of HSFY with fertility and testis development may be of agricultural interest.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1650-x) contains supplementary material, which is available to authorized users.
Evolution; Sex chromosomes; Gene sequencing
To investigate the information about Y-structural variants (SVs) in the general population that could be obtained by low-coverage whole-genome sequencing.
We investigated SVs on the male-specific portion of the Y chromosome in the 70 individuals from Africa, Europe, or East Asia sequenced as part of the 1000 Genomes Pilot project, using data from this project and from additional studies on the same samples. We applied a combination of read-depth and read-pair methods to discover candidate Y-SVs, followed by validation using information from the literature, independent sequence and single nucleotide polymorphism-chip data sets, and polymerase chain reaction experiments.
We validated 19 Y-SVs, 2 of which were novel. Non-reference allele counts ranged from 1 to 64. The regions richest in variation were the heterochromatic segments near the centromere or the DYZ19 locus, followed by the ampliconic regions, but some Y-SVs were also present in the X-transposed and X-degenerate regions. In all, 5 of the 27 protein-coding gene families on the Y chromosome varied in copy number.
We confirmed that Y-SVs were readily detected from low-coverage sequence data and were abundant on the chromosome. We also reported both common and rare Y-SVs that are novel.
We have assessed copy number variation (CNV) in the male-specific part of the human Y chromosome discovered by array comparative genomic hybridization (array-CGH) in 411 apparently healthy UK males, and validated the findings using SNP genotype intensity data available for 149 of them. After manual curation taking account of the complex duplicated structure of Y-chromosomal sequences, we discovered 22 curated CNV events considered validated or likely, mean 0.93 (range 0–4) per individual. 16 of these were novel. Curated CNV events ranged in size from <1 kb to >3 Mb, and in frequency from 1/411 to 107/411. Of the 24 protein-coding genes or gene families tested, nine showed CNV. These included a large duplication encompassing the AMELY and TBL1Y genes that probably has no phenotypic effect, partial deletions of the TSPY cluster and AZFc region that may influence spermatogenesis, and other variants with unknown functional implications, including abundant variation in the number of RBMY genes and/or pseudogenes, and a novel complex duplication of two segments overlapping the AZFa region and including the 3′ end of the UTY gene.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-015-1562-5) contains supplementary material, which is available to authorized users.
Y-chromosomal haplogroup G1 is a minor component of the overall gene pool of South-West and Central Asia but reaches up to 80% frequency in some populations scattered within this area. We have genotyped the G1-defining marker M285 in 27 Eurasian populations (n = 5,346), analyzed 367 M285-positive samples using 17 Y-STRs, and sequenced ~11 Mb of the Y-chromosome in 20 of these samples to an average coverage of 67X. This allowed detailed phylogenetic reconstruction. We identified five branches, all with high geographical specificity: G1-L1323 in Kazakhs, the closely related G1-GG1 in Mongols, G1-GG265 in Armenians and its distant brother clade G1-GG162 in Bashkirs, and G1-GG362 in West Indians. The haplotype diversity, which decreased from West Iran to Central Asia, allows us to hypothesize that this rare haplogroup could have been carried by the expansion of Iranic speakers northwards to the Eurasian steppe and via founder effects became a predominant genetic component of some populations, including the Argyn tribe of the Kazakhs. The remarkable agreement between genetic and genealogical trees of Argyns allowed us to calibrate the molecular clock using a historical date (1405 AD) of the most recent common genealogical ancestor. The mutation rate for Y-chromosomal sequence data obtained was 0.78×10⁻⁹ per bp per year, falling within the range of published rates. The mutation rate for Y-chromosomal STRs was 0.0022 per locus per generation, very close to the so-called genealogical rate. The “clan-based” approach to estimating the mutation rate provides a third, middle way between direct father-to-son comparisons and using archeologically known migrations, whose dates are subject to revision and of uncertain relationship to genetic events.
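The clan-based calibration described above can be illustrated with a simplified calculation: the average number of mutations separating each sampled chromosome from a dated common ancestor, divided by the sequenced length and the elapsed time. The sketch below is only an illustration under stated assumptions. The per-sample mutation counts, the sampling year and the function name are hypothetical and are not taken from the study, whose actual estimation procedure is not described in the abstract.

```python
def mutation_rate_per_bp_per_year(root_to_tip_mutations, sequenced_bp, years_elapsed):
    """Average rate, assuming each sample descends independently from a dated ancestor.

    root_to_tip_mutations: mutations counted between the ancestor and each sample
    sequenced_bp:          number of base pairs compared in every sample
    years_elapsed:         time between the ancestor and the samples, in years
    """
    if not root_to_tip_mutations:
        raise ValueError("at least one sample is required")
    if sequenced_bp <= 0 or years_elapsed <= 0:
        raise ValueError("sequence length and elapsed time must be positive")
    mean_mutations = sum(root_to_tip_mutations) / len(root_to_tip_mutations)
    return mean_mutations / (sequenced_bp * years_elapsed)


# Hypothetical inputs: four samples, ~11 Mb compared, ancestor dated to 1405 AD,
# and an assumed sampling year of 2015 (the sampling year is an assumption).
example_counts = [4, 6, 5, 5]
example_years = 2015 - 1405
example_rate = mutation_rate_per_bp_per_year(example_counts, 11_000_000, example_years)
print(f"{example_rate:.3e} mutations per bp per year")
```

A dated genealogy supplies the elapsed time directly, which is why this approach needs neither an assumed generation time nor an archaeological date.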
• Revisited the previous discovery of a rare Y haplogroup in two Ecuador populations.
• Hypotheses for the origin of the haplogroup tested with autosomal SNP genotype data.
• We favoured one of the three hypotheses, ‘founder plus drift’.
The colonization of the Americas is thought to have occurred 15–20 thousand years ago (Kya), with little or no subsequent migration into South America until the European expansions beginning 0.5 Kya. Recently, however, haplogroup C3* Y chromosomes were discovered in two nearby Native American populations from Ecuador. Since this haplogroup is otherwise nearly absent from the Americas but is common in East Asia, and an archaeological link between Ecuador and Japan is known from 6 Kya, an additional migration 6 Kya was suggested. Here, we have generated high-density autosomal SNP genotypes from the Ecuadorian populations and compared them with genotypes from East Asia and elsewhere to evaluate three hypotheses: a recent migration from Japan, a single pulse of migration from Japan 6 Kya, and no migration after the First Americans. First, using forward-time simulations and an appropriate demographic model, we investigated our power to detect both ancient and recent gene flow at different levels. Second, we analyzed 207,321 single nucleotide polymorphisms from 16 Ecuadorian individuals, comparing them with populations from the HGDP panel using descriptive and formal tests for admixture. Our simulations revealed good power to detect recent admixture, and that ≥5% admixture 6 Kya could be detected. However, in the experimental data we saw no evidence of gene flow from Japan to Ecuador. In summary, we can exclude recent migration and probably admixture 6 Kya as the source of the C3* Y chromosomes in Ecuador, and thus suggest that they represent a rare founding lineage lost by drift elsewhere.
Past human migrations; Ecuador; Admixture; Simulations
Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.
Y-chromosome phylogeny; single nucleotide polymorphisms; targeted resequencing; Y-STRs; purifying selection
Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10⁻²⁶). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10⁻³⁶). We demonstrate significant power gains in detecting medical trait associations.
Isolated populations can increase power to detect low frequency and rare risk variants associated with complex phenotypes. Here, the authors identify variants associated with haematological traits in two isolated Greek populations that would be difficult to detect in the general population, due to their low frequency.
The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, considered to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b lying in palindrome P4. We show, by comparing observed data with simulated data under an SMM within haplogroups, that observed heteroallelic combinations in which the modal repeat number difference between copies was large can give rise to homoallelic combinations with zero-repeats difference, equivalent to many single-step mutations. These are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes, but increasing overall haplotype diversity among chromosomes within related groups.
Y chromosome; gene conversion; palindrome; microsatellite; stepwise mutation model; DYS385
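The argument that a strict SMM rarely turns a widely separated pair of copies into an identical pair can be illustrated with a small simulation. The sketch below is a generic toy model, not the simulation used in the study: the function names, the starting repeat numbers, the mutation probability and the number of generations are all hypothetical, and the two copies are assumed to mutate independently by exactly one repeat at a time.

```python
import random


def simulate_smm_pair(copy_a, copy_b, generations, mutation_prob, rng):
    """Evolve two microsatellite copies independently under a strict SMM."""
    for _ in range(generations):
        if rng.random() < mutation_prob:
            copy_a += rng.choice((-1, 1))  # single-step gain or loss of one repeat
        if rng.random() < mutation_prob:
            copy_b += rng.choice((-1, 1))
        # Keep at least one repeat so the locus never disappears.
        copy_a = max(copy_a, 1)
        copy_b = max(copy_b, 1)
    return copy_a, copy_b


def fraction_homoallelic(copy_a, copy_b, generations, mutation_prob,
                         replicates, seed=0):
    """Fraction of replicates in which both copies end with equal repeat numbers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        a, b = simulate_smm_pair(copy_a, copy_b, generations, mutation_prob, rng)
        hits += (a == b)
    return hits / replicates


# Hypothetical starting states: a pair differing by one repeat versus six repeats.
near = fraction_homoallelic(11, 12, generations=200, mutation_prob=0.002, replicates=2000)
far = fraction_homoallelic(11, 17, generations=200, mutation_prob=0.002, replicates=2000)
print(f"homoallelic fraction, 1-repeat gap: {near:.4f}; 6-repeat gap: {far:.4f}")
```

Under this model the gap between copies changes by at most two repeats per generation, so closing a large gap requires many single-step mutations. Gene conversion, by contrast, can equalize the copies in one event.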
The somatic mutations in a cancer genome are the aggregate outcome of one or more mutational processes operative through the life of the cancer patient [1-3]. Each mutational process leaves a characteristic mutational signature determined by the mechanisms of DNA damage and repair that constitute it. A role was recently proposed for the APOBEC family of cytidine deaminases in generating particular genome-wide mutational signatures [1,4] and a signature of localized hypermutation called kataegis [1,4]. A germline copy number polymorphism involving APOBEC3A and APOBEC3B, which effectively deletes APOBEC3B [5], has been associated with a modest increased risk of breast cancer [6-8]. Here, we show that breast cancers in carriers of the deletion show more mutations of the putative APOBEC-dependent genome-wide signatures than cancers in non-carriers. The results suggest that the APOBEC3A/3B germline deletion allele confers cancer susceptibility through increased activity of APOBEC-dependent mutational processes, although the mechanism by which this occurs remains unknown.
The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three datasets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry testing.
The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification, and suggest a list of markers that may help to counter this problem. It is usual to identify population structure in such studies using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that instead all SNPs with MAF >1% may be used. We established informative axes of variation identified via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can inform the design of association studies, and the interpretation of their results, in international consortia.
population stratification; AIMs; principal component analysis
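The effect of the two MAF thresholds mentioned above (>5% and >1%) can be sketched as a simple filter applied before principal component analysis. This is a generic illustration rather than the consortium's pipeline: the function names, the genotype coding (0, 1 or 2 copies of the alternate allele, with None for a missing call) and the example SNPs are assumptions made for the sketch.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 alternate-allele counts (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(snp_genotypes, threshold):
    """Names of SNPs whose MAF is strictly greater than the threshold."""
    return [name for name, genotypes in snp_genotypes.items()
            if minor_allele_frequency(genotypes) > threshold]


# Hypothetical data: one common SNP and one low-frequency SNP (MAF 1.5%).
example_snps = {
    "snp_common": [0, 0, 1, 2],
    "snp_low_freq": [0] * 97 + [1] * 3,
}
kept_at_5_percent = filter_by_maf(example_snps, 0.05)
kept_at_1_percent = filter_by_maf(example_snps, 0.01)
print(kept_at_5_percent, kept_at_1_percent)
```

In this toy example the low-frequency SNP is discarded at the 5% threshold but retained at 1%, which is the kind of marker the abstract suggests may still be informative about stratification.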
In a worldwide collaborative effort, 19,630 Y-chromosomes were sampled from 129 different populations in 51 countries. These chromosomes were typed for 23 short-tandem repeat (STR) loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATAH4, DYS481, DYS533, DYS549, DYS570, DYS576, and DYS643) using the PowerPlex Y23 System (PPY23, Promega Corporation, Madison, WI). Locus-specific allelic spectra of these markers were determined and a consistently high level of allelic diversity was observed. A considerable number of null, duplicate and off-ladder alleles were revealed. Standard single-locus and haplotype-based parameters were calculated and compared between subsets of Y-STR markers established for forensic casework. The PPY23 marker set provides substantially stronger discriminatory power than other available kits but at the same time reveals the same general patterns of population structure as other marker sets. A strong correlation was observed between the number of Y-STRs included in a marker set and some of the forensic parameters under study. Interestingly, a weak but consistent trend toward smaller genetic distances resulting from larger numbers of markers became apparent.
Gene diversity; Discriminatory power; AMOVA; Population structure; Database
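Two parameters of the kind named in the keywords above can be computed as follows. The sketch uses the common unbiased estimator of gene (haplotype) diversity, n/(n−1) × (1 − Σ pᵢ²), and defines discrimination capacity as the number of distinct haplotypes divided by the sample size. The abstract does not state the exact definitions used in the study, so these formulas, the function names and the example haplotypes are assumptions for illustration.

```python
from collections import Counter


def gene_diversity(observations):
    """Unbiased diversity estimate: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are required")
    counts = Counter(observations)
    sum_squared_freqs = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_squared_freqs)


def discrimination_capacity(haplotypes):
    """Proportion of distinct haplotypes among all sampled haplotypes."""
    if not haplotypes:
        raise ValueError("no haplotypes supplied")
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical two-locus haplotypes (repeat numbers at two Y-STRs) for four men.
example_haplotypes = [(14, 13), (14, 13), (15, 13), (16, 12)]
example_diversity = gene_diversity(example_haplotypes)
example_capacity = discrimination_capacity(example_haplotypes)
print(f"haplotype diversity = {example_diversity:.4f}, "
      f"discrimination capacity = {example_capacity:.2f}")
```

The same gene_diversity function applies to a single locus if the allele at that locus is passed for each sample, so single-locus and haplotype-based values can be compared across marker subsets.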
Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes.
We demonstrate that while sites with low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively.
We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.
A report on the 'Genomic Disorders 2013: from 60 years of DNA to human genomes in the clinic' meeting, held at Homerton College, Cambridge, UK, April 10-12, 2013.
Following the dispersal out of Africa, where hominins evolved in warm environments for millions of years, our species has colonised different climate zones of the world, including high latitudes and cold environments. The extent to which human habitation in (sub-)Arctic regions has been enabled by cultural buffering, short-term acclimatization and genetic adaptations is not clearly understood. Present day indigenous populations of Siberia show a number of phenotypic features, such as increased basal metabolic rate, low serum lipid levels and increased blood pressure that have been attributed to adaptation to the extreme cold climate. In this study we introduce a dataset of 200 individuals from ten indigenous Siberian populations that were genotyped for 730,525 SNPs across the genome to identify genes and non-coding regions that have undergone unusually rapid allele frequency and long-range haplotype homozygosity change in the recent past. At least three distinct population clusters could be identified among the Siberians, each of which showed a number of unique signals of selection. A region on chromosome 11 (chr11:66–69 Mb) contained the densest cluster of significant signals and also the strongest signals in all the different selection tests performed. We present a list of candidate cold adaptation genes that showed significant signals of positive selection with our strongest signals associated with genes involved in energy regulation and metabolism (CPT1A, LRP5, THADA) and vascular smooth muscle contraction (PRKG1). By employing a new method that paints phased chromosome chunks by their ancestry we distinguish local Siberian-specific long-range haplotype signals from those introduced by admixture.
The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome polymorphisms and/or mitochondrial DNA polymorphisms, suggested a substantially reduced gene flow between populations belonging to these two phyla. These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study gene flow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed, genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan based on autosomal microsatellite markers and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas seems to be much higher than between populations in the neighbouring countries. We also observe a remarkable genetic differentiation between the Tibeto-Burman speaking populations on the one hand and Indo-European speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
We have compared phylogenies and time estimates for Y-chromosomal lineages based on resequencing ∼9 Mb of DNA and applying the program GENETREE to similar analyses based on the more standard approach of genotyping 26 Y-SNPs plus 21 Y-STRs and applying the programs NETWORK and BATWING. We find that deep phylogenetic structure is not adequately reconstructed after Y-SNP plus Y-STR genotyping, and that times estimated using observed Y-STR mutation rates are several-fold too recent. In contrast, an evolutionary mutation rate gives times that are more similar to the resequencing data. In principle, systematic comparisons of this kind can in future studies be used to identify the combinations of Y-SNP and Y-STR markers, and time estimation methodologies, that correspond best to resequencing data.
Human Y chromosome; Male history; Time estimation; Networks; BATWING
Patterns of genetic variation in a population carry information about the prehistory of the population, and for the human Y chromosome an especially informative phylogenetic tree has previously been constructed from fully-sequenced chromosomes. This revealed contrasting bifurcating and starlike phylogenies for the major lineages associated with the Neolithic expansions in sub-Saharan Africa and Western Europe, respectively.
We used coalescent simulations to investigate the range of demographic models most likely to produce the phylogenetic structures observed in Africa and Europe, assessing the starting and ending genetic effective population sizes, duration of the expansion, and time when expansion ended. The best-fitting models in Africa and Europe are very different. In Africa, the expansion took about 12 thousand years, ending very recently; it started from approximately 40 men and numbers expanded approximately 50-fold. In Europe, the expansion was much more rapid, taking only a few generations and occurring as soon as the major R1b lineage entered Europe; it started from just one to three men, whose numbers expanded more than a thousandfold.
Although highly simplified, the demographic model we have used captures key elements of the differences between the male Neolithic expansions in Africa and Europe, and is consistent with archaeological findings.
Human Y chromosome; Neolithic transition; Population expansion; Demographic modeling; Coalescent simulations; Haplogroup; R1b; E1b1a
Recent studies have examined the influence on patterns of human genetic variation of a variety of cultural practices. In India, centuries-old marriage customs have introduced extensive social structuring into the contemporary population, potentially with significant consequences for genetic variation. Social stratification in India is evident as social classes that are defined by endogamous groups known as castes. Within a caste, there exist endogamous groups known as gols (marriage circles), each of which comprises a small number of exogamous gotra (lineages). Thus, while consanguinity is strictly avoided and some randomness in mate selection occurs within the gol, gene flow is limited with populations outside the gol. Gujarati Patels practice this form of “exogamic endogamy.” We have analyzed genetic variation in one such group of Gujarati Patels, the Chha Gaam Patels (CGP), who comprise individuals from six villages. Population structure analysis of 1,200 autosomal loci offers support for the existence of distinctive multilocus genotypes in the CGP with respect to both non-Gujaratis and other Gujaratis, and indicates that CGP individuals are genetically very similar. Analysis of Y-chromosomal and mitochondrial haplotypes provides support for both patrilocal and patrilineal practices within the gol, and a low level of female gene flow into the gol. Our study illustrates how the practice of gol endogamy has introduced fine-scale genetic structure into the population of India, and contributes more generally to an understanding of the way in which marriage practices affect patterns of genetic variation.
Endogamy; Gene Flow; HVS1; India; Y-chromosomal
Some individuals with a particular disease-causing mutation or genotype fail to express most if not all features of the disease in question, a phenomenon that is known as ‘reduced (or incomplete) penetrance’. Reduced penetrance is not uncommon; indeed, there are many known examples of ‘disease-causing mutations’ that fail to cause disease in at least a proportion of the individuals who carry them. Reduced penetrance may therefore explain not only why genetic diseases are occasionally transmitted through unaffected parents, but also why healthy individuals can harbour quite large numbers of potentially disadvantageous variants in their genomes without suffering any obvious ill effects. Reduced penetrance can be a function of the specific mutation(s) involved or of allele dosage. It may also result from differential allelic expression, copy number variation or the modulating influence of additional genetic variants in cis or in trans. The penetrance of some pathogenic genotypes is known to be age- and/or sex-dependent. Variable penetrance may also reflect the action of unlinked modifier genes, epigenetic changes or environmental factors. At least in some cases, complete penetrance appears to require the presence of one or more genetic variants at other loci. In this review, we summarize the evidence for reduced penetrance being a widespread phenomenon in human genetics and explore some of the molecular mechanisms that may help to explain this enigmatic characteristic of human inherited disease.
All non-human great apes are endangered in the wild, and it is therefore important to gain an understanding of their demography and genetic diversity. Whole genome assembly projects have provided an invaluable foundation for understanding genetics in all four genera, but to date genetic studies of multiple individuals within great ape species have largely been confined to mitochondrial DNA and a small number of other loci. Here, we present a genome-wide survey of genetic variation in gorillas using a reduced representation sequencing approach, focusing on the two lowland subspecies. We identify 3,006,670 polymorphic sites in 14 individuals: 12 western lowland gorillas (Gorilla gorilla gorilla) and 2 eastern lowland gorillas (Gorilla beringei graueri). We find that the two species are genetically distinct, based on levels of heterozygosity and patterns of allele sharing. Focusing on the western lowland population, we observe evidence for population substructure, and a deficit of rare genetic variants suggesting a recent episode of population contraction. In western lowland gorillas, there is an elevation of variation towards telomeres and centromeres on the chromosomal scale. On a finer scale, we find substantial variation in genetic diversity, including a marked reduction close to the major histocompatibility locus, perhaps indicative of recent strong selection there. These findings suggest that despite their maintaining an overall level of genetic diversity equal to or greater than that of humans, population decline, perhaps associated with disease, has been a significant factor in recent and long-term pressures on wild gorilla populations.