The human Y chromosome shows frequent structural variants, some of which are selectively neutral, while others cause impaired fertility due to the loss of spermatogenic genes. The large-scale use of multiple Y-chromosomal microsatellites in forensic and population genetic studies can reveal such variants, through the absence or duplication of specific markers in haplotypes. We describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the microsatellite DYS448, which lies in the proximal part of the azoospermia factor c (AZFc) region, important in spermatogenesis, and made up of “ampliconic” repeats that act as substrates for nonallelic homologous recombination (NAHR). Physical mapping in 26 DYS448 deletion chromosomes reveals that only three cases belong to a previously described class, representing independent occurrences of an ~1.5-Mb deletion mediated by recombination between the b1 and b3 repeat units. The remainder belong to five novel classes; none appears to be mediated through homologous recombination, and all remove some genes, but are likely to be compatible with normal fertility. A combination of deletion analysis with binary-marker and microsatellite haplotyping shows that the 26 deletions represent nine independent events. Nine DYS448 duplication chromosomes can be explained by four independent events. Some lineages have risen to high frequency in particular populations, in particular a deletion within haplogroup (hg) C*(xC3a,C3c) found in 18 Asian males. The nonrandom phylogenetic distribution of duplication and deletion events suggests possible structural predisposition to such mutations in hgs C and G. Hum Mutat 29(10), 1171–1180, 2008.
The human Y chromosome is remarkable for its high level of structural variability. Cytogenetic and molecular studies have shown that many structural variants exist within human populations, including deletions [Jobling et al., 1996, 2007; Repping et al., 2003, 2006], duplications [Bosch and Jobling, 2003; Jobling et al., 1996; Repping et al., 2006], and inversions [Affara et al., 1986; Bernstein et al., 1986; Page, 1986; Repping et al., 2006; Verma et al., 1982]. While many variants appear to be selectively neutral, some are certainly pathogenic, causing failure of spermatogenesis [Vogt et al., 1996], through loss of azoospermia factor (AZF) loci. Others may have subtle effects on fertility through variation in the copy number of spermatogenic genes, though this is difficult to demonstrate conclusively, and remains controversial [McElreavey et al., 2006]. Underlying the structural variability of the Y chromosome is a high rate of mutation through nonallelic homologous recombination (NAHR) between highly similar paralogous sequences, of which the Y bears a particularly large proportion [Skaletsky et al., 2003]. These paralogs also act as substrates for frequent gene conversion events [Bosch et al., 2004; Rozen et al., 2003].
As well as its importance in male infertility, the Y chromosome is a powerful marker for studying issues in human evolutionary genetics. The strength of this haplotypic system [Jobling and Tyler-Smith, 2003] comes from combining a robust and well-resolved phylogeny based on slow-mutating binary markers such as single-nucleotide polymorphisms (SNPs) [Underhill et al., 2000; Y Chromosome Consortium, 2002], with fine-scale haplotyping based on rapidly mutating multiallelic markers, principally multiple microsatellites. The total number of informative microsatellites available is over 200 [Kayser et al., 2004], and some population studies analyze more than 50 of these [King et al., 2007; Xue et al., 2005].
When multiple microsatellites distributed throughout the chromosome are used in population studies, in effect they also act as a surveillance system for Y-chromosomal structural variants. When a particular microsatellite lies in a deleted region, its absence from a haplotype clearly signals the deletion (a “null” allele). The case of a duplication is less straightforward, and this could lead to underascertainment: if both copies of the duplicated microsatellite have the same number of repeat units, they are indistinguishable by length, though quantitative analysis could demonstrate that two copies are present. If the two copies have different repeat numbers due to microsatellite mutation, then the “duplicated” allele can clearly be seen as two peaks in an electropherogram, though it needs to be distinguished from possible mosaicism following somatic mutation [Clayton et al., 2004].
Microsatellite haplotype anomalies have already been used to identify a number of different Y-chromosomal rearrangements. Previously, we have used duplicated [Bosch and Jobling, 2003] and null alleles [King et al., 2005] of several clustered microsatellites to diagnose duplications and deletions of the AZFa region on Yq; the counting of multiple alleles and semiquantitative analysis of the microsatellite DYS464 [Butler and Schoske, 2005] reveals copy number variation of parts of the AZFc region, and the absence of all copies of the same marker indicates pathogenic AZFc deletions [King et al., 2005]; finally, absence of the microsatellite DYS458 signals recurrent and apparently nonpathogenic deletions on Yp that encompass the Amelogenin Y gene [Jobling et al., 2007].
Here, we describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the hexanucleotide-repeat microsatellite DYS448. In the Y-chromosomal reference sequence [Skaletsky et al., 2003] this microsatellite lies within “u2” (Fig. 1a), a ~170-kb segment in the proximal part of the 4.5-Mb region on Yq that includes AZFc [Kuroda-Kawaguchi et al., 2001]. This region is mostly composed of large “ampliconic” repeat units [Skaletsky et al., 2003], many arranged as palindromic sequences, which may act as substrates for illegitimate recombination events that cause chromosomal rearrangements.
We use deletion mapping, as well as haplotyping with binary markers and multiple microsatellites, to explore the molecular basis for the underlying rearrangements, which reveal novel deletions and duplications affecting gene copy number and illustrate the complexity and variability of the organization of this important and dynamic region of the Y chromosome. Some rearrangements have come to high frequency in particular populations, and the phylogenetic distributions of independent duplication and deletion events suggest that some branches of the Y phylogeny might bear structures that predispose to, or protect against, rearrangement mutations.
Most DNA samples were from collections of the authors, and were obtained with appropriate informed consent. Some samples form part of sets described previously [Jobling et al., 1996; Parkin et al., 2006, 2007; Roewer et al., 2007]. The 684 male samples from the Centre d'Etude du Polymorphisme Humain–Human Genome Diversity Project (HGDP-CEPH) panel [Cann et al., 2002] were also included. Some samples were subjected to whole genome amplification [Dean et al., 2002] using the Genomiphi kit (GE Healthcare, Amersham, United Kingdom) before analysis. Two of the deletion chromosomes (448del1 [m38] and 448del6 [m252]) were previously described in a set carrying deletions of the marker 50f2/C [Jobling et al., 1996].
Deletions and duplications were ascertained using a published multiplex incorporating DYS448 [Butler et al., 2002] or the commercial forensic kit Y-filer (Applied Biosystems, Warrington, United Kingdom), and were confirmed by repeated typing. DYS448 was considered to be duplicated when its two peaks in an electropherogram were of approximately equal height and area.
Y-specific sequence-tagged sites (STSs) around DYS448, with primer sequences available from the literature [Jobling et al., 1998; Repping et al., 2003; Skaletsky et al., 2003; Tilford et al., 2001; Vollrath et al., 1992], were amplified by PCR and analyzed by agarose gel electrophoresis. An STS was considered to be deleted when reproducibly absent in the presence of a larger independent Y-specific control amplicon coamplified in the same PCR reaction [Jobling et al., 2007]. The PCR system in most cases was as described [Jobling et al., 2007], and cycling conditions were as follows: 33 cycles of (94°C for 30 s, 60°C for 30 s, and 70°C for 30 s). The marker 50f2/C (DYS7C) was typed using a previously described assay which generates a small (196-bp) test amplicon and a larger control amplicon from Yp (minisatellite MSY1) using a single primer pair.
Binary markers were typed in a hierarchical fashion, using either the SNaPshot minisequencing protocol (Applied Biosystems) on an ABI3100 capillary electrophoresis apparatus (Applied Biosystems), or primer extension on the Sequenom mass spectrometry system (Sequenom, San Diego, CA). Amplification and extension primers were based on ones published previously [Bosch et al., 2006; Hurles et al., 2005; Paracchini et al., 2002], with additional primers based on published sequences [Y Chromosome Consortium, 2002]. The binary markers define haplogroups that are represented in a maximum parsimony tree [Jobling and Tyler-Smith, 2003; Y Chromosome Consortium, 2002].
A total of 24 Y-specific microsatellites (DYS19, DYS385a/b, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS426, DYS435, DYS436, DYS437, DYS438, DYS439, DYS447, DYS448, DYS460, DYS461, DYS462, YCAIIa/b, and Y-GATA-H4.1) were typed in two multiplexes [Butler et al., 2002; Parkin et al., 2006]. PCR products were resolved on an ABI3100 capillary electrophoresis apparatus (Applied Biosystems), and analyzed using GeneMapper software (Applied Biosystems). Allele nomenclature was as described [Parkin et al., 2006], and compatible with International Society for Forensic Genetics (ISFG) recommendations [Gusmão et al., 2006].
DYS464 was amplified using published primers [Redd et al., 2002], the forward primer being 5′-labeled with the dye 6-FAM. PCR reactions were set up as described [Butler et al., 2002], with an annealing temperature of 60°C, and analyzed by capillary electrophoresis. A total of five DNA samples from the YCC collection [Y Chromosome Consortium, 2002] for which published electropherograms were available [Redd et al., 2002] were used as controls.
A weighted median-joining network [Bandelt et al., 1999] was constructed from microsatellite haplotypes using Network 4.0 (www.fluxus-engineering.com/sharenet.htm). The network assumes a single-step microsatellite mutational model, allows nodes (median vectors) to be unoccupied by sampled haplotypes, and represents the union of the most parsimonious trees. Of the total 24 microsatellites analyzed we omitted DYS448 itself, and also YCAIIa, YCAIIb, and DYS385a, DYS385b because of allele assignment problems, leaving 19 microsatellites. Two chromosomes (448del2 and 448del3) carried duplications at DYS19 and DYS461, respectively (Supplementary Table S1; available online at http://www.interscience.wiley.com/jpages/1059-7794/suppmat); to allow these loci to be included in the network, we considered only the smaller allele in each case. Weighting [Qamar et al., 2002] was used to remove some reticulations (closed structures) within networks by taking into account the range of different mutation rates of the markers, reflected indirectly by their allele length variances among all chromosomes included in the network.
Time-to-most-recent-common-ancestor (TMRCA) of clusters within networks, together with standard deviation, was estimated using the rho statistic within the Network program. This measure represents the mean number of mutations between a root haplotype and every other haplotype in the cluster, counted from the network itself. Root haplotypes were chosen to be as close as possible to the center of a cluster, and to be comprised of more than one haplotype. In the case of haplogroup (hg) C3*(xC3a,C3c), choice of alternative roots made little difference to the TMRCA. Rho is related to time in generations (t) by ρ = μt, where μ is the per-haplotype per-generation mutation rate, calculated from the per-microsatellite per-generation mutation rate of 2.0 × 10⁻³ [Gusmão et al., 2005]. Generation time was taken to be 31 years [Fenner, 2005].
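The conversion from rho to a TMRCA in years can be illustrated with a minimal sketch. It assumes that the per-haplotype rate is simply the per-microsatellite rate multiplied by the number of loci in the network (19 here, taken from the network description above); the function name and the example rho value of 1.0 are hypothetical and are not taken from the analysis itself.

```python
def tmrca_from_rho(rho, n_loci, mu_per_locus=2.0e-3, generation_years=31):
    """Convert a rho value to a TMRCA using rho = mu * t.

    Assumption: the per-haplotype mutation rate is the per-locus rate
    multiplied by the number of loci (all loci treated as equal).
    """
    mu_haplotype = n_loci * mu_per_locus      # per haplotype, per generation
    generations = rho / mu_haplotype          # t = rho / mu
    return generations, generations * generation_years


# Illustrative value only: rho = 1.0 on a 19-locus haplotype.
generations, years = tmrca_from_rho(rho=1.0, n_loci=19)
print(f"{generations:.1f} generations, {years:.0f} years")
```

Under these assumptions, one mutational step of average distance from the root corresponds to roughly 26 generations, or about 800 years.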
To test whether rearrangements involving DYS448 were randomly distributed across the Y phylogeny, we combined all of our data (on Central Asian, Bhutanese, Nepalese, French, English, and CEPH-HGDP samples; see Supplementary Table S2) into haplogroup designations at the “letter” clade level (A–R) for compatibility. We then carried out Fisher's exact tests to ask whether rearrangements were over- or underrepresented in particular haplogroups given their relative frequencies, using a false discovery rate correction for multiple testing [Benjamini and Hochberg, 1995].
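The two statistical steps can be sketched as follows, using only the Python standard library. This is an illustrative implementation under stated assumptions, not the software used in the study: each haplogroup is assumed to be tested in a 2 × 2 table (haplogroup of interest versus all others, rearrangement events versus remaining chromosomes), the function names are hypothetical, and the counts and P values in the usage lines are invented for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return True for each hypothesis rejected at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            max_rank = rank               # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= max_rank
    return rejected


# Invented counts and P values, for illustration only.
p_example = fisher_exact_two_sided(1, 9, 11, 3)
flags = benjamini_hochberg([0.001, 0.02, 0.04, 0.6])
print(p_example, flags)
```

The step-up rule rejects every hypothesis ranked at or below the largest rank whose P value passes its threshold, which is why the thresholds must be evaluated over the full sorted list rather than stopping at the first failure.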
While the majority of Y chromosomes carry a single DYS448 allele, we observed that some carry null alleles; in principle, this can result from small-scale primer-site mutations, but the analysis described below confirms locus deletion. We also found that some chromosomes carry two alleles of different lengths, but of equal peak heights in electropherograms produced by capillary electrophoresis, suggesting a genuine allele duplication. From our population studies (a total of 3,303 chromosomes; our unpublished data) [Parkin et al., 2006, 2007; Roewer et al., 2007] we assembled a collection of 26 deletions and nine duplications for further molecular analysis (Table 1).
The location of DYS448 within an ampliconic repeat region immediately suggests a candidate mechanism for deletions and duplications (assuming that the starting structure has the same organization as the reference sequence): NAHR causing unequal exchange between the repeat units b1 and b3 (Fig. 1a). Three b1/b3-mediated deletions have been reported before in one fertile and two infertile men [Hucklenbroich et al., 2005; Repping et al., 2003].
To investigate this possibility, we undertook deletion mapping in chromosomes carrying null alleles, using STSs around DYS448 following an established approach [Repping et al., 2003]. This mapping reveals six physically distinct deletion classes, and, surprisingly, supports the b1/b3 NAHR mechanism in only 3 out of 26 cases: in these chromosomes the adjacent set of six STSs, sY1161, sY1196, sY1197, sY1192, 50f2/C, and sY1291, are absent, while flanking STSs sY1258 (proximal) and sY1206 (distal) are present (Fig. 1b).
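The presence/absence criterion for the b1/b3 class can be expressed as a simple rule. The sketch below is illustrative only: the data structure (a mapping from STS name to a present/absent call) and the function name are assumptions, and the example samples are hypothetical rather than taken from Table 1.

```python
# STSs expected to be absent in a b1/b3 deletion, and the flanking STSs
# expected to be present, as listed in the text.
B1B3_ABSENT = ("sY1161", "sY1196", "sY1197", "sY1192", "50f2/C", "sY1291")
B1B3_FLANKING = ("sY1258", "sY1206")


def is_b1b3_pattern(sts_present):
    """Return True if the STS calls match the b1/b3 deletion pattern.

    sts_present maps STS name -> True (amplified) or False
    (reproducibly absent alongside a positive control amplicon).
    """
    interior_absent = all(not sts_present[s] for s in B1B3_ABSENT)
    flanks_present = all(sts_present[s] for s in B1B3_FLANKING)
    return interior_absent and flanks_present


# Hypothetical sample matching the b1/b3 pattern.
b1b3_like = {s: False for s in B1B3_ABSENT}
b1b3_like.update({s: True for s in B1B3_FLANKING})

# Hypothetical sample whose deletion ends proximal to sY1291.
shorter = dict(b1b3_like, sY1291=True)

print(is_b1b3_pattern(b1b3_like), is_b1b3_pattern(shorter))
```

A chromosome that retains sY1291 fails the test even when the other five STSs are absent, which is the kind of pattern that separates a shorter deletion from the b1/b3 class.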
Among the remaining 23 deletion chromosomes, STS deletion mapping reveals five physically distinct non-b1/b3 deletion classes (Fig. 1b; Table 1). The smallest deletions (non-b1/b3-class I) are carried by two chromosomes, and lack only the two STSs flanking DYS448. A total of 18 chromosomes (non-b1/b3-class II), like the b1/b3 deletions, have a proximal breakpoint within the b1 repeat, but a distal breakpoint proximal to b3, between STSs 50f2/C and sY1291. The remaining three classes of non-b1/b3-mediated deletions are all found as singletons, and all share a proximal breakpoint outside the AZFc region (between sY1315 and sY1279), but have three different distal breakpoints. Sparse and uneven spacing of informative STSs means that the physical size of the deletions is uncertain (Fig. 1b), but even given this uncertainty all of the non-b1/b3 deletion classes must have smaller deletions than the b1/b3 class (1,550–1,611 kb). Importantly, none of the five non-b1/b3 deletion classes can be mediated by NAHR between ampliconic repeats, at least on the basis of a starting structure resembling the reference sequence organization. They may result from nonhomologous processes, though verification of this would require the sequencing of breakpoints.
In the case of the DYS448 duplication chromosomes, breakpoint positions cannot be determined without the use of techniques such as quantitative PCR or fluorescence in situ hybridization (FISH) analysis, and unfortunately the duplication DNA samples are unsuitable for these methods. While b1/b3-mediated duplication may be the most parsimonious mechanism to explain DYS448 duplications (Fig. 1d), it is abundantly clear from the complexity of the different deletion classes we have observed that other duplication mechanisms are also possible.
Haplotype analysis provides a means to ask whether rearrangements that are indistinguishable by physical mapping are likely to be identical by descent, or identical by state; i.e., recurrent. If a rearrangement is mediated by NAHR between substantial repeat sequences, then we expect it to recur; in contrast, if a rearrangement is not NAHR-mediated it is more likely to be due to a sporadic event, in which case a set of chromosomes that shares it will probably have common ancestry.
We first used binary polymorphisms to determine the haplogroups of deleted and duplicated chromosomes (Table 1; Fig. 2a), reasoning that if two deletion or duplication chromosomes belong to different haplogroups in the Y-chromosomal binary-marker phylogeny, then they must have occurred independently. We also determined 23-locus microsatellite haplotypes (Supplementary Table S1), to provide an additional estimator of the relatedness between the chromosomes carrying rearrangements. Membership of the same haplogroup, and possession of similar microsatellite haplotypes, can indicate that a rearrangement carried by a set of chromosomes is identical by descent.
The three b1/b3 deletion chromosomes belong to three different haplogroups (O3e, C*, C3c; Fig. 2a), thus providing clear evidence of independent recurrent mutation, and supporting NAHR as the mutational mechanism. In contrast, the 18 non-b1/b3-class II chromosomes all belong to hg C3*(xC3a,C3c), with all but one of their microsatellite haplotypes forming a tight cluster in a median joining network (Fig. 2b), strongly suggesting a common deletion origin for this group. Use of the rho statistic within Network yields a TMRCA for this group of 2,900±766 years. Of the remaining four classes of deletion chromosomes, three (classes III, IV, and V) are represented by singletons, in haplogroups (hgs) G, O2, and D*; the pair of chromosomes in class I belong to two different haplogroups, O2 and E3b, indicating that they have independent origins.
The nine duplication chromosomes belong to four haplogroups (Fig. 2a), again indicating recurrence of DYS448 duplication. A total of four lie in hg G, and form a tight cluster in the network (Fig. 2b) clearly suggesting identity-by-descent of the duplication; they have a TMRCA of 408±289 years. A total of three lie in hg E1, suggesting a common origin, although their diverged microsatellite haplotypes indicate ancient divergence (TMRCA of 5,984±1,246 years); and the remaining two independent cases belong to haplogroups O3e1* and Q*(xQ3a).
In summary, the 26 deletion chromosomes can be explained by nine independent deletion events, of which three are probably mediated by b1/b3 recombination; the nine duplication chromosomes can be explained by four independent events.
The testis-specific deleted in azoospermia (DAZ) genes [Reijo et al., 1995] play an essential role in both primordial germ-cell development and in the differentiation and maturation of sperm [Yen, 2004]. In one of the non-b1/b3 deletion classes (class II), the distal breakpoints lie between the STSs 50f2/C and sY1291. This interval contains the so-called “red” repeats r1 and r2 (Fig. 1a), including two copies of DAZ (Fig. 1c)—present in four copies within the reference sequence. To address the issue of DAZ copy number variation in these 18 class II DYS448 deletion chromosomes, we exploited the presence of another microsatellite, DYS464, that also lies within the “red” repeats (Fig. 1a). Previous work [Berger et al., 2003; Butler and Schoske, 2003, 2005; Redd et al., 2002] has shown that most Y chromosomes give DYS464 allelic patterns compatible with the presence of four copies. In electropherograms, this can present as four peaks if all four copies of the microsatellite carry different repeat numbers, or fewer peaks if some copies carry the same repeat number. In some studies [Butler and Schoske, 2003, 2005] the ratio of peak heights or areas has been used to infer copy number; for example, apparent ratios of 2:1:1, or 1:3 being interpreted as four copies. Our experiments suggest that, in the absence of control samples in which copy number is known for certain, such semiquantitative inferences are risky. Here, we therefore rely on the less ambiguous evidence of the number of alleles distinguishable by length, thereby determining the minimum number of DYS464 copies.
We analyzed the allelic patterns for DYS464 for the deletion chromosomes; examples are shown in Fig. 3. In non-b1/b3 class I, III, IV, and V deletion chromosomes we do not expect the deletions to have affected “red” repeat copy number, so they should each carry four copies. In agreement with this expectation, chromosomes in classes I, III, and V present two or three peaks, and the class IV chromosome presents four. In b1/b3 deletion chromosomes we expect two copies: the chromosomes in this class present only one or two DYS464 peaks (Fig. 3a and b), which is also consistent with our expectations. We now turn to the chromosomes in which copy number is in question. The 18 non-b1/b3 class II chromosomes all present either two, three or four peaks (Fig. 3c, d and e), compatible with their carrying four copies. This strongly suggests that the deletion that all these chromosomes share by descent does not encompass the r1 and r2 repeats, and that DAZ gene copy number is “normal”, at a probable four copies.
We can also address the issue of whether or not the duplications encompassing DYS448 include the r1 and r2 repeats, which would lead to a DAZ gene copy number of six—this would be the case if the duplications were b1/b3-mediated. We analyzed allelic patterns for DYS464 for the nine duplication chromosomes. All show only two or three peaks (e.g., Fig. 3f), so this provides no evidence to support a “red” repeat copy number of greater than four. However, it is important to note that the greater the number of copies of DYS464, the more likely it is that some copies will be identical in allele length, and therefore indistinguishable in our analysis. Power to diagnose duplications involving the “red” repeats is therefore limited.
The proximal AZFc region contains many other genes in addition to DAZ, and the different deletion or duplication classes are expected to affect the copy numbers of different subsets of these. Here we consider only the deletions, because their breakpoints are better characterized. All five of the novel deletion classes are smaller in extent than the b1/b3 class. Based on the reference sequence, and depending on the breakpoint positions within the b1 and b3 repeat units, b1/b3 deletions remove six or seven of the many testis-specific protein-coding genes and five of the nontranslated transcription units in the region (Fig. 1c). Uncertainty regarding the precise positions of the breakpoints in the other deletion classes leads to corresponding uncertainty in the number of deleted genes, and these are summarized in Table 2.
Three of the deletion events correspond to independent occurrences of the ~1.5-Mb b1/b3 deletion originally described [Repping et al., 2003]. In principle, the mutation rate of b1/b3-mediated deletion can be calculated given an estimate of the number of generations encompassed within the phylogeny relating all sampled chromosomes [Bosch et al., 2004; Repping et al., 2006]. Such an estimate, 52,000 generations, has been calculated for a set of 47 chromosomes representing the major clades of the Y chromosome tree [Repping et al., 2006]. We analyzed 3,303 chromosomes among which all the clades were also represented, and found three independent b1/b3 deletion events. Time elapsed within the phylogeny relating the surveyed chromosomes is difficult to estimate; although our sample size is large, many chromosomes are closely related to each other. A minimum bound must be 52,000 generations, and a maximum bound of 10 times higher seems reasonable, which yields an approximate range of mutation rates of ~5 × 10⁻⁵ to ~5 × 10⁻⁶ per generation. This rate is similar to that of another NAHR-mediated Y-chromosomal deletion, encompassing AZFa, which has been determined in sperm DNA as 2.16 × 10⁻⁵ [Turner et al., 2007].
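The arithmetic behind this range can be laid out explicitly. The sketch below simply divides the number of independent events by the lower and upper bounds on the number of generations in the phylogeny; the function and parameter names are hypothetical.

```python
def mutation_rate_bounds(n_events, min_generations, fold_upper=10):
    """Return (low, high) per-generation mutation rate estimates.

    The high estimate uses the minimum number of generations; the low
    estimate uses a bound fold_upper times larger.
    """
    high = n_events / min_generations
    low = n_events / (min_generations * fold_upper)
    return low, high


# Three independent b1/b3 events; 52,000 generations as the minimum bound.
low, high = mutation_rate_bounds(n_events=3, min_generations=52_000)
print(f"{low:.1e} to {high:.1e} per generation")
```

With these inputs the bounds come out at roughly 5.8 × 10⁻⁶ and 5.8 × 10⁻⁵, in line with the approximate range quoted above.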
Most of the rearrangement chromosomes we have identified derive from Asian populations. To ask whether this reflected a true over-representation, or sampling bias, we expanded our sample size by including other published data. The inclusion of DYS448 in a commercial forensic multiplex (Y-filer; Applied Biosystems) means that population data are accumulating in the forensic literature, and reporting of “null” and duplicated alleles appears to be thorough (though haplogroup data are not available). We extracted population data from 25 such studies, comprising 6,614 chromosomes, and containing 15 DYS448 deletions and five duplications (Supplementary Table S2). This brings the total sample size to 9,916, containing 41 deletion chromosomes and 14 duplication chromosomes. Deletion chromosomes are strikingly over-represented in Asia, at 0.7% (n=5,776), compared to another large sample, Europe, at only 0.03% (n=2,650). For duplication chromosomes, we see 0.8% in Africa (n=898), compared to 0.1% or less elsewhere. This prevalence of duplications or deletions in certain regions could reflect the presence of founder lineages such as those represented by the duplications observed within haplogroups E1 in Africa and G in Asia, and the class II deletions within hg C3*(xC3a,C3c).
However, DYS448 rearrangements are not only over-represented in terms of the number of chromosomes, but also in the number of independent mutation events that underlie them. It is possible that some haplogroups carry a structural predisposition to, or protection against, rearrangements in the proximal AZFc region. To test this, we considered the population samples for which haplogroup frequency data were available (Supplementary Table S2; P.B., D.C.-S., E.J.P., unpublished observations), allowing comparisons of the frequencies of DYS448 deletion/duplication events in 2,995 chromosomes, belonging to 18 different haplogroups. A total of three haplogroups showed significant over- or underrepresentation of rearrangement events. We observe no DYS448 deletion/duplication events among 783 haplogroup R chromosomes, which is significantly fewer than expected by chance (Fisher's exact test: P=0.027), suggesting that this lineage could be structurally protected against them. The reference sequence, which is the only basis we currently have to propose putative mechanisms for deletions and duplications, belongs to this haplogroup—reinforcing the difficulty of interpreting structural variation on this most dynamic of human chromosomes. In contrast, 60 haplogroup G chromosomes include two rearrangement events (P=0.027) and 97 haplogroup C chromosomes include three events (P=0.007). These significant overrepresentations of DYS448 rearrangement events suggest that sequence structures in these lineages could predispose them to mutation. Associations to haplogroups remain significant after correction for multiple testing (n=3; threshold=0.0416) using the false discovery rate [Benjamini and Hochberg, 1995].
Using a combination of deletion mapping and haplotyping, we have ascertained and characterized nine independent deletion events that encompass the microsatellite DYS448, and extend into the proximal part of the AZFc region. The nine events correspond to six physically distinct deletion classes, attesting to the complexity of this region of the Y chromosome, and cautioning against simple interpretations of deletion data in terms of probable NAHR without careful mapping.
These rearrangements are by no means the only examples of polymorphic AZFc structures in apparently normal males. Previous work identified recurrent deletions and duplications involving the marker 50f2/C [Jobling et al., 1996], which lies within the u3 element, ~520 kb distal to DYS448 (Fig. 1a). Inclusion of 50f2/C in our deletion mapping allows clarification of the relationships between these sets of chromosomes. Some chromosomes lack both markers: the b1/b3 and non-b1/b3 class II DYS448 deletion chromosomes correspond to the previously defined 50f2/C deletion classes 2L (e.g., 448del1 [m38]) and 10L (448del6 [m252]). However, typing of DYS448 (data not shown) demonstrates that all other 50f2/C-deletion classes actually possess a DYS448 allele. In these cases loss of 50f2/C cannot be caused by b1/b3 deletion; most have been previously shown to carry different rearrangements, such as a g1/g3-mediated deletion following a b2/b3-mediated inversion [Fernandes et al., 2004].
Considering the smallest extents of all DYS448 deletion classes except class V, the repeated nature of the genes in the proximal AZFc region means that deletion results in a reduction in copy number, rather than a complete elimination, of any gene family. However, larger extents of three deletion classes, and any extent of the class V deletion, would remove both copies of the PRY gene. Reduction of gene copy number within this region, including the loss of two copies of DAZ, may predispose carriers to spermatogenic failure. Sperm count data are unavailable for the individuals we have studied, but for many cases we know whether or not they have fathered offspring (Table 1). This information shows that 448del1, who carries a b1/b3 deletion, and 448del4, who carries a class I deletion, are both fertile. The discovery of a set of chromosomes (class II) that share a deletion by descent and have a TMRCA of 2,900±766 years, shows that the deletion they carry is certainly compatible with male fertility, and this is supported by direct information demonstrating fertility in 12 out of 18 cases. Since this deletion class is relatively common among Central Asians, a directed study of any subtle effects on sperm count, and more detailed physical mapping, could be undertaken in a newly recruited cohort from this region. Class III, IV, and V deletions are present as singletons, and there is no information about fertility, so it remains possible that spermatogenesis is impaired in these cases. Nonetheless, there is no evidence from our study that deletions of parts of the proximal AZFc region are deleterious.
There is less reason to suppose that duplications might affect sperm count, unless gene dosage is critical, which seems improbable. Duplication chromosomes are underascertained, since their discovery relies on a length mutation having occurred at one copy of the duplicated microsatellite of interest; the probability of this depends on the mutation rate of the microsatellite, and the number of generations that the chromosome has persisted in the population [Bosch and Jobling, 2003]. Given the mutation rate of microsatellites (typically ~2 × 10⁻³ per locus per generation [Gusmão et al., 2005]), any duplication chromosome carrying two different-length alleles is likely to have been present in the population for a number of generations, and is therefore at least compatible with male fertility [Bosch and Jobling, 2003]. The four duplications within hg G, and the three within hg E1, are identical by descent, and provide further evidence that these rearrangements persist through the generations, and probably represent neutral copy-number variants. Direct information on fertility for the hg G duplication supports this.
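The dependence of ascertainment on mutation rate and persistence time can be illustrated with a deliberately simplified model. It assumes that the two copies of the duplicated microsatellite mutate independently at the same rate in every generation, and it computes only the probability that at least one length mutation has occurred; because later mutations can restore equal lengths, this is an upper bound on detectability rather than an exact value. The function name and the choice of 100 generations are hypothetical.

```python
def prob_any_mutation(mu_per_locus, generations, n_copies=2):
    """Probability that at least one copy has mutated at least once.

    Simplifying assumptions: copies mutate independently at a constant
    rate, and restoration of equal allele lengths is ignored.
    """
    no_mutation = (1.0 - mu_per_locus) ** (n_copies * generations)
    return 1.0 - no_mutation


# Illustrative: typical rate of 2e-3 per locus per generation, 100 generations.
p = prob_any_mutation(mu_per_locus=2.0e-3, generations=100)
print(f"{p:.2f}")
```

Under this model a duplication that has persisted for 100 generations would have roughly a one-in-three chance of carrying distinguishable alleles, which is consistent with the expectation that recently arisen duplications are largely invisible to length-based typing.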
Rearrangements such as those we describe here appear to owe their heterogeneous population frequencies to two major factors. First, having once occurred, a neutral rearrangement can be successfully propagated in a particular population due to its social organization. The overrepresentation of particular haplotype clusters in Asia is already well established: two distinct examples within hg C3—the “Manchu” [Xue et al., 2005] and “Khan” [Zerjal et al., 2003] haplotypes—have been ascribed to social selection in the history of dynastic lineages. Cultural transmission of fertility [Heyer et al., 2005] has ensured that these lineages persist at high frequencies today. Second, haplogroups in general are highly geographically differentiated, and some haplogroups appear to be either predisposed to, or protected against, rearrangements.
The advantage of using microsatellites to detect deletions and duplications is the very large number of chromosomes that can be surveyed, allowing the detection of rare rearrangements. As the number of markers [Kayser et al., 2004] and the size of databases [Willuweit et al., 2007] increases, it seems likely that most regions of the chromosome will effectively have been surveyed in most populations, allowing an unbiased assessment of structural polymorphism on this highly variable chromosome.
We thank all DNA donors, Sjoerd Repping for information about the phenotype of the b1/b3 deletion male WHT3453, and Matt Hurles for primers. M.A.J. is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant number 057559); P.B., G.R.B., and C.T.S. were supported by the Wellcome Trust. E.J.P. and D.R.C.S. were supported by the Arts and Humanities Research Council and the EC Sixth Framework Programme under Contract no. ERAS-CT-2003-980409, within the framework of the European Science Foundation EUROCORES programme “The Origin of Man, Language and Languages.” P.dK. was supported by the Netherlands Organization for Scientific Research (NWO) project 231-70-001 within the same EUROCORES programme. I.N. and M.S. were supported by the Max Planck Society.
Grant sponsor: Max Planck Society; Grant sponsor: Wellcome Trust; Grant number: 057559; Grant sponsor: Arts and Humanities Research Council and the EC Sixth Framework Programme; Grant number: ERAS-CT-2003-980409; Grant sponsor: Netherlands Organization for Scientific Research (NWO); Grant number: 231-70-001.
The Supplementary Material referred to in this article can be accessed at http://www.interscience.wiley.com/jpages/1059-7794/suppmat.