If we want to know whether a genetic variant influences a phenotype of interest, e.g. sperm count, a standard approach is to measure the frequency of the variant in samples of individuals who differ in the phenotype. If we found, say, 40% A allele in the men with high sperm count and 70% in the men with low sperm count, we might want to conclude that the A allele marked a genetic background that led to low sperm count. But we should be very cautious before coming to this conclusion: the two samples might differ for other reasons, for example, if they come from different geographical regions. This is known as ‘population stratification’. Strong geographical differentiation is the important characteristic that makes the Y chromosome so popular for evolutionary studies, noted above, but it also makes association studies involving the Y chromosome fraught with difficulty. The magnitude of this effect is illustrated by a paper published in 1999 which investigated the association between Y haplogroup and infertility in Italy (Previderé et al., 1999). In the raw data, haplogroup P was present at 42% in the controls but 24% in the infertile men, a statistically significant difference. But the infertile men were mostly sampled in Central Italy, whereas the controls were from several parts of Italy. When only Central Italians were considered, the frequencies were 27% and 26%, respectively, a non-significant difference. This is a far greater degree of geographical differentiation than detected with ~10 000 autosomal SNPs (Bauchet et al., 2007), rendering ineffective one of the recommended methods of correcting for stratification, genomic control. How then can the careful investigator of Y-chromosomal associations produce reliable results? Precise geographical matching is essential and replication in an independent sample is also necessary.
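The way stratification can create a spurious association may be easier to see with a small numerical sketch. The Python example below uses purely hypothetical counts (they are not the data of Previderé et al., 1999, which are given here only as percentages) in which haplogroup P is rarer in Central Italy than in a hypothetical "elsewhere" stratum, and in which the infertile men come mostly from Central Italy. The function name `chi_square_2x2`, the stratum labels and all counts are assumptions made for illustration; the test is a plain Pearson chi-square without continuity correction.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# HYPOTHETICAL counts per region, chosen only to mimic the pattern in the text:
# (infertile with P, infertile without P, controls with P, controls without P)
counts = {
    "central": (26, 74, 27, 73),    # similar P frequency in both groups
    "elsewhere": (5, 5, 55, 45),    # few infertile men sampled, P more common
}

# Pooled analysis: ignore geography and add the strata together.
pooled = [sum(region[i] for region in counts.values()) for i in range(4)]
pooled_stat, pooled_p = chi_square_2x2(*pooled)

# Geographically matched analysis: Central Italians only.
central_stat, central_p = chi_square_2x2(*counts["central"])

print(f"pooled:  chi2 = {pooled_stat:.2f}, p = {pooled_p:.3f}")
print(f"central: chi2 = {central_stat:.2f}, p = {central_p:.3f}")
```

With these illustrative numbers the pooled comparison crosses the conventional 5% significance threshold while the geographically matched comparison does not, even though haplogroup P has no effect within either stratum.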
Many studies have sought to identify Y-chromosomal influences on spermatogenic failure, sperm count or male infertility. Some have taken the approach of comparing haplogroups between relevant samples and reported no effect (Paracchini et al., 2002; Lu et al., 2007) or significant differences (e.g. Kuroki et al., 1999; Krausz et al., 2001; Yang et al., 2008), although such differences have not always been replicated (Carvalho et al., 2003). Others have compared Y variants that alter the gene content, particularly partial deletions of the AZFc region (Repping et al., 2003 and many subsequent studies), a complex field that will be covered in another minireview in this series.
Another set of studies has investigated the effect of haplogroup background on microdeletion frequency. Here, the microdeletions themselves almost always lead to spermatogenic failure; the question is whether the mutations that produce such deletions occur at different rates on different lineage backgrounds. Studies using pooled samples from several European regions (Paracchini et al., 2000; Quintana-Murci et al., 2001) or Israel (Carvalho et al., 2004) detected no effect; but a study that carefully matched controls and deletion carriers from the same part of Italy (Arredi et al., 2007) found an increased susceptibility to AZFc microdeletion in one lineage, haplogroup E, while a study of samples from Sichuan (Southwest China) reported an increased frequency in O3* (Yang et al., 2008). These last two investigations therefore meet one of the criteria for demonstrating an effect, precise geographical matching, but now need to be replicated in independent samples.