Isolated populations have a long history in genetic mapping studies of inherited disorders, with advantages that include reduced environmental, phenotypic and genotypic heterogeneity compared with outbred populations (1–9). In particular, the reduced genotypic heterogeneity of isolated populations, which results from founder effects, bottlenecks and genetic drift, may allow otherwise rare mutant alleles to rise to a higher frequency in these populations while at the same time narrowing the spectrum of candidate mutations (10,11). The increased chance of homozygosity has been a key factor in identifying mutations responsible for rare monogenic diseases in isolated populations (3,12,13). Here we investigate the advantages and limitations of exploiting homozygosity in isolated populations for the analysis of alleles affecting complex traits (14–16). Genome-wide association studies (GWAS) performed in outbred populations have thus far identified many common variants contributing to complex diseases (17). One limitation of these studies is that rare variants that are not in linkage disequilibrium with the assayed common variants are usually not detected (18). However, when the populations analyzed in a GWAS include a substantial number of individuals who share a recent common ancestor, not only common variants (19,20) but sometimes also rare variants affecting complex disorders can be identified (21–23).
Genetic analysis of isolated populations poses unique challenges for traditional GWAS methods, which have mainly focused on outbred populations (24). The crux of the difference is that in isolated populations the probability that any two individuals are related is not negligible. The resulting direct and cryptic relatedness violates assumptions of independence between the genotypes of different individuals, as well as between their heritable phenotypes. Isolated populations therefore contain a large amount of cross-individual correlation, which poses problems for most mapping methods. Further, consanguinity may be present, so that the two alleles carried by a random individual may also be correlated. This hidden correlation in the data can cause overdispersion of the naïve association test scores and, consequently, false positive associations. Hence, the major challenge in performing GWAS in isolated populations is to account for this non-random intra- and inter-individual correlation.
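
The overdispersion of naïve test scores described above is often quantified with a genomic inflation factor, the ratio of the observed median test statistic to the median expected under the null. The sketch below is illustrative only: the function name `inflation_factor`, the simulated statistics and the 1.3-fold inflation are assumptions made for this example and are not taken from the study.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Ratio of the observed median 1-df chi-square statistic to its null median.

    Values near 1 suggest well-calibrated tests; values well above 1
    suggest overdispersion, for example from unmodeled relatedness.
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

# Simulated null statistics: squares of standard normal draws are chi-square (1 df).
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]

# Hypothetical overdispersed statistics: every score scaled up by 30%.
inflated_stats = [1.3 * s for s in null_stats]

print("null lambda:    ", round(inflation_factor(null_stats), 3))
print("inflated lambda:", round(inflation_factor(inflated_stats), 3))
```

In this toy setting the inflation is a uniform rescaling, so the second value is 1.3 times the first; real data with cryptic relatedness need not inflate every statistic equally.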
Although standard association tests assume independence of genotypes and phenotypes across samples, several specialized approaches do account for the underlying relatedness expected in isolated populations. In this work, we set out to select and evaluate different methods for mapping complex traits in an isolated founder population. Many methods rely on knowledge of the underlying family structure in the population. These approaches overcome the confounding effects of non-random correlation by deconstructing the population into family units and analyzing association independently within each unit (within-family variance) (25–27). Some of these ‘family-based’ methods can be extended by adding the variance between families to the within-family variance (28–32). A different, ‘population-based’ type of approach does not use prior knowledge of family structure, but rather explicitly models the relatedness between all pairs of individuals based on their genotypes, and incorporates this variance into a mixed model for association (30,33–38). Such models have recently been extended to genome-scale human studies and have been shown to control effectively for population structure (37,38). Here our emphasis is on the evaluation of such methods in the context of extensive relatedness in study samples. We compared the performance of four representative methods that account differently for relatedness while testing for genome-wide association: (i) focusing on allele transmission to offspring within families (25); (ii) measuring association within as well as between families (28,29,32); and finally (iii) capturing the relatedness between all individuals in the population to construct a mixed model to test for association (37). Our simulations showed the mixed-model method to have increased statistical power to detect association, offering a 1.8–2-fold improvement over the family-based approaches.
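
To make concrete what modeling relatedness between all pairs of individuals from their genotypes can look like, the sketch below computes a genotype-based relatedness matrix as the average cross-product of standardized allele counts. This is one common estimator, shown under stated assumptions; it is not necessarily the estimator used by the mixed-model method cited above (37). The function name `relatedness_matrix` and the toy genotypes are hypothetical.

```python
import math

def relatedness_matrix(genotypes):
    """Estimate pairwise relatedness from genotypes.

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2),
    one entry per SNP. Returns an n x n matrix (list of lists) holding the
    mean cross-product of standardized genotypes over polymorphic SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[] for _ in range(n)]
    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        p = sum(column) / (2 * n)  # sample allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP: no information about relatedness
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            standardized[i].append((column[i] - 2 * p) / sd)
    used = len(standardized[0])
    if used == 0:
        raise ValueError("no polymorphic SNPs available")
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[k])) / used
         for k in range(n)]
        for i in range(n)
    ]

# Toy data (assumed): individuals 0 and 1 carry identical genotypes,
# individual 2 carries the opposite alleles at most SNPs.
toy = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 1, 0, 1, 2, 0],
]
K = relatedness_matrix(toy)
for row in K:
    print([round(x, 3) for x in row])
```

In a mixed model, a matrix of this kind would typically specify the covariance of a random effect, so that phenotypic similarity attributable to shared ancestry is not mistaken for association at the tested SNP.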
As a proof of principle, we then used the mixed-model method for genome scans of metabolic traits and electrocardiographic measures in 2906 related individuals from the Island of Kosrae, Federated States of Micronesia, who were previously genotyped for over 350 000 SNPs. We reanalyzed data for 17 phenotypes previously studied in this cohort (19,39), along with 8 additional phenotypes. As positive controls, we observed nine genome-wide significant associations at known loci for measured levels of plasma cholesterol, high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TGs), thyroid stimulating hormone (TSH), homocysteine (HOMO), C-reactive protein (CRP) and uric acid, only one of which was detected in the previous analysis of the same traits (19). In addition, we refined a broad signal peak for uric acid levels (chr11:59.1–65.3 Mb) by analyzing the identity-by-descent (IBD) shared genetic segments that underlie the peak, dissecting the full set of long-range shared haplotypes in this region. We identified a single haplotype, with a 3% carrier frequency, that accounted for all the signal in the region, and replicated one of the two previously known signals, refining that signal by a factor of 4. Finally, we show a region of novel association for height (HGT) (rs17629022,
P < 2.1 × 10⁻⁸).