The rapid development of next-generation sequencing (NGS) technology has led to renewed interest in the potential contribution of rarer forms of genetic variation to complex, non-Mendelian phenotypes, such as psychiatric illnesses. Although challenging, family-based studies offer some advantages, especially in communities with large families and a limited number of founders. Here we revisit family-based studies of mental illnesses in traditional Amish and Mennonite communities -- known collectively as the Plain people. We discuss the new opportunities for NGS in these populations, with a particular emphasis on investigating psychiatric disorders. We also address some of the challenges facing NGS-based studies of complex phenotypes in founder populations.
The increasing power and decreasing cost of genome sequencing have fueled widespread interest in using NGS technology to discover rare alleles that contribute to risks for human illnesses. NGS studies have already identified many rare variants involved in single-gene mendelian disorders 1, 2, 3 and 4 as well as alleles involved in rare and genetically heterogeneous conditions such as brain malformations. The discovery of rare copy-number variants conferring high risk for disorders such as autism and schizophrenia has also led to a growing recognition of the importance of rare forms of genetic variation in psychiatric illnesses. Variants with low or moderate allele frequency may confer greater risk for disease than common variants because high-risk alleles that contribute to early-onset diseases may reduce reproductive fitness and be driven to low frequencies by purifying selection. High-risk alleles can be very informative because they implicate genes that may play a more important etiologic role than low-risk variants. Such genes provide excellent targets for functional genetic studies and new therapies. Although each rare high-risk allele can account for little heritability, sets of high-risk alleles could collectively account for substantial fractions of the heritability of a common disease and may act on a polygenic background to modulate clinical features.
As high-throughput sequencing rapidly advances, whole-genome sequencing (WGS) is growing in popularity relative to exome sequencing, which captures only the protein-coding ~1% of the genome. WGS requires no capture procedures and provides information on both coding and noncoding regions, including the intergenic regions where much regulatory variation resides. However, WGS is still two- to threefold more expensive than exome sequencing, and it requires more sophisticated quality control, data management, and bioinformatics. Much of the variation identified by WGS is difficult to interpret - but this is changing. For example, the Encyclopedia of DNA Elements (ENCODE) project has begun to shed some light on the functional consequences of genetic variation outside coding regions.
Extended families are a valuable asset for genome sequencing studies. Because the same chromosomal segments recur among relatives, otherwise rare alleles can be observed repeatedly in multiple individuals, reducing false positive findings due to sequencing errors that can be difficult to identify in singletons. Statistical imputation can be used to infer sequence information for relatives who are not sequenced but are genotyped with inexpensive single-nucleotide polymorphism (SNP) arrays, reducing sequencing costs. Families may also provide a genetically more homogeneous set of individuals from whom shared variants conferring risk for disease may be identified. This approach (Figure 1) complements the traditional case-control design that is more often used in sequencing studies and has been so valuable for genome-wide association studies. Extended families that are geographically stable are becoming less common in developed countries, prompting a renewed interest in traditional societies that still produce large families who live together. These isolated founder populations provide a unique opportunity to discover important alleles involved in genetically complex phenotypes such as mental illnesses.
The Plain people of North America are one such population. The Plain people collectively encompass several groups whose ancestors came from the Alsace region, where Switzerland, France, and Germany converge. Persecuted for their rejection of both Catholicism and the Protestant Reformation, they migrated to North America, starting in the early 1700s. A relatively small group of founders settled initially in Pennsylvania, where they enjoyed religious tolerance and liberty, and their descendants have now spread all over the continent. Endogamy is the rule, and very few people join the community from the outside, leading to high rates of consanguinity and an increased incidence of several autosomal recessive disorders.
The largest and best-known Plain people are the Old Order Amish and ‘Horse-and-Buggy’ Mennonites. They typically reject automobiles, public utilities, and all forms of electronic entertainment in favor of a low-technology agrarian lifestyle focused on family and community. The largest Plain populations now live in Holmes County (Ohio), Lancaster County (Pennsylvania), and northern Indiana (Figure 2). With large families, Plain populations are growing and spreading across North America. Amish communities are organized into church districts comprising around 20–30 families. Marriage within the church district is common. Baptism occurs in early adulthood and is voluntary, but ~90% of Amish young people stay in the community of their birth. Strong community ties make possible a system of mutual support and assistance in times of need 12, 13 and 14, reducing stress and social isolation, but also narrowing the range of acceptable expressions of individuality. This creates a unique backdrop in which to study psychiatric disorders and their causes.
In 1964, Victor McKusick (Figure 3) and colleagues published a now-famous paper on medical genetics in the Amish . They noted several reasons why the Old Order Amish would be a ‘useful’ population for genetic studies:
Owing in part to these advantages and in part to Dr McKusick’s personal relationship with the Amish, a long list of mendelian conditions has been described and many have been solved genetically. A quick search of Online Mendelian Inheritance in Man reveals over 50 examples. These include diseases rarely seen outside the Amish community, such as cartilage-hair hypoplasia (#250250), Ellis-van Creveld syndrome (#225500), and several inborn errors of metabolism, as well as more widespread diseases such as ataxia-telangiectasia (#208900), progressive familial intrahepatic cholestasis (#211600), and several types of muscular dystrophy. This work continues at centers such as The Clinic for Special Children in Strasburg, PA, where treatments for some of these inherited disorders are beginning to be developed 16, 17 and 18.
Progress in the study of non-mendelian disorders has been more limited, mirroring the slower progress in other populations for these more challenging disorders. One clear success is the identification of a common allele in the gene encoding a cytochrome P450 enzyme (CYP2C19) that has an impact upon response to anti-platelet therapy in Amish patients with heart disease . Importantly, this gene may also have relevance for anti-platelet therapy in other populations.
The culture and lifestyle of the Amish and other Plain peoples can also create difficulties for research. Most Old Order Amish communities eschew electronic forms of communication, requiring researchers to find and evaluate potential volunteers in the field. The total Amish population is about 275 000, which limits the number of potential volunteers, even with relatively efficient case-finding methods. German and Swiss dialects are spoken at home (although English is taught at school) and formal schooling usually ends after the 8th grade. This creates educational and language barriers that can complicate informed consent, volunteer enrollment, and evaluation, especially for psychiatric disorders. ‘Horse-and-Buggy’ and other Old Order Mennonites are less genetically homogeneous and more mobile than the Amish, but they share a common ancestry and can be easier to reach and diagnose, adding to the pool of potential study participants. Other Plain populations, such as the Hutterites, are small and typically live in remote areas, limiting access.
The history of Amish and Mennonite settlements in the US is well known. Extensive genealogical records have been kept by the Plain people, and an electronic genealogical database is curated at the NIH. However, few studies have examined the precise genetic relationships between the Amish in Lancaster County and those who arrived later and settled in the Midwest or elsewhere in North America. One thing is clear, however: each of the Plain populations is based on a small number of founding couples, who themselves represented a fraction of the ancestral population in Europe.
These facts account for two of the major population genetic forces that have shaped the Plain populations: founder effects and genetic drift. The founder effect is the concentration of unique, private, or otherwise low-frequency alleles in isolated populations whose ancestry goes back to a few founding individuals, who presumably carried those same alleles. Founder effects and endogamy explain much of the increased autosomal recessive disease burden long observed among the Plain people. Genetic drift refers to the random fluctuations in allele frequencies that occur in relatively small, isolated populations. Genetic drift alone can create large differences in allele frequencies, complicating the identification of disease alleles in founder populations. Footprints of these same population dynamics probably exist in everyone , albeit obscured by mixing between populations.
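The scale of drift in a small founder population can be illustrated with a minimal neutral Wright-Fisher simulation. The sketch below is illustrative only: the population sizes, starting allele frequency, and number of generations are hypothetical values chosen for the example, not estimates for any Plain community, and the model assumes a constant population size, random mating, and no selection.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=0):
    """Neutral Wright-Fisher drift at one biallelic locus.

    pop_size is the number of diploid individuals, so each generation
    contains 2 * pop_size allele copies. Returns the allele frequency
    in every generation, starting with start_freq.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Every allele copy in the next generation is sampled independently
        # from the current generation (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
    return trajectory


# Hypothetical scenario: an allele at 5% frequency followed for 12 generations
# in a small isolate versus a much larger population.
small = simulate_drift(pop_size=100, start_freq=0.05, generations=12)
large = simulate_drift(pop_size=10_000, start_freq=0.05, generations=12)
print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

Rerunning with different seeds gives a feel for the spread: in the small population the final frequency tends to wander far from the starting value, and the allele may be lost altogether, whereas in the large population it tends to stay close to 5%.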
One unique advantage of inbred populations for disease gene mapping is the phenomenon of autozygosity. Inbreeding can increase the incidence of rare recessive disorders. Each isolated population has its own genetic load that depends on which alleles were present in the founders or arose through new mutations. Rare alleles that act recessively to influence disease risk tend to reside within longer stretches of homozygosity that can be easily mapped using standard SNP arrays. This strategy has been used successfully to identify causal alleles for several autosomal recessive disorders prevalent in Plain populations . Although common psychiatric disorders do not exhibit obvious recessive inheritance, recessive alleles may play a role as components of the risk. For example, excess runs of homozygosity have been reported in the mentally-ill offspring of first-cousins in Japan .
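As a rough illustration of how long homozygous stretches can be picked out of SNP-array genotypes, the sketch below scans one chromosome for runs of consecutive homozygous calls. It is a deliberately simplified version of the idea, not the method used in the studies cited above: the function name, thresholds, and toy data are assumptions made for the example, and established tools additionally tolerate occasional heterozygous or missing calls and take SNP density into account.

```python
def find_homozygous_runs(positions, genotypes, min_snps=50, min_length_bp=1_000_000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    positions: sorted base-pair coordinates of SNPs on one chromosome.
    genotypes: alternate-allele counts per SNP (0 or 2 = homozygous,
               1 = heterozygous, None = missing). Anything that is not
               0 or 2 ends the current run in this simplified version.
    """
    runs = []
    run_start = None  # index of the first SNP in the current run
    # A trailing heterozygous sentinel closes a run that reaches the last SNP.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            n_snps = i - run_start
            start_bp, end_bp = positions[run_start], positions[i - 1]
            if n_snps >= min_snps and end_bp - start_bp >= min_length_bp:
                runs.append((start_bp, end_bp, n_snps))
            run_start = None
    return runs


# Toy data (hypothetical): eight SNPs, with thresholds lowered to match.
toy_positions = [100, 200, 300, 400, 500, 600, 700, 800]
toy_genotypes = [0, 2, 2, 0, 1, 2, 2, 1]
toy_runs = find_homozygous_runs(toy_positions, toy_genotypes,
                                min_snps=3, min_length_bp=200)
print(toy_runs)
```

In the toy data the first four SNPs form a run that passes both thresholds, while the later two-SNP run is discarded as too short.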
The Amish and other Plain people are well suited for research approaches based on large kindreds. Large families increase the opportunities for ascertaining extended relatives with the same disorder. Because most Plain people marry within their community, even individuals belonging to different nuclear families are often related, forming an extended kindred ideal for genetic studies. Furthermore, because the Plain communities were each founded by relatively few original settlers, genetic diversity is reduced, which may mean smaller sets of risk alleles involved in common disorders. Individuals also tend to stay in the area where they were born, facilitating longitudinal and offspring studies 25 and 26.
Although their large sizes offer many advantages for genetic research, Plain kindreds also tend to be very complex. Individuals are usually related in several ways, creating so-called ‘inbreeding loops’ that complicate traditional segregation and linkage analyses. Dense SNP arrays enable alternative approaches that rely on detecting long chromosomal segments inherited from a common ancestor. These identity-by-descent (IBD) methods are discussed next.
One of the big challenges for studies based on large-scale sequencing is the enormous number of variants that are present. Each individual is likely to carry many thousands of variants. Of these, some 10 000 are predicted to be potentially functional based on bioinformatic analysis. Because common diseases are thought to be genetically heterogeneous, many genes are potentially involved, and narrowing down the list of candidate variants to a few ‘causal’ mutations is challenging.
Under many genetic models, causal alleles can be expected to reside within regions of IBD shared by affected relatives. This is because mutations are passed on from previous generations together with a flanking stretch of DNA that gradually diminishes from generation to generation due to recombination. IBD regions represent chromosomal segments that are inherited without recombination from a common ancestor, and therefore all of the variants within IBD regions are expected to be in strong linkage disequilibrium. In distant relatives, IBD regions constitute a small portion of the genome (Figure 4), and can thus greatly reduce the search space for causal alleles . This approach has been used to map both monogenic and complex disorders in the Plain people  and in other founder populations 28 and 29. In complex disorders, not all distantly related cases may share the same causal alleles, but regions of IBD that are shared more often by pairs of distantly related cases are good places to look for causal alleles. Several studies have shown that IBD analysis might be sufficiently robust to detect loci involved in genetically heterogeneous traits where standard genetic linkage analysis has been impossible or ambiguous 30 and 31.
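The way shared segments shrink with genealogical distance can be made concrete with textbook expectations for full cousins in an outbred pedigree. The sketch below is a back-of-the-envelope calculation under stated assumptions: a single shared ancestral couple, no additional inbreeding paths, and crossovers modeled at a rate of one per Morgan per meiosis, which gives a mean segment length of roughly 100/m cM for a path of m meioses. The function names are ours. In endogamous populations, where relatives are connected through several paths, actual sharing would be expected to exceed these values.

```python
def meioses_between_cousins(degree):
    """Meioses on the path linking two full cousins of the given degree
    through one shared ancestor (first cousins: 4, third cousins: 8)."""
    return 2 * (degree + 1)


def expected_ibd_fraction(degree):
    """Expected fraction of the genome at which two full cousins share a
    haplotype IBD, ignoring any additional (inbreeding) paths."""
    m = meioses_between_cousins(degree)
    # The ancestral couple carries 4 haplotypes at each locus; a given one
    # reaches both cousins with probability (1/2) ** m.
    return 4 * 0.5 ** m


def expected_segment_length_cm(degree):
    """Approximate mean length in cM of one shared IBD segment."""
    return 100.0 / meioses_between_cousins(degree)


for degree in (1, 2, 3, 4):
    print(degree,
          expected_ibd_fraction(degree),
          expected_segment_length_cm(degree))
```

Under these assumptions third cousins are expected to share about 1.6% of the genome IBD, in segments averaging about 12.5 cM, which is why IBD regions among distant relatives cover only a small part of the genome while remaining long enough to detect with dense SNP arrays.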
Statistical approaches to identifying IBD are rapidly evolving. It is generally argued in the literature that IBD detection is robust to inbreeding and actually benefits from the long stretches of IBD observed in founder populations. Statistical issues also arise with regard to testing for association of rare variants with phenotypes in families. Traditional family-based association tests are limited by family size, and typically assume that all affected relatives share the same causal allele. Newer methods are under development that can handle more extended pedigree structures and collapse variants in the same gene, thus enabling burden tests that are more powerful for rare allele studies. Such methods generally require sequencing of both affected and unaffected individuals.
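The collapsing idea behind burden tests can be sketched in a few lines: rare variants in a gene are collapsed into a single carrier indicator per person, and carrier counts are compared between affected and unaffected individuals. The example below is a simplified illustration, not one of the pedigree-aware methods referred to above. It uses a one-sided Fisher's exact test, which treats individuals as independent and therefore is not valid for relatives without a correction for relatedness. All identifiers and genotypes are hypothetical.

```python
from math import comb


def collapse_to_carriers(genotypes_by_person, rare_variant_ids):
    """Map each person to True if they carry any rare variant in the gene.

    genotypes_by_person: {person_id: {variant_id: alt_allele_count}}
    """
    return {
        person: any(calls.get(v, 0) > 0 for v in rare_variant_ids)
        for person, calls in genotypes_by_person.items()
    }


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least case_carriers carriers among cases) under the
    hypergeometric null of no association."""
    total_carriers = case_carriers + control_carriers
    n = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    tail = sum(
        comb(total_carriers, k) * comb(n - total_carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(n, n_cases)


# Hypothetical toy data: three rare variants (v1-v3) in one gene.
genotypes = {
    "case1": {"v1": 1}, "case2": {"v2": 1},
    "case3": {"v1": 1, "v3": 1}, "case4": {},
    "ctrl1": {}, "ctrl2": {}, "ctrl3": {"v9": 1}, "ctrl4": {},
}
rare_in_gene = {"v1", "v2", "v3"}
cases = ["case1", "case2", "case3", "case4"]
controls = ["ctrl1", "ctrl2", "ctrl3", "ctrl4"]

carriers = collapse_to_carriers(genotypes, rare_in_gene)
n_case_carriers = sum(carriers[p] for p in cases)
n_control_carriers = sum(carriers[p] for p in controls)
p_value = fisher_one_sided(n_case_carriers, len(cases),
                           n_control_carriers, len(controls))
print(n_case_carriers, n_control_carriers, p_value)
```

Because the test needs carrier counts in both groups, the sketch also makes clear why such methods require sequence data from unaffected as well as affected individuals.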
Several important questions require further study. How closely related should people be when considering IBD sharing? Are close relatives better than distant relatives in some situations? What about clusters of more closely related individuals who share both genes and environment? These very interesting and important questions currently lack a precise answer. In our preliminary data we have found that IBD sharing among the Amish can be reliably measured with SNP arrays in relatives as distant as third cousins, after which it becomes difficult to distinguish from noise (Hou, L. et al., unpublished).
NGS can help fine-map regions of IBD sharing in several ways. First, because it gives complete information on variable sites, NGS can better define the boundaries of shared segments. Second, because it detects alternative alleles that can be annotated for function, NGS helps narrow the focus to the variants most likely to play a causal role in the disease. Third, by allowing nucleotide-level comparisons between segments shared IBD by distant relatives, NGS can help to identify variants that have arisen de novo in recent generations and may contribute to disease penetrance in individuals who carry risk haplotypes (similar to the ‘Clan Genomics’ model discussed in ).
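One simple way to combine IBD sharing with sequence annotation is to keep only those variants that fall inside segments shared by several pairs of affected relatives and that are annotated as potentially functional. The sketch below illustrates this filtering step under stated assumptions: the data layout, the `functional` flag, the `min_pairs` threshold, and the toy coordinates are all hypothetical, and real pipelines would work from called IBD segments and annotated variant files.

```python
def filter_candidates(variants, shared_segments, min_pairs=2):
    """Keep functional variants lying in IBD segments shared by enough pairs.

    variants: list of dicts with keys "id", "chrom", "pos", "functional".
    shared_segments: list of (chrom, start_bp, end_bp), one entry per pair
                     of affected relatives sharing that segment IBD.
    """
    kept = []
    for variant in variants:
        # Count the affected pairs whose shared segment covers this position.
        n_pairs = sum(
            1
            for chrom, start, end in shared_segments
            if chrom == variant["chrom"] and start <= variant["pos"] <= end
        )
        if n_pairs >= min_pairs and variant["functional"]:
            kept.append(variant)
    return kept


# Hypothetical toy data.
toy_variants = [
    {"id": "varA", "chrom": "1", "pos": 1500, "functional": True},
    {"id": "varB", "chrom": "1", "pos": 1800, "functional": False},
    {"id": "varC", "chrom": "1", "pos": 9000, "functional": True},
    {"id": "varD", "chrom": "2", "pos": 1500, "functional": True},
]
toy_segments = [("1", 1000, 2000), ("1", 1200, 2500), ("1", 8000, 9500)]

candidate_ids = [v["id"] for v in filter_candidates(toy_variants, toy_segments)]
print(candidate_ids)
```

Raising `min_pairs` or tightening the functional annotation shortens the candidate list, at the cost of possibly discarding causal alleles that are shared by only some of the affected relatives.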
We have thus far enumerated several advantages of applying NGS methods to Plain populations, including (i) a homogeneous environment, which increases the proportion of phenotypic variance due to genes, (ii) long stretches of linkage disequilibrium, which facilitate the detection of chromosomal segments shared IBD, (iii) enrichment for recessive alleles, and (iv) presumably reduced heterogeneity of causative alleles owing to bottlenecks and founder effects. Although these characteristics make the Plain people a useful population for many complex human diseases, we argue that there are additional features that make this population promising for studying mental illnesses.
For example, the generally stable and predictable rural lifestyle, together with low rates of substance abuse, reduces non-inherited influences on thoughts and behavior that can complicate diagnosis. Family informants are abundant, available, and cooperative, often providing crucial diagnostic information. Inpatient care is usually provided by a relatively small network of psychiatric hospitals, facilitating focused ascertainment. Furthermore, because of the close-knit community structure, aberrant behavior rarely goes undetected.
There is a long history of bipolar disorder studies among the Amish in Lancaster County, PA 37 and 38. The genetic component of these studies generally used traditional linkage methods 39, 40 and 41. Although important in their time, these studies encountered many of the same problems as other linkage studies of complex phenotypes, such as small sample size, low marker density across the genome, and lack of mature statistical tools to handle linkage analysis in pedigrees with very complex structures. Some intriguing recent studies have taken fresh approaches. For example, copy-number variants that may have contributed to illness in a large multiplex Old Order Amish pedigree were recently examined. Another study examined circadian rhythms - known to be disturbed in individuals with bipolar disorder - in cultured cells derived from Amish cases and matched controls. Sample size remains a limiting factor in all of these studies, however, and replication will be critical.
The presentation of major psychiatric disorders among the Amish seems to be generally similar to that in other North American populations. The published literature is dominated by Egeland’s pioneering work on mood disorders among the Old Order Amish in Lancaster County in the late 1900s (Figure 5). Despite an over-representation of males among identified cases (bipolar disorder usually shows no sex differences in prevalence), these studies found that clinical symptoms, age at onset, and illness course were all broadly consistent with mood disorders in non-Amish populations. Our preliminary observations among the Midwestern Amish in Ohio and Indiana generally agree with this literature, although we observe lower rates of psychosis and comorbid substance-use disorders among the Amish than in other populations (Kassem, L. et al., unpublished data).
Studies of psychiatric disorders among the Plain people also face several challenges. Large epidemiologic studies of psychiatric disorders have not, to our knowledge, been performed in this population, and therefore the true prevalence and geographic distribution of disorders are unknown. Diagnostic instruments and criteria have generally been validated in people drawn from modern industrialized societies, where thoughts and emotions are often expressed in different ways. Intensive social supports among the Plain people may ameliorate impairment and disability, but also obscure typical yardsticks of impairment and incapacity. For example, bipolar disorder (especially the episodes of elevated mood and overactivity known as mania) can be more difficult to diagnose where access to credit cards, alcohol, and automobiles is limited, protecting manic patients from some of the most serious consequences of their behavior.
The Amish and other Plain people have long contributed to our understanding of rare genetic disorders. With the advent of NGS we are now able to test the proposition that those populations may also have much to teach us about common non-mendelian disorders, and in particular psychiatric disorders. Although too few suitable cases may exist for powerful studies of very rare risk alleles, a problem which applies mainly to uncommon illnesses, this difficulty should fade over time as the Plain population continues to expand rapidly, and new cases arise in each generation. Other challenges include causal alleles that may be too rare or psychiatric disorders that are too heterogeneous to enable true discoveries in existing sets of cases. The best way to approach this problem is to increase sample size. The relative homogeneity of Plain populations facilitates increasing sample size without a proportional increase in heterogeneity, but genetic differences between geographically separated populations need to be considered. In addition, not all distant relatives may share causal alleles. In that case, researchers can return to the original probands and collect more closely related cases, or adapt one of the collapsing methods that consider the burden of deleterious alleles within a gene or group of genes, with appropriate corrections for relatedness. Another potential pitfall is that candidate variants may be unique to the Plain population and may not be seen again in outbred case-control samples. In that situation, it may still be valuable to resequence in other people the same genes that contain rare variants associated with disease in the Plain population. Even though the same rare variants will probably not be present in outbred populations, the variants identified in the Plain populations may mark a gene involved in the disease. If so, then study of that gene in outbred populations with the same disease should uncover additional causative variants.
Finally, the list of candidate variants may be too long, even after filtering and bioinformatic annotation. This can be addressed by increasing the stringency of bioinformatics filters, considering convergent data from other kinds of studies (e.g., differential gene expression), and expanding the sample to include more distantly related individuals. Of course, high-risk alleles for common psychiatric disorders may simply not exist. In that case, the data collected from the Plain population will still be valuable for studying the phenotypic effects of common low-risk alleles in extended kindreds.
Most of these pitfalls apply equally to any population in which common complex diseases are studied. The unique history, genetic structure, and relative accessibility of the Plain people for medical research present grounds for optimism that significant progress can be made. Psychiatric genetic studies may particularly benefit from the homogeneous environment, long stretches of IBD, enrichment for recessive alleles, reduced genetic heterogeneity, close-knit families, and lower prevalence of substance abuse among the Plain people. Innovation in human genetics is often driven by new technologies, such as NGS. The application of these new technologies to well-studied populations with a record of supporting progress in research may be the next innovative step that advances this challenging field.
Supported by the NIMH Intramural Research Program. We thank Daniel Weaver (Holmes County, OH) and Dr Richard Stevick (Messiah College, Lebanon, PA) for critical review of the manuscript.