|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: HL DZ. Performed the experiments: AH EP. Analyzed the data: ML DZ TW. Wrote the paper: HL DZ ML.
Stochastic processes and imprinting, along with genetic factors, lead to monoallelic or allele-biased gene expression. Stochastic monoallelic expression fine-tunes information processing in immune cells and the olfactory system, and imprinting plays an important role in development. Recent studies suggest that both stochastic events and imprinting may be more widespread than previously considered. We are interested in allele-biased gene expression occurring in the brain because parent-of-origin effects suggestive of imprinting appear to play a role in the transmission of schizophrenia (SZ) and autism spectrum disorders (ASD) in some families. In addition, allele-biased expression could help explain monozygotic (MZ) twin discordance and reduced penetrance. The ability to study allele-biased expression in human neurons has been transformed with the advent of induced pluripotent stem cell (iPSC) technology and next generation sequencing. Using transcriptome sequencing (RNA-Seq) we identified 801 genes in differentiating neurons that were expressed in an allele-biased manner. These included a number of putative SZ and ASD candidates, such as A2BP1 (RBFOX1), ERBB4, NLGN4X, NRG1, NRG3, NRXN1, and NLGN1. Overall, there was a modest enrichment for SZ and ASD candidate genes among those that showed evidence for allele-biased expression (chi-square, p=0.02). In addition to helping explain MZ twin discordance and reduced penetrance, the capacity to group many candidate genes affecting a variety of molecular and cellular pathways under a common regulatory process – allele-biased expression – could have therapeutic implications.
Stochastic and imprinted monoallelically expressed genes influence differentiation, development and cellular function. Imprinted genes are expressed in a parent-of-origin manner, whereas in stochastic monoallelic expression either the maternal or paternal allele is active in a given cell. Imprinted genes are marked during gametogenesis by differential methylation at CpG-rich islands and by chromatin modifications, and typically maintain their transcription-competent state after fertilization –. Some genes, however, acquire parent-of-origin imprints post-fertilization , . Approximately 100 imprinted genes have been identified . Whole genome expression studies expanded the family of parent-of-origin gene expression in the brain, although a subsequent re-analysis has shown that many imprinted gene calls were likely false positive findings and needed independent confirmation –. Imprinting plays a key role in some neuropsychiatric conditions, such as Prader-Willi Syndrome and Angelman Syndrome . Parent-of-origin effects have also been observed in some families with schizophrenia (SZ), bipolar disorder (BD) and autism spectrum disorders (ASD) –. In addition, some investigators have suggested that genetic imprinting influences clinical phenotype, with an imbalance between the effects of paternally and maternally expressed genes in the developing brain resulting in an extreme paternal (ASD) or maternal (SZ) pattern of behavior , , .
Stochastic monoallelic expression is widespread in mammalian genomes. Examples include T-cell receptor and immunoglobulin genes, pheromone receptors, p120 catenin, odorant receptors, and the 5q31-linked PCDH family of protocadherins –. Recently, Gimelbrant et al. showed that random monoallelic expression affected nearly 10% of genes expressed in lymphoblasts, and similar to some imprinted genes, there was a degree of plasticity in that biallelic expression was observed in some clones .
Stochastic monoallelic expression in the brain could help explain some interesting epidemiological features of neuropsychiatric disorders, such as discordance in monozygotic (MZ) twins, where a range of ~30–90% has been found in SZ, ASD and BD (see discussion) –.
Two experimental tools have emerged that provide the means to evaluate the role of allele-biased expression in neuronal differentiation and neuropsychiatric disorders; iPSC technology, and next generation sequencing (RNA-Seq). We, along with other groups, are using iPSCs for in vitro disease modeling in a variety of neuropsychiatric disorders –. In addition to their utility for disease modeling in terms of identifying patient vs control differences in gene expression, morphology and neuronal function, iPSCs can also be used to study human neurogenesis, which is particularly relevant to SZ and ASD considering that both have a neurodevelopmental basis –.
RNA-Seq provides increased sensitivity and the capacity to detect novel transcripts –. It is also an ideal platform to assess allele-biased expression by quantifying differences in expression that might occur across heterozygous single nucleotide polymorphisms (SNPs) . We have used this approach to screen for allele-biased expression in differentiating human neurons. The findings highlight the degree to which allele-biased expression occurs during human neurogenesis and suggest a plausible mechanism to explain incomplete penetrance and MZ twin discordance in SZ, ASD, and other neuropsychiatric disorders.
iPSCs were grown and induced to differentiate into neurons using two different protocols: A and B, which are described in detail in Methods S1. Briefly, in protocol A, iPSCs were maintained on irradiated mouse embryo fibroblasts supplemented with FGF2 (10 ng/ml). Colonies were subsequently detached and grown as embryoid bodies (EBs) on non-adherent plates in the absence of FGF2. After 4 days, EBs were plated on laminin, which resulted in the development of clusters of neurons . These were manually dissected after 10 days. The neurons derived from this protocol are primarily glutatmatergic (~90%) . In protocol B, iPSCs were maintained on matrigel and mTeSR1® (StemCell Technologies, Vancouver, Canada). EBs were formed and neural rosettes were cultivated using standard techniques , . Neurons emerged after rosettes were isolated and grown on Poly-dL-Ornithine/laminin coated plates. Neurons (generally an equal mix of glutamatergic and GABAergic neuorns) were harvested after 14 days.
RNA-Seq was carried out using line iPSC-1, which was derived from a control female. Validation by Sanger sequencing was carried out on iPSC-2 (a control male), and 3 lines derived from male subjects with SZ (SZ39, SZ97 and iPSC-15, the latter of which has a 22q11.2 deletion).
Paired-end RNA-Seq was carried out using an Illumina HiSeq 2000 instrument, as described in Methods S1 and in our previously published work (GEO accession number GSE32625) . In the current study, we reprocessed the previously reported RNA-Seq data to identify allele-biased gene expression . The read length was 104 bases for each of the paired-end reads. A summary of the data relevant to this study is shown in Table 1. Genomic DNA was analyzed using the Affymetrix Genome-Wide Human 6.0 array and genotypes were scored using the Affymetrix Power Tools (APT) software based on the Birdseed calling algorithm. A total of 905,625 SNPs were called (call rate was 98.98%), and the average rate of heterozygosity and homozygosity was 28.04% and 70.94%, respectively (253,585 heterozygous SNPs were identified). Parental DNA was also genotyped using the Affymetrix Genome-Wide Human 6.0 array (paternal call rate, 98.44%; maternal call rate 98.34%). To reduce the known bias in mapping RNA-Seq reads towards reference alleles, we aligned the raw RNA-Seq reads to a modified reference human genome (hg19) by the software bowtie 2 in which bases at the identified heterozygous sites were replaced with “N” –. Reads mapped to heterozygous sites were then analyzed. To reduce false positives, we focused on sites that were covered by at least 10 high-quality reads and excluded bases with a Phred score lower than 13. We used samtools/bcftools (mpileup command) to record high-quality reads mapped across each heterozygous SNP site. We then called the consensus base(s) (by maximizing the posterior probability given the read distribution and average base quality), and calculated the phred-scaled probability of the base being called a homozygote or heterozygote . Subsequently, the numbers of reads for the reference and alternative alleles were counted and used for a binomial test to determine if the ratio of the two numbers significantly deviated from 0.5, reflecting the null hypothesis that both alleles were equally expressed. The resulting p-values were further adjusted by multiple-testing correction using the B-H method , . Sites with an adjusted p-value (i.e., FDR) <0.05 and a phred-scaled probability ≥20 were considered to be expressed in an allele-biased manner. These analyses were carried out for iPSCs and differentiated neurons separately; the resulting allele-biased expressed SNP sites are listed in Table S1. Genes with allele-biased expression were those containing SNPs shown in Table S1 that mapped between transcription start and termination sites, having an expression value >1 FPKM.
RNA-Seq reads for some genes showed a preference for one allele in iPSCs, with a switch to the other in differentiating neurons. Heterozygous SNPs showing >4-fold changes in allele preference during differentiation were tabulated (Table 2).
We also assessed potential sequencing chemistry and mapping bias for allele-biased SNP calls. There was no preference towards any one of the four nucleotides (Figure S1).
Selected monoallelic SNPs were validated by capillary sequencing using RT-PCR amplified material (see Methods S1).
RNA-Seq was carried out on iPSCs and differentiating neurons harvested from iPSC-1 using procedure A, which leads primarily to the production of glutamatergic neurons . Of the 253,585 heterozygous SNPs subjected to analysis for allele-biased expression, a total of 1,909 satisfied the criteria described in the methods section (Table S1). These mapped to a total of 314 genes in iPSCs and 801 in differentiating neurons (Tables S2 and S3, respectively). These included 181 neuronal genes that contain two or more highly significant allele-biased SNPs, of which several are candidate genes for SZ, ASD, and intellectual disability (see below). Based on genotyping data obtained from the parents of iPSC-1, the ratio of paternal to maternal allele-biased SNPs is 1.21 in iPSCs and 1.11 in neurons, excluding the X-chromosome, indicating that there was no significant parent-of-origin bias.
Overall, 24.5% of all genes expressed in an allele-biased manner in iPSCs and 12.4% in neurons were X-linked. The abundance of X-linked genes is consistent with the finding that human iPSCs often maintain a clonal X-chromosome active state . However, in other studies the pattern of X-chromosome inactivation has been found to be heterogeneous , . All of the X-linked monoallelic SNPs were maternal, with the exception of XIST, which is expressed on the inactive X-chromosome (paternal in this sample), and WASF4P and KIF4, suggesting that the strict criteria used for our definition of allele-biased expressed SNPs resulted in few false positives.
Technical factors can inflate the false positive rate when RNA-Seq data are used to evaluate allele-biased expression; consequently, we validated several genes using Sanger dideoxy sequencing . We also analyzed iPSCs and neurons that were maintained and induced to differentiate using a different protocol to determine whether culture conditions might influence allele-biased expression (see Methods S1). Finally, when SNPs were informative, we validated allele-biased expression in other iPSC lines.
First, we analyzed four X-linked genes (GABRA3, TSPAN6, NXT2, and KAL1). Heterozygosity for each was confirmed by sequencing gDNA (Figure 1). However, only one peak was detected when cDNA was sequenced from template RNA derived from iPSCs and neurons derived from the line iPSC-1 grown using protocol A, which was used in the RNA-Seq experiment. An exception is KAL1 in neurons, which showed allele-bias rather than complete monoallelic expression.
Each of the X-linked validated SNPs met the criteria described in the methods section except for rs6627588 in GABRA3. The expression of only a single allele shows that our conservative method for determining allele-biased expression is leading to some degree of type II error.
Interestingly, a somewhat different pattern was observed using iPSCs and neurons grown under condition B. Here there is also clear evidence for monoallelic expression in GABRA3 and TSPAN6 in both iPSCs and neurons, and in iPSCs for KAL1. However, biallelic expression is seen for KAL1 in neurons and for NXT2 in iPSCs and neurons, suggesting that X-activation/inactivation is somewhat variable under different culture/differentiation conditions.
X-linked markers could not be verified in the other iPSC lines since they were each derived from male subjects.
We also analyzed CTNNA3, which has been shown to be imprinted in placenta, and KCNQ1, a known imprinted gene , , . KCNQ1 was not included in the list shown in Tables S2 and S3 because the minimum FPKM >1 threshold was not achieved for this gene. Nevertheless, Sanger sequencing clearly shows allele-biased expression for both genes (Figure 2). Thus, similar to the expression pattern for GABRA3, this further illustrates that our allele-biased algorithm is associated with some type II error.
Two informative SNPs were analyzed in CTNNA3, rs1925621 and rs1925622. As seen in Figure 2, a single allele at rs1925621 is expressed in undifferentiated iPSCs and neurons derived from iPSC-1 grown using protocol A. However, when the cells were cultivated under protocol B, expression was essentially biallelic for both SNPs in iPSC-1. Interestingly, when a biological replicate of iPSC-1 was analyzed (iPSC-1 (II)), the other parental allele was activated in neurons grown using protocol A. In addition, in in the replicate sample, allele-bias was also seen in neurons grown using protocol B. Similarly, when these SNPs were analyzed in another iPSC line (iPSC-2, which was derived from another control subject), allele-biased expression was seen in iPSCs and neurons cultivated under both protocols.
The same results were obtained when rs1925622 was analyzed.
These data confirm allele-biased expression for CTNNA3 under some culture and differentiation conditions, and suggest that the phenomenon is random in this system and not imprinted.
Considering the parent-of-origin effects seen in some families and the possibility that allele-biased expression could help explain MZ twin discordance and reduced penetrance, we were interested in specifically validating SZ and ASD candidate genes that showed evidence for allelic imbalance in the RNA-Seq experiment. Overall, there were 48 SZ candidate genes and 26 ASD candidates expressed in an allele-biased manner in neurons, while in iPSCs, there were 17 and 9, respectively (downloaded from Allen Brain Atlas list of 883 SZ and 244 ASD candidate genes) (Table S4). A Chi-square test was used to determine whether a statistically significant enrichment of SZ/ASD genes is found in the allele-biased gene sets. More specifically, a 2×2 table was set up to compare the allelic expression status (biased vs non-biased) of SZ/ASD candidates vs. all others; genes in both groups were selected for containing at least one SNP covered by >/=10 reads. A significant enrichment was found for SZ/ASD candidate genes expressed in neurons (p=0.02), while no significant enrichment was observed in iPSCs (p=0.14).
The SZ and ASD genes that showed the most robust evidence for allele-biased expression, based on containing more than one significant SNP, were RBFOX1 (A2BP1) (15 SNPs), NLGN4X (14 SNPs), NRG3 (10 SNPs), NRG1 (9 SNPs), CASK (7 SNPs), and ERBB4, CNTNAP2, NLGN1, and NPAS3, each of which had 3 SNPs. Other genes of interest, such as NRXN1 and DISC1, were only represented by one SNP.
We validated allele-biased expression in five SZ and ASP candidate genes, including A2BP1 (Figure 3). RNA-Seq data showed that in iPSC-1, one parental allele was expressed at this locus in undifferentiated iPSCs (T allele), and the other (C allele) in neurons (Table S1; compare columns H, I, N and O for rs17139246). This was one of 42 genes that showed evidence for a switch from one parental allele to the other during the transition from iPSCs to neurons (Table 2). The switch in A2BP1 alleles was confirmed by Sanger sequencing (Figure 3; undifferentiated iPSCs and neurons grown under protocol A, which was produced from the same sample used for RNA-Seq). However, when grown under protocol B, expression is essentially biallelic (there is a hint of allele-biased expression, but this should be viewed cautiously because capillary sequencing is only semi-quantitative).
Switching from one allele to the other could either reflect random activation of parental alleles occurring independently in iPSCs and neurons, or programmed expression of one parental allele in iPSCs and the other in neurons. Random expression is supported in the analysis of a biological replicate; iPSC-1 (II). As seen in Figure 3, the expression pattern was different compared to the original sample; some degree of allele-biased expression seen only for the “C” allele in iPSCs and neurons grown under protocol A suggesting that allele-biased expression in A2BP1 during early neurogenesis is random. Whether other genes shown in Table 2 switch allelic preference during differentiation by the same random mechanism is under investigation.
Further validation of allele-biased expression in A2BP1 is seen in three other iPSC lines, each of which is heterozygous at rs17139246. Allelic bias and a switch were clearly seen in SZ39 with protocol A. With protocol B however the predominate allele was C, as opposed to T in neurons grown under protocol A. These findings support the idea that allele-biased expression of A2BP1 is a stochastic process.
For the other two lines, iPSC-15 showed allele-biased expression protocol A neurons, but not in undifferentiated cells, while in SZ97, expression was essentially biallelic.
Validation of NRG1 and ERBB4 is shown in Figure 4. For NRG1, two SNPs were analyzed - rs4602844 and rs1481757, which map near the promoter of the NRG1 isoform HRG-β1C and an intron in the GGF2/HRG-β1D isoforms, respectively. Sanger sequencing confirmed the RNA-Seq allele-biased calls in iPSC-1 neurons for rs4602844 (expression was too low in undifferentiated cells for reliable sequencing). In addition, we were able to confirm allele-biased expression for this SNP in the replicate sample, iPSC-1 (II), and in iPSC-15. With protocol B, a different pattern emerged. Neurons derived from both iPSC-1and iPSC-15 showed biallelic expression, but in iPSC-1 (II), allele bias was clearly seen. For rs1481757, allele-biased expression was seen in iPSC-1 but not in iPSC-1 (II), for both protocol A and B neurons. For SZ39, there was a suggestion of allele-biased expression only in protocol A.
These findings suggest that isoforms within the same gene may be subject to different regulatory signals with respect to allele-biased expression.
For rs10175279 (ERBB4) allele-biased expression of the C allele was seen in iPSC-1 in protocol A. However, in iPSC-1 (II), the other parental allele (T) is expressed. Furthermore, while biallelic expression was detected in iPSC-1 in protocol B, expression was biased towards the T allele in the replicate. These findings suggest that similar to A2BP1, allele-biased expression in ERBB4 is random.
Finally, Sanger sequencing was also carried out for SNPs in NRG3 and AUTS2. This analysis essentially showed a similar mixed pattern of allele-biased or biallelic expression for neurons that varied according to growth and differentiation protocol (Figure S2).
To summarize, the validation of X-linked, imprinted, and SZ and ASD candidate genes confirms the monoallelic SNP calls made in the RNA-Seq analysis and demonstrates that the phenomenon is not restricted to a single iPSC line. In addition, allele-biased expression is random for some genes, and can vary depending on culture conditions. These findings are similar to the change that occurs in NANOG mRNA (from a monoallelic to biallelic pattern of expression) in ES cells treated under different culture conditions, as well observations made on allele-biased gene expression in lymphoblasts , .
Using RNA-Seq to identify allele-biased expression genome-wide has the capacity to add a novel dimension to our understanding of transcriptional regulation in different cell types. However, interpreting RNA-Seq data is complicated and can be confounded by false positive allele-biased calls . One of the more important factors is in silico mapping bias when RNA-Seq reads are aligned to the reference genome, as described in the methods section. In this study, the reference allele was called with a 1.51 bias (Figure 5), despite precautions taken while aligning RNA-Seq reads (see methods). The cause of this persistent bias is under investigation, but does not appear to arise from sequencing bias (Figure S1). Nevertheless, the validation studies strongly support the RNA-Seq findings.
Allele-biased expression could be caused by epigenetic phenomena or a genetic process, such as copy loss or promoter variants. It has been suggested that most allele-biased expression is due to genetic factors , . This may well be the case for many of the genes in this study. However, several of the genes we validated, including CTNNA3 and the SZ candidate genes A2BP1 and ERBB4, showed random allele-biased expression suggesting an epigenetic phenomenon. Distinguishing between genetic and epigenetic processes (both random monoallelic expression and imprinting) will require a much more extensive analysis using several different lines and multiple replicates. This is an important consideration in SZ and ASD because an epigenetic process accounting for allele-biased expression could have an impact on determining the efficacy of epigenetics-based treatment. Confirmation in differentiating mouse and/or human neurons derived from embryonic neural progenitors is also needed.
Finally, allele-biased expression, in particular a stochastic model, has the potential to explain the ~50% MZ twin discordance seen in SZ –. Here, a disease-causing mutant allele would have an equal probability of being expressed (causing disease), or suppressed (avoiding disease) if such a process occurred in a pathway or cortical segment that was clonally derived. A stochastic model is also consistent with the observation that the prevalence of SZ in the children of discordant and concordant MZ twins is similar .
GWAS and CNV discovery have resulted in the identification of numerous candidate genes for neuropsychiatric disorders, a number that is bound to increase dramatically with rare variants detected using exome and whole genome sequencing. Translating the heterogeneous mix of susceptibility genes into novel medications could prove to be very challenging. This study presents the possibility of incorporating seemingly disparate candidate genes into a common molecular pathway – epigenetic regulation leading to allele-biased gene expression. Based on this idea, it is conceivable that one approach to treating a subgroup of patients would be at the epigenetic level (assuming a non-genetic cause of allele-biased expression) using drugs capable of activating a dormant normal allele as a means to buffer the effects of their monoallelically expressed abnormal partner. Proof of principle for this concept has recently been demonstrated in an animal model of Angelman Syndrome .
Distribution of nucleotides for allele-biased SNPs.
validation of replicates and additional genes.
Heterozygous SNPs showing allele-biased expression.
allele-biased genes in undifferentiated iPSCs.
allele-biased genes in neurons.
SZ and ASD genes showing allele-biased expression in iPSCs and neurons (from Allen Brain Atlas list of 883 SZ and 244 ASD candidate genes).
Detailed methods, reagents and PCR primers.
The authors thank participating families, Dr. Vladislav Sandler for providing two iPSC lines, Dr. Steven Cohen for doing skin biopsies, and Drs. Shahina Maqbool and John Greally of the Einstein Center for Epigenomics for carrying out RNA-Seq.
This work was supported by the National Institutes of Health [MH087840 and MH073164 to HML]. This publication was also supported in part by the CTSA Grant UL1RR025750, KL2RR025749 and TL1RR025748 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH roadmap for Medical Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.