Complementary DNA was generated using RNA extracted from iPSCs and a 10 day cluster of neurons. Samples were amplified prior to paired-end deep sequencing as described in the methods
section. More than 1×108
pairs of reads were obtained for each sample, of which 74.3% and 82.3% had at least one end mapped to the reference genome (see legend for details). The expression of 9008 genes was significantly altered as the iPSCs differentiated into neurons (q-value<0.05) () (for a complete list of genes that showed significant changes in expression see Table S2
). Of these, 5953 genes decreased in expression, while 3055 genes increased. Although the majority of these significantly altered genes were protein coding (4060 and 1808, respectively), approximately one-third were non-coding, findings similar to previous reports showing widespread expression in both coding and non-coding regions 
Consistent with the differentiation of iPSCs into neurons, substantial 10 to >1,000-fold decreases were detected in the expression of genes associated with pluripotency such as OCT4 (POU5F1)
, and LIN28A
, with a concomitant increase in expression of transcription factors associated with neural differentiation, including POU3F2
, and MEF2C
, along with ASCL1
, are transcription factors capable of converting fibroblasts directly into neurons 
The most highly expressed genes in iPSCs were TPT1
, and HSP90AB1
(tumor protein, translationally-controlled 1) is involved in maintaining embryonic stem cell phenotype and interacts with OCT4
(nucleolin), a nucleolar phosphoprotein involved in ribosomal maturation 
(prothymosin, alpha isoform 2) is over-expressed in a number of different cancers and interacts with histone acetyltransferases. NASP
codes for nuclear autoantigenic sperm protein isoform 2 a histone H1 binding protein expressed in all dividing cells, inhibits proliferation and induces apoptosis in prostate cancer PC-3 cells, and is one of the hub genes involved in maintaining stem cell fate 
Among the most highly expressed genes in neurons were TPT1, TUBA1B, HNRNPA2B1, C5orf13, RPS6, NCL, CANX, and RPL35A
. C5orf13 is neuronal protein 3.1 isoform A and is expressed in fetal and adult brains. Ribosomal protein S6 (RPS6
) is phosphorylated in response to an array of growth factors and mitogens, a process that may be disrupted in Rett Syndrome 
. However, among the top 25 genes that are the most highly expressed in neurons, all show higher levels of expression in iPSCs.
More interesting are the genes that show the most dramatic shifts in gene expression – both increases and decreases – during the transition from iPSCs to neurons. Among the transcripts that decreased radically were SCGB3A2
, and IDO1
(secretoglobin, family 3A, member 2 precursor) maps to 5q32 and is a downstream target of the homeodomain transcription factor NKX2-1, which is critical for the development of lung, thyroid and ventral forebrain 
(dermokine isoform 1 precursor) is primarily expressed in skin and its expression in iPSCs is probably due to residual expression of fibroblast specific genes. PRDM14
is a member of the PR domain and zinc finger proteins involved in transcriptional regulation. It is expressed in pluripotent stem cells and is involved in mouse embryonic stem cell (mESC) self-renewal, overlapping with Nanog
in its binding domain 
. In addition, PRDM14
regulates the expression of OCT4
and is part of the network of genes involved in reprogramming human fibroblasts 
. Finally, POU5F1P1
may be a processed pseudogene or codes for a protein that binds to the nucleus that is believed to be involved in malignant transformation 
Considering the high level of expression in iPSCs and the rapid and dramatic decline that occurs during differentiation, some of these genes could be candidates for maintaining pluripotency or modulating cell type or region specific neural differentiation.
Another gene that showed one of the most significant fold-decreases during neuronal differentiation is FZD5
(frizzled 5 precursor), a receptor for WNT5A, which is particularly interesting in the context of neuropsychiatric disorders because signaling through canonical WNT-mediated signal transduction appears to be involved in SZ and BD in subgroups of patients 
. Down regulation of FZD5 mediated signal transduction, similar to that described in differentiating hESCs, is consistent with a role in maintaining pluripotency 
codes for LINE-1 type transposase domain containing 1. This gene appears to be a partner in the Oct4 interactome based on a protein affinity-based assay 
. There is some evidence that L1 retrotransposition occurring selectively in neuronal progenitor cells could be a factor in brain development 
Among the genes that showed the largest fold increase in expression during the transition from iPSCs to neurons was CARTPT
(also known as CART
), which codes for a neuropeptide involved in regulating food intake, reward and endocrine functions 
. It is expressed in the hypothalamus and throughout the mesocorticolimbic dopamine reward system (nucleus accumbens shell, amygdala complex, extended amygdala and orbitofrontal cortex 
Others include STMN4
(stathmin-like 4), a member of the stathmin family that encode phosphoproteins involved in regulating the microtubule filament system in the brain, the AMPA2 receptor gene GRIA2
, a zinc finger transcription factor expressed in the dorsal midline of the forebrain and in the boundary between the diencephalon and telencephalon, SCN3A
, which codes for a sodium channel, voltage-gated, type III, alpha subunit and has been implicated in seizure disorders and neuropsychiatric problems, EPHA3
, an ephrin receptor, the potassium channel encoding KCNK2
, DCC (deleted in colorectal cancer) which codes for a receptor for the axon guidance protein netrin-1, and SYT4
(synaptotagmin 4), a negative regulator of oxytocin release 
Several members from each of the four HOX
families are among the most differentially expressed genes. HOX
genes are involved in brain patterning in the development of forebrain, midbrain and hindbrain, and abnormal expression has been implicated in mental retardation, subtypes of ASD, and epilepsy 
Finally, previous work from our lab shows that the differentiation method we use results primarily in the development of glutamatergic neurons. This is reflected in the gene expression profile here, which shows that the glutamate transporter, SLC17A6
) is among the most highly differentially expressed genes (Tables S2
). By contrast, SLC32A1
(VGAT, GABA vesicular transporter), SLC6A20
(NET, norepinephrine transporter), SLC6A4
(SERT, serotonin transporter), and SLC18A1
(VMAT1 and VMAT2A; vesicular monoamine transporters) were not expressed. The dopamine transporter gene, SLC6A3
, (DAT1) was expressed, but at 1/50th
the level of SLC17A6
The list of significantly altered genes was subjected to bioinformatics analysis using Gene Ontology (GO) to determine functional enrichment 
. This showed changes in gene expression that were consistent with the conversion of pluripotent stem cells into terminally differentiated neurons (). In addition, there was high correlation (R
0.64) between the results obtained with pre-amplified RNA using RNA-Seq and total RNA using standard microarray expression profiling (). The findings show that faithful amplification of RNA can be achieved, which is important for studying gene expression profiles in small numbers of neurons.
Gene Ontology (GO) terms for differentially expressed genes.
Comparison of fold-changes (log2-transformed) of gene expression between iPSCs and day 10 neurons as measured by RNA-Seq (x-axis) and Affymetrix microarray (y-axis).
lcnRNAs and lincRNAs
A substantial fraction of transcripts in mammalian genomes do not code for proteins. These include lncRNAs and lincRNAs (long intergenic non-coding RNAs), which are arbitrarily defined as greater than 200 bases in length, to distinguish them from functional small non-coding transcripts, such as microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and small nucleolar RNAs (snoRNAs).
LncRNAs (also known as processed transcripts) by definition are found within protein coding genes, overlapping with promoters, exons or introns in either sense or antisense orientations. LincRNAs, on the other hand, are found in intergenic regions. Two well-known lncRNAs are Xist, which initiates X-chromosome inactivation in females and HOTAIR
, which is embedded within the HOXC
locus, but has a trans-repressive effect on HOXD
. Other lncRNAs involved in regulating HOX
gene expression are HOTAIRM1
and HOTTIP 
. There is increasing evidence that lncRNAs are involved in stem cell pluripotency and brain development 
. Khalil et al showed that ~20% of lincRNA bind to polycomb repressive complex 2 (PRC2), suggesting that many regulate gene expression at the chromatin level 
As shown in , 1622 non-coding genes (including lncRNAs and lincRNAs) significantly increased or decreased in expression during the transition from iPSCs to neurons. The most abundantly expressed lincRNAs in iPSCs are MALAT1
, and PVT1
(metastasis-associated lung adenocarcinoma transcript 1) is also the most abundantly expressed lincRNA in differentiating neurons. It was initially identified in a screen for genes differentially expressed in metastatic non-small cell carcinoma 
. MALAT1 appears to play a role in alternative splicing of pre-mRNAs since it localizes to nuclear speckles, dynamic structures that are essentially concentrations of splicing factors; it also recruits SR proteins, which participate in maturation of the splicosome 
is up-regulated in the brains of PCP-treated mice and plays a role in hippocampal synaptic function 
. Although the difference in expression between iPSCs and day 10 neurons was statistically significant, mRNA levels increased only ~10%. However, to our surprise, there was a substantial difference in splice isoforms generated in iPSCs and neurons, including several that were specific to each (Fig. S1
). The importance of this finding will require further study.
SNHG6/SNORD87 code for nucleolar RNAs, and PVT1
is non-coding RNA oncogene that regulates and is regulated by c-myc 
is highly expressed in iPSCs and decreases 41-fold in neurons.
Among the lincRNAs that show the largest fold increases in neurons is CRNDE
, a non-coding RNA of unknown function. CRNDE
maps to 16q12.2, near the IRX5
gene, a member of the Iroquois homeobox family, which are involved in brain patterning and neurogenesis 
. The close proximity to IRX5
suggests that CRNDE
may influence its expression.
Among the most highly expressed lncRNAs expressed in iPSCs are XIST
, and RP11-509I21.1,RP11-509I21.2,TDGF1
). Embryonic stem cells from female mice are characterized by the absence of XIST
expression resulting in activation of both X-chromosomes, one of which is subsequently randomly inactivated upon differentiation. However in human ESCs and iPSCs, XIST expression is heterogeneous, resulting in various patterns of X-chromosome inactivation; lines that do not express XIST
show expression of genes from both X-chromosomes 
. The iPSC line used in this RNA-Seq analysis was from a female.
AC00967.5 maps to 2q31.1 and overlaps with the METTL5
(methyltransferase like 5) gene. It decreases 6-fold in neurons. RP11-509I21.1
overlap with the OCT4 target TDGF
1 (teratocarcinoma-derived growth factor 1; also known as CRIPTO
) and is closely juxtaposed to the tumor suppressor gene LRRC2
(leucine rich repeat containing 2) 
. A >1,000-fold decrease in expression of transcripts in this region was found during the transition from iPSCs to neurons. TDGF1
is protein coding and is involved in NODAL signaling, which is a developmental program affecting midline, forebrain and left-right axis development 
. Other lncRNAs that are highly expressed in iPSCs but decrease more than 50-fold in day 10 neurons include RP11-132A1.3, RP3-430A16.1, RP3-430A16.1, RP11-1144P22.1, RP11-509I21.1/RP11-509I21.2/TDGF1, RP11-498P14.2, RP11-366M4.3, AC005062.2/AC007001.3/MACC1. Some of these genes should be viewed as potential candidates for maintaining pluripotency.
Among the most interesting lncRNAs that show marked increases in expression in neurons is HOTAIRM1
, which was suggested to be specific for myeloid cells 
. However, we detected a 54.6-fold increase in transcripts in early differentiating neurons. Also, several other lncRNAs that map to HOX
clusters showed dramatic increases in expression in day 10 neurons, including RP11-357H14.12 and AC036222.1, which map to the HOXB
Other lncRNAs that show little or no expression in iPSCs, but significantly increase in early differentiating neurons, include CTA-211A9.5/MIAT
, and CTB-12O2.1
is only ~2kb upstream from POU3F3
), a member of the brain-expressed POU family of transcription factors that is transcribed in the opposite orientation. Whether or not they share a common promoter, as we previously described for closely aligned genes transcribed in opposite orientations, remains to be seen 
was not expressed in either the iPSC or neuron sample used in the RNA-Seq experiment.
Validation by RT-PCR and q-PCR
The expression of some of these non-coding elements was validated using RT-PCR and qPCR (). Considering the importance of HOX genes in neuronal migration and brain patterning, we focused primarily on those that map to the HOX loci since lncRNAs are known to be involved in their regulation. The RNA samples used in the validation were not amplified and were obtained from an independent set of neurons. AC018730.1 transcripts, for example, could not be detected in iPSCs, but were abundant in both day 14 and 27 neurons. In addition, it is expressed in both fetal and adult brain samples. Similarly, HOTAIRM1 RNA was detected in differentiating neurons but not iPSCs. However, in contrast to AC018730.1, expression in brain was restricted to the 9 week fetal sample. Expression of the other lncRNAs in the HOX gene loci - AC036222.1 and RP11-357H14.12 - was restricted to early differentiating neurons, and could not be detected in either fetal or adult brain. These findings suggest that the lncRNAs within the HOXA and HOXB loci are involved in early neurogenesis and brain development, and regulate expression of the HOX genes, and perhaps others, within specific developmental time frames.
Validation by RT-PCR and qPCR in unamplified RNAs.
We also validated the expression of CRNDE and MIAT by RT-PCR; lncRNAs that were highly expressed in both iPSCs and neurons. As seen in , there is a visible band in iPSCs that appears to increase in early differentiating neurons, similar to the RNA-Seq findings, which showed detectable transcripts in iPSCs that increased 3.7 and 3.6-fold in neurons (CRNDE and MIAT, respectively). Interestingly, CRNDE is expressed in fetal brain but transcripts could not be detected in adult brain samples, whereas MIAT is expressed in both.
qPCR was used to quantify the difference in expression in the un-amplified RNA samples. As seen in , a significant increase in expression was detected for both using qPCR that was in a range similar to that observed in the amplified RNA-Seq sample (CRNDE
: 4.3-fold+/−2.1; MIAT
: 8.7-fold+/−5.1; n
3 or 4, each qPCR determination carried out in triplicate, p
0.05 for both).
In addition to long non-coding RNAs, small non-coding RNAs were also found to be differentially expressed (). However, considering that the RNA amplification method we used does not accurately amplify small RNAs, the findings will not be discussed (a more definitive RNA-Seq analysis using a small RNA library prepared from an unamplified sample is currently being carried out).
Some pseudogenes appear to be biologically functional with proposed effects on regulating gene expression through a variety of means, including acting as a miRNA decoy and producing endogenous short interfering RNAs 
. The expression of 1371 pseudogenes changed significantly during the transition from iPSCs to neurons ().
The most highly expressed in both iPSCs and day 10 neurons were RP11-442A13.1 and RP11-320L13.2, which map to 9q33.1 and Xq13.1, respectively (Table S6
). The pseudogenes that showed the most significant fold changes upon differentiation were TDGF3
(teratocarcinoma-derived growth factor 3, pseudogene; CRIPTO-3
), which maps to Xq21.1, and RP11-492I2.1, which maps to an ELAVL4
transcripts increase significantly in our differentiating neurons and is induced in the brains of PCP treated rats 
is expressed in malignant cells and decreases upon differentiation of human teratocellular carcinoma cells. Among the more interesting pseudogenes that decrease in neurons are POU5F1P4
, which is derived from POU5F1
, and TERF1P3
. The parent gene (TERF1
) codes for a component of the telomere nucleoprotein complex.
Neuron specific splicing is regulated by several proteins, including NOVA1, a neuron specific RNA-binding protein, and the splicing inhibitor PTBP1 (polypyrimidine tract binding protein). PTBP1
expression decreases ~3-fold in day 10 neurons, while NOVA1
expression increases ~3-fold, consistent with their known effects on splicing in neuronal tissue. Our paired-end sequence reads allowed us to identify 143,586 splice junctions in iPSCs and 144,885 in day 10 neurons. Two examples are depicted in , which shows the changes in splice isoforms that occur in the SZ and ASD candidate gene, NRXN1
. Expression across exons 19, 20 and 21 (asterisk in figure) is particularly interesting considering the importance of splice variants in this region, known as splice site 4, in determining glutamatergic vs GABAergic differentiation 
. We validated the apparent increase in expression across splice site 4 in this region by qPCR, which showed a 2.65-fold+/−0.79 increase in neurons (p<0.05).
Splice variants in NRXN1 (panel A) and NRG1 (panel B).
We also show splice isoforms in the SZ candidate gene NRG1. Overall, NRG1 transcripts increase ~10-fold in day 10 neurons and a rich array of splice isoforms that change during differentiation is observed. We validated the splice variant formed between exons 10 and 11 in the NRG1 protein isoform HRG-β2b using qPCR (denoted by asterisk). As seen in the figure, a 12.7+/−0.79 fold increase was detected in differentiating neurons (p<0.05).
These findings show that RNA-Seq using amplified RNA will faithfully identify splice isoforms, and that dramatic changes in the overall pattern of alternatively spliced transcripts in candidate genes for neuropsychiatric disorders is an early event in neurogenesis.
Relevance to neuropsychiatric disorders
As recently reported by Brennand et al., using neurons derived from patient-specific iPSCs is a valuable resource for gene expression profiling, which is the goal of our ongoing research as well 
. However, important information related to SZ, ASD and BD pathogenesis can also be gleaned by studying molecular events occurring during human neurogenesis, even in the absence of patient vs control comparisons. For example, some of the most dramatic changes in gene expression in differentiating neurons in this study occur in genes involved in neuropsychiatric disorders that code for transcription factors and chromatin modifying enzymes. Subtle alterations in the expression of such genes during neurogenesis caused by genetic or environmental factors have the potential to induce long-lasting changes in neuronal and brain function. Among the SZ, BD, and ASD candidate transcription factors and chromatin modifiers that change dramatically in day 10 neurons are POU3F2
, and NPAS3
, and the chromatin modifier JARID2
, which regulates the polycomb repressive complex.
In addition, expression of a number of genes coding for proteins involved in cell-cell adhesion, synaptogenesis and trans-synaptic bridging that have been implicated in neuropsychiatric disorders increase in these early differentiating neurons. These include NRXN1
(and its modulator ST8SIA2
and PCDH9 
Another way RNA-Seq data can be useful in understanding the underlying molecular and genetic basis of neuropsychiatric disorders is to identify non-coding elements transcribed in differentiating neurons that map to regions of the genome implicated by GWAS or CNVs. Many SNPs associated with neuropsychiatric disorders map to large introns or intergenic regions that are far removed from coding elements or known regulatory domains. This makes it difficult to appreciate the biological relevance of the association signal; presumably associated SNPs are in LD with disease-causing, biologically active variants.
One such SNP is rs893703, which maps to chromosome 3 (position139,250,649) within an RBP1
(retinol binding protein) intron 
. The SNP falls within a lncRNA that significantly increases in expression in differentiating neurons, RP11-319G6.1
is transcribed in the opposite orientation from RBP1
, the latter of which decreases in differentiating neurons. RP11-319G6.1
expression was validated in a different sample of differentiating neurons and is also expressed in fetal and adult brain (not shown). Whether or not RP11-319G6
expression or other genes, and is a factor in the positive association found between rs893703 and SZ remains to be determined. Retinoids, it should be pointed out, play a role in brain development and have been implicated in SZ, although the evidence is not strong 
Similarly the lncRNA RP11-586K2.1
, which maps to chromosome 8 (89,044,236–89,749,363) contains a SNP (rs12527359) found to be associated with SZ and BD in a GWAS meta-analysis 
increases ~3.5-fold in differentiating neurons. It should be viewed as a potential candidate gene for this GWAS signal since the nearest coding genes to rs12527359 are MMP16
(matrix metalloproteinase 16 isoform 1 and RIPK2
(receptor-interacting serine-threonine kinase 2
), which are ~400 kb–1,000 kb away, respectively.
A list of coding and non-coding genes that are differentially expressed in the RNA-Seq analysis that contain SNPs mapped in several SZ, BD and ASD GWAS studies are shown in Table S7 
Finally, our transcriptome analysis could be useful in narrowing down candidate genes in well-defined large copy variants that have been found in SZ and ASD; coding genes, lncRNAs and lincRNAs that significantly change in differentiating neurons should be viewed as feasible candidates underlying the psychiatric manifestations of the copy variant. The CNVs we evaluated in this manner were 22q11.2 del, 15q11.2 del, 15q13.3 del, 16p11.2 dup, 17q12 del, 1q21.1 del, and 3q29 del 
A total of 51 coding and non-coding genes in these CNVs are differentially expressed during early neurogenesis (Table S7
). Of these a few appear to be particularly interesting, including the nicotinic cholinergic receptor subunit gene CHRNA7
, and QPRT
, which codes for quinolinate phosphoribosyltransferase, an enzyme involved in metabolizing an endogenous exitotoxin (quinolinate) implicated in Alzheimer Disease and Huntington Disease. Interesting two genes coding for proteins involved in phosphatidylinositol glycan (GPI) anchor biosynthesis – PIGW
(chr17) and PIGX
(chr3) – map to CNVs that are differentially expressed during neurogenesis. GPI anchors are used in cells to attach hydrophilic proteins to cell membranes. The hydrophobicity WNT3A, a signal transduction pathway believed to be involved in subsets of patients with SZ and BD, is increased by GPI-like anchors