We first assessed the feasibility of using NHNPs as a model system for studying neuronal differentiation. Low passage NHNPs were utilized for all experiments, and when the cells were provided with growth-inducing signals in the form of EGF and bFGF they maintained a proliferative state (; see Materials and Methods). While the doubling time of these human cells is prolonged compared to embryonic rodent neural progenitors, the NHNPs can be kept in culture in a proliferative state for at least a year, if not longer (data not shown). We then tested the characteristics of the NHNPs after receiving differentiation cues, which was achieved by replacing EGF and bFGF with retinoic acid to promote cell cycle exit (see Materials and Methods). In addition, we also supplemented the media with BDNF, NT-3, and KCl to provide supportive growth factors and appropriate conductive ions for neurons. Under these conditions, we found a three-fold decrease in proliferating, Ki67+ cells after two weeks ().
We next assessed whether we could generate neurons under these differentiation conditions. We conducted immunostaining for nestin, a marker of proliferating progenitors (34
); Tuj1, a marker of immature neurons (35
); MAP2, a marker of mature neurons (36
); and GFAP, a marker of glia (37
). We found a significant decrease in nestin-positive cells and a significant increase in both Tuj1 and MAP2-positive cells with four weeks of differentiation ( and data not shown). Tuj1-positive cells represent about 50% of the cells in culture at four weeks (), whereas the number of GFAP-positive cells stayed consistent throughout differentiation (data not shown). Thus, using our differentiation conditions, we can efficiently differentiate human neural progenitors into post-mitotic neurons. Moreover, based on gene expression profiling (see below), the NHNPs may differentiate into a number of neuronal types including both excitatory and inhibitory neurons. This is evidenced by an increase in expression of genes such as DRD3, DRD4, GABBR1, GABRB3, GAD1, GAD2, GRIK1, GRIN1, GRM2, GRM3, SLC23A1,
for example (Supplementary Table 2
Next, we characterized this process of differentiation by conducting whole-genome microarray analyses on a time course of differentiating NHNPs. We selected four time points to assess: time 0 or undifferentiated cells, two weeks, four weeks, and eight weeks of differentiation (T0, D2, D4, and D8). These time points were chosen for the following reasons: D2 because we observed a significant decrease in proliferation at this time (), D4 because there was a significant increase in Tuj1 at this time (), and D8 because it represented a period of time twice as long as D4 for direct comparison with the D2 to D4 transition during neuronal maturation. Four biological replicates from each time point were included for analysis. Hierarchical clustering demonstrates that the biological replicates clustered with one another and that the time point most different from the other three when considering expression of all genes was the one containing the undifferentiated cells (Supplemental Figure 1A
). When only the top 500 most variable genes were analyzed, T0 and D2 clustered separately from D4 and D8 (Supplementary Figure 1B
), suggesting that the genes changing the most during the time course represent differential patterns of expression between the early and late phase of differentiation. This pattern also suggested that neural specification may be entrained at time point D4.
To investigate the possibility that D4 was a critical time point in NHNP differentiation, we examined the genes changing at D4 compared to T0. Using criteria of a false discovery rate (FDR) ≤0.01 together with a fold change of ±1.5, 2218 genes increased in abundance and 2209 genes decreased at D4 compared to T0 (). Known key markers of neuronal differentiation increased at this time point including MAP1B
), ephrins (EFNB3
), and semaphorins (SEMA5A
). In addition, the expression pattern of genes in the NHNPs is consistent with the cells being derived from forebrain progenitors. For example, SP8
are expressed in the proliferating cells (43
), whereas RELN
, BCL11B (CTIP2)
, and NGFR
are induced with differentiation (44
). Supplementary Table 2
lists all of the genes changing at the D4 time point. Gene Ontology (GO) analysis revealed that the most significantly enriched categories for upregulated genes were nervous system development, neurogenesis, neuron differentiation, and synaptogenesis while those for downregulated genes were cell cycle and mitosis ( and Supplementary Table 3
), consistent with cessation of the proliferative state. Disease association data were also derived from the DAVID database based on Gene Ontology classifications (see Materials and Methods). All of these data were derived from gene expression profiling in one cell line. However, we found significant overlap (P<3.73E-85 – P=0.0) using similar differentiation parameters with two different lines of NHNPs (line 4, Supplemental Table 1
and an additional third line, data not shown).
Most significantly enriched gene ontology (biological function) and disease categories of differentially expressed genes at D4 compared to T0
We next compared the genes changing at D4 to those changing at least 1.3-fold in human brain in vivo
between 18 and 23 gestational weeks (45
)to determine to what extent the in vitro
changes recapitulated those observed in vivo
. Out of a potential 575 genes that are available for overlap comparison using our criteria for fold change, 167 genes are increased (P=7.56E-89) and 202 are decreased (P=5.64E-114) under both maturation conditions (Supplementary Table 4
), an overlap of about two-thirds. These data are even more remarkable considering that the microarrays utilized are different, only a relatively short period of in vivo
development is represented and the in vivo
data for comparison consists of pooled data that includes subcortical areas. Thus, these data support the hypothesis that four weeks of differentiation of NHNPs recapitulates the appropriate timing for robust neuronal differentiation in human cells and tissue and reflects molecular processes relevant to in vivo
To assess whether this model system could be appropriate for understanding ASD, we cross-referenced two lists of genes with strong evidence for association with ASD (8
) with the genes changing during D4 (FDR≤0.01 and fold change ±1.3). Out of a total of 28 genes implicated in ASD, we found a significant overlap of genes going both up (7 genes; P=2.6E-02) and down (7 genes; P=2.4E-02) at D4 (). Next, we examined whether we could independently confirm the genes changing on the microarrays using real-time RT-PCR (qRT-PCR). Using a threshold of 1.5-fold change, we confirmed approximately 80% of the genes tested (). Therefore, the microarrays indicate that a significant number of ASD candidate genes are regulated during human neuronal differentiation in vitro
Expression and confirmation of ASD-candidate genes during differentiation
One interesting observation was that neurexin 1 (NRXN1
), an autism and schizophrenia susceptibility gene, and one probe for neurexin 3 (NRXN3
) change in opposite directions on the microarrays. We subsequently confirmed by qRT-PCR that NRXN1
was upregulated during NHNP differentiation () consistent with its known role in synaptic function (47
). However, using three different primer pairs to different NRXN3
isoforms, we found NRXN3
was downregulated (). This was somewhat surprising, since there was no previous suggestion that these two highly homologous genes had different functions or expression patterns in human brain. To see if this corresponded to the in vivo
situation, we performed in situ
hybridization in human fetal brain, which remarkably confirmed these results. NRXN3
is expressed in areas of progenitor cells in human fetal brain, while NRXN1
is expressed in areas of post mitotic neurons (). Inspection of the mouse in situ
hybridization data in Allen Brain Atlas also shows a similar pattern using different probes (http://www.brain-map.org/
). This proof of principle demonstrates how a simple in vitro
human system can be used to guide in vivo
discovery. These particular data suggest that NRXN3
has a previously unknown role in neural progenitor biology, distinct from canonical neurexin function at the synapse, which is likely conserved from mice to humans.
While distinct up or down changes in gene expression are important parameters of any cellular process, we have begun to appreciate that gene expression datasets contain other inherent patterns of gene co-expression that can be mined, so as to reach new functional insights (49
). Specifically, we have used weighted gene expression co-expression network analysis (WGCNA; (50
)) as previously described (14
)to provide a systems-level view of the organization of the neural transcriptome. Genes that are highly co-expressed are grouped into “modules.” Modules’ correspondence to different aspects of function, such as neuronal phenotypes, sample characteristics (disease vs. control), or cellular organelles is determined by analysis of the module eigen gene or first principle component (15
). In addition, central or “hub” genes can be identified within modules (15
WGCNA of the NHNP differentiation time course identified a total of 22 modules, 14 of which could be easily characterized (Supplementary Table 5
). Four modules were clearly associated with differentiation over time throughout the entire time course from T0 through D8. Four other modules correlated best with the undifferentiated cells (T0), while one module eigengene was highly correlated with D2 and another with D8. The module whose eigengene is most correlated with D2 (the green module) is of interest since these genes are likely those involved in defining the onset of neural specification and the cessation of proliferation and cell division. The D8, or sea green, module may contain genes involved in neuronal maturation and phenotype maintenance. Finally, four other modules depicted the transition from D2 to D4. We hypothesized that these modules likely contain genes that are critical for the differentiation process, as D4 was the time point when a significant increase in Tuj1 positive cells was observed () and the GO of genes changing at D4 was enriched for CNS differentiation.
We next sought to determine whether there may be shared biological processes or pathways between distinct ASD risk genes in an unbiased manner using WGCNA. To broaden the number of genes available for this analysis, we queried the SFARI gene database for the most inclusive list of genes with a relationship to ASD (http://gene.sfari.org
). While many of these genes may currently have a weak association to ASD, inclusion of as many genes as possible in this analysis should yield networks of genes with the greatest potential for ASD-related gene co-expression and provides an unbiased view. Using the SFARI database, we demarcated 224 genes with a potential role in ASD. Ninety of these genes overlapped with the top 6000 most-connected genes in the current gene expression dataset and were included in the network analysis. Although these 90 genes were distributed across 17 modules (Supplementary Table 6
), there were four modules with a particular concentration of ASD genes: the royal blue module (with 18 ASD genes), the pink module (with 17 ASD genes), the turquoise module (with 8 ASD genes), and the black module (with 7 ASD genes) (). This non-random concentration of genes within this cohort of modules suggested some shared pathways and previously un-recognized connections between ASD susceptibility genes.
The module eigengene of the royal blue module is related to T0; thus these genes are most related to the proliferative state of the neural progenitors. Not surprisingly, therefore, the top enriched GO categories for the royal blue module include cell adhesion, nervous system development, and neurogenesis (Supplementary Table 3
). While none of the ASD genes are “hub” (or the most connected) genes in the royal blue module, at least four of the candidate ASD genes, CBS
, are included in the top 250 connections within this module (Supplemental Figure 2A
The pink module contains genes related to differentiation across the entire time course. Therefore, these genes are most correlated with the process of differentiation. The pink module contains genes that are enriched in cell cycle GO categories (P=2.29E-05) and axon guidance (P=1.74E-02) (Supplementary Table 3
). Visualization of the top connected genes in the pink module reveals ABAT
, and SCN1A
, all ASD candidate genes, as being among the most connected genes within the module (Figure 5B). Interestingly, MET
, another ASD gene that likely coordinates with PLAUR to transmit hepatocyte growth factor signaling (8
), is also contained within the pink module, but does not make the cutoff for visualization.
The black and turquoise modules contain genes involved in the D2 to D4 transition; therefore, these genes are likely to be important for early neuronal differentiation. The black module is enriched for genes involved in signal transduction (P=4.53E-07). Two out of the seven ASD candidate genes, NRXN1 and GLRA2, can be visualized among the most connected genes in the black module (Figure 5C). The most significant GO categories for the turquoise module are neuron projection (P=1.27E-06) and nervous system development (P=5.41E-06). Both NRXN1 and SLC9A6 are highly connected within the turquoise module (Figure 5D). Strikingly, of the almost 600 gene connections to ASD genes within the four ASD modules, only seven, or 1%, were previously known or could be surmised based on the literature, while the remaining 590 are novel, demonstrating the power of this analysis.
Another manner with which to identify potentially novel ASD signaling pathways is to focus on the interconnectedness of only the known ASD genes. For example, upon filtering for only known ASD-gene related connections in the turquoise module, not only are the ASD genes always connected by one node or fewer, but now other genes are prioritized, such as the cell adhesion genes CTNNA2
(Supplemental Figure 2A
). Using an identical approach with the royal blue module shows inclusion of 17 out of the 18 ASD genes in the module (Supplemental Figure 2B
). In addition, well-characterized genes such as SEMA5B
are easily identifiable as connected among these ASD genes. This level of connectivity is non-random, as the mean overall network connectivity degree is 17 times less than the ASD genes alone, and permutation analysis (See Supplemental Methods
) shows that the ASD gene connectivity is much more significant compared with a randomized list of genes of the same size (P=0). This demonstrates that within the complex and enormous process of neuronal differentiation, specific aspects and pathways are more clearly associated with ASD.
To validate whether there were any genetic associations increased in these four modules, we first examined the literature using GO. The royal blue module is enriched for genes with a genetic association with schizophrenia (although this P value did not reach our threshold for significance; P=5.86E-02). These data make sense in light of some of the overlapin the genetic origins of schizophrenia and ASD (52
). In contrast, other neurological disease genes were not significantly enriched, e.g. dementia (P=1.1E-01), Parkinson’s disease (P=3.9E-01), or bipolar disorder (P=4.9E-01). The pink module also contains genes that are enriched in schizophrenia (P=2.12E-02). The black module is enriched for genes implicated in autism (P=2.45E-02), bipolar disorder (P=2.50E-04), and Alzheimer’s disease (P=4.05E-02). Interestingly, the GO analysis reports four additional genes in the autism category that were not included in the SFARI.gene database as potential ASD candidate genes (BDNF
, and NRG1
). This highlights that the criteria for inclusion as an “autism gene” can vary among databases. Finally, the turquoise module is enriched for genes involved in panic disorder (P=1.28E-02) and attention deficit disorder (P=1.56E-02). Thus, these data illustrate that there are likely overlapping signal networks during brain development that are disrupted in various permutations to lead to a number of neuropsychiatric disorders.
Recent studies have shown that although genome-wide association studies (GWAS) may not reveal strongly associated genes with large effect sizes in neuropsychiatric disease, overall there may be a weaker signal spread across many genes, as has been demonstrated in schizophrenia and bipolar disorder (6
). Thus, it has been suggested that identification of potentially causal pathways could be a means with which to increase power to detect genetic association. To further assess whether there is independent genetic evidence for pathway enrichment, we tested for enhancement of genetic association signals for ASD within the four modules defined above. When the genes used for network analysis were analyzed for genetic association to ASD over 50% of genes with a P Value < 1.0E-02 fell into one of these four modules (Supplementary Table 6
). While this percentage of genes is not statistically enriched, it provides another level of information for prioritizing unannotated genes that are highly connected to known ASD genes in these modules.