A major unanswered question in neuroscience is whether there exists genomic variability between individual neurons of the brain, contributing to functional diversity or to an unexplained burden of neurological disease. To address this question, we developed a method to amplify genomes of single neurons from human brains. Since recent reports suggest frequent LINE-1 (L1) retrotransposition in human brains, we performed genome-wide L1 insertion profiling of 300 single neurons from cerebral cortex and caudate nucleus of 3 normal individuals, recovering >80% of germline insertions from single neurons. While we find somatic L1 insertions, we estimate <0.6 unique somatic insertions per neuron and most neurons lack detectable somatic insertions, suggesting that L1 is not a major generator of neuronal diversity in cortex and caudate. We then genotyped single cortical cells to characterize the mosaicism of a somatic AKT3 mutation identified in a child with hemimegalencephaly. Single-neuron sequencing allows systematic assessment of genomic diversity in the human brain.
It is unlikely that the genomes of any two cells in the body are identical, due to somatic mutations during replication and other mutagenic forces (Frumkin et al., 2005). The complexity and diversity of neuronal cell types in the brain has also led to suggestions that a somatic mutational mechanism may have been harnessed evolutionarily to diversify neuronal function (Muotri and Gage, 2006; Rehen et al., 2005). Endogenous retrotransposition of LINE-1 elements has been proposed as one potential mechanism generating neuronal genome diversity (Singer et al., 2010). Human-specific LINE-1 (L1Hs) retrotransposons comprise the only known active autonomous transposon family in humans, with ~80–100 active L1Hs elements per individual (Hancks and Kazazian, 2012), and somatic L1Hs insertions have been found both in cancerous and normal cells (Iskow et al., 2010; Lee et al., 2012a; Miki et al., 1992; van den Hurk et al., 2007). Recent studies observed rare retrotransposition of an L1Hs reporter in rodent brain in vivo (Muotri et al., 2005; Muotri et al., 2010) and human neural progenitors in vitro (Coufal et al., 2009), while other studies found evidence for more widespread somatic L1Hs insertions in the human brain by qPCR (Coufal et al., 2009) and bulk DNA sequencing (Baillie et al., 2011). qPCR estimates of these events in human brain approach 80 somatic insertions per cell (Coufal et al., 2009).
Although L1 retrotransposition and other somatic mutations could contribute to functional genomic diversity, they can also cause disease (Erickson, 2010; Hancks and Kazazian, 2012). Therefore, any potential somatic mutational mechanism must be balanced by the need for genome stability. Somatic mutations cause not only cancers but also several malformations of the brain (Gleeson et al., 2000; Riviere et al., 2012), emphasized by the recent identification of somatic mutations affecting genes of the PI3K-AKT3-mTOR pathway in hemimegalencephaly (HMG) (Lee et al., 2012b; Poduri et al., 2012), a severe epileptic brain malformation. However, the rates and types of somatic mutations occurring during normal brain development, and how much of the unexplained burden of neurogenetic disease may be caused by somatic mutations, are completely unknown (Erickson, 2010).
Systematically studying somatic mutations requires sequencing genomes of single cells (Kalisky et al., 2011), since the signals of somatic mutations present in a minority of cells can be missed due to sequencing error or insufficient sequencing depth. Single-cell sequencing overcomes this limitation, as shown by studies of single human cancer cells and single sperm that have yielded important new insight into tumor evolution and genetic heterogeneity (Hou et al., 2012; Navin et al., 2011; Wang et al., 2012; Xu et al., 2012). However, similar technologies have yet to be applied to the study of somatic mutation in normal human tissues such as brain, or to diseases other than cancer.
Here we describe a method to amplify genomes of single neurons from post-mortem and surgically resected human brain, enabling interrogation of a wide range of somatic mutations by high-throughput sequencing. We performed genome-wide L1Hs insertion profiling of 300 single neurons from cerebral cortex and caudate nucleus of three neurologically normal individuals, and confirmed that somatic L1Hs retrotransposon insertions are present in the normal human brain. Our quantitative analysis of >200,000 L1Hs insertion sites in these 300 single neurons suggests a rate not higher than 0.6 unique somatic insertions per neuron, and possibly as low as 0.04 (1 insertion in 25 neurons), consistent with observed in vitro rates for human neural progenitors but substantially lower than previous qPCR-based estimates for human brain (Coufal et al., 2009). We then sequenced single cells from HMG brain tissue harboring a known somatic AKT3 point mutation (c.49G→A; p.E17K) (Poduri et al., 2012), showing that our method can characterize the mosaicism of pathogenic somatic brain mutations. These single-cell studies provide a foundation for studying genomic variability among cells in the human brain, both in normal development and neurologic disease.
We purified nuclei from post-mortem human frontal cortex and caudate and labeled them with a neuron-specific antibody (NeuN) for sorting using fluorescence-activated cell sorting (FACS) (Figure 1A) (Matevossian and Akbarian, 2008; Spalding et al., 2005). Large nuclei with neuronal nuclear morphology (Parent and Carpenter, 1996) were readily apparent by microscopy (Figure S1A). NeuN immunoreactivity (Figure S1B) (Mullen et al., 1992) labels essentially all neuronal nuclei in cortex and caudate (Wolf et al., 1996), corresponding to 25–35% of all nuclei (population I) (Figures 1B and S1C). Consistent with their increased size on microscopy (Figure S1B), NeuN+ nuclei also had larger forward (FSC) and side (SSC) scatter (correlates of size) by flow cytometry compared to NeuN− nuclei (Figure S1D). Whereas for nuclei isolated from the caudate we performed a simple sort of the NeuN+ population (population I, Figure S1C), we further enriched nuclei from the cortex for pyramidal neuronal nuclei. Since neighboring cortical pyramidal neurons tend to have shared clonal origins due to their primarily radial migration (Magavi et al., 2012), enriching for pyramidal neuronal nuclei increases the chance of identifying clonal somatic mutations shared by multiple neurons. The largest neuronal nuclei in cortex correspond primarily to pyramidal projection neurons (Gittins and Harrison, 2004; Mills, 2007), and indeed their nuclei often show a pyramidal shape (Figure S1A). We therefore sorted cortical nuclei within the top 25% NeuN/FL-2 fluorescence of population I (population Ia), which were the largest nuclei in population I (Figure S1D). We confirmed the neuronal and non-neuronal identities of the sorted populations by RT-PCR and western blot analysis of additional neuronal (SNAP25 and SYT1) and non-neuronal (GFAP, AQP4, and Olig2) markers (Figures 1C and 1D). 
For every sort, a portion of the sorted nuclei was reanalyzed by FACS, confirming that nuclei remained intact during sorting and that sort purity was >98% (Figures 1B and S1C).
We used multiple displacement amplification (MDA) (Dean et al., 2002) for whole genome amplification of single nuclei because it produces large yields of high molecular weight amplicons, most of which are >30kb (Hou et al., 2012 and data not shown), allowing study of both single-nucleotide mutations and ~6kb full-length L1Hs insertions. We optimized MDA reactions for increased yield (Figure S1E), producing 15–20µg of amplified DNA from single cells. We also measured exogenous (non-human) DNA contamination in the reagents of the MDA reaction (Blainey and Quake, 2011), finding negligible (< 1fg) exogenous DNA (Figures S1F and S1G). Additional controls (see following section) excluded operator human DNA contamination. Quantitative MDA (qMDA) reactions (Zhang et al., 2006) further showed that, as the number of nuclei sorted in a well increased, the time-to-threshold-amplification decreased in a step-wise manner (p <0.01 for each additional nucleus) (Figure 1E), confirming that the desired number of nuclei was correctly sorted in each well. We concluded that our procedure can sort and amplify single neuronal genomes from human brains with high purity and in a high-throughput manner.
We next evaluated the genome-wide coverage and reproducibility of our single neuronal genome amplification. In an initial 4-locus multiplex PCR quality control, 97% of sorted single neurons amplified at least 3 of the 4 loci, indicating that their genomes were successfully amplified and suitable for further experiments. We then performed low-coverage whole-genome sequencing (Figure 2A) of eight randomly chosen single neurons (0.35× average coverage), six from a normal individual (46XY) and two from a trisomy 18 individual, as well as unamplified and MDA-amplified bulk reference samples. The two neurons from the trisomy 18 individual showed the expected increase in chromosome 18 copy number, and the six single neurons from the normal individual were all euploid, confirming that intact nuclei were sorted and that all chromosomes were amplified (Figure 2B). Counting sequencing reads across the genome in bins ~500kb in size (Navin et al., 2011) revealed a systematic, regional amplification bias for all MDA samples, compared to unamplified bulk DNA, regardless of the number of nuclei amplified (Figure S2A). This regional bias in MDA amplification could be controlled for using any of the MDA samples as a reference (Figure 2C), indicating that most of the regional variability in amplification is inherent to MDA rather than the number of nuclei amplified. Bias in amplification relative to GC content was also similar for all MDA sample types (Figure S2B).
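The bin-counting comparison described above can be sketched in a few lines. This is only an illustration, not the analysis pipeline used in the study: the function names, the toy read positions, and the choice to scale each profile by its median are assumptions; only the ~500kb bin size comes from the text.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 500_000  # ~500kb bins, as stated in the text


def bin_counts(read_starts, chrom_length, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin along one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_bins)]


def normalized_ratio(sample, reference):
    """Per-bin sample/reference ratio after scaling each profile by its median.

    Median scaling is an assumed normalization. Bins with no reference
    reads are reported as None rather than dividing by zero.
    """
    s_med, r_med = median(sample), median(reference)
    if s_med == 0 or r_med == 0:
        raise ValueError("profile too sparse to normalize by its median")
    return [
        None if r == 0 else (s / s_med) / (r / r_med)
        for s, r in zip(sample, reference)
    ]


# Toy example: a 1.5Mb chromosome with a doubled middle bin in the sample.
sample_bins = bin_counts([10, 600_000, 600_001, 1_200_000], 1_500_000)
reference_bins = [2, 2, 2]
ratios = normalized_ratio(sample_bins, reference_bins)
```

Using an MDA-amplified sample as `reference_bins` instead of unamplified DNA corresponds to the control for regional amplification bias described above.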
In order to use single-neuron sequencing for somatic mutation detection, amplified genomes must reflect the diploid genotype (both alleles) of genomic loci. We therefore quantified the fraction of genomic loci that failed to amplify one (allelic dropout, AD) or both alleles (locus dropout, LD). Loss of one allele, AD, was measured with a panel of 16 polymorphic microsatellite markers (Identifiler fingerprinting) and by SNP microarray genotyping. AD measured by Identifiler of 92 single neurons across 1,183 heterozygous loci was 9.5% (Figure 2D), whereas AD measured by SNP microarray (for >60,000 loci that are heterozygous in the bulk DNA and called with high confidence in both the reference and sample) was 8–9% in 3 single neurons (Figure S2C and Table S1A), consistent with previous estimates (Hou et al., 2012). Some dropout tended to recur at specific loci even in MDA-amplified 100- and 1000-neuron samples (Figure S2D), probably reflecting difficulty of MDA to amplify specific loci. Loss of both alleles, LD (locus dropout), was 2.3% in the 92 single neurons assayed by Identifiler. In addition, LD was separately estimated by counting the percentage of low-coverage sequencing bins with less than 1/16 the copy number relative to an unamplified DNA reference, and was 2.0% for 1-neuron samples (Figure S2E). These low rates of AD (~10%) and LD (~2%) demonstrate comprehensive and reproducible amplification of single neuronal genomes, and suggest that genome-wide profiling of L1 insertions in single neurons could capture up to 90% of retrotransposon insertions per cell. These genotyping controls also excluded operator contamination, since all amplified single neuronal genomes tested were concordant with the bulk reference (Figures 2D, 2E and Tables S1B–C).
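As a rough sketch of how the two dropout measures above can be tallied, the code below counts allelic and locus dropout at loci that are heterozygous in bulk DNA, and separately computes the fraction of sequencing bins below the 1/16 copy-number cutoff. The data structures, marker names, and allele labels are hypothetical; only the definitions of AD and LD and the 1/16 cutoff come from the text.

```python
def dropout_rates(bulk_het_loci, cell_calls):
    """Return (AD, LD) as fractions of loci heterozygous in bulk DNA.

    bulk_het_loci: dict locus -> (allele_a, allele_b) called in bulk DNA.
    cell_calls:    dict locus -> set of alleles seen in the single cell
                   (missing key or empty set means no signal).
    """
    ad = ld = 0
    for locus, alleles in bulk_het_loci.items():
        seen = cell_calls.get(locus, set()) & set(alleles)
        if len(seen) == 0:
            ld += 1  # both alleles lost: locus dropout
        elif len(seen) == 1:
            ad += 1  # one allele lost: allelic dropout
    n = len(bulk_het_loci)
    return ad / n, ld / n


def low_copy_bin_fraction(ratios, cutoff=1 / 16):
    """Fraction of bins whose copy-number ratio falls below the cutoff."""
    usable = [r for r in ratios if r is not None]
    return sum(1 for r in usable if r < cutoff) / len(usable)


# Hypothetical four-marker example.
bulk = {"D1": ("12", "14"), "D2": ("8", "9"), "D3": ("15", "17"), "D4": ("10", "11")}
cell = {"D1": {"12", "14"}, "D2": {"8"}, "D3": set(), "D4": {"10", "11"}}
ad_rate, ld_rate = dropout_rates(bulk, cell)
```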
We performed genome-wide L1Hs insertion profiling (L1-IP) of single neurons by adapting the method of Ewing and Kazazian (2010) for high-throughput multiplexed sequencing. All known active and disease-causing L1Hs sub-families possess two sequences diagnostic of L1Hs (Hancks and Kazazian, 2012; Ovchinnikov et al., 2002), and a comprehensive study of somatic insertions in the setting of cancer found that 110/111 somatic insertions (with evidence of a target site duplication and poly-A tail) contained both sequences (Lee et al., 2012a). L1-IP targets these L1Hs-specific sequences and amplifies genomic DNA flanking L1Hs insertions containing these diagnostic sequences (Figures 3A, 3B and S3A).
We profiled from each of 3 neurologically normal individuals: 50 single neurons from cerebral cortex and 50 from caudate nucleus (i.e. 300 MDA-amplified single neurons total), unamplified bulk DNA from 5–6 tissues (cortex, caudate, cerebellum, heart, liver, lung), MDA-amplified 50,000-cell, 10,000-cell, 1,000-cell, and 100-neuron samples, as well as technical replicates to assess reproducibility (Figures S3B and S3C), for a total of 383 samples (see Table S2 for sample details). A custom data analysis pipeline classified detected peaks as known reference insertions present in the human genome reference (KR), known non-reference insertions identified in previous studies (KNR), or unknown (UNK) candidate insertions, and assigned a confidence score ranging from 0 to 1 (low-quality to high-quality peaks) based on the number of reads and the number of unique read start sites per peak (Figure 3C). The confidence score was derived from a logistic regression model of germline insertions reproducibly found in bulk DNA samples of the individual (Figure S3D, and see Extended Experimental Procedures for details of the analysis pipeline).
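To make the peak classification and scoring concrete, here is a minimal sketch of a logistic confidence score built from the two features named above (read count and number of unique read start sites). The coefficients and the log transform are hypothetical placeholders; the actual model was fit to germline insertions reproducibly found in bulk DNA and is described in the Extended Experimental Procedures.

```python
import math

# Hypothetical coefficients, for illustration only.
INTERCEPT = -4.0
COEF_LOG_READS = 1.0
COEF_LOG_STARTS = 1.5


def confidence_score(n_reads, n_unique_starts):
    """Map peak evidence to a score between 0 and 1 with a logistic function."""
    z = (
        INTERCEPT
        + COEF_LOG_READS * math.log1p(n_reads)
        + COEF_LOG_STARTS * math.log1p(n_unique_starts)
    )
    return 1.0 / (1.0 + math.exp(-z))


def classify_peak(overlaps_reference, overlaps_known_nonreference):
    """Assign the KR / KNR / UNK label used in the text."""
    if overlaps_reference:
        return "KR"
    if overlaps_known_nonreference:
        return "KNR"
    return "UNK"


weak_peak = confidence_score(n_reads=2, n_unique_starts=1)
strong_peak = confidence_score(n_reads=100, n_unique_starts=20)
```

With these placeholder coefficients, a peak supported by a couple of reads from a single start site scores well below 0.5, while a peak with many reads and many distinct start sites scores close to 1.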
MDA is known to produce rare, low-level chimeric sequences due to local, occasional mispriming of single-stranded amplicons to each other during amplification (Lasken and Stockwell, 2007). These chimeras were seen in MDA-amplified samples as an excess of background reads and peaks with low read depth, and one or few unique read start sites, in the local ~20kb flanks of some, though not all L1 insertions (Figures 3B and S4A–D). Since chimeras form at different sites in different MDA reactions, they are not recurrent between samples (Figures S5A and S5B), and cloning of chimeras (representative example in Figures S5A–C) confirmed their MDA-derived mechanism of formation. Their low confidence scores (Figure S4B) allowed most MDA-chimera peaks to be filtered with minimal reduction in sensitivity for bona fide insertions (Figure 3C).
We first assessed the sensitivity of L1-IP to detect L1Hs insertions genome-wide. In 1-neuron samples, the sensitivity of L1-IP for KR insertions (mostly homozygous) present in bulk DNA of the individual was 81±6% (SD), with a confidence score threshold of 0.5 (Figure S6A), and of 300 1-neuron samples in this study, only 4 were low quality outliers (Figure S6B). Sensitivity increased to 87% when relaxing the confidence threshold to 0.1, though at this lower confidence score, more insertions with weaker evidence supporting them were also detected. Since somatic insertions are expected to be present in a single copy, sensitivity for single copy insertions in 1-neuron samples was assessed with chrX KR/KNR insertions in individual 1465 (male) and was only slightly lower at 75±10%, with a confidence score threshold of 0.5. We further confirmed that we detect the expected absolute number of insertions: the mean number of KR, KNR and UNK insertions (with confidence score > 0.5) per bulk DNA sample was 689, 113, and 43, respectively (Figure S6C), compared to 628 KR and 152 KNR/UNK insertions found on average in a previous study (Ewing and Kazazian, 2010). In 1-neuron samples, an average of 605 KR, 87 KNR, and 47 UNK peaks were found (Figure S6C). A plot of L1Hs peaks found in bulk DNA, a 100-neuron sample, and two representative single neurons is shown in Figure 4.
In order to validate L1-IP predicted insertions, we optimized a 3’ junction PCR validation method (3’PCR) (Figure S6D), and further used it to directly measure allelic dropout (AD) and locus dropout (LD) of L1Hs insertions in amplified single neurons. The technical sensitivity of the 3’PCR validation method (i.e. 3’PCR detection rate of true germline insertions) was important to determine first, in order to estimate at what rate true insertions found by L1-IP fail to validate by 3’PCR. This was assayed by 3’PCR of 64 known germline insertions (33 KR and 31 KNR) in unamplified bulk DNA, and amplified unsorted-50k and 1-neuron samples. In 1-neuron samples, 3’PCR detected 94% of known germline insertions with the first primer attempted (the remainder were validated successfully with redesigned primers), and this detection rate was not significantly different between amplified and unamplified samples (Figures 3D and S6E). 3’PCR can therefore sensitively detect L1Hs insertions in amplified single neuronal genomes. 3’PCR also successfully validated, in both bulk and 1-neuron samples, 12 out of 12 unknown (UNK) germline candidate insertions that we tested (Figures 3D, S6E and Table S3), confirming that L1-IP can identify unknown germline insertions. AD of L1Hs insertions was then estimated by 3’PCR of 3 heterozygous insertions in a larger set of 83 single neurons (Figures 3E and S6F–G), finding 8.0% AD (20/249 alleles), consistent with previous estimates. LD estimated by 3’PCR of 3 homozygous insertions in the same cells (Figures 3E and S6G) was 1.2% (3/249 alleles). We concluded that L1-IP’s high sensitivity to detect germline insertions in single neurons, our robust 3’PCR validation method, and direct confirmation of <10% L1Hs allelic dropout together allow us to confidently search for somatic L1Hs insertions genome-wide in single neurons.
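The dropout percentages above follow directly from the reported counts (3 insertions assayed in each of 83 neurons gives a denominator of 249). The short check below reproduces them and, as an addition not found in the source, attaches a Wilson 95% interval to convey how precisely such small counts pin down the rate.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (not from the source)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


N_INSERTIONS = 3
N_NEURONS = 83
denominator = N_INSERTIONS * N_NEURONS  # 249, as reported

ad_percent = 100 * 20 / denominator  # allelic dropout, 20/249
ld_percent = 100 * 3 / denominator   # locus dropout, 3/249

ad_low, ad_high = wilson_interval(20, denominator)
ld_low, ld_high = wilson_interval(3, denominator)
```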
L1-IP can reliably detect population-polymorphic L1Hs insertions in single neurons (Figures 5A–C), serving as a fingerprint for each individual. All possible permutations of insertion polymorphisms among the 3 individuals were found (every possible pair of individuals and individual-specific), and as expected, KR and KNR insertions were enriched in fixed and polymorphic insertions, respectively (Figure 5A). Hierarchical clustering of all samples in the study according to L1Hs genotype correctly clustered all samples by individual except for 3 low-quality 1-neuron samples (Figure 5A). Importantly, since both population-polymorphic and somatic insertions belong to the same L1Hs subfamilies and have the same L1Hs diagnostic nucleotides (Beck et al., 2010; Lee et al., 2012a), detection of population-polymorphic L1Hs insertions in single neuronal genomes further illustrates that L1-IP has the potential to capture somatic insertions.
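The idea that polymorphic insertions act as a per-individual fingerprint can be illustrated with a much simpler stand-in for the hierarchical clustering used in the study: assign each single-neuron profile to the bulk profile with the highest Jaccard similarity. The individual labels and insertion identifiers below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of insertion identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign_individual(cell_insertions, bulk_profiles):
    """Return the individual whose bulk insertion set best matches the cell."""
    return max(
        bulk_profiles,
        key=lambda name: jaccard(cell_insertions, bulk_profiles[name]),
    )


# Hypothetical profiles: f* are fixed, the others are polymorphic insertions.
bulk_profiles = {
    "A": {"f1", "f2", "a1", "ab"},
    "B": {"f1", "f2", "b1", "ab"},
    "C": {"f1", "f2", "c1"},
}
# A single neuron from individual A that missed f2 through dropout.
cell = {"f1", "a1", "ab"}
best_match = assign_individual(cell, bulk_profiles)
```

Fixed insertions contribute equally to every comparison, so the assignment is driven by the polymorphic insertions, which is why dropout of a few loci rarely changes the result.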
Our single-neuron L1-IP data allowed us to quantify the number of cortex- and caudate-specific somatic insertions in single-neuron samples and estimate an upper bound for the number of somatic L1Hs insertions per neuron (defined as absent from bulk DNA samples of the individual excluding the brain region being analyzed). Rather than using the same confidence score threshold across all samples, we adjusted the confidence score threshold for each single-neuron sample to maintain a constant sensitivity for KNR germline insertions. This controls for variability in single-neuron sample quality and allows for more accurate correction of insertion rates for sensitivity. A KNR reference was specifically chosen as it would be expected to better estimate sensitivity for single-copy somatic events than a mostly homozygous KR reference set. We excluded insertions found within 20kb of known (KR/KNR) insertions, leading to a minimal reduction in sensitivity (by excluding 1.5% of the genome, i.e. 45.5/3137Mb) with a substantial gain in specificity by filtering most, though not all, MDA chimera peaks (Figure S4A). At a sensitivity threshold that detects 50% of KNR insertions, we found an average of 1.1±2.3 (SD) somatic insertion candidates per neuron (corrected for sensitivity) (Figure 6A), and 68% of 1-neuron samples had no detectable somatic insertions. Additionally, we counted the number of unique somatic insertions per neuron (i.e. not present in other single neurons sequenced from the individual) and found 0.6±1.5 candidate unique insertions per neuron (Figure 6B); 82% of 1-neuron samples had no detectable unique somatic insertions.
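The three steps in this paragraph (a per-sample threshold at fixed KNR sensitivity, exclusion of 20kb flanks around known insertions, and correction of counts for sensitivity) can be sketched as follows. The function names, the toy scores, and the raw count of 0.55 are hypothetical; the 50% sensitivity target, the 20kb flank, and the 45.5/3137Mb figures come from the text.

```python
import math

GENOME_MB = 3137.0
EXCLUDED_MB = 45.5
FLANK_BP = 20_000


def threshold_for_sensitivity(knr_scores, target=0.5):
    """Pick the score cutoff at which `target` of germline KNR insertions pass.

    knr_scores: scores of KNR insertions known to be present in the
    individual, as observed in this one sample (0.0 if not detected).
    Ties at the cutoff can push the realized sensitivity above the target.
    """
    ranked = sorted(knr_scores, reverse=True)
    k = max(1, math.ceil(target * len(ranked)))
    return ranked[k - 1]


def near_known_insertion(position, known_positions, flank=FLANK_BP):
    """True if a candidate lies within `flank` bp of a known insertion
    (positions are assumed to be on the same chromosome)."""
    return any(abs(position - k) <= flank for k in known_positions)


def sensitivity_corrected(raw_count, sensitivity):
    """Scale an observed count up by the fraction of true events detected."""
    return raw_count / sensitivity


excluded_percent = 100 * EXCLUDED_MB / GENOME_MB
cutoff = threshold_for_sensitivity([0.9, 0.8, 0.6, 0.3, 0.1, 0.0], target=0.5)
corrected = sensitivity_corrected(0.55, 0.5)  # hypothetical raw count
```

Because the cutoff is recomputed for each sample, a lower-quality amplification gets a more permissive threshold, and dividing by the same fixed sensitivity remains valid across samples.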
The above upper bound estimate for the somatic insertion rate controls for sensitivity (false negative rate), but is likely an overestimate as it does not take into account specificity (i.e. false positive MDA chimera and other artifactual peaks still remaining after our sensitivity threshold and local 20kb filtering). We therefore screened for false positive candidates by carrying out 3’PCR validation and secondary validations of the 16 highest-scoring candidate somatic insertions from each tissue (96 total). Initial review of L1-IP raw data revealed that at least half of the candidates were likely MDA-chimeras or other recognizable technical artifacts that cannot be systematically filtered. These include peaks caused by read alignment errors, chimeras of older L1Pa insertions, and loci with systematic low-level reads present at sub-threshold levels in many unamplified bulk and MDA-amplified samples of unrelated individuals, but stochastically passing threshold as somatic candidates in one or a few single neuron samples (see Table S3 for annotation of the 96 candidates). Indeed, only 17 of the 81 candidates (21%) for which we could design primers passed 3’PCR validation (Figure S7A), significantly less than the 94% validation rate for known insertions (Figure S6E). Secondary validation sequencing of 3’PCR products and review of L1-IP raw data revealed that 12 of the remaining 17 candidates were chimeras or non-specific PCR products. Therefore, most of the somatic candidates are likely false positives, and the true somatic L1Hs insertion rate may be significantly lower than our upper-bound estimate prior to validation. The post-validation somatic and unique somatic insertion rate estimates are 0.07±0.15 and 0.04±0.10 insertions per neuron, respectively (Figures 6A and 6B).
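The validation arithmetic above can be retraced as a short calculation. The source does not spell out the formula behind the post-validation rates; scaling each upper-bound estimate by the fraction of tested candidates that survived both validation steps is an assumption, although it does reproduce the reported means of 0.07 and 0.04.

```python
CANDIDATES_TESTED = 81      # candidates for which primers could be designed
PASSED_3PCR = 17            # passed initial 3'PCR validation
SECONDARY_ARTIFACTS = 12    # chimeras or non-specific PCR products

confirmed = PASSED_3PCR - SECONDARY_ARTIFACTS
pass_rate_percent = 100 * PASSED_3PCR / CANDIDATES_TESTED


def post_validation_rate(upper_bound, confirmed, tested):
    """Assumed correction: scale the upper bound by the validated fraction."""
    return upper_bound * confirmed / tested


somatic_rate = post_validation_rate(1.1, confirmed, CANDIDATES_TESTED)
unique_rate = post_validation_rate(0.6, confirmed, CANDIDATES_TESTED)
```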
The remaining 5 somatic candidates were studied further by attempting to clone their full lengths, and screening for their presence by 3’PCR across all single neurons sorted from the individual in which they were found. We successfully cloned the full length of one of the five somatic insertion candidates (Figure S7B). This insertion was detected in our L1-IP data in intron 4 of the gene IQCH (IQ motif containing H, chromosome 15), in neuron #2 from the cortex of individual 1465, and is a full-length, intact 6.1kb L1Hs with all the hallmarks of a bona fide L1Hs insertion: a target site duplication (TSD) (13bp), a poly-A tail (~71bp), and a 5’ transduction (101bp) allowing us to trace its source to a full-length, population-polymorphic KR L1Hs on chromosome 8 (Figures S7C and S7D). The full-length sequence of the somatic insertion (Table S3) precisely matched the sequence of the source L1Hs. The insertion was not detected by standard 3’PCR in brain and non-brain bulk tissues from the individual (Figure 6C) and was found in 2/83 (2.4%) cortical and 0/59 caudate single neurons tested (Figures 6D and 6E). The insertion was detected at low levels in L1-IP data of some 50k-unsorted nuclei samples (Figure S7E), as expected for a low-level mosaic insertion, and with further optimization of our 3’PCR protocol (increased DNA input and higher-cycle PCR) we were able to amplify the insertion from these bulk samples as well (Figure S7F). The remaining four candidates were each found by 3’PCR only in the single neuron in which they were identified by L1-IP. Three of the four had poly-A tails by 3’PCR product sequencing (the fourth had an indeterminate poly-A tail since the breakpoint was within a genomic poly-A) (Table S3). Our results illustrate the ability of single-cell sequencing to identify somatic L1Hs insertions and highlight the potential of single-cell sequencing to identify very low-level mosaic mutations in human tissue.
Given the low rate of L1 retrotransposition in neocortical progenitors of normal brains, we next studied the ability of single-neuron sequencing to characterize a pathogenic somatic point mutation in the brain. An open question regarding the pathophysiology of hemimegalencephaly is the lineage (developmental origin) of the pathologic cells (Flores-Sarnat et al., 2003). We recently identified a child with isolated hemimegalencephaly (HMG) caused by a somatic missense (E17K) point mutation in AKT3 present in the brain but not the blood (case HMG-3, Poduri et al., 2012) (Figure 7A). Due to intractable epilepsy, the malformed hemisphere was surgically removed, allowing application of our single-cell method to genotype single sorted cells from this surgical sample and study the origin of the pathologic cells.
Previous analysis of resected bulk tissue indicated that the mutation was present at ~35% mosaicism based on cloning of PCR products (Poduri et al., 2012). Interestingly, 39±7% (SE; corrected for AD) of single sorted neuronal (NeuN+) nuclei contained the mutation (Figures 7B, 7C, and Table S4), similar to the mosaicism in unsorted bulk tissue containing both neuronal and non-neuronal cells. This suggested that the mutation was also present in non-neuronal cells, consistent with the abnormality of both gray matter and white matter in this patient by MRI (Poduri et al., 2012; Figure 7A). Indeed, we confirmed the presence of the mutation in single non-neuronal (NeuN−) nuclei, at an average percent mosaicism (corrected for AD) of 27±8% (Figure 7C and Table S4). These data indicate that the mutation was present in an early neocortical progenitor capable of giving rise to both neuronal and non-neuronal cells throughout the majority of the hemisphere. The low mosaicism in neurons also indicates that mutant and non-mutant neurons are extensively intermingled in the abnormal hemisphere, presumably reflecting diverse clonal origins of cortical neurons in this pathological condition.
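One plausible form of the allelic dropout correction mentioned above is sketched below. It assumes that a cell heterozygous for the mutation is scored as mutant only if the mutant allele amplifies, so the observed fraction is the true fraction multiplied by (1 − AD). The source reports only the corrected values; this formula, the cell counts, and the AD value used here are assumptions for illustration.

```python
import math


def corrected_mosaicism(n_mutant, n_cells, allelic_dropout):
    """Return (corrected fraction of mutant cells, standard error).

    Assumes each mutant cell is detected with probability (1 - AD),
    and uses a simple binomial standard error scaled by the same factor.
    """
    observed = n_mutant / n_cells
    detect = 1.0 - allelic_dropout
    corrected = observed / detect
    se = math.sqrt(observed * (1.0 - observed) / n_cells) / detect
    return corrected, se


# Hypothetical counts: 14 of 40 genotyped nuclei carry the mutant allele.
fraction, se = corrected_mosaicism(n_mutant=14, n_cells=40, allelic_dropout=0.09)
```

With dropout near 10%, the correction raises the estimate only modestly, so the comparison between neuronal and non-neuronal fractions is driven mainly by the counts themselves.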
Here we present a single-cell sequencing study of the central nervous system, and perform genome-wide analysis to trace patterns of somatic mutation in human brain. We confirmed that somatic retrotransposon insertions can be detected in normal human brain. However, our analysis of L1 insertions found that somatic insertions are rare in normal human cortical and caudate neurons, suggesting that L1 retrotransposition is not a major source of neuronal diversity in cerebral cortex and caudate nucleus. Finally, we used single-cell analysis to study the mosaicism of a somatic AKT3 mutation, highlighting the potential of single-cell sequencing for cell lineage analysis in human brain.
Our validation of a somatic L1Hs insertion with all the hallmarks of a bona fide retrotransposition event, including a 5’ transduction identifying its source, confirms that somatic L1Hs insertions are present in the normal human brain. The very low-level mosaicism of this insertion, and its detection only in cortical neurons, further suggests that it may have occurred during cortical development. The source L1Hs on chromosome 8 from which the somatic insertion originated lies in antisense orientation within an intron of the gene KCNB2, and is a full-length insertion with both open reading frames intact. Although it is present in the human genome reference, it is polymorphic in the population and was present only in individual 1465, but not the other individuals in this study (data not shown). In addition to this source L1Hs, only one other L1Hs element has been previously confirmed to be active somatically in humans (van den Hurk et al., 2007). Further single-cell studies will help delineate the spectrum of somatic activity of L1Hs elements in different tissues and developmental stages.
Our quantitative analysis of retrotransposition indicates that somatic L1Hs events are rare in adult human cortical pyramidal neurons and caudate neurons. We find that, although we can detect hundreds of known germline insertions in single neurons, >80% of neurons show no unique somatic insertions (i.e. present in one neuron but not multiple neurons). Somatic L1Hs insertions present in multiple neurons but not all neurons, as seen for the full-length somatic insertion we identified, are also rare. On the other hand, we cannot exclude greater rates of L1Hs activity in other cell types or regions of the human brain, or activity of Alu and SVA retrotransposons in the cortex and caudate. Variability in the number of highly active “hot” L1s per individual (Beck et al., 2010) may also lead to variability in somatic retrotransposition rates among individuals; however, the low number of somatic insertions in 300 neurons from 3 individuals precludes it from being an essential source of neuronal diversity in cortex and caudate that is common in humans.
Our results are generally consistent with the rates of ~1/10,000 to ~1/100 insertion events per human neural progenitor measured in an in vitro L1RP reporter assay (Coufal et al., 2009). This rate is far lower than the rate measured by quantitative PCR (Coufal et al., 2009; Muotri et al., 2010) which estimated a relative copy number increase of L1 of ~5–10% and an absolute estimate of ~80 somatic L1 insertions per cell in human brain. Studies employing targeted capture of L1 sequences from human brain (Baillie et al., 2011) also reported widespread L1 retrotransposition. These methods are less direct, and do not analyze individual neurons, but instead analyze pooled DNA from bulk tissue. Compared to sequencing of bulk tissue (Baillie et al., 2011), our approach of single-cell sequencing has the additional advantage that potential artifacts, such as chimeric reads, are easier to recognize because they are present at lower read depth relative to true insertions. The identification of mammalian species that appear to have lost all L1 activity (Cantrell et al., 2008) further suggests that L1 retrotransposition is not a universal requirement for mammalian neurogenesis. Recent L1 profiling of 26 glial brain tumors did not reveal any somatic insertions (Iskow et al., 2010; Lee et al., 2012a), indicating that somatic L1 insertions may be uncommon in glial progenitors as well. While our study suggests that somatic L1 retrotransposition in the human cortex and caudate is rare, it remains possible that neuronal L1 retrotransposition may occur at higher rates in other brain regions, such as the hippocampus, and/or play a role as a mutagen in the human brain in neurological disease.
Our analysis of a somatic retrotransposon insertion and a somatic AKT3 mutation, each found in more than one cortical neuron as well as at low levels in bulk DNA, suggests that both occurred in progenitor cells of the brain, and that other focal brain malformations of unknown etiology may be similarly caused by progenitor mutations during development. The somatic AKT3 mutation in hemimegalencephalic brain was found in both neuronal and non-neuronal cells, further indicating that the mutation occurred in a neuroglial progenitor. Moreover, the normal-appearing basal ganglia of this patient by MRI (data not shown) would be consistent with a mutation occurring in a neuroglial progenitor in the developing neocortex, but not involving the ventral telencephalon, though caudate tissue was not available for testing.
Our study suggests potential future applications of somatic mutations as cell lineage markers in post-mortem human brain. Although retrotransposon insertions appear too rare for systematic study of cell lineages, and the specific AKT3 mutation assayed here clearly changes the behavior of cells carrying the mutation (Poduri et al., 2012), deeper sequencing of single cells might eventually identify diverse, nonfunctional mutations, including mutations at highly mutable sites like microsatellite repeats (Frumkin et al., 2005; Salipante et al., 2008), that may allow more systematic interrogation of lineage relationships even in human post-mortem brain.
Full protocols can be found in Extended Experimental Procedures.
Fresh-frozen post-mortem tissues of 3 normal individuals and a trisomy 18 fetus (UMB1465, UMB4638, UMB4643, and UMB866) were obtained from the NICHD Brain and Tissue Bank at the University of Maryland. Hemimegalencephalic brain tissue from case HMG-3 (Poduri et al., 2012) was obtained following neurosurgical resection of the affected right hemisphere.
Nuclei were purified by sucrose cushion ultra-centrifugation and labeled with NeuN antibody (Millipore, MAB377) for flow cytometry as previously described (Matevossian and Akbarian, 2008; Spalding et al., 2005). Single nuclei were sorted with a FACSAria II cell sorter into 96- or 384-well plates and amplified by MDA (Dean et al., 2002). Low-coverage sequencing libraries were made with the NEXTflex DNA-seq kit (Bioo Scientific).
L1Hs-insertion profiling libraries (L1-IP) were made by modification of the method of Ewing and Kazazian (2010) for a high-throughput workflow and high-level (up to 32-plex) multiplexing. Libraries were sequenced on HiSeq 2000 sequencers (Illumina). A custom data analysis pipeline was created to call and classify L1-IP peaks.
3’ junction PCR (3’PCR) was performed with one primer specific to L1Hs (L1Hs-AC-22) and a 5’ peak flank primer (upstream to the L1-IP peak), to verify the presence of the predicted insertion. Long-range PCR with 5’ and 3’ peak flank primers was performed to clone the entire length of candidate insertions.
G.D.E. and X.C. performed all experiments, with assistance from L.B.H., P.C.E., H.S.L., J.J.P., and K.D.A. G.D.E. and E.L. analyzed the L1-IP data with input from P.J.P. G.D.E. and X.C. analyzed all other data. G.D.E., X.C., and C.A.W. conceived and designed the project with input from E.C.G. and A.P. G.D.E., X.C., and C.A.W. wrote the manuscript.
We thank Peter V. Kharchenko, Tim W. Yu, Vijay S. Ganesh and Nathan Silberman for helpful discussions; Hal Schneider, Richard Bennett, R. Sean Hill, and Christina Kourkoulis for technical assistance; Robert Johnson from the NICHD Brain and Tissue Bank; the Orchestra research computing support team (Harvard Medical School); and the Hematologic Neoplasia Flow Cytometry Core (Dana-Farber Cancer Institute). Brain image in Figure 1A adapted with permission from http://brainmuseum.org, supported by the US National Science Foundation. C.A.W. is supported by the Manton Center for Orphan Disease Research and grants from the NINDS (R01 NS079277 and R01 NS35129). C.A.W. is an Investigator of the Howard Hughes Medical Institute.
Sequencing data from this study are deposited in the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) under the accession number SRA056303.