Ethics Statement
Informed consent for the use of human tissues for research was obtained in writing from all donors or their next of kin. All non-human primates used in this study suffered sudden deaths for reasons other than their participation in this study and without any relation to the tissue used. Biomedical Research Ethics Committee of Shanghai Institutes for Biological Sciences completed the review of the use and care of the animals in the research project (approval ID: ER-SIBS-260802P).
Illumina Sequencing Experiment
Human tissue was obtained from the NICHD Brain and Tissue Bank for Developmental Disorders at the University of Maryland, Baltimore, MD. The role of the NICHD Brain and Tissue Bank is to distribute tissue and, therefore, cannot endorse the studies performed or the interpretation of results. All subjects were defined as normal controls by forensic pathologists at the NICHD Brain and Tissue Bank. No subjects who suffered a prolonged agonal state were used. For the prefrontal cortex, samples were taken from the frontal part of the superior frontal gyrus: a cortical region approximately corresponding to Brodmann Area 9. For all samples, similar proportions of grey and white matter were dissected. Total RNA was isolated from the frozen prefrontal cortex tissue using the Trizol (Invitrogen, USA) protocol with no modifications. Prior to low molecular weight RNA isolation, the total RNA from 20 male individuals aged between 14 and 58 years was combined in equal amounts. Low molecular weight RNA was isolated, ligated to the adapters, amplified and sequenced following the Small RNA Preparation Protocol (Illumina, USA) with no modifications. Technical replication was completed by independently processing the mixed sample of 20 individuals starting from the low molecular weight RNA isolation step. We carried out the sample preparation and deep sequencing by choosing 5 adult chimpanzee individuals and 5 rhesus macaque individuals following the protocols used for human samples. Details of all samples are given in
Table S1. All original deep sequencing data is deposited in the NCBI GEO database [GSE26545].
Agilent miRNA Microarray Experiment
Total RNA was isolated using the mirVana miRNA isolation kit (Ambion). 100 ng of each RNA sample were hybridized to Agilent Human microRNA Microarray (G4471A, Agilent Technologies). MicroRNA labelling, hybridization and washing were carried out following Agilent's instructions
[45].
Agilent microRNA assays integrate eight individual microarrays on a single glass slide. Each microarray includes approximately 15 k features containing probes sourced from the miRBase public database. The probes are 60-mer oligonucleotides directly synthesized on the array. We used Human miRNA Microarray Version3, which contains probes for 866 human and 89 human viral microRNAs from the Sanger miRBase v12.0. All samples used in the prefrontal cortex comparison among three species were hybridized to one array. Data from samples hybridized on a single array were processed and analyzed separately to avoid possible batch effects.
Images of hybridized microarrays were acquired with a DNA microarray scanner (Agilent G2565BA); Feature Extraction software v.10.5.1.1 (Agilent G4462AA) was uses for image analysis with default protocols and settings.
As miRNA microarray probes are based on human mature miRNA sequences, expression levels of miRNA with sequence differences among species cannot be measured reliably. All probes corresponding to 150 such miRNA between human and chimpanzee and 313 such miRNAs between human and rhesus macaque present on the array were masked prior to expression level analysis, based on the mature sequence comparison.
Affymetrix Exon Array Experiment
mRNA samples for Affymetrix Human Exon 1.0 ST Arrays were prepared following the standard GeneChip Whole Transcript (WT) Sense Target Labelling Assay. We processed Exon Array datasets following the steps described in
[46]. We processed the human, chimpanzee and rhesus macaque datasets separately. For the human dataset, in order to identify array probes that contain mismatches and multiple locations to human genome (hg18), we mapped Human Exon 1.0 ST probes to the human genome using Bowtie
[47]. Based on these alignments, we included probes that matched the genome perfectly and at a single location. For the rhesus macaque and chimpanzee datasets, we applied the same procedure by mapping probes to the rhesus macaque genome (MMUL1.0) and the chimpanzee genome (panTro2.1) separately. Finally, we chose probes that match the (i) human and chimpanzee genomes for human and chimpanzee gene expression comparison and, (ii) all three species' genomes for human, chimpanzee and rhesus macaque gene expression comparison. To determine whether the signal intensity of a given probe was above the expected level of background noise, we compared the signal intensity for each probe to a distribution of signal intensities of the anti-genomic probes with the same GC content. Anti-genomic probes are specifically designed by Affymetrix to provide an estimate of the non-specific background hybridization
[48]. A probe was classified as detected if its intensity was larger than the 95% percentile of the background probes with the same GC content
[48]. To further remove any possible systematic experimental bias among arrays, we performed a PM-GCBG correction and quantile normalization using the R package "preprocessCore" (
http://svitsrv25.epfl.ch/Rdoc/library/preprocessCore/html/00Index.html). Prior to norm-alization, all intensities were log2 transformed. A transcript was classified as detected if more than 80% of probes and at least ten probes per transcript were classified as detected. The intensities of transcripts were summarized by the median polish method. We used the Transcript Cluster Annotations file to map the transcript clusters annotated by Affymetrix to Ensembl genes (Ensembl54). In cases where multiple transcript clusters mapped to the same gene, we calculated gene expression as the median of all corresponding transcript clusters. None of the transcript clusters overlapped. All original microarray data is deposited in the NCBI GEO database [GSE26545].
Sample Preparation and Label-Free 2D-MS/MS Thermo-LTQ Proteomics Methodology
Protein sample preparation and 2D LC-MS/MS analysis and peptide identification are described elsewhere
[46]. Briefly, proteins were extracted from 100 mg of frozen cerebellar tissue samples. The resulting protein solution was incubatedovernight with Trypsin, followed by ultrafiltration and lyophilization. Lyophilized protein samples werethen dissolved in a loading buffer for the LC-MS/MS analysis. Peptide fractionation and analysis were performed in a pH continuous online gradient (pCOG) 2D LC-MS/MS system. Peptide identification was achieved by searching against a database of human peptides (IPI human v3.22) and its reversed version representing mock database using SEQUEST program in BioWorks 3.2 software suite. A mass tolerance of 3.0 Da and one missed cleavage site of trypsin were allowed. Cysteine carboxyamidomethlation was set as static modification and no other modification was checked. All output results were filtered and integrated to proteins by an in-house software “BuildSummary”. Using a false discovery rate (FDR) of less than 0.5%, all of the matches passing a certain Xcorr and delta CN were regarded as valid. Further, all the peptides that could be assigned to multiple proteins were removed. All identified protein IDs were mapped to Ensembl gene IDs using Biomart. Protein expression of each gene was calculated as a median copy number of all peptides, assigned uniquely to any of the isoforms of the corresponding gene. Genes with more than 5 peptides identified in human and chimpanzee brains were used in the miRNA target effect analysis. Based on this cutoff, we quantified protein expression for a total of 981 genes. The processed protein dataset is provided in
Table S7.
Quantitative RT–PCR
For mature miRNA quantification we used the TaqMan MicroRNA Assay (Applied Biosystems) system
[49]. cDNA was synthesized from 50 ng total RNA from in a 15 µl reaction volume, according to the TaqMan MicroRNA Assay protocol. By using hairpin primers targeting specifically mature miRNAs, reverse transcription was performed using the following program: 30 min at 16°C, 30 min at 42°C, 5 min at 85°C and then held at 4°C. For relative quantification by real time, 1.5 µl cDNA were used in a total reaction volume of 20 µl with 1 µl custom TaqMan assay using a Roche LC480 RT PCR System. Each measurement was performed in triplicate for each assay. At least two biological replicates for each species were used. Ct (threshold cycle) values of RT PCR were normalized to the endogenous control U6 measured together with the test samples. The relative expression of each miRNA was calculated as log2 of 2-Ct values.
Mapping of Sequence Reads to the Human, Chimpanzee, and Rhesus Macaque Genomes
We mapped the deep sequencing data following the mapping steps of
[50]. For each of the brain sequencing datasets, to remove the adapter sequence at the 3′-end of the sequence reads, all unique sequences were trimmed using the custom trimming procedure. The trimmed sequences of each species were mapped to the corresponding genomes, human genome (hg18), chimpanzee (PanTro2.1) and rhesus macaque (MMUL1.0), using SOAP2 algorithm
[51]. Only sequences perfectly matching the genome and with a length ranging from 18 to 28 nucleotides were retained.
Known and Novel Star miRNA Quantification
We quantified the miRNAs expression following the quantification steps of
[50]. First, all sequences with at least one read mapping within three nucleotides upstream or downstream of the 5′-position of the mature miRNAs were retained. Then, for each mature miRNA, the sequence with a maximal copy number was designated as the reference sequence. Finally, the expression level of each miRNA was calculated as the sum of the copy number of the reference sequence and the sequences mapping at the same 5′-end position as the reference sequence. Besides the quantification of known miRNAs, novel miRNAs were detected following
[50]. Specifically, for the miRNA precursors with one annotated miRNA, small sequences mapping to the opposite arm of the precursor hairpin were analysed. The sequence with the maximal copy number was considered as a novel miRNA candidate. A further criterion required the existence of at least 14 basepairs between an annotated miRNA and a novel miRNA candidate within the precursor hairpin. The quantification process for novel miRNAs was the same as for known miRNAs.
miRNA Ortholog Finding in the Chimpanzee and Rhesus Macaque Genomes
Human microRNA information was downloaded from miRBase version 12
[52]-
[54]. We used two steps for the ortholog finding; first, we extracted the best precursor orthologs by using a combination of reciprocal BLAT, BLAST and liftOver in chimpanzee and rhesus macaque genomes. Specifically, we mapped all annotated human miRNA precursors to the chimpanzee and rhesus macaque genomes using reciprocal BLAT, BLAST and liftOver, and required one precursor ortholog to be supported by at least 2 out of 3 methods. For reciprocal BLAT, we chose the following parameter configuration: [-stepSize

=

5 -repMatch

=

2253 -minScore

=

0 -minIdentity

=

0]. We further required the length of each hit sequence to be more than 70% and less than 130% of the query sequence. For reciprocal BLAST, we chose the parameter configuration [-F F -b 1 –e 10
−5] and again required the length of hit sequence to be more than 70% and less than 130% of query sequence. For reciprocal liftOver, we chose the website parameter configuration with Perl LWP module [hglft_minMatch

=

>0.6, hglft_minSizeT

=

>0, hglft_minSizeQ

=

>0 boolshad.hglft_multiple

=

>0] and similarly required the length of the hit sequence to be more than 70% and less than 130% of query sequence.
We next extracted mature miRNAs based on aligned precursor sequences using ClustalW2 and Muscle, with default parameters. The extracted mature sequence by ClustalW2 and Muscle were highly consistent (<0.1% difference).
miRNA Differential Expression Detection
The procedure for identifying differentially expressed miRNAs in deep sequencing data was as follows: We normalized data from two species belonging to the same brain region (e.g. human and chimpanzee prefrontal cortex) using quantile normalization. We then used statistical significance, fold-change and detection level as criteria for differential expression (Fisher's exact test p<0.01, fold-change>2, at least 10 sequence reads in at least one of the two species). We further required that the candidate miRNA should fulfil these criteria in both technical replicates in the prefrontal cortex. Normalization by the number of the total mapped reads (transcripts per million, TPM) produced almost identical results [data not shown].
Alternatively, to identify miRNA differentially expressed between humans and chimpanzees or between humans and macaques, we applied a procedure implemented in the edgeR package
[23] using the following criteria:
p<0.001, FDR<0.01.
For identifying differentially expressed miRNA in Agilent miRNA microarray data, a similar approach was used. We first quantile normalized data contained within one Agilent array, and then used both statistical significance and fold-change as criteria for differential expression (Student t-test, p<0.01, fold-change>2).
miRNA Target Effects' Detection
For the miRNA differently expressed between humans and chimpanzees, we expected targets of miRNA highly expressed in humans to be down-regulated in humans. We first used TargetScan5
[10],
[25] to predict the miRNA targets as this algorithm is reported to have relatively high sensitivity and specificity
[26]. To test target effects on the mRNA level, we normalized gene expression between species using quantile normalization and excluded genes with absolute difference between species smaller than 0.5. Using the Wilcoxon signed-rank test, we then compared the expression difference between the targets of miRNA that were highly expressed in humans with targets of miRNA that were highly expressed in chimpanzees. Before applying the Wilcoxon signed-rank test, the genes that were targeted by both miRNA highly expressed in humans and miRNA highly expressed in chimpanzee (
i.e. targets with inconsistent miRNA effects) were excluded.
Due to greater intra-species variation in the protein data, when testing the miRNA target effects on protein expression, we revised the method to use the effect size to represent the expression difference between species. Only genes with absolute effect size greater than one were used in analysis.
To check the robustness of detected target effect at both mRNA and protein levels, we used different expression level cutoffs for identification of differentially expressed miRNA, which yielded qualitatively the same result as reported in the main text (
Table S8). We further determined that the target effects could be reproduced using another target prediction algorithm, PITA (TOP)
[27].
To calculate the percentage of differently expressed genes that could be explained by miRNA expression divergence between human and chimpanzee, we first identified differentially expressed genes with FDR less than 2% at the mRNA level and less than 5% at the protein level (FDR was estimated using 1000 permutations). Among these genes, we determined the percentage that were targeted by differentially expressed miRNA, where at least one miRNA-target gene pair showed expression change in opposite directions.
miRNA Transfection and Microarray Experiments
miRNA transfection experiments were conducted on two human derived neuroblastoma cell lines (SH-SY5Y and SK-N-SH) (
Table S1). Briefly, cells were plated in 0.5 ml of growth medium, without antibiotics, 24 h prior to transfection.miRNA mimics-Lipofectamine 2000 (Invitrogen) complexes were prepared freshly before transfection according to the manufacturer's protocol.SH-SY5Y and SK-N-SH cells were transfected in six-well plates using miRNA mimics-Lipofectamine 2000 with a final oligonucleotide concentration of 10 nmol/L. In parallel, negative control transfections with mock oligonucleotides were conducted according to the manufacturer's protocol. For each cell line, transfections with negative control oligonucleotides were carried out in two independent replicates. Cells were harvested after 24 h, total RNA were extracted with Trizol reagent(Invitrogen) and further processed and hybridized to Affymetrix Human Genome U133 Plus 2.0 arrays following the manufacturer's instructions. The gene expression levels were determined using R
RMA package. All original microarray data are deposited in the NCBI GEO database [GSE26545].
Functional Analysis of miRNA with Species-Specific Expression
We used DIANA-mirPath
[33] to determine putative functions of species-specific miRNA. DIANA-mirPath is a web-based computational tool that has been developed to identify molecular pathways potentially altered by the expression of single or multiple microRNAs
[33]. The software performs an enrichment analysis of multiple microRNA target genes by comparing each set of microRNA targets to all known KEGG pathways. We chose TargetScan5 and PicTar as target prediction tools and required a score threshold of 6.9 (
p<0.001) (
Table S10). Based on the DIANA-mirPath algorithm, targets of miR-184, miR-487a and miR-299-3p were significantly enriched in KEGG pathways that are related to neural functions (
Table S10). To test global significance of this result, 1000 simulations were done by randomly choosing five miRNA out of all 325 human miRNA expressed in brain (
Table S3) and applying the same test procedure. In 67 out of 1000 simulations, we observed three or more miRNA with enriched KEGG pathways that related to neural functions equal to or larger than the ones observed in the real data (permutations,
p
=

0.067).
In a parallel approach, the DAVID tool for functional annotation of gene sets
[32] was used to investigate the putative functions of genes targeted by human-specific or by chimpanzee-specific miRNAs, as predicted by TargetScan. Genes expressed in brain and targeted by human annotated miRNA were taken as background. Significant enriched biological processes based on the PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System are listed in
Table S9 (Benjamini-Hochberg corrected Fisher's exact test
p<0.05)
[55].
Further, we used DAVID to investigate putative functions of experimentally verified target genes of miRNAs with human specific expression, based on our transfection results. Experimentally verified target genes were predicted by TargetScan and were required to show down-regulation by transfection of the corresponding miRNA in at least one of the two cell lines (
Table S13). Experimentally verified target genes expressed in brain were used in functional enrichment analyses. Genes expressed in both brain and at least one of two cell lines were used as a background. Significantly enriched biological processes based on the PANTHER Classification System and KEGG pathways are listed in
Table S15 (Fisher's exact test
p<0.05).
Effect of miRNA Regulation on Human-Specific Gene Expression Changing in Brain
To capture the majority of possible miRNA targets, including non-conserved ones, we combined predictions of 9 algorithms: TargetScan5
[10], PITA
[27], PicTar
[56], mirSVR
[57], MirTarget2
[58], microT v3.0
[59], TargetMiner
[60], Antar
[61] and 2step-SVM
[62] (
Table S12). In order to classify predicted targets as experimentally verified, we calculated target FDR, for each algorithm, based on the inhibitory effect observed in cell line transfection experiments. Specifically, we calculated proportions of predicted target genes and non-target genes inhibited after transfection in both cell lines, at a certain inhibition cutoff (calculated as the difference in expression between miRNA transfection and the negative control). FDR was calculated as the ratio of the proportion of non-target genes passing this inhibition cutoff compared to the total proportion of target genes expressed in the corresponding cell line. The unions of targets predicted by the 9 algorithms at FDR<10% cutoff were used as experimental verified miRNA targets, except for miR-34c-5p, which target FDR was taken at 15% due to a weaker inhibition effect, observed in our transfection experiment (
Table S14).
Genes with human-specific and chimpanzee-specific expression were determined by comparison of human-macaque and chimpanzee-macaque expression distances. Genes with greater human-macaque distance were classified to have human-specific expression. Although this requirement is non-conservative, it results in enrichment for genes with human-specific expression. Further, strict identification of human-specific gene expression changes was not a focus of this study. Fisher's exact test was used to determine whether genes with human-specific expression, and showing inverse expression change compared to a given miRNA, were enriched among experimental verified targets of this miRNA. Target genes of this miRNA that were not showing human-specific expression were used as a background.
In Situ Hybridizations
We designed two LNA-probes complementary to miR-184 and miR-299-3p respectively (
Table S11). Hybridizations were performed as described in
[63]. Briefly, 10 micrometer-thick tissue sections were collected on Superfrost/plus slides (Fisher). After washing in two changes of excess PBS, sections were acetylated with 0.1M triethanolamine/0.25% acetic anhydride for 10 minutes and then incubated in humidified bioassay trays for prehybridization at 55°C (20–25°C below the Tm of the probe) for 4 hours in hybridization buffer (5xSSC/lx Denhardt's solution/5 mM EDTA/0.1% Tween/0.1% DHAPS/50% deionized formamide/0.1 mg/ml Heparin and 0.3 mg/ml yeast tRNA)
[63],
[64]. This procedure was followed by an over-night hybridization step using a DIG-labelled LNA oligonucleotide probe complementary to the target miRNA. Below the temperature of 55°C, sections were rinsed and washed twice in 2xSSC and 3 times in 0.2xSSC. The
in situ hybridization signal was detected by incubation with alkaline phosphatase (AP)-conjugated anti-DIG antibody, using NBT/BCIP as substrate for 3–12 minutes.
Positive Selection in miRNA Regulatory Regions
SNPs used in a genome-wide scan for signals of positive selection in the human lineage, since divergence from the Neanderthal lineage (Selective Sweep Scan or S SNPs)
[40], were downloaded from UCSC. Following the published procedure, SNPs were defined as human derived when at least four out of six modern human genomes were derived while all observed Neanderthal alleles were ancestral
[40]. An overrepresentation of human derived SNPs in a region would imply that the region had undergone positive selection in the modern human lineage, since divergence from Neanderthals. 50kb sliding windows with a 10 kb step were used to scan the human derived SNPs along the human genome. For five human specific miRNA, we used Fisher's exact test to check overrepresentation of human derived SNPs in each sliding window, with 150 kb region upstream of the annotated miRNA precursor. Four windows in an upstream region of miR-34c-5p were significant at Bonferroni corrected
p<0.05. To test the global significance of this result, 1000 simulations were performed by randomly choosing five miRNA precursors out of all 622 annotated human miRNA precursors, and the same test procedure applied. In 44 out of 1000 simulations we observed four or more sliding windows with Fisher's exact test p-values equal to, or smaller than, the ones observed in the real data (permutation
p
=

0.044). Putative functions of miR-34c-5p targets were determined using CORNA
[65], using experimentally verified target genes of miR-34c-5p as predicted by the 9 aforementioned algorithms, at FDR

=

15%. Genes expressed in brain were used as a background.