Experimental design
This study was designed to provide a yearlong examination of gene activity responsible for the hibernation phenotype. shows the six time points when heart, skeletal muscle and white adipose tissue were harvested for the purpose of RNA preparation. The six points were chosen to provide comprehensive coverage of circannual events that contribute to the hibernation phenotype. Approximately 3.7 million cDNA sequence reads from 18 samples (6 time points ×3 tissues) were generated in 3 separate sequencing reactions. The sff (standard flowgram format) quality and sequence files from the 454 sequencing were deposited into the NCBI Short Read Archive with accession number SRA037446. The bioinformatic pipeline for converting these raw reads into identified sequence is summarized in .
Sequence reads and contig assembly
shows the size distribution of the raw sequence reads. After removing MIDs and adaptor sequences the mean trimmed read length was 335 nucleotides, with a standard deviation of 133 nucleotides. Of these reads, 3,125,337 were assembled into 140,703 contigs. Many of the finished contigs were approximately the size of the respective full-length cDNAs from the human mRNA Refseq database. is a histogram showing the range in contig size up to 19 Kb. As expected, overall contig abundance declined with increasing size.
The assembled contigs and remaining 619,767 unassembled sequences were used to identify proteins that are encoded by the raw sequence reads. , , show the RNA sequence read number for all six points in heart (), skeletal muscle () and white adipose tissue (). Approximately 90% of all the RNA sequence reads from heart and skeletal muscle were identified and matched to sequences in the human UniProt database, with only slightly lower percentages for read identification in white adipose tissue. Interestingly the mitochondrial genome encoded approximately 21% of the heart RNAs, 10% of the skeletal muscle RNAs, but only 2% of the WAT transcripts. The total number of human UniProt proteins matched by ground squirrel transcripts was 13,637 for heart, 12,496 for skeletal muscle, and 14,351 for white adipose tissue.
| Table 3Skeletal muscle read measurements. |
| Table 4White adipose tissue read measurements. |
Non-protein coding RNA and MALAT1
Reads and contigs which did not match human UniProt records included ribosomal RNA and other non-protein coding RNA. The ribosomal RNA was not surprising given its abundance, despite two rounds of oligo-dT selection. A comparison (using BLASTn) of our sequences with the fRNAdb
Mus musculus noncoding RNA database gave numerous matches
[14]. Unfortunately, the vast majority of these records were unannotated. A striking exception was the sequence for MALAT1 (also known as hepcarcin), a long (6 to 7 kb) noncoding RNA that is highly conserved in mammals and known to be involved in the regulation of RNA splicing
[15]. The expression of MALAT1 appeared to be quite variable, with the highest levels for heart and WAT seen in October ().
Sequence read density
The assembly of multiple raw sequence reads for a single mRNA generates high quality sequence due to the redundancy of reads spread throughout the contig. Multiple reads across the same region of the transcript increases sequence accuracy because sequencing errors can easily be identified and corrected. The overall distribution of reads for a single mRNA typically showed a higher density near the middle of the transcript. This distribution is expected if fragmentation and random priming are unbiased simply from the probability of overlaps along the transcript (a more mathematically detailed description is given in the next section).
To demonstrate sequence read coverage, shows the read density from two tissues over the largest contig encoding pancreatic triacylglycerol lipase (LIPP). LIPP is normally expressed in the pancreas, but during hibernation it provides low temperature lipolysis in both heart and white adipose tissue
[16],
[17],
[18]. The figure shows the majority of the reads are near the center of the coding region with fewer near the ends, including the 5′ UTR sequences which differ between the two tissues. This same trend was seen when the reads were projected over the much longer LIPP genomic sequence (). As predicted, the reads were confined to the exonic sequences of the LIPP gene as shown by the distinct spikes in read number. Due to the current 2× sequence coverage of the
I. tridecemlineatus genome, is not complete because the 3′ portion consists of six contigs that were assembled into a scaffold aligned against the human genome.
Assembly of ground squirrel mitochondrial genome from RNA reads
The mitochondrial genome is transcribed as a single transcript that is then processed into smaller pieces that are polyadenylated
[19]. The large number of mitochondrial reads obtained from heart and skeletal muscle enabled us to construct the complete
I. tridecemlineatus mitochondrial genome. The vast majority of reads are from the protein coding domain of a gene, but in rare cases reads spanned two genes or included rRNA, tRNA, and the D-loop sequence. These rare cases may be from fragments of the original full-length transcript. An apparently complete mitochondrial genome sequence was constructed from four of the largest contigs produced by MIRA (
Figure S1). The assembled genome sequence is 16,459 bases long, which is very similar to that of the Eurasian red squirrel (
Sciuris vulgaris) - the closest matching sequence in Genbank, which is 16,507 bases long (NCBI accession NC_002369). All of the standard mammalian genes - protein-coding, tRNA, and rRNA - are present and complete.
Modeling read density
The relative simplicity of mitochondrial transcripts - no introns or alternative splicing - provided a unique opportunity to examine potential biases in our library preparation. To do this we developed a mathematical model of read density.
To model the distribution of reads along a mitochondrial gene, we assume: (1) the gene produces a single transcript type of length L, and (2) the observed distribution of read lengths results from a uniform (unbiased) fragmentation of the original transcripts. If we have a read of length J≤L drawn from this distribution, it has L-J+1 possible starting locations along the transcript. For the I
th position on the transcript, the number of such read locations that overlap it is the minimum of {I, J, L-I+1, L-J+1}. So the probability p(I,J,L) of a read of length J overlapping position I in a transcript of length L is min({I, J, L-I+1, L-J+1})/(L-J+1). For fixed L and J this is a piecewise-linear function of I. An example graph of this function is shown in for the case L

=

2000 and J

=

400. We average these probabilities over all the observed read lengths for each mitochondrial gene, weighted by the frequency of those lengths, which results in an expected density that is somewhat smoother than the piecewise linear components. The predicted distributions from this model are shown in blue along with the actual read distributions (in black) in . Coefficients of determination, R
2, were computed for each protein-coding sequence and ranged from a high of 0.89 for COX2 to 0.05 for COX1. Deviations from the model are expected in some cases because overlapping genes do not satisfy the assumptions of the model. This does not explain the poor performance of the model for COX1, which has an anomalously low R
2 among the sequences that contain only one protein product.
Duplication of the ACOD gene
The nuclear encoded acyl-CoA desaturase (ACOD) mRNA is the single most abundant RNA transcript in white adipose tissue. ACOD converts a single carbon-carbon bond into a double bond thus turning a saturated fatty acid into an unsaturated fatty acid. This is an important reaction for a hibernating species because fatty acids with double bonds in the cis conformation have lower melting temperatures and thus greater fluidity. Examination of the thousands of ACOD reads reveals that the genome of I. tridecemlineatus encodes two distinct ACOD mRNAs. An alignment of the protein translations of these sequences with desaturases from human, mouse (Mus musculus), and golden hamster (Mesocricetus auratus), plus a corresponding maximum likelihood tree suggest that the I. tridecemlineatus ACOD gene locus was duplicated in a separate event from the Muroidea triplication (). Separating the normalized counts for reads mapping to ACOD into these two distinct genes shows that one is much more highly expressed, but both are differentially expressed within white adipose (). It is likely that other duplicated genes are present, but high read counts, such as those seen with ACOD, are necessary for reliable identification of duplicated sequences.
Differential gene expression
The fundamental physiological adaptation present in natural hibernators is a greatly reduced metabolic rate with accompanying reductions in body temperature, heart rate and respiration. In this study, changes in mRNA levels of specific genes throughout the year were analyzed to identify biological processes that are likely to be important for maintaining organ function during these physiological extremes.
To validate our transcriptome approach of measuring differential gene expression we performed qRT-PCR measurements of several genes across the year (
Figure S2). We tested the expression of 24 different genes in WAT isolated from individual animals (N

=

6) at five time points from August through March for correlation of quantitative RT-PCR results with the normalized 454 sequence count data (N

=

755 reactions). A list of the 24 genes can be found in the
Figure S2 legend. The analysis reveals a highly significant correlation between the two methods (Pearson's coefficient of correlation

=

−0.5614, 95% upper limit

=

−0.4558, 95% lower limit

=

−0.5614, p<0.0001) that supports the use of normalized 454 counts as a measure of relative gene expression across groups.
, , highlight some of the genes that met our stringent false discovery rate (FDR) cutoff (1.0×10
−11) with high mRNA levels at specific time points. The very small FDR was chosen in part to ensure that sufficient numbers of reads were present for reliable identification and assembly of contigs. Our criteria were likely to exclude genes that shared functional relationships but had lower mRNA levels than the genes displayed in –. However, we believe that our approach provides an unbiased picture of the seasonal transcriptome, and that the highlighted genes accurately reflect important changes in the physiology of each tissue. The normalized transcript levels for all genes identified at each of the six time points for all three tissues can be found in
Table S1.
The 1.0×10
−11 FDR criterion resulted in a final set of 155 heart-expressed genes with median mRNA expression of 882 normalized counts (1857, 453; upper and lower quartiles, respectively) across the six samples with Fisher's exact test p-value ≤1.4×10
−13. In skeletal muscle, 92 genes met our FDR cut off and had a median mRNA expression of 1037 (2180, 510) normalized counts and Fisher's exact test p-value ≤9.6×10
−14. In WAT, 132 genes met our FDR criteria and had median mRNA expression of 605 (1217, 232) normalized counts across samples and Fisher's exact test p-value ≤6.7×10
−14. The complete list of expressed genes that meet our FDR criterion for each tissue can be found in
Table S2.
Heart
We highlighted three examples of time point-specific differential gene expression for each of the three tissues. For heart we show highly expressed genes in torpor, torpor and IBA, and March time points (). Up-regulated genes during torpor () included sodium/calcium exchanger 1 (NAC1, 5.3-fold over March), pyruvate dehydrogenase kinase, isoenzyme 4 (PDK4, 8.0-fold over October) and ryanodine receptor 2 (RYR2, 3.5-fold over March).
Three genes that show high expression in heart across both the torpor and IBA time points () include spectrin beta chain, brain 1 (SPTB2, 3.0-fold over March), prostrate androgen-regulated mucin-like protein 1 (PARM1, 2.7-fold over March), and sodium/potassium-transporting ATPase subunit alpha-1 (AT1A1, 1.8-fold over April). Three mitochondrial genes showing high expression in the heart during March () included NADH-ubiquinone oxidoreductase chain 1 (NU1M, 2.6-fold over IBA), cytochrome b (CYB, 2.4-fold over IBA), and cytochrome c oxidase subunit 2 (COX2, 2.4-fold over IBA).
Skeletal muscle
Highly expressed skeletal muscle genes were shown for April, October, and torpor (). The five most highly expressed skeletal muscle genes in April () included sarcopilin (SARCO, 10.5-fold over August), glutamine synthetase (GLNA, 5.4-fold over IBA), mitochondrial S-type creatine kinase (KCRS, 5.1-fold over IBA), slow skeletal muscle troponin T (TNNT1, 2.9-fold over August) and mitochondrial 3-ketoacyl-CoA thiolase (THIM, 4.7-fold over August).
We highlight four highly expressed genes in skeletal muscle during October (). They are polyubiquitin-C (UBC, 2.3-fold over March), DNA damage-binding protein 1 (DDB1, 1.6-fold over August), ankyrin repeat and SOCS box protein 2 (ASB2, 3.6-fold over April) and coiled-coil domain-containing protein 69 (CCD69, 2.3-fold over IBA). Five highly expressed genes from skeletal muscle were highlighted for the torpor timepoint (). These genes included protein NDRG2 (NDRG2, 3.3-fold over April), pyruvate dehydrogenase kinase, isoenzyme 4 (PDK4, 4.1-fold over October), heat shock protein HSP 90-beta (HS90B, 1.4-fold over August), microtubule-associated protein 4 (MAP4, 2.0-fold over April), nuclear factor erythroid 2-related factor 1 (NF2L1, 3.6-fold over March).
White adipose tissue
Highly expressed genes in WAT were highlighted for the August, October, and IBA time points (). We have excluded the April point from our analysis because inclusion of mesenteric fat may have introduced some contamination with pancreatic tissue
[7] in the April WAT samples. The genes acyl-coA desaturase (ACOD, 3.2-fold over March) and fatty acid binding protein 4 (FABP4, 1.7-fold over March) were highlighted for WAT in August (). Notably, both ACOD and FABP4 were the most highly expressed genes detected in WAT at any of the six time points.
Three genes expressed highly in WAT during October included leptin (LEP, 6.4-fold over March), albumin (ALBU, at least 21.8-fold over all other points), and group XVI phospholipase A2 (PAG16, 2.2-fold over August), (). Interestingly, expression of albumin was barely detectable at any time point other than October. Genes that were highly expressed during IBA in WAT () included pyruvate dehydrogenase kinase, isoenzyme 4 (PDK4, 4.3-fold over October), phosphoenolpyruvate carboxykinase (PCKGC, 1.8-fold over August), annexin A8 (ANXA8, 3.4-fold over March), and hydroxymethylglutaryl-CoA synthase (HMCS2, 3.8-fold over August).