Transcript assembly
We selected a transcriptome assembly based on the features of the N50 plot over a range of k-mer lengths from 21 to 49. The point k

=

27 was the highest k-mer length in what seemed to be a plateau of N50 values before the N50 began to dip (Additional file
1: Figure S1). The average length over all sequences in this assembly is 599bp with an N50 value of 1,238bp, and when contigs shorter than 500bp are excluded these values rise to 1,429bp and 1,804bp respectively. These statistics indicate a population of robust and contiguous sequences among smaller fragmentary contigs. The final set obtained after mapping reads and removing those transcripts with fewer than 10 mapped reads contains 120,725 contigs with an average length of 878 bp and an N50 of 1,494 bp. We were able to annotate 45% of the assembled contigs with a total of 15,494 unique sequence descriptions.
While this set still contains many small fragments and some sequence overlaps, it indicates that two sequencing technologies are not necessary for
de novo assembly of non-model organisms as is common practice in recently published studies [
11,
12] with which our methods compare favorably.
Differentially expressed genes
There exists subtle but significant genetic structure within
F. grandis, where population divergence generally follows a pattern of isolation-by-distance with longitude across the northern Gulf of Mexico. Our “western” pool of samples (sites 1-3 on Figure ) included fish from the western phylogeographic cluster of populations, whereas the “eastern” pool of samples (sites 5-8 on Figure ) included fish from the eastern phylogeographic cluster of populations [
13]. Grand Terre, our oil-impacted site, is located near the center of the species’ range (site 4 on Figure ), and fish from this site have greatest affinity to the western phylogeographic cluster of populations. Among our reference fish are animals from Leeville, LA, which is only 21.26 km from the Grand Terre site; significant genetic structure is not observed within this species at that geographic scale [
13]. Since our reference samples are representative of the genetic diversity within the species, including from southern Louisiana, we consider it unlikely that genetic variation among populations confounds the treatment effects that we observe. Seasonal changes should also be minimal as fish were collected during the months of April (reference) and June (exposed). As shown in Figure , the unexposed samples cluster separately from the exposed samples bearing out the self-similarity of the pooled unexposed samples, and indicating global differences between the two sets.
At a significance level of P

<

0.01, we determined a set of 5,384 up and 5,882 down-regulated sequences (Figures and ). The assembly contains long well-assembled sequences in addition to short fragments, and the color-coding of points by length reveals that the longer sequences tend to provide more statistical power (green to orange points in Figure ). The short length of these reads, however, is beneficial when matching them to short sequence fragments. This allows many of the shorter contigs to also provide useful data (blue points in Figure ). Many of the sequences are alternative assemblies of the same original transcript or are fragments of the same transcript. Thus, we used the annotations derived from blast searching against the non-redundant database (nr) to group transcripts that likely represent the same gene by unique sequence annotations. This resulted in 1,070 down-regulated and 1,251 up-regulated unique sequence annotations. That still leaves 6,146 significantly differently expressed sequences that lack annotation, where 962 of these sequences are over 1kb in length. Some of these may be entirely new gene products unknown to prior studies while others are likely assembly errors.
Among the more interesting genes that were differentially expressed are two up-regulated aryl-hydrocarbon receptor sequences (
AHR1B and
AHR2) (Figure ). The response of
AHR pathway genes has been well documented following exposure to aromatic hydrocarbons, where activation of this pathway is diagnostic of exposure to this class of pollutant [
8,
14,
15]. Downstream in the AHR response pathway are cytochrome P450 genes;
CYP1C (mediated by
AHR2) and
CYP1B1 are found to be up-regulated in the exposed sample (Figure ), while others in the
CYP2N and
CYP2X sub-families are down-regulated (Additional file
1: Table S2). Whereas previous studies have highlighted the response of
CYP1A to dioxins, polycyclic aromatic hydrocarbons (PAHs), and other similar pollutants, in this analysis we detected no significant difference in
CYP1A mRNA expression upon exposure to DH oil. To verify this we searched the assembled
F. grandis transcript sequences using the
F. heteroclitus CYP1A sequence and found that only small fragments existed in the
de novo assembly. We then mapped
F. grandis reads to the
F. heteroclitus CYP1A sequence and analyzed its expression in the context of the assembled sequences. While there was an increase in the expression values in the oil exposed sample relative to either of the unexposed samples, it did not pass the p-value threshold. It may be that by the time of collection the animals had been exposed to levels and durations of DH oil or some prior exposure causing a de-sensitized response by
CYP1A response in the exposed population. Alternately, since a previous study that used the same RNA-Seq data did find a significant increase in
CYP1A, but used
F. heteroclitus ESTs as the reference transcriptome [
16], it could be that the residual fragmentary and erroneous sequences in the
de novo assembly led to reduced statistical power for some sequences.
Given clear damage to killifish gills caused by contaminating oil [
16], one might anticipate that compromised gill function, coupled with limited oxygen exchange between air and water in oil-slick contaminated areas, could result in hypoxic stress. In the up-regulated gene set from DH oil exposed animals we find several genes suggesting a hypoxic response.
Hypoxia inducible factor 2-alpha, its co-activator
E1A binding protein p300, and the activity enhancing
nuclear receptor coactivator 2 are all up-regulated [
17] (Figure ). It should be noted that the estuaries inhabited by
F. grandis often experience transient periods of hypoxia (see Additional file
1: Table S2), and given the broad range of sampling sites it is likely that our reference set includes RNA from individuals experiencing hypoxic conditions to some degree, thus capturing a range of normal and hypoxic expression levels for these genes. The inclusion of reference individuals exposed to hypoxia would make it more difficult to statistically identify a hypoxic response in the exposed population. Therefore, the genes indicated here are likely a conservative set of hypoxia response genes.
Another broad set of up-regulated genes in DH oil exposed fish indicates an immune response perhaps by circulatory leukocytes. The secreted form of the immunoglobulin mu heavy chain and heavy chain variable regions are both indicative of an active immune response. Tumor necrosis factor receptors 14 and 21, involved in immune response signaling pathways, are both up-regulated (Figure ). Previous studies have noted an association between an immune response induced by hypoxia and/or
HIF1 activation [
18], and hydrocarbons are well-known to alter immune system function [
19].
The set of genes that appears to be down-regulated in response to DH oil exposure is also large and diverse. One large group of down regulated transcripts includes a wide selection of 40S and 60S ribosomal proteins (Figure ). A similar response has been reported in hypoxic conditions in zebrafish gills [
20], and may be explained by an overall reduction in protein synthesis as a survival strategy during hypoxic conditions [
21].
One final group of genes that was previously observed to be strongly down-regulated are the group of choriogenins
ChgL,
ChgH, and
ChgHm[
16]. The sequences for these are highly repetitive and so did not assemble well. Thus they did not originally show up in our analysis except as unidentified small fragments albeit with expression levels that would confirm earlier observations. To better examine the expression levels in the context of the rest of the assembly, we added the
F. heteroclitus cDNA sequences to the assembled contigs and recalculated differential expression for these in the context of the rest of the assembled sequences. The results confirm that choriogenin genes are strongly down-regulated (Figure ). As has been previously noted, the expression of these genes is estrogen-dependent and is down-regulated by exposure to PAHs [
22].
Comparison to microarrays
The results of the microarray-based analysis correlate well with those measured by RNA-Seq (Figure ). One notable difference is that the dynamic range of expression levels seems to be much wider in the RNA-Seq method versus those observed for microarray results. The RNA-Seq method exhibits results for values ranging from approximately +5 to -14 log
2 fold change while the microarray results are in a more narrow range of approximately ±

3.5fold. This may indicate a greater sensitivity in the RNA-Seq based methodology which is an observation that is consistent with other similar comparisons [
23]. For the set of best hits between the data sets we calculated a Pearson’s correlation of
r
=

0.48, while the set of sequences found to be significant in both analyses had an
r value of 0.70. These indicate a reasonable correlation between the two data sets with a high degree of variability, and are similar to results of other such comparisons [
23]. Much of the variation between the data sets is likely related to the fact that only the exposed samples were common among the two analyses; the un-exposed samples in each used mRNA from different sets of fish. The unexposed RNA-Seq data represent two pools of 6 and 8 individuals collected from sites on both sides of the Mississippi river, while the microarray data set is comprised of a separate set of 75 individual animals collected in the summer of 2010 from different sites along the Gulf of Mexico.