Transcriptomic resources for coral species can provide insight into coral evolutionary history and stress-response physiology. Goniopora columna, Galaxea astreata, and Galaxea acrhelia are scleractinian corals of the Indo-Pacific, representing a diversity of morphologies and life-history traits. G. columna and G. astreata are common and cosmopolitan, while G. acrhelia is largely restricted to the Coral Triangle and the Great Barrier Reef. Reference transcriptomes for these species were assembled from replicate colony fragments exposed to elevated (31°C) and ambient (27°C) temperatures. Trinity was used to create de novo assemblies for each species from 92–102 million raw 2 × 150 bp Illumina HiSeq reads. Host-specific assemblies contained 65 460–72 405 contigs, representing 26 693–37 894 isogroups (~genes), with an average N50 of 2254 bp. Gene name and/or gene ontology annotations were possible for 58% of isogroups on average. The transcriptomes contained 93.1–94.3% of the EuKaryotic Orthologous Groups comprising the core eukaryotic gene set, and 89.98–91.92% of the single-copy metazoan core gene set orthologs were complete, indicating fairly comprehensive assemblies. This work expands the complement of transcriptomic resources available for scleractinian coral species, including the first reference for a representative of Goniopora spp. as well as species with morphologies not previously represented among coral transcriptomic resources.
A growing body of genomic information for reef-building corals has resolved phylogenetic relationships and helped reveal how this unique taxonomic group calcifies and responds to thermal stress [1–4]. Such information is critical for understanding the adaptive capacity of these ecologically important organisms, particularly in an era of global climate change. Transcriptomic and/or genomic resources are currently available for 23 scleractinian species representing 14 genera and 11 families [1, 4, 6–16]. We assembled the transcriptomes of 3 scleractinian coral species: the congeners Galaxea astreata and G. acrhelia, and Goniopora columna. This is the first sequence resource for Goniopora spp., and it extends the phenotypic diversity represented by coral transcriptomic resources to include submassive (G. astreata) and columnar (G. columna) morphologies, which should provide additional insight into the evolutionary history of this taxonomic order.
Samples of Galaxea astreata and Galaxea acrhelia were collected from Davies Reef (18°49.816’S, 147°37.888’E) on 8–11 April 2015, and samples of Goniopora columna were collected from Pandora Reef (18°48.778’S, 146°25.593’E) on 20–22 April 2015 under Great Barrier Reef Marine Park Authority permits G12/35236.1 and G14/37318.1.
To generate more comprehensive reference transcriptomes, 4–5 replicate cores from a single colony of each species were subjected to a 2-week temperature stress experiment as described in Kenkel and Bay (2017), and paired samples from the control (27°C) and heat (31°C) treatments were snap-frozen in liquid nitrogen on day 2, day 4, and day 17 (Table 1; note that for G. acrhelia, heat-treated fragments were included only for day 4 and day 17). Samples were crushed in liquid nitrogen, and total RNA was extracted using an Aurum Total RNA mini kit (Bio-Rad, Irvine, CA, USA). RNA quality and quantity were assessed using a NanoDrop ND-200 UV-Vis Spectrophotometer (Thermo Scientific, Waltham, MA, USA) and gel electrophoresis.
For transcriptome sequencing, RNA samples from replicate fragments were pooled in equal proportions, and ~1 μg of pooled RNA was shipped on dry ice to the Oklahoma Medical Research Foundation NGS Core, where Illumina TruSeq Stranded libraries were prepared and sequenced on 1 lane of an Illumina HiSeq 3000/4000 to generate 2 × 150 bp paired-end (PE) reads.
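Pooling "in equal proportions" means that each replicate contributes the same mass of RNA, so the volume drawn from each extract scales inversely with its concentration. The sketch below illustrates that arithmetic only; the helper name, the ng/μL concentrations, and the 1000 ng target are hypothetical, since the actual per-sample concentrations and volumes are not reported here.

```python
def equal_mass_pool_volumes(concentrations_ng_per_ul, total_ng=1000.0):
    """Return the volume (uL) to take from each sample so that every sample
    contributes an equal share of total_ng to the pool.

    Hypothetical helper for illustration; not part of the original protocol.
    """
    if not concentrations_ng_per_ul:
        raise ValueError("need at least one sample")
    if any(c <= 0 for c in concentrations_ng_per_ul):
        raise ValueError("concentrations must be positive")
    share_ng = total_ng / len(concentrations_ng_per_ul)  # equal mass per sample
    return [share_ng / c for c in concentrations_ng_per_ul]


# Example: three hypothetical replicate extracts, 900 ng pooled in total
volumes = equal_mass_pool_volumes([100.0, 200.0, 50.0], total_ng=900.0)
print([round(v, 2) for v in volumes])  # [3.0, 1.5, 6.0]
```

Each sample here contributes 300 ng, so the most dilute extract (50 ng/μL) requires the largest volume.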
Sequencing yielded 92–102 million raw PE reads (Table 1). The fastx_toolkit was used to discard reads <50 bp or containing a homopolymer run of “A” ≥9 bases, to retain only reads with a PHRED quality score of at least 20 over 80% of the read, and to trim TruSeq sequencing adaptors. Polymerase chain reaction (PCR) duplicates were then removed using a custom Perl script. The remaining high-quality filtered reads (26–35 million paired reads and 4–6 million unpaired reads; Table 1) were assembled with Trinity v. 2.0.6 (Trinity, RRID:SCR_013048) using the default parameters and an in silico read normalization step, on computing resources at the Texas Advanced Computing Center at the University of Texas at Austin.
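The read-level criteria above can be expressed as a single predicate. The following is a minimal sketch that re-implements only the stated thresholds; it is not fastx_toolkit code, it assumes adapters have already been trimmed, and it assumes Phred+33 quality encoding. The function and constant names are hypothetical.

```python
import re

# Thresholds from the text; the function is an illustrative re-implementation,
# not the fastx_toolkit implementation.
MIN_LENGTH = 50                  # discard reads shorter than 50 bp
POLY_A = re.compile(r"A{9,}")    # homopolymer run of "A" of 9 or more bases
MIN_QUAL = 20                    # PHRED threshold
MIN_PERCENT = 80                 # percent of bases that must reach MIN_QUAL


def passes_filters(seq: str, qual: str, phred_offset: int = 33) -> bool:
    """Return True if an (already adapter-trimmed) read would be retained.

    Assumes Phred+33 encoding unless another offset is given.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < MIN_LENGTH:
        return False
    if POLY_A.search(seq.upper()):
        return False
    good = sum(1 for ch in qual if ord(ch) - phred_offset >= MIN_QUAL)
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    return good * 100 >= MIN_PERCENT * len(qual)


read = "ACGT" * 15                                   # 60 bp, no poly-A run
print(passes_filters(read, "I" * 48 + "#" * 12))     # True: 48/60 bases at Q40
print(passes_filters(read, "I" * 47 + "#" * 13))     # False: 47/60 is below 80%
```

In Phred+33, "I" encodes Q40 and "#" encodes Q2, so the two example reads sit on either side of the 80% cutoff.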
Because corals are “holobionts” composed of the coral host, Symbiodinium, and other microbial components, the resulting assemblies were filtered to identify the host component following the protocol described in Kitchen et al. (2015), with one modification. Briefly, short contigs (<400 bp) were removed, and a hierarchical series of BLAST searches against potential contaminants was conducted. First, assemblies were compared to the most complete cnidarian rRNA database (SILVA: ABAV01023297, ABAV01023333) using BLASTn, and contigs with good matches (bit-score >45) were removed. Next, transcriptomes were compared to a cnidarian mitochondrial genome (Acropora tenuis, NCBI: NC_003522.1) using BLASTn, again discarding contigs with match bit-scores >45. The taxonomic origin of the remaining contigs was identified using a series of BLASTx searches against the most complete coral and Symbiodinium gene models (coral: Acropora digitifera, adi_v1.01_prot; Symbiodinium: S. kawagutii, Symbiodinium_kawagutii.0819.final.gene.pep) and NCBI’s nonredundant (nr) protein database (downloaded 25 July 2016). To remain in the host-specific assembly, a contig had to match (E value ≤ 10⁻⁵) a gene in the coral proteome more closely than any gene in the Symbiodinium proteome, and also either match a metazoan sequence or have no match in the nr database. In addition, contigs with no match to either proteome were retained if their best match in the nr database search was to a cnidarian, a slightly less stringent criterion than that used by Kitchen et al. (2015). Annotation of the host transcriptomes was performed following previously published protocols and scripts. Host contigs were assigned putative gene names and gene ontology terms using a BLASTx search (E value ≤ 10⁻⁴) against the UniProt Knowledgebase Swiss-Prot database.
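The host-filtering rules above amount to a decision procedure applied to each contig once its BLAST results are in hand. The sketch below collapses the hierarchical searches into one function for clarity. It assumes the best hit from each search has already been parsed (the `ContigHits` record and all names are hypothetical), assumes that "matches more closely" is judged by bit-score, and assumes that the taxonomy of the best nr hit is available as a lineage tuple.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_CONTIG_LENGTH = 400    # shorter contigs are removed first
CONTAMINANT_BITSCORE = 45  # rRNA / mitochondrial hits above this are removed
PROTEOME_EVALUE = 1e-5     # significance cutoff for the proteome BLASTx hits


@dataclass
class ContigHits:
    """Best BLAST hits for one contig; None means no hit (hypothetical record)."""
    length: int
    rrna_bitscore: Optional[float] = None
    mito_bitscore: Optional[float] = None
    coral_evalue: Optional[float] = None
    coral_bitscore: Optional[float] = None
    symb_evalue: Optional[float] = None
    symb_bitscore: Optional[float] = None
    nr_lineage: Optional[Tuple[str, ...]] = None  # taxonomy of best nr hit


def _significant(evalue: Optional[float]) -> bool:
    return evalue is not None and evalue <= PROTEOME_EVALUE


def is_host_contig(c: ContigHits) -> bool:
    if c.length < MIN_CONTIG_LENGTH:
        return False
    # Drop likely rRNA and mitochondrial sequences.
    for score in (c.rrna_bitscore, c.mito_bitscore):
        if score is not None and score > CONTAMINANT_BITSCORE:
            return False
    coral_hit = _significant(c.coral_evalue)
    symb_hit = _significant(c.symb_evalue)
    if coral_hit:
        # Assumption: "more closely" is judged by bit-score.
        if symb_hit and (c.symb_bitscore or 0.0) >= (c.coral_bitscore or 0.0):
            return False
        # Must also match a metazoan in nr, or have no nr match at all.
        return c.nr_lineage is None or "Metazoa" in c.nr_lineage
    if not symb_hit:
        # Modification: no proteome match, but the best nr hit is a cnidarian.
        return c.nr_lineage is not None and "Cnidaria" in c.nr_lineage
    return False  # significant Symbiodinium match only


cnidarian = ("Eukaryota", "Metazoa", "Cnidaria", "Anthozoa")
print(is_host_contig(ContigHits(1200, coral_evalue=1e-30, coral_bitscore=250.0)))  # True
print(is_host_contig(ContigHits(1200, nr_lineage=cnidarian)))                       # True
```

Applying the rules in this order mirrors the hierarchy described in the text: length first, then rRNA and mitochondrial screens, then the proteome and nr comparisons.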
EuKaryotic Orthologous Group (KOG) annotations were assigned using a BLAST search against the core eukaryotic gene set from the CEGMA pipeline (CEGMA, RRID:SCR_015055) and the WebMGA server (WebMGA, RRID:SCR_011951), and Kyoto Encyclopedia of Genes and Genomes (KEGG) IDs were assigned using the KAAS server [31, 32]. The stats.sh command of the BBMap package was used to calculate the GC content of the host transcriptomes. Transcriptome completeness was evaluated by comparison to the metazoan set of Benchmarking Universal Single-Copy Orthologs v. 2 (BUSCO, RRID:SCR_015008) using the gVolante server [35, 36].
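GC content here is the proportion of G and C bases across the assembly. As a minimal sketch of that calculation (not BBMap's implementation), the function below pools all contigs in a FASTA string and, as an assumption of this sketch, ignores ambiguous bases such as N; whether stats.sh handles ambiguous bases the same way is not claimed here.

```python
def gc_content(fasta_text: str) -> float:
    """Pooled GC fraction over all A/C/G/T bases in a FASTA string.

    Ambiguous bases (e.g. N) are ignored; this is an assumption of the sketch.
    """
    gc = acgt = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip header lines and blank lines
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return gc / acgt


example = ">contig1\nATGCGC\nAT\n>contig2\nGGNNAA\n"
print(gc_content(example))  # 0.5
```

Because bases are pooled, longer contigs weigh more heavily than they would in an average of per-contig GC values.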
The initial holobiont assemblies contained 164 996–185 625 contigs over 400 bp in length (N50 = 1543–1848). Of these, 34–94 were discarded as matching non-mRNA sequences (9–10 rRNA, 25–74 mitochondrial). After screening for biological contamination, 64 249–68 968 contigs had a best match to the Acropora digitifera proteome, and of these, 59 875–65 367 either matched a metazoan sequence or had no match in NCBI’s nr database. An additional 5585–7038 contigs matched neither proteome but had a best hit to a cnidarian in the nr database and were also retained. These host-specific assemblies represented 26 693–37 894 isogroups (~genes), with an average length of 1492–1894 bp and an N50 of 1984–2480 (Table 1). Mean GC content of the host-specific assemblies was 42% (Table 1), which is consistent with other anthozoan transcriptomes from which Symbiodinium reads have been effectively filtered. Protein coverage exceeded 0.75 for 37–41% of contigs (Table 1). Gene name and/or gene ontology annotations were possible for 16 196–19 306 (50.1–62.4%) of these isogroups based on sequence homology to the Swiss-Prot database (Table 1). KEGG pathway annotation resulted in 4488–4728 unique matches for 7105–8712 isogroups. Comparison of these assemblies to the 248-gene core eukaryotic gene set revealed that 93.1–94.3% of KOGs were represented, and annotation of isogroups resulted in 23–24 unique KOG matches for 8700–10 025 isogroups (Table 1). Of the 978 single-copy orthologs in the BUSCO metazoan set, 89.98–91.92% were complete and an additional 3.07–3.68% were partially assembled, indicating that the assemblies are fairly comprehensive (Table 1).
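N50 is reported above for both the holobiont and the host-specific assemblies. Under the standard definition, it is the contig length L such that contigs of length ≥ L together contain at least half of all assembled bases. A minimal sketch on a toy list of lengths (not data from this study):

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of
    all assembled bases (standard N50 definition)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer test: reached half of the bases
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example, the total is 54 bases, and the three longest contigs (10 + 9 + 8 = 27) reach half of it, so the N50 is 8. Removing short contigs, as the host filter does, tends to raise N50, which is consistent with the higher values for the host-specific assemblies.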
These host-specific coral assemblies are suitable for use as transcriptome references for Tag-based RNAseq (TagSeq), a cost-effective method that was recently shown to quantify gene expression levels more accurately than traditional RNAseq. The FASTA files and associated annotation files have been formatted for direct use in the TagSeq read-mapping and GO-MWU analysis pipelines.
Raw reads are archived at NCBI’s SRA under project numbers PRJNA350363 (Goniopora columna), PRJNA352640 (Galaxea acrhelia), and PRJNA352641 (Galaxea astreata). Transcriptomes, annotation files, and other supporting data are available via the GigaScience repository, GigaDB. The assembled transcriptomes and associated annotation files can also be obtained from http://dornsife.usc.edu/labs/carlslab/data/ or from the Australian Institute of Marine Science Data Centre at http://data.aims.gov.au/metadataviewer/faces/view.xhtml?uuid=3c2d31c9-b921-491c-ae27-0d169fa98c84.
KEGG: Kyoto Encyclopedia of Genes and Genomes; KOG: EuKaryotic Orthologous Groups; TagSeq: Tag-based RNAseq.
Funding for this study was provided by a National Science Foundation International Postdoctoral Research Fellowship, DBI-1401165, to C.D.K. and by the Australian Institute of Marine Science to C.D.K. and L.K.B.
The authors have no competing interests to declare.
C.D.K. conceived and designed the experiments; C.D.K. and L.K.B. performed the experiments; C.D.K. performed bioinformatics analyses and wrote the first draft. L.K.B. contributed to revisions and read and approved the final manuscript.
A. Bouriat was instrumental in performing the temperature stress experiments. S. Noonan, V. Mocellin, A. Severati, and M. Nayfa helped with coral collection, and P. Muir provided advice on taxonomic identification. Bioinformatic analyses were carried out using the computational resources of the Texas Advanced Computing Center (TACC).