Sweet cherry (
Prunus avium L.), a non-model crop, is an important non-climacteric member of the subfamily Amygdaloideae, in which other members such as peach and plum exhibit climacteric fruit ripening. Sweet cherry is a diploid (2n = 16), and its genome is estimated to be slightly larger than that of peach, at 225-300 Mb [
1,
2]. Sweet cherry underwent a recent breeding-related genetic bottleneck that reduced the diversity present in the germplasm [
3]. Genetic variability can be utilized to screen for disease resistance and to improve the efficiency of selecting desirable genotypes through breeding, especially in sweet cherry, where natural diversity is lacking. Types of variation at the nucleotide level include microsatellites or simple sequence repeats (SSRs), single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and genomic rearrangements [
4]. Identification of genetic diversity in species which lack significant genomic resources has typically been a time-consuming and laborious process.
SSR markers have been used extensively for population genetics and genome mapping studies in several members of Rosaceae [
5,
6]. SSR identification techniques are typically costly and time consuming [
7-
9]. Most published SSRs are located in the intergenic regions [
4]. A recent study in
Populus attempted to identify SSRs in exons or expressed gene fragments. The abundance of microsatellites within the coding region was three-fold lower than in intergenic regions and, when present, the microsatellites did not show useful allelic variability. Further, the authors concluded that a candidate gene approach to microsatellite development may not be the best strategy [
4]. While SSRs remain difficult to develop, SNP identification and validation have rapidly improved in recent years, mostly due to reduced sequencing costs. Previously, direct sequencing of a gene of interest related to supernodulation was used to identify SNPs [
10]. Similar studies in non-model species lacking such resources require sequence information from related species. SNPs have also been used for anchoring a linkage map and the bovine genome [
11]. Ganal et al. [
12] reviewed recent SNP identification methods, including DNA arrays, amplicon sequencing, mining existing EST resources, and using sequence data generated with second generation sequencing technologies. Compared to other methods, re-sequencing applications were found to produce a higher percentage of validated SNPs, while non-reference-based next-generation sequencing, or
de novo, approaches required the least amount of
a priori genetic/genomic information. A major caveat of using second generation sequencing
de novo is the difficulty of acquiring sufficient depth to identify SNPs accurately. Therefore, a reduced representation sequencing approach was recommended. Many reduced representation methods integrating high throughput sequencing are discussed by Davey et al. [
13], who further elaborate on the utility of SNP-based molecular markers.
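The depth argument above can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the cited reviews: the read count, read length and reduced target size are hypothetical values chosen for illustration, and only the 300 Mb figure comes from the genome size estimate quoted earlier. It assumes reads are spread uniformly over the target.

```python
def expected_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Average per-base depth = total sequenced bases / size of the target."""
    return n_reads * read_length / target_size


# Hypothetical sequencing run (illustrative values, not from the study).
N_READS = 500_000   # number of reads
READ_LEN = 400      # bases per read

# Upper genome size estimate quoted in the text (300 Mb).
whole_genome_depth = expected_depth(N_READS, READ_LEN, 300_000_000)

# Hypothetical reduced representation target of 10 Mb.
reduced_depth = expected_depth(N_READS, READ_LEN, 10_000_000)

print(f"whole genome: {whole_genome_depth:.2f}x")
print(f"reduced representation: {reduced_depth:.2f}x")
```

Under these assumed numbers the same run yields less than 1x average depth over the whole genome but 20x over the reduced target, which is the motivation for restricting sequencing to a subset of the genome.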
Continued improvements in second generation DNA sequencing technologies have increased the ability to obtain significant sequencing depth in a rapid and cost efficient manner, compared to Sanger sequencing approaches [
14]. Bundock et al. [
15] performed amplicon sequencing on genes of interest with 454 technology to produce a large number of reliable SNPs from two parents of a QTL mapping population of sugarcane, finding a high success rate for SNP verification (93%). Recently, next generation technologies have been widely utilized for sequencing transcriptomes of various species [
16-
18]. Eveland et al. [
19] reported a quantitative transcriptomics approach based on selective sequencing of the 3'UTR of mRNA from
Zea mays. Their work demonstrated a clear ability to resolve the expression of nearly identical genes (99% nucleotide identity) based on variation in the 3'UTR (97% nucleotide identity). Through comparison with sequences in multiple maize databases, 93.8% of the SNPs identified by Eveland et al. were confirmed [
19]. Use of a 3'UTR-directed approach exploits the higher number of variations found in the 3'UTR compared to the coding region of a gene. Higher sequence variation, combined with physical linkage to a specific gene, increases the potential impact of 3'UTR polymorphisms in connecting genetics and functional genomics studies, especially in non-model eukaryotes. This is in contrast to current approaches, in which intergenic polymorphisms are used for scoring a segregating phenotype without the associated gene-related information. The method presented here utilizes the positive aspects of 3'UTR sequencing, as a reduced representation approach, to facilitate rapid gene-linked SNP identification.
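As a rough illustration of how a putative SNP might be flagged once reads are stacked at one position of an assembled contig, consider the minimal sketch below. It is not the filtering used in this study: the function name `call_snp` and both thresholds (`min_depth`, `min_allele_reads`) are hypothetical, and real pipelines also weigh base quality and alignment quality.

```python
from collections import Counter


def call_snp(column, min_depth=8, min_allele_reads=2):
    """Return the alleles at one aligned position if it looks polymorphic.

    column: the base contributed by each read covering the position.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    counts = Counter(base for base in column if base in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too few reads to trust any call
    # Keep only alleles seen in enough reads to be unlikely sequencing errors.
    alleles = sorted(b for b, n in counts.items() if n >= min_allele_reads)
    return tuple(alleles) if len(alleles) >= 2 else None


print(call_snp("AAAAGGGG"))  # two well-supported alleles
print(call_snp("AAAAAAAG"))  # single discordant read is ignored
print(call_snp("AAGG"))      # below the depth threshold
```

Requiring each allele to appear in several reads is one simple way to separate true variation from isolated sequencing errors, at the cost of missing alleles with low read support.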
In addition to identifying polymorphisms, current research in human genomics has demonstrated the utility of developing haplotype information as a way to more fully understand genotype-to-phenotype relationships, especially in the context of health, disease and response to environmental cues [
20-
22]. Generally, haplotypes comprise allelic variants on each of the two chromosomes at the same locus, though the definition and utilization vary by application, from linking multiple polymorphisms across several loci down to multiple polymorphisms in a single gene [
23]. Additionally, haplotype determination has been aided by DNA strand specific or genomic phase-based information generated using second generation sequencing technologies since each sequencing read is from only one homologous chromosome and not a consensus of the two [
24]. Similarly, next generation RNA-seq and 3'UTR sequencing have the ability to reveal haplotypes within a gene [
25] and thus enable simultaneous identification of an allele-specific sequence and its expression. Here we present an approach that utilizes 3'UTR sequencing to rapidly develop SNP and haplotype markers in sweet cherry, a non-model crop without a published genome sequence. Through
de novo assembly of 454-generated 3'UTR sequencing reads and strict filtering, we initially identified a putative set of contigs containing SNPs. Primer sets designed to amplify the regions of these contigs with putative SNPs were developed and used for High Resolution Melting (HRM) analysis among eight currently utilized parental cultivars of sweet cherry and 13 hybrid seedlings derived from a cross between two of the parental cultivars. We determined that 68 out of 223 (30.5%) and 65 out of 217 (30.0%) of the tested primer pairs were able to detect genetic variability in the parental cultivars and the hybrid seedlings, respectively. From these polymorphic sites, 685 haplotypes were identified from 301 contigs containing multiple SNPs.
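Because each sequencing read derives from a single homologous chromosome, the alleles a read carries at several SNP positions in one contig can be read off together as a haplotype. The sketch below illustrates that idea only; it is not the procedure used in this study. The function name `read_haplotypes`, the representation of reads as strings in contig coordinates (with "-" for uncovered positions) and the `min_reads` threshold are all assumptions made for the example.

```python
from collections import Counter


def read_haplotypes(aligned_reads, snp_positions, min_reads=2):
    """Count haplotypes supported by reads spanning every SNP position.

    aligned_reads: strings in contig coordinates, "-" where a read has no base.
    snp_positions: 0-based contig positions of the putative SNPs.
    min_reads: hypothetical support threshold for reporting a haplotype.
    """
    counts = Counter()
    for read in aligned_reads:
        # Use only reads with a called base at all SNP positions.
        if all(p < len(read) and read[p] in "ACGT" for p in snp_positions):
            counts["".join(read[p] for p in snp_positions)] += 1
    return {hap: n for hap, n in counts.items() if n >= min_reads}


# Toy contig with SNPs at positions 1 and 6 (made-up reads).
reads = ["ACGTACGT", "ACGTACGT", "ATGTACCT", "ATGTACCT", "ACGTAC--"]
haplotypes = read_haplotypes(reads, [1, 6])
print(haplotypes)
```

In this toy case two haplotypes, CG and TC, are each supported by two reads, while the last read is skipped because it does not cover the second SNP.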