Little is known about DNA sequence variation within and between closely related plant species. The combination of re-sequencing technologies, large-scale DNA pools and the availability of reference gene sequences allowed the extensive characterisation of single nucleotide polymorphisms (SNPs) in genes of four biosynthetic pathways leading to the formation of ecologically relevant secondary metabolites in Eucalyptus. With this approach, the occurrence and patterns of SNP variation for a set of genes can be compared across different species of the same genus.
In a single GS-FLX run, we generated over 103 Mbp of sequence and assembled it against approximately 50 kbp of reference sequence. An average sequencing depth of 315 reads per nucleotide site was achieved for all four eucalypt species: Eucalyptus globulus, E. nitens, E. camaldulensis and E. loxophleba. We sequenced 23 genes from 1,764 individuals and discovered 8,631 SNPs across the species, with about 1.5 times as many SNPs per kbp in introns as in exons. The exons of the two closely related species (E. globulus and E. nitens) had similar numbers of SNPs at synonymous and non-synonymous sites. These two species also had similar levels of SNP diversity, whereas E. camaldulensis and E. loxophleba had much higher SNP diversity. Neither the pathway nor the position of a gene in the pathway influenced gene diversity. The four species share between 20 and 43% of the SNPs in these genes.
Because we used conservative statistical detection methods, we are confident in the validity of each SNP. With numerous individuals sampled over the geographical range of each species, we discovered one SNP in every 33 bp in E. nitens and one in every 31 bp in E. globulus. In contrast, the more distantly related species contained more SNPs: one in every 16 bp in E. camaldulensis and one in every 17 bp in E. loxophleba, which is, to the best of our knowledge, the highest frequency of SNPs described in woody plant species.
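As a reading aid, the per-species SNP frequencies above can be put on the same per-kbp scale used for the intron and exon comparison. The following is a minimal sketch of that arithmetic only, using the "one SNP in every N bp" figures quoted in the text; the names `bp_per_snp` and `snps_per_kbp` are illustrative and are not part of the study's analysis pipeline.

```python
# Illustrative arithmetic only: convert "one SNP in every N bp"
# (values quoted in the text) into SNPs per kbp.
bp_per_snp = {
    "E. nitens": 33,
    "E. globulus": 31,
    "E. camaldulensis": 16,
    "E. loxophleba": 17,
}


def snps_per_kbp(bp_between_snps):
    """Return the SNP density per 1,000 bp for a given average spacing."""
    return 1000.0 / bp_between_snps


# Hypothetical helper table: species name -> SNPs per kbp.
density = {species: snps_per_kbp(n) for species, n in bp_per_snp.items()}

for species, value in density.items():
    print(f"{species}: {value:.1f} SNPs per kbp")
```

On this scale, the two closely related species sit at roughly 30 to 32 SNPs per kbp, while E. camaldulensis and E. loxophleba sit at roughly 59 to 63, about twice as dense.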