For the 23 genes, 8,631 high-confidence SNPs were identified across the four species from 1,764 individuals that represented the full geographic range of each species. Our strict criteria undoubtedly led to the exclusion of many 'real' SNPs, but ensured that there were no false positives in our SNP identification. There should be no ascertainment bias due to our sampling approach. However, a major potential bias has been raised for re-sequencing methods, namely PCR amplification from pooled samples [11]. Pools of DNA from four species were used in this SNP discovery project, and the primers were designed inside the exons to enhance our chance of even and equal amplification within and across all pools. Our success rate for using primers across all species was 100%. Nevertheless, this came at a cost, since we had to exclude data from genes where the primers clearly amplified more than one locus, namely the large gene families of terpene, squalene and chalcone synthases. The study was successful because the reference or consensus sequences in the four species showed high identity. It is unclear whether the exclusion of gene families led to bias in the estimation of overall levels of SNP diversity.
A general conclusion that can be drawn for the four species is that within all genes there is more SNP variability in introns than in the exons. In only three of the 92 comparisons (hmgs
in E. nitens
in E. loxophleba
) do more SNPs occur in exons than introns. The lowest SNP density in the exons was found in E. nitens
with one SNP in every 39 bp, and the highest in E. camaldulensis
with one SNP in every 21 bp, which is similar to estimates of one in 43 bp and one in 50 bp for ESTs from maize [29
]. For angiosperm trees comparable estimates are one in 25 bp for Quercus crispula
] and one every 60 bp in Populus tremula
]. In all these studies a limited number of individuals were sampled. Our estimates are much higher than comparable SNP frequencies previously reported in Eucalyptus
of one SNP in every 192 bp [15
]. This could be explained by our experimental design, which examined a comprehensive set of populations over the geographical range of each species, in contrast to E. grandis
where only three individuals from each of seven families were examined [15].
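The densities quoted above ("one SNP in every N bp") are ratios of screened sequence length to SNP count. A minimal sketch of that arithmetic in Python follows; the function name `bp_per_snp` and the example length and count are illustrative assumptions, not values taken from this study.

```python
def bp_per_snp(sequence_length_bp: int, snp_count: int) -> float:
    """Average number of base pairs per SNP (39.0 means one SNP every 39 bp)."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return sequence_length_bp / snp_count


# Hypothetical example: 3,900 bp of exon sequence containing 100 SNPs.
exon_density = bp_per_snp(3900, 100)
print(f"one SNP every {exon_density:.0f} bp")
```

A smaller value means a higher SNP density, so "one in 21 bp" is denser than "one in 39 bp".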
SNP levels differ between genes within species. Eucalyptus camaldulensis
and E. loxophleba
have higher levels of polymorphism for individual genes than E. globulus
and E. nitens
, but there is a smaller range in SNP polymorphism between genes. A high proportion of the discovered SNPs are shared between several species. While some of these may have arisen independently after species separation, many would have been present before the speciation events. If evolutionary time since speciation is the dominant factor, then one would expect genes from other biosynthetic pathways in the same eucalypt species to show similar patterns. There is much uncertainty as to how long ago the species separated. The proposed separation age ranges from 5-10 mya for E. globulus
and E. nitens
to 20-42 mya for E. loxophleba
from the other three species [23
]. That so many SNPs are in common between species suggests that selective forces have maintained many of them over this period. Similarly, the unusually large proportion of non-synonymous SNP sites, especially of common SNPs, along with the high similarity in the proportions of synonymous versus non-synonymous SNPs across species, suggests maintenance of these SNPs through selection.
It is noticeable that the two sister species E. globulus
and E. nitens
have very similar levels of SNP diversity overall and at the intron and exon level. The similar proportions of common SNPs in introns for E. globulus
and E. nitens
could result from evolutionary lineages of comparable age. In fact, they share about 28% of their SNPs, even though they have been separated for several million years. Morphologically they are quite distinct species. For the ten other species in the small taxonomic group Globulares
to which these two species belong [22
], we hypothesise that similar patterns of polymorphism will be found for the same functional set of genes. Will other eucalypt species pairs show similar patterns, and what would this imply about evolutionary relationships within and between groups of species?
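A shared-SNP proportion such as the one quoted for the two sister species can be pictured as a set comparison. The sketch below is only an illustration: it assumes each SNP is identified by a (gene, position) pair in a common alignment, the positions are invented, and because the text does not state which denominator was used, the function reports the shared fraction against each species' own total and against the union.

```python
def shared_snp_fractions(snps_a: set, snps_b: set) -> dict:
    """Summarise SNP sharing between two species.

    Each SNP is assumed to be a hashable identifier, here a (gene, position)
    tuple in a shared alignment coordinate system.
    """
    shared = snps_a & snps_b
    union = snps_a | snps_b
    return {
        "shared_count": len(shared),
        "fraction_of_a": len(shared) / len(snps_a) if snps_a else 0.0,
        "fraction_of_b": len(shared) / len(snps_b) if snps_b else 0.0,
        "fraction_of_union": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical SNP sets for two species (positions are made up).
species_a = {("dxs", 10), ("dxs", 55), ("hmgs", 7), ("hmgs", 120)}
species_b = {("dxs", 55), ("hmgs", 7), ("hmgs", 300)}

result = shared_snp_fractions(species_a, species_b)
print(result["shared_count"], round(result["fraction_of_union"], 2))
```

The choice of denominator matters: in this toy example the same two shared SNPs are 50% of one species' SNPs but 40% of the combined set.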
Eucalyptus camaldulensis has the highest number of SNPs, especially of rare alleles, in both exons and introns. This species has the largest geographic range of any eucalypt species, and the largest number of natural populations among the four species that were sampled. Perhaps the species with the greater number of separately evolving populations will have a greater array of rare SNPs. A SNP data set from individuals rather than pooled bulks of DNA would allow examination of this hypothesis. The several subspecies within both E. loxophleba and E. camaldulensis suggest greater evolutionary divergence within these species, and this seems to be reflected in the higher SNP diversity in the two species. The higher intron/exon ratios of SNPs in these two species could indicate that they represent older evolutionary lineages, which would have allowed greater accumulation of SNP alleles over time, especially in introns, where selective forces could be weaker. Similar differences in intron/exon SNP ratios may occur in other eucalypts and plant groups as a result of differences in the length of evolutionary lineages.
In a study of nine genes of the phenylpropanoid pathway in A. thaliana
no association was detected between sequence diversity and position in the pathway [7
]. In our study structural genes of the terpenoid and flavonoid biosynthesis pathways, which are important in plant-herbivore interactions [33
], were used. Essentially, no relationship was found between the levels of SNP diversity in genes and their position in the pathways or between pathways. Even without some gene sequences the coverage of the pathways was sufficient to make these conclusions. Most of the genes studied appear to be under purifying selection. Similar results have been reported in other plants [15
]. In forest trees current data suggest 15-20% of genes are under some form of selection [37
]. It is possible that the assumption of a genome-wide neutral model does not hold for the eucalypt species. Whether many of the observed patterns are due to common demographic factors rather than selection may be resolved when nucleotide diversity estimates are available at the population level.
The hypothesis that entry point enzymes such as dxs
control the downstream production of terpenoids [39
] is not reflected in lower levels of SNP diversity in the corresponding genes, but may be reflected in the low pN/pS ratios. Nevertheless, there could be significant associations between SNP polymorphisms in these genes and concentrations of final products. Furthermore, we only examined structural genes here, and there could be strong selection on the unknown regulatory elements involved in the pathway. Recent studies have found evidence of different patterns of polymorphism between different functional gene classes, with genes interacting with the environment having high levels of SNP diversity [9]. Examining the data set from this study alongside genes in pathways responsible for other phenotypic traits, for the same eucalypt species and individuals, will enable similar comparisons.
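The pN/pS ratio referred to above can be illustrated with a short calculation. This is a sketch under a stated assumption: pN and pS are taken to be the proportions of polymorphic non-synonymous and synonymous sites, respectively, which may differ in detail from the estimator used in this study. The function name and the example counts are hypothetical.

```python
def pn_ps_ratio(nonsyn_snps: int, nonsyn_sites: float,
                syn_snps: int, syn_sites: float) -> float:
    """Ratio of non-synonymous to synonymous polymorphism per site.

    Assumes pN = nonsyn_snps / nonsyn_sites and pS = syn_snps / syn_sites.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_snps == 0:
        raise ValueError("pS is zero, so the ratio is undefined")
    pn = nonsyn_snps / nonsyn_sites
    ps = syn_snps / syn_sites
    return pn / ps


# Hypothetical gene: 6 non-synonymous SNPs over 750 non-synonymous sites,
# 10 synonymous SNPs over 250 synonymous sites.
ratio = pn_ps_ratio(6, 750, 10, 250)
print(f"pN/pS = {ratio:.2f}")
```

Under this definition, values well below 1 are conventionally read as consistent with purifying selection, since non-synonymous variants are then rarer per site than synonymous ones.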