This is the first wholly Illumina-based assembly of an ascomycete genome and the third assembly to be reported for a necrotrophic plant pathogenic ascomycete [31,32]. As might be expected, the P. teres f. teres genome assembly demonstrates that short paired-end reads can effectively capture higher-complexity, gene-containing regions. The assembly was validated by comparison with BAC sequences and ESTs, and by direct amplification of predicted sequences across SSRs. The number of predicted genes in P. teres f. teres (11,089) is similar to that in the published assemblies of the phytopathogens M. grisea and S. nodorum [31,32] (11,109 genes larger than 100 amino acids and 10,762 S. nodorum version 2 gene models, respectively). Gene prediction algorithms, even when trained on ESTs from the species in question, are unlikely to predict all coding regions correctly in more complex genomes, and in some instances further corroborating data from approaches such as proteomics and mass spectrometry are required [41]. Thus, the true number of genes may depend less on the assembly per se than on gene prediction, and gene models may yet be adjusted, concatenated or newly introduced.
The inevitable corollary of an assembly based on short paired-end reads is that low-complexity regions (those with low GC content, simple microsatellites and repetitive DNA) are under-represented. As a consequence, the assembly contains a large number of singleton contigs that are unsuitable for estimating the genomic proportions of such regions. To support the minimum genome-size estimate based on the assembly, and to provide basic information on chromosome composition, we conducted PFG and GTBM karyotyping. From the PFG results, we concluded that P. teres f. teres most likely contains a minimum of nine chromosomes, although band intensities suggest that 11 chromosomes are possible. This gave an estimated genome size of at least 35.5 Mbp and an upper value of 42.3 Mbp. Clumping and co-migration of bands are common in PFG, as shown, for example, by Eusebio-Cope et al. [42], and accurate discrimination of co-migrating bands requires techniques such as Southern blotting [43] and fluorescence in situ hybridization [44]. However, the cytological karyotyping agreed with the PFG results in depicting at least nine chromosomes. An upper estimate of nine chromosomes was postulated for P. teres by Aragona et al. [45], although that study did not identify which P. teres form was examined, and the technique used gave poor resolution of bands between 4.5 and >6 Mbp. Overall, the total assembly size in this study agrees with the higher estimate from electrophoretic karyotyping and indicates a genome of at least 42 Mbp. This is somewhat larger than the Pleosporales assemblies reported to date for Cochliobolus heterostrophus (34.9 Mbp; Joint Genome Institute), P. tritici-repentis (37.8 Mbp; NCBI) and S. nodorum (37.1 Mbp [32]).
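The reasoning behind the lower and upper PFG estimates can be sketched as follows, assuming each resolved band is sized against standards and that a band whose intensity suggests co-migration is counted once for the lower bound and by its suggested copy number for the upper bound. The `Band` type, the `genome_size_bounds` function and the example sizes are hypothetical illustrations and do not reproduce the band sizes measured in this study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Band:
    size_mbp: float       # band size estimated against PFG size standards
    max_copies: int = 1   # chromosomes that band intensity suggests may co-migrate


def genome_size_bounds(bands: List[Band]) -> Tuple[int, int, float, float]:
    """Lower/upper chromosome counts and genome sizes from a PFG karyotype.

    Lower bound: every band is a single chromosome.
    Upper bound: bright bands count as `max_copies` co-migrating chromosomes.
    """
    min_chrom = len(bands)
    max_chrom = sum(b.max_copies for b in bands)
    min_mbp = sum(b.size_mbp for b in bands)
    max_mbp = sum(b.size_mbp * b.max_copies for b in bands)
    return min_chrom, max_chrom, min_mbp, max_mbp


# Hypothetical bands (not this study's data); the 4.0 Mbp band is assumed
# to be a possible doublet.
bands = [Band(5.0), Band(4.0, max_copies=2), Band(3.0)]
print(genome_size_bounds(bands))  # (3, 4, 12.0, 16.0)
```

The gap between the two bounds comes entirely from bands suspected of containing co-migrating chromosomes, which is why techniques that resolve such bands narrow the estimate.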
An expansion in genome size compared with other Pleosporales might be explained by the presence in the assembly of new classes of transposable elements and large numbers of novel repeats (over 60, although these data are incomplete owing to poor assembly of degraded regions and are therefore not shown). These may in turn explain the large chromosome-level polymorphisms observed by PFG between the two isolates examined here, as well as the relatively large genetic map. Chromosome-level polymorphisms are a feature of some ascomycetes [46]. Among plant pathogenic fungi, there is growing evidence that host-specificity genes and effectors are located in or next to transposon-rich regions [31,47]. This location provides opportunities for horizontal acquisition, duplication and further diversification that generate new, species-specific genetic diversity or, where a gene is recognized as an avirulence gene, for its loss, a process that may also aid host range expansion. The contribution of transposons to P. teres f. teres pathogenicity has yet to be determined, although we have preliminary data showing that the avirulence gene AvrHar is associated with transposon repeats on the second largest chromosome. There is no evidence in P. teres f. teres of small chromosomes (<2 Mbp) such as those in N. haematococca and A. alternata, where they confer host-specific virulence [48,49], and in Fusarium oxysporum, where they have been shown to be mobile genetic elements that confer virulence on non-pathogenic strains [50].
Analysis of the gene content of the genome assembly shows that it shares many characteristics with similar plant pathogenic fungi and has strong homology to most genes from P. tritici-repentis. These include highly diverse proteins involved in host contact, signal transduction, secondary metabolite production and pathogenesis. Secreted proteins are of particular interest to plant pathologists because they represent the key interface of host-pathogen interactions, notably avirulence proteins and effectors. These are key determinants both of induced disease resistance and of disease promotion, and expressed effector proteins offer tangible, discriminating tools for resistance assays in a variety of breeding programs. This is because fungal necrotrophic disease is the sum of the contributions of individual effectors [51,52], and single, purified effectors give a qualitative response when infiltrated into leaves. However, effector genes often encode small, cysteine-rich proteins with little or no orthology to known genes. Examples include Avr2 and Avr4 in Cladosporium fulvum, Avr3 in F. oxysporum (reviewed in [53]), ToxA and ToxB in P. tritici-repentis [54,55] and SnToxA and SnTox3 in S. nodorum [56,57]. Identifying candidate effectors in the genome assembly, in conjunction with genetic mapping, functional studies and proteomic approaches, will aid their future isolation.
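Because effectors are often small, cysteine-rich secreted proteins with few recognizable homologues, a first-pass genome screen commonly filters predicted proteins on these properties. A minimal sketch follows; the length and cysteine thresholds, the function names and the toy inputs are hypothetical, secretion is assumed to be supplied by an external signal-peptide predictor (for example SignalP), and this is not the pipeline used in this study.

```python
from typing import Dict, List, Set

# Assumed screening thresholds, for illustration only; published screens differ.
MAX_LENGTH_AA = 300
MIN_CYSTEINE_FRACTION = 0.03


def cysteine_fraction(seq: str) -> float:
    """Fraction of residues in a protein sequence that are cysteine (C)."""
    return seq.count("C") / len(seq) if seq else 0.0


def candidate_effectors(proteins: Dict[str, str], secreted: Set[str]) -> List[str]:
    """IDs of predicted secreted proteins that are small and cysteine-rich.

    `secreted` is assumed to come from an external signal-peptide predictor;
    secretion is not predicted here, and orthology is not checked.
    """
    hits = []
    for pid, raw in proteins.items():
        seq = raw.upper().rstrip("*")  # normalise case, drop a terminal stop symbol
        if (
            pid in secreted
            and 0 < len(seq) <= MAX_LENGTH_AA
            and cysteine_fraction(seq) >= MIN_CYSTEINE_FRACTION
        ):
            hits.append(pid)
    return sorted(hits)


# Hypothetical toy proteins, not sequences from this study.
proteins = {
    "p1": "MKC" + "A" * 20 + "CC",  # short, 3 of 25 residues are cysteine
    "p2": "M" + "A" * 400,          # too long to pass the size filter
    "p3": "MCCCA",                  # cysteine-rich but not flagged as secreted
}
print(candidate_effectors(proteins, secreted={"p1", "p2"}))  # ['p1']
```

Such a filter only ranks candidates; as the text notes, genetic mapping, functional studies and proteomics are still needed to confirm which of them act as effectors.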
We provide the first genetic linkage map of P. teres f. teres. Its total length of nearly 2,500 cM is longer than those reported for other ascomycete fungal pathogens: 1,216 cM for M. graminicola [58], 1,329 cM for Cochliobolus sativus [59] and 900 cM for M. grisea [60]. However, a genetic map of 359 loci for the powdery mildew fungus Blumeria graminis f. sp. hordei, an obligate biotrophic pathogen of barley, covered 2,114 cM [61]. The length of the P. teres f. teres genetic map may reflect the relatively large genome size and the presence of large numbers of recombinogenic repetitive elements. This is paralleled by a larger number of linkage groups (25) than the estimated number of chromosomes, which may also suggest interspersed tracts of repetitive DNA.
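Total map length is the sum of inter-marker distances obtained by converting recombination fractions with a mapping function, so excess recombination (for example around repetitive elements) inflates it directly. The mapping function used for this map is not stated in this section; the sketch below assumes either the Haldane or the Kosambi function, and the function names and input values are hypothetical.

```python
import math
from typing import Callable, Dict, List


def _check(r: float) -> None:
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")


def haldane_cm(r: float) -> float:
    """Haldane distance in cM; assumes no crossover interference."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi distance in cM; allows for moderate crossover interference."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def map_length_cm(
    linkage_groups: Dict[str, List[float]],
    mapping: Callable[[float], float] = kosambi_cm,
) -> float:
    """Sum adjacent-marker distances over every linkage group.

    Each list holds recombination fractions between consecutive markers.
    """
    return sum(mapping(r) for fractions in linkage_groups.values() for r in fractions)


# Hypothetical input: two linkage groups with made-up recombination fractions.
example = {"LG1": [0.05, 0.10], "LG2": [0.20]}
print(round(map_length_cm(example), 1), round(map_length_cm(example, haldane_cm), 1))
```

For a rough sense of scale using only the figures reported here, a genome of at least 42 Mbp over a map of nearly 2,500 cM averages about 42,000 kb / 2,500 cM ≈ 17 kb per cM, although local ratios can vary widely along chromosomes.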
The genetic map and karyotyping data will be instrumental in a final assembly of the P. teres f. teres genome, as they will allow scaffolds to be orientated and tiled onto linkage groups. Together, the genome assembly and the genetic map provide an invaluable resource for identifying potential effector candidate genes from phytotoxic protein fractions in conjunction with mass spectrometry peptide analysis. The genetically characterized SSRs provided in this study will also be an important community resource for comparative mapping, gene-flow and genetic diversity studies. Further validation, assembly of low-complexity sequence regions and genome annotation are now under way using proteomic approaches and 454 pyrosequencing. The priority now is to fully understand the mechanism of pathogenicity in P. teres f. teres in order to develop effective means of controlling this pathogen.