Sequence identification of ESTs from non-model species offers distinct challenges particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities.
Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearsons correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities to 522 of 2701 unknown carp ESTs sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments.
The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data.