A hierarchy of 3,688 phyletic patterns was characterized encompassing more than 5,000 known protein-coding genes from 66 complete microbial genomes. The results indicate that gene loss and displacement has occurred in the evolution of most pathways.
Phyletic patterns denote the presence and absence of orthologous genes in completely sequenced genomes and are used to infer functional links between genes, on the assumption that genes involved in the same pathway or functional system are co-inherited by the same set of genomes. However, this basic premise has not been quantitatively tested, and the limits of applicability of the phyletic-pattern method remain unknown.
We characterized a hierarchy of 3,688 phyletic patterns encompassing more than 5,000 known protein-coding genes from 66 complete microbial genomes, using different distances, clustering algorithms, and measures of cluster quality. The most sensitive set of parameters recovered 223 clusters, each consisting of genes that belong to the same metabolic pathway or functional system. Fifty-six clusters included unexpected genes with plausible functional links to the rest of the cluster. Only a small percentage of known pathways and multiprotein complexes are co-inherited as one cluster; most are split into many clusters, indicating that gene loss and displacement has occurred in the evolution of most pathways.
Phyletic patterns of functionally linked genes are perturbed by differential gains, losses and displacements of orthologous genes in different species, reflecting the high plasticity of microbial genomes. Groups of genes that are co-inherited can, however, be recovered by hierarchical clustering, and may represent elementary functional modules of cellular metabolism. The phyletic patterns approach alone can confidently predict the functional linkages for about 24% of the entire data set.