Since the first genome of a halophilic archaeon was sequenced in 2000, biologists have steadily advanced our understanding of the genomic characteristics that allow these organisms to survive in their harsh natural environments. Increased protein acidity and GC bias in the genome have been implicated as factors in tolerance to extreme salinity, desiccation, and high solar radiation. However, few attempts have been made to identify novel genes that would permit survival in such extreme conditions.
With the recent release of several new complete haloarchaeal genome sequences, we have conducted a comprehensive comparative genomic analysis focused on identifying unique, conserved haloarchaeal proteins that likely play key roles in environmental adaptation. Using bioinformatic methods, we clustered 31,312 predicted proteins from nine haloarchaeal genomes into 4,455 haloarchaeal orthologous groups (HOGs). We assigned likely functions by association with the established COG and KOG databases at NCBI. After identifying homologs in four additional haloarchaeal genomes, we determined that there were 784 core haloarchaeal protein clusters (cHOGs), of which 83 clusters were found primarily in haloarchaea. Further analysis found that 55 of these clusters were truly unique to haloarchaea (tucHOGs) and qualify as signature proteins, while 28 were nearly unique (nucHOGs); the vast majority were encoded on the haloarchaeal chromosomes. Among the signature proteins, only one had any predicted function: Ral, which is involved in desiccation/radiation tolerance in Halobacterium sp. NRC-1. Among the core clusters, 33% were predicted to function in metabolism, 25% in information transfer and storage, and 10% in cell processes and signaling, while 22% belonged to poorly characterized or general function groups.
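The partition of clusters into core, truly unique, and nearly unique groups can be illustrated with a minimal sketch. It assumes each cluster is summarized as the set of genomes in which a member or homolog was detected. The function name `classify_clusters`, the toy genome labels, and the `max_outside_hits` cutoff are hypothetical, chosen only for illustration, and do not reproduce the criteria actually used in the study.

```python
from typing import Dict, Set


def classify_clusters(
    presence: Dict[str, Set[str]],
    haloarchaea: Set[str],
    max_outside_hits: int = 2,  # hypothetical cutoff, not taken from the study
) -> Dict[str, str]:
    """Label each cluster by its distribution across genomes.

    presence maps a cluster ID to the set of genomes containing a homolog.
    haloarchaea is the set of haloarchaeal genome names.
    """
    labels: Dict[str, str] = {}
    for cluster, genomes in presence.items():
        # A core cluster must be present in every haloarchaeal genome.
        if not haloarchaea <= genomes:
            labels[cluster] = "non-core"
            continue
        outside = genomes - haloarchaea  # homologs outside the haloarchaea
        if not outside:
            labels[cluster] = "tucHOG"   # truly unique core cluster
        elif len(outside) <= max_outside_hits:
            labels[cluster] = "nucHOG"   # nearly unique core cluster
        else:
            labels[cluster] = "cHOG"     # core, but widely shared
    return labels


# Toy data: three haloarchaeal genomes (H1-H3) and a few outgroup genomes.
halo = {"H1", "H2", "H3"}
toy = {
    "c1": {"H1", "H2", "H3"},
    "c2": {"H1", "H2", "H3", "X1"},
    "c3": {"H1", "H2", "H3", "X1", "X2", "X3"},
    "c4": {"H1", "H2"},
}
labels = classify_clusters(toy, halo)
```

In this toy example, `c1` would be labeled a tucHOG, `c2` a nucHOG, `c3` an ordinary cHOG, and `c4` non-core. A real analysis would also need an explicit homology criterion (for example, a similarity or E-value threshold) to decide what counts as presence in a genome.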
Our studies have established a conserved set of nearly 800 protein clusters present in all haloarchaea, including a subset of 55 predicted to be accessory proteins that may be critical or essential for success in an extreme environment. These studies support core and signature genes and proteins as valuable concepts for understanding the phylogenetic and phenotypic characteristics of coherent groups of organisms.