Codon degeneracy and codon usage by organisms is an interesting and challenging problem. Researchers demonstrated the relation between codon usage and various functions or properties of genes and proteins, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Researchers usually represent segments of proteins responsible for specific functions or structures in a family of proteins as sequence patterns or motifs. We asked the question if organisms use the same codons in pattern segments as compared to the rest of the sequence.
We used the likelihood ratio test, Pearson’s chi-squared test, and mutual information to compare these two codon usages.
We showed that codon usage, in segments of genes that code for a given pattern or motif in a group of proteins, varied from the rest of the gene. The codon usage in these segments was not random. Amino acids with larger number of codons used more specific codon ratios in these segments. We studied the number of amino acids in the pattern (pattern length). As patterns got longer, there was a slight decrease in the fraction of patterns with significant different codon usage in the pattern region as compared to codon usage in the gene region. We defined a measure of specificity of protein patterns, and studied its relation to the codon usage. The difference in the codon usage between pattern region and gene region, was less for the patterns with higher specificity.
We provided a hypothesis that there are segments on genes that affect the codon usage and thus influence protein translation speed, and these regions are the regions that code protein pattern regions.