|Home | About | Journals | Submit | Contact Us | Français|
The Conserved Oligomeric Golgi (COG) complex is an eight-subunit assembly that localizes peripherally to Golgi membranes and is involved in retrograde vesicular trafficking. COG subunits are organized in two heterotrimeric groups, Cog2, -3, -4 and Cog5, -6, -7, linked by a dimeric group formed by Cog1 and Cog8. Dysfunction of COG complex in humans has been associated with new forms of Congenital Disorders of Glycosylation (CDG), therefore highlighting its essential role. In the present study, we intended to gain further insights into the evolution of COG subunits in vertebrates, using comparative analyses of all eight COG proteins.
We used protein distances and dN/dS ratios as a measure of the rate of proteins evolution. The results showed that all COG subunits are evolving under strong purifying selection, although COG1 seems to evolve faster than the remaining proteins. In addition, we also tested the expression of COG genes in 20 human tissues, and demonstrate their ubiquitous nature.
COG complex has a critical role in Golgi structure and function, which, in turn, is involved in protein sorting and glycosylation. The results of this study suggest that COG subunits are evolutionary constrained to maintain the interactions between each other, as well with other partners involved in vesicular trafficking, in order to preserve both the integrity and function of the complex.
Most cellular processes are carried out by multiprotein complexes that constitute important functional units in the cell . This fact has motivated a number of studies aiming to investigate the structure, function and evolution of such multisubunit molecular machines [e.g., [1-4]].
A cellular process in which protein complexes are known to be involved is the transport of proteins between cellular compartments (vesicular trafficking) [5,6]. Proteins synthesised in the secretory pathway are transported inside vesicles that move from the endoplasmic reticulum to the Golgi apparatus, from where polypeptides are then sorted to several cellular compartments . As progression through the Golgi occurs, proteins may undergo modifications like glycosylation, a necessary step for their stability and function . Several large protein complexes play an important role in the fidelity of vesicle fusion, acting as tethering factors through the formation of physical links between membranes prior to fusion [5,6,9]. One of these is the Conserved Oligomeric Golgi (COG) complex , which localizes at the cytoplasmic surface of the Golgi apparatus [11-14].
Several studies have been performed demonstrating the involvement of COG in retrograde vesicular trafficking of Golgi resident proteins [15-18], including enzymes that participate in glycans biosyntesis [19,20]. Consequently, COG impairment results in abnormal Golgi morphology (dilated cisternae and accumulation of vesicles [10,17,21,22]) and function (glycosylation defects [reviewed in  and references therein]). However, its precise mechanism of action is not completely understood.
COG complex is composed by eight distinct subunits [10,12,13,24-26], Cog1 to Cog8, arranged in two lobes consisting of Cog1 to Cog4 (lobe A) and Cog5 to Cog8 (lobe B) . Although several models have been advanced refining the architecture of COG [22,27-29], the most recent studies converge in suggesting that mammalian COG members are organized in two heterotrimeric groups, Cog2-Cog3-Cog4 and Cog5-Cog6-Cog7, which are linked by the dimeric group formed by Cog1 and Cog8. In particular, Cog1 associates with Cog2-Cog3-Cog4, whereas Cog8 interacts with Cog5-Cog6-Cog7 (Figure (Figure1)1) [22,29].
Previous studies have shown the possibility that COG components have different roles within the complex, since mutations or deletion of individual subunits cause sharply distinct phenotypes [25,26]. In yeast, the deletion of any one of the four lobe A subunits causes a severe growth defect, whilst disruption of the remaining genes (COG5 to COG8) does not substantially interfere with normal cell growth . In mammals, COG1- and COG2-deficient cells present several dilated cisternae  and pleiotropic defects in the synthesis of N-, O- and lipid-linked glycans . A similar phenotype is observed in COG5-deficient cells, although the alterations in glycosylation are subtle . In addition, COG3 depletion entails the accumulation of vesicles distributed throughout the cytoplasm . More recently, COG dysfunction caused by mutations in specific subunits has been associated with new forms of Congenital Disorders of Glycosylation (CDG) in humans [reviewed in [30,31]].
In the present study, we used a broad range of comparative sequence analyses to track the evolutionary profile of this complex in vertebrates.
Human protein sequences corresponding to each COG subunit (COG1-COG8) were retrieved from the National Center for Biotechnology Information (NCBI) . These sequences were used as queries to search orthologous proteins from the reference proteins database (RefSeq) using the BLASTP algorithm , with an E-value cutoff of 10-3 and using the reciprocal best-hit approach . Alternatively, sequences were retrieved from Ensembl genome browser , release 57 from March 2010, via the orthogues option. Available sequences from the following taxa were considered in this study: mammals (Homo sapiens, Pan troglodytes, Pongo pygmaeus, Macaca mulatta, Callithrix jacchus, Mus musculus, Rattus norvegicus, Oryctolagus cuniculus, Equus caballus, Canis familiaris, Bos taurus and Monodelphis domestica), birds (Gallus gallus and Taeniopygia guttata), reptiles (Anolis carolinensis), amphibians (Xenopus tropicalis) and fishes (Danio rerio, Takifugu rubripes, Gasterosteus aculeatus, Tetraodon nigroviridis and Oryzias latipes). COG3 from G. aculeatus, COG5 from X. tropicalis and COG6 from D. rerio were assembled based on the genomic sequence through the comparison with the orthologous protein from other species, and through TBLASTN searches on ESTs database. Accession numbers are available in the additional file 1. In addition, due to probable assembling errors, residues 82 to 124 from P. troglodytes COG4, residues 57 to 146 and 436 to 470 from P. pygmaeus COG7 as well as residues 493 to 522 from M. mulatta COG7 were replaced by missing data.
Protein sequences were aligned with MUSCLE  and manually refined by removing sites at which all sequences except one or two have alignment gaps. Additionally, the initial and terminal regions of the multiple sequences alignment had, in some cases, to be removed because they were poorly aligned. This step was particularly important in COG5 and COG8. The final alignments of each group of orthologous sequences are available at additional file 2. Protein distances were calculated using Protdist from PHYLIP 3.69 package , with the Jones-Taylor-Thornton (JTT) evolutionary model and gamma distribution of rates with a fixed shape parameter of 1. Divergence times between species were obtained with TimeTree [38,39].
To reconstruct species phylogeny, the protein-coding sequences were aligned with ClustalW program  implemented in Bioedit version 220.127.116.11 , using protein alignments as template to avoid out-of-frame gaps. The poorly aligned positions and divergent regions in each alignment were eliminated with GBlocks program  and the resulting blocks were concatenated in a single alignment of 15195 positions. To identify the model of nucleotide substitution that best fits the data the Akaike Information Criterion (AIC) was applied, using the jModelTest 0.1.1 . The selected model (GTR +I +G) was used to reconstruct the maximum likelihood phylogeny in Phyml 3.0 . The tree was drawn with FigTree program . The resulting tree topology, but not branch lengths, was used to fit different models in PAML.
The number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous sites (dN) have been estimated with CODEML from the PAML v.4.4 package , using the F3 × 4 codon frequency model and treating alignment gaps as ambiguity characters (cleandata = 0). The input alignments were modified by removing positions that showed evidence to represent true indels, while keeping those that appear to be missing data. Alignments are available in additional file 3. Several models that allow for different levels of heterogeneity in the dN/dS ratio (ω) among lineages have been applied: the one-ratio model that assumes the same ω ratio for all branches in the phylogeny; the free-ratios model that allows ω to vary on every lineage; and the two-ratio model, which assumes that the branch of interest has an ω value (ω1) different from the ratio of the other lineages (ω0- background ratio). The above models can be compared using a likelihood ratio test (LRT) to test different hypothesis, as described by Yang . Because synonymous sites saturation prevents comparisons of too divergent sequences, only species from humans to birds were considered in this analysis.
Expressed sequence tag (EST) profiles from several human and murine tissue samples were extracted for COG genes from the UniGene database  as EST counts per million transcripts and were log2 transformed (Homo sapiens: UniGene Build #223 and Mus musculus: UniGene Build #183). A number of erroneously assigned ESTs for human and mouse COG8 were manually removed. For simplicity, only homologous tissues for which information was available in both organisms were included, resulting in a total of 29 tissues. The clusters on the heatmaps were made by an in house tool, using correlation as the measure of similarity.
In addition, because in several tissues some COG genes have no detectable expression (absence of EST counts), we have further tested the presence of all COG transcripts in 20 different human tissues included in a RNA panel obtained from Ambion (FirstChoice Human Total RNA Survey Panel). cDNA was synthesized from 1 μg of RNA using the First Strand cDNA synthesis Kit (Fermentas Life Sciences, Burlington, Ontario, Canada), according to the manufacturer's instructions. Primers specific for COG transcripts, as well as for the positive control GAPDH, were designed to avoid amplification of contaminating genomic DNA, either because they span an intron or the forward primer anneals with an exon/exon boundary (additional file 4). PCR amplifications were performed using the QIAGEN multiplex PCR kit (Qiagen, Hilden, Germany) at 1× Qiagen multiplex PCR master mix with 0.5 μl of cDNA in a 12.5 μl final reaction volume. Final primer concentration in the reaction was 0.4 μM. Thermocycling conditions used included pre-incubation for 15 min at 95°C, followed by 35 cycles of 30 s at 94°C, 90 s at 58°C, and 60 s at 72°C, with a final incubation for 10 min at 72°C. Amplification products ranged from 100 to 253 bp and were separated by horizontal electrophoresis in 12% polyacrylamide gels and visualized by silver staining. RT-PCR products from one sample were confirmed by direct sequencing.
As a first measure of the rate of COG proteins evolution, we calculated the pairwise protein distance for human proteins and each of the 20 orthologs. These values were plotted against the corresponding divergence times for the compared species, and the linear regression trend line was estimated from each group, as shown in Figure Figure2.2. From the slope of the lines and the r2 values we are able to compare the rate of proteins evolution and its constancy over time, respectively. The results presented in Figure Figure22 show that COG1 is the subunit with the fastest rate of evolution; COG6, COG3, COG5 and COG4 have the lowest rates; while the remaining proteins (COG7, COG8 and COG2) have intermediate rates. Although being the most divergent protein, COG1 (along with COG7) is the one in which the rate of evolution has remained most constant (r2 = 0.993).
In addition, we used a classical measure of protein evolution based on the nonsynonymous (dN) to synonymous (dS) substitutions rate ratio (dN/dS or ω).
A ω value higher than 1 can suggest that genes undergo positive selection, while less than 1 is indicative of purifying selection . Following this approach, we tested whether ω ratios for each COG gene are different among lineages, based on the maximum likelihood phylogeny previously inferred (Figure (Figure3).3). Therefore, a likelihood ratio test (LRT) comparing the one-ratio model, that assumes the same ω for all lineages, and the free-ratio model, which assumes independent ω ratios for every branch, was applied. The log likelihoods obtained under each model are presented in Table Table11 and indicate significant variation in ω values among lineages in all COG genes except for COG7, suggesting relaxation of the strong selective constraints in some lineages, yet with low ω. The ω values for branches in the phylogeny for each gene are available at additional file 5. It is interesting to note that the length of the branch that leads to modern rodents (mouse and rat) obtained for COG2 and COG6 is very long, revealing the accumulation of many substitutions (additional file 5). Notwithstanding, the corresponding ω values are low, even when compared with shorter branches represented by non-rodent species. This suggests that rodent lineage has accumulated mainly synonymous rather than nonsynonymous substitutions, thus preserving the amino acid composition of the encoded protein. In fact, when the same COG2 and COG6 trees were drawn with branches lengths proportional to the expected nonsynonymous substitutions rate (dN), the rodent's branch length is more similar to all the other branches (additional file 6).
Although the ω ratio obtained under the one-ratio model does not fit every branch in the phylogeny, it represents an average over all sites and lineages  and therefore can be used to compare the strength of constraints imposed to different COG genes. As presented in Table Table1,1, all ω values are very low, indicating that COG genes are evolving under strong purifying selection. The highest ratio is observed for COG1, while the lowest refers to COG4 and COG6.
Although several studies have been accumulating in the last years about COG complex, particularly in what concerns to the interaction between subunits, thus far the expression profile of different COG genes remains uncharacterized. Therefore, as a preliminary approach to study the expression of COG genes, tissue dependent expression patterns have been inferred from EST profiles accessible in UniGene database . For comparative purposes, the analysis was performed in homologous tissues in human and mouse for which expression information was available for both organisms.
Figure Figure44 shows the clustering of gene-expression data for human and mouse. In general, COG genes appear to have a ubiquitously pattern of expression, yet differences in the level of expression can be observed. However, some of the tissues studied (e.g. adipose tissue) have no detectable expression of specific COG genes. In order to assess the presence of COGs transcripts in some of those and other human tissues, we performed reverse transcription-PCR (RT-PCR). Although no quantitative inferences could be made, using this simple methodology we were able to detect the eight COG transcripts in 20 human tissues (Figure (Figure5)5) thereby confirming their ubiquitous nature.
It is important to recognize, however, that from EST data and RT-PCR analysis we are not able to infer the precise pattern of expression of COG genes, revealing the need for more reliable quantitative data.
COG complex is essential to establish and maintain the structure and function of the Golgi apparatus, which has itself a key role in many cellular processes, such as protein sorting and glycosylation.
In the present study, in order to better understand the evolution of COG subunits in vertebrates, we have applied distinct comparative strategies, including evolutionary and expression analyses.
We demonstrate that all COG proteins are evolving under strong evolutionary constraints, as revealed by the low dN/dS values. This pattern of purifying selection must reflect the critical role of COG complex for Golgi function. This is well illustrated by mutations in COG-specific subunits, which give rise to different human diseases belonging to the Congenital Disorders of Glycosylation (CDG). CDGs are a genetically heterogeneous group of disorders characterized by a deficient glycosylation of glycoconjugates, such as proteins and lipids. Since 2004, defects in COG1 , COG4 , COG5 , COG7 [19,54-56] and COG8 [57,58] have been reported. Recently, a novel mutation in COG1 gene was detected in two patients with a cerebrocostomandibular-like syndrome , showing that the impact of COG dysfunction is far from being completely known.
Being part of a multi-subunit assembly and having such an important functional role in cells must impose strong constraints on the evolution of COG proteins. On one hand, they must be constrained to maintain the structural integrity of the complex, presumably through the conservation of the residues that are involved in the interaction between subunits. This is expected to be true if we assume that COG structure is maintained by the same type of protein-protein interactions in different species. On the other hand, additional interactions with other functional partners (e.g. other protein related to trafficking, such as SNAREs or small GTPases [reviewed in ]) also need to be preserved. The study of the crystallographic structure of the C-terminal region of human COG4 protein, for instance, showed that distinct domains are responsible for the integration of the protein within the complex and for its function . This suggests that a large proportion of the protein sequence of each member of COG complex must be constrained to be evolutionarily conserved.
In fact, the low rate of evolution of COG proteins is consistent with results from more comprehensive studies showing that evolutionary conservation increases from monomeric proteins to members of transient interactions and finally to components of stable complexes (proteins that are permanently associated with each other) [1,62]. Wong and collaborators  also demonstrated that as the number of unique proteins in a complex increases, the mean dN/dS ratio of the associated genes tends to decrease.
To evaluate the impact of different selective forces on COG proteins, it would be interesting to compare the rates of substitution for interacting and non-interacting residues, and also structural and functional domains. Unfortunately, the structure of fragments of only two COG subunits have been reported [61,63], hampering us to analyze with more detail the evolution of distinct regions of each protein.
Despite all COG proteins are evolving under strong selective constraints, COG1 seems to be the one with the highest rate of evolution. This subunit, together with COG8, is the bridging subunits of the mammalian COG complex, bringing together COG2-4 and COG5-7 subcomplexes [22,29]. Interestingly, a quite similar interaction map has been reported in the yeast complex, although in this case only COG1 is required for the association of the two subcomplexes . COG1 from humans and COG1p from yeast share no detectable sequence homology, as happens with COG2 and COG7 . Interestingly, our results revealed that COG1 and COG2 are also the less conserved subunits in vertebrates, suggesting that they are evolving under more relaxed selective constraints. The biological implication of the higher divergence of these proteins is difficult to infer, although we can speculate that it might be related with distinct requirements of COG's function in different species, or with the interactions established by these proteins. The remaining COG proteins (COG3, COG4, COG5, COG6 and COG8), in contrast, have related homologs in human and yeast .
In this study we have also demonstrated that the expression of COG genes exhibit a ubiquitous nature. These results can be taken as a starting point for more detailed quantitative expression studies that can bring additional insights into COG subunits interaction and function and, eventually, to the understanding of the phenotypic heterogeneity associated with different COG defects.
In the past years several studies have been focused on the evolution of protein complexes in terms of type of interactions, revealing that proteins in stable complexes are more conserved than those in transient interactions and those with no apparent interacting partners .
In this study, in turn, we have investigated the evolution of different subunits belonging to the same protein complex in vertebrates. Our results showed that the eight COG subunits seem to be conserved and evolving under strong purifying selection, in order to maintain the integrity and function of the complex. Finally, we confirm the ubiquitous tissue expression of the eight COG transcripts in 20 human tissues.
COG: conserved oligomeric Golgi complex; CDG: congenital disorders of glycosylation; EST: expressed sequence tag; Taxa terminology is abbreviated using the first letter of the genus and two letters of the species name (e.g. Hsa corresponds to Homo sapiens).
RQ, LA and AA conceived the study and main analyses. RM and RQ carried out the expression analysis. RQ, LA, RM and AA analyzed and interpreted the data. RQ and AA wrote the manuscript. All authors read and approved the final manuscript.
Protein accession numbers of COG sequences. This table provides the accession numbers of the sequences used to perform the analyses.
Protein sequences alignments used to calculate the proteins distance with Protdist.
Protein-coding sequences alignments used in the different analyses performed with PAML.
Primer pairs used in PCR reactions. This table provides the sequences of the primer pairs used to detect the presence of COG transcripts in 20 human tissues.
Phylogenetic trees with branches drawn in proportion to their lengths, defined as the expected number of nucleotide substitutions per codon, for each COG gene. Values along each branch represent ω ratios (for simplicity of the figure some of them were not indicated) estimated under the free-ratio model. In COG5 phylogeny the branch leading to Euarchontoglires (primates and Glires) presents a ω ratio of 4.70 (dN = 0.0011; dS = 0.002). A LRT comparing the one-ratio model with the two-ratio model (ωbackground; ωEuarchontoglires), revealed that the estimated ω ratio for the Euarchontoglires branch (infinite, indicating the absence of synonymous substitutions) was not significantly higher than the background ratio (0.094). Moreover, the LRT comparing the two-ratio model with and without the constraint ωEuarchontoglires ≤ 1 revealed that this ratio was not significantly higher than 1 as well.
COG2 and COG6 genes phylogeny with branch lengths defined as the estimated nonsynonymous substitutions rate (dN) or the estimated synonymous substitutions rate (dS).
This work was supported by the research project PTDC/CVT/64154/2006. LA (C2007-IPATIMUP/AA1) and RM (C2007-IPATIMUP/AA2) are supported by the Portuguese Foundation for Science and Technology (FCT) Ciência 2007; RQ (SFRH/BD/23657/2005) is a PhD grantee also from FCT. IPATIMUP is an Associate Laboratory of the Portuguese Ministry of Science, Technology and Higher Education and is partially supported by FCT.