Cohesin is a macromolecular complex that links sister chromatids together at the metaphase plate during mitosis. The links are formed during DNA replication and destroyed during the metaphase-to-anaphase transition. In budding yeast, the 14S cohesin complex comprises at least two classes of SMC (structural maintenance of chromosomes) proteins - Smc1 and Smc3 - and two SCC (sister-chromatid cohesion) proteins - Scc1 and Scc3. The exact function of these proteins is unknown.
Searches of protein sequence databases have revealed new homologs of cohesin proteins. In mouse, Mmip1 (Mad member interacting protein 1) and Smc3 share 99% sequence identity and are products of the same gene. A phylogenetic tree of SMC homologs reveals five families: Smc1, Smc2, Smc3, Smc4 and an ancestral family that includes the sequences from the Archaea and Eubacteria. This ancestral family also includes sequences from eukaryotes. A cohesion interaction network, comprising 17 proteins, has been constructed using two proteomic databases. Genes encoding six proteins in the cohesion network share a common upstream region that includes the MluI cell-cycle box (MCB) element. Pairs of the proteins in this network share common sequence motifs that could represent common structural features such as binding sites. Scc2 shares a motif with Chk1 (kinase checkpoint protein), that comprises part of the serine/threonine protein kinase motif, including the active-site residue.
We have combined genomic and proteomic data into a comprehensive network of information to reach a better understanding of the function of the cohesin complex. We have identified new SMC homologs, created a new SMC phylogeny and identified shared DNA and protein motifs. The potential for Scc2 to function as a kinase - a hypothesis that needs to be verified experimentally - could provide further evidence for the regulation of sister-chromatid cohesion by phosphorylation mechanisms, which are currently poorly understood.
Cohesin is a macromolecular complex that holds sister chromatids together at the metaphase plate during mitosis. The links between the sister chromatids are formed during DNA replication and destroyed during the metaphase to anaphase transition, when sister chromatids separate to opposite poles of the cell. In budding yeast, the 14S cohesin complex comprises at least two SMC (structural maintenance of chromosomes) proteins - Smc1  and Smc3  - and two SCC (sister-chromatid cohesion) proteins - Scc1  and Scc3 . A recent development is the identification of a separate complex, comprising two further sister-chromatid cohesion proteins, Scc2 and Scc4, that function in the loading of cohesin macromolecules onto chromosomes .
The Smc1 and Smc3 proteins belong to the conserved and well characterized SMC family, which also includes Smc2 and Smc4, components of the condensin macromolecular complex. The SMCs have a highly conserved structure comprising five domains arranged in a head-rod-tail architecture, including a Walker A motif in the amino-terminal domain and a DA-box (Walker B motif) in the carboxy-terminal domain (Figure 1a) [5,6,7]. Dimeric models of Smc1-Smc3 protein complexes have been proposed, in which the coiled-coil domains of each protomer interact in an antiparallel arrangement, bringing the Walker A and B motifs together at the termini of the structure, forming two complete ATP-binding sites (Figure 1b) [7,8,9,10,11]. In accordance with this model, an SMC homodimer has been observed by electron microscopy in Bacillus subtilis. A similar model is proposed for Smc1-Smc3 heterodimers in eukaryotes.
A number of additional proteins are known to play key roles in the cohesion mechanism. Eco1 is involved in the establishment of cohesion during S phase of the cell cycle, but not for its maintenance during G2 or M phases [10,12]. Esp1, a separin protein, is a protease that cleaves Scc1 at the metaphase-to-anaphase transition to trigger sister chromatid separation . This protein is complexed with the securin protein Pds1 for some of the cell cycle , which prevents the onset of anaphase when there has been DNA or spindle damage during DNA replication. When Esp1 is separated from Pds1, it undertakes the proteolytic cleavage of Scc1 (for review see ).
Here, we have combined available genomic and proteomic data into a comprehensive network of information to reach a better understanding of the function of the cohesion complex. We have searched for homologs of the SMC proteins, created a new evolutionary tree for these proteins and identified an interesting homology between SMC3 and Mmip1 in mouse. We have also created a cohesion interaction network of 17 proteins using two proteomic databases. A number of pairs of proteins within this network share sequence motifs that could represent common binding sites. In addition, the genes encoding a subset of six proteins in the network share a common upstream regulatory element.
A PSI-BLAST search for sequence homologs of SMC1 and SMC3 from Saccharomyces cerevisiae revealed homologs from many species of eukaryotes, archaea and eubacteria as previously reported [8,16,17,18] (Table 1). These homology searches provided the basis for a phylogenetic tree and for the analysis of new sequence homologs.
The SMC phylogenetic tree created from the alignment of SMC3 homologs (Figure 2) reveals five families: Smc1-Smc4 from eukaryotes and a fifth 'ancestral' family that includes the SMCs from eubacteria and archaea. This ancestral family also includes a number of eukaryotic proteins from S. cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster and humans. Each of these eukaryotes has SMC proteins from all five families. The eukaryotic proteins within the ancestral family include Rad18 from S. pombe and Rhc18, the Rad18 homolog in S. cerevisiae. Rad18 in S. pombe is involved in the repair of DNA damaged by UV radiation. The sequences from C. elegans, Drosophila and human that cluster with Rad18 within the ancestral family are likely to be Rad18 homologs. Also clustered within this group is Spr18, an SMC protein proposed to be the heterodimeric partner of Rad18 in S. pombe. MukB from Escherichia coli also lies within this ancestral family. MukB is known to be essential for chromosome partitioning in this species [21,22,23]. The clustering of the Rad18 homologs with the ancestral SMC proteins is not observed in the phylogenetic tree constructed by Cobbe and Heck.
One unusual sequence homolog of SMC3 in mouse (SMCD) has already been reported in the form of bamacan, a chondroitin sulphate proteoglycan [24,25]. This protein is known to have 100% sequence identity to SMCD. Here we identify another new homolog, Mmip1, that also shares an extremely high sequence identity with mouse SMCD. Mmip1 (Mad member interacting protein 1) was identified from a yeast two-hybrid screen for proteins that bind Mxi, a basic helix-loop-helix (bHLH) transcription factor. Mmip1 is a basic helix-loop-helix zipper (bHLH-ZIP) protein that strongly dimerizes with Mad1, Mxi, Mad3 and Mad4, but not with Max or c-Myc. A Clustal X alignment of Mmip1 with SMCD reveals that Mmip1 lacks the first globular domain and the first coiled-coil domain common to SMC proteins. In the alignment there is 40% sequence identity between Mmip1 and SMCD over the entire length of SMCD (1,217 amino acids). Over the length of the Mmip1 protein (485 amino acids), however, the protein shares 99% sequence identity with SMCD. These high percentage sequence identities are also reflected in the DNA sequences that encode these proteins. The cDNA encoding the Mmip1 protein is 100% identical to the cDNA encoding SMCD over the 2,612 base pairs of the Mmip1 sequence.
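The two identity figures quoted above differ only in the length used as the denominator. The following minimal sketch illustrates this with a toy alignment; the sequences are invented for illustration and are not the Mmip1 or SMCD sequences, and the function names are hypothetical. Only the two lengths (485 and 1,217 residues) are taken from the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, denominator: int) -> float:
    """Percentage of identical, non-gap alignment columns relative to a chosen length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / denominator


def ungapped_length(aligned: str) -> int:
    """Number of residues in an aligned sequence, ignoring gap characters."""
    return len(aligned.replace("-", ""))


def max_identity_over_longer(short_len: int, long_len: int) -> float:
    """Upper bound on identity measured over the longer sequence."""
    return 100.0 * short_len / long_len


# Toy alignment (illustrative only): a short protein that matches the
# carboxy-terminal part of a longer one and lacks its amino-terminal part.
long_seq = "MKTAYIAKQRQISFVKSHFSRQ"
short_seq = "-" * 13 + long_seq[13:]

identity_over_short = percent_identity(long_seq, short_seq, ungapped_length(short_seq))
identity_over_long = percent_identity(long_seq, short_seq, ungapped_length(long_seq))

# Lengths quoted in the text: Mmip1 = 485 residues, SMCD = 1,217 residues.
bound = max_identity_over_longer(485, 1217)
```

With the lengths from the text, the bound is just under 40%, which is consistent with a near-perfect match over the Mmip1 length appearing as about 40% identity over the full length of SMCD.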
It has previously been suggested that eubacteria contain a single ancestral SMC protein. The PSI-BLAST search for SMC homologs in the current work identified two SMC-related proteins in two species of eubacteria, B. subtilis and Aquifex aeolicus. In both species one sequence has previously been identified as an SMC homolog, whereas the function of the second is unknown. The two sequences from B. subtilis share 95% sequence identity, whereas the two sequences from A. aeolicus share 20% sequence identity. All four homologs contain a Walker A and B motif, and the two homologs from B. subtilis contain the five domains characteristic of the SMC proteins (Figure 1a). The A. aeolicus protein known to be an SMC homolog (TrEMBL accession number O60878) also contains the five domains, including the two coiled-coil domains separated by a hinge region of 180-200 residues. However, the second homolog in A. aeolicus (TrEMBL accession number O67124) has the two coiled-coil domains (predicted using Coils) but the hinge region separating them consists of only approximately 10-20 residues. In the current model of SMC dimers the hinge region allows the folding of the structure into an approximately symmetrical complex (Figure 1b). For this A. aeolicus homolog, however, the very short hinge region would restrict the range of folding. In this species, two homodimeric SMC structures could be formed, one from the five-domain SMC and one from the four-domain SMC homolog lacking the hinge domain. The presence of two potential SMC homologs in B. subtilis, however, could mean that the heterodimeric model of SMC interactions proposed for eukaryotes could also be extended to some prokaryotes. The presence of two SMC homologs in some eubacteria is not shown in the SMC phylogenetic tree constructed by Cobbe and Heck.
The SCC proteins are only present in eukaryotes and are not as well characterized as the SMC proteins. Scc1 (also identified as MCD1) is physically associated with the SMC1 protomer in the complex. Homologs in S. pombe, Xenopus laevis, humans and Drosophila are identified as Rad21 proteins (Table 1), involved in the repair of DNA double-stranded breaks induced by ionizing radiation. Scc3 (previously identified as IRR1) contains a nuclear localization sequence (see later) and a number of homologs have been identified (Table 1). Scc3 homologs in Drosophila, mouse, human and Arabidopsis are a family of stromalin proteins [32,33] which share 20-25% sequence identity (Table 1). In Drosophila, mouse and human there are two stromalin proteins (dSA, dSA2; SA1, SA2; and STAG1, STAG2, respectively), which are located in the nucleus, but their function is unknown. In addition, STAG3 has been identified in humans and is proposed to be involved in chromosome pairing during meiosis.
Scc2 and Scc4 are the recently identified cohesin loading factors. Homologs of Scc2 have been identified in S. pombe (Mis4), Drosophila (Nipped-B), Coprinus cinereus (Rad9) and human (IDN3-B; TrEMBL accession number Q9Y6Y3) (Table 1). Mis4 in S. pombe is required for equal chromatid separation in anaphase and has a function distinct from cohesin. The Rad9 gene product in C. cinereus is essential for the normal completion of meiosis. The Nipped-B gene product is proposed to function architecturally between transcription enhancers and promoters to facilitate enhancer-promoter interactions. The function of the IDN3-B gene in humans is unknown, other than that it is preferentially expressed in hepatocellular carcinomas (HCC). It has been proposed that these SCC molecules represent a family of 'adherins' that share a large central core domain of sequence homology.
Scc4 was identified as a product of open reading frame (ORF) YER147C , and comprises a sequence of 624 amino acids that includes an AMP-binding motif. However, other than interacting with Scc2 and being involved in the establishment of sister-chromatid cohesion, little is known about this protein. Scc4 has no identifiable sequence homologs in either the full-sequence or EST databases, and therefore could be the product of an orphan gene.
A cohesion interaction network was created by collating information from two proteome databases and the literature (Figure 3). In Figure 3, lines are drawn between proteins to indicate known or potential interactions. The data from which the interactions are derived are indicated in a detailed key that differentiates between the two proteomic databases (and between the different sources of data within each database) and the literature. Four proteins (Esp1, Trf4, Prp11 and Tid3) interact directly with SMC or SCC proteins in S. cerevisiae. The interaction of Esp1 and Scc1 is currently known at a functional level, and its importance has already been discussed. This interaction is time-dependent and has not been identified in the yeast two-hybrid screen, and this information is not currently recorded in the YPD.
Trf4 is a protein involved in both mitotic chromosome condensation and sister-chromatid cohesion. In X. laevis Trf4 interacts with Smc1 and Smc2, and in S. cerevisiae Trf4 interacts with Smc1 and Trf5, another member of the TRF family. Trf4 homologs have been identified in S. pombe, C. elegans, Drosophila, human and Arabidopsis (Table 2). Trf4 has very recently been identified as a DNA polymerase with β-polymerase-like properties and is now designated DNA polymerase κ (the fourth class of nuclear DNA polymerases). Remote homologs of S. cerevisiae Trf4 include the caffeine-induced cell death protein I (Cid1) in S. pombe (13.4% sequence identity) and the polynucleotide adenyltransferase enzyme from a number of organisms including S. pombe and humans (10.2% and 9.7% sequence identity respectively). Cid1 is of particular interest as it is thought to play a part in the S-M checkpoint pathway in S. pombe. As a homolog of Trf4, Cid1 could be the link between sister-chromatid cohesion and this checkpoint pathway.
Prp11 is a yeast splicing factor involved in the early stages of the spliceosomal assembly pathway. Prp11 is a 266 amino-acid protein that includes a zinc-finger domain common to RNA-binding proteins. This splicing factor forms a complex with two others, Prp9 and Prp21, which together with Prp5 are required for the binding of U2 snRNP to pre-mRNA [45,46]. There are homologs of this splicing factor in S. pombe, C. elegans, Drosophila, Arabidopsis, mouse and human (Table 2) and all include the RNA-binding motif. In mouse and humans, the homolog is SAP62 (spliceosome-associated protein), a spliceosomal protein that binds to pre-mRNA in the prespliceosomal complex.
Tid3 (NDC80) is a spindle pole body protein that has homologs in a number of eukaryotes (Table 2). Tid3 is predicted to interact with Smc1 and Smc2, and has been shown experimentally to interact with Spc24, another component of the spindle pole body. Interactions between the human homolog of Tid3, Hec1, and human Smc1 and Smc2 homologs have also been observed. The interactions of Tid3 with subunits from both the cohesin and condensin macromolecules place it alongside Trf4 and Scc1 as a protein integrally involved in both mechanisms. It is also proposed that Hec1 may be involved in chromatin assembly in the centromere and regulation of the kinetochore. Spc24, one interaction partner of Tid3, also interacts with Prp11, the yeast splicing factor which is linked to the cohesin loading factors through its interaction with Scc2 (Figure 3).
The upstream regions of the genes encoding 17 proteins in the cohesin network (Figure 3) were searched for shared motifs using AlignACE. Three consensus motifs were identified that were common to subsets of the 17 genes. Only one motif was found to be relatively specific, however, matching upstream sequences of only 29 genes in the SGD [49,50] (see the Materials and methods). This motif has the consensus sequence A6[X]ACGCGTH2[X]RXAAX and includes the MluI cell-cycle box (MCB) element (consensus sequence ACGCGT) [51,52]. The extended consensus motif found in the current work was present in upstream regions of the genes encoding Scc1, Scc3, Smc3, Pds1, Eco1 and Spc24. This motif was located 123-299 base pairs (bp) upstream of the genes encoding these six proteins. A search of the SGD revealed 23 additional genes containing this upstream motif. Eight of these additional genes encoded hypothetical proteins of unknown function. However, these additional genes also included those encoding chaperones (JEM1 and PDI1n), transcription factor components (TFA1, RFA2, RNA polymerase II, SPT20 and PRT1), and a YC component of the proteasome. When the search was extended to 2,000 bp upstream of the 5' untranslated regions of the yeast genome, the gene encoding Trf4 was also found to contain this consensus motif (1,560 bp upstream).
Teiresias, a pattern discovery algorithm [53,54], was used to search for common motifs between two or more sequences in the 17 proteins of the cohesion network. The highest number of proteins sharing a common motif was three, and these were the three SMC proteins, which have a high sequence identity and share known Prosite motifs (Table 3). More interesting were 24 pattern matches found between pairs of proteins in the network. A number of proteins share more than one sequence motif with the same protein. All shared motifs were either specific to the two proteins in the cohesion network, or in the case of three motifs, shared by one other protein sequence.
One motif shared by two sequences in the network and one additional sequence is the DXXPENIXLXKN motif shared by the sequences of Scc2, Chk1 and a third S. cerevisiae protein PKH1 (yeast ORF YDR490C) (Figure 4). Both Chk1 and PKH1 are serine/threonine (ST) protein kinases, and the motif they share with Scc2 includes part of the PROSITE ST kinase signature motif ([LIVMFYC]X[HY]XD[LIVMFY]KXXN[LIVMFYCT](3), where X indicates any residue, (3) indicates that the previous residue is repeated three times, and D is the active site residue). The sequence of Scc2 does not match the ST kinase signature motif exactly. Of the 13 residues in the ST kinase motif, Scc2 has four mismatches but, importantly, the active-site aspartic acid is conserved.
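The comparison of a sequence against the 13-residue signature can be sketched as a per-position check that counts mismatches and tests whether the active-site aspartic acid is conserved. This is an illustrative sketch only: the function names are hypothetical, the example windows are invented (they are not the Scc2, Chk1 or PKH1 sequences), and the signature is transcribed from the pattern quoted above.

```python
# Allowed residues at each of the 13 positions of the ST kinase signature
# [LIVMFYC]X[HY]XD[LIVMFY]KXXN[LIVMFYCT](3); None stands for X (any residue).
ST_KINASE_SIGNATURE = [
    set("LIVMFYC"), None, set("HY"), None, set("D"), set("LIVMFY"),
    set("K"), None, None, set("N"),
    set("LIVMFYCT"), set("LIVMFYCT"), set("LIVMFYCT"),
]
ACTIVE_SITE_INDEX = 4  # position of the aspartic acid (D) within the signature


def count_mismatches(window: str) -> int:
    """Number of constrained signature positions that the window violates."""
    if len(window) != len(ST_KINASE_SIGNATURE):
        raise ValueError("window must be 13 residues long")
    return sum(
        1
        for residue, allowed in zip(window, ST_KINASE_SIGNATURE)
        if allowed is not None and residue not in allowed
    )


def active_site_conserved(window: str) -> bool:
    """True if the window carries D at the active-site position."""
    return window[ACTIVE_SITE_INDEX] == "D"


def best_window(sequence: str):
    """Return (mismatches, 0-based start, window) for the best window; earliest wins ties."""
    size = len(ST_KINASE_SIGNATURE)
    candidates = [
        (count_mismatches(sequence[i:i + size]), i, sequence[i:i + size])
        for i in range(len(sequence) - size + 1)
    ]
    return min(candidates)


# Invented example windows, for illustration only.
exact_window = "IVHRDLKPENILL"      # satisfies every constrained position
divergent_window = "AVHRDAQPENIRL"  # four violations, active-site D retained
```

A window such as `divergent_window` mirrors the situation described for Scc2: several signature positions are violated while the active-site residue is retained.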
A second motif shared by a third protein not included in the cohesion network was SXXSXLKKKXLXT; this is found in Scc1, Scc2 and yeast ORF YHR011W, a putative seryl-tRNA synthetase (Figure 5a). However, this motif was not part of the tRNA ligase motif of YHR011W, or of any other known motif within this sequence. A third motif shared by a protein from outside the cohesion network was NDXNXDDXDN, shared by Scc1, Smc1, and a P-type ATPase from Plasmodium yoelii (Figure 5b). Scc4 is one of the cohesin loading factors for which no known homolog has been found. This protein was, however, found to share a 10-residue sequence motif (GKXVALTNAK) with Smc3 (Figure 5c).
The securin Pds1 is an anaphase inhibitor that contains a destruction box motif (RXXLXXXXN), which targets this protein for destruction by the APC ubiquitin ligase. We found three destruction box motifs in Smc3, one in the hinge region (at position 682, RTRLESLKN) and two in the second coiled-coil domain (one at position 744 (RTSLNTKKN) and one at position 920 (RLLLKKLDN)). We also found a KEN-box motif (an additional APC recognition signal) in Smc2 at position 304 (KENGLLN), in the first coiled-coil domain.
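A pattern search of this kind can be sketched with regular expressions. The sketch below is an illustration, not the GCG 'findpatterns' procedure used in this work; the function name and the flanking residues in the toy sequences are hypothetical, while the two patterns and the four peptides are those quoted in the text.

```python
import re

# Lookahead groups so that overlapping occurrences are also reported.
DESTRUCTION_BOX = re.compile(r"(?=(R..L....N))")  # RXXLXXXXN
KEN_BOX = re.compile(r"(?=(KEN...N))")            # KENXXXN


def find_motif(pattern: re.Pattern, sequence: str):
    """Return (1-based position, matched peptide) for every match in the sequence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Peptides reported in the text, embedded in invented flanking residues.
toy_dbox_sequence = "AAARTRLESLKNAAA"
toy_ken_sequence = "GGKENGLLNGG"

dbox_hits = find_motif(DESTRUCTION_BOX, toy_dbox_sequence)
ken_hits = find_motif(KEN_BOX, toy_ken_sequence)
```

Short patterns with many wildcard positions are expected to occur by chance in long proteins, so a match found this way is a candidate site rather than evidence of APC recognition.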
The mechanism of cohesion is essential to the successful completion of the cell cycle. Cohesin is the macromolecular complex, comprising SMC and SCC proteins, that plays a central role in this mechanism.
In the mouse, the sequence of SMC3 (SMCD) is 100% identical to two proteins with distinctly different functions, bamacan and Mmip1. The possibility that SMCs can 'moonlight' outside the cell in the basement membrane in the guise of the proteoglycan bamacan has recently been proposed . Proteoglycans are glycoproteins present in connective tissues and on the cell surface, and modified proteoglycans are found in human tumors of epithelial origin (for example breast, colon and lung). In the current work we found that Mmip1 was 100% identical to the hinge, second coiled-coil and carboxy-terminal domain of SMCD. It is possible that Mmip1 is a protein resulting from alternative gene splicing or post-transcriptional modification. Mmip1 is thought to compete with the dimerization of Max and Mad family proteins, indirectly upregulating the transcriptional function of c-Myc, which plays a central role in cell proliferation, differentiation and apoptosis . Hence, a single gene appears to encode three products, each with a different function.
The evolutionary tree presented here identifies five SMC families: Smc1, Smc2, Smc3, Smc4 and an ancestral SMC family that includes the SMC sequences from the Eubacteria and the Archaea. However, some eukaryotic SMC sequences, namely Rad18 and its homologs, are also clustered within the ancestral SMC family. Humans, Drosophila, C. elegans, S. cerevisiae and S. pombe have SMCs from all five families. A number of these eukaryotes have more than one sequence clustered in the ancestral family, which leads us to suggest that these proteins might form heterodimeric structures, similar to those observed in other SMC proteins. This evolutionary tree is similar in some respects to that recently reported by Cobbe and Heck, in that five families are identified. However, in our tree the Rad18 homologs are more closely related to the ancestral eubacterial and archaeal sequences. Rad18 is involved in DNA repair in response to damage caused by UV irradiation. Rad21 in S. pombe (homologous to Scc1 in the cohesin complex of S. cerevisiae) also functions in the repair of DNA damaged by ionizing radiation. From our analysis it is evident that the SCC proteins are not present in eubacteria or archaea. Hence, it is possible that the SMCs of the ancestral family cluster with the eukaryotic Rad18 homologs because the eubacterial and archaeal proteins function both as DNA repair proteins and as structural proteins holding sister chromatids together during cell division.
The phylogenetic tree we have constructed is similar to that of Cobbe and Heck , but three new important results are revealed in our tree. First, Rad18 homologs cluster with the ancestral SMC proteins, as opposed to clustering as a separate family; second, two SMC homologs are identified in two species of eubacteria, as opposed to only single SMC homologs identified in these species ; and third, Mmip1 protein is identified as a new homolog of the SMC mouse protein, a protein not present in the tree previously constructed .
The interaction network is based on a combination of data sources, and includes data from two large-scale yeast two-hybrid screens . It is known that this technique can lead to false-positive results [57,58] and false negatives (due to protein instability, for example ). However, it is still the only high-throughput technique that can be applied effectively to an entire genome. To screen more than 6,000 S. cerevisiae ORFs, some errors are inevitable and these have to be accepted as part of the data. We have combined yeast two-hybrid data with that available in the YPD and in the literature.
The cohesin macromolecular complex is shown to be at the center of a complex interaction network that connects the mechanism of cohesion to proteins in the condensation and spliceosomal pathways and in the spindle pole body (Figure 3). These interactions also show the potential overlap of proteins involved in sister-chromatid cohesion during mitosis and meiosis. Trf4 and Tid3 are both proteins that interact with Smc1 from the cohesin complex and Smc2 from the condensin complex. Scc2, one of the cohesin loading factors, interacts with Prp11, a splicing factor, which in turn interacts with Spc24, a protein of the spindle pole body that interacts with Tid3. Tid3 makes an important interaction with Dmc1, a meiosis-specific protein, which is required for recombination, synaptonemal complex formation and cell-cycle progression. Dmc1 interacts with a large number of other proteins, including Apc2, which is a component of the anaphase-promoting complex (APC). The macromolecular APC is a ubiquitin protein ligase that is essential for mitotic cyclin proteolysis and sister chromatid separation.
The identification of this extensive network of interacting proteins provided us with a data set of genes and proteins that we could search for common motifs. We identified a large number of protein sequence motifs that were shared by pairs of proteins in the network, and which could potentially represent shared binding sites. The most interesting discovery was the motif shared by Scc2 and Chk1, which comprised part of the serine/threonine protein kinase motif, including the active-site residue. Eukaryotic protein kinases are characterized by 12 subdomains and 12 conserved residues. In a ClustalX alignment of Scc2 and Chk1 from S. cerevisiae, Scc2 (which comprises 1,493 residues compared to the 527 residues of Chk1) does not have the 12 subdomains, but it does have 5 of the 12 conserved residues, only two of which are in the shared catalytic-site motif identified using the Teiresias algorithm. The potential for Scc2 to function as a kinase needs to be validated experimentally. If Scc2 functioned as a kinase it would provide further evidence for the regulation of sister-chromatid cohesion by phosphorylation. It has already been shown that phosphorylation of Scc1 is a requirement for its subsequent cleavage by the Esp1 separin, which occurs at the metaphase-to-anaphase transition of mitosis. If Scc2 is a kinase, then the cohesin-loading function of the Scc2-Scc4 complex might simply be the phosphorylation of Scc1 or one of the other cohesin subunits to allow DNA binding.
The detection of regulatory motifs in gene promoter regions may provide evidence for the functional assignment of genes of unknown function . We found genes encoding six (or seven if an extreme upstream region is considered) cohesion network proteins sharing a common upstream regulatory sequence that includes the MCB box element. The MCB element is found in upstream regions of genes that encode proteins preferentially synthesized at the G1/S boundary of the cell cycle. MCB or MCB-like elements are common upstream of genes for DNA replication, repair and recombination and of a number of cyclin genes . An upstream DNA motif common to a number of genes could be indicative of DNA-binding preferences of transcription factors and hence coexpression or co-regulation of the genes. It would be possible to investigate coexpression levels experimentally using microarray technology. We found that three of the subunits of the cohesin complex - Scc1, Scc3, Smc3 - shared this common upstream motif, along with Eco1, Pds1 and Spc24 (a protein not directly associated with the cohesin complex). We searched for the expression levels of these six proteins in the data from the microarray hybridization experiment of cell-cycle-regulated genes in S. cerevisiae [64,65]. The data for these six proteins was incomplete, but we found that the mRNA levels of five of the six genes (Scc1, Scc3, Smc3, Pds1 and Eco1) peaked in G1 phase of the cell cycle, and that Spc24 appeared not to be cell-cycle regulated. We also found that the mRNA levels of Smc1 peaked in G1. Spellman  observed from the expression data that Smc1, Smc3, Scc1, Pds1 and Pds5 showed tight temporal regulation. This set of proteins includes three of those we found to contain the MCB upstream regulatory region. Further specific microarray data is needed to investigate the expression levels in more detail. 
However, the coexpression of Pds1 with subunits of the cohesin complex (Scc1, Scc3 and Smc3) would fit with the theory that Pds1 acts as a protector of cohesin, inhibiting its breakdown, which signifies the onset of anaphase.
The securin Pds1 is an anaphase inhibitor that is complexed to Esp1 for part of the cell cycle. The destruction of Pds1 by the ubiquitin ligase APC is the control point for the separation of sister chromatids [66,67]. APC is activated by Cdc20 during mitosis and the proteolytic function of Cdc20-APC depends on a destruction box motif  which has been identified in Pds1. We found three destruction box motifs in Smc3. It is possible that two of the potential destruction box motifs in Smc3 are linked to the periodicity of residues in the coiled-coil domain. But one destruction box motif is in the hinge region, and if this targets the proteins for destruction by APC, then proteolysis of Smc3 could also be connected with the metaphase-to-anaphase transition. APC is activated by Cdh1 during late mitosis/G1 and Cdh1-APC requires a KEN box recognition signal (KENXXXN) to identify its substrates . We identified a KEN box in the sequence of Smc2. If this targets Smc2 as an APC substrate, then proteolysis of this protein could be an event that occurs in late mitosis/G1 of the cell cycle.
We have combined available genomic and proteomic data into a comprehensive network of information to reach a better understanding of the function of the cohesin complex. We have identified a new sequence homolog to SMC that highlights the capability of a protein to conduct different functions in different cellular locations, and have created a new phylogeny for SMC proteins that includes a number of eukaryotic sequences (predominantly Rad18 homologs) within an ancestral family of SMC proteins. A complex network of interacting cohesion proteins was identified. Six of these networked proteins contain a known upstream regulatory sequence. In addition we have identified a number of protein pairs within the network that share protein motifs, which could indicate a common structural feature such as a binding site. In this way we discovered a motif shared by Scc2 and Chk1 that suggests Scc2 could have kinase activity. This hypothesis needs to be verified experimentally. The potential for Scc2 to function as a kinase could provide further evidence for the regulation of sister-chromatid cohesion by phosphorylation mechanisms, which are currently poorly understood.
A PSI-BLAST search of the nonredundant protein sequence database (GenBank CDS translations, PDB, SWISSPROT, PIR and PRF) for sequence homologs of Smc1, Smc3, Scc1, Scc2, Scc3 and Scc4 from S. cerevisiae was conducted. PSI-BLAST was used for five iterative rounds of searching and a cut-off threshold of 0.001. Homologs from nine eukaryotic species are listed in Table 1. A PSI-BLAST search was also conducted for four proteins (Trf4, Prp11, Tid3 and Esp1) that interact with cohesin proteins (Table 2).
The sequence homologs to Smc3 from S. cerevisiae identified from the PSI-BLAST search were globally aligned in ClustalX (using default parameters). This alignment was used as input for the PHYLIP program [69,70]. PHYLIP (PHYLogeny Inference Package) includes a number of programs for the creation of evolutionary trees. PROTPARS was used to estimate the phylogeny from the ClustalX sequence alignment using the parsimony method. CONSENSE was used to create a consensus tree using the majority-rule consensus tree method, and DRAWTREE was used to plot an unrooted phylogeny (Figure 2).
Each cohesin protein sequence (Smc1, Smc3, Scc1, Scc2, Scc3 and Scc4) and the four interacting proteins (Trf4, Prp11, Tid3 and Esp1) from S. cerevisiae were searched for PROSITE motifs  and PFAM domains . The PROSITE database was searched using ProfileScan . PFAM was searched using the protein search facility at the PFAM website .
A large-scale yeast two-hybrid screen has been conducted to identify protein-protein interactions in S. cerevisiae. This screen revealed 957 putative interactions involving 1,004 proteins, and the data is publicly available from the CuraGen Corporation. The Yeast Proteome Database (YPD) collates information from the literature, and includes data on protein-protein interactions. Some of these interaction data are derived from the yeast two-hybrid screen. Both the YPD and the yeast two-hybrid data were searched for each of the six cohesin proteins (Smc1, Smc3, Scc1, Scc2, Scc3 and Scc4) to find their interaction partners. This enabled the creation of an interaction network that surrounds the cohesin molecule (Figure 3).
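Collating interactions from several sources into one network can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build Figure 3: the function names are hypothetical, and the assignment of each protein pair to the 'two_hybrid' and 'ypd' sources is a placeholder, because the text does not state the source of each individual interaction. Only the pairs themselves, and the literature origin of the Esp1-Scc1 interaction, are taken from the text.

```python
from collections import defaultdict, deque


def merge_interactions(sources):
    """Map each undirected protein pair to the set of source names that report it."""
    merged = defaultdict(set)
    for source_name, pairs in sources.items():
        for a, b in pairs:
            merged[frozenset((a, b))].add(source_name)
    return dict(merged)


def neighbours(merged, protein):
    """Sorted interaction partners of a protein in the merged network."""
    return sorted(p for pair in merged if protein in pair for p in pair if p != protein)


def shortest_path(merged, start, goal):
    """Breadth-first search for a shortest chain of interactions; None if unconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for partner in neighbours(merged, path[-1]):
            if partner not in seen:
                seen.add(partner)
                queue.append(path + [partner])
    return None


# Placeholder source assignment (illustrative only; see the note above).
sources = {
    "two_hybrid": [("Scc2", "Prp11"), ("Prp11", "Spc24"), ("Tid3", "Spc24")],
    "ypd": [("Spc24", "Tid3"), ("Tid3", "Smc1")],
    "literature": [("Esp1", "Scc1")],
}

network = merge_interactions(sources)
chain = shortest_path(network, "Scc2", "Tid3")
```

Keeping the set of sources on each edge preserves the distinction that the figure key draws between the databases and the literature, so that interactions supported by a single source can be treated with more caution.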
Teiresias, a two-phase general purpose pattern discovery algorithm developed at IBM [53,54], was used to search for common motifs between two or more protein sequences in a network of 17 proteins (Figure 3). The sequence pattern discovery search engine was used for 'exact discovery' with parameter L = 4 (minimum number of literals, that is, non-wild characters in pattern) and W = 6 (maximum extent spanned by any L consecutive literals in the reported pattern). All other parameters were used with default settings. Significance values (calculated by assuming each pattern is used as a predicate to search a database with the size and composition of GenPept) were calculated, and only those with a probability value of ≤ 30.0 were considered significant.
A search was then made for the presence of the significant motifs in SWISSPROT, TREMBL, and PIR, using the 'findpatterns' facility of GCG (Wisconsin Package Version 10.0). This facility in GCG was also used to search for destruction box (RXXLXXXXN) and KEN box (KENXXXN) motifs in the 17 proteins in the cohesion network.
AlignACE, a Gibbs sampling algorithm for identifying motifs over-represented in DNA sequences [77,78], was used to search for common upstream regions of the genes encoding 17 proteins in the cohesion network. The 'Full Analysis' option (which includes three procedures: Extraction, AlignACE and CompareACE) was selected to search the upstream regions of the selected S. cerevisiae genes. Extraction selects the upstream regions of each gene, AlignACE is the motif finding algorithm, and CompareACE compares the extracted motifs with a set of previously identified motifs from S. cerevisiae. Only motifs with a MAP score of > 10.0 were considered significant.
The significant motifs were then used to search the upstream regions of the complete yeast genome using the PatMatch search facility of the SGD [49,50]. PatMatch was used to search 1,000 and 2,000 bp upstream of the 5' untranslated regions, searching both strands, with no mismatches, deletions or insertions.
We thank Frank Uhlmann for helpful discussions and comments on the manuscript.