ARS identification in
S. cerevisiae by genome-wide motif scanning has been hampered by the abundance of sequences with high similarity to the ACS, combined with the level of degeneracy of the ACS that supports function. Potential solutions to this problem include: (1) building larger motif models by including other concurrent motifs [
39] or compositional information [
22,
40]; (2) assuming a specific motif distribution on chromosomes, e.g., a Hidden Markov Model [
41,
42]; and (3) narrowing down the regions to be searched. The first two methods rely on assumptions, which may introduce significant error. This study took the third approach, using a high-resolution array to map ORC and Mcm2p binding regions and confining the motif search to this fraction (~5%) of the genome. A very recently published study took a fourth approach, combining analysis of phylogenetic conservation with motif searching and published microarray data to predict ACS locations [
43].
We defined 529 nimARS loci throughout the
S. cerevisiae genome that avidly bind ORC and/or Mcm2p. The vast majority of known ARSs (95%) are contained in the nimARS set and virtually all predicted sites exhibit ARS activity when tested (94%). Comparison to a recently determined set of chromosomally active replication origins (ssARS) shows that 83% are contained in the nimARS set [
34]. Together, these analyses confirm the high accuracy of the nimARS data. The HMM analysis is capable of identifying even weak signals, while the target DNA hybridizes to multiple probes on the tiled oligonucleotide array for each binding site, a redundancy that enhances accuracy. We further refined this data set by determining the signal peaks within the nimARS regions and constrained the motif search to a 1 kb segment centered on each peak. Within 370 (70%) of the nimARS loci we identified at least one nimACS, with an overall PPV of 82%.
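The idea of confining the motif search to a 1 kb segment centered on each signal peak can be illustrated with a minimal sketch. It is not the motif model used in this study: as a simplifying assumption it matches the classic 11-bp ACS consensus (WTTTAYRTTTW) with a mismatch tolerance, and the function names, the `half_width` parameter, and the `max_mismatches` threshold are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: a consensus match with a mismatch tolerance stands in
# for the motif model used in the study. All names and thresholds are hypothetical.

# IUPAC codes needed for the classic 11-bp ACS consensus
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "Y": "CT", "R": "AG"}
ACS_CONSENSUS = "WTTTAYRTTTW"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(site, consensus=ACS_CONSENSUS):
    """Count positions where the site disagrees with the IUPAC consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def scan_window(chrom_seq, peak, half_width=500, max_mismatches=1):
    """Scan both strands of the window centered on `peak`.

    Returns a list of (start, strand, mismatch_count) tuples, where `start`
    is the 0-based position of the match on the forward strand.
    """
    start = max(0, peak - half_width)
    end = min(len(chrom_seq), peak + half_width)
    k = len(ACS_CONSENSUS)
    hits = []
    for i in range(start, end - k + 1):
        site = chrom_seq[i:i + k]
        for strand, candidate in (("+", site), ("-", reverse_complement(site))):
            mm = mismatches(candidate)
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits
```

With the default `half_width=500`, a consensus match lying more than 500 bp from the peak is never examined, which is how restricting the search space reduces the number of fortuitous matches that a genome-wide scan would report.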
Approximately one-third of the predicted nimARSs are loci where only Mcm2p was detected. Of the nimARSs for which ARS activity has been demonstrated (in this or previous studies), 34% (52/152; see Additional files
5 and
10) are MCM2-only sites. This observation suggests that the majority of MCM2-only sites will prove to possess ARS activity. Furthermore, ORC binding was not detected at 23% of known ARSs, while nimACSs, which predict ORC binding, are found at 103 of the 178 MCM2-only sites. Finally, we have no evidence (such as a unique motif) suggesting that MCM2-only sites represent a distinct function of Mcm2p, which might be independent of ORC.
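The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text and assumes that 178 is the total number of MCM2-only loci among the 529 nimARS loci; the variable names are illustrative.

```python
# Counts taken from the text; variable names are illustrative.
total_nimars = 529          # all nimARS loci
mcm2_only = 178             # assumed total of MCM2-only loci among the nimARS loci
mcm2_only_with_nimacs = 103 # MCM2-only loci containing a nimACS
confirmed_ars = 152         # nimARSs with demonstrated ARS activity
confirmed_mcm2_only = 52    # of those, the MCM2-only loci


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "mcm2_only_of_all": pct(mcm2_only, total_nimars),
    "mcm2_only_of_confirmed": pct(confirmed_mcm2_only, confirmed_ars),
    "nimacs_in_mcm2_only": pct(mcm2_only_with_nimacs, mcm2_only),
}
print(shares)
```

Under that assumption, MCM2-only loci make up roughly the same share of the confirmed ARSs as of the full nimARS set, and a nimACS is present in somewhat more than half of the MCM2-only loci.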
As ORC is bound to chromatin throughout the cell cycle in budding yeast and is required to "load" the MCM complex onto DNA, the detection of many MCM2-only sites suggests that ORC is present but recalcitrant to detection by ChIP, perhaps due to local chromatin differences. Indeed, we analyzed ORC binding in G2/M-arrested cells because pre-RC assembly is thought to occlude detection of ORC in G1-arrested cells [
44]. However, we have recently found that ORC binding at some ARSs is more strongly detectable by ChIP during G1- or S-phase (JGA and OMA, unpublished). One possibility is that Cdc6 stabilizes binding of ORC to weaker sites during G1 to permit MCM loading [
45,
46]. This would explain the loading of Mcm2p in G1-phase at sites where ORC escaped detection in G2/M, and is consistent with the idea that ORC occupancy and stability vary at different sites depending on local chromatin features or DNA sequence variation.
Whereas ORC detection by ChIP may be context- or cell cycle-dependent, Mcm2p seems to be more reliably detected. This may reflect differences in the way the ORC and MCM complexes interact with DNA. In contrast to models of ORC-DNA binding along the A-rich strand of DNA [
38], the MCM complex is thought to encircle one or both strands of DNA [
47,
48]. Such a topology might enhance cross-linking of MCM to chromatin or otherwise stabilize these complexes for immunoprecipitation. A greater stability of the MCM complex in pre-RCs is supported by
in vitro data in which high-salt extraction of pre-RCs removes ORC (and Cdc6), but not the MCM complex, from DNA [
49-
52].
Significantly more pre-RCs are formed than are normally utilized to replicate the genome. This work predicts that about 500 pre-RCs are formed, whereas other studies indicate that about 260–360 of these are primarily responsible for replicating the genome [
32-
34]. Some inefficient pre-RCs retain potential for activation but fail to initiate replication because replication forks emanating from efficient, nearby origins replicate through these sites, thereby eliminating their activation potential (presumably by dismantling the pre-RC) [
53,
54]. However, some sites at which ORC and/or Mcm2p can be identified exhibit relatively weak initiation potential. In some cases weak initiation is due to local chromatin, such as at the mating-type silencer ARSs, because these ARSs function efficiently when removed from their normal chromatin context [
55]. However, some ARSs function poorly in the plasmid context, suggesting that sequence variation results in reduced ORC binding or inefficient DNA unwinding [
56]. Sequence variation explains the failure to identify a robust ACS (EACS+B1) within about 30% of the nimARS loci. Further study will be required to determine how the sequence composition of the ACS and the surrounding sequences, as well as the presence of nearby motifs bound by other DNA binding proteins, contribute to the differential efficiency of ARSs (although specific cases, such as the silencer-associated ARSs, have been identified [
9,
57,
58]).
The molecular evolution of sequence and activity among different ORC binding sites (and related sequences) occurs under different selective pressures than those acting on individual genes or unique sequences with defined functions, as indicated by lower levels of phylogenetic conservation of yeast origins compared to genes [
43]. This is because most individual ORC binding sites likely contribute little or nothing to the organism's fitness. The main requirement is that a sufficient number of efficient origins be distributed along each chromosome to ensure rapid genome duplication. Hence, sequence changes that increase the origin efficiency of one ORC binding site may reduce selective pressure on ORC binding sites on the same chromosome (especially nearby), resulting in weaker binding sites or even sites with specialized function such as the silencers. Origin sequence evolution also may derive from selective pressures on local gene functions if these are influenced by the presence of ORC. Nevertheless, the presence of excess ORC binding sites can help ensure efficient genome duplication in case the normal origin initiation program is disrupted [
59], and hence, the ability of ORC to bind sequence variants is functionally significant. This ability may be particularly important in higher organisms, where ORC binding appears to conform to differential chromatin contexts related to developmentally regulated gene expression.