The ancient origin of vertebrate CNEs is clearly evident from their abundance in sharks [3], and the identification of duplicated CNEs (dCNEs) suggests that some elements were present even earlier [18]. Indeed, among the invertebrates, the amphioxus genome contains traces of a very small number of CNEs [10], even though amphioxus diverged from the vertebrate lineage earlier than Ciona. A total of 56 non-coding, non-repetitive amphioxus sequences with similarity to the human genome were identified [11], displaying on average 64% identity across regions of 50–70 bp. Although only a few of these elements overlap with previously identified vertebrate CNEs, they again associate with genes that regulate development, indicating that the very beginnings of vertebrate CNEs existed in the chordate ancestor of amphioxus and vertebrates. Only five of the proposed amphioxus CNEs fall within the gene loci covered in our study (three close to BARHL2, one near BCL11A and one near ZNF503), and none of these has sufficient sequence identity or length to be detected in any vertebrate genome, including lamprey, using unaligned genome-wide searches.
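Identity figures such as "64% across 50–70 bp" summarise a pairwise alignment. The sketch below shows one common way to compute percent identity from an alignment given as two equal-length gapped strings. It assumes a particular convention (identical non-gap columns divided by all alignment columns, so gaps count against identity); tools differ on this point, and this is not necessarily the convention used in [11]. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two gapped, equal-length aligned sequences.

    Assumed convention: identical non-gap columns / total alignment columns,
    so columns containing a gap count as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-column alignment: 8 identical columns, 1 mismatch, 1 gap.
print(percent_identity("ACGTAC-GTA", "ACGTTCAGTA"))  # prints 80.0
```

Under a different convention (for example, excluding gap columns from the denominator) the same alignment would score higher, which is worth keeping in mind when comparing identity values across studies.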
Critical questions therefore remain as to when, how and why such a large repertoire of very highly conserved sequences became fixed in the chordate lineage. Given their proposed regulatory role in development, it is essential to understand how CNEs evolved and to what extent they contribute to the gene regulatory networks (GRNs) that orchestrate the patterning of the early vertebrate embryo. Unfortunately, few extant organisms occupy an evolutionary position between the chordate radiation and the emergence of jawed vertebrates. Only lampreys and hagfish survive from this period, so it is both timely and convenient that the genome of the sea lamprey, Petromyzon marinus, is being sequenced. Only a limited assembly is currently available, but the public release of over 18 million sequence traces allows a preliminary exploration of the CNE architecture of the genome.
CNEs from 11 of the 13 regions chosen for this study have matches in the lamprey whole-genome shotgun (WGS) reads, indicating that CNEs are widespread in lamprey. Only the DACH and BARHL2 gene CNEs gave no matches against the lamprey sequence, although there is evidence that DACH-like and BARHL-like coding sequences are present in lamprey. It is not yet clear how uniform the WGS coverage of the lamprey genome is, although our mapping of over 20,000 lamprey ESTs back to the WGS reads predicts greater than 95% coverage (Methods). Nevertheless, gaps in coverage may account for the absence of CNEs around some genes.
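The EST-based estimate rests on a simple proxy: if ESTs sample the genome independently of where WGS reads happen to fall, the fraction of ESTs that find a match among the reads approximates the fraction of the genome represented in them. A minimal sketch, assuming the mapping step has already recorded which ESTs had a qualifying hit; the function name, identifiers and toy data are hypothetical and do not reproduce the procedure in Methods.

```python
from typing import Iterable, Set


def est_coverage_estimate(est_ids: Iterable[str], ests_with_hits: Set[str]) -> float:
    """Fraction of ESTs with at least one qualifying hit in the WGS reads.

    Serves as a proxy for genome coverage, assuming ESTs sample the genome
    independently of WGS read placement.
    """
    est_list = list(est_ids)
    if not est_list:
        raise ValueError("no ESTs supplied")
    mapped = sum(1 for est in est_list if est in ests_with_hits)
    return mapped / len(est_list)


# Hypothetical toy input: 20 ESTs, 19 of which mapped to WGS reads.
ests = [f"EST{i:05d}" for i in range(20)]
hits = set(ests[:19])
print(f"{est_coverage_estimate(ests, hits):.0%}")  # prints 95%
```

Because ESTs derive from transcribed sequence, the proxy may not capture coverage of the intergenic regions where many CNEs lie, which is one reason local gaps cannot be excluded.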
It is evident that gnathostome CNEs have evolved extremely slowly, given their near identity between sharks and mammals [3], species that are thought to have diverged over 500 million years ago [6], [15]. Given their proposed function, this may reflect a stable, shared gene/genome copy number and a common bilateral body plan. Indeed, among the gnathostomes, CNEs show the most variation in teleost fish, a lineage that has undergone its own genome duplication event [40].
Lamprey CNEs, on the other hand, although widespread throughout the genome, are considerably shorter and somewhat less well conserved at the nucleotide level than their gnathostome counterparts (Table S2, Figure S1, Figure S2, Figure S3). This is partly a consequence of the greater evolutionary divergence between agnathans and gnathostomes, but it may also reflect a particularly dynamic and unstable era early in the vertebrate radiation, during which one or possibly two whole-genome duplications occurred. Given their unique developmental characteristics, we suggest that lampreys may have diverged at a time when genomes and CNEs were evolving rapidly and the vertebrate body plan itself was taking shape (). On this view, the contemporary repertoire of lamprey CNEs retains only a core set of conserved regulatory signatures that act to specify common features within rather different body plans. This further supports the hypothesis that lampreys separated from the vertebrate lineage before at least one whole-genome duplication that occurred in the ancestor of all other vertebrates [10], [17].
The strong enrichment for dCNEs in our lamprey data is interesting. First, it confirms that dCNEs are ancient, having been present in the ancestral vertebrate genome before lampreys and gnathostomes diverged. The fact that over half of the detected lamprey CNE sequences are dCNEs in gnathostomes also supports the notion that the smaller repertoire of CNEs in lamprey results from its lineage separating at a time when CNE sequences were still evolving and becoming fixed, such that the stem group of the lamprey lineage had only a relatively small number of CNEs for lamprey to inherit. The alternative scenario, in which all CNE sequences were present in the common ancestor of agnathans and gnathostomes and then diverged considerably only in agnathans, struggles to explain the high proportion of dCNEs in the lamprey genome: if dCNEs and non-duplicated CNEs had diverged at the same rate, the ratio of CNEs to dCNEs would have been preserved. Additionally, this result suggests that the emergence of many CNEs in vertebrate genomes coincided with, and was perhaps facilitated by, rounds of whole-genome duplication.
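The argument against uniform divergence can be made explicit with a simple expectation. Assume, purely for illustration, that the common ancestor carried $N_d$ dCNEs and $N_s$ non-duplicated CNEs, and that in the lamprey lineage each element remained detectable independently with the same probability $p$, regardless of class. Writing $D$ for the number of dCNEs and $T$ for the total number of CNEs detected in lamprey, to first approximation (large $N_d$ and $N_s$):

```latex
\begin{aligned}
\mathbb{E}[D] &= p\,N_d, \qquad \mathbb{E}[T] = p\,(N_d + N_s),\\
\frac{\mathbb{E}[D]}{\mathbb{E}[T]} &= \frac{p\,N_d}{p\,(N_d + N_s)} = \frac{N_d}{N_d + N_s}.
\end{aligned}
```

The retention probability $p$ cancels, so uniform loss leaves the dCNE fraction at its ancestral value. A marked enrichment of dCNEs in lamprey therefore points either to class-dependent retention or to a smaller ancestral repertoire in which dCNEs made up a larger share.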
The identification of the C15orf41 gene contig provides an insight into the CNE landscape of the lamprey genome. In one half of the gene region, most gnathostome CNEs are detectable in lamprey, yet in the other half no lamprey CNEs are present. This could indicate that sets of CNEs cooperate locally across relatively large distances to function as modules, a possibility that has not been considered to date; without further examples, however, no more general conclusion can be drawn.
We chose two very highly conserved CNEs for functional analysis. The first, a CNE near the EBF3 gene, is over 90% identical across almost 500 bp in jawed vertebrates, yet its identity with lamprey extends across just over 200 bp at the centre of this CNE. We reasoned that this shorter region, given its presence across all vertebrates, might be able to drive reporter gene expression in zebrafish embryos and might therefore define a core region of the human element. Both the human core and the lamprey element drove very specific, nearly identical patterns of GFP expression in the developing zebrafish brain, confirming that the shorter, less conserved region still retains the basic instructions for this enhancer function. A similar result was obtained for a second CNE, from the PAX2 region, which is even more dramatically shortened in lamprey, being less than 30% of the length of the gnathostome CNE. Until now, the length and high sequence identity of CNEs have made them recalcitrant to analyses that aim to identify the regulatory language encoded within them. The lamprey sequence, combined with functional assays, provides a new angle on this problem and may help identify important functional motifs within CNEs.
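Defining a "core" of a human CNE from lamprey matches amounts to projecting the lamprey alignment blocks onto human coordinates. The sketch below takes the core as the smallest interval containing every lamprey hit, clipped to the CNE, and reports its length as a fraction of the full element. This definition, the function name and the coordinates are assumptions for illustration only, not the procedure or data used for the EBF3 or PAX2 elements.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the human CNE


def core_region(cne: Interval, hits: List[Interval]) -> Tuple[Interval, float]:
    """Span of the human CNE covered by lamprey hits, and its fraction of the CNE.

    The core is assumed to be the smallest interval containing every hit,
    after clipping each hit to the CNE boundaries.
    """
    cne_start, cne_end = cne
    clipped = [
        (max(s, cne_start), min(e, cne_end))
        for s, e in hits
        if min(e, cne_end) > max(s, cne_start)
    ]
    if not clipped:
        raise ValueError("no lamprey hit overlaps the CNE")
    core = (min(s for s, _ in clipped), max(e for _, e in clipped))
    fraction = (core[1] - core[0]) / (cne_end - cne_start)
    return core, fraction


# Hypothetical 490 bp human CNE with two overlapping lamprey hits near its centre.
core, frac = core_region((0, 490), [(140, 260), (255, 350)])
print(core, round(frac, 2))  # prints (140, 350) 0.43
```

Taking the enclosing span rather than the union of hits keeps short internal unaligned stretches in the core, which suits cloning a contiguous fragment for a reporter assay.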
The shorter regions of identity defined by the lamprey sequences appear, at least for the two elements tested here, to be sufficient to drive a highly specific pattern of reporter gene expression in a limited number of structures. By contrast, most of the expression data generated from CNEs defined by fish–mammal comparisons are less specific and often encompass a range of tissues [2], [5], [21]. Thus the flanking regions of a gnathostome CNE, which are not conserved in lamprey, might encode additional functional signatures that have evolved since the agnathan divergence but are still common, and therefore conserved, among all bony vertebrates. This suggests that CNEs are multifunctional modules built from the middle outwards, which might explain their unusually large size for cis-regulatory sequences.
CNEs shared between lampreys and other vertebrates are likely to represent the most ancient regulatory instructions for the ancestral vertebrate body plan. The completion and assembly of the lamprey genome will provide an outstanding resource from which many facets of our vertebrate ancestry may be traced. Investigating the function of lamprey CNEs on a genome-wide scale will provide a critical starting point for building gene regulatory networks and for understanding the most fundamental language of vertebrate development.