Sponges are an ancient group of animals that diverged from other metazoans over 600 million years ago. Here we present the draft genome sequence of Amphimedon queenslandica, a demosponge from the Great Barrier Reef, and show that it is remarkably similar to other animal genomes in content, structure and organization. Comparative analysis enabled by the sequencing of the sponge genome reveals genomic events linked to the origin and early evolution of animals, including the appearance, expansion and diversification of pan-metazoan transcription factor, signalling pathway and structural genes. This diverse ‘toolkit’ of genes correlates with critical aspects of all metazoan body plans, and comprises cell cycle control and growth, development, somatic- and germ-cell specification, cell adhesion, innate immunity and allorecognition. Notably, many of the genes associated with the emergence of animals are also implicated in cancer, which arises from defects in basic processes associated with metazoan multicellularity.
The emergence of multicellular animals from single-celled ancestors over 600 million years ago required the evolution of mechanisms for coordinating cell division, growth, specialization, adhesion and death. Dysfunction of these mechanisms drives diseases such as cancers, in which social controls on multicellularity fail, and autoimmune disorders, in which distinctions between self and non-self are disrupted. The hallmarks of metazoan multicellularity are therefore intimately related to those of cancer1 and immunity2, relying on oncogenes, tumour suppressors and cell-surface and signalling components.
Sponges have a critical role in the search for the origins of metazoan multicellular processes3, as they are generally recognized as the oldest surviving metazoan phyletic lineage. Although the kinship of sponges to other animals was recognized by the nineteenth century4, the absence of a gut and nervous system had relegated sponges to the ‘Parazoa’5, a grade below the ‘Eumetazoa’ or ‘true animals’ (for example, cnidarians, ctenophores and bilaterians)6. Nevertheless, sponges share key adhesion and signalling genes7–11 with eumetazoans, as well as other genes important in body plan patterning such as developmental transcription factors12–15; sponge embryos and larvae (Fig. 1) are readily comparable to those of other animals12,16. Sponges are diverse and their phylogeny is poorly resolved17–19, allowing for the possibility that sponges are paraphyletic20, which implies that other animals evolved from sponge-like ancestors.
Here we report on the genome of Amphimedon queenslandica, a haplosclerid demosponge, the adult organization and lifestyle of which are typical for sponges, feeding on microbes and particulate organic matter filtered by flagellated collar cells that resemble choanoflagellates. Although the diversity of sponges, and their uncertain phylogeny, make it doubtful that any single species can reveal the intricacies of early animal evolution, comparison of the A. queenslandica draft genome with sequences from other species can provide a conservative estimate of the genome of the common ancestor of all animals and the timing and nature of the genomic events that led to the origin and early evolution of animal lineages.
The A. queenslandica genome harbours an extensive repertoire of developmental signalling and transcription factor genes, indicating that the metazoan ancestor had a developmental ‘toolkit’ similar to that in modern complex bilaterians. The origins of many of these and other genes specific to animal processes such as cell adhesion, and social control of cell proliferation, death and differentiation can be traced to genomic events (gene birth, subfamily expansions, intron gain/loss, and so on) that occurred in the lineage that led to the metazoan ancestor, after animals diverged from their unicellular ‘cousins’. In addition to possessing a wide range of metazoan-specific genes, the Amphimedon draft genome is missing some genes that are conserved in other animals, indicative of gene origin and expansion in eumetazoans after their divergence from the demosponge lineage and/or gene loss in Amphimedon.
Amphimedon queenslandica is a hermaphroditic spermcast spawner, and cannot be readily inbred in the laboratory (Fig. 1a–c and Supplementary Note 1)21. Adult sponges also harbour many commensal microbes. To minimize allelic variation and microbial contamination we sequenced genomic DNA from multiple embryos and larvae from a single mother. This DNA contains four dominant parental haplotypes (~3% polymorphism), although a single brood may have multiple fathers (Supplementary Notes 2.1 and 3). We used ~9-fold whole-genome Sanger shotgun coverage to produce a ~167-megabase-pair assembly that typically represents each locus once rather than splitting alleles (Supplementary Notes 2 and 3) and captures ~97% of protein-coding gene content (Supplementary Note 2.5). We also recovered an alpha-proteobacterial genome that is probably a vertically transmitted commensal microbe of Amphimedon embryos (Supplementary Note 2.7).
The assembled A. queenslandica genome encodes ~30,000 predicted protein-coding loci (Supplementary Note 4). This is an overestimate of the true gene number due to overprediction, unrecognized transposable elements and gene fragmentation at contig or scaffold boundaries. Nevertheless, 18,693 (63%) have identifiable homologues in other organisms in the Swiss-Prot database; there are no doubt novel or rapidly evolving sponge genes unknown in other species. CpG dinucleotides are depleted, and TpG and CpA dinucleotides augmented, relative to overall G+C composition, which is indicative of germline cytosine methylation in the Amphimedon genome. This is consistent with the presence of a DNMT3-related putative de novo methyltransferase as well as proteins with predicted methyl CpG binding domains.
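The dinucleotide depletion described above is commonly quantified as an observed/expected ratio. The following is a minimal sketch of that calculation, not the procedure used in this study: the function name `dinucleotide_obs_exp` is hypothetical, only one strand is scanned, and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter


def dinucleotide_obs_exp(seq: str) -> dict[str, float]:
    """Return observed/expected ratios for all 16 dinucleotides.

    The expected count of a dinucleotide XY is taken as
    freq(X) * freq(Y) * (number of dinucleotide positions),
    i.e. the count expected if bases were independent.
    Assumes seq contains only A, C, G and T (illustrative simplification).
    """
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for first in "ACGT":
        for second in "ACGT":
            expected = (base_counts[first] / n) * (base_counts[second] / n) * (n - 1)
            if expected > 0:
                ratios[first + second] = pair_counts[first + second] / expected
    return ratios


# Toy sequence with no CpG at all; real input would be genomic scaffolds.
toy_ratios = dinucleotide_obs_exp("TGCA" * 100)
```

A CpG ratio well below 1 together with TpG and CpA ratios above 1 is the pattern the text interprets as a signature of germline cytosine methylation.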
Analysis of the Amphimedon gene set reveals marked conservation of gene structure (intron phase and position) and genome organization (synteny) relative to other animals (Supplementary Notes 5 and 6). In Amphimedon, intragenic position and phase are retained for 84% of the introns inferred for the metazoan ancestor, comparable to the 76% and 88% retention in human and sea anemone, respectively22,23. The organization of genes shows conserved synteny (that is, conserved linkage without necessarily requiring colinearity) relative to other animals. In particular, 83 of the 153 longest Amphimedon scaffolds (that is, those that contain genes from more than ten distinct metazoan gene families, sufficient for synteny to be assessed) show segments of conserved synteny with other animals (Supplementary Note 6). This indicates that portions of the 15 ancestral linkage groups inferred for the cnidarian–bilaterian ancestor22,24 were already in place in the demosponge–eumetazoan ancestor. No such conserved synteny was detected between animals and the choanoflagellate Monosiga brevicollis.
We addressed the controversial phyletic branching of early animal lineages by comparing sets of orthologous genes in A. queenslandica and a diverse sampling of 18 complete genomes (Supplementary Note 7). Our analyses support the grouping of placozoans, cnidarians and bilaterians into a eumetazoan clade, with demosponge as an earlier-branching lineage25, and reject the diploblast–triploblast phylogeny17 in favour of a more conventional ‘sponges first’ tree19,20 (Fig. 1d). In our discussion below we therefore refer to descendants of the placozoan–cnidarian–bilaterian last common ancestor as Eumetazoa, and reserve ‘Eumetazoa sensu stricto’ for the more limited clade defined by descendants of the cnidarian–bilaterian ancestor.
Our analysis emphasizes the quantitative divergence between metazoans and their closest living unicellular relatives. For example, 28% of the amino acid substitutions between humans and their last common ancestor with choanoflagellates occurred on the metazoan stem lineage (bold line in Fig. 1d), before the divergence of sponges from other animals. This pre-metazoan period can be crudely estimated to have lasted ~150–200 million years (Supplementary Note 7.6).
With multiple animal genomes now in hand, we can extend the ‘zootype’ concept26 to include other shared derived genomic characteristics of animals. Out of 4,670 pan-metazoan gene families defined by clustering sponge and eumetazoan peptides, 1,286 (27%) seem to be metazoan-specific (see Supplementary Note 9.2). Similarly, there are eumetazoan, eumetazoan sensu stricto and bilaterian genomic synapomorphies, as well as sponge-specific gene families (for example, kinases, see Supplementary Note 8). Owing to residual incompleteness of the sponge genome draft, and possible gene losses in the Amphimedon lineage, this analysis provides a conservative estimate.
Nearly three-quarters of the 1,286 animal-specific gene families arose by gene duplication on the metazoan stem (Supplementary Note 9). These include the early duplication of transcription factor families such as homeodomains and basic helix–loop–helix domains13,14,27. Additional gene duplication and divergence in eumetazoans further increased transcription factor gene family number, which in general are 2 to 34 times larger in eumetazoans than in Amphimedon. In contrast, substantial diversification of kinase gene families occurred before the divergence of the sponge and eumetazoan lineages (see below)28. We can assess the role of tandem duplication in the creation of these families by seeking evidence for linkages among anciently diverged paralogues (Supplementary Note 10). A significant fraction remain linked (up to 30%, as found in Trichoplax, P < 0.0001, with lower levels in other contemporary metazoan genomes), indicating that many gene family expansions originally occurred as tandem or proximal duplications, and that these genomically local duplications have remained linked over time. This is consistent with the overall preservation of relict linkages observed here and in other basal metazoan genomes22,24,25.
We find 235 animal-specific protein domains, and 769 animal-specific domain combinations also evolved along the metazoan stem (Supplementary Note 9). Additionally, lineage-specific changes to these animal domain architectures occurred in early metazoan evolution16,29,30. For example, new combinations of domains in death-fold domain proteins and laminins possibly allow for the modification of protein interactions and pathways involved in programmed cell death and cell adhesion, respectively (Supplementary Note 9.3), and the cooption of sponge-, eumetazoan- or bilaterian-specific architectures into novel functions.
The 705 Amphimedon kinases represent the largest reported metazoan kinome, and include members of >70% of human kinase classes (compared with 59% in choanoflagellate, 83% in sea anemone, 70% in Caenorhabditis elegans and 77% in fruitfly; see Supplementary Note 8.7). Amphimedon has single copies of most metazoan kinase classes, but has several expansions of over 50 genes per class. The largest expansions are in the tyrosine kinase and tyrosine-kinase-like groups, and include over 150 likely receptor tyrosine kinases (RTKs). Unlike Monosiga, where RTKs could not be classified into metazoan families28, Amphimedon has kinase domains from six known animal families (EGFR, Met, DDR, ROR, Eph and Sevenless). The EGFR and some Eph extracellular domain architectures are as in their eumetazoan counterparts, but many other RTKs have unique extracellular domains. For instance, DDRs have immunoglobulin repeats, and sushi domains are found in some members of the expanded Eph and Met families. This indicates that the activating ligands, presumably found largely in the external environment, may be quite distinct.
The A. queenslandica genome allows us to assess systematically the origin of the six hallmarks of metazoan multicellularity: (1) regulated cell cycling and growth; (2) programmed cell death; (3) cell–cell and cell–matrix adhesion; (4) developmental signalling and gene regulation; (5) allorecognition and innate immunity; and (6) specialization of cell types. These cardinal features of metazoan multicellularity have their origins on the metazoan stem and often are the result of metazoan gene novelties combining with more ancient factors. A recurring theme is the overlap of these core ‘multicellularity’ genes with genes perturbed in cancer, a disease of aberrant multicellularity (see oncogenes and tumour suppressors in Figs 2 and 3).
Although the core machinery of the animal cell cycle traces back to the early eukaryotes (Fig. 2a and Supplementary Note 8.2), some critical metazoan regulatory mechanisms emerged more recently. For example, whereas the p53/p63/p73 tumour suppressor family is holozoan-specific31, the HIPK kinase that phosphorylates p53 in the presence of DNA breaks is metazoan-specific, and the MDM2 ubiquitin ligase that regulates p53 appears as a eumetazoan feature. Thus, the p53-mediated response to DNA damage emerged before the divergence of eumetazoans. Intramolecular regulation also has evolved, as illustrated by the Myc oncogene, which is found in the unicellular Monosiga31 but lacks the N-terminal ‘DCMW’ motif present in Amphimedon and other animal Myc proteins. Because mutation of this motif disrupts Myc function in vertebrates, it probably has an important role in all animals.
Tumour suppressors encoded by two classes of cyclin-dependent kinase (CDK) inhibitors mediate growth-factor-dependent regulation of the cell cycle. Although the INK4/CDKN2 class (p15/p16/p18/p19) regulates the eumetazoan-specific CDK4/6-cyclin D kinase and is chordate-specific, the Cip/Kip/CDKN1 class (p21/p27/p57) is more general, regulating many CDKs, and seems to have arisen on the eumetazoan stem. In bilaterians, Cip/Kip genes integrate external growth signals, and are regulated transcriptionally and post-transcriptionally by the major growth pathways (see below). The emergence of this class of CDK inhibitors on the eumetazoan stem indicates a central regulatory role even in early animals.
Although cell growth and cell division are tightly coupled in unicellular species, they can be separately regulated in multicellular organisms. In bilaterians, growth is regulated by six major signalling pathways (receptor tyrosine kinase (RTK) signalling via Ras, insulin signalling via the phosphatidylinositol-3-OH kinase (PI(3)K) pathway, Rheb/Tor, cytokine-JAK/STAT, Warts/Hippo, and the Myc oncogene) that also modulate the cell cycle (Supplementary Note 8.2). Whereas the Rheb/Tor pathway dates back to early eukaryotes, the other pathways contain several genes that are holozoan and metazoan innovations. For example, the insulin receptor substrate and phosphotyrosine binding proteins GAB1/GAB2 emerged on the metazoan stem after the divergence of choanoflagellates, indicating that an insulin-signalling-like pathway may have been a key regulator of growth in early animals by tying into the ancient PDK1 and Akt kinases (Fig. 2b). However, because p21, p27 and Mdm2 are all eumetazoan novelties, this pathway may not have acquired the ability to regulate cell proliferation until after the divergence of sponges.
In contrast to the cell-cycle machinery, most of the apoptotic circuitry is unique to animals, increasing in complexity along metazoan, eumetazoan and bilaterian stems (Fig. 2c and Supplementary Note 8.3). Both intrinsic and extrinsic programmed cell death pathways require caspases, a metazoan-specific family of cysteine aspartyl proteases. Amphimedon encodes initiator caspases with the characteristic caspase recruitment and death effector domains, as well as an expanded repertoire of effector caspases.
The intrinsic pathway drives cell death by permeabilization of the outer mitochondrial membrane and is regulated by the Bcl2 oncogene family of pro- and antiapoptotic factors. The pro-apoptotic protein Bak arose in the metazoan lineage, whereas Bax and Bok seem to be eumetazoan-specific. Bcl2 and Bcl-X are antiapoptotic and metazoan-specific. Mitochondrial permeabilization releases proteins of varying evolutionary origin, including the ancient apoptosis-inducing factor that contributes to caspase-independent apoptosis, metazoan-specific apoptotic protease activating factor 1, and eumetazoan sensu stricto-specific caspase-activated DNase (CAD) and its regulator ICAD/DFF. The extrinsic apoptotic pathway is activated by external signals through transmembrane tumour necrosis factor receptors (TNFRs) whose intracellular death domain interacts with downstream adaptors. Amphimedon encodes a nerve growth factor receptor (NGFR) p75-like protein, although it lacks the crucial death domain that is seen in Nematostella and bilaterians (see ref. 32); other death TNFRs (that is, Fas, DR4, DR5 and TNFR1) are vertebrate-specific32,33. Because the intrinsic cascade is composed of components that pre-date the metazoans, it is likely to be the original mechanism for inducing apoptosis.
The diagnostic domains of two major cell–cell adhesion superfamilies, the cadherins and the immunoglobulins, are present in Monosiga within the extracellular region of putative transmembrane proteins31,34 (Supplementary Note 8.8). Amphimedon cadherins differ from those of Monosiga in having proteins with domain architectures diagnostic for the metazoan-specific classical cadherin and seven pass transmembrane cadherin subfamilies31,35. A considerable expansion of immunoglobulin-like domain-containing proteins occurred on the metazoan stem, with 218 predicted in Amphimedon versus 5 in Monosiga31. The combination of an N-terminal immunoglobulin domain with C-terminal FN3 repeats is found only in metazoans.
Similarly, metazoan extracellular matrix (ECM) proteins use domains that evolved on the holozoan stem. For example, Monosiga encodes proteins with collagen triple helix repeats and other genes with fibrillar collagen C-terminal domains, but these domains only appear together in metazoans30,31. Thrombospondin domain architectures are found in Amphimedon; however, agrin, netrin and perlecan seem to be eumetazoan innovations. The extracellular matrix receptors, α and β integrin, also seem to be specific to metazoans (Fig. 3a).
Components of the major metazoan developmental signalling pathways as well as classes of transcription factors are mostly present in Amphimedon and absent from Monosiga and other non-metazoan genomes13,14,16,27,29, suggesting that ontogenetic development, including primary germ cell formation (Supplementary Note 8.4), originated on the metazoan stem3,11,12. Although Amphimedon possesses a characteristically metazoan repertoire of transcription factor families (Supplementary Note 8.6)13,14,27,31, in general these families are further expanded in eumetazoans13. Some differences between sponges and eumetazoans correlate with morphological complexity. For example, sponges do not seem to have a mesoderm and accordingly Amphimedon lacks transcription factors involved in mesoderm development (Fkh, Gsc, Twist, Snail). In contrast, sponges possess several transcription factors involved in determination or differentiation of muscles and nerves despite lacking a neuromuscular system (PaxB, Lhx genes, SoxB, Msx, Mef2, Irx and bHLH neurogenic factors)13,14,27. Amphimedon lacks Hox genes and some other transcription factor subfamilies that are involved in specifying and patterning bilaterian nervous systems and body plans13,14,27,36,37.
Signalling cascades, such as the Wnt, TGF-β, Notch and Hedgehog pathways, pattern embryos by specifying cellular identity and coordinating morphogenetic events. The ligands and receptors of all of these cascades are metazoan innovations at the cell surface (Supplementary Note 8.5), apart from the eumetazoan sensu stricto-specific Hedgehog ligand29. The transcription factors specific to these pathways are also metazoan-specific (Tcf/Lef, Smads, CSL, Gli), whereas the cytosolic signal transducers generally have more ancient origins. This pattern suggests that these pathways arose by the engagement of novel ligands and receptors with already active signalling mechanisms, enabling multicellular communication.
Amphimedon also has fewer ligands and receptors in each pathway compared to eumetazoans (three Wnt and two Fzd, eight TGF-β ligands and five TGF-β receptors, one Notch and five Deltas) (Supplementary Note 8.5), as observed for many transcription factor families. In contrast to transcription factors13,14,27, however, these proteins generally cannot be assigned to eumetazoan subfamilies or are obvious recent sponge-specific duplications. This lack of phylogenetic resolution may reflect a period of rapid evolution and diversification of ligand/receptor molecules in sponge and eumetazoan lineages. Perhaps as a consequence, the inhibitors that interact with ligands and receptors to modulate pathway activity also appear to be lineage specific. In particular, inhibitors described from bilaterians were not found in Amphimedon (for example, Chordin, Numb, I-Smads, Wif).
The transition to multicellularity was accompanied by mechanisms to defend against invading pathogens and to prevent the fusion of genetically distinct conspecifics2. Although some metazoan immunity genes originated early in eukaryotic evolution, many are restricted to animals, as illustrated by the signalling cascades shared by the Toll-like receptor (TLR) and the interleukin-1 receptor (IL-1R) (Supplementary Note 8.10). An ancestral form belonging to this receptor superfamily was probably present in the last common metazoan ancestor and independently diversified in poriferan and cnidarian lineages. Nuclear factor κB (NF-κB), Tollip and ECSIT genes are present in holozoans; however, most TLR/IL-1R pathway proteins are either composed of metazoan-specific domains (for example, Pellino) or architectures (for example, the death domain with TIR and protein kinase domains in MyD88 and IRAKs, respectively). Immune effector systems also consist largely of metazoan innovations, such as the macrophage-expressed gene 1 that participates directly in pathogen elimination38. Likewise all animals share specific antiviral defence factors such as MDA5-like RNA helicases, and interferon regulatory factor-like proteins, although other systems (for example, RNAi) have more ancient origins39. A primordial complement pathway appears to have evolved exclusively on the eumetazoan (sensu stricto) stem and further diversified in bilaterians40.
Amphimedon and other demosponges encode unique extracellular Calx-β domain-containing proteoglycans called aggregation factors, which promote cell adhesion and may also be involved in allorecognition41. The presence of a cluster of aggregation-factor-related genes in the Amphimedon genome indicates that allorecognition could be under the control of a multigene family.
Sponge cells adhere to form tissue-like layers, but a true epithelial cell layer, characterized by aligned cell polarity, belt-form junctions and underlying basal lamina, is thought to be a eumetazoan innovation. Amphimedon possesses all the main components of the Par, Crumbs and Discs Large complexes, a set of interacting proteins that are largely metazoan-specific and determine polarity in epithelial cells (Fig. 3a and Supplementary Note 8.8). The main proteins comprising bilaterian spot-form and zonula adherens junctions are also present in Amphimedon and appear to be metazoan-specific34,42. By contrast, septate junction and basal lamina proteins appear to be largely eumetazoan innovations (Fig. 3a); Amphimedon does possess several genes with laminin-like domain architectures (Supplementary Note 9.3).
Sponges can sense and respond to their environment, although nerve cells seem to be restricted to eumetazoans sensu stricto43,44. However, the expression of orthologues of post-synaptic structural and proneural regulatory proteins in Amphimedon larval globular cells suggests an evolutionary connection with an ancestral protoneuron36,42. Amphimedon possesses homologues of bilaterian proteins involved in nervous system development (for example, elav- and musashi-like RNA-binding proteins, neural transcription factors), pre- and post-synaptic organization (for example, synaptotagmin)42, endogenous and exogenous signalling (for example, GPCRs), and neuroendocrine secretion, although bilaterian peptide hormones are not detected (Supplementary Note 8.9). Some key synaptic genes are conspicuously missing from Amphimedon (Fig. 3b and Supplementary Note 8.9), including the ionotropic glutamate receptor family42, whereas neuronal-type metabotropic glutamate, dopamine and serotonin receptors are present. Amphimedon has a homologue of the ephrin receptor, an axon guidance protein, although the ephrin ligand and developmental genes involved in axon guidance (for example, slit, netrin, unc-5 and robo) are not present. Amphimedon also possesses over 200 GPCRs, which includes a large lineage-specific expansion of rhodopsin-related GPCRs that are encoded largely by clusters of single exon genes as observed in other metazoans (Supplementary Note 8.9). From these observations we infer that the metazoan ancestor possessed a complex sensory system, and many of the molecular requirements for neural development and nerve cell function. This suggests that exaptation was critical for the genesis of the first nerve cell, with eumetazoan-specific gene innovations providing the regulatory and structural requirements to connect these protoneural components into a functional neuron (Fig. 3b).
With a diverse sample of genomes in hand, we sought differences in gene repertoire that are associated with gross morphological complexity. Figure 4 shows molecular function categories that are significantly enriched (P < 1 × 10⁻¹⁰) in one or more metazoan complexity group, with the relative frequencies of genes with these functions in each species shown by colour code. Here we have defined broad groupings representing three grades of morphological complexity, guided by the number of described cell types45, including non-bilaterian (or ‘basal’) metazoans (Nematostella, Trichoplax, Amphimedon; ~5–15 cell types), invertebrate bilaterians (Drosophila, C. elegans, sea urchin; ~50–100 cell types), and vertebrates (~225 cell types, represented by the human genome), with a selection of non-animals as an outgroup (Supplementary Note 11). Similarly, using a principal component analysis, we also identified suites of molecular functions that are associated with complexity (Supplementary Figure 11.2). The first component differentiates between metazoans and non-metazoans; the second component partly differentiates between metazoan complexity groups.
Included among the functional categories that correlate with increase in metazoan morphological complexity are (Fig. 4 and Supplementary Table 11.1.1): GPCRs, ion channels, cell adhesion proteins, and defence and immunity proteins, which are enriched in basal metazoans relative to non-animals; homeobox transcription factors and gap junction proteins, which are enriched in bilaterians relative to non-bilaterian animals; and immunoglobulin receptor family members, immunoglobulins, MHC antigens, and cytokine receptors, which are enriched in vertebrates relative to invertebrate bilaterians. These broad associations with complexity are evidently superimposed on notable lineage-specific variation as seen in Fig. 4 (for example, serine protease gene loss in C. elegans, and voltage-gated ion channel expansion in Paramecium). Similar functional categories contribute to principal components (Supplementary Table 11.2.1).
The Amphimedon genome, combined with recently sequenced genomes of diverse invertebrates and a choanoflagellate, identifies innovations that underlie the emergence and early diversification of the Metazoa. These genomic comparisons resurrect a common animal ancestor of remarkable complexity. Metazoans can now be defined by a long list of genomic synapomorphies—gene content, intron–exon structure and syntenies—as well as characteristics common to all animal life such as sex, development, controlled cellular proliferation, differentiation and growth, and immunity. To what extent the ancestral functioning of this gene set is reflected in modern poriferans is unclear, although studies of both sponge development, which yields a highly patterned larva with axial polarity12, and sponge immunity provide points of direct comparison with the eumetazoan condition.
Whereas the eumetazoan lineage produced a wide diversity of body forms, the sponge body plan has been stable for over 600 million years. What can explain this disparity in evolved morphological complexity? Although we have seen that sponges and eumetazoans share many common pathways related to morphogenesis and cell-type specification, there are notable genomic differences, including different microRNA assemblages46, lineage-specific domains and domain architectures, and the differential expansions of gene families. Although there has been minimal characterization of cis-regulatory architectures in non-bilaterians, we note that as most classes of bilaterian transcription factors are also present in sponges, cnidarians and placozoans, it may be that quantitative rather than qualitative differences in cis-regulatory mechanisms were needed to produce more diverse body plans.
The sexually-reproducing, heterotrophic metazoan ancestor had the capacity to sense, respond to, and exploit the surrounding environment while maintaining multicellular homeostasis. Although sponges lack some of the cell types found in eumetazoans, including neurons and muscles, they share with all other animals genes that are essential for the form and function of integrated multicellular organisms. With these genomic innovations enabling the regulation of cellular proliferation, death, differentiation and cohesion, metazoans transcended their microbial ancestry.
Detailed methods are described in Supplementary Information. The genome assembly, gene model sequences, predicted proteins and EST clusters and sequences have been deposited with DDBJ/EMBL/GenBank as project accession ACUQ00000000.
A detailed description of methods used in this study can be found in the Supplementary Information.
Genomic DNA was sheared and cloned into plasmid and fosmid vectors for whole genome shotgun sequencing as described48. The data were assembled using a custom approach described in the Supplementary Information. The Amphimedon 9X assembly and the preliminary data analysis have been deposited at DDBJ/EMBL/GenBank as project accession ACUQ00000000.
Protein-coding genes were annotated using homology-based methods (Augustus49, Genomescan50) and one ab initio method (SNAP51). Protein-coding gene predictions are deposited in DDBJ/EMBL/GenBank as accession ACUQ00000000.
Phylogenetic analyses were conducted using Bayesian inference and maximum likelihood with bootstrap, using mrbayes54,55 and PHYML56, respectively. Alternative likelihood topologies were tested using TREEPUZZLE57 and CONSEL58. Bayesian analyses using site-heterogeneous models were performed using aamodel (J. Huelsenbeck, unpublished) and PhyloBayes59,60.
Putative orthologues of genes involved in various processes in bilaterians were identified by reciprocal BLAST of human, mouse, or Drosophila genes against the Amphimedon gene models (blastp) or the assembly (tblastn). PFAM61 domain composition, assignment of PANTHER HMMs62,63 and phylogenetic trees were used to determine orthology. Trees were built using the neighbour-joining method in Phylip64 with one-hundred bootstrap replicates.
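The reciprocal-search step above can be sketched as a reciprocal-best-hit filter over two sets of search results. This is an illustrative sketch only: it assumes hits have already been parsed into `(query, subject, bit_score)` tuples, the function names and sequence identifiers are hypothetical, and it does not cover the domain-composition, HMM and tree-based checks that the study also used to decide orthology.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted(
        (query, subject)
        for query, subject in forward.items()
        if reverse.get(subject) == query
    )


# Hypothetical identifiers and scores, for illustration only.
forward = [("humanA", "aq1", 250.0), ("humanA", "aq2", 90.0), ("humanB", "aq2", 120.0)]
reverse = [("aq1", "humanA", 240.0), ("aq2", "humanA", 95.0), ("aq2", "humanB", 80.0)]
pairs = reciprocal_best_hits(forward, reverse)
```

In this toy input only `humanA` and `aq1` select each other, so `humanB` is left without a putative orthologue and would need the additional domain or tree evidence to be resolved.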
Metazoan gene families were assigned molecular functions using PANTHER62 annotations. Fisher’s exact test as implemented in R65 was run to test for enrichment or depletion of numbers of gene families for each molecular function category in the novel versus ancestral gene sets. Numbers of genes (not gene families) for various molecular function categories were tested for enrichment between different pairs of four eukaryotic complexity groups (vertebrate, non-vertebrate bilaterian, basal metazoan, non-animal) to identify molecular function families that correlate with the differences in complexity. Principal components analysis was used to identify the contribution of each molecular function category to a eukaryotic complexity group.
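The enrichment test above can be illustrated with a small stand-alone calculation. The study ran Fisher's exact test in R; the sketch below is a pure-Python, one-sided (enrichment only) version written for illustration, and the function name, the choice of a one-sided tail and the example counts are assumptions rather than details taken from the study.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test P value for enrichment.

    2x2 table layout (counts of gene families):
                      in category   not in category
        novel set          a               b
        ancestral set      c               d

    Returns the hypergeometric probability of seeing at least `a`
    novel families in the category, with all margins held fixed.
    """
    row1 = a + b          # size of the novel set
    col1 = a + c          # families in the category
    total = a + b + c + d
    denominator = comb(total, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / denominator


# Toy counts, not data from the study.
p_value = fisher_enrichment_p(3, 1, 1, 3)
```

Testing for depletion instead would sum the lower tail (from 0 up to `a`), and a two-sided test such as R's default combines both directions.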
This study was supported by funds from the Australian Research Council (B.M.D., M.A.), US Department of Energy Joint Genome Institute (B.M.D., D.S.R., S.P.L.), Harvey Karp (K.S.K.), NSF (T.H.O.), NIH/NHGRI (G.M.), University of Queensland Postdoctoral Fellowship (M.A., S.F.C.), Sars International Centre for Marine Molecular Biology (M.A.), DFG (M.St.), ANR (M.V.), CNRS (M.V.), Gordon and Betty Moore Foundation (D.S.R.) and Richard Melmon (D.S.R.). We thank J. Huelsenbeck and I. Hariharan for help with phylogenetic analyses and growth pathways, respectively.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Contributions Genome and EST sequencing, assembly, annotation and analysis: J.C., T.M., U.H., N.H.P., M.St., A.D., Y.Z., M.A., A.C., D.M.G., D.J., S.S., B.J.W. and D.S.R. Phylogenetics: M.S. and D.S.R. Gene family and biological process analyses: M.Sr., B.F., M.E.A.G., G.S.R., C.C., M.D., C.L., M.A., S.M.D., T.H.O., D.C.P., S.F.C., C.H., M.V., K.S.K., G.M., B.M.D. and D.S.R. Clustering, novelty, domain content and complexity analyses: O.S. and D.S.R. Gene family expansion analyses: M.S., O.S. and D.S.R. Writing: M.S., B.M.D., D.S.R., O.S., J.C., B.F., M.G., G.S.R., G.M., K.S.K., M.V., C.L., S.M.D., N.H.P., A.D., C.C., M.A., T.H.O. and S.P.L. Project design and coordination: B.M.D. and D.S.R.
Author Information The genome sequence data can be accessed from DDBJ/EMBL/GenBank as project accession ACUQ00000000. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of this article at www.nature.com/nature.