We present a perspective on the molecular evolution of the extracellular matrix (ECM) in metazoa that draws on research publications and data from sequenced genomes and expressed sequence tag libraries. ECM components do not function in isolation, and the biological ECM system or “adhesome” also depends on posttranslational processing enzymes, cell surface receptors, and extracellular proteases. We focus principally on the adhesome of internal tissues and discuss its origins at the dawn of the metazoa and the expansion of complexity that occurred in the chordate lineage. The analyses demonstrate very high conservation of a core adhesome that apparently evolved in a major wave of innovation in conjunction with the origin of metazoa. Integrin, CD36, and certain domains predate the metazoa, and some ECM-related proteins are identified in choanoflagellates as predicted sequences. Modern deuterostomes and vertebrates have many novelties and elaborations of ECM as a result of domain shuffling, domain innovations, and gene family expansions. Knowledge of the evolution of metazoan ECM is important for understanding how it is built as a system and its roles in normal tissues and disease processes, and is relevant to tissue engineering, the development of artificial organs, and the goals of synthetic biology.
The evolution of multi-cellular eukaryotic organisms from single-celled ancestors was one of the most significant transitions in the evolution of life on earth. It enabled the emergence of larger and more complex eukaryotes that could resist predation, evolve specialized tissues and higher order biological capacities, and colonize new environments. Multi-cellularity evolved independently in several eukaryotic lineages and, in terms of the number of cell types per organism, animals (the metazoa) include the most complex multi-cellular eukaryotes (Rokas, 2008 ). A key mediator of metazoan multi-cellularity is the extracellular matrix (ECM), a multi-component, proteinaceous network that bridges between cells, contributes to their spatial arrangements by binding cell-surface adhesion receptors, and supports cell survival, differentiation, and tissue organization (Hynes, 2009 ). The advantages of increased organism size for more efficient use of nutrients and escape from predation might have acted as selection pressures for the evolution of ECM, and increases in ocean oxygen levels around 850 million years ago likely provided a favorable environment for these changes (Rokas, 2008 ).
Here, we discuss the adhesome of basal metazoa, the modern animals that provide a window on the earliest-evolving ECM components, and consider the expansion of adhesome complexity that occurred in the deuterostome/chordate lineage. Our focus is on the ECM of internal tissues. Specific details are in the supplementary data.
The pathway by which ECM components are secreted from cells is highly conserved in eukaryotes and clearly predates the metazoa (Dacks et al., 2009 ). Many ECM proteins are oligomeric; the most prevalent oligomerization domains are the GXY motifs of collagens that assemble three polypeptides into a supercoiled triple helix, and the α-helical heptad repeats that assemble ECM oligomers from trimers to pentamers. Both the GXY motif and heptad repeat are found throughout the domains of life and in viruses (Odgren et al., 1996 , Beck and Brodsky, 1998 ). ECM proteins also typically contain repeated domains. Many of these—e.g., the vWF_A domain, thrombospondin type 1 domain, or EGF domain—are also of premetazoan origin (Whittaker and Hynes, 2002 , Tucker, 2004 , Wouters et al., 2005 ). Thus, many constituent domains of ECM proteins predate the origin of animals.
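As an illustration of how repeated GXY motifs can be recognized in a predicted protein sequence, the following minimal Python sketch scans for runs of consecutive Gly-X-Y triplets. The function name `find_gxy_runs`, the threshold `min_repeats`, and the toy sequence are assumptions made for this example; they are not taken from the analyses described here, and a run of GXY triplets alone does not show that a protein forms a collagen triple helix.

```python
import re


def find_gxy_runs(sequence, min_repeats=5):
    """Return (start, end, n_repeats) for each run of consecutive G-X-Y triplets.

    `sequence` is a one-letter amino acid string; `min_repeats` is an
    arbitrary illustrative cutoff, not a published criterion.
    """
    # A glycine followed by any two residues, repeated at least min_repeats times.
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_repeats)
    runs = []
    for match in pattern.finditer(sequence.upper()):
        length = match.end() - match.start()
        runs.append((match.start(), match.end(), length // 3))
    return runs


# Hypothetical toy sequence: a short leader, six GPP triplets, a short tail.
toy_sequence = "MKT" + "GPP" * 6 + "AAA"
gxy_runs = find_gxy_runs(toy_sequence)
```

Start and end positions are zero-based, with the end exclusive, so they can be used directly to slice the input string.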
With the benefit of recent genome sequences and expressed sequence tag databases it is possible to assess the representation of adhesome components within the metazoa (Figure 1; Supplemental Tables S1–S3; Supplemental File 1). By inference, proteins represented throughout the metazoa were present in the last common metazoan ancestor, and so these studies give us insight into the core molecular machinery of the ECM. ECM components that are conserved from sponges to human include the three clades of fibrillar collagens and basement membrane collagen IV (Figure 1). For a GXY triple helix to be stable and assemble into higher order fibrils, additional protein domains and accessory proteins are needed (reviewed by Shoulders and Raines, 2009). The fibrillar collagen polypeptides of sponges contain all the expected domains, namely, a triple-helical domain and N- and C-terminal propeptide domains that align three procollagen polypeptides to nucleate the triple helix (Exposito et al., 2008, 2010). Sponges also encode the N- and C-propeptidases that enable higher order assembly of triple helices into fibrils. Thus, the basic machinery for collagen fibrillogenesis is fully present. Correspondingly, collagen fibrils have been identified as macromolecular structures in sponges (Garrone et al., 1975, Exposito et al., 2010). Of the four taxonomic groups of sponges, the Homoscleromorpha also encode collagen IV and contain a basement membrane structure (Boute et al., 1996). Other key components of basement membranes, laminin and perlecan, are also present in sponges and are conserved throughout the metazoa (Figure 1, Table S1). Fibrillin, which forms a separate microfibril system, is also universal (Reber-Müller et al., 1995, Ramirez and Sakai, 2010). Thus, three major structural systems of the ECM appear to have originated in the metazoan ancestor.
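The inference used above, that a component found throughout the metazoa was present in the last common metazoan ancestor, can be sketched as a simple set comparison. This is a minimal illustration only: the function name, the four lineage labels, and the three-entry table are simplifying assumptions for the example and do not reproduce the supplementary tables. The rule also ignores lineage-specific gene loss, so a real analysis would need to allow for absences in individual lineages.

```python
def inferred_in_ancestor(presence, lineages):
    """Return components found in every sampled lineage, sorted by name.

    `presence` maps a component name to the set of lineages in which it
    has been identified; `lineages` lists the lineages that must all
    contain the component for it to be assigned to the common ancestor.
    """
    required = set(lineages)
    return sorted(
        name for name, found_in in presence.items()
        if required <= set(found_in)
    )


# Hypothetical, simplified lineage labels chosen for this example.
METAZOAN_LINEAGES = ["sponge", "cnidarian", "protostome", "deuterostome"]

# Toy table: laminin and perlecan are described in the text as conserved
# throughout the metazoa, fibronectin as a deuterostome-lineage novelty.
toy_presence = {
    "laminin": {"sponge", "cnidarian", "protostome", "deuterostome"},
    "perlecan": {"sponge", "cnidarian", "protostome", "deuterostome"},
    "fibronectin": {"deuterostome"},
}

core_components = inferred_in_ancestor(toy_presence, METAZOAN_LINEAGES)
```

Requiring presence in every sampled lineage is the strictest form of the rule; relaxing it to tolerate secondary loss would be a natural variation.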
Surprisingly, ECM components such as agrin, known for its roles in synapses and at neuromuscular junctions (Wu et al., 2010 ), along with thrombospondin (TSP) and SPARC/osteonectin that have specialized roles in vertebrates, are also conserved in sponges (Figure 1 and Supplemental Table S1; Bentley and Adams, 2010 ). The functions of these proteins in basal metazoa are unknown; however, their strong conservation suggests wide roles in the ECM. In mammals, TSPs or SPARC do not self-associate into fibrils, yet they affect the packing of collagen fibrils (Bornstein et al., 2004 , Halász et al., 2007 , Rentz et al., 2007 ). In addition, two major groups of ECM proteases, the matrix metalloproteinases (MMP) and the disintegrin and metalloproteinase with thrombospondin repeats proteases (ADAMTS), are both also conserved throughout the metazoa (Nicholson et al., 2005 , Fanjul-Fernández et al., 2010 ; Supplemental Table S1 and Supplemental File 1). Thus, the core adhesome apparently includes multiple inputs for regulation or dynamic turnover of the higher order organization of collagen fibrils. This is an important consideration for the design of synthetic, ECM-mimetic, three-dimensional environments.
In contrast, with the exceptions of perlecan and agrin, basal metazoa do not encode secreted proteoglycans equivalent to those of vertebrates (observations from BLAST searches). Nevertheless, carbohydrate-based interactions contribute to tissue organization, as in the cell–cell cohesion of sponges (Bucior et al., 2004 ). These functions might be conserved; for example, a nonsulfated chondroitin of Hydra stabilizes membrane tubulation during assembly of the stinging organelle of nematocytes. This activity is similar to the function of chondroitin sulfate in perineuronal ECM nets of the mammalian CNS (Yamada et al., 2007 , Adamczyk et al., 2010 ). Further investigation of proteoglycan core proteins of basal metazoa by nongenomic approaches (e.g., targeted proteomics of glycosaminoglycan-substituted proteins) is needed to assess the significance of proteoglycans in the core adhesome.
The attachment of ECM components to cell surfaces is also essential for ECM function and transmission of mechanical forces. Certain adhesion receptor families (integrins, the syndecan and glypican membrane proteoglycans, and CD36) are present in sponges and throughout the metazoa (Figure 1 and Supplemental Table S2) (Brower et al., 1997, Hughes, 2001, Mueller et al., 2004, Chakravarti and Adams, 2006, Filmus et al., 2008). Others [e.g., dystroglycan and discoidin domain receptor (DDR), a receptor tyrosine kinase that binds collagens at the same site as SPARC (Carafoli et al., 2009)] originated later within the metazoa (Supplemental Table S2). The high conservation of genes encoding core ECM and adhesion receptors is especially striking for Trichoplax adhaerens, a basal metazoan with four cell types in which a morphologically recognizable intercellular ECM has not been detected by transmission electron microscopy (Srivastava et al., 2008, Schierwater et al., 2009) (Supplemental Tables S1 and S2).
The high conservation of core adhesome components has raised the fascinating issue that their evolutionary origins need to be sought outside the metazoa (King, 2004, Rokas, 2008). On present data, ECM proteins and most of the associated adhesome arose through a major wave of innovation that occurred in conjunction with the origin of the metazoa (Figure 1). Choanoflagellates are considered the closest outgroup to the metazoa (Baldauf, 2003, King, 2004), and genome sequencing for the unicellular choanoflagellate Monosiga brevicollis and the colonial Salpingoeca rosetta has enabled the question of metazoan ECM origins to be pursued in more depth. M. brevicollis and S. rosetta each encode two proteins that contain repeated GXY motifs and several other proteins that contain collagen C-propeptide–like domains (King et al., 2008, Exposito et al., 2008; Supplemental File 1). Vital accessory proteins for collagen fibril assembly, collagen propeptidases and lysyl oxidase, are not encoded. Thus, a current model is that fibrillar collagens originated at the dawn of the metazoa by domain shuffling (Exposito et al., 2008). Similarly, although some of the individual domains are present (e.g., the laminin G domain), none of the core basement membrane components are encoded in the choanoflagellates examined to date (King et al., 2008). Intriguingly, both choanoflagellates encode a fibrillin-like protein (Supplemental File 1). It will be of great interest to learn whether this protein has fibril-forming capacity. A second intriguing molecule is usherin, which in humans has specialized roles in the retina as an ECM component at photoreceptor cell synapses and in the inner ear as a transmembrane component of the Usher protein network (Reiners et al., 2006). The conservation of usherin in Cnidaria (Tucker, 2010) and the existence of homologues in choanoflagellates (Supplemental File 1) suggest that a fundamental role in cell interactions remains to be uncovered.
Turning to the adhesion receptors, DDR and membrane-bound proteoglycans are not apparent outside the metazoa (Supplemental File 1). However, the taxonomic representation of CD36 and integrins reveals extended evolutionary histories. A CD36-like protein is evident in M. brevicollis, the opisthokont Capsaspora owczarzaki, and the slime mold Dictyostelium discoideum (Supplemental File 1). Mammalian CD36 binds collagen and thrombospondin-1 and also has roles in the internalization of long chain fatty acids, oxidized phospholipids, and lipid microparticles (Silverstein and Febbraio, 2009). In the sponge, CD36 contributes to the assembly of aquiferous channels that form a primitive circulation system (Mueller et al., 2004). We speculate that CD36 evolved originally for nutritional functions, as in the uptake of prey or lipid materials, and later acquired ECM-binding capacities.
Remarkably, Thecamonas trahens (a protist) and C. owczarzaki both encode integrin α and β subunits as well as key components of intracellular integrin signaling (Sebé-Pedrós et al., 2010 ). These data place the evolutionary debut of integrin heterodimers before the metazoan/fungal divergence (considerably earlier than previous estimates) and indicate lineage-specific losses of integrins in the fungi and choanoflagellates examined to date (King et al., 2008 , Shalchian-Tabrizi et al., 2008 , Sebé-Pedrós et al., 2010 ) (Figure 1). The Sib (similar to integrin beta) proteins of Dictyostelium discoideum are intriguing for their similar function in substratum attachment, yet in structure these proteins are only similar to integrin beta by the presence of an extracellular vWF_A domain and NPXY-dependent talin-binding capacity of the cytoplasmic domain (Cornillon et al., 2006 , 2008 ). Present data do not distinguish whether Sibs and integrin β share a common ancestor or are of independent evolutionary origin. Metazoan integrins function in phagocytosis and cell-cell interactions as well as ECM attachment, and it is plausible that integrins in single-celled organisms have “pre-ECM” roles in signaling cytoskeletal reorganizations during prey capture and uptake. If so, the protist integrins could be of great interest for studies of integrin structure and signaling activation mechanisms.
Deuterostomes in general and vertebrates in particular have many elaborations and specializations of tissue ECM distinct from those of basal metazoa and protostomes. These include novel splice variants (e.g., agrin), gene family expansions (e.g., the laminin, thrombospondin, integrin and ADAMTS protease families; Tzu and Marinkovich, 2008 , Huhtala et al., 2005 , Nicholson et al., 2005 , McKenzie et al., 2006 ), and the evolutionary origin of many novel ECM components [e.g., tenascin, fibronectin (FN), CCN (cyr61, ctgf, nov) proteins, and FACIT collagens (Fibril associated collagens with interrupted triple helices), (Tucker et al., 2006 , Huxley-Jones et al., 2007 , Tucker and Chiquet-Ehrismann, 2009a , Katsube et al., 2009 ; Table S3)]. These innovations exemplify shuffling of preexisting domains into novel combinations (e.g., tenascin, CCN), or inclusion of novel domains (e.g., in fibronectin the FN-I domain is deuterostome-specific and the FN-II domain is chordate-specific) (Kawashima et al., 2009 ) (Figure 1).
Many adhesome components represented by single genes in invertebrate deuterostomes (sea urchin, acorn worm, or sea squirt) form a gene family in vertebrates, most likely due to the two rounds of genome-wide duplication that took place early in the vertebrate lineage (Dehal and Boore 2005 ; Huxley-Jones et al., 2007 ) (Figure 1 and Supplemental File 1). Frequently, as exemplified by syndecan and tenascin, even larger gene families are present in bony fish that underwent a third, fish-specific genome duplication (Meyer and Van de Peer, 2005 , Chakravarti and Adams, 2006 , Tucker et al., 2006 ) (Supplemental Table S3 and Supplemental File 1). The subsequent evolution of the duplicated genes has involved subfunctionalization, such that the modern paralogues are expressed under control of different regulatory pathways in distinct cell-types or tissues (e.g., Sun et al., 2005 , Tucker and Chiquet-Ehrismann, 2009b ).
Collectively, these molecular innovations and gene family expansions have resulted in many changes and additions to ECM network systems (Figure 1). Collagen fibrils are of increased size and molecular complexity, due to the presence of FACIT collagens and the repertoire of collagen-binding proteoglycans (Exposito et al., 2010). The evolution of hyaluronan synthase has resulted in the presence of large (2–25 micron long), water-retaining hyaluronan/aggrecan polymers in the cartilage and brain ECM of vertebrates (Weigel and DeAngelis, 2007, Yoneda et al., 2010). The evolution of FN fibrils is of particular interest. The predicted FN-like protein of Ciona savignyi contains all three canonical domains (FN-I, FN-II, FN-III) but is unusual in having only one FN-I domain, lacking an RGD motif exposed for integrin-binding, and having immunoglobulin-like domains interspersed with the FN-III domains (Tucker and Chiquet-Ehrismann, 2009a). For vertebrate FN, fibrillogenesis is a cell-dependent process, requiring integrin-binding and cell contractility in addition to intermolecular interactions between multiple FN-I domains in the N-terminal region (Mao and Schwarzbauer, 2005). We hypothesize that the Ciona FN-like protein does not form a distinct fibril system but instead binds to other ECM components, such as fibrillin or tenascin, or to cell-surface heparan sulfate proteoglycans. Although information on FN in the lamprey and shark is currently missing, it is clear that the FNs of bony fish have all the expected molecular features for fibril assembly (Sun et al., 2005). The early embryonic lethality of FN gene knockout mice demonstrates the vital importance of this fibril system in vertebrates (George et al., 1993).
Deuterostomes also exemplify many innovations or neofunctions of cell-ECM adhesion receptors (Figure 1; Supplemental Table S3 and Supplemental File 1). Whereas RGD- and laminin-binding integrins exist in both protostomes and deuterostomes, major lineage-specific novelties in chordates include the collagen-binding I domain integrin α subunits and the β subunit clades (Hughes, 2001, Ewan et al., 2005, Huhtala et al., 2005). Although I domain–containing integrins are present in Ciona intestinalis, their structural and functional characteristics indicate an independent evolutionary lineage (Ewan et al., 2005, Huhtala et al., 2005, Tulla et al., 2007). The major RGD-dependent, FN-binding integrin of vertebrates, α5β1, appeared after the divergence of tunicates and before the divergence of bony fish (Ewan et al., 2005, Huhtala et al., 2005). The hyaluronan receptor, CD44, first appears in bony fish (Supplemental Table S3). These examples illustrate how innovations, domain shuffling, and coevolutionary events expanded the possibilities for tissue specificity, microenvironmental complexity, and subtlety of ECM-dependent intracellular signaling and gene expression. These new potentialities contributed to major tissue and bodyplan innovations, including development of the notochord, the pharyngeal arches, and neural crest cells (Zhang and Cohn, 2006, Jeffery et al., 2008, Hecht et al., 2008, Kawashima et al., 2009). The origination of cartilaginous tissue based on collagen II provided the framework for the development of endochondral ossification in vertebrates (Wada, 2010).
The above discussion focuses on the core adhesome and the expansion of its complexity. This is not to overlook the vast diversity of ECM that supports distinctive tissue- or species-specific functions. Elastic proteins have evolved in multiple lineages as counterparts to the relatively inflexible collagen fibers (Gosline et al., 2002). The production of mineralized bio-composites with remarkable mechanical properties that are based on collagen fibrils occurs in many phyla; the 3-m-long silica spicules of glass sponges and the byssus material of mussels are fascinating examples. Such forms of ECM are of increasing interest to bioengineers (Harrington et al., 2010, Müller et al., 2009). In contrast, lineage-specific ECM components can fulfill major tissue functions, as exemplified by the role of lamprin in lamprey cartilage (Robson et al., 1993). Given the vast number of arthropod and mollusk species and the relatively low representation of these phyla among sequenced genomes, it is likely that the full extent of ECM diversity remains to be uncovered.
The evolution of metazoa cannot be separated from the evolution of their ECM. ECM representation in modern metazoa exemplifies both extreme conservation and extensive adaptive radiation. This viewpoint has been enabled by comparative genomics, but functional investigations in basal metazoa and choanoflagellates are now needed, complemented by proteomics to identify lineage-specific components. Knowledge of the networks and hierarchies by which tissue ECMs are built is relevant to stem cell biology, tissue engineering, and many clinical conditions. Understanding the evolutionary history of the ECM is an important approach to deciphering these systems.
We thank Douglas Keene, Shriner's Research Center, Portland, for the electron micrograph of mouse dermis. We thank Jean Schwarzbauer and Nicole King for open peer review. A portion of this research was carried out by P.G.B. during a Journal of Cell Science Traveling Fellowship.