This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Integrins comprise a large family of α,β heterodimeric, transmembrane cell adhesion receptors that mediate diverse essential biological functions. Higher vertebrates possess a single β1 gene, and the β1 subunit associates with a large number of α subunits to form the major class of extracellular matrix (ECM) receptors. Despite the fact that the zebrafish (Danio rerio) is a rapidly emerging model organism of choice for developmental biology and for models of human disease, little is currently known about β1 integrin sequences and functions in this organism.
Using RT-PCR, complete coding sequences of zebrafish β1 paralogs were obtained from zebrafish embryos or adult tissues. The results show that zebrafish possess two β1 paralogs (β1–1 and β1–2) that have a high degree of identity to other vertebrate β1 subunits. In addition, a third, more divergent, β1 paralog is present (β1–3), which may have altered ligand-binding properties. Zebrafish also have other divergent β1-like transcripts, which are C-terminally truncated forms lacking the transmembrane and cytoplasmic domains. Together with β1–3 these truncated forms comprise a novel group of β1 paralogs, all of which have a mutation in the ADMIDAS cation-binding site. Phylogenetic and genomic analyses indicate that the duplication that gave rise to β1–1 and β1–2 occurred after the divergence of the tetrapod and fish lineages, while a subsequent duplication of the ancestor of β1–2 may have given rise to β1–3 and an ancestral truncated paralog. A very recent tandem duplication of the truncated β1 paralogs appears to have taken place. The different zebrafish β1 paralogs have varied patterns of temporal expression during development. β1–1 and β1–2 are ubiquitously expressed in adult tissues, whereas the other β1 paralogs generally show more restricted patterns of expression.
Zebrafish have a large set of integrin β1 paralogs. β1–1 and β1–2 may share the roles of the solitary β1 subunit found in other vertebrates, whereas β1–3 and the truncated β1 paralogs may have acquired novel functions.
Integrins are a family of metazoan cell surface receptors that play critical roles in cell adhesion, migration, differentiation and survival. Integrins are heterodimeric glycoproteins containing non-covalently associated α and β subunits, and are grouped into sub-families according to the identity of the β subunit. In humans, eight different β subunits combine with 18 different α subunits to form 24 functionally distinct heterodimers. Integrins have a large extracellular domain responsible for interacting with extracellular ligands, and a small intracellular domain that binds to cytoskeletal and signaling proteins. Integrins assimilate information from the extracellular and intracellular environments by acting as bi-directional transducers of signals across the cell membrane. Hence, the binding of an extracellular ligand can elicit activation of intracellular signaling pathways and reorganisation of the cytoskeleton [2,3]. Conversely, changes in intracellular signaling can result in the stimulation or inhibition of ligand binding due to conformational changes in the extracellular domains of an integrin [4-6].
Integrin ligands include the major ECM components laminins, fibronectin and collagens. The β1 integrin sub-family contains 12 different heterodimers in mammals, which form the major group of cell-ECM receptors. An ECM-binding β chain is probably the most ancient, in evolutionary terms, of all the integrin β chains [7,8]. β1 integrins have widespread essential functions both during development and in the adult organism [9-12].
Sequence analysis and X-ray crystal structures [13-15] have demonstrated that all integrin β subunits have an identical domain structure (shown schematically in Fig. 1). At the N-terminus there is a PSI (plexin/semaphorin/integrin) domain, into which is inserted an immunoglobulin fold known as the hybrid domain. A von Willebrand factor type A domain (known as the A or I-like domain) is in turn inserted within the hybrid domain. The PSI domain also links directly to the C-terminal portion of the β subunit, which contains four epidermal growth factor (EGF)-type repeats (EGF-1 to EGF-4), a cystatin-like fold known as the β-terminal domain (βTD), a transmembrane domain and a cytoplasmic tail.
The A domain is the portion of the β subunit that is involved in extracellular ligand binding; this domain is also critical for the interactions with the α subunit that lead to heterodimerisation. The A domain consists of a central β-sheet encircled by seven α helices, and has two large sequence insertions compared to other A domains. These insertions, which are critical for ligand binding and for association with the α subunit, lie in the loops between β-strands B and C and between β-strand D and α helix 5 [13,14]. The βB-βC loop (otherwise known as the 'specificity-determining loop') contains a small disulfide-bonded segment that contributes to the differences in ligand-binding specificity between different integrin heterodimers [17-19]. The A domain also contains three cation-binding sites: MIDAS (metal-ion dependent adhesion site), ADMIDAS (adjacent to MIDAS) and LIMBS (ligand-associated metal binding site) [13,14]. The MIDAS site is critical for ligand recognition, whereas the ADMIDAS and LIMBS sites appear to have regulatory roles [20,21].
X-ray crystal structures [13-15] of the extracellular domains have revealed that the overall shape of the β subunit is that of a 'head' connected to a long 'leg' with an intervening 'genu' or 'knee' (Fig. 1). The A domain and hybrid domain form part of the integrin head region, whereas the PSI and EGF-1 and EGF-2 lie in the knee region; the remaining EGF repeats and βTD domain contribute to the leg region. The conversion of an integrin from an inactive to an active state is proposed to involve an unbending of the knee region, and a key hinge point in the knee region has recently been proposed to lie in EGF-2. A critical feature of this unbending is the release of the hybrid domain from interactions with other integrin domains, thereby allowing the hybrid to undergo an outward pivoting. In turn, this movement causes conformational changes in the A domain that promote ligand recognition [14,22]. The β subunit cytoplasmic tail contains a HDRRE motif for interaction with the α subunit cytoplasmic domain, and NPXY sequences that are involved in binding to cytoskeletal proteins such as talin and filamin.
The zebrafish is an emerging model organism with a large number of attractive features for studying the function of genes, such as: (i) the external and rapid development of embryos, (ii) the transparency of embryos, which allows visualisation and tracking of individual cells or groups of cells, (iii) the high degree of conservation between zebrafish and human genes, (iv) the similarities between human and zebrafish embryonic development resulting in a large proportion of tissues and organs being grossly similar between the two species, (v) its genetic tractability, aided by the availability of powerful genetic tools such as gene knock-down by morpholino-modified antisense oligonucleotides (morpholinos) and gene knockout by target-selected mutational inactivation of genes (TILLING), and (vi) the partially complete genome sequence. In addition, zebrafish is finding increasing use as a model organism to study human diseases [27,28]; in many of these disorders integrin-ligand interactions play important roles in the initiation or progression of the disease.
To date, our knowledge of the biological roles of integrins in vertebrates has been derived mainly from knockout studies in mice [30,31]. For the reasons described above, zebrafish provide a very attractive alternative model system for studying integrin function during development and disease. Currently, however, very little published information is available concerning the sequences and complement of integrin genes in this organism. Here we have investigated if β1-like genes and transcripts are found in zebrafish. We found that zebrafish have two β1 paralogs that are closely related to the single gene found in other vertebrates. Surprisingly, we also found multiple, more divergent, zebrafish β1 paralogs, many of which are truncated and entirely novel forms.
To amplify β1-like sequences, primers were designed based on partial sequences in the zebrafish genome project or from EST databases (see Methods). Complete coding sequences were amplified by RT-PCR using RNA from 4-day old embryos. Two of the sequences obtained, designated β1–1 and β1–2, had a high degree of identity to other vertebrate β1 sequences, such as human β1 [Genbank:Q8WUM6] (Fig. 2). These two sequences are >76% identical to human β1 overall but two regions of the subunits are extremely well conserved (Fig. 3A,B). The first of these regions is the A domain, indicating that the key functional features of this domain are retained in these β subunits. Highly conserved portions of the A domain include the loops important for ligand binding and association with the α subunits (Fig. 2). All residues involved in the coordination of the MIDAS, ADMIDAS and LIMBS cations are also completely conserved (Fig. 2). The second region of very strong conservation is the transmembrane domain and cytoplasmic tail, suggesting that the signaling and intracellular ligand binding properties of these sequences have been highly conserved. The conserved features include the HDRRE α subunit cytoplasmic domain interaction sequence and the two NPXY motifs [23,24]. All 56 cysteine residues in the extracellular domains are also perfectly conserved between β1–1, β1–2 and human β1. The sequences of β1–1 and β1–2 are 81% identical to each other; β1–2 is 84% identical to a published catfish β1 sequence [Genbank:Q9AI01].
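The percent identity values quoted above can be illustrated with a minimal sketch. It assumes identity is counted over aligned columns in which neither sequence has a gap; the text does not state which denominator was used, so this convention, the function name and the toy sequences are all assumptions for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments (hypothetical, not real integrin sequences).
seq_a = "ACDEFGH-IK"
seq_b = "ACDQFG-HIK"
print(percent_identity(seq_a, seq_b))  # 87.5 (7 matches over 8 gap-free columns)
```

Other conventions (for example, dividing by the length of the shorter sequence or by the full alignment length) would give slightly different percentages for the same alignment.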
A third sequence amplified from zebrafish embryos, designated β1–3, was considerably more divergent in sequence from human β1 than β1–1 and β1–2 (Fig. 3 and Table 1). β1–3 also has fewer potential N-linked glycosylation sites (five) than β1–1 and β1–2 (which both have eight). However, as observed for β1–1 and β1–2, all 56 cysteine residues are perfectly conserved. Overall, β1–3 is more closely related in sequence to β1–2 than to β1–1 (Table 1). In comparison to the very high degree of sequence conservation seen for the A domains of β1–1 and β1–2, the A domain of β1–3 is not well conserved (Fig. 3). Although the MIDAS and LIMBS cation-binding sites are conserved in the β1–3 A domain, two aspartate residues are substituted by Ser/Asn at the ADMIDAS site; these two aspartate residues are present in all other chordate β1 subunits. Because each aspartate residue provides two carboxylate oxygen atoms for coordination of the ADMIDAS cation, the double substitution will probably lead to loss of cation binding at the ADMIDAS. We have previously shown that mutation of these residues in human β1 causes constitutive inactivation of α5β1. In addition, the sequence of the 'specificity-determining loop' between β strands B and C differs at several positions to that of the other β1 subunits; most notably, the sequence of the disulfide-linked loop in β1–3 (CFPSDC) is markedly different to that of β1–1 and β1–2, which are both very similar to human β1 sequence (CTSEQNC). Hence, β1–3 is likely to have altered ligand-binding and α subunit association properties compared with β1–1 and β1–2 [17,19]. However, the transmembrane domain and cytoplasmic tail of β1–3 are highly conserved, and this region is therefore likely to retain all the signaling characteristics and intracellular ligand binding features of β1–1 and β1–2.
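How potential N-linked glycosylation sites might be counted can be sketched as follows. The sketch assumes the commonly used N-X-S/T sequon, where X is any residue except proline; the text does not say how the sites reported here were predicted, and the function name and toy sequence are hypothetical.

```python
import re

# Sequon N-X-[S/T] with X != P. The lookahead keeps matches zero-width
# after the Asn, so overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


# Toy sequence (hypothetical): NGS and NKT qualify, NPT does not.
toy = "MNGSAANPTLLNKTQ"
print(sequon_positions(toy))       # [1, 11]
print(len(sequon_positions(toy)))  # 2
```

A sequon is only a potential site; whether a given asparagine is actually glycosylated cannot be read from the sequence alone.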
Several zebrafish database partial EST sequences (Table 2) appeared to be orthologous to human β1 but to differ in sequence from β1–1, β1–2 and β1–3. Two of these ESTs, ESTa and ESTb, were selected for complete sequencing. Although the ESTs were found to be full-length cDNA clones extending from the start codon to the polyA tail, the integrin sequence carried a stop codon at the end of the EGF repeats (see Additional file 1). Hence, these transcripts were lacking the βTD, transmembrane and cytoplasmic domains. The coding sequences of the two ESTs were 95% identical to each other at the amino acid level. The two ESTs had almost identical 5' sequences but were more divergent in the 3' region, including the 3'UTR. In order to examine whether similar sequences could be amplified from zebrafish embryos, we performed RT-PCR using a common forward (5') primer but with different reverse (3') primers, which were specific for either ESTa or ESTb (see Additional file 1). PCR products were subcloned and individual clones were sequenced. The most abundant sequence obtained using the ESTa reverse primer from 4-day-old embryo RNA was designated β1tr-1 (truncated β1 paralog 1); the predominant sequence obtained using the ESTb reverse primer was designated β1tr-2 (truncated β1 paralog 2). Since ESTa was obtained from an adult kidney cDNA library, we also used the same primer sets to amplify products from reverse transcribed kidney RNA. The most abundant sequence obtained using the ESTa reverse primer was identical to that of β1tr-1; however, a distinct sequence, designated β1tr-3, was the predominant sequence obtained using the ESTb reverse primer. All three of these paralogs carried a stop codon at the same position as that found in the two ESTs (see Additional file 1), resulting in a truncated translation product terminating four residues after the end of EGF-4. Hence, these data clearly establish that this group of paralogs are bona fide truncated β1-like subunits.
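The kind of check that locates a premature stop codon can be sketched in a few lines. This is an illustration only, assuming a coding sequence that begins at the start codon and the three stop codons of the standard genetic code; the function name and the toy sequences are hypothetical and are not taken from the ESTs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_in_frame_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop, or None.

    The index equals the number of residues translated before termination.
    """
    cds = cds.upper()
    # Step through complete codons only, reading from the first base.
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Toy sequences (hypothetical).
print(first_in_frame_stop("ATGGCTTGTTAAGGC"))  # 3: ATG GCT TGT then TAA
print(first_in_frame_stop("ATGATAAGC"))        # None: the TAA here is out of frame
```

Comparing this codon index between a full-length and a truncated transcript aligned to the same reading frame shows where the translation product would terminate.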
The sequence of the truncated paralogs is shown in Fig. 2, and in alignment with the EST sequences in Fig. 4. The truncated β1 paralogs are 95–98% identical to the EST sequences or to each other (Table 1); β1tr-1 and β1tr-2 are most like ESTb, whereas β1tr-3 has a similar degree of identity to both ESTa and ESTb. The sequence differences between the truncated paralogs were not due to PCR or sequencing errors because the sequences were confirmed in multiple independent PCR products. The truncated paralogs are even more divergent from human β1 than is β1–3 (Table 1, Fig. 5A), and like β1–3, these paralogs have relatively few potential glycosylation sites (β1tr-1 and β1tr-2 both have three sites, β1tr-3 has four). However, all cysteine residues remain perfectly conserved, although an additional cysteine residue is present in EGF-4. This extra cysteine residue could form a disulfide bond with the cysteine residue present in the additional four residues at the C-terminus (Fig. 4).
All the truncated paralogs are closely related to β1–3 (Fig. 5B), with ~80% sequence identity overall. Hence, together with β1–3 they form a novel group of β1 paralogs. In a surprising contrast to what is observed regarding sequence conservation between β1–1, β1–2 and human β1 (Fig. 3), the PSI and hybrid domains are almost perfectly conserved between the truncated paralogs and β1–3, whereas the A domains are much less well conserved (Fig. 5B). In addition, the sequence of EGF-1 is highly conserved, and the first two EGF repeats of β1tr-3 are almost identical in sequence to those of β1–3 (Fig. 5B). The major region of sequence variation between the truncated β1 paralogs is the putative linker sequence in EGF repeat 2, with some of the paralogs having a six amino acid deletion in this region (Fig. 4).
The truncated paralogs, like β1–3, have a substitution of two key aspartate residues by Ser/Asn at the ADMIDAS. In addition, these paralogs are also lacking an essential LIMBS residue (E189 in human β1 [E169 in the mature sequence]) due to a glutamate to arginine substitution. A similar mutation in human β1 (APM and J. Askari, unpublished data) or β3 perturbs ligand recognition. Hence, the A domains of the truncated paralogs may not be functional for ligand binding, or may possibly bind to ligands in a different manner to other β subunit A domains. The MIDAS coordinating residues are conserved in the truncated paralogs; hence, it remains possible that they may bind ligands in a manner akin to the A domains found in some α subunits.
Overall, the sequences of the A domains in the truncated paralogs are highly divergent from other chordate sequences; indeed, they are only slightly more similar to the human β1 A domain (57–58% identity) than is the Ciona β1 A domain (56% identity). The loops important for ligand binding and association with the α subunit are poorly conserved. For example, the truncated paralogs all have a large deletion of nine amino acids in the βB-βC loop. A second part of the α/β subunit interface involves residues in the βD-α5 loop, especially those that form a short stretch of 3₁₀ helix (DGKL). This region is highly conserved between the full-length paralogs and human β1 (Fig. 2) but has several substitutions in the truncated paralogs. Taken together, these findings suggest that the truncated β1 paralogs may have weaker associations with α subunits, associate with different α subunits, or even may not associate with α subunits at all. However, even if the truncated paralogs are able to associate with α subunits, they would form non-signaling receptors, as they lack the transmembrane and cytoplasmic domains.
Gene sequences corresponding to β1–1, β1–2 and β1–3 can be identified in the zebrafish genome (Table 2), although there are a number of inaccuracies and gaps in the GENESCAN ab initio gene predictions. Currently, the gene for only one truncated paralog can be identified from the latest genome sequence (Zv6), corresponding almost exactly to ESTb (Table 2). β1–2, 'ESTb' and β1–3 sequences lie adjacent to one another in a tandem array on chromosome 2; β1–1 lies in a separate region of the genome (chromosome 24). The genomic sequences show almost perfect conservation of exon-intron boundaries (data not shown), suggesting that all the β1 paralogs arose from duplication of a single ancestral β1 gene.
For phylogenetic analysis, the six zebrafish β1 sequences and two EST sequences were aligned with human β1 (see Fig. 2 and Additional file 2). The Ciona β1 sequence is ancestral to β1/β2/β7 and is used as an outgroup for the phylogenetic analysis. The resulting phylogenetic analysis is presented in Fig. 6 in the form of a maximum likelihood (ML) tree with supporting data from 1,000 neighbor-joining bootstrap replicates, 1,000 maximum parsimony bootstrap replicates and Bayesian clade credibility values. The phylogeny suggests that: (i) the zebrafish β1 paralogs are all derived from a single ancestral β1 gene present in the last common ancestor of zebrafish and tetrapods, (ii) the duplication that gave rise to β1–1 and β1–2 occurred after the divergence of the tetrapod and teleost lineages, (iii) a subsequent duplication gave rise to the ancestor of the divergent paralogs, (iv) this ancestral divergent paralog then duplicated, giving rise to β1–3 and the ancestor of the truncated paralogs, (v) a very recent tandem duplication of the ancestral truncated paralog has taken place. The branching order of the truncated paralogs has low statistical support, indicative of the high sequence similarity. However, the statistical analysis suggests that ESTb, β1tr-1 and β1tr-2 are more closely related to each other than to ESTa and β1tr-3, and probably arose more recently.
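As a rough illustration of what a bootstrap replicate is, the column-resampling step can be sketched as below. This is not the software or the settings used to produce Fig. 6; the function name, the toy alignment and the seeding scheme are assumptions, and the tree-building step that would be run on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns, with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences and names).
toy_alignment = {
    "seq_A": "ACDEFGHIKL",
    "seq_B": "ACDQFGHIKL",
    "seq_C": "ACNQFGHVKL",
}

# 1,000 replicates, one seeded generator per replicate for reproducibility.
replicates = [bootstrap_alignment(toy_alignment, random.Random(seed)) for seed in range(1000)]
print(len(replicates))
```

A tree would then be inferred from each replicate, and the support for a clade is the fraction of replicate trees in which that clade appears. Low support for the branching order of nearly identical sequences is expected, because few columns distinguish them.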
The ML tree was supported by maximum parsimony and Bayesian analyses. Only neighbor joining analysis (not shown) produced two vertebrate clades where β1–1, β1–2 and human β1 grouped separately to the other sequences. However, the genomic locations of the different paralogs add support to the ML analysis. For example, the position of β1–2 and β1–3 in close proximity on the same scaffold suggests that β1–2 and β1–3 arose from a tandem duplication of a common ancestor. The presence of a truncated paralog in the same region of the genome adds weight to the proposal (supported by all the phylogenetic analyses) that the ancestor of the truncated paralogs arose from a duplication of the precursor of β1–3.
We examined the expression of the different β1 paralogs at different times of development using RT-PCR with primers specific for each paralog (Fig. 7). The results showed that both β1–1 and β1–2 were expressed throughout development. The expression of β1–2 appeared to peak at around 14 hours post fertilization (hpf) and declined thereafter. In contrast, the expression of β1–1 appeared to be highest at later stages of development (48 hpf onwards). The more divergent β1 paralog β1–3 was not found to be expressed during early development; β1–3 message was first detected at 14 hpf and expression strongly increased at later stages of development. The truncated paralogs β1tr-1 and β1tr-2 were also expressed only at later stages of development, with message first detected at 24 hpf. In contrast, β1tr-3 was expressed throughout development.
The expression of the different β1 paralogs in adult tissues was also examined by RT-PCR (Fig. 8). The results showed that β1–1 and β1–2 were expressed in all tissues, although the expression of β1–1 in intestine was low, whereas β1–2 was highly expressed in this tissue. Expression of β1–3 was widespread, but tissues such as muscle and brain showed only low expression. The truncated paralogs β1tr-1 and β1tr-2 were only expressed in a limited number of tissues, with expression being particularly strong in liver. In contrast to the tissue-restricted expression of β1tr-1 and β1tr-2, β1tr-3 was found to be expressed in all tissues.
EST sequences corresponding to the different β1 paralogs can be identified in the NCBI databases (Table 2). The majority of these EST sequences correspond to those of truncated paralogs. Although no EST sequences that exactly matched β1tr-1 or β1tr-2 were found, ESTs corresponding to the 5' and 3' ends of β1tr-3 could be identified (Table 2). (Conversely, although there are many additional EST sequences that correspond exactly with ESTa (Table 2), we did not amplify any precisely matching sequences from zebrafish embryos.) EST sequences also provide evidence concerning the expression of the different β1 paralogs in different tissues or at different stages of development (Table 2). ESTs for the truncated paralogs (especially ESTs corresponding to ESTa) are found in cDNA from several different tissues, and appear to be particularly abundant in liver and kidney.
The novel findings of this report are: (i) Extensive expansion of the β1 integrin gene family has taken place in zebrafish, (ii) a group of divergent β1 paralogs with regulatory site mutations is present, some of which are novel truncated forms that lack the transmembrane and intracellular domains, (iii) the different β1 paralogs exhibit varied patterns of temporal and tissue expression. Our findings raise several important questions. For example: Why do zebrafish have multiple versions of the single β1 gene found in higher vertebrates? What is the function of the divergent paralogs? Which α subunits does each paralog pair with?
Gene duplication is an important mechanism for the generation of phenotypic complexity, diversity and innovation. Gene duplication can occur either through tandem duplication of an individual gene, segmental duplication of a portion of a chromosome, or through whole genome duplication. Recent analyses have provided strong support for a whole genome duplication in the lineage of ray-finned fishes approximately 350 million years ago [34-36]. Following this duplication, most of the gene duplicates were rapidly lost. However, a small proportion of these duplicates were retained. Hence, for approximately 20% of single-copy human genes, two corresponding orthologs are found in zebrafish. It is possible that β1–1 and β1–2 arose from the fish-specific whole genome duplication. Consistent with this possibility, β1–1 and β1–2 have different chromosomal locations and furthermore, both gene loci show synteny with the p11.22 region of human chromosome 10, which contains the β1 (ITGB1) gene (APM, unpublished data). Two co-orthologs of β1 are also found in other teleost species such as fugu.
In the classical model of gene duplication, new gene duplicates face one of two fates: either one copy mutates to a pseudogene (called nonfunctionalization), or one copy preserves the original function, and the other copy mutates freely until by chance it obtains a sequence that confers a new, beneficial, positively selected function (called neofunctionalization). In a third possibility, called subfunctionalization or subfunction partitioning, the complementary partitioning of ancestral subfunctions is the mechanism that preserves gene duplicates. Two classic examples of subfunction partitioning in zebrafish are the pair of sox9 genes, sox9a and sox9b, and the two mitf genes, mitfa and mitfb. It is possible that ancestral β1 integrin functions have been partitioned between β1–1 and β1–2. Subfunction partitioning can occur through tissue-specific expression of gene duplicates and/or expression at different times of development. We found that there are differences in temporal expression of β1–1 and β1–2, with the expression of β1–2 peaking in early development, whereas expression of β1–1 was highest during later development. Also some differences in the tissue expression between these two paralogs were noted, with low expression of β1–1 in adult intestine. It is also possible that partitioning could take place if each β chain associates with a distinct subset of α chain partners. The possible subfunction partitioning of β1 roles by β1–1 and β1–2 may provide distinct advantages for functional genetic analyses in zebrafish. Whereas knockout of the single β1 gene in mice is very early embryonic lethal [9,10], knockout (or knockdown) of only one of these genes in zebrafish may have a much less severe phenotype, making it possible to study the roles of β1 integrins during the later stages of development.
In contrast to β1–1 and β1–2, it seems likely that the divergent β1 paralogs arose through tandem duplications. Taking the genomic, phylogenetic and sequence identity data together, the most likely scenario is that the precursor of β1–2 underwent a tandem duplication that gave rise to the ancestor of β1–3; a subsequent duplication of this paralog gave rise to the progenitor of the truncated paralogs. It seems likely that the divergent paralogs have acquired novel functions. With the exception of β1tr-3, these paralogs are expressed at later times of development and in a more tissue-restricted manner than β1–1 and β1–2. It therefore appears that the divergent paralogs may be involved in tissue differentiation, maintenance or remodelling, rather than in development/organogenesis. The expression of many of these paralogs in tissues such as liver and kidney supports this view.
One clue to the function of the divergent paralogs is that they possess a double substitution (DD to SN) in the regulatory ADMIDAS cation-binding site in the A-domain. Preliminary results suggest that engineering an identical substitution in human β1 (in a heterodimer with α5 subunit) produces a constitutively inactive integrin, i.e., an integrin with very low affinity for ligand (J. Askari and A.P. Mould, unpublished data). It is therefore possible that β1–3 could act as a dominant-negative β1 subunit by competing with β1–1 and β1–2 for α subunit partners or intracellular binding factors. It is also proposed that expression of high levels of unoccupied integrins can trigger apoptosis; hence, expression of β1–3 could also regulate cell survival. If the truncated paralogs are unable to associate with α subunits they could be secreted into the extracellular environment. There are now many examples of cell surface receptors that also have a secreted antagonistic counterpart, e.g. VEGFR, FGFR-1 and IL-1R [43-45]. It is therefore intriguing to speculate that truncated integrin paralogs could have a similar mode of action.
It is also possible that the divergent paralogs could have acquired new binding partners. The very high degree of sequence conservation of the PSI, hybrid domain and EGF-1 between the divergent paralogs suggests that these domains form a key functional region of these polypeptides, and furthermore that the function of these domains is common to all the divergent β1 paralogs. Such a degree of conservation suggests that these domains interact with protein ligands (even though there is currently no evidence in other integrins that these domains have extracellular ligands). The PSI, hybrid, and EGF-1 domains lie adjacent to each other at, or close to, the knee region of the integrin. Proteins that bind to these domains could therefore alter function, either by constraining the integrin in a bent (low affinity) form or by perturbing the interactions between PSI/hybrid/EGF-1 domains and other integrin domains, thereby favouring the unbent (high affinity) form. A key hinge point in the β subunit knee region is thought to lie near the start of EGF-2. Interestingly, some of the truncated paralogs have a six amino acid deletion in this linker region (Fig. 4), which is likely to change the conformational state of the protein, and may therefore modify function. Intriguingly, the second EGF repeat in β1tr-3 is essentially identical to that of β1–3, implying that this repeat may also have a conserved role between these two paralogs.
It is currently unclear how many separate genes are represented by the truncated β1 paralogs and ESTs. Although the sequences β1tr-1 and β1tr-2 are 98% identical to each other they are probably distinct gene products because they have different 3'UTRs. In addition, although β1tr-2 and β1tr-3 may have identical 3'UTR sequences, it appears unlikely that β1tr-3 is an alternatively spliced form of β1tr-2 because the differences between the two coding sequences are spread throughout the primary structures (Fig. 4 and Additional file 1). Additionally, the truncated paralogs cannot simply represent different alleles of the same gene because at least three of these paralogs could be amplified from RNA derived from a single fish (APM and Paul Walker, unpublished results). Furthermore, the genomic sequence that corresponds to ESTb (which is 98% identical in coding sequence to β1tr-2) does not contain sequences that would allow the generation of different transcripts by alternative splicing (data not shown). In summary, the currently available evidence suggests that the truncated paralogs originate from separate (yet to be characterised) genes. Since the genomic scaffold is assembled from shotgun sequences it is possible that, due to the almost identical sequences of the truncated paralogs, the scaffold is incorrectly assembled and additional genes of truncated paralogs may be present in the same region. In the future, the availability of finished clones in the region of the genome that contains 'ESTb' for detailed analysis should help to clarify the precise number of truncated β1 paralogs in the zebrafish genome. A further puzzle is why none of the truncated β1 sequences we amplified from zebrafish RNA exactly matched the ESTa or ESTb sequences. A possible explanation is that the β1tr-1, β1tr-2 and β1tr-3 sequences were only the predominant sequences obtained by subcloning of the RT-PCR products. Other clones obtained have not yet been fully sequenced; some of these may match the EST sequences more precisely.
Truncated forms of β subunits have previously been reported in sponge and man [7,46]. However, these polypeptides represent alternatively spliced forms of full-length β subunits that are truncated within the hybrid domain, and therefore probably result in non-functional proteins. In contrast, the truncated β1 paralogs reported here appear to be unique gene products, and the position of the truncation would preserve the structure and function of all of the extracellular domains with the exception of the βTD.
A key subject of future investigations will be to identify the α subunit partners of the different paralogs. All twelve α chains that pair with β1 in higher vertebrates have orthologs in the zebrafish genome, and at least two of these α chains have two co-orthologs (APM, unpublished results). It seems very unlikely that all of the β1 paralogs identified here are able to associate with all of the >14 potential α chain partners. For example, if each of the six β1 paralogs has retained the ability to pair with all of the α chains, then a minimum of 84 (6 × 14) different heterodimers could form. This would represent a huge increase in complexity over the twelve heterodimers found in higher vertebrates. It is likely, therefore, that in this process of multimeric protein evolution, previously termed 'molecular incest', only a limited number of possible combinations was retained based on function.
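The lower bound quoted above can be reproduced by simple enumeration. The following Python sketch is illustrative only: the α chain labels are hypothetical placeholders (the text gives only their minimum number, taken here as 14), and the β1 names are plain-ASCII stand-ins for the six paralogs named in this study.

```python
from itertools import product

# The six beta1 paralogs named in the text (ASCII stand-ins for the Greek names).
beta_paralogs = ["b1-1", "b1-2", "b1-3", "b1tr-1", "b1tr-2", "b1tr-3"]

# Hypothetical placeholder labels: twelve orthologs plus at least two
# co-orthologs gives a lower bound of 14 alpha chains.
alpha_chains = [f"alpha_{i}" for i in range(1, 15)]

# Every alpha chain paired with every beta1 paralog.
pairs = list(product(alpha_chains, beta_paralogs))

print(len(pairs))  # number of theoretically possible heterodimers
```

This count assumes unrestricted pairing, which is exactly the scenario the text argues is unlikely to hold in vivo.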
It is not yet clear if divergent β1 paralogs are found in other teleosts. Currently only limited gene information is available on other fish species, with the exception of the pufferfish Takifugu rubripes and Tetraodon nigroviridis. Analysis of the pufferfish genomes suggests that only two β1 paralogs are present in fugu and tetraodon; these paralogs are closely related to zebrafish β1–1 and β1–2 (AP Mould, unpublished results). However, since β1–3 probably arose from an ancient duplication of the ancestor of β1–2 it would be surprising if the former paralog was retained only in the zebrafish lineage. Furthermore, although the divergent β1 paralogs have not been detected in pufferfish genomes, pufferfish may have retained fewer gene duplicates than zebrafish.
The presence of expanded gene families is hypothesized to have made a major contribution to the extraordinary diversity and evolutionary success of teleosts, which make up half of all vertebrate species. It is known that fish genomes are more 'plastic' than other vertebrate genomes, partly due to higher rates of gene duplication. Fish-specific novel members of gene families may contribute to a large extent to the distinct physiology of fishes and mammals, while differential retention of duplicate genes may have facilitated the isolation of emerging species during the vast radiation of teleosts [51,52]. Our findings show that integrins are a dynamic family of genes that have evolved in multiple ways after the divergence of the common ancestors of the mammalian and fish lineages. Expansion of the integrin family may also correlate with expansion of extracellular matrix gene families, e.g. collagens, in teleosts.
At least six paralogs of the integrin β1 gene have been identified in zebrafish, demonstrating that this species has a greatly expanded integrin repertoire. Two of these paralogs may share the functions of the single β1 subunit found in higher vertebrates, whereas the remaining paralogs may have acquired novel roles. This is the first description of truncated β1 chains, which we speculate could be secreted proteins that act as regulators of integrin functions.
The complete amino acid sequence of human β1 [Swiss-Prot: P05556] was used to probe the Danio rerio genome [59,61] using TBLASTN to identify genes with highest sequence identity to human β1. To amplify complete β1 sequences, primers were designed using putative 5' and 3' sequences (identified as described above) or from EST sequences on the NCBI database. The following primers were used: β1–1, 5'-ATGGACCTGAAGCTACTTTTCATATC-3' and 5'-CTGATGGCCATTATTTGCCTTCG-3'; β1–2, 5'-ATGGACGTAAGGCTGCTCCTG-3' and 5'-CACGTTCGTCCATTATTTGCCCTC-3'; β1–3, 5'-ATGAAAATGAAGCTGCTGTTATTATC-3' and 5'-CACTTTCCCTCATATCTGGGATTC-3'; β1tr-1, 5'-ATGGATATAACAGTTTTGTTATTATCAG-3' and 5'-ATGTATAACATGAGGTCATGATGTAC-3'; β1tr-2, 5'-ATGGATATAACAGTTTTGTTATTATCAG-3' and 5'-GTATAACATGTGTCTCAATATATGATG-3'.
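As an illustrative aside (not part of the original protocol), reverse primers are written 5' to 3' on the antisense strand, so comparing them with a coding sequence requires taking the reverse complement. A minimal Python sketch follows; the function name `reverse_complement` is our own, and only the primer string is taken from the list above.

```python
# Watson-Crick complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains non-ACGT characters")
    # Complement each base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]

# Reverse primer listed for beta1-1 in the text.
b1_1_reverse = "CTGATGGCCATTATTTGCCTTCG"
sense_strand_view = reverse_complement(b1_1_reverse)
print(sense_strand_view)
```

Read on the sense strand, this primer contains a TAA triplet; whether that triplet is the in-frame stop codon would need to be checked against the full coding sequence.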
Total RNA was prepared from 4-day-old embryos using TRI reagent (Sigma), and reverse transcription was performed using Superscript II (Invitrogen) according to the manufacturer's instructions. PCR reactions were performed using Phusion (New England Biolabs). Cycling parameters were 98°C for 30 s, followed by 40 cycles of 98°C for 10 s, 60°C for 20 s and 72°C for 60 s. An additional sequence, β1tr-3, was amplified from reverse transcribed RNA from adult kidney (prepared as described below) using the same primers as for β1tr-2. The PCR reactions generated products of ~2.4 kb for β1–1, β1–2 and β1–3, and ~1.9 kb for β1tr-1, β1tr-2 and β1tr-3. EST clones ESTa and ESTb were obtained from RZPD German Resource Center for Genome Research, product numbers IRAKp961B08165Q [IMAGE:6525557] from adult kidney cDNA library and IMAGp998I1214695Q3 [IMAGE:7001749] from whole adult body cDNA library, respectively.
PCR products were analysed on 1.5% agarose gels. Sequencing of PCR products or EST clones was performed using the BigDye cycle sequencing kit (Applied Biosystems). PCR products from reactions using the 5'-ATGGATATAACAGTTTTGTTATTATCAG-3' forward primer contained a mixture of sequences and therefore these products were subcloned using the Zero Blunt TOPO cloning kit (Invitrogen). Sequencing of individual clones revealed a single most abundant sequence (β1tr-1, β1tr-2 or β1tr-3). Alignment of zebrafish and human sequences was performed using CLUSTAL W.
The β1 integrin sequences identified in Danio rerio were aligned with the human β1 sequence [Swiss-Prot: P05556] and the ancestral β1-like gene previously identified in Ciona intestinalis (JGI Ciona v1.0 ci0100141446) using CLUSTAL X. Gap-containing sites were removed from each alignment and Maximum Likelihood trees were inferred using PROML from the PHYLIP package. The JTT model of amino acid substitutions was used with and without global rearrangements and correction for rate heterogeneity (α value obtained from TREEPUZZLE). The topologies of the trees were tested using three independent methods. Neighbor-Joining and Maximum Parsimony bootstrap replicates were obtained using the PHYLIP package. Bayesian tree inference values were produced from the MrBayes programme.
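The gap-removal step described above can be sketched as follows. This is a minimal illustration, not the procedure or software actually used in this study: the function name `remove_gap_columns`, the gap characters, and the toy sequences are all assumptions made for the example.

```python
def remove_gap_columns(alignment, gap_chars="-."):
    """Drop every alignment column in which at least one sequence has a gap.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    length = len(seqs[0]) if seqs else 0
    # Keep only the columns that are gap-free in all sequences.
    keep = [i for i in range(length)
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}

# Toy alignment (hypothetical sequences, not data from this study).
toy = {"seq_a": "MK-LLV", "seq_b": "MKALLV", "seq_c": "MRAL-V"}
ungapped = remove_gap_columns(toy)
print(ungapped)
```

Removing whole columns, rather than individual gap characters, keeps the remaining residues aligned, so that every site passed to tree inference is represented in all sequences.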
Embryos were harvested at different times post fertilization. Tissues from adult fish (~9 months old) were removed by dissection and flash frozen in liquid nitrogen. Total RNA was prepared using Trizol (QIAGEN). Reverse transcription was performed using M-MLV reverse transcriptase or Omniscript (Invitrogen) according to the manufacturer's instructions. PCR reactions were performed using recombinant Taq (a gift from P. Walker, University of Manchester, UK) or Advantage 2 polymerase (BD Biosciences). For the truncated β1 sequences, primers were designed to give a single product of approx. 600 bp that is unique to β1tr-1, β1tr-2 or β1tr-3. The following primers were used:
β1–1, 5'-ATGGACCTGAAGCTACTTTTCATATC-3' and 5'-GTGACGTTTCTCCAGCCAATGTG-3'; β1–2, 5'-GATGGTAATGAATGCACCAAGGC-3' and 5'-GGAGTCGGAGGTAAGCGTTCC-3'; β1–3, 5'-GTGTTGTTTGATATAGAAATCACGGCT-3' and 5'-CGTATCCCACTTGGCATTATTTTTCTC-3'; β1tr-1, 5'-CCAGGATCTGGATGCATACTG-3' and 5'-ATGTATAACATGAGGTCATGATGTAC-3'; β1tr-2, 5'-CCAGGATCTGGATGCATACTG-3' and 5'-GTATAACATGTGTCTCAATATATGATG-3'; β1tr-3, 5'-CATGATGAGGTGCTGGCGGATG-3' and 5'-GTATAACATGTGTCTCAATATATGATG-3'.
Cycling parameters were 95°C for 2 min, followed by 30 cycles of 95°C for 30 s, 60°C for 20 s and 72°C for 60 s. PCR products were analysed on 1.5% agarose gels. The identity of selected products was confirmed by DNA sequencing. As a control, a fragment of β-actin was amplified using the primers 5'-CCACGAGACCACCTTCAACT-3' and 5'-CATTGTGAGGAGGGCAAAGT-3' for 28 cycles of 95°C for 30 s, 55°C for 40 s and 72°C for 60 s. Negative controls consisted of no cDNA template reactions.
APM and MJH conceived of the study. APM and JAM carried out the database searching, PCR reactions and sequence analysis; JHJ performed the phylogenetic analyses; ACG prepared RNA for RT-PCR analysis; and APM and ACG carried out the RT-PCR analysis. All authors participated in the interpretation of data and/or in the writing of the manuscript.
DNA sequence alignment of truncated β1 paralogs. TGA stop codon is shown in bold font. The portion of the 3'UTR used to design reverse primers is shown boxed. PolyA tail region is shown underlined. Sequence identities are indicated by *.
Protein sequence alignment of zebrafish β1 sequences with human and Ciona β1 orthologs. Alignment was performed using Clustal X and is displayed using Boxshader. Levels of sequence conservation are indicated (>50% identical, red; conservative substitutions, blue). Note that the refined Ciona β1 sequence is missing the signal peptide, the PSI domain and part of the hybrid domain.
JHJ and ACG are supported by BBSRC PhD studentships, AJLH by a fellowship from Cancer Research UK, and APM and MJH by The Wellcome Trust. We thank Paul Walker for the gift of recombinant Taq, and Drs Sally Stringer, Paul Walker and Patrick Buckley for helpful discussions.