The novel findings of this report are: (i) extensive expansion of the β1 integrin gene family has taken place in zebrafish, (ii) a group of divergent β1 paralogs with regulatory site mutations is present, some of which are novel truncated forms that lack the transmembrane and intracellular domains, (iii) the different β1 paralogs exhibit varied patterns of temporal and tissue expression. Our findings raise several important questions. For example: Why do zebrafish have multiple versions of the single β1 gene found in higher vertebrates? What is the function of the divergent paralogs? Which α subunits does each paralog pair with?
Gene duplication is an important mechanism for the generation of phenotypic complexity, diversity and innovation. Gene duplication can occur through tandem duplication of an individual gene, segmental duplication of a portion of a chromosome, or whole genome duplication. Recent analyses have provided strong support for a whole genome duplication in the lineage of ray-finned fishes approximately 350 million years ago [34
]. Following this duplication, most of the gene duplicates were rapidly lost. However, a small proportion of these duplicates were retained. Hence, for approximately 20% of single-copy human genes, two corresponding orthologs are found in zebrafish [37
]. It is possible that β1–1 and β1–2 arose from the fish-specific whole genome duplication. Consistent with this possibility, β1–1 and β1–2 have different chromosomal locations and furthermore, both gene loci show synteny with the p11.22 region of human chromosome 10, which contains the β1 (ITGB1) gene (APM, unpublished data). Two co-orthologs of β1 are also found in other teleost species such as fugu [38
In the classical model of gene duplication [39
], new gene duplicates face one of two fates: either one copy mutates to a pseudogene (called nonfunctionalization), or one copy preserves the original function, and the other copy mutates freely until by chance it obtains a sequence that confers a new, beneficial, positively selected function (called neofunctionalization). In a third possibility, called subfunctionalization or subfunction partitioning, the complementary partitioning of ancestral subfunctions is the mechanism that preserves gene duplicates [37
]. Two classic examples of subfunction partitioning in zebrafish are the pair of sox9 genes, sox9a and sox9b [40
], and the two mitf genes, mitfa and mitfb [41
]. It is possible that ancestral β1 integrin functions have been partitioned between β1–1 and β1–2. Subfunction partitioning can occur through tissue-specific expression of gene duplicates and/or expression at different times of development. We found that there are differences in temporal expression of β1–1 and β1–2, with the expression of β1–2 peaking in early development, whereas expression of β1–1 was highest during later development. Also some differences in the tissue expression between these two paralogs were noted, with low expression of β1–1 in adult intestine. It is also possible that partitioning could take place if each β chain associates with a distinct subset of α chain partners. The possible subfunction partitioning of β1 roles by β1–1 and β1–2 may provide distinct advantages for functional genetic analyses in zebrafish. Whereas knockout of the single β1 gene in mice is very early embryonic lethal [9
], knockout (or knockdown) of only one of these genes in zebrafish may have a much less severe phenotype, making it possible to study the roles of β1 integrins during the later stages of development.
In contrast to β1–1 and β1–2, it seems likely that the divergent β1 paralogs arose through tandem duplications. Taking the genomic, phylogenetic and sequence identity data together, the most likely scenario is that the precursor of β1–2 underwent a tandem duplication that gave rise to the ancestor of β1–3; a subsequent duplication of this paralog gave rise to the progenitor of the truncated paralogs. It seems likely that the divergent paralogs have acquired novel functions. With the exception of β1tr-3, these paralogs are expressed at later times of development and in a more tissue-restricted manner than β1–1 and β1–2. It therefore appears that the divergent paralogs may be involved in tissue differentiation, maintenance or remodelling, rather than in development/organogenesis. The expression of many of these paralogs in tissues such as liver and kidney supports this view.
One clue to the function of the divergent paralogs is that they possess a double substitution (DD to SN) in the regulatory ADMIDAS cation-binding site in the A-domain. Preliminary results suggest that engineering an identical substitution in human β1 (in a heterodimer with α5 subunit) produces a constitutively inactive integrin, i.e., an integrin with very low affinity for ligand (J. Askari and A.P. Mould, unpublished data). It is therefore possible that β1–3 could act as a dominant-negative β1 subunit by competing with β1–1 and β1–2 for α subunit partners or intracellular binding factors. It is also proposed that expression of high levels of unoccupied integrins can trigger apoptosis [42
]; hence, expression of β1–3 could also regulate cell survival. If the truncated paralogs are unable to associate with α subunits they could be secreted into the extracellular environment. There are now many examples of cell surface receptors that also have a secreted antagonistic counterpart, e.g. VEGFR, FGFR-1 and IL-1R [43
]. It is therefore intriguing to speculate that truncated integrin paralogs could have a similar mode of action.
It is also possible that the divergent paralogs could have acquired new binding partners. The very high degree of sequence conservation of the PSI, hybrid domain and EGF-1 between the divergent paralogs suggests that these domains form a key functional region of these polypeptides, and furthermore that the function of these domains is common to all the divergent β1 paralogs. Such a degree of conservation suggests that these domains interact with protein ligands (even though there is currently no evidence in other integrins that these domains have extracellular ligands). The PSI, hybrid, and EGF-1 domains lie adjacent to each other at, or close to, the knee region of the integrin [15
]. Proteins that bind to these domains could therefore alter function, either by constraining the integrin in a bent (low affinity) form or by perturbing the interactions between PSI/hybrid/EGF-1 domains and other integrin domains, thereby favouring the unbent (high affinity) form. A key hinge point in the β subunit knee region is thought to be near the start of EGF-2 [15
]. Interestingly, some of the truncated paralogs have a six amino acid deletion in this linker region (Fig. ), which is likely to change the conformational state of the protein, and may therefore modify function. Intriguingly, the second EGF repeat in β1tr-3 is essentially identical to that of β1–3, implying that this repeat may also have a conserved role between these two paralogs.
It is currently unclear how many separate genes are represented by the truncated β1 paralogs and ESTs. Although the sequences β1tr-1 and β1tr-2 are 98% identical to each other they are probably distinct gene products because they have different 3'UTRs. In addition, although β1tr-2 and β1tr-3 may have identical 3'UTR sequences, it appears unlikely that β1tr-3 is an alternatively spliced form of β1tr-2 because the differences between the two coding sequences are spread throughout the primary structures (Fig. and Additional file 1
). Additionally, the truncated paralogs cannot simply represent different alleles of the same gene because at least three of these paralogs could be amplified from RNA derived from a single fish (APM and Paul Walker, unpublished results). Furthermore, the genomic sequence that corresponds to ESTb (which is 98% identical in coding sequence to β1tr-2) does not contain sequences that would allow the generation of different transcripts by alternative splicing (data not shown). In summary, the currently available evidence suggests that the truncated paralogs originate from separate (yet to be characterised) genes. Since the genomic scaffold is assembled from shotgun sequences, it is possible that, due to the almost identical sequences of the truncated paralogs, the scaffold is incorrectly assembled and additional genes of truncated paralogs may be present in the same region. In the future, the availability of finished clones in the region of the genome that contains 'ESTb' for detailed analysis should help to clarify the precise number of truncated β1 paralogs in the zebrafish genome. A further puzzle is why none of the truncated β1 sequences we amplified from zebrafish RNA exactly matched the ESTa or ESTb sequences. A possible explanation is that the β1tr-1, β1tr-2 and β1tr-3 sequences were only the predominant sequences obtained by subcloning of the RT-PCR products. Other clones obtained have not yet been fully sequenced; some of these may match the EST sequences more precisely.
Truncated forms of β subunits have previously been reported in sponge and man [7
]. However, these polypeptides represent alternatively spliced forms of full-length β subunits that are truncated within the hybrid domain, and therefore probably result in non-functional proteins [47
]. In contrast, the truncated β1 paralogs reported here appear to be unique gene products, and the position of the truncation would preserve the structure and function of all of the extracellular domains with the exception of the βTD.
A key subject of future investigations will be to identify the α subunit partners of the different paralogs. All twelve α chains that pair with β1 in higher vertebrates have orthologs in the zebrafish genome, and at least two of these α chains have two co-orthologs (APM, unpublished results). It seems very unlikely that all of the β1 paralogs identified here are able to associate with all of the >14 potential α chain partners. For example, if each of the six β1 paralogs has retained the ability to pair with all of the α chains then a minimum of 84 different heterodimers could form. This would represent a huge increase in complexity over the twelve heterodimers found in higher vertebrates. It is likely, therefore, that in this process of multimeric protein evolution, previously termed 'molecular incest' [48
], only a limited number of possible combinations was retained based on function.
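The heterodimer estimate above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original analysis: the function and constant names are hypothetical, and the counts are those stated in the text (six β1 paralogs, twelve α chains, at least two of which have two co-orthologs). Because "at least two" is a lower bound, the α chain count of 14, and hence the product, is also a lower bound under the assumption that every paralog pairs with every α chain.

```python
# Counts taken from the text; all names here are illustrative only.
N_BETA1_PARALOGS = 6            # beta1 paralogs described in this report
N_ALPHA_IN_HIGHER_VERTEBRATES = 12  # alpha chains that pair with beta1
N_ALPHA_DUPLICATED = 2          # "at least two" have two co-orthologs (lower bound)


def min_alpha_chains(n_alpha: int, n_duplicated: int) -> int:
    """Lower bound on zebrafish alpha chains: each duplicated chain adds one copy."""
    return n_alpha + n_duplicated


def all_pairings(n_alpha: int, n_beta: int) -> int:
    """Number of alpha/beta heterodimers if every beta pairs with every alpha."""
    return n_alpha * n_beta


zebrafish_alpha = min_alpha_chains(N_ALPHA_IN_HIGHER_VERTEBRATES, N_ALPHA_DUPLICATED)
zebrafish_dimers = all_pairings(zebrafish_alpha, N_BETA1_PARALOGS)
# A single beta1 gene paired with twelve alpha chains gives the reference count.
reference_dimers = all_pairings(N_ALPHA_IN_HIGHER_VERTEBRATES, 1)

print(zebrafish_alpha, zebrafish_dimers, zebrafish_dimers // reference_dimers)
```

Under these assumptions the calculation gives 14 α chains and 84 possible heterodimers, seven times the twelve found in higher vertebrates.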
It is not yet clear if divergent β1 paralogs are found in other teleosts. Currently only limited gene information is available on other fish species, with the exception of the pufferfish Takifugu rubripes and Tetraodon nigroviridis. Analysis of the pufferfish genomes suggests that only two β1 paralogs are present in fugu and tetraodon; these paralogs are closely related to zebrafish β1–1 and β1–2 (AP Mould, unpublished results). However, since β1–3 probably arose from an ancient duplication of the ancestor of β1–2 it would be surprising if the former paralog was retained only in the zebrafish lineage. Furthermore, although the divergent β1 paralogs have not been detected in pufferfish genomes, pufferfish may have retained fewer gene duplicates than zebrafish [49
The presence of expanded gene families is hypothesized to have made a major contribution to the extraordinary diversity and evolutionary success of teleosts, which make up half of all vertebrate species. It is known that fish genomes are more 'plastic' than other vertebrate genomes, partly due to higher rates of gene duplication [50
]. Fish-specific novel members of gene families may contribute to a large extent to the distinct physiology of fishes and mammals, while differential retention of duplicate genes may have facilitated the isolation of emerging species during the vast radiation of teleosts [51
]. Our findings show that integrins are a dynamic family of genes that have evolved in multiple ways after the divergence of the common ancestors of the mammalian and fish lineages. Expansion of the integrin family may also correlate with expansion of extracellular matrix gene families, e.g. collagens, in teleosts [34