The nucleotide sequence and domain structure of human RPTPρ
The nucleotide sequence of hRPTPρ cDNA predicts a 1463-amino-acid (AA) polypeptide containing at least eight domains. The polypeptide consists of extracellular and intracellular segments joined by a transmembrane segment. The extracellular segment contains a signal peptide (AA 1-25), a MAM (meprin, A5/neuropilin, RPTPμ) domain (AA 32-191), an Ig-like domain (AA 210-266) and four FN-III repeats (AA 286-369, 382-471, 483-576 and 593-675). A potential proteolytic cleavage site lies at AA 632-635, within the fourth FN-III repeat. The transmembrane segment spans AA 765-785. The intracellular region contains a juxtamembrane 'wedge' region (AA 888-920) and two highly conserved phosphatase domains (AA 1061-1162 and 1351-1456). The 11 hallmark amino acids that define the catalytic core of the first phosphatase domain are located at AA 1104-1114. The stop codon follows residue 1463.
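The coordinates above can be held in a small lookup table. The sketch below is illustrative only: it assumes 1-based, inclusive residue coordinates exactly as listed, and `DOMAINS`, `domain_at` and `span_within` are hypothetical helper names, not part of any published tool. It confirms, for example, that the putative cleavage site (AA 632-635) falls inside the fourth FN-III repeat and the catalytic core (AA 1104-1114) inside the first phosphatase domain.

```python
from typing import List, Optional, Tuple

# Human RPTPρ domain coordinates as listed in the text (1-based, inclusive).
# D1/D2 denote the first and second phosphatase domains.
DOMAINS: List[Tuple[str, int, int]] = [
    ("signal peptide", 1, 25),
    ("MAM", 32, 191),
    ("Ig-like", 210, 266),
    ("FN-III 1", 286, 369),
    ("FN-III 2", 382, 471),
    ("FN-III 3", 483, 576),
    ("FN-III 4", 593, 675),
    ("transmembrane", 765, 785),
    ("wedge", 888, 920),
    ("D1", 1061, 1162),
    ("D2", 1351, 1456),
]


def domain_at(residue: int) -> Optional[str]:
    """Return the annotated domain containing `residue`, or None for linkers."""
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    return None


def span_within(start: int, end: int) -> Optional[str]:
    """Return the domain that contains the whole span [start, end], if any.

    Each domain is a contiguous interval, so if both ends fall in the same
    domain, every residue in between does too.
    """
    first, last = domain_at(start), domain_at(end)
    return first if first is not None and first == last else None


if __name__ == "__main__":
    print(span_within(632, 635))    # FN-III 4 (putative cleavage site)
    print(span_within(1104, 1114))  # D1 (catalytic core residues)
```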
Human RPTPρ genomic organization
We have determined that the region encompassing human RPTPρ is contained within 10 contiguous PAC clones and 1 BAC clone (dJ269M15, dJ47A22, dJ753D4, dJ914M10, bA32G22, dJ232N11, dJ3E5, dJ230I19, dJ81G23, dJ707K17 and dJ1121H13; Sanger Centre, Chromosome 20 group) (Figure 2). We ordered these clones by identifying the RPTPρ exons contained in each. The RPTPρ gene spans at least 1 Mbp, and its coding sequence comprises at least 33 exons, several of which are alternatively spliced. A prominent feature of the RPTPρ gene structure is the considerable variability in exon spacing (Figure 2). Exons 1-19 extend over the initial ~1000 kbp of the gene; exons 1-10 are widely separated, whereas exons 10-19 are more closely spaced. Of particular note are introns 1 and 7, which are ~300 and ~200 kbp long, respectively, considerably longer than the next largest intron. In contrast, exons 20-28 and 29-32 form two tight clusters that together span approximately 60 kbp. This pattern of exon organization appears to be characteristic of most RPTPs, as it is also observed in RPTPγ [8], LAR [9], CD45 [10] and RPTPα [11]. Each of these phosphatases has at least one very large intron in the 5' region of the gene. The feature is not restricted to receptor-like phosphatases: it is also present in a number of adhesion receptor genes, including E-cadherin, N-cadherin, P-cadherin, N-CAM, deleted in colorectal cancer (DCC), axonin-1 and F11 (discussed in [12]).
Figure 2 Genomic organization of the human RPTPρ gene. Exons are shown as vertical bars and introns as thin horizontal lines. Thicker horizontal lines represent PAC (dJ) and BAC (b) clones (Sanger Centre, Chromosome 20 group) containing the RPTPρ …
The exon and intron sizes and exon/intron junctional sequences of the human RPTPρ gene are detailed in Table 1. The majority of 5' and 3' splice sites are consensus sequences. Exon length varies moderately, from 30 to 297 bp: approximately one third of the exons are shorter than 100 bp, while the remaining two thirds fall in the 100-300 bp range. Intron size varies far more, from 725 to 303,715 bp. The largest number of introns (15) falls into the 10⁴ to 10⁵ bp bin, and somewhat fewer (12) fall into the 10³ to 10⁴ bp bin. Only 5 introns lie outside this range: three fall into the 10² to 10³ bp range, and two unusually long introns in the region encoding the extracellular segment exceed 10⁵ bp.
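The size classes used above are decades on a logarithmic scale (10² to 10³ bp, 10³ to 10⁴ bp, and so on). A minimal sketch of that binning, assuming half-open bins [10^k, 10^(k+1)); the helper names are hypothetical, and only the two extreme sizes (725 and 303,715 bp) come from the text.

```python
from collections import Counter
from typing import Iterable, Tuple


def decade_bin(size_bp: int) -> Tuple[int, int]:
    """Return (k, k + 1) such that 10**k <= size_bp < 10**(k + 1).

    Counting decimal digits keeps exact powers of ten in the right bin
    without relying on floating-point logarithms.
    """
    if size_bp <= 0:
        raise ValueError("intron size must be a positive number of bp")
    k = len(str(size_bp)) - 1
    return (k, k + 1)


def bin_counts(sizes_bp: Iterable[int]) -> Counter:
    """Tally intron sizes per decade bin."""
    return Counter(decade_bin(s) for s in sizes_bp)


if __name__ == "__main__":
    # Shortest and longest human RPTPρ introns reported in the text.
    print(decade_bin(725))      # (2, 3): the 10^2 to 10^3 bp bin
    print(decade_bin(303_715))  # (5, 6): longer than 10^5 bp
```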
Table 1 Columns (left to right): Exon number, protein domain, exon size, exon/intron junctional sequences, and intron phases are shown. Amino acids (standard one-letter code) are listed below the coding nucleotides. D1 and D2 represent the first and second phosphatase …
The RPTPρ extracellular segment is composed of discrete protein domains, and the borders of these modules correspond to the boundaries of exon clusters. There are three possible junctional phases between exons and introns: phase 0 introns lie between codons, whereas phase 1 and phase 2 introns interrupt a codon after its first and second nucleotides, respectively. Figure 3A shows the distribution of intron phases relative to the domain structure of RPTPρ. Within the RPTPρ gene, phase 0 and phase 1 introns are comparable in number (15 and 12, respectively), whereas there are only five phase 2 introns in the entire gene. A notable feature of RPTPρ gene structure is that phase 1 introns appear to be preferentially associated with the extracellular segment, where they flank each of the protein domain exon modules. The intracellular segment is almost devoid of phase 1 introns. Conversely, phase 0 introns are primarily associated with the intracellular segment and are only infrequently represented in the extracellular region.
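Intron phase follows directly from the number of coding nucleotides that precede the intron. The sketch below assumes a list of coding (CDS) lengths per exon, counted from the start codon; the function name and the toy lengths are hypothetical and do not correspond to RPTPρ exons.

```python
from collections import Counter
from typing import List


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Return the phase (0, 1 or 2) of each intron.

    The phase is the number of coding nucleotides upstream of the intron,
    modulo 3: 0 means the intron falls between codons, 1 or 2 means it
    interrupts a codon after its first or second nucleotide. One value is
    returned per intron, i.e. for every exon except the last.
    """
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


if __name__ == "__main__":
    toy_exons = [10, 6, 8, 12]    # hypothetical coding lengths
    phases = intron_phases(toy_exons)
    print(phases)                 # [1, 1, 0]
    print(Counter(phases))        # tally per phase class
```

Tallying the resulting phases over a whole gene is what produces counts such as the 15 phase 0, 12 phase 1 and 5 phase 2 introns reported for RPTPρ.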
Figure 3 A. Relationship between RPTPρ protein domains, corresponding exons and associated intron phases. Downward arrows indicate intron phases. Protein domains (center) show good correspondence with exon boundaries (bottom line). B. Percentage nucleotide …
Recently, RPTPs have been examined in sponges [13], the phylogenetically oldest extant metazoans. Although sponges are multicellular, they lack the cellular cohesiveness of higher eukaryotes. When phosphatase sequences from yeast, sponge and human were aligned and rooted cladograms constructed, the common early ancestor of the phosphatase domains appeared to be the yeast enzyme, and the second phosphatase domain appeared to have arisen as a duplication of the first [13]. The RPTP extracellular domain was acquired during the transition from single-celled to multicellular organisms. In RPTPρ, the extracellular and intracellular exon modules are separated by phase 1 and phase 0 introns, respectively, and the intracellular introns are much smaller than those in the extracellular segment. Together, these observations suggest that the RPTPρ extracellular and intracellular segments originated as separate modular proteins that evolved by exon shuffling and duplication, respectively [13]. The two segments would then have become linked to form a functional transmembrane molecule during the transition from single-celled to multicellular organisms.
Over half of the human genome consists of repeat sequences [16], and it was the first repeat-rich genome to be sequenced. Analysis of these repeats can provide important clues to the evolutionary history of a particular region or gene. Transposon-derived elements form the largest category of repeats and include LINEs, SINEs, LTRs and DNA elements. In the RPTPρ gene, the most common of these are LINE1 (7.6%) and LINE2 (2.0%); the SINEs Alu (4.2%), MIR (3.6%) and THE (0.65%); LTR (0.7%); and the DNA elements MLT (2.5%), MER (2.5%) and MST (0.5%). Less common elements found in the RPTPρ gene include Tigger elements in introns 2, 7 and 9 (0.5%), HAL in introns 2 and 7 (0.28%), MAD in introns 1 and 16 (0.013%) and U2 in intron 2 (0.006%). There is also a Charlie repeat in intron 7 (0.005%). In addition to the transposon-derived repeats, the gene contains a pseudogene in intron 7, a tRNA-derived repeat in intron 30 and 133 variable number tandem repeats (VNTRs/microsatellites). The G/C content of the RPTPρ gene is approximately 42%. Descriptions of these repeat elements can be found in Repbase at http://www.girinst.org/.
The overall fraction of the RPTPρ gene made up of repeat sequences is about 45% lower than that of the human genome as a whole. In the human genome, LINEs account for 21% of the sequence, SINEs 13%, LTRs 8% and DNA elements 3% [16]. In RPTPρ, LINEs account for 9.6% of the gene sequence, SINEs 8.4%, LTRs 0.7% and DNA elements 6.3%. The significance of this deviation from the genome-wide distribution is unknown.
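The comparison above is simple arithmetic on the class percentages. A sketch, assuming each figure is the percentage of total sequence occupied by that repeat class (values copied from the two sentences above); it recomputes the class totals, the relative shortfall of the gene versus the genome, and the per-class ratios.

```python
from typing import Dict

# Percent of sequence occupied by each repeat class, as quoted in the text.
GENOME: Dict[str, float] = {"LINE": 21.0, "SINE": 13.0, "LTR": 8.0, "DNA": 3.0}
RPTP_RHO: Dict[str, float] = {"LINE": 9.6, "SINE": 8.4, "LTR": 0.7, "DNA": 6.3}


def total(percentages: Dict[str, float]) -> float:
    """Sum of the class percentages."""
    return sum(percentages.values())


def relative_shortfall(gene: Dict[str, float], genome: Dict[str, float]) -> float:
    """Fraction by which the gene's repeat content falls below the genome's."""
    return 1.0 - total(gene) / total(genome)


def class_ratios(gene: Dict[str, float], genome: Dict[str, float]) -> Dict[str, float]:
    """Gene/genome ratio for each repeat class (>1 means enriched in the gene)."""
    return {cls: gene[cls] / genome[cls] for cls in genome}


if __name__ == "__main__":
    print(f"genome {total(GENOME):.1f}%, RPTPρ {total(RPTP_RHO):.1f}%")
    print(f"shortfall {relative_shortfall(RPTP_RHO, GENOME):.0%}")
    for cls, ratio in class_ratios(RPTP_RHO, GENOME).items():
        print(f"{cls}: {ratio:.2f}x")
```

With these inputs the totals are 45% and 25%, giving a shortfall of roughly 44%, in line with the ~45% figure quoted above. The per-class ratios also show that DNA elements, unlike the other classes, are more abundant in RPTPρ than genome-wide.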
cDNA cloning and genomic structure of mouse RPTPρ
The mouse RPTPρ cDNA was cloned using a combination of PCR and 5'-RACE. The mouse cDNA (GenBank accession #AF152556) encodes a 1451-AA polypeptide that is 96% identical to the human protein and predicts an analogous domain structure (Figure ). The Celera Discovery System mouse genomic database was used to identify clones containing RPTPρ exons. These clones were then ordered and analyzed to identify exon/intron junctions. Exon and intron sizes, exon/intron junctional sequences and intron phases of the mouse RPTPρ gene are shown in Table 2. In general, the exon/intron splice sites in mouse RPTPρ conform to the expected GT-AG intron consensus, and the intron phases in mouse (Table 2) are identical to those in the human gene (Table 1). Although the two species share approximately 89% nucleotide identity overall, the degree of identity, examined exon by exon, differs slightly between the extracellular and intracellular segments (Figure 3B): the overall identity of the mouse and human extracellular and intracellular segments is 89% and 92%, respectively. In general, there is slightly greater divergence between the two species in the extracellular segment; for example, mouse and human exons 1 and 9 share 78% and 95% identity, respectively. Within the intracellular segment, mouse exon 21 is 86% identical to the human exon, and exon 24, which contains the first half of the catalytic core, is 96% identical. Notably, the alternatively spliced exons 14, 16 and 22a (discussed below) are 100%, 97% and 95% identical, respectively, indicating a high degree of conservation between mouse and human. In summary, the mouse and human genes are virtually identical in the number and size of their exons, which differ only slightly in nucleotide sequence.
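Exon-by-exon identities like those above are computed from aligned sequences. The text does not state how the alignments were built or how gaps were scored, so the sketch below simply assumes a pre-computed, gapped pairwise alignment and counts identical positions over columns where neither sequence has a gap; other conventions (for example, dividing by the full alignment length) give somewhat different values. The sequences in the example are toy strings, not RPTPρ exons.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of ungapped aligned columns with identical residues.

    `a` and `b` must already be aligned to equal length, with '-' as the
    gap character; columns containing a gap in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))  # 80.0
    print(percent_identity("AC-T", "ACGA"))              # 2 of 3 columns identical
```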
Table 2 Columns (left to right): Exon number, protein domain, exon size, exon/intron junctional sequences, and intron phases are shown. Amino acids (standard one-letter code) are listed below the coding nucleotides. D1 and D2 represent the first and second phosphatase …
Exon/intron organization of the RPTPρ extracellular segment
The relationship between RPTPρ exon organization and protein domain boundaries is shown in Figure 3A and in Tables 1 and 2. Within the extracellular segment, exon 1 encodes the signal peptide, and exons 2, 3 and 4 encode the single N-terminal MAM domain, a distinguishing feature of all type IIB phosphatases. Although the function of the RPTPρ MAM domain is unclear, other type IIB phosphatases show homophilic binding properties: when heterologously expressed in non-adherent cells, both RPTPμ and RPTPκ bind homophilically to induce the formation of large, calcium-independent aggregates [17]. Furthermore, deletion of the RPTPμ MAM domain eliminated aggregation [19], implying that the domain has a crucial role in homophilic cellular interactions.
The three RPTPρ MAM exons differ widely in size: 126 bp (exon 2), 272 bp (exon 3) and 82 bp (exon 4). All MAM-associated introns are in phase 1, with the exception of the second internal intron, which is in phase 0. MAM domains have been identified in a variety of cell adhesion molecules. We have determined the exon structure of the MAM domain in all four human RPTP IIB genes, and in human zonadhesin and human enteropeptidase (NCBI database). The genomic organization of the MAM domain is identical in all four IIB phosphatases. In all RPTP IIB proteins (GenBank #NM_002844, NM_002845, NM_005704 and NM_007050) and in human zonadhesin (GenBank #AF312032) there is an N-terminal MAM domain whose genomic structure is highly conserved. Zonadhesin contains two additional, adjacent MAM domains, whose genomic organization differs from that of the first. The single MAM domain in the human enteropeptidase gene (GenBank #Y19124) is located more internally than that of RPTPρ, close to the transmembrane region. It is composed of four exons, 150, 135, 89 and 125 bp in length, and is unlike any of the IIB and zonadhesin MAM domains. In summary, all known MAM domains are located within the extracellular segment, but within this region their location, exon number and exon size can vary considerably; the size and structure of the exons encoding the most N-terminal MAM domain appear to be distinctive. Because the nucleotide sequence of the RPTPρ MAM domain predicts a protein similar to that found in the other type IIB RPTPs, the RPTPρ MAM domain might also be expected to participate in homophilic interactions, as was shown for RPTPμ [19].
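The MAM exon sizes and intron phases given above are mutually consistent, because an exon of length L entered at phase p leaves the next intron at phase (p + L) mod 3. The sketch below checks this, taking as its only inputs the phase 1 intron that precedes exon 2 and the three exon lengths from the text; `propagate_phases` is a hypothetical helper name.

```python
from typing import List

# RPTPρ MAM-domain exon lengths (bp) from the text: exons 2, 3 and 4.
MAM_EXON_LENGTHS = [126, 272, 82]


def propagate_phases(phase_in: int, exon_lengths: List[int]) -> List[int]:
    """Return the phase of the intron following each exon.

    `phase_in` is the phase of the intron immediately upstream of the
    first exon in `exon_lengths`; each exon shifts the phase by its length.
    """
    phases = []
    phase = phase_in
    for length in exon_lengths:
        phase = (phase + length) % 3
        phases.append(phase)
    return phases


if __name__ == "__main__":
    # The intron before exon 2 flanks the MAM domain and is in phase 1.
    print(propagate_phases(1, MAM_EXON_LENGTHS))  # [1, 0, 1]
```

The result reproduces the reported pattern: the first internal intron (after exon 2) is in phase 1, the second internal intron (after exon 3) in phase 0, and the downstream flanking intron in phase 1. Exon 2 (126 bp, a multiple of 3) is therefore a symmetric 1-1 exon.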
Adjacent to the MAM domain, the single Ig-like domain is split into two similarly sized exons (5 and 6) by one phase 0 intron (Figure 3A). The introns flanking the Ig-like domain are in phase 1. In the majority of genes encoding Ig-like domains, each domain is encoded by a single exon, whereas in others, such as N-CAM, two exons encode each domain [20]. The single Ig-like domain of the RPTPρ gene falls into the latter category, suggesting a closer relationship to N-CAM-like molecules. LAR has characteristics of both groups [9], a feature it shares with several other genes, such as perlecan [21] and DCC [22]. Within the RPTP IIB family, the Ig-like domain appears to act in conjunction with the MAM domain to bring about homophilic cell-cell interactions [23].
Following the Ig-like domain are four FN-III repeats (Figure 3A), each of which begins with a highly conserved proline residue. FN-III domains are found in a wide range of proteins and have recently been shown to be involved in retinal axon target selection [24]. As a general rule, FN-III domains are encoded by either one or two exons [25]. Within genes that encode multiple FN-III domains, exon organization may be of one type or a combination of the two. For example, N-CAM has two exons for each FN-III domain [26], whereas tenascin [27] and LAR [9] have a mixture of both types. In the RPTPρ gene, there is a good correlation between exon structure and FN-III boundaries (Figure 3A), although the number of exons per domain varies: each of the first two FN-III repeats is encoded by a single exon (exons 7 and 8, respectively), whereas the third FN-III repeat is encoded by two exons (9 and 10). Somewhat atypically, the fourth FN-III repeat, which contains the putative proteolytic cleavage site, is encoded by three exons (11, 12 and 13). RPTPρ FN-III repeats share high sequence similarity with those of N-CAM, but only the third FN-III domain in RPTPρ is encoded by two exons. Unlike the type IIA phosphatase LAR, the RPTPρ gene contains no exon encoding more than one fibronectin domain; however, like LAR, it has an FN-III domain encoded by three exons.
In the majority of known cases, the exon/intron junctions corresponding to FN-III domain boundaries are in phase 1. When two exons encode an FN-III domain, an intron interrupts the coding region in a central, relatively non-conserved part of the domain, and this junction may be in any phase. In the RPTPρ gene, the introns separating the individual FN-III repeats are in phase 1; the intron internal to the third repeat is in phase 0, and the two introns internal to the fourth repeat are in phase 2 and phase 0, respectively.
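Conversely, the phases flanking an exon fix its length modulo 3: an exon entered at phase p and left at phase q must have a length congruent to (q − p) mod 3, and symmetric exons (p = q) are multiples of 3, so they can be inserted or skipped without shifting the reading frame. The sketch applies this to the FN-III exons using only the phases stated above (exons 7-12; the phase of the intron after exon 13 is not given here, so exon 13 is omitted). The dictionary layout and helper names are illustrative assumptions.

```python
from typing import Dict, Tuple

# (phase of upstream intron, phase of downstream intron) for RPTPρ FN-III
# exons, as stated in the text: boundary introns are phase 1, the intron
# inside repeat 3 is phase 0, and those inside repeat 4 are phase 2 then 0.
FN3_EXON_PHASES: Dict[int, Tuple[int, int]] = {
    7: (1, 1),   # FN-III repeat 1
    8: (1, 1),   # FN-III repeat 2
    9: (1, 0),   # FN-III repeat 3, first exon
    10: (0, 1),  # FN-III repeat 3, second exon
    11: (1, 2),  # FN-III repeat 4, first exon
    12: (2, 0),  # FN-III repeat 4, second exon
}


def implied_length_residue(phase_in: int, phase_out: int) -> int:
    """Exon length mod 3 required between introns of the given phases."""
    return (phase_out - phase_in) % 3


def is_symmetric(phase_in: int, phase_out: int) -> bool:
    """True when both flanking introns share a phase (frame-neutral exon)."""
    return phase_in == phase_out


if __name__ == "__main__":
    for exon, (p_in, p_out) in FN3_EXON_PHASES.items():
        residue = implied_length_residue(p_in, p_out)
        kind = "symmetric" if is_symmetric(p_in, p_out) else "asymmetric"
        print(f"exon {exon}: {p_in}-{p_out}, length mod 3 = {residue}, {kind}")
    # Repeat 3 as a whole (exons 9 + 10) sits between two phase 1 introns,
    # so its combined length is a multiple of 3.
    combined = (implied_length_residue(*FN3_EXON_PHASES[9])
                + implied_length_residue(*FN3_EXON_PHASES[10])) % 3
    print(combined)  # 0
```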
Exon/intron organization of the RPTPρ intracellular segment
Following the transmembrane segment (exon 15), exons 16-18 encode the juxtamembrane region (Figure 3A; Tables 1 and 2). This segment of the RPTPρ protein is similar to the membrane-proximal region of the type IV phosphatase murine RPTPα, whose crystal structure has been determined [28]. RPTPα exists as a dimer in which the catalytic site of one molecule is blocked by contact with a 'wedge' from the other. Specifically, the 'turn' of the helix-turn-helix motif is inserted into the active site, which holds the WpD loop in the open state [28]. In other phosphatases [29], the WpD loop undergoes a conformational shift upon substrate binding that appears to be crucial for catalysis. Thus, the dimeric form of RPTPα is very likely unable to bind tyrosine-phosphorylated substrates, rendering it catalytically inactive. The negative charge of two adjacent residues within a highly conserved sequence in the juxtamembrane region appears to be crucial for inhibition [28]. In RPTPα, these two residues are negatively charged aspartates. In type IIB RPTPs, the first residue is changed to an alanine in PCP-2 and RPTPμ, and to a serine in RPTPκ; the second residue is retained as a negatively charged glutamate in PCP-2 and RPTPμ, or an aspartate in RPTPκ. These single amino acid changes may indicate a somewhat weaker level of inhibition. This is supported by the crystal structure of RPTPμ, which shows that although a wedge is formed, catalytic activity is not inhibited by its insertion into the active site of the adjacent monomer [31]. In RPTPρ, however, the first residue is a glycine and the second is glutamine, a large polar residue that carries no charge. Thus, the RPTPρ juxtamembrane region is likely to have a conformation different from that of the other phosphatases and a net positive charge, making regulation of phosphatase activity by dimerization-induced wedge inhibition unlikely.
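The argument above turns on the charge carried by the two conserved juxtamembrane residues. A rough tally, assuming the usual side-chain charges at neutral pH (Asp and Glu −1, Lys and Arg +1, histidine and all other residues treated as neutral) and considering only the two positions discussed; the charge of the wider juxtamembrane region depends on neighboring residues that are not listed here.

```python
from typing import Dict

# Approximate side-chain charge at neutral pH; His and all other residues = 0.
SIDE_CHAIN_CHARGE: Dict[str, int] = {"D": -1, "E": -1, "K": 1, "R": 1}

# The two juxtamembrane residues compared in the text (one-letter codes).
WEDGE_PAIRS: Dict[str, str] = {
    "RPTPα": "DD",  # aspartate, aspartate
    "PCP-2": "AE",  # alanine, glutamate
    "RPTPμ": "AE",  # alanine, glutamate
    "RPTPκ": "SD",  # serine, aspartate
    "RPTPρ": "GQ",  # glycine, glutamine
}


def pair_charge(residues: str) -> int:
    """Net side-chain charge of a short peptide given in one-letter code."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in residues.upper())


if __name__ == "__main__":
    for protein, pair in WEDGE_PAIRS.items():
        print(f"{protein}: {pair} -> {pair_charge(pair):+d}")
```

The tally falls from −2 in RPTPα to −1 in PCP-2, RPTPμ and RPTPκ, and to 0 in RPTPρ, mirroring the graded loss of negative charge at these two positions described above.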
Although the extracellular regions of receptor-like phosphatases are highly variable, the intracellular tandem phosphatase domains appear closely related. The structure of the CD45 gene indicates that its two protein tyrosine phosphatase (PTPase) domains have very similar exon/intron organizations, which probably arose by duplication [10]. In RPTPρ, the first and second phosphatase domains are encoded by exons 19-26 and 27-32, respectively (Figure ). The exon structure of the RPTPρ phosphatase domains and that of the homologous domains in PCP-2 (NM_005704), RPTPκ (NM_002844), RPTPμ (NM_002845), LAR [9], CD45 [10], RPTPα [11], RPTPγ [8] and rat Esp/mOST-PTP [32] are compared in Figure 4. We deduced the genomic structure of RPTPκ, RPTPμ and PCP-2 by comparing known cDNA sequences with human genomic clones (NCBI). The positions of the exon boundaries in the phosphatase domains of RPTPρ, RPTPκ, RPTPμ and PCP-2 coincide exactly and correspond well with those of the five other phosphatases. LAR is somewhat anomalous: although the exon/intron structure of its second phosphatase domain is generally similar to that of the other RPTPs, its first phosphatase domain is encoded by fewer, larger exons. The final exon in all nine genes encodes the end of the second phosphatase domain, the short C-terminus and the entire 3'-untranslated region.
Figure 4 Genomic organization of the two phosphatase domains in nine RPTPs. Boxed numbers indicate the number of nucleotides in each exon; interconnecting horizontal lines represent introns (neither is drawn to scale). Note that exon 22a is not shown in order to preserve …
A striking similarity among the RPTP genes is the conservation of exon/intron junction 24/25 in the first phosphatase domain. In LAR, CD45 and RPTPα, this junction interrupts the highly conserved sequence VHCSAGV, part of the catalytic core of the phosphatase [34]. Although this exon/intron junction corresponds exactly in the IIB phosphatases, the last amino acid of the motif is changed from a valine to an alanine. Interestingly, no exon/intron junction is observed at this position in the cytoplasmic PTPase PTP1B [36], an observation that may indicate an early evolutionary divergence of the cytoplasmic and transmembrane PTPases [37].
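The conserved core sequence and its IIB variant can be located with a simple pattern match. This sketch assumes a protein sequence in one-letter code and a pattern that accepts either the valine (VHCSAGV) or the alanine (VHCSAGA) form described above; the sequences in the example are hypothetical fragments, not real RPTP sequences.

```python
import re
from typing import List, Tuple

# VHCSAGV (LAR, CD45, RPTPα) or VHCSAGA (type IIB phosphatases).
CORE_MOTIF = re.compile(r"VHCSAG[VA]")


def find_core_motif(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched motif) for every occurrence."""
    return [(m.start() + 1, m.group()) for m in CORE_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_core_motif("MKTVHCSAGVGRTG"))  # [(4, 'VHCSAGV')]
    print(find_core_motif("QQVHCSAGAGR"))     # [(3, 'VHCSAGA')]
```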
Although the exon/intron structure of the two phosphatase domains was remarkably similar in each of the nine RPTPs examined, exon size and number varied, primarily among exons close to the transmembrane domain. For example, the third exon (135 nt) in the first phosphatase domain of rat Esp/mOST-PTP and RPTPγ is replaced by two smaller exons (37 and 98 nt) in RPTPα, CD45, RPTPρ, PCP-2, RPTPκ and RPTPμ. In rat Esp/mOST-PTP, two smaller exons replace a single exon at the C-terminal end of the first phosphatase domain. Similarly, at the start of the second phosphatase domain, the first exon (174 nt) in RPTPρ, PCP-2, RPTPκ, RPTPμ and LAR is replaced by two smaller exons in rat Esp/mOST-PTP, RPTPα, RPTPγ and CD45. In each case, the combined length of the two smaller exons is virtually identical to that of the single larger exon at the same position. It is unclear whether these differences in exon number resulted from intron gain or from exon fusion.
RPTPρ 3' untranslated region
Following the second phosphatase domain, there is a long (8.0 kb) 3'-untranslated sequence. BLAST comparisons identified a region of the KIAA0283 cDNA (GenBank accession #AB006621) that showed 99% identity to nucleotides 3181 to 4437 of the hRPTPρ sequence. KIAA0283 therefore corresponds to the 3'-UTR of hRPTPρ, which is contained in exon 32. Polyadenylation signals were found at nt 12425 and 12663 (NM_007050).
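Candidate polyadenylation signals such as those reported above are commonly located by scanning the sense strand for the AATAAA hexamer and its frequent ATTAAA variant. A minimal sketch, assuming a DNA or RNA string and restricting the search to those two hexamers (weaker variants exist and are ignored here); it is not presented as the procedure used to obtain the NM_007050 positions, which the text does not describe.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping hexamers are all reported.
POLYA_SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def find_polya_signals(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, hexamer) for each candidate poly(A) signal.

    RNA input is accepted by converting U to T before scanning.
    """
    dna = seq.upper().replace("U", "T")
    return [(m.start() + 1, m.group(1)) for m in POLYA_SIGNAL.finditer(dna)]


if __name__ == "__main__":
    print(find_polya_signals("ccAATAAAgg"))   # [(3, 'AATAAA')]
    print(find_polya_signals("AATAAATAAA"))   # [(1, 'AATAAA'), (5, 'AATAAA')]
```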
Alternative splicing of mouse and human RPTPρ genes
Comparison of the nucleotide sequences of the four type IIB RPTPs (RPTPμ, RPTPκ, RPTPρ and PCP-2) predicted that at least two exons (14 and 16) are likely to be alternatively spliced. In addition, the presence in Xenopus RPTPρ of a segment (AA 826-850) that is absent from most other type IIB RPTPs raised the possibility of an alternatively spliced exon between exons 17 and 18. Human fetal brain, mouse neonatal brain and several regions (cortex, forebrain, brainstem and cerebellum) of adult C57BL/6 mouse brain were examined for the presence of alternatively spliced regions. PCR primers were designed to amplify the regions spanning exons 14 and 16 and the region between exons 17 and 18. An additional region between exons 22 and 23 was also examined. The identity of all PCR products was verified by sequencing.
The RPTPρ exon 14 primers yielded two products of 257 and 200 bp (Figure 5A and 5B), indicating a 57-nt alternatively spliced region at nt 2177-2233. This 19-AA segment is encoded by exon 14. Both splice forms were observed in human fetal brain mRNA and in neonatal and adult mouse brain mRNA. We obtained similar results for RPTPμ (data not shown), in which exon 14 was reported to be absent (NM_002845). The RPTPρ exon 16 primers yielded two bands of 356 and 326 bp (Figure 5C and 5D), indicating an additional 10-AA alternatively spliced region located between the transmembrane segment and the first phosphatase domain (nt 2370-2399). Both transcripts were present in mouse and human brain and were observed in all brain regions analyzed. PCR of the same region in RPTPμ yielded only one product, which lacked the exon 16 sequence (data not shown). A third alternatively spliced exon (22a) was identified in the first phosphatase domain between exons 22 and 23. Exon 22a is inserted after nucleotide 3172 in mouse and after nucleotide 3232 in human RPTPρ, predicting an additional alternatively spliced region 20 AA in length. In each case, the primers yielded two bands of 93 and 152 bp (Figure 5E and 5F) in all brain regions examined. It remains to be determined whether other members of the type IIB subfamily also contain this exon or whether the region is unique to RPTPρ.
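The size of each alternatively spliced segment follows from the difference between the two RT-PCR bands, and whether it preserves the reading frame follows from its divisibility by 3. A sketch using the exon 14 and exon 16 band sizes and nucleotide coordinates given above; `insert_from_bands` and `segment_length` are hypothetical helpers, and coordinates are treated as 1-based and inclusive.

```python
from typing import Optional, Tuple


def insert_from_bands(long_bp: int, short_bp: int) -> Tuple[int, bool, Optional[int]]:
    """Infer (length in nt, in frame?, length in AA) of a spliced segment."""
    nt = long_bp - short_bp
    if nt <= 0:
        raise ValueError("the long band must be larger than the short band")
    in_frame = nt % 3 == 0
    return nt, in_frame, (nt // 3 if in_frame else None)


def segment_length(start_nt: int, end_nt: int) -> int:
    """Length of a 1-based, inclusive nucleotide interval."""
    return end_nt - start_nt + 1


if __name__ == "__main__":
    print(insert_from_bands(257, 200), segment_length(2177, 2233))  # exon 14
    print(insert_from_bands(356, 326), segment_length(2370, 2399))  # exon 16
```

For exon 14 this gives a 57-nt, in-frame, 19-AA segment and for exon 16 a 30-nt, 10-AA segment, matching both the band differences and the nucleotide coordinates quoted above.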
Figure 5 Alternative splicing of exons 14, 16 and 22a. RT-PCR products were amplified using primers flanking exon 14 (panels A and B), exon 16 (panels C and D) and exon 22a (panels E and F). Left panels: bands in lanes 1, 2 and 3 are from human fetal brain, mouse …
Comparison of Xenopus, mouse and human type IIB RPTP nucleotide sequences indicated the possibility of a fourth alternatively spliced region located 3' to exon 17, within the wedge domain. This 75-nt segment is present in the reported sequences of human RPTPμ (nt 2445-2520) and Xenopus RPTPρ (nt 2448-2523), but absent from the reported sequences of human and mouse RPTPκ, RPTPρ and PCP-2. The exon 17/18 primers were designed to amplify two potential products of 209 and 134 nt. However, only a single 134-nt product was observed in human and mouse brain regions (data not shown). This sequence therefore appears to be unique to human RPTPμ and Xenopus RPTPρ and is unlikely to represent an alternatively spliced exon in any of the RPTP IIB genes.
Both splice variants of exons 14, 16 and 22a were present in human and mouse brain, at all ages and in all brain regions examined. Although the alternatively spliced exons do not appear to encode any known motifs, different isoforms of the phosphatase, with as yet unknown functions, are likely to be present. Alternatively spliced isoforms of the related RPTPs LAR [38] and RPTPβ/ζ [39] are spatially and temporally distinct in the central nervous system, and there is evidence that alternatively spliced exons can influence ligand binding, as is the case with LAR [9].