We sought to characterize the structure and expression of the APBA2 gene as a prelude to analyzing it as a possible autism candidate. We obtained a bacterial artificial chromosome (BAC) clone (686I6) containing genomic sequence of the APBA2 transcriptional unit. We also obtained yeast artificial chromosome (YAC) clones covering the interval between the 3' end of GABRG3 and OCA2 (962D11; contig WC-153), and a YAC to which an APBA2 EST (WI-14097) and nearby markers had been mapped (764C6; contig WC-699). STS content mapping demonstrated that APBA2 sequences and other markers are present within BAC 686I6 and YAC 764C4 but absent from the YAC clone covering the GABRG3-OCA2 interval (Figure ). Thus, the previous map position appeared to be incorrect. Based on STS marker content in the 686I6 clone, APBA2 maps to a more distal location in 15q13. The revised map position places APBA2 outside the narrowest interstitial duplication interval and the area most strongly supported by linkage data, but within the larger idic(15) duplications. Current genomic sequence assemblies now agree with this experimentally determined position.
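The reasoning behind STS content mapping can be illustrated with simple set operations. The sketch below is illustrative only: the clone names follow the text, but the marker names and the marker content assigned to each clone are hypothetical placeholders, not the PCR results reported here.

```python
# Hypothetical STS content table: clone -> set of markers that amplified by PCR.
# Marker names and clone contents are illustrative assumptions, not real data.
sts_content = {
    "BAC_686I6": {"APBA2_EST", "marker_A", "marker_B"},
    "YAC_764C6": {"APBA2_EST", "marker_A", "marker_B", "marker_C"},
    "YAC_962D11": {"GABRG3_3prime", "OCA2"},
}


def clones_containing(marker, table):
    """Return the sorted names of clones whose STS content includes the marker."""
    return sorted(name for name, markers in table.items() if marker in markers)


def shared_markers(clone_a, clone_b, table):
    """Return markers present in both clones, i.e. evidence that they overlap."""
    return table[clone_a] & table[clone_b]


# Which clones carry the APBA2 EST?
apba2_clones = clones_containing("APBA2_EST", sts_content)

# Does the APBA2 BAC share any marker with the GABRG3-OCA2 interval YAC?
overlap_with_interval = shared_markers("BAC_686I6", "YAC_962D11", sts_content)
```

Under these assumed contents, the APBA2 marker is found only in the two APBA2 clones and the BAC shares no marker with the interval YAC, which is the pattern that would argue against the earlier map position.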
Figure 2. Physical mapping of APBA2 and PCR-RFLP analysis of APBA2 duplications. A. An expanded view of the 15q12-q14 region containing distal duplication breakpoints is shown. Breakpoints BP3 and BP5 are depicted as jagged, hatched structures; BP4 is not thus ...
To facilitate direct screening of APBA2, we determined the gene structure and developed exon-specific PCR assays. We characterized the genomic structure using a combination of existing genomic sequence and direct sequencing from BAC clone template. APBA2 is encoded by 14 exons; 12 exons are coding and 2 exons contain 5' untranslated sequence. All exons are now accurately predicted in the NCBI (GenBank: NT_010363) and Celera assemblies, while the Ensembl assembly lacks the putative first exon. A sequence containing this exon has been deposited in GenBank (accession: AY228760). Junctions for most exons were confirmed by direct sequencing from RP-11 BAC 330K3 template, which also contains the APBA2 transcriptional unit. Table itemizes intron-exon boundary information. Primer sequences and conditions for PCR amplification of coding exons (3–14) are presented in Table . APBA2 is transcribed towards the telomere.
In the course of analyzing the structure of APBA2, we noted that a large 5' exon containing the initiating methionine codon was not only present in the 686I6 BAC but also highly homologous to sequences from other BAC clones with non-overlapping STS content. In silico STS content mapping revealed two distinct pairs of BAC clones containing the partial duplications (Figure ). One of these (RP-11 clones 602M11 and 122P18) also contains a partial duplication of a pseudogene containing the 3' end of the neuronal nicotinic receptor α7 subunit (CHRNA7) gene [24]. This pseudogene locus maps to a site telomeric to APBA2 in 15q13. The second group of BACs (RP-11 438P7 and 1H8) contained the partial APBA2 duplication and the K+/Cl- cotransporter (SLC12A6 or KCC3) gene. The duplications include the entire 1-kb exon and ~5 kb of downstream sequence; however, this sequence is interrupted in the intact locus by an ~9.7-kb non-duplicated interval. The first block of homology includes exon 3, ~100 bp of sequence 5' and 1,045 bp 3' to the exon. Following the homology gap, there is an additional 3,945 bp of duplicated sequence. Furthermore, an additional 15-kb region in both sets of BACs containing partial APBA2 duplications is highly homologous to a region approximately 25 kb upstream of the presumptive first exon of APBA2. Current sequence assemblies show the correct map location for the intact locus and place these dupAPBA2 sequences ~5 Mb telomeric.
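As a consistency check on the duplication structure described above, the block lengths can be summed directly. This is a back-of-the-envelope sketch: the exon and gap lengths are the approximate values quoted in the text (~1 kb and ~9.7 kb), so the totals are approximate as well, and the variable names are ours.

```python
# Lengths in base pairs. Values marked "approx." are rounded figures from the text.
upstream_bp = 100             # approx. sequence 5' of exon 3 in the first homology block
exon3_bp = 1000               # approx. "1-kb exon"
downstream_block1_bp = 1045   # sequence 3' of exon 3 in the first homology block
gap_bp = 9700                 # approx. non-duplicated interval at the intact locus
downstream_block2_bp = 3945   # duplicated sequence following the homology gap

# Duplicated sequence downstream of the exon (the "~5 kb" quoted in the text).
downstream_duplicated_bp = downstream_block1_bp + downstream_block2_bp

# All duplicated sequence, and the span it occupies at the intact locus
# once the non-duplicated interval is included.
total_duplicated_bp = upstream_bp + exon3_bp + downstream_duplicated_bp
span_at_intact_locus_bp = total_duplicated_bp + gap_bp
```

The two downstream blocks sum to 4,990 bp, in agreement with the ~5 kb of downstream duplicated sequence stated above.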
BLAST analysis of BACs containing the partial APBA2 duplications revealed duplicon-like sequences, based on significant homology to a large number of BACs from known duplicons. The largest number of similarities detected (>30) corresponds to an interval of ~15 kb located ~15 kb centromeric to the APBA2 duplications. However, there is highly significant homology over ~100 kb of sequence from assembly contig NT_035325 mapping to 15q26, where LCR15 duplicon sequences have been reported [18]. These sequences appear to correspond to the LCR15 class of duplicons, based on sequence content. BLAST analysis of sequence at the intact locus revealed a similar low copy repeat or duplicon-type sequence of ~1.3 kb. More than 80 copies of this sequence are present at sites across the genome, on every chromosome, with apparent clustering at telomeric locations. This sequence, while short, does not correspond to any of the known chromosome 15q11-q13 duplicon sequences, and therefore could represent a novel class of such low copy repeats. This sequence has been deposited in GenBank (AY237156). It is worth noting that the presence of multiple duplicons and repeated sequences has significantly complicated genomic sequence assembly for this region.
Since the partial duplications of APBA2 contained the first coding exon, and the transcription start site(s) for APBA2 have not been characterized, we sought to determine whether these copies were transcriptionally active. We developed restriction fragment length polymorphism (RFLP) assays to detect sequence differences discriminating the intact APBA2 locus from the partial duplications. These assays were applied following PCR from genomic or BAC DNA and from cDNA from adult and fetal human brain (Figure ). Distinct "fingerprint" patterns are apparent for the intact locus and the duplications. The sequence differences used in Figure that distinguish the duplications from the intact locus do not distinguish the duplications from each other. That distinction was made based on in silico analysis of numerous sequence differences, clone-sequence assembly and clone-marker/gene content relationships. The pattern in cDNA samples was identical to that in clone 686I6 but not to that of either duplication. These data argue that the only transcript present in brain corresponds to the intact locus, and that the two partial duplications are therefore not transcribed.
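The principle of the PCR-RFLP assay can be sketched as an in silico digest. The example below is a minimal illustration under stated assumptions: the amplicon sequences are invented, and EcoRI (recognition site GAATTC, cutting after the first G) is chosen only as a familiar enzyme; the enzymes and sequence differences actually used in the assays are not specified in this section.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site after which the cut falls.
    Returns the fragment lengths in 5'->3' order.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Invented amplicons: the duplicated copy differs by one base inside the site.
intact_amplicon = "ACGT" * 5 + "GAATTC" + "TTGCA" * 4
duplicate_amplicon = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 4
cdna_amplicon = intact_amplicon  # assumption: transcript derives from the intact locus

ECORI_SITE, ECORI_OFFSET = "GAATTC", 1  # EcoRI cuts G^AATTC

intact_pattern = digest(intact_amplicon, ECORI_SITE, ECORI_OFFSET)
duplicate_pattern = digest(duplicate_amplicon, ECORI_SITE, ECORI_OFFSET)
cdna_pattern = digest(cdna_amplicon, ECORI_SITE, ECORI_OFFSET)

# A cDNA pattern that matches the intact locus but not the duplicate is the
# kind of "fingerprint" evidence described in the text.
cdna_matches_intact_only = (
    cdna_pattern == intact_pattern and cdna_pattern != duplicate_pattern
)
```

In this toy case the intact amplicon is cut once while the duplicated copy remains uncut, so a single paralogous sequence difference within a restriction site is sufficient to assign a PCR product to one locus or the other.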
Immediately telomeric to APBA2 lies another gene (KIAA0574; see Figure ), which was initially identified from a sequencing project of large, brain-derived cDNAs [25]. KIAA0574 encodes a protein of unknown function, and is transcribed in the opposite orientation relative to APBA2. BLASTP using the KIAA0574 predicted amino acid sequence detects weak homology (35% identity, 64% similarity over 69 residues) to a protein termed X123, the gene for which is located in the 9q12.21 Friedreich's ataxia region. The X123 gene in turn maps immediately adjacent to APBA1 (alias Mint1 or X11α). APBA1 and APBA2 are highly homologous over the C-terminal half of their respective sequences (90% similar, 84% identical). The N-terminal half, corresponding to the duplicated exon, is only weakly similar (36% similar, 30% identical). Thus, the APBA1 and APBA2 proteins differ chiefly in the residues encoded by exon 3. Despite the comparatively weak homology between the KIAA0574 and X123 predicted peptides, this arrangement suggests a clear evolutionary relationship between these gene pairs. The propensity of the 15q11-q13 region to undergo rearrangement may have played a role in the evolution of this gene family.
The distribution of APBA2 expression in brain was determined using in situ hybridization and northern blotting. Commercial brain northern blots were hybridized with a cDNA probe corresponding to exon 3. Such a probe allows us to discriminate between APBA2 expression and potential signal from APBA1. Northern blotting revealed a predominant, moderate-abundance transcript of ~4.2 kb widely distributed throughout the brain and spinal cord (Figure ). Several smaller transcripts were also seen. A cDNA clone for the mouse ortholog of APBA2 was obtained, and a similar probe was used to test for expression by in situ hybridization to mouse brain sections. Figure shows representative coronal and sagittal mouse brain sections. Apba2 demonstrates moderate expression in mouse cortical and limbic regions, including frontal, parietal, and temporal cortex, hippocampus, amygdala, thalamus, and cerebellum, and lower-level expression in many other regions summarized in Table . A developmental expression series illustrating expression from embryonic day 8 through day 15 (E8-E15) is shown in Figure . Early expression in the primitive neural tube emerges at day E10. Apba2 expression extends throughout the neural tube during days E11 and E12, but is apparently restricted to the primitive brain vesicles during days E13 and E14. Within the brain, distribution is ubiquitous and uniform. In addition to diffuse expression throughout the brain, by day E15 a punctate pattern appears around the spinal cord, corresponding to expression within the dorsal root ganglia.
Figure 3. Expression analysis of APBA2/Apba2. A. Northern blot analysis of APBA2 in human brain. B. Relative expression of Mint2 in adult mouse brain by in situ hybridization. Mint2 shows highest expression in cortical and limbic regions, including limbic nuclei ...
Apba2 in situ hybridization summary
Figure 4. Developmental expression profile of Apba2. Apba2 expression in the primitive neural tube begins by embryonic day 10 (E10). Expression throughout the neural tube is seen in day E11 and E12, but appears to be restricted to the primitive brain vesicles by ...
The availability of emerging mouse genomic sequence allowed us to examine sequence conservation across the transcriptional unit. We submitted human and mouse genomic sequences to the VISTA web site (http://www-gsd.lbl.gov/vista/), and the resulting homology plot is shown in Figure . Murine genomic sequence for exons 1 and 2 was not available, so the comparison was made for the remainder of the transcriptional unit. We would expect coding sequences to be conserved, but regions of non-coding conservation, which could harbour potential functional (conserved) sequences, are of particular interest. Several such regions are present within the APBA2
Figure 5. Comparative sequence analysis of human and mouse APBA2 using VISTA. Output from VISTA analysis of the APBA2 transcriptional unit is shown, with regions of non-coding sequence conservation (>75% identity) indicated by pink shading and coding homology ...
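The idea of flagging conserved regions by a percent-identity threshold can be sketched with a simple sliding window. This is not VISTA's actual algorithm: it assumes the two sequences are already aligned to equal length, and the window size, step and toy sequences are illustrative choices. Only the 75% identity threshold comes from the figure legend.

```python
def window_identity(aligned_a, aligned_b, window, step=1):
    """Percent identity in sliding windows over two pre-aligned sequences.

    Both sequences must have the same length; gap characters ('-') are
    counted as mismatches. Returns a list of (window_start, percent_identity).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        seg_a = aligned_a[start:start + window]
        seg_b = aligned_b[start:start + window]
        matches = sum(1 for x, y in zip(seg_a, seg_b) if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(aligned_a, aligned_b, window, threshold=75.0, step=1):
    """Return start positions of windows whose identity exceeds the threshold."""
    scores = window_identity(aligned_a, aligned_b, window, step)
    return [start for start, pct in scores if pct > threshold]


# Toy aligned sequences: the first 10 bases are identical, the last 10 differ.
human = "ACGTACGTAC" + "TTTTTTTTTT"
mouse = "ACGTACGTAC" + "GGGGGGGGGG"

scores = window_identity(human, mouse, window=10, step=10)
conserved = conserved_windows(human, mouse, window=10, step=10)
```

Windows exceeding the threshold that fall outside annotated exons would correspond to the non-coding conserved regions of interest.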