Segmental copy-number variations (CNVs) may contribute to genetic variation in humans. Reports of the existence and characteristics of CNVs in large Japanese cohorts are quite limited. We report data from a large Japanese population. We screened 213 unrelated Japanese individuals using comparative genomic hybridization based on a bacterial artificial chromosome microarray (BAC-aCGH). We summarize the data by focusing on highly polymorphic CNVs found in ≥5.0% of the individuals, since they may be informative for demonstrating the relationships between genotypes and phenotypes. We found a total of 680 CNVs at 16 different BAC regions in the genome. The majority of the polymorphic CNVs were located on BAC clones that overlapped with regions of segmental duplication, and the majority of the polymorphic CNVs observed in this population had been reported previously in other publications. Some of the CNVs contained genes that might be related to phenotypic heterogeneity among individuals.
Segmental copy-number variations (CNVs), involving the gain or loss of several hundred bases to several hundred kilobases (kb) of the genome, can be an important source of genetic variation among human populations of different ethnic groups as well as among individuals. This heterogeneity may contribute to noted phenotypic variations and different susceptibilities to various diseases. Molecular genetic and cytogenetic analyses have provided significant information about these variations in the human genome, specifically as they relate to disease, such as cancer, and to congenital malformation (see [1, 2], and review in ). Following the development of methodologies and the introduction of new research platforms [4–9], information regarding the nature and pattern of CNVs from representative populations has accumulated. Examinations of a relatively large number of individuals from various specific ethnic groups have recently been conducted using different array platforms, such as BAC-arrays [10–13], oligo-arrays [14–16], and others . The results are not always consistent, and it is likely that different human populations bear different CNVs. The number of Japanese individuals examined to date is small compared with studies of other ethnicities . Polymorphic CNVs have received considerable attention since they might play an important role in the etiology of common diseases. Therefore, more data regarding CNVs should be accumulated from Japanese populations. In this report, we focus on CNVs that were observed at a high frequency (≥5.0% of the individuals) in the population residing in Hiroshima and Nagasaki, Japan, by aCGH with BAC clones as targets.
In this study, the population screening was conducted in two stages. In Stage (1), 80 unrelated Japanese individuals were examined using BAC-aCGH with an array of 2,241 BAC clones. In Stage (2), 133 unrelated Japanese individuals were examined using BAC-aCGH with an array of 2,622 BAC clones.
The majority of the clones used in Stage (1) of this study were selected from the set of cytogenetically mapped P1-artificial chromosome (PAC) clones and bacterial artificial chromosome (BAC) clones reported by the BAC Resource Consortium  and obtained from either the Children's Hospital Oakland Research Institute (Oakland, CA, USA) or from Invitrogen Inc., Co. (Carlsbad, CA, USA). In Stage (2), in addition to the BAC clones used in Stage (1), an additional 381 BAC clones were used, a majority of which were collaboratively obtained from Dr. N. Matsumoto of Yokohama City University. Thus, 2,241 clones of chromosomal fragments from chromosome 1 to chromosome 22 were used in Stage (1) and 2,622 clones were used in Stage (2). That is, the additional 381 BAC clones were examined only for the 133 unrelated individuals in Stage (2). The clones are distributed every 1.2 Mb across all of the human autosomes in Stage (1) and every 1.1 Mb in Stage (2). In addition to autosomal clones, four kinds of X-chromosomal clones were used as internal references. For Stage (1), three sets of arrays were constructed and printed: slide no. 1 consisted of 698 clones on chromosomes 1 to 4; slide no. 2 consisted of 718 clones on chromosomes 5 to 10, plus two BAC clones on chromosome 3 and six clones on chromosome 4; and slide no. 3 consisted of 817 clones on chromosomes 11 to 22 . For Stage (2), all of the clones were printed onto one glass slide.
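The clone counts given above can be cross-checked with a few lines of arithmetic. The following is only a consistency sketch in Python; the dictionary keys are illustrative labels, and it assumes that the two chromosome 3 clones and six chromosome 4 clones on slide no. 2 are counted in addition to its 718 clones, which is the reading under which the slide totals match the stated 2,241 autosomal clones.

```python
# Consistency check of the clone counts stated in the text.
# Assumption: the 2 + 6 extra clones on slide no. 2 are additional to its 718 clones.
stage1_slides = {
    "slide 1, chromosomes 1-4": 698,
    "slide 2, chromosomes 5-10": 718,
    "slide 2, extra chromosome 3 clones": 2,
    "slide 2, extra chromosome 4 clones": 6,
    "slide 3, chromosomes 11-22": 817,
}

additional_stage2_clones = 381

stage1_total = sum(stage1_slides.values())
stage2_total = stage1_total + additional_stage2_clones

print(stage1_total)  # 2241
print(stage2_total)  # 2622
```

Both totals agree with the array sizes reported for Stage (1) and Stage (2).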
The genomic DNA samples used in this study were principally the same as those used in a previous study . The DNA samples used for reference purposes were extracted from mononuclear cells of two physically and clinically normal volunteers (a 57-year-old Japanese male and a 54-year-old Japanese female). The DNA used for testing and analyses of this population was extracted from lymphoblastoid cell lines obtained from the offspring of atomic-bomb survivors. High molecular weight genomic DNA was isolated using conventional methods as described in detail elsewhere . Lymphoblastoid cell lines were derived from a cryopreserved archive of approximately 1000 families consisting of father, mother, and offspring from Hiroshima and Nagasaki for whom permanent cell lines have been established by Epstein-Barr (EB) virus transformation of peripheral B-lymphocytes. The composition of the families has been reported elsewhere . Three hundred five offspring were initially screened. Since the offspring include some siblings, we selected one representative offspring to construct “unrelated individuals” to avoid double counting of polymorphic CNVs from families containing two or more siblings. We selected the offspring who first visited our institution for donating blood rather than other siblings. The 213 offspring selected as unrelated individuals included 124 offspring from Hiroshima and 89 from Nagasaki. The individuals gave their informed consent prior to the study. The Institutional Review Board of our Foundation approved this study.
Cloned DNAs for microarray targets were isolated from bacterial cultures using NucleoBond BAC 100 (NIPPON Genetics, Tokyo). In Stage (1), DNA was digested with NotI, followed by phenol-chloroform-isoamyl alcohol (25:24:1) extraction and ethanol precipitation. In Stage (2), cloned DNA was digested with MseI, followed by phenol-chloroform-isoamyl alcohol (25:24:1) extraction and ethanol precipitation. The fragmented DNAs were amplified by ligation-mediated PCR carried out as described by Snijders et al. . The target DNAs (0.5μg/μL) were dissolved in 50% dimethyl sulfoxide and printed in triplicate onto the glass slides (Matsunami Glass Co. Ltd.) using the Affymetrix 417 Arrayer (Affymetrix).
The screenings of both stages were conducted following the procedures described previously . In brief, for labeling DNA, test and reference genomic DNA (1.25μg each) was cut by BamHI, and labeled by a random priming method with Cyanine-5- and Cyanine-3-labeled dUTP (Cy5- and Cy3-dUTP; PerkinElmer Life Sciences, Wellesley, MA, USA). The labeled probes were mixed and centrifuged with Microcon column (Millipore Co., Bedford, MA, USA) to purify the probes. Subsequently, human CotI DNA (120μg; Roche Diagnostic GmbH, Mannheim, Germany) was added to the column, and recentrifuged. After the volume of the mixture became less than 20μL, it was transferred to microtubes with 100μL of hybridization solution (50% formamide, 10% dextran sulfate, 1% Tween 20, 2 × SSC, 10mM Tris-HCl [pH 7.4] and 800μg of yeast t-RNA [Invitrogen, Carlsbad, CA, USA]). The hybridization mixture was then denatured at 70°C for 10 minutes, and subsequently incubated at 37°C for at least five hours to block repetitive sequences of the labeled probes.
Prehybridization was conducted in order to block repetitive sequence binding of target DNA on the arrays, and to prevent nonspecific binding of probe DNA to the targets. Following the initial incubation (overnight at 37°C), the prehybridization solution was removed, and fresh hybridization solution with Cy-labeled DNA (prepared as described above) was added. Again, hybridization processes were carried out. The prehybridization and the hybridization were conducted with continuous mixing. After hybridization, the arrays were washed by the procedures reported previously . All of the procedures were conducted using the GeneTAC Hybridization Station (Genomic Solutions Inc., Ann Arbor, MI, USA).
Fluorescent images of the hybridized arrays were obtained using a ScanArray 5000 confocal laser scanner (PerkinElmer Life Sciences). ArraySuite (Scanalytics Inc., Fairfax, VA, USA) in Stage (1) and Gene Pix (Axon Instruments, Sunnyvale, CA) in Stage (2), respectively, were used to quantify the fluorescence of each spot on the array images. We then used software specifically developed to perform the following three measurements.
EB-transformed lymphoblastoid cells established from the offspring were embedded in agarose plugs. DNA in the plugs was cleaved by restriction enzymes (PacI or Sse8387I). The resulting fragments were then separated by pulsed-field gel electrophoresis (PFGE) on 1% Pulse Field Certified Agarose (BioRad, Hercules, CA, USA) with 0.5 × TBE (Tris-Borate-Ethylenediaminetetraacetic acid). Using the CHEF-DR II system (BioRad) , electrophoresis was carried out at 6V/cm and 14°C for 22 hours. The pulse angle was 120°, and the switch time was ramped from 0.3 seconds to 15 seconds.
Southern blot analyses were carried out using conventional, well-described procedures . In brief, after completion of PFGE, the DNA in the gel was cleaved by UV irradiation and blotted onto nitrocellulose filters (Schleicher & Schuell, Dassel, Germany). The filters were prehybridized with human CotI DNA (48μg/ml, Roche Diagnostic GmbH) and salmon testis DNA (14μg/ml, Sigma-Aldrich) to decrease the background due to repetitive sequences. Subsequently, the filters were hybridized with whole BAC-DNA as a probe. DNA probes were labeled with [α-32P] dCTP (Amersham Biosciences, Piscataway, NJ, USA) and preannealed with human CotI DNA and salmon testis DNA (10.5μg/ml). Prehybridization and hybridization were performed overnight at 37°C in a solution containing 50% formamide, 10% dextran sulfate, 1% Tween 20, 2 × SSC, and 10 mM Tris-HCl (pH 7.4). After hybridization, the filters were washed at 65°C with 1.0 × SSC containing 0.1% SDS (sodium dodecyl sulfate) and then with 0.5 × SSC containing 0.1% SDS. Banding patterns were obtained either by exposure to X-ray film (Fuji Film, Tokyo, Japan) or by use of the Molecular Imager FX (BioRad).
The qPCR was performed using SYBR premix EX Taq (Takara-Bio) and the Light Cycler System (Roche Diagnostics), according to the manufacturers' protocols. Primers were designed with Primer3 software (http://primer3.sourceforge.net), and the size of PCR products was confirmed by the pattern of restriction enzyme digested fragments using the LabChip DNA 500 Kit on the 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany). Two, 0.5, and 0.125ng of genome DNA from an individual with CNV were used and the quantification of each amplicon was carried out at 45 cycles of PCR. The results were analyzed with Light Cycler Data Analysis software using a second derivative maximum model. Relative DNA content of variants for every amplimer was calculated using “normals” (individuals who did not have CNVs) as a control.
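To illustrate how a relative DNA content can be derived from crossing-point (Cp) values, the following Python sketch compares a test individual with a "normal" control at each of the three input amounts and averages the results. It is not the calculation performed by the Light Cycler software or by the authors. The function name, the assumed amplification efficiency of 2 per cycle, and all Cp values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def relative_dna_content(cp_test, cp_normal, efficiency=2.0):
    """Estimate template amount in a test sample relative to a normal control.

    cp_test and cp_normal hold crossing-point values measured at the same
    input amounts (for example 2, 0.5, and 0.125 ng of genomic DNA).
    Assumption: each PCR cycle multiplies the product by `efficiency`.
    """
    if len(cp_test) != len(cp_normal):
        raise ValueError("test and control need the same number of dilutions")
    # A Cp that is one cycle later corresponds to 1/efficiency of the template.
    ratios = [efficiency ** (n - t) for t, n in zip(cp_test, cp_normal)]
    return mean(ratios)


# Hypothetical Cp values at 2, 0.5, and 0.125 ng of input DNA.
cp_normal = [24.0, 26.0, 28.0]
cp_deletion_carrier = [25.0, 27.0, 29.0]

print(relative_dna_content(cp_deletion_carrier, cp_normal))  # 0.5
```

Under these assumptions, a ratio near 0.5 would be compatible with the loss of one of two copies, and a ratio near 1.5 with the gain of one copy. Real data would also need replicate measurements and an efficiency estimated from the dilution series.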
We examined 213 unrelated individuals. The main purpose of this paper is to report the accumulated data about highly polymorphic CNVs found in ≥5.0% of the individuals. (The number of CNVs was 11 or more at each BAC spot.) As shown in Table 1, 680 polymorphic CNVs were observed in 16 BAC regions. As described before, the results for two BAC clones (RP11-259N12 and RP11-121A8) were obtained from the 133 unrelated individuals examined in Stage (2).
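The figure of 11 individuals follows directly from the 5.0% cutoff applied to 213 individuals. A minimal Python sketch of that arithmetic is given below. The function name is hypothetical, and the value computed for 133 individuals is shown only as arithmetic; the text does not state which cutoff was applied to the two clones examined in Stage (2) alone.

```python
def min_carriers(n_individuals, percent=5):
    """Smallest number of individuals that reaches `percent` of the cohort.

    Uses integer arithmetic to take the ceiling of percent * n / 100,
    so `percent` is expected to be an integer.
    """
    return (percent * n_individuals + 99) // 100


print(min_carriers(213))  # 11
print(min_carriers(133))  # 7
```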
Southern blot analyses following PFGE were carried out for the highly polymorphic CNVs. As shown by the typical cases in Figures 1 and 2, the highly polymorphic CNVs exhibited complicated patterns of alteration. The patterns of two BAC clones are shown as typical examples in Figure 1 (RP11-79F15) and Figure 2 (RP11-88L18). Since each individual contained a different number of core segmental duplication units, each individual showed bands with different mobilities. The numbers of repeat units corresponded to the intensity of each spot. The data are described in more detail in the figure legends.
The results of qPCR conducted for two BAC clone regions are described in Figure 3 (RP11-89B15) and Figure 4 (RP11-79O18). For the former case, a part of a gene (MEOX2) was deleted. On the contrary, for the latter case, the copy number of a part of the gene (NSF) increased, but the copy number of the gene (WNT3) did not change.
Many segmental duplications have already been summarized in public databases, such as the Human Genome Segmental Duplication Database (TCAG Database; http://projects.tcag.ca/cgi-bin/variation/gbrowse), the UCSC Human Genome Browser (UCSC Database; http://genome.ucsc.edu/index.html), and the NCBI Map Viewer (NCBI Database; http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi). Known CNVs have also been reported in the same databases. The CNV data obtained in our studies are summarized in Table 1, together with the presence or absence of CNVs already reported in the databases. One BAC clone, RP11-115G22, was mapped to two chromosomes, 6 and 15, in the TCAG Database, which would suggest that this clone is present on two discrete chromosomes. However, the clone was mapped only to chromosome 15 in the UCSC Database and the NCBI Database. We accepted the reports from the latter two databases and describe this clone as mapped only to chromosome 15. With respect to segmental duplications, 10 out of 16 (about 63%) were present in the above databases (UCSC Database and NCBI Database). It was noteworthy that the majority of BAC clones containing our CNVs were known to overlap with at least one CNV reported in the databases. However, that does not mean that our highly polymorphic CNVs are exactly the same as those reported previously, since precise comparisons between our study and the other studies were not carried out.
As mentioned before, very little information is available about CNVs in Japanese individuals: systematic screening has so far covered about 45 individuals rather than a relatively large population. We compared our data with the Japanese data from the HapMap project as reported by Redon et al. , in which CNVs were examined by a tiling BAC-array. The results are summarized in Table 1. Ten out of 16 CNVs identified in our study were present in the data reported by Redon et al. , although we should emphasize again that the CNVs identified in our BAC regions are not exactly the same as those reported by Redon et al. . CNVs identified in our study at lower numbers tended not to be identified in Redon's report . On the other hand, when our data were compared with Redon's oligo-data from the Affymetrix 500EA array for Japanese, only two CNVs overlapped with our CNVs. As Redon et al. mentioned in their report , the reason appears to be that oligo arrays have some limitations for complicated regions of the genome, such as segmental duplication areas.
We summarized the genes and disease-related genes in OMIM which overlap to the BAC-clone region with our CNVs (Table 2). In addition to those two categories, mRNAs have been also reported in the database, but they are too many to describe here. All BAC-clone regions contained at least one mRNA, although the functions of a majority of those mRNAs are not known yet (data not shown).
We examined 213 unrelated Japanese using BAC-aCGH and found a total of 680 CNVs on 16 BAC clones. A large fraction of the regions involved in the CNVs observed in our study (i.e., 625 out of 680 (92%), Table 1) have been reported previously in other studies listed in the database. That suggests that the structural rearrangements are evolutionarily ancient. A majority (≈ 63%) of the CNVs had been found on the BAC clones that overlapped with segmental duplication, suggesting the notion that segmental duplication might play a significant role in the creation of CNVs (Table 1). The observations are supported by previous data from Sharp et al.  in which they reported the sharing of CNVs among several populations, meaning those specific genomic imbalances either predated the dispersal of modern humans out of Africa or arose independently in different populations.
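The two proportions quoted in this paragraph can be reproduced from the counts given in the text. The short Python sketch below does only that; the helper name is hypothetical and the counts are taken from the paragraph above and Table 1 as cited.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# CNVs in regions previously reported in the databases: 625 of 680.
previously_reported = percent(625, 680)
# BAC regions overlapping segmental duplications: 10 of 16.
segmental_duplication = percent(10, 16)

print(f"{previously_reported:.1f}%")    # 91.9%
print(f"{segmental_duplication:.1f}%")  # 62.5%
```

These values correspond to the rounded figures of 92% and about 63% quoted in the text.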
On the other hand, the data from Japanese are limited to about 45 individuals from the HapMap study . We compared our CNV data with the Japanese data from Redon et al. . Our CNVs, especially those showing high frequencies, were also identified in Redon's work . In contrast, our CNVs showing low frequencies, such as those found in less than about 10% of individuals, were not observed in their work (Table 1). That result suggests that those CNVs might be identified if the number of individuals examined by Redon et al. were increased to the level of our study (about 200 individuals). Moreover, as described before, when our data were compared with Redon's oligo-data for Japanese, only two of their CNVs overlapped with our CNVs. Redon et al.  suggested that BAC- and oligo-based methods complement each other. Although the oligo-based method tends to detect smaller CNVs (about a few kilobases), this approach is less effective in tracking CNVs in genomic regions of complex structure, like segmental duplications, that are not sufficiently tagged by oligo targets. In contrast, the BAC platform can only identify larger CNVs (>40kb), but this method has some advantage for detecting CNVs present in regions of segmental duplication.
The CNVs in the human genome are often associated with developmental disorders and susceptibility to diseases. More importantly, CNVs may represent a major genetic component of our phenotypic diversity. Large duplications and deletions have been known to be present within the human genome based initially on cytogenetic observations in the course of etiological studies of congenital malformations (e.g., [1, 2]). The frequency of those duplications and deletions was presumed to be low and, for the most part, directly related to specific genetic disorders. A limited number of studies reported the presence of specific large duplications and deletions that were not apparently related to diseases (e.g., ).
Pinkel et al.  reported that CGH on a BAC-DNA-based microarray could reliably detect single-copy gene decreases or increases from normal diploidy. Following this, other array platforms, such as cDNA , and oligo-nucleotide , have been developed and many data from them have been reported.
As we mentioned before, many CNVs were reported to be closely related to disease phenotypes, and recent studies based on advanced molecular technologies, such as genomewide association studies [27–29] and next generation sequencing [30, 31], reported that many genes appear to play important roles in the etiology of common diseases. We report our highly polymorphic CNVs in BAC clones, which were reported to contain genes, expected to be related to phenotypic heterogeneity of each individual, based on the TCAG Database. We focused on genes reported in the above database, although many mRNA were listed in the database in addition to genes (Table 2).
BAC clone (RP11-90A9) contains two genes: ankyrin repeat domain 34B (ANKRD34B) and dihydrofolate reductase (DHFR). A phosphoprotein encoded by ANKRD34B is induced during bone marrow commitment to dendritic cells which play an important role in vertebrate immunity . DHFR genes were reported to be related to various malignancies including lymphoproliferative disorders such as systemic non-Hodgkin's lymphoma, primary central nervous system lymphoma (PCNSL) , and childhood acute lymphoblastic leukemia . Those two genes are fully overlapped to the BAC clone (RP11-90A9). It is likely that the CNVs might affect the copy number of those genes.
One BAC clone (RP11-89B15) contains the gene mesenchyme homeobox 2 (MEOX2). The qPCR analyses (Figure 3) demonstrated that a part of the MEOX2 gene is deleted in the individual having the CNV detected by this BAC clone. Loss of the MEOX2 gene is associated with Wilms tumor . MEOX2 induces senescence through controlling INK4a activity . MEOX2 suppressed epithelial cell proliferation in cooperation with TGF-beta1, and mediated induction of the cell-cycle inhibitor gene p21 . Finally, data from a genome-wide association study (GWAS) indicated that this is one of the candidate genes that might be associated with ischaemic stroke .
Another BAC clone (RP11-115G22) was mapped to chromosome 15. The BAC clone contains one gene, “cholinergic receptor, nicotinic, alpha 7” (CHRNA7). That gene was used as a candidate target for examining interactions on the severity of adult attention deficit hyperactivity disorder (ADHD) . It is suggested that CHRNA7 regulates airway epithelium differentiation by controlling basal cell proliferation . Moreover, it was reported that the gene is one of the candidates for Alzheimer's disease . The CHRNA7 gene overlaps with the segmental duplication region of the BAC clone (RP11-115G22). Many CNVs that overlap with the gene are listed in the databases mentioned above. For those reasons, our CNV appears to overlap with the gene, and it might affect the copy number of CHRNA7.
A BAC clone (RP11-79O18) contains two genes which are N-ethylmaleimide-sensitive factor (NSF) and wingless-type MMTV integration site family, member 3 (WNT3). The qPCR results (Figure 4) demonstrated that the copy number of a gene (NSF) increased but no change was observed in the gene (WNT3). That result suggested that the CNV might affect the NSF gene, but not the WNT3 gene. However, there may be an opportunity for the CNVs to become a surrogate marker of WNT3 for the future association study between the CNV and some disease phenotype. The NSF gene is one of the essential components of membrane fusion machinery which is an important homeostatic process in eukaryotic cells . The gene also plays an important role for assembly during human sperm exocytosis . A recent study showed that the NSF gene is a good candidate marker for association studies for genetic risk underlying Parkinson's disease . The WNT3 gene's single nucleotide polymorphisms (SNPs) were used as candidate markers for association studies of hemorrhagic stroke , hypertension , and chronic kidney disease . Upregulation of the WNT gene family, including WNT3, suggested involvement of the WNT's canonical and/or noncanonical signaling pathway in chronic lymphocytic leukemia . WNT3 signaling is also required in primary axis formation during vertebrate embryogenesis .
A BAC clone (RP11-79F15) contains two genes: methyl-CpG binding domain protein 3-like 1 (MBD3L1) and cell surface associated mucin 16 (MUC16). MBD3L1 encodes a protein that is related to methyl-CpG-binding proteins. The protein is localized to discrete areas in the nucleus, and expression appears to be restricted to round spermatids, suggesting that the protein plays a role in the postmeiotic stages of male germ cell development . On the other hand, MUC16 is a member of the mucin gene family and encodes cancer antigen 125 (CA125) which is a blood biomarker routinely used to monitor the progression of human epithelial ovarian cancer (EOC), although its potential role in EOC is poorly understood . Maintenance of an intact mucosal barrier, one of whose components is a gene product of MUC16, is critical to preventing damage and infection of wet-surfaced epithelia . As we demonstrated by Southern blot analysis (Figure 1), the polymorphism was caused by segmental duplications in the BAC clone. The series of segmental duplications were on the 5′-region of two genes (MBD3L1 and MUC16). The CNVs do not affect the expression pattern of the genes, but these CNVs might be useful surrogate markers for future studies.
A BAC clone (RP11-231J7) contains one gene “glutamate receptor, ionotropic, (AMPA 2).” AMPAs are ligand-activated cation channels that mediate the fast component of excitatory postsynaptic currents in neurons of the central nervous system . Since the size of the gene (AMPA 2) is larger than that of the BAC clone (RP11-231J7), it is likely that the CNVs affect the copy number of the gene.
A BAC clone (CTD-2100F13) contains two genes: RIO kinase 3 (yeast) (RIOK3) and Niemann-Pick disease, type C1 (NPC1). RIOK3 promotes pancreas ductal cell motility and invasion in pancreatic cancer . NPC1 can have a function in the egress of certain membrane-impermeable lysosomal cargo . The membrane-bound NPC1 and soluble NPC2 play an important role for the release of cholesterol from lysosomes. As a result of that mechanism, the gene is associated with obesity [56–58]. The BAC clone (CTD-2100F13) is fully overlapped to both of the two genes (RIOK3 and NPC1). The size of CNVs summarized in the databases shows that the reported CNVs were overlapped to both genes. We assumed that our CNVs may reflect the change of copy number of the genes themselves.
Since the genes mentioned are good candidate markers for enabling us to examine the etiology of common diseases and phenotypical heterogeneities among individuals, our highly polymorphic CNVs should be able to become good markers in future studies.
We are currently planning to examine the same population using a high-density oligo-array platform to accumulate more CNV data for Japanese. The reason, as mentioned before, is that the BAC platform and oligo methods complement each other. The oligo platform is known to be more effective in detecting smaller CNVs (around a few kilobases), even though this approach is less effective in tracing CNVs in genomic regions of complex structure that are covered by the BAC approach conducted in this study. We expect to construct a more definite Japanese CNV database by the combination of BAC- and oligo-platform arrays.
We conducted population screening for 213 unrelated Japanese, and observed 680 highly polymorphic CNVs. The majority of the polymorphic CNVs presented on BAC clones that overlapped with regions of segmental duplication, and had been previously reported in other publications. Some CNVs contained genes which might be related to phenotypic heterogeneity among individuals. Moreover, it is expected that the CNVs might be good surrogate markers for detecting etiological genes, even if CNVs did not directly affect the genes themselves.
The authors thank H. Omine, J. Kaneko, A. Miura, M. Imanaka, and E. Nishikori for their technical assistance. The authors are grateful to E. B. Douple for critical reading of the paper. This publication was supported by Research Protocols RP 1-01 and RP 2-07 of the Radiation Effects Research Foundation (RERF) and in part by Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology and the Japan Science and Technology Agency (Core Research for Evolutional Science and Technology). RERF, Hiroshima and Nagasaki, Japan, is a private nonprofit foundation funded by the Japanese Ministry of Health, Labour and Welfare and the U.S. Department of Energy, the latter in part through the National Academy of Sciences. Norio Takahashi and Yasunari Satoh contributed equally to this work as first authors.