Related Articles

1.  Genomic Landscape of a Three-Generation Pedigree Segregating Affective Disorder 
PLoS ONE  2009;4(2):e4474.
Bipolar disorder (BPD) is a common psychiatric illness with a complex mode of inheritance. Besides traditional linkage and association studies, which require large sample sizes, analysis of common and rare chromosomal copy number variants (CNVs) in extended families may provide novel insights into the genetic susceptibility of complex disorders. Using the Illumina HumanHap550 BeadChip with over 550,000 SNP markers, we genotyped 46 individuals in a three-generation Old Order Amish pedigree with 19 affected (16 BPD and three major depression) and 27 unaffected subjects. Using the PennCNV algorithm, we identified 50 CNV regions that ranged in size from 12 to 885 kb and encompassed at least 10 single nucleotide polymorphisms (SNPs). Of 19 well-characterized CNV regions that were available for combined genotype-expression analysis, 11 (58%) were associated with expression changes of genes within, partially within or near these CNV regions in fibroblasts or lymphoblastoid cell lines at a nominal P value <0.05. To further investigate the mode of inheritance of CNVs in the large pedigree, we analyzed a set of four CNVs, located at 6q27, 9q21.11, 12p13.31 and 15q11, all of which were enriched in subjects with affective disorders. We additionally show that these variants affect the expression of neuronal genes within or near the rearrangement. Our analysis suggests that family-based studies of the combined effect of common and rare CNVs at many loci may represent a useful approach in the genetic analysis of disease susceptibility of mental disorders.
doi:10.1371/journal.pone.0004474
PMCID: PMC2637422  PMID: 19214233
2.  The Effect of Algorithms on Copy Number Variant Detection 
PLoS ONE  2010;5(12):e14456.
Background
The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery.
Methodology and Principal Findings
We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212.
Conclusions and Significance
Motivated by the availability of multiple publicly available genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods used, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and the establishment of a gold standard for validation of CNVs are needed.
doi:10.1371/journal.pone.0014456
PMCID: PMC3012691  PMID: 21209939
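The two overlap definitions compared in the abstract above (any overlap, and an overlap of at least 40% of the smaller CNV) can be illustrated with a minimal sketch. The function names and the (start, end) tuple representation are assumptions made for illustration and are not taken from the paper; both CNVs are assumed to lie on the same chromosome and are treated as half-open intervals.

```python
def overlap_length(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def any_overlap(a, b):
    """Definition 1: the two CNVs share at least one base."""
    return overlap_length(a, b) > 0


def overlap_fraction_of_smaller(a, b, threshold=0.40):
    """Definition 2: the shared span covers at least `threshold` of the smaller CNV."""
    smaller = min(a[1] - a[0], b[1] - b[0])
    if smaller <= 0:
        return False
    return overlap_length(a, b) / smaller >= threshold


# Hypothetical calls: a 100 kb CNV and a 50 kb CNV sharing 10 kb.
cnv_a = (100_000, 200_000)
cnv_b = (190_000, 240_000)
print(any_overlap(cnv_a, cnv_b))                  # True
print(overlap_fraction_of_smaller(cnv_a, cnv_b))  # False: 10 kb / 50 kb = 0.20
```

The same pair of calls counts as one shared CNV under the first definition and as two distinct CNVs under the second, which is one way the choice of definition can change the number of CNVs reported.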
3.  Comparative analyses of seven algorithms for copy number variant identification from single nucleotide polymorphism arrays 
Nucleic Acids Research  2010;38(9):e105.
Determination of copy number variants (CNVs) inferred in genome wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumHap 550 data from the Singapore cohort study of the risk factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before enacting full analysis. We used 10 SCORM samples for optimizing parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. The statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects one of the fewest numbers of CNVs.
doi:10.1093/nar/gkq040
PMCID: PMC2875020  PMID: 20142258
4.  Characterization of autosomal copy-number variation in African Americans: the HyperGEN Study 
European Journal of Human Genetics  2011;19(12):1271-1275.
African Americans are a genetically diverse population with a high burden of many common heritable diseases. However, our understanding of genetic variation in African Americans is substandard because of a lack of published population-based genetic studies. We report the distribution of copy-number variation (CNV) in African Americans collected as part of the Hypertension Genetic Epidemiology Network (HyperGEN) using the Affymetrix 6.0 array and the CNV calling algorithms Birdsuite and PennCNV. We present population estimates of CNV from 446 unrelated African-American subjects randomly selected from the 451 families collected within HyperGEN. Although the majority of CNVs discovered were individually rare, we found the frequency of CNVs to be collectively high. We identified a total of 11 070 CNVs greater than 10 kb passing quality control criteria that were called by both algorithms, leading to an average of 24.8 CNVs per person covering 2214 kb (median). We identified 1541 unique copy-number variable regions, 309 of which did not overlap with the Database of Genomic Variants. These results provide further insight into the distribution of CNV in African Americans.
doi:10.1038/ejhg.2011.115
PMCID: PMC3230358  PMID: 21673747
DNA copy-number variation; African American; calling algorithm; Birdsuite; PennCNV; HyperGEN
5.  Genome-wide algorithm for detecting CNV associations with diseases 
BMC Bioinformatics  2011;12:331.
Background
SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP data using the marker intensities. However, these algorithms lack specificity to detect small CNVs owing to the high false positive rate when calling CNVs based on the intensity values. Therefore, the resulting association tests lack power even if the CNVs affecting disease risk are common. An alternative procedure called PennCNV uses information from both the marker intensities as well as the genotypes and therefore has increased sensitivity.
Results
By using the hidden Markov model (HMM) implemented in PennCNV to derive the probabilities of different copy number states, which we subsequently used in a logistic regression model, we developed a new genome-wide algorithm to detect CNV associations with diseases. We compared this new method with an association test applied to the most probable copy number state for each individual that is provided by PennCNV after it performs an initial HMM analysis followed by application of the Viterbi algorithm, which removes information about copy number probabilities. In one of our simulation studies, we showed that for large CNVs (number of SNPs ≥ 10), the association tests based on PennCNV calls gave more significant results, but the new algorithm retained high power. For small CNVs (number of SNPs < 10), the logistic algorithm provided smaller average p-values (e.g., p = 7.54e-17 when relative risk RR = 3.0) in all the scenarios and could capture signals that PennCNV did not (e.g., p = 0.020 when RR = 3.0). From a second set of simulations, we showed that the new algorithm is more powerful in detecting disease associations with small CNVs (number of SNPs ranging from 3 to 5) under different penetrance models (e.g., when RR = 3.0, for relatively weak signals, power = 0.8030 compared with 0.2879 obtained from the association tests based on PennCNV calls). The new method was implemented in the software GWCNV. It is freely available at http://gwcnv.sourceforge.net, distributed under a GPL license.
Conclusions
We conclude that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than the existing HMM algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.
doi:10.1186/1471-2105-12-331
PMCID: PMC3173460  PMID: 21827692
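The contrast drawn in the abstract above, between keeping copy number state probabilities and keeping only the most probable state, can be sketched as follows. This is an illustration under stated assumptions, not the GWCNV implementation: the abstract does not say how the probabilities enter the regression, so the probability-weighted copy number used here is one hypothetical choice of covariate, and the hard call is mimicked by taking the most probable state rather than by running the Viterbi algorithm.

```python
def hard_call(probs):
    """Most probable copy number state; the uncertainty is discarded."""
    return max(probs, key=probs.get)


def expected_copy_number(probs):
    """Probability-weighted copy number; the uncertainty is retained."""
    return sum(copy_number * p for copy_number, p in probs.items())


# Hypothetical probabilities over copy number states 0..4 for one individual.
sample = {0: 0.0, 1: 0.40, 2: 0.55, 3: 0.05, 4: 0.0}
print(hard_call(sample))             # 2: looks like a normal two-copy call
print(expected_copy_number(sample))  # roughly 1.65: the possible deletion still shows
```

With few SNPs in a CNV the state probabilities tend to be less decisive, which is the situation in which the abstract reports the largest gain from the probability-based test.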
6.  Copy Number Variations and Primary Open-Angle Glaucoma 
This study has identified rare and recurrent deletions and duplications in POAG patients in the first large-scale, whole-genome study of structural variation performed in a sample of POAG patients and POAG-free subjects.
Purpose.
This study sought to investigate the role of rare copy number variation (CNV) in age-related disorders of blindness, with a focus on primary open-angle glaucoma (POAG). Data are reported from a whole-genome copy number screen in a large cohort of 400 individuals with POAG and 500 age-matched glaucoma-free subjects.
Methods.
DNA samples from patients and controls were tested for CNVs using a combination of two microarray platforms. The signal intensity data generated from these arrays were then analyzed with multiple CNV detection programs including CNAG version 2.0, PennCNV, and dChip.
Results.
A total of 11 validated CNVs were identified as recurrent in the POAG set and absent in the age-matched control set. This set included CNVs on 5q23.1 (DMXL1, DTWD2), 20p12 (PAK7), 12q14 (C12orf56, XPOT, TBK1, and RASSF3), 12p13.33 (TULP3), and 10q34.21 (PAX2), among others. The CNVs presented here are exceedingly rare and are not found in the Database of Genomic Variants. Moreover, expression data from ocular tissue support the role of these CNV-implicated genes in vision-related processes. In addition, CNV locations of DMXL1 and PAK7 overlap previously identified linkage signals for glaucoma on 5p23.1 and 20p12, respectively.
Conclusions.
The data are consistent with the hypothesis that rare CNV plays a role in the development of POAG.
doi:10.1167/iovs.10-5606
PMCID: PMC3207715  PMID: 21310917
7.  The Genetic Effect of Copy Number Variations on the Risk of Type 2 Diabetes in a Korean Population 
PLoS ONE  2011;6(4):e19091.
Background
Unlike Caucasian populations, genetic factors contributing to the risk of type 2 diabetes mellitus (T2DM) are not well studied in Asian populations. In light of this, and the fact that copy number variation (CNV) is emerging as a new way to understand human genomic variation, the objective of this study was to identify type 2 diabetes–associated CNV in a Korean cohort.
Methodology/Principal Findings
Using the Illumina HumanHap300 BeadChip (317,503 markers), genome-wide genotyping was performed to obtain signal and allelic intensities from 275 patients with type 2 diabetes mellitus (T2DM) and 496 nondiabetic subjects (Total n = 771). To increase the sensitivity of CNV identification, we incorporated multiple factors using PennCNV, a program that is based on the hidden Markov model (HMM). To assess the genetic effect of CNV on T2DM, a multivariate logistic regression model controlling for age and gender was used. We identified a total of 7,478 CNVs (average of 9.7 CNVs per individual) and 2,554 CNV regions (CNVRs; 164 common CNVRs for frequency>1%) in this study. Although we failed to demonstrate robust associations between CNVs and the risk of T2DM, our results revealed a putative association between several CNVRs including chr15:45994758–45999227 (P = 8.6E-04, Pcorr = 0.01) and the risk of T2DM. The identified CNVs in this study were validated using overlapping analysis with the Database of Genomic Variants (DGV; 71.7% overlap), and quantitative PCR (qPCR). The identified variations, which encompassed functional genes, were significantly enriched in the cellular part, in the membrane-bound organelle, in the development process, in cell communication, in signal transduction, and in biological regulation.
Conclusion/Significance
We expect that the methods and findings in this study will contribute in particular to genome studies of Asian populations.
doi:10.1371/journal.pone.0019091
PMCID: PMC3081314  PMID: 21526130
8.  Genome-Wide Identification of Copy Number Variations in Chinese Holstein 
PLoS ONE  2012;7(11):e48732.
Recent studies of mammalian genomes have uncovered the vast extent of copy number variations (CNVs) that contribute to phenotypic diversity. Compared to a SNP, a CNV can cover a wider chromosome region, which may potentially incur substantial sequence changes and induce more significant effects on phenotypes. CNVs have become a promising alternative genetic marker in the field of genetic analyses. Here we report the first account of CNV regions in the cattle genome in the Chinese Holstein population. Illumina Bovine SNP50K BeadChips were used for screening 2047 Holstein individuals. Three different programs (PennCNV, cnvPartition and GADA) were implemented to detect potential CNVs. After a strict CNV calling pipeline, a total of 99 CNV regions were identified in the cattle genome. These CNV regions cover 23.24 Mb in total with an average size of 151.69 kb. Of these CNV regions, 52 have frequencies above 1%, and 51 completely or partially overlap with 138 cattle genes, which are significantly enriched for specific biological functions, such as signaling pathway, sensory perception response and cellular processes. The results provide valuable information for constructing a more comprehensive CNV map in the cattle genome and offer an important resource for investigation of genome structure and genomic variation underlying traits of interest in cattle.
doi:10.1371/journal.pone.0048732
PMCID: PMC3492429  PMID: 23144949
9.  Comparative analysis of copy number variation detection methods and database construction 
BMC Genetics  2011;12:29.
Background
Array-based detection of copy number variations (CNVs) is widely used for identifying disease-specific genetic variations. However, the accuracy of CNV detection is not sufficient and results differ depending on the detection programs used and their parameters. In this study, we evaluated five widely used CNV detection programs, Birdsuite (mainly consisting of the Birdseye and Canary modules), Birdseye (part of Birdsuite), PennCNV, CGHseg, and DNAcopy from the viewpoint of performance on the Affymetrix platform using HapMap data and other experimental data. Furthermore, we identified CNVs of 180 healthy Japanese individuals using parameters that showed the best performance in the HapMap data and investigated their characteristics.
Results
The results indicate that Hidden Markov model-based programs PennCNV and Birdseye (part of Birdsuite), or Birdsuite show better detection performance than other programs when the high reproducibility rates of the same individuals and the low Mendelian inconsistencies are considered. Furthermore, when rates of overlap with other experimental results were taken into account, Birdsuite showed the best performance from the viewpoint of sensitivity but was expected to include many false negatives and some false positives. The results of 180 healthy Japanese demonstrate that the ratio containing repeat sequences, not only segmental repeats but also long interspersed nuclear element (LINE) sequences both in the start and end regions of the CNVs, is higher in CNVs that are commonly detected among multiple individuals than that in randomly selected regions, and the conservation score based on primates is lower in these regions than in randomly selected regions. Similar tendencies were observed in HapMap data and other experimental data.
Conclusions
Our results suggest that not only segmental repeats but also interspersed repeats, especially LINE sequences, are deeply involved in CNVs, particularly in common CNV formations.
The detected CNVs are stored in the CNV repository database newly constructed by the "Japanese integrated database project" for sharing data among researchers. http://gwas.lifesciencedb.jp/cgi-bin/cnvdb/cnv_top.cgi
doi:10.1186/1471-2156-12-29
PMCID: PMC3058066  PMID: 21385384
10.  Accuracy of CNV Detection from GWAS Data 
PLoS ONE  2011;6(1):e14511.
Several computer programs are available for detecting copy number variants (CNVs) using genome-wide SNP arrays. We evaluated the performance of four CNV detection software suites (Birdsuite, Partek, HelixTree, and PennCNV-Affy) in the identification of both rare and common CNVs. Each program's performance was assessed in two ways. The first was its recovery rate, i.e., its ability to call 893 CNVs previously identified in eight HapMap samples by paired-end sequencing of whole-genome fosmid clones, and 51,440 CNVs identified by array Comparative Genome Hybridization (aCGH) followed by validation procedures, in 90 HapMap CEU samples. The second evaluation was program performance calling rare and common CNVs in the Bipolar Genome Study (BiGS) data set (1001 bipolar cases and 1033 controls, all of European ancestry) as measured by the Affymetrix SNP 6.0 array. Accuracy in calling rare CNVs was assessed by positive predictive value, based on the proportion of rare CNVs validated by quantitative real-time PCR (qPCR), while accuracy in calling common CNVs was assessed by false positive/false negative rates based on qPCR validation results from a subset of common CNVs. Birdsuite recovered the highest percentages of known HapMap CNVs containing >20 markers in two reference CNV datasets. The recovery rate increased with decreased CNV frequency. In the tested rare CNV data, Birdsuite and Partek had higher positive predictive values than the other software suites. In a test of three common CNVs in the BiGS dataset, Birdsuite's call was 98.8% consistent with qPCR quantification in one CNV region, but the other two regions showed an unacceptable degree of accuracy. We found relatively poor consistency between the two “gold standards,” the sequence data of Kidd et al., and aCGH data of Conrad et al. Algorithms for calling CNVs, especially common ones, need substantial improvement, and a “gold standard” for detection of CNVs remains to be established.
doi:10.1371/journal.pone.0014511
PMCID: PMC3020939  PMID: 21249187
11.  Copy number variations in 6q14.1 and 5q13.2 are associated with alcohol dependence 
Background
Excessive alcohol use is the third leading cause of preventable death and is highly correlated with alcohol dependence, a heritable phenotype. Many genetic factors for alcohol dependence have been found, but many remain unknown. In search of additional genetic factors, we examined the association between DSM-IV alcohol dependence and all common copy number variations (CNV) with good reliability in the Study of Addiction: Genetics and Environment (SAGE).
Methods
All participants in SAGE were interviewed using the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), as a part of three contributing studies. 2,610 non-Hispanic European American samples were genotyped on the Illumina Human 1M array. We performed CNV calling by CNVpartition, PennCNV and QuantiSNP and only CNVs identified by all three software programs were examined. Association was conducted with the CNV (as a deletion/duplication) as well as with probes in the CNV region. Quantitative polymerase chain reaction (qPCR) was used to validate the CNVs in the laboratory.
Results
CNVs in 6q14.1 (P = 1.04 × 10^-6) and 5q13.2 (P = 3.37 × 10^-4) were significantly associated with alcohol dependence after adjusting for multiple tests. On chromosome 5q13.2 there were multiple candidate genes previously associated with various neurological disorders. The region on chromosome 6q14.1 is a gene desert that has been associated with mental retardation and language delay. The CNV in 5q13.2 was validated whereas only a component of the CNV on 6q14.1 was validated by qPCR. Thus, the CNV on 6q14.1 should be viewed with caution.
Conclusion
This is the first study to show an association between DSM-IV alcohol dependence and CNVs. CNVs in regions previously associated with neurological disorders may be associated with alcohol dependence.
doi:10.1111/j.1530-0277.2012.01758.x
PMCID: PMC3436997  PMID: 22702843
Copy Number Variations; Alcohol dependence; CNV Accuracy
12.  A Genome-Wide Investigation of Copy Number Variation in Patients with Sporadic Brain Arteriovenous Malformation 
PLoS ONE  2013;8(10):e71434.
Background
Brain arteriovenous malformations (BAVM) are clusters of abnormal blood vessels, with shunting of blood from the arterial to venous circulation and a high risk of rupture and intracranial hemorrhage. Most BAVMs are sporadic, but also occur in patients with Hereditary Hemorrhagic Telangiectasia, a Mendelian disorder caused by mutations in genes in the transforming growth factor beta (TGFβ) signaling pathway.
Methods
To investigate whether copy number variations (CNVs) contribute to risk of sporadic BAVM, we performed a genome-wide association study in 371 sporadic BAVM cases and 563 healthy controls, all Caucasian. Cases and controls were genotyped using the Affymetrix 6.0 array. CNVs were called using the PennCNV and Birdsuite algorithms and analyzed via segment-based and gene-based approaches. Common and rare CNVs were evaluated for association with BAVM.
Results
A CNV region on 1p36.13, containing the neuroblastoma breakpoint family, member 1 gene (NBPF1), was significantly enriched with duplications in BAVM cases compared to controls (P = 2.2 × 10^-9); NBPF1 was also significantly associated with BAVM in gene-based analysis using both PennCNV and Birdsuite. We experimentally validated the 1p36.13 duplication; however, the association did not replicate in an independent cohort of 184 sporadic BAVM cases and 182 controls (OR = 0.81, P = 0.8). Rare CNV analysis did not identify genes significantly associated with BAVM.
Conclusion
We did not identify common CNVs associated with sporadic BAVM that replicated in an independent cohort. Replication in larger cohorts is required to elucidate the possible role of common or rare CNVs in BAVM pathogenesis.
doi:10.1371/journal.pone.0071434
PMCID: PMC3789669  PMID: 24098321
13.  Genome-wide association study identifies a maternal copy-number deletion in PSG11 enriched among preeclampsia patients 
Background
Specific genetic contributions for preeclampsia (PE) are currently unknown. This genome-wide association study (GWAS) aims to identify maternal single nucleotide polymorphisms (SNPs) and copy-number variants (CNVs) involved in the etiology of PE.
Methods
A genome-wide scan was performed on 177 PE cases (diagnosed according to National Heart, Lung and Blood Institute guidelines) and 116 normotensive controls. White female study subjects from Iowa were genotyped on Affymetrix SNP 6.0 microarrays. CNV calls made using a combination of four detection algorithms (Birdseye, Canary, PennCNV, and QuantiSNP) were merged using CNVision and screened with stringent prioritization criteria. Due to limited DNA quantities and the deleterious nature of copy-number deletions, it was decided a priori that only deletions would be selected for assay on the entire case-control dataset using quantitative real-time PCR.
Results
The top four SNP candidates had an allelic or genotypic p-value between 10^-5 and 10^-6; however, none surpassed the Bonferroni-corrected significance threshold. Three recurrent rare deletions meeting prioritization criteria detected in multiple cases were selected for targeted genotyping. A locus of particular interest was found showing an enrichment of case deletions in 19q13.31 (5/169 cases and 1/114 controls), which encompasses the PSG11 gene contiguous to a highly plastic genomic region. All algorithm calls for these regions were assay confirmed.
Conclusions
CNVs may confer risk for PE and represent interesting regions that warrant further investigation. Top SNP candidates identified from the GWAS, although not genome-wide significant, may be useful to inform future studies in PE genetics.
doi:10.1186/1471-2393-12-61
PMCID: PMC3476390  PMID: 22748001
Copy-number variant; Genome-wide association study; Microarray analysis; Preeclampsia; Single nucleotide polymorphism
14.  Genome-wide analysis shows increased frequency of CNV deletions in Dutch schizophrenia patients 
Biological psychiatry  2011;70(7):655-662.
Background
Since 2008, multiple studies have reported on copy number variations (CNVs) in schizophrenia. However, many regions are unique events with minimal overlap between studies. This makes it difficult to gain a comprehensive overview of all CNVs involved in the aetiology of schizophrenia. We performed a systematic CNV study based on a homogeneous genome-wide dataset aiming at all CNVs ≥50 kb. We complemented this analysis with a review of cytogenetic and chromosomal abnormalities for schizophrenia reported in the literature in order to combine classical genetic findings and our current understanding of genomic variation.
Methods
We investigated 834 Dutch schizophrenia patients and 672 Dutch controls. CNVs were included if they were detected by QuantiSNP as well as PennCNV and contain known protein coding genes. The integrated identification of CNV regions and cytogenetic loci indicates regions of interest (CROIs).
Results
In total, 2,437 CNVs were identified with an average number of 2.1 CNVs per subject for both cases and controls. We observed significantly more deletions, but not duplications, in schizophrenia cases versus controls. The CNVs identified coincide with loci previously reported in the literature, confirming well-established schizophrenia CROIs 1q42 and 22q11.2, as well as indicating a potentially novel CROI on chromosome 5q35.1.
Conclusions
Chromosomal deletions are more prevalent in schizophrenia patients than in healthy subjects and therefore confer a risk factor for pathogenicity. The combination of our CNV data with previously reported cytogenetic abnormalities in schizophrenia provides an overview of potentially interesting regions for positional candidate genes.
doi:10.1016/j.biopsych.2011.02.015
PMCID: PMC3137747  PMID: 21489405
copy number variation; schizophrenia; cytogenetic abnormality; deletion; duplication; candidate gene
15.  Genome-Wide Copy Number Variations Inferred from SNP Genotyping Arrays Using a Large White and Minzhu Intercross Population 
PLoS ONE  2013;8(10):e74879.
Copy number variations (CNVs) are one of the main contributors to genetic diversity in animals and are broadly distributed in the genomes of swine. Investigating the performance and evolutionary impacts of pig CNVs requires comprehensive knowledge of their structure and function within and between breeds. In the current study, 4 different programs (i.e., GADA, PennCNV, QuantiSNP, and cnvPartition) were used to analyze Porcine SNP60 genotyping data of 585 pigs from one Large White × Minzhu intercross population to detect copy number variant regions (CNVRs). Overlapping CNVRs recalled by at least 2 programs were used to construct a powerful and comprehensive CNVR map, which contained 249 CNVRs (i.e., 70 gains, 43 losses, and 136 gains/losses) and covered 26.22% of the regions in the swine genome. Ten CNVRs, representing different predicted statuses, were selected for validation via quantitative real-time PCR (qPCR); 9/10 CNVRs (i.e., 90%) were validated. When traced back to the F0 generation, 58 events were identified in only Minzhu F0 parents and 2 events were identified in only Large White F0 parents. A series of CNVR function analyses were performed. Some of the CNVR functions were predicted, and several interesting CNVRs for meat quality traits and hematological parameters were obtained. A comprehensive and lower false rate genome-wide CNV map was constructed for Large White and Minzhu pig genomes in this study. Our results may provide an important basis for determining the relationship between CNVRs and important qualitative and quantitative traits. In addition, they can help to further understand genetic processes in pigs.
doi:10.1371/journal.pone.0074879
PMCID: PMC3787955  PMID: 24098353
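The idea of keeping only regions recalled by at least 2 programs, as described in the abstract above, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the abstract does not give the merging rule, so the single-chromosome input, the tuple layout, and the function name `build_cnvrs` are all assumptions, and gains and losses are not distinguished.

```python
def build_cnvrs(calls, min_programs=2):
    """Merge overlapping calls on one chromosome into regions and keep the
    regions supported by at least `min_programs` distinct programs.

    `calls` is a list of (start, end, program) tuples.
    """
    regions = []
    for start, end, program in sorted(calls):
        if regions and start < regions[-1]["end"]:
            # The call overlaps the current region: extend it and record the program.
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["programs"].add(program)
        else:
            # No overlap: start a new region.
            regions.append({"start": start, "end": end, "programs": {program}})
    return [
        (r["start"], r["end"], sorted(r["programs"]))
        for r in regions
        if len(r["programs"]) >= min_programs
    ]


# Hypothetical calls from three of the programs named in the abstract.
calls = [
    (1_000, 5_000, "PennCNV"),
    (4_000, 9_000, "QuantiSNP"),
    (20_000, 25_000, "GADA"),
]
print(build_cnvrs(calls))
# [(1000, 9000, ['PennCNV', 'QuantiSNP'])]
```

The region called only by one program is dropped, which trades sensitivity for a lower expected false positive rate.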
16.  Inheritance Model Introduces Differential Bias in CNV Calls between Parents and Offspring 
Genetic epidemiology  2012;36(5):488-498.
Copy Number Variation (CNV) is increasingly implicated in disease pathogenesis. CNVs are often identified by statistical models applied to data from single nucleotide polymorphism (SNP) panels. Family information for samples provides additional information for CNV inference. Two modes of PennCNV (the Joint-call and Posterior-call), which are some of the most well-developed family-based CNV calling methods, use a “Joint-model” as a main component. This models all family members’ CNV states together with Mendelian inheritance. Methods based on the Joint-model are used to infer CNV calls of cases and controls in a pedigree, which may be compared to each other to test an association. Although benefits from the Joint-model have been shown elsewhere, equality of call rates in parents and offspring has not been evaluated previously. This can affect downstream analyses in studies that compare CNV rates in cases versus controls in pedigrees. In this paper, we show that the Joint-model can introduce different CNV call rates among family members in the absence of a true difference. First, we show that the Joint-model may analytically introduce differential CNV calls because of asymmetry of the model. We demonstrate these differential call rates using single-marker simulations. We show that call rates using the two modes of PennCNV also differ between parents and offspring in one multi-marker simulated dataset and two real datasets. Our results advise caution in the use of Joint-model calls in CNV association studies with family-based datasets.
doi:10.1002/gepi.21643
PMCID: PMC3678551  PMID: 22628073
Schizophrenia; Calling Algorithm; Family-Based Study; CNV burden
17.  Global variation in copy number in the human genome 
Nature  2006;444(7118):444-454.
Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. 1,447 copy number variable regions covering 360 megabases (12% of the genome) were identified in these populations; these CNV regions contained hundreds of genes, disease loci, functional elements and segmental duplications. Strikingly, these CNVs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal dramatic variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
doi:10.1038/nature05329
PMCID: PMC2669898  PMID: 17122850
18.  A comprehensive survey of copy number variation in 18 diverse pig populations and identification of candidate copy number variable genes associated with complex traits 
BMC Genomics  2012;13:733.
Background
Copy number variation (CNV) is a major source of structural variants and has been commonly identified in mammalian genomes. It is associated with gene expression and may represent a major genetic component of phenotypic diversity. Unlike many other mammalian genomes where CNVs have been well annotated, studies of porcine CNV in diverse breeds are still limited.
Results
Here we used the Porcine SNP60 BeadChip and the PennCNV algorithm to identify 1,315 putative CNVs belonging to 565 CNV regions (CNVRs) in 1,693 pigs from 18 diverse populations. A total of 538 out of 683 CNVs identified in a White Duroc × Erhualian F2 population fit Mendelian transmission, and 6 out of 7 randomly selected CNVRs were confirmed by quantitative real-time PCR. CNVRs were non-randomly distributed in the pig genome. Several CNV hotspots were found on pig chromosomes 6, 11, 13, 14 and 17. CNV numbers differ greatly among pig populations. The Duroc pigs had the largest number of CNVs per individual. Among 1,765 transcripts located within the CNVRs, 634 genes have been reported to be copy number variable genes in the human genome. By integrating QTL mapping, CNVRs and the description of phenotypes in knockout mice, we identified 7 copy number variable genes as candidate genes for phenotypes related to carcass length, backfat thickness, abdominal fat weight, length of scapular, intermuscle fat content of longissimus muscle, body weight at 240 days, glycolytic potential of longissimus muscle, mean corpuscular hemoglobin, mean corpuscular volume and humerus diameter.
Conclusion
We revealed the distribution of an unprecedented number of 565 CNVRs in the pig genome and investigated copy number variable genes as possible candidate genes for phenotypic traits. These findings give novel insights into porcine CNVs and provide resources to facilitate the identification of trait-related CNVs.
doi:10.1186/1471-2164-13-733
PMCID: PMC3543711  PMID: 23270433
Copy number variation; Copy number variable gene; Complex trait; QTL; Pig
19.  Ethnic differentiation of copy number variation on chromosome 16p12.3 for association with obesity phenotypes in European and Chinese populations 
OBJECTIVE
Genomic copy number variations (CNVs) have been strongly implicated as important genetic factors for obesity. A recent genome-wide association study identified a novel variant, rs12444979, which is in high linkage disequilibrium with CNV 16p12.3, for association with obesity in Europeans. The aim of this study was to directly examine the relationship between the CNV 16p12.3 and obesity phenotypes, including body mass index (BMI) and body fat mass.
SUBJECTS
Subjects were a multi-ethnic sample, including 2286 unrelated subjects from a European population and 1627 unrelated Han subjects from a Chinese population. Body fat mass was measured using dual energy X-ray absorptiometry.
RESULTS
Using the Affymetrix Genome-Wide Human SNP Array 6.0, we directly detected CNV 16p12.3, with deletion frequencies of 27.26% and 0.8% in the European and Chinese populations, respectively. We confirmed the significant association between this CNV and obesity (BMI: P = 1.38 × 10⁻²; body fat mass: P = 2.13 × 10⁻³) in the European population. Fewer copies were associated with lower BMI and body fat mass, and the effect size was estimated to be 0.62 (BMI) and 1.41 (body fat mass), respectively. However, for the Chinese population, we did not observe a significant association signal, and the frequencies of this deletion CNV are quite different between the European and Chinese populations (P<0.001).
CONCLUSION
Our findings suggest for the first time that CNV 16p12.3 might be ethnicity-specific and cause ethnic phenotypic diversity, which may provide new clues to the genetic architecture of obesity.
doi:10.1038/ijo.2012.31
PMCID: PMC3682477  PMID: 22391884
CNV; 16p12.3; BMI; body fat mass; association
20.  Copy Number Variation in Familial Parkinson Disease 
PLoS ONE  2011;6(8):e20988.
Copy number variants (CNVs) are known to cause Mendelian forms of Parkinson disease (PD), most notably in SNCA and PARK2. PARK2 has a recessive mode of inheritance; however, recent evidence demonstrates that a single CNV in PARK2 (but not a single missense mutation) may increase risk for PD. We recently performed a genome-wide association study for PD that excluded individuals known to have either a LRRK2 mutation or two PARK2 mutations. Data from the Illumina370Duo arrays were re-clustered using only white individuals with high quality intensity data, and CNV calls were made using two algorithms, PennCNV and QuantiSNP. After quality assessment, the final sample included 816 cases and 856 controls. Results varied between the two CNV calling algorithms for many regions, including the PARK2 locus (genome-wide p = 0.04 for PennCNV and p = 0.13 for QuantiSNP). However, there was consistent evidence with both algorithms for two novel genes, USP32 and DOCK5 (empirical, genome-wide p-values<0.001). PARK2 CNVs tended to be larger, and all instances that were molecularly tested were validated. In contrast, the CNVs in both novel loci were smaller and failed to replicate using real-time PCR, MLPA, and gel electrophoresis. The DOCK5 variation is more akin to a VNTR than a typical CNV and the association is likely caused by artifact due to DNA source. DNA for all the cases was derived from whole blood, while the DNA for all controls was derived from lymphoblast cell lines. The USP32 locus contains many SNPs with low minor allele frequency leading to a loss of heterozygosity that may have been spuriously interpreted by the CNV calling algorithms as support for a deletion. Thus, only the CNVs within the PARK2 locus could be molecularly validated and associated with PD susceptibility.
doi:10.1371/journal.pone.0020988
PMCID: PMC3149037  PMID: 21829596
21.  Assessment of Copy Number Variation Using the Illumina Infinium 1M SNP-Array: A Comparison of Methodological Approaches in the Spanish Bladder Cancer/EPICURO Study 
Human mutation  2011;32(2):240-248.
High-throughput single nucleotide polymorphism (SNP)-array technologies allow the investigation of copy number variants (CNVs) in genome-wide scans, and specific calling algorithms have been developed to determine CNV location and copy number. We report the results of a reliability analysis comparing data from 96 pairs of samples processed with CNVpartition, PennCNV, and QuantiSNP for Infinium Illumina Human 1Million probe chip data. We also performed a validity assessment with multiplex ligation-dependent probe amplification (MLPA) as a reference standard. The number of CNVs per individual varied according to the calling algorithm. Higher numbers of CNVs were detected in saliva than in blood DNA samples regardless of the algorithm used. All algorithms presented low agreement with mean Kappa Index (KI) <66. PennCNV was the most reliable algorithm (KIw=98.96) when assessing the number of copies. The agreement observed in detecting CNV was higher in blood than in saliva samples. When compared to MLPA, all algorithms identified known copy aberrations poorly (sensitivity = 0.19–0.28). In contrast, specificity was very high (0.97–0.99). Once a CNV was detected, the number of copies was correctly assessed (sensitivity > 0.62). Our results indicate that the current calling algorithms should be improved for high performance CNV analysis in genome-wide scans. Further refinement is required to assess CNVs as risk factors in complex diseases.
doi:10.1002/humu.21398
PMCID: PMC3230937  PMID: 21089066
copy number variation; genome-wide association study; specificity; sensitivity; reliability; accuracy; CNVpartition; PennCNV; QuantiSNP
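The sensitivity and specificity figures quoted in entry 21 compare each algorithm's calls against MLPA as the reference standard. As a minimal sketch of how such figures are computed, the function below takes two parallel lists of booleans. The function name, the input layout, and the example values are illustrative assumptions, not data or code from the study.

```python
def sensitivity_specificity(reference, called):
    """Return (sensitivity, specificity) for paired boolean observations.

    reference[i] is True when the reference assay (e.g. MLPA) reports a
    copy number aberration at locus i; called[i] is True when the
    calling algorithm reports one at the same locus.
    """
    if len(reference) != len(called):
        raise ValueError("reference and called must have the same length")

    pairs = list(zip(reference, called))
    tp = sum(1 for r, c in pairs if r and c)          # true positives
    fn = sum(1 for r, c in pairs if r and not c)      # false negatives
    tn = sum(1 for r, c in pairs if not r and not c)  # true negatives
    fp = sum(1 for r, c in pairs if not r and c)      # false positives

    # Assumes at least one reference-positive and one reference-negative
    # locus; otherwise the ratios are undefined.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy input: 4 reference-positive and 4 reference-negative loci.
reference = [True, True, True, True, False, False, False, False]
called = [True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(reference, called)
```

Low sensitivity with high specificity, the pattern reported in the abstract, means the algorithm misses many reference-confirmed aberrations but rarely reports one where the reference finds none.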
22.  Genome-Wide Survey of Large Rare Copy Number Variants in Alzheimer’s Disease Among Caribbean Hispanics 
G3: Genes|Genomes|Genetics  2012;2(1):71-78.
Recent genome-wide association studies have identified significant association between Alzheimer’s disease (AD) and variations in CLU, PICALM, BIN1, CR1, MS4A4/MS4A6E, CD2AP, CD33, EPHA1, and ABCA7. However, the pathogenic variants in these loci have not yet been found. We conducted a genome-wide scan for large copy number variation (CNV) in a dataset of Caribbean Hispanic origin (554 controls and 559 AD cases that were previously investigated in a SNP-based genome-wide association study using the Illumina HumanHap 650Y platform). We ran four CNV calling algorithms to obtain high-confidence calls for large CNVs (>100 kb) that were detected by at least two algorithms. Global burden analyses did not reveal significant differences between cases and controls in CNV rate, distribution of deletions or duplications, total or average CNV size, or number of genes affected by CNVs. However, we observed a nominal association between AD and a ∼470 kb duplication on chromosome 15q11.2 (P = 0.037). This duplication, encompassing up to five genes (TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1), was present in 10 cases (2.6%) and 3 controls (0.8%). The dosage increase of CYFIP1 and NIPA1 genes was further confirmed by quantitative PCR. The current study did not detect CNVs that affect novel AD loci identified by recent genome-wide association studies. However, because the array technology used in our study has limitations in detecting small CNVs, future studies must carefully assess novel AD genes for the presence of disease-related CNVs.
doi:10.1534/g3.111.000869
PMCID: PMC3276183  PMID: 22384383
gene; deletion; duplication; Alzheimer’s Disease; copy number variants
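Entry 22 keeps only large CNVs (>100 kb) that were detected by at least two calling algorithms. The sketch below shows one way such a consensus filter could look. The abstract does not define what counts as two algorithms detecting the same CNV, so the rule used here (same sample, chromosome and CNV type, with any base-pair overlap) is an assumption, as are the `CnvCall` record and the algorithm labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    algorithm: str  # hypothetical label for the calling algorithm
    sample: str
    chrom: str
    start: int      # bp, inclusive
    end: int        # bp, inclusive
    kind: str       # "del" or "dup"


def overlaps(a: CnvCall, b: CnvCall) -> bool:
    """Assumed matching rule: same sample, chromosome, type, any overlap."""
    return (
        a.sample == b.sample
        and a.chrom == b.chrom
        and a.kind == b.kind
        and a.start <= b.end
        and b.start <= a.end
    )


def high_confidence(calls, min_size=100_000, min_algorithms=2):
    """Keep calls longer than min_size bp supported by >= min_algorithms."""
    kept = []
    for call in calls:
        if call.end - call.start + 1 <= min_size:
            continue  # not a large CNV
        # A call always overlaps itself, so its own algorithm is counted.
        supporters = {o.algorithm for o in calls if overlaps(call, o)}
        if len(supporters) >= min_algorithms:
            kept.append(call)
    return kept


# Hypothetical calls for one sample.
calls = [
    CnvCall("algoA", "s1", "15", 1_000_000, 1_470_000, "dup"),
    CnvCall("algoB", "s1", "15", 1_050_000, 1_400_000, "dup"),
    CnvCall("algoA", "s1", "2", 500_000, 700_000, "del"),  # one algorithm only
    CnvCall("algoA", "s1", "3", 100, 50_000, "del"),       # too small
    CnvCall("algoB", "s1", "3", 100, 50_000, "del"),       # too small
]
consensus = high_confidence(calls)
```

A stricter matching rule, such as a minimum reciprocal overlap fraction, would drop into `overlaps` without changing the rest of the filter.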
23.  Power to Detect Risk Alleles Using Genome-Wide Tag SNP Panels 
PLoS Genetics  2007;3(10):e170.
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls depending on the sample population and disease model. If multiple loci are involved, the power increases significantly to detect at least one locus such that relative risks 20%–35% lower can be detected with 80% power if between two and four independent loci are involved. Although our SNP selection was based on HapMap data, which is a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
Author Summary
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs. Information can be captured for the entire group of correlated SNPs by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency as measured by coverage of all HapMap SNPs and a set of SNPs derived from completely resequenced genes from the Seattle SNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex disease where multiple risk alleles are believed to be involved, we show that the ability to detect at least one risk allele with the tag SNP panels is also high.
doi:10.1371/journal.pgen.0030170
PMCID: PMC2000969  PMID: 17922574
25.  Autism genome-wide copy number variation reveals ubiquitin and neuronal genes 
Nature  2009;459(7246):569-573.
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins (refs 1–4). Previous studies focusing on candidate genes or genomic regions have identified several copy number variations (CNVs) that are associated with an increased risk of ASDs (refs 5–9). Here we present the results from a whole-genome CNV study on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who were genotyped with ~550,000 single nucleotide polymorphism markers, in an attempt to comprehensively identify CNVs conferring susceptibility to ASDs. Positive findings were evaluated in an independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry. Besides previously reported ASD candidate genes, such as NRXN1 (ref. 10) and CNTN4 (refs 11, 12), several new susceptibility genes encoding neuronal cell-adhesion molecules, including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to controls (P = 9.5 × 10⁻³). Furthermore, CNVs within or surrounding genes involved in the ubiquitin pathways, including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs not observed in controls (P = 3.3 × 10⁻³). We also identified duplications 55 kilobases upstream of complementary DNA AK123120 (P = 3.6 × 10⁻⁶). Although these variants may be individually rare, they target genes involved in neuronal cell-adhesion or ubiquitin degradation, indicating that these two important gene networks expressed within the central nervous system may contribute to the genetic susceptibility of ASD.
doi:10.1038/nature07953
PMCID: PMC2925224  PMID: 19404257
