We sought to determine if CNVs are associated with T1D by performing genome-wide CNV analysis on a cohort of 20 patients with T1D and 20 Ctrl subjects using the Affymetrix SNP Array 6.0. An additional cohort of 10 monozygotic twin pairs discordant for T1D was analyzed for validation purposes. Hybridization quality was assessed using the contrast QC metric in the Affymetrix Genotyping Console, with a contrast QC <0.4 defined as failure; 1 Ctrl sample failed and was excluded prior to copy number analysis.
Of primary importance in the analysis of these data was the validity of our copy number calling algorithm. Raw data from all 3 cohorts, 59 Affymetrix arrays in total, were input into the Birdsuite programs and copy numbers were called across the genome. The Birdsuite software determined integer copy numbers of predefined regions of common variance (copy number polymorphisms, CNPs) and employed a more complex, multi-dimensional model to identify rare variants. Output files contain copy number values across each chromosome with a confidence score for each individual call (Table S1). Genome-wide call rates were also estimated for each individual sample. Two samples from the unrelated adult T1D cohort failed the quality control checkpoint, which required a call rate greater than 98%. The remaining 57 arrays had call rates ≥ 98.6%. We compared copy numbers determined by the Birdsuite analysis to copy numbers determined by quantitative PCR (qPCR) in 37 samples of genomic DNA across 5 distinct chromosomal regions (Table S2). For qPCR experiments, Applied Biosystems' CopyCaller1.0 program determined a non-integer copy number based upon the ΔΔCt calculation and then predicted an integer copy number, each with an associated confidence value. For the 185 separate experimental points (37 samples × 5 regions), there was >96% agreement between the copy number determinations made by the Birdsuite analysis and those made by qPCR.
Percent agreement between Birdsuite copy number calls and qPCR.
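As an illustration of the ΔΔCt calculation referred to above, the following minimal Python sketch converts threshold cycle (Ct) values into a copy number estimate and computes the percent agreement between two sets of integer calls. It assumes a calibrator sample with a known copy number of 2, a reference assay of constant copy number, and roughly 100% amplification efficiency; it rounds to the nearest integer, whereas CopyCaller applies its own statistical model, so this is not a reimplementation of that program. The function names and all Ct values are hypothetical.

```python
def copy_number_from_ct(ct_target, ct_reference, cal_ct_target, cal_ct_reference,
                        calibrator_copies=2):
    """Estimate copy number with the comparative Ct (delta-delta Ct) method.

    Assumes one doubling per PCR cycle for both the target and reference assays.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = cal_ct_target - cal_ct_reference
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


def percent_agreement(array_calls, qpcr_calls):
    """Percentage of paired integer copy number calls that are identical."""
    if len(array_calls) != len(qpcr_calls):
        raise ValueError("call lists must have the same length")
    matches = sum(a == q for a, q in zip(array_calls, qpcr_calls))
    return 100.0 * matches / len(array_calls)


# Hypothetical sample: the target amplifies one cycle later than in the calibrator,
# which corresponds to half the calibrator's copy number.
estimate = copy_number_from_ct(26.0, 25.0, 25.0, 25.0)
integer_call = round(estimate)

# Hypothetical paired calls (array-based vs. qPCR) for four experimental points.
agreement = percent_agreement([2, 1, 2, 0], [2, 1, 2, 1])
```

In this hypothetical example the estimate is 1 copy, and three of the four paired calls agree.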
For the analysis, we first catalogued all confident, variant CN calls on chromosomes 1–22 within the framework of known copy number polymorphisms (CNPs). CNPs are regions of copy number variance present in greater than 1% of the 270 HapMap samples, resulting in a library of 1,320 CNPs. Novel CNVs not represented in the CNP library were also identified and included in the analysis.
A single CNV is capable of causing disease. In the case of Charcot-Marie-Tooth disease type 1, 70% of patients have one singular pathogenic variant. In a more common disease, like T1D, we hypothesized that a single undiscovered variant would not be present and pathogenic at a percentage as high as 70%; rather, we set the threshold of variance within each diabetes group at roughly half that, or 40%. Additionally, to ensure selection of variants differentially present between the two groups, we further limited the classification of enrichment in T1D to those CNVs present at a 1.5-fold greater frequency than in Ctrl. Conversely, a CNV was classified as depleted in T1D if it was present in >40% of the Ctrl cohort at a 1.5-fold greater frequency than in T1D.
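The enrichment and depletion criteria can be expressed as a short classification rule. The Python sketch below is only an illustration: the function name is hypothetical, frequencies are given as fractions, and "1.5-fold greater" is interpreted as "at least 1.5-fold", which is an assumption because the text does not state whether the boundary is inclusive.

```python
def classify_cnv(freq_t1d, freq_ctrl, min_freq=0.40, min_fold=1.5):
    """Classify a CNV as 'enriched', 'depleted' or 'neither' in T1D relative to Ctrl.

    Frequencies are fractions of each cohort carrying the variant. The fold
    comparison is written as a product so that a frequency of zero in the
    comparison group does not cause a division by zero.
    """
    if freq_t1d > min_freq and freq_t1d >= min_fold * freq_ctrl:
        return "enriched"
    if freq_ctrl > min_freq and freq_ctrl >= min_fold * freq_t1d:
        return "depleted"
    return "neither"


# Hypothetical frequencies (T1D, Ctrl) used only to exercise the rule.
examples = {
    "common in T1D, rarer in Ctrl": classify_cnv(0.50, 0.21),
    "common in Ctrl, rarer in T1D": classify_cnv(0.33, 0.58),
    "similar in both groups": classify_cnv(0.45, 0.40),
}
```

A variant at similar frequency in both groups is classified as neither, even when it exceeds the 40% threshold, because the fold-difference condition is not met.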
Variants in the T1D group were compared to those in the Ctrl group. By these criteria, 18 CNPs were enriched in T1D (present in >40% of the T1D cohort at a 1.5-fold greater frequency than in the Ctrl cohort). Conversely, 20 CNPs and 1 novel CNV were depleted in T1D (present in >40% of the Ctrl cohort at a 1.5-fold greater frequency than in the T1D cohort). These 39 CNVs were then studied in a second cohort.
The Affymetrix chip determines copy number based on values of nearly 1,000,000 probes in the genome, resulting in a high probability for both type I and II errors. To help control for these errors, we performed genome-wide copy number analysis in a second cohort of patients, monozygotic twin pairs discordant for T1D. We hypothesized that the 39 CNVs identified in the first T1D:Ctrl comparison may be differentially present in this second cohort of MZ twins. For each twin, confident variant calls were catalogued in the CNP library as before. CNVs present in only 1 twin of a pair were isolated and grouped based on disease status (affected or unaffected). These variants were compared to the 39 CNVs from the previous analysis and no overlap was found. Additionally, there were no CNVs present in more than 2 affected or unaffected twins of the pairs when this cohort was considered independently of the previous analysis.
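The twin comparison described above amounts to a set difference within each pair followed by a tally across pairs. The following Python sketch illustrates this under the assumption that each twin's confident variant calls are available as a set of CNV identifiers; the function name and the identifiers are hypothetical.

```python
from collections import Counter


def discordant_cnvs(twin_pairs):
    """Tally CNVs present in only one twin of each pair, grouped by disease status.

    twin_pairs: list of (affected_calls, unaffected_calls), each a set of CNV IDs.
    Returns two Counters mapping CNV ID to the number of pairs in which the CNV
    was found only in the affected twin or only in the unaffected twin.
    """
    affected_only = Counter()
    unaffected_only = Counter()
    for affected_calls, unaffected_calls in twin_pairs:
        affected_only.update(affected_calls - unaffected_calls)
        unaffected_only.update(unaffected_calls - affected_calls)
    return affected_only, unaffected_only


# Hypothetical calls for two twin pairs and a hypothetical candidate list.
pairs = [
    ({"CNP_A", "CNP_B"}, {"CNP_A"}),
    ({"CNP_B"}, {"CNP_C"}),
]
candidates = {"CNP_D", "CNP_E"}

affected_only, unaffected_only = discordant_cnvs(pairs)
overlap_with_candidates = (set(affected_only) | set(unaffected_only)) & candidates
```

An empty `overlap_with_candidates` corresponds to the situation reported here, in which none of the discordant variants coincided with the candidate CNVs.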
The unaffected twin in each of these MZ twin pairs has a greater than 75% lifetime risk of developing islet autoantibodies, and 65% of these now-unaffected twins will go on to develop T1D in their lifetime. As such, CNVs that confer risk of developing islet autoimmunity or T1D may be enriched in this group as a whole. Alternatively, the CNVs depleted in the unrelated adult T1D cohort may also be depleted in the twin cohort as a whole. The 10 MZ twin pairs were compared to the Ctrl cohort to determine CNVs that were enriched or depleted in the Twin group. Criteria for enrichment and depletion were identical to those outlined above. A total of 49 CNPs were enriched and 23 CNVs were depleted in the Twin cohort. Of the depleted CNVs, 22 were CNPs and 1 was a novel CNV. Altogether, 72 CNVs were enriched or depleted in the Twin cohort.
The 72 CNVs identified in the Twin cohort were compared to the 39 CNVs identified in the adult T1D cohort to identify CNVs present in both cohorts. Of these, 10 CNVs were enriched in both cohorts relative to Ctrl and 11 CNVs were depleted in both. Based upon permutation testing (with 1000 permutations), the probability (p-value) of observing 10 or more overlapping CNVs in these cohorts by chance is 0.005.
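The text does not describe which labels were permuted, so the following Python sketch should be read as one plausible scheme rather than the procedure actually used. It assumes that each sample is represented as a set of carried CNV identifiers, that cohort labels are shuffled across all samples, and that the enrichment rule (>40% frequency and at least 1.5-fold over Ctrl) is re-applied in every permutation. The add-one correction in the p-value is also an assumption, and all function names are hypothetical.

```python
import random


def enriched_set(cases, controls, min_freq=0.40, min_fold=1.5):
    """Return the CNV IDs meeting the enrichment rule in cases relative to controls."""
    cnv_ids = set().union(*cases, *controls)
    enriched = set()
    for cnv in cnv_ids:
        f_case = sum(cnv in sample for sample in cases) / len(cases)
        f_ctrl = sum(cnv in sample for sample in controls) / len(controls)
        if f_case > min_freq and f_case >= min_fold * f_ctrl:
            enriched.add(cnv)
    return enriched


def overlap_permutation_p(cohort_a, cohort_b, controls, n_perm=1000, seed=0):
    """Permutation p-value for the number of CNVs enriched in both cohorts.

    Cohort labels are shuffled across all samples; the statistic is the size of
    the intersection of the two enriched sets. Returns (observed, p_value).
    """
    rng = random.Random(seed)
    observed = len(enriched_set(cohort_a, controls) & enriched_set(cohort_b, controls))
    pooled = cohort_a + cohort_b + controls  # new list; inputs are not modified
    n_a, n_b = len(cohort_a), len(cohort_b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:n_a]
        perm_b = pooled[n_a:n_a + n_b]
        perm_ctrl = pooled[n_a + n_b:]
        overlap = len(enriched_set(perm_a, perm_ctrl) & enriched_set(perm_b, perm_ctrl))
        if overlap >= observed:
            at_least_as_extreme += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

The same function can be applied to depleted variants by swapping the roles of cases and controls inside `enriched_set`. Because MZ twins within a pair are not independent samples, a real analysis would likely need to permute at the level of pairs; that refinement is omitted here.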
The 21 CNVs were further filtered to select those CNVs greater than 1,000 base pairs in length and identified by at least 3 consecutive probes on the Affymetrix array. A total of 9 CNVs (8 CNPs, 1 novel CNVR) met these criteria. Of these, 5 CNPs were enriched in the T1D and Twin cohorts and are identified by their CNP ID. These CNPs are located on 5 different chromosomes, range in size from 1,400 base pairs to 14,000 base pairs and are deletions to copy numbers of 0 or 1. Frequencies in the T1D and Twin cohorts range from 50% to 95%, with corresponding frequencies in the Ctrl cohort from 21% to 58%. CNP253, on chromosome 2p11, contains part of a non-coding RNA, NCRNA00152. CNP1303 contains the gene SNTG1, encoding gamma syntrophin, a cytoplasmic peripheral membrane protein known to be expressed in brain. Two regions, CNP934 and CNP1162, contain at least one evolutionarily conserved noncoding sequence, defined as a segment of DNA longer than 100 bp with more than 70% sequence identity to Mus musculus, as determined by the ECR browser. Each of these sequences encodes at least one potential transcription factor binding site, suggesting these regions may have regulatory function. The sequence encompassed by CNP1956 is not gene coding and does not contain an evolutionarily conserved noncoding sequence.
CNVs enriched in T1D and Twin cohorts, relative to Ctrl.
Four CNVs were depleted in the T1D and Twin cohorts relative to Ctrl. The 3 CNPs are gene coding regions located on 3 different chromosomes, span 3,400 base pairs to 15,900 base pairs and are also all copy number deletions to 0 or 1. The frequencies of these CNPs in the diabetes cohorts range from total absence (0%) to 39%, while their frequencies in the Ctrl cohort range from 42% to 68%. CNP1102 contains a deletion of TYW1, encoding a protein involved in stabilizing ribosomal decoding processes. CNP1879 is in the region coding for the ankyrin repeat and sterile alpha motif domain gene, ANKS1B, and the chromosome 17 CNP2240 contains coding sequence for TRIM16.
CNVs depleted in T1D and Twin cohorts, relative to Ctrl.
The novel CNV, A588, depleted in both T1D and Twin cohorts, is located on chromosome 15 and spans more than 1.3 million base pairs. It manifests as both an amplification and a deletion and contains coding regions for genes such as the golgin family member GOLGA6L6, the B cell CLL/lymphoma gene BCL8 and an ankyrin domain family member, POTEB. The frequency of this variant is 58% in the Ctrl group, 33% in the T1D group and only 10% in the Twin cohort. Interestingly, many of the variants in this region do not span the entirety of the more than 1 million base pairs; rather, we see a preponderance of overlapping and non-overlapping variants clustered in this region. Because a single variant can impact regulation and expression of a gene more than 1 megabase away, these variants were grouped together as one singular CNV region (CNVR).
Individual breakpoints of CNVR A588.
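Grouping clustered variants into a single CNVR can be sketched as interval merging with a gap tolerance. The Python example below is a minimal illustration only: the rule actually used to group the A588 variants is not specified in the text, the `max_gap` parameter is an assumption, and the coordinates are hypothetical.

```python
def cluster_into_cnvrs(segments, max_gap=0):
    """Merge variant segments into regions (CNVRs).

    segments: iterable of (start, end) positions on one chromosome.
    Two segments are placed in the same region when they overlap or when the
    next segment starts within max_gap base pairs of the current region's end.
    """
    regions = []
    for start, end in sorted(segments):
        if regions and start <= regions[-1][1] + max_gap:
            # Extend the current region if this segment reaches further.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Hypothetical breakpoints: two overlapping variants and one separate variant.
variants = [(100, 200), (150, 300), (1000, 1100)]
strict = cluster_into_cnvrs(variants)                 # overlapping segments only
tolerant = cluster_into_cnvrs(variants, max_gap=1000) # nearby segments also merged
```

With no gap tolerance the non-overlapping variant remains a separate region, whereas a sufficiently large tolerance collapses all three variants into one region, which mirrors the decision to treat the clustered variants as a single CNVR.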
The CNVs enriched and depleted in our cohorts are potentially associated with autoimmunity, so we assessed the frequency of these variants in cohorts of 21 patients with rheumatoid arthritis (RA) and 50 patients with multiple sclerosis (MS). The T1D-enriched CNPs 253, 934, 1162 and 1303 also met the criteria for enrichment in RA; additionally, CNP1162 and CNP1303 were significantly enriched in the MS cohort. The frequency of CNP1956 did not meet the criteria for enrichment in RA or MS. For those CNVs depleted in T1D, we similarly assessed their frequency in the RA and MS cohorts. CNPs 1879 and 2240 and the novel CNVR, A588, were also depleted in RA and MS. The depletion of CNVR A588 in RA was significant, as was the depletion of CNP2240 in both RA and MS. CNP1102 was depleted in T1D and RA, but not MS. Thus, a portion of the CNVs enriched or depleted in T1D are found at similar frequencies in subjects with other autoimmune diseases.
Frequencies of CNVs in other autoimmune diseases.
Finally, we sought to determine if a similar differential frequency of variance could be seen in larger, independent cohorts of cases and controls. Copy number at the T1D-enriched CNPs 1162, 1303 and 1956 was determined by qPCR in a group of 73 Ctrl subjects and 73 subjects with T1D, independent of previous cohorts. While frequency of variance did not differ appreciably between the two groups at CNPs 1162 and 1956, the difference in frequency of variance at CNP1303 approached significance (p = 0.0655). Thus, independent validation of 3 CNPs enriched in T1D identified one region that may continue to be of interest as a potential pathogenic variant.
qPCR analysis of 3 T1D enriched CNPs in independent cohorts.