Both underweight and obesity have been associated with increased mortality1,2. Underweight, defined as body mass index (BMI) ≤ 18.5 kg/m2 in adults3 and ≤ −2 standard deviations (SD) in children4,5, is the main sign of a series of heterogeneous clinical conditions such as failure to thrive (FTT)6–8, feeding and eating disorder and/or anorexia nervosa9,10. In contrast to obesity, few genetic variants underlying these clinical conditions have been reported11,12. We previously demonstrated that hemizygosity of a ~600 kb region on the short arm of chromosome 16 (chr16:29.5–30.1 Mb) causes a highly penetrant form of obesity often associated with hyperphagia and intellectual disabilities13. Here we show that the corresponding reciprocal duplication is associated with underweight. We identified 138 duplication carriers (132 novel cases; 108 unrelated carriers) from over 95,000 individuals clinically referred for developmental or intellectual disabilities (DD/ID) or psychiatric disorders, or recruited from population-based cohorts. These carriers show significantly reduced postnatal weight (mean Z-score −0.6; p=4.4×10−4) and BMI (mean Z-score −0.5; p=2.0×10−3). In particular, half of the boys younger than 5 years are underweight with a probable diagnosis of FTT, while adult duplication carriers have an 8.7-fold (p=5.9×10−11; CI_95=[4.5–16.6]) increased risk of being clinically underweight. We observe a significant trend towards increased severity in males, as well as a depletion of male carriers among non-medically ascertained cases. These features are associated with an unusually high frequency of selective and restrictive feeding behaviours and a significant reduction in head circumference (mean Z-score −0.9; p=7.8×10−6). Each of the observed phenotypes is the converse of one reported in carriers of deletions at this locus, correlating with changes in transcript levels for genes mapping within the duplication but not within flanking regions.
The reciprocal impact of these 16p11.2 copy number variants suggests that severe obesity and being underweight can have mirror etiologies, possibly through contrasting effects on eating behaviour.
Copy number variants at the 16p11.2 locus have been associated with cognitive disorders including autism (deletions) and schizophrenia (duplications)14–19, conditions that have been suggested to lie at opposite ends of a single spectrum of psychiatric phenotypes20. We and others have reported that deletion of this region spanning 28 genes (Supplementary Table S1) increases the risk of morbid obesity 43-fold (Supplementary Figure S1)13,21. We hypothesized that the reciprocal duplication, with its resulting increase in gene dosage, may influence BMI in a converse manner. The duplication was identified in 73 out of 31,424 patients with DD/ID, a frequency consistent with previous reports17 (Table 1). Four additional cases were identified among 1,080 patients affected by bipolar disease or schizophrenia. Compared to its prevalence in seven European population-based GWAS cohorts22–24 (31 out of 58,635 individuals), the duplication was significantly more frequent in both the DD/ID cohorts (p=4.23×10−13; OR=4.4, CI_95=[2.9–6.9]) and the psychiatric cohorts (p=3.6×10−3; OR=7.0, CI_95=[1.8–19.9]) (Table 1) strengthening previous reports of similar associations16,17. Our data do not support a two-hit model25 for the effects of 16p11.2 duplications or deletions (Supplementary Text and Table S2).
We compared available data on weight, height and BMI for 106 independent duplication carriers (including published cases) to those in gender-, age- and geographically-matched reference populations (Table 2, Supplementary Tables S3 and S4, Methods). The duplication was strongly associated with lower weight (mean Z-score −0.56; p=4.4×10−4) and BMI (mean Z-score −0.47; p=2.0×10−3) (Table 2, Supplementary Table S5). Birth parameters (n=48) were normal indicating a postnatal effect. Adults carrying the duplication had a relative risk (RR) of being clinically underweight (BMI < 18.5)26 of 8.7, CI_95=[4.5–16.6] (p=5.9×10−11) (Methods). Concordantly, none of the 3,544 patients in our obesity cohorts13,21 carried the duplication (Table 1).
To further investigate these associations, we carried out separate analyses of carrier patients (DD/ID and psychiatric) and non-medically ascertained carriers (population-based cohorts plus 11 transmitting parents and three other affected first-degree relatives for whom data were available) (Table 2). Each category exhibited significantly lower weight and BMI with similar effect sizes. However, the proportion of underweight cases (BMI ≤ −2 SD)26 was higher in the first group than in the second (17/76 compared to 2/40; p=0.017). Note that the impact of the duplication on underweight status might be understated because of the prescription of antipsychotic treatments, which are often associated with weight gain27 (Supplementary Table S6).
Having demonstrated an association of the duplication with underweight, we investigated the contribution of gender to the resulting phenotypes (Figure 1, Supplementary Figure S2 and Table S7). In DD/ID patients, the impact of the duplication on being underweight is stronger in males – the effect in females is in the same direction, but is both smaller and statistically non-significant (Table 2). A similar and significant difference (p=0.0173) was observed in adult carriers (all groups combined): the RR of being underweight for males is 24.1 (CI_95=[9.5–61.2], p=2.2×10−11) and only 4.9 for females (CI_95=[2.0–12.3], p=6.7×10−4). A gender bias was also observed in the ascertainment of DD/ID duplication carriers, in which we have an excess of males (51M:33F, p=0.044). By contrast, carriers from the general population exhibited a strong overrepresentation of females (10M:21F, p=0.035) (Supplementary Text). A similar bias was observed among transmitting parents (7M:23F, p=5.53×10−4). Thus, there is an overrepresentation of males in the medically ascertained group and a depletion in the non-medically ascertained one. We suggest that males are more likely than females to present severe phenotypes, and that this may account for the observed gender bias, as severely affected males are less likely to be recruited to adult population cohorts or be reproductively successful.
As previously reported28, the duplication was also associated with reduced head circumference (HC; mean Z-score −0.89; p=7.8×10−6) (Figure 1), 26.7% presenting with microcephaly, while carriers of the reciprocal deletion had an increased HC (mean Z-score +0.57; p=1.79×10−5) (Supplementary Table S8, Figure S3), demonstrating an additional instance of a mirror phenotype associated with reciprocal copy number changes at this locus. Notably, HC Z-scores correlate positively with those of BMI in carriers of both the duplication (rho=0.37; p=2.65×10−3) and the deletion (rho=0.42; p=1.9×10−5) (Supplementary Methods). This suggests that HC and BMI may be regulated by a common pathway or that a causal relationship exists between these two traits in these patients. A full list of malformations and secondary phenotypes reported in duplication carriers ascertained for DD/ID is available in Supplementary Table S9.
In view of their importance in obesity and underweight, the clinical reports of duplication carriers were screened for evidence of modified eating behaviours. Consequently, we carried out multiplex ligation-dependent probe amplification (MLPA, Supplementary Table S10) to screen for 16p11.2 rearrangements in 441 patients diagnosed with ED including anorexia nervosa (AN), bulimia and binge eating disorder (Table 1, Supplementary Text). No duplications of the entire region were identified, but one out of 109 AN patients carried an atypical 136 kb duplication that encompasses the SPN and QPRT genes (Supplementary Figure S4). This smaller duplication is currently the subject of further investigations, as it potentially delineates a critical region affecting eating behaviour.
Large genomic structural variants are known to affect the expression of genes not only within the affected region but also at a distance29–32. We therefore measured relative transcript levels in lymphoblastoid cell-lines of 27 genes mapping within the rearrangement or nearby (Supplementary Tables S1 and S11): six from deletion carriers, five from duplication carriers and ten from gender- and age-matched controls (Supplementary Table S12). Expression levels correlated positively with gene dosage for all genes within the copy number variable region (Figure 2) consistent with published partial results from adipose tissue13. Mean relative transcript levels in deletion and duplication carriers were, respectively, 67% and 214% of the levels measured in controls (Supplementary Table S13). While genes proximal (centromeric) to the rearrangement interval showed no significant variation in relative transcript levels between patients and controls (Figure 2), distal (telomeric) genes showed a significant alteration in relative expression. While lymphoblastoid cells may not recapitulate obesity-relevant tissues, previous experiments have shown a high degree of correlation between expression levels in different tissues/cell lines29, suggesting that the same pathways may be similarly disrupted in different cell lineages. Thus, the involvement of these distal genes in the control of BMI in our study subjects seems unlikely.
Our study demonstrates the power of very large screens (>95,000 samples, the biggest of its kind so far) to characterize the clinical and molecular correlates of a rare functional genomic variant. We unambiguously demonstrate that carrying the 16p11.2 duplication confers a high risk of being clinically underweight and show that reciprocal changes in gene dosage at this locus result in multiple mirror phenotypes. As in the schizophrenia/autism20,34 and microcephaly/macrocephaly28 dualisms, abnormal eating behaviours, such as hyperphagia and anorexia, could represent opposite pathological manifestations of a common energy balance mechanism, although the precise relationships between these mirror phenotypes remain to be determined. We surmise that abnormal brain volume, and thus neuronal circuitry, influences both cognitive function and eating behaviour, the latter possibly being the basis for the observed impact on BMI. Consistent with this are previous reports that a subgroup of children with microcephaly show concomitant reduction in weight percentile35. Our findings also support the observation that severe overweight and underweight phenotypes correlate with lower cognitive functioning7,36. Thus, abnormal food intake may be a direct result of particular neurodevelopmental disorders. Although it is possible that the 16p11.2 region encodes distinct genes specific for each trait, a more parsimonious hypothesis is that these different clinical manifestations of central nervous system dysfunction are all secondary to the disruption of a single gene-dosage-sensitive neurodevelopmental step. Further resolution of this issue may require identification of additional patients with rare atypical rearrangements in this region.
Underweight is defined in adults and individuals of less than 18 years of age as BMI ≤ 18.5 and Z-score ≤ −2, respectively.
Two-sided Fisher’s exact test was used to compare frequencies of the rearrangement in patients and controls. Z-scores were computed for all data using gender-, age- and geographically-matched reference populations. One-sided t-test was performed to test duplication carriers for lower than zero BMI, height, weight and HC Z-score values. We used Kruskal-Wallis to test differences in gene expression patterns. P-value thresholds were defined (by permutation) in order to control the false discovery rate at 5%. Relative risk of being underweight was calculated as the ratio of the fraction of underweight individuals among duplication carriers versus our control group.
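As an illustration of the frequency comparison described above, the following minimal sketch computes an odds ratio, a two-sided Fisher's exact p-value and a relative risk for a 2×2 table. The counts in the usage lines (73 of 31,424 DD/ID patients versus 31 of 58,635 population-based individuals) are taken from the text; the function names and the log-gamma implementation are assumptions made for this example and are not the software used in the study.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_pmf(x):
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(total, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = log_pmf(a)
    p = sum(exp(log_pmf(x)) for x in range(lowest, highest + 1)
            if log_pmf(x) <= observed + 1e-7)  # small tolerance for ties
    return min(1.0, p)


def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Ratio of the fraction affected among carriers to that among controls."""
    return (exposed_cases / exposed_total) / (control_cases / control_total)


# Carriers / non-carriers in DD/ID patients and in population cohorts.
a, b = 73, 31424 - 73
c, d = 31, 58635 - 31
print(f"OR = {odds_ratio(a, b, c, d):.1f}")
print(f"p  = {fisher_two_sided(a, b, c, d):.2e}")
```

The odds ratio printed for these counts rounds to the 4.4 reported for the DD/ID cohorts. The confidence intervals quoted in the text are not reproduced by this sketch.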
16p11.2 duplication and deletion carriers were identified through various procedures: (i) CGH with Agilent (Santa Clara, CA) 44K, 60K, 105K, 180K, 244K arrays; (ii) Illumina (San Diego, CA) Human317, Human370, HumanHap550, Human610 and 1M BeadChips; (iii) Affymetrix (Santa Clara, CA) 6.0, 250K genotyping arrays; (iv) QMPSF, (v) FISH and/or (vi) MLPA. CNV analyses of GWAS data were variously carried out using cnvHap, a moving window average intensity procedure, a Gaussian Mixture Model, Circular Binary Segmentation, QuantiSNP, PennCNV, BeadStudio GT module and Birdseed. At least two independent algorithms were used for each cohort.
Lymphoblastoid cell lines were established from carriers and controls. SYBR Green quantitative PCR was performed to assess relative expression of genes.
Patients with cognitive deficits are routinely referred to clinical genetics for etiological work-ups including aCGH. We surveyed 28 cytogenetic centers in Europe and North America (Supplementary Table S3), identifying 31,424 patients ascertained for developmental delay, intellectual disabilities and/or malformations. Clinical ascertainment was as follows: developmental delay and mental retardation (MR): 51.0%, autism spectrum disorder (ASD) with or without MR: 14.4%, language delay with or without MR: 42.9%, malformations with MR: 27.6%, and malformations without MR: 4.8%.
These analyses were performed for clinical diagnostic purposes, all available phenotypic data being those provided anonymously and retrospectively by the clinician ordering the analyses. Consequently, research-based informed consent was not required by the Institutional Review Board of the University of Lausanne, which granted an exemption for this part of the study.
Cases with schizophrenia and cases with bipolar disorder were ascertained at University Hospital, Rouen, France, from consecutive hospitalizations. All psychiatric diagnoses were established according to DSM-IV criteria following review of case notes and direct examination of cases. The Schedule for Affective Disorders and Schizophrenia was used for the clinical assessment of all cases with schizophrenia or schizoaffective disorder and the Diagnostic Interview for Genetic Studies (DIGS) was used in patients with bipolar disorder. The schizophrenia cohort was described before 37: 189 cases with schizophrenia and 47 cases with schizoaffective disorder. Post-morbid IQs were available for two-thirds of cases with schizophrenia; 18.0% of these cases had an IQ lower than 70. The bipolar cohort was made up of 150 patients with either bipolar disorder type I or type II. All subjects gave written informed consent and the study was approved by the local institutional review board.
This prospective population cohort was described previously22: 6,188 white individuals aged 35–75 years were randomly selected from the general population in Lausanne, Switzerland. These individuals underwent a detailed phenotypic assessment, and were genotyped using the Affymetrix Mapping 500K array; 5,612 samples passed genotyping quality control. The institutional review board of the University of Lausanne approved this study, and written consent was obtained from all participants.
The Northern Finland Birth Cohort 1966 is a prospective birth cohort of almost all individuals born in 1966 in the two northernmost provinces of Finland. Biochemical and DNA samples were collected with informed consent at age 31 years. Genotyping was done using the Illumina Infinium 370cnvDuo array and phenotypic characteristics of the cohort were as previously described23. Phenotypic and genotyping data were available for 5,246 subjects after quality control.
The Estonian Genome Centre at the University of Tartu (EGCUT) maintains a general population-based biobank, described in greater detail in 24. A total of 2,994 unrelated subjects were randomly selected among the 48,000 Estonian participants and genotyped using the Illumina CNV370-Duo or -Quad BeadChips. EGCUT is conducted according to the Estonian Gene Research Act. The project was approved by the Ethics Review Committee on Human Research of the University of Tartu. Written informed consent was obtained from all voluntary participants.
Patients and controls were all Icelandic and were recruited from all over Iceland. All participants with ADHD met DSM-IV criteria for ADHD (477 combined type, 250 inattentive type, 58 hyperactive-impulsive type, 40 unspecified). ADHD subjects were recruited from outpatient pediatric, child, and adult psychiatry clinics in Iceland, and diagnoses had been made on the basis of standardized diagnostic assessments and had been reviewed by experienced clinicians 38,39. Autistic individuals (n=351) met Autism Diagnostic Interview-Revised (ADI-R) criteria and were ascertained through the State Diagnostic Counselling Center and the Department of Child and Adolescent Psychiatry in Iceland. Schizophrenia diagnoses were assigned according to Research Diagnostic Criteria (RDC) 40 through the use of the Schedule for Affective Disorders and Schizophrenia Lifetime Version (SADS-L) 41. Schizophrenia patients (n=657) were recruited from outpatient pediatric, child, and adult psychiatry clinics, and diagnoses had been made on the basis of standardized diagnostic assessments and had been reviewed by experienced clinicians. Recruitment of the psychiatric patients at deCODE has been described in more detail elsewhere (ADHD 42; ASD, schizophrenia and the control population 17). All participants, cases and controls, returned signed informed consents prior to participation in the study. All personal identifiers associated with medical information, questionnaire results, and blood samples were encrypted according to the standards set by the Data Protection Committee of Iceland. All procedures related to this study have been approved by the Data Protection Authority and National Bioethics Committee of Iceland.
The Study of Health in Pomerania (SHIP) is a cross-sectional survey in Western Pomerania, the north-eastern area of Germany 43,44. A sample from the population aged 20 to 79 years was drawn from population registries. 7,008 subjects were selected randomly from each community, proportional to community population size and stratified by age and gender. Only individuals with German citizenship and main residency in the study area were included. 4,308 individuals participated in the study and were genotyped on Affymetrix 6.0 SNP arrays. Both genotype data (after QC filtering) and BMI were available for 4,070 individuals.
The KORA study is a series of independent population-based epidemiological surveys of participants living in the region of Augsburg, Southern Germany 45. All survey participants are residents of German nationality identified through the registration office and were examined in 1994/95 (KORA S3) and 1999/2001 (KORA F4). In the KORA S3 study 4,856 subjects (response 75%), and in KORA F4 in total 4,261 subjects have been examined (response 67%). 3,006 subjects participated in a 10-year follow-up examination of S3 in 2004/05 (KORA F3). Individuals for genotyping in KORA F3 and KORA F4 were randomly selected. The age range of the participants was 25 to 74 years at recruitment. Age and sex were self-reported in a questionnaire survey. Height and weight were measured following a standardized study protocol. Informed consent was given by all participants. The study was approved by the local ethics committee.
The French paediatric cohort was previously published 46 and was genotyped using the Illumina Human CNV370-duo array. 581 non-obese children (BMI ≤ 90th percentile) passed quality control. All participants or their legal guardians gave written informed consent, and all local ethics committees approved the study protocol.
This cohort was described elsewhere 47. This family cohort (1,006 families) was recruited between 1993 and 1995 (first visit) at the Center for Preventive Medicine (CMP) of Vandoeuvre-lès-Nancy during a periodical health assessment. Inclusion criteria at the first visit were parents and grandparents of French origin; residence in the Lorraine region (north-east of France); nuclear families comprising two parents and at least two biological children over 6 years old; fidelity of the majority of the families coming for the second or third time. Exclusion criteria at that visit were chronic or acute disorders. The families were presumed to be healthy and free from any declared acute and/or chronic disease, so that the effects of genetics on the variability of the intermediate phenotypes could be assessed in physiological conditions without the influence of any medical treatment or disease. These data were used for the statistical analysis when comparing BMI of duplication cases ascertained in France.
These cohorts were described in a previous publication 13. The adult-obesity case-control groups and the child-obesity case-control groups were as published previously46, and were genotyped with the Illumina Human CNV370-duo array. In all, 643 children with familial obesity (BMI ≥ 97th centile corrected for gender and age, at least one obese first-degree relative, age less than 18 years), 581 non-obese children (BMI ≤ 90th centile), 705 morbidly obese adults with familial obesity (BMI ≥ 40 kg/m2, at least one obese first-degree relative with BMI ≥ 35 kg/m2, age ≥ 18 years) and 197 lean adults (BMI ≤ 25 kg/m2) passed quality control. All participants or their legal guardians gave written informed consent, and all local ethics committees approved the study protocol.
This cohort was described in a previous publication 13. Patients undergoing elective bariatric weight-loss surgery were recruited for the ABOS study at Lille Regional University Hospital. Genotyping was performed with the Illumina Human 1M-duo array, and data from 141 adults passed quality control. All participants gave written informed consent, and the study protocol was approved by the local ethics committee.
The SOS Sib Pair Study cohort was as published previously 52. It includes 154 nuclear families, each with BMI-discordant sibling pairs (BMI difference > 10 kg/m2), giving a total of 732 subjects. Genotyping data with the Illumina 610K-Quad array were available for 353 siblings from 149 families. Expression data from subcutaneous adipose tissue (sampled after overnight fasting) were available for 360 siblings from 151 families. Subjects received written and oral information before giving written informed consent. The Regional Ethics Committee in Gothenburg approved the studies.
The clinical sample consisted of 285 male and 223 female Spanish subjects of Caucasian origin with morbid obesity or type 2 diabetes mellitus recruited between 2000 and 2006. The remaining subjects were from the general population. All subjects reported stable body weight for at least three months before the study. They had no systemic disease other than obesity and/or IGT. The mean BMI (kg/m2) was 32.0 (15.2–82.4). The average age at assessment was 46.1 years (range 18.0–79.1). The majority of the patients have been described in previous reports 48–51.
The clinical sample consisted of 85 Spanish Caucasian patients with obesity (35% were men and 65% women). The mean BMI (kg/m2) was 53.9 (40–73). The average age at assessment was 42.63 years (19–69). The majority of the patients have been described in previous reports 53.
The clinical sample consisted of 57 Spanish Caucasian patients with obesity (59% were men and 41% women). The mean BMI (kg/m2) was 32.23 (27.01–48). The average age at assessment was 59.74 years (36–79). The cases have not previously been described.
The clinical sample consisted of 441 Spanish Caucasian patients with eating disorders (ED) consecutively admitted to the Eating Disorders Unit of the University Hospital of Bellvitge between 2000 and 2008. The majority of patients were female (94.6%), fulfilled DSM-IV criteria for ED, and were diagnosed using the structured clinical interview for mental disorders, research version 2.0 (SCID-I) 54. The sample consisted of n=109 anorexia nervosa (AN; 25%), n=193 bulimia nervosa (BN; 44%), n=111 ED not otherwise specified (EDNOS; 25%) and n=28 binge eating disorder (BED; 6%) patients. The mean lifetime minimum BMI (kg/m2) was 15.46 (SD 1.39) for AN patients, 19.89 (SD 2.95) for BN patients, 18.53 (SD 2.75) for EDNOS patients and 24.31 (SD 4.57) for BED patients. The average age at assessment was 26.57 years (SD 7.67). The average age at onset of the disorder was 18.9 years (SD 4.53) for AN patients, 19.74 years (SD 7.16) for BN patients, 18.94 years (SD 5.88) for EDNOS patients and 24.65 years (SD 9.59) for BED patients. The majority of the patients have been described in previous reports 55, 56, 57.
Cases ascertained for intellectual disabilities and developmental delay were identified through standard medical diagnostic procedures. CNV analyses of GWAS data were variously carried out using cnvHap 58; a moving window average intensity procedure; a Gaussian Mixture Model 59; Circular Binary Segmentation60,61; QuantiSNP 62; PennCNV 63; BeadStudio GT module (Illumina inc); and Birdseed 64 (see below). At least two independent algorithms were used for each cohort.
All diagnostic procedures (aCGH, QPCR and/or Quantitative Multiplex PCR of Short Fluorescent fragments) were carried out according to the relevant guidelines of good clinical laboratory practice for the respective countries. All rearrangements in probands were confirmed by a second independent method and karyotyping was performed in all cases to exclude a complex rearrangement.
CNV calling was previously described in 13. In brief, data were normalized using Illumina BeadStudio, then GC effects on ratios were removed by regressing on GC and GC², while wave effects were removed by fitting a loess function65. CNV analysis was done using cnvHap58. All called 16p11.2 duplications were validated by direct analysis of log2 ratios. Data for each probe were normalized by first subtracting the median value across all samples (so that the distribution of ratios for each probe was centered on zero), and then dividing by the variance across all samples (to correct for variation in the sensitivity of different probes to copy number variation). All CNV calls were confirmed by MLPA.
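The per-probe normalization described above (median centering followed by scaling by the variance across samples) can be sketched as follows. This is only an illustration: the input layout (a dictionary mapping a probe identifier to its log2 ratios, one per sample), the function name and the use of the population variance are assumptions, since the text does not specify them.

```python
from statistics import median, pvariance


def normalize_probes(log2_ratios):
    """Centre each probe on its median and scale by its variance.

    log2_ratios: dict mapping probe id -> list of log2 ratios, one per sample
    (hypothetical layout chosen for this example).
    """
    normalized = {}
    for probe, values in log2_ratios.items():
        centre = median(values)
        centred = [v - centre for v in values]
        variance = pvariance(values)  # population variance: an assumption
        if variance > 0:
            normalized[probe] = [c / variance for c in centred]
        else:
            # A probe with identical values in every sample carries no signal.
            normalized[probe] = centred
    return normalized


example = {"probe_A": [-1.0, 0.0, 1.0], "probe_B": [0.2, 0.2, 0.2]}
print(normalize_probes(example))
```

After this step every probe has a median of zero, so duplications appear as runs of consistently positive values across adjacent probes in a carrier.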
Illumina (San Diego, CA, USA) Human317, Human370, HumanHap550, Human610 and 1M BeadChips were used for CNV analysis. BeadStudio (version 2.0) was used to call genotypes, normalize the signal intensity data, and establish the log R ratio and B allele frequency at every SNP according to the standard Illumina protocols. All samples underwent a standard SNP-based quality control procedure; samples with a SNP call rate lower than 0.97 were excluded. PennCNV63, a free, open-source tool, was used for copy number variation detection. The input data for PennCNV are the log R ratio (LRR), a normalized measure of the total signal intensity for the two alleles of the SNP, and the B allele frequency (BAF), a normalized measure of the allelic intensity ratio of the two alleles. These values are derived with the help of control genotype clusters (HapMap samples), using the Illumina BeadStudio software. PennCNV employs a hidden Markov model (HMM) to analyze the LRR and BAF values across the genome. CNV calls are made based on the probability of a given copy state at the current marker, as well as on the probability of observing a copy state change from the previous marker to the current one. PennCNV uses a built-in correction model for GC content66.
Data normalization and CNV calling were previously described in 13. Data normalization included allelic cross-talk calibration 67,68, intensity summarization using robust median average and correction for any PCR amplification bias. Wave effects were corrected by fitting a loess function65. CNV calling was done using a Gaussian mixture model (GMM) 59 that fits four components (deletion, copy neutral, one and two additional copies) to CN ratios. The final copy number at each probe location is determined as the expected (dosage) copy number. The method has been validated by comparing test datasets with results from the CNAT 69 and CBS 60,61 algorithms and by replicating a subset of CoLaus subjects on Illumina arrays. Only duplications found by both GMM and CBS were considered.
Genotypes were called by BeadStudio software GT module v3.1 or GenomeStudio GT v1.6 (Illumina Inc). Log R Ratio and B Allele Frequency (BAF) values produced by BeadStudio were formatted for further CNV analysis and break-point mapping with the hidden Markov model-based software QuantiSNP (ver. 1.1) 62 and PennCNV 70, or with CNVPartition 2.4.4 (Illumina Inc). All analyses were carried out using the recommended settings, except for changing EMiters to 25 and L to 1,000,000 in QuantiSNP. For PennCNV, Estonian population-specific SNP allele frequency data were used. All detected duplications were confirmed by quantitative PCR.
Raw intensities were normalized using Affymetrix Power Tools (Affymetrix Inc). CNV analysis was done using Birdseye from the Birdsuite software package64 and PennCNV63. PennCNV predictions with a confidence score less than 10 were removed. Birdsuite predictions were filtered as in 21: CNVs were kept if their LOD score was greater than 10, their length greater than 1 kb, their number of probes greater than or equal to 5 and their size per probe less than 10,000.
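The filtering thresholds listed above can be expressed as a simple predicate. The sketch below is illustrative only: the `CnvCall` record and its field names are hypothetical and do not correspond to the output format of Birdsuite or PennCNV, and only the numeric thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    caller: str     # "birdsuite" or "penncnv" (hypothetical labels)
    start: int      # start coordinate in bp
    end: int        # end coordinate in bp
    n_probes: int   # number of probes supporting the call
    score: float    # LOD score (Birdsuite) or confidence score (PennCNV)


def keep_call(call):
    """Return True if the call passes the thresholds described in the text."""
    length = call.end - call.start
    if call.caller == "penncnv":
        # Predictions with a confidence score below 10 are removed.
        return call.score >= 10
    if call.caller == "birdsuite":
        return (call.score > 10
                and length > 1_000
                and call.n_probes >= 5
                and length / call.n_probes < 10_000)
    raise ValueError(f"unknown caller: {call.caller}")


calls = [
    CnvCall("birdsuite", 29_500_000, 30_100_000, 400, 50.0),  # dense, kept
    CnvCall("birdsuite", 29_500_000, 30_100_000, 50, 50.0),   # too sparse
    CnvCall("penncnv", 29_500_000, 30_100_000, 400, 9.9),     # low confidence
]
print([keep_call(c) for c in calls])
```

The size-per-probe criterion rejects long calls supported by only a few widely spaced probes, which is why the second example fails despite its high score.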
Genotyping for KORA F3 was performed using Affymetrix 500K Array Set consisting of two chips (Sty I and Nsp I). The KORA F4 samples were genotyped with the Affymetrix Human SNP Array 6.0. For both studies genomic DNA from blood samples was used for analysis. Hybridisation of genomic DNA was done in accordance with the manufacturer’s standard recommendations. Genotyping was done in the Genome Analysis Centre (GAC) of the Helmholtz Centre Munich. Genotypes were determined using BRLMM clustering algorithm (Affymetrix 500K Array Set) and Birdseed2 clustering algorithm (Affymetrix Array 6.0). For quality control purposes, we applied a positive control and a negative control DNA every 48 (KORA F3) samples or 96 samples (KORA F4). On chip level only subjects with overall genotyping efficiencies of at least 93% were included. In addition the called gender had to agree with the gender in the KORA study database. After exclusions 1,644 individuals remained in KORA F3 and 1,814 in KORA F4 for further analysis.
We used Multiplex Ligation-dependent Probe Amplification (MLPA) to determine changes in the copy number of a region of around 2 Mb on chromosome 16p11.2. Briefly, using hg18, we designed nine probes within the targeted region, one control probe outside the rearranged region and seven control probes targeting unique positions in the genome (Supplementary Table S10). Assays were performed with MRC-Holland reagents according to the manufacturer’s protocol71. The amplification products were analyzed by capillary electrophoresis on the DNA Analyzer 3730XL using the GeneMapper software v3.7 (Applied Biosystems, Foster City, CA). The calculations were performed independently for each experiment. To minimize experimental variation, we first normalized the MLPA data by summing the signal values of all control probes for each sample and then dividing each signal value of that sample by this sum. The normalized signal values were then divided by the average of the corresponding normalized values calculated from all the samples in the same experiment. The result of this calculation is termed the dosage quotient (DQ). A DQ value below 0.65 or above 1.25 was considered a copy-number loss or gain, respectively, as described 72–74.
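The dosage quotient calculation described above can be sketched in a few lines. The data layout (a dictionary of samples, each mapping probe names to raw peak signals), the probe names and the function names are assumptions made for this example; the two normalization steps and the 0.65 and 1.25 thresholds follow the text.

```python
def dosage_quotients(signals, control_probes):
    """Compute the dosage quotient (DQ) of every probe in every sample.

    signals: dict sample -> dict probe -> raw signal (hypothetical layout).
    control_probes: names of the probes used for within-sample normalization.
    """
    # Step 1: divide each signal by the sum of that sample's control probes.
    normalized = {}
    for sample, probes in signals.items():
        control_sum = sum(probes[p] for p in control_probes)
        normalized[sample] = {p: v / control_sum for p, v in probes.items()}

    # Step 2: divide by the mean normalized signal across all samples.
    probe_names = list(next(iter(signals.values())))
    n_samples = len(normalized)
    means = {p: sum(normalized[s][p] for s in normalized) / n_samples
             for p in probe_names}
    return {s: {p: normalized[s][p] / means[p] for p in probe_names}
            for s in normalized}


def classify(dq, loss_below=0.65, gain_above=1.25):
    """Label a dosage quotient as 'loss', 'gain' or 'normal'."""
    if dq < loss_below:
        return "loss"
    if dq > gain_above:
        return "gain"
    return "normal"


# Nine two-copy samples and one duplication carrier (toy values).
signals = {f"s{i}": {"target": 100.0, "c1": 100.0, "c2": 100.0}
           for i in range(9)}
signals["carrier"] = {"target": 150.0, "c1": 100.0, "c2": 100.0}
dq = dosage_quotients(signals, control_probes=["c1", "c2"])
print(classify(dq["carrier"]["target"]), classify(dq["s0"]["target"]))
```

Because the reference average in step 2 includes every sample in the run, a run containing many carriers would pull the average towards the altered copy number; this sketch makes no attempt to correct for that.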
DNA samples were labelled with Cy3 and cohybridized with Cy5-labelled DNA from the CEPH cell line GM12042 to custom-made Nimblegen arrays. These arrays contained 71,000 probes spread across the short arm of chromosome 16 from 22.0 to 32.7 Mb (at a median spacing of 45 bp between 27.5 and 31.0 Mb) and 1,000 control probes situated in invariable regions of the X chromosome. DNA labelling, hybridization and washing were performed according to Nimblegen protocols. Scanning was performed using an Agilent G2565BA Microarray Scanner. Image processing, quality control and data extraction were performed using the Nimblescan software v.2.5.
Weight, height, BMI and HC Z-scores were determined for pediatric cases (0–18 years) using clinical growth charts specific to the country of origin. Children were ascertained from 9 different countries. If charts were only available as percentiles, those measures were transformed into Z-scores (cf. Statistics).
For the United States and Canada, data from the Centers for Disease Control and Prevention and National Center for Health Statistics (CDC/NCHS) were used to calculate Z-scores 75.
For the French pediatric population we used French national growth charts 76, 77. For the Swiss pediatric population we used Swiss national growth charts 78. For Dutch participants, Dutch national growth charts were available 79. For Italian, German, Finnish and Austrian cases (n=6), height, weight and BMI Z-scores were estimated using WHO growth charts 80.
To check for discrepancies generated by the use of different growth charts, height, weight and BMI Z-scores were recalculated using WHO growth charts for all cases younger than 5 years, regardless of origin (http://www.who.int/childgrowth/standards/en/) 80. Z-scores obtained using the WHO data were not significantly different from those obtained with the country-specific charts. These growth standards, developed by the World Health Organization Multicentre Growth Reference Study, describe normal child growth from birth to 5 years under optimal environmental conditions. These standards can be applied to all children everywhere, regardless of ethnicity, socioeconomic status, and type of feeding 81, 82.
If needed, percentile values were transformed to Z-scores using the inverse of the standard normal cumulative distribution function. When growth charts were unavailable, we used reported LMS parameters (median (M), generalized coefficient of variation (S), and skewness (L)) to obtain Z-scores via the formula:
$$Z = \frac{(X/M)^{L} - 1}{L \cdot S}$$
in which X is the observed value.
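Both conversions can be sketched in a few lines, assuming the standard LMS form Z = ((X/M)^L − 1)/(L·S). The function names and the example parameter values below are hypothetical, and the L = 0 branch is the usual limiting case of the LMS method rather than something stated in the text.

```python
import math
from statistics import NormalDist

def lms_zscore(x, L, M, S):
    """Z-score of an observed value x from LMS parameters."""
    if L == 0:
        # Limiting case of the formula as L approaches 0 (assumption, not in the text).
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)

def percentile_to_zscore(percentile):
    """Convert a growth-chart percentile (strictly between 0 and 100) to a Z-score
    with the inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(percentile / 100.0)

# Hypothetical reference values, for illustration only.
z_at_median = lms_zscore(x=10.0, L=-1.6, M=10.0, S=0.12)  # observation equal to M
z_example = lms_zscore(x=12.0, L=1.0, M=10.0, S=0.1)
z_p50 = percentile_to_zscore(50)
z_p3 = percentile_to_zscore(3)
```

An observation equal to the median M gives a Z-score of zero for any L and S, which is a convenient check when transcribing parameters from a published table.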
When LMS parameters were unavailable, we estimated them from the available sex-, age- and origin-matched Swiss (CoLaus), Estonian or French control populations. For cases identified from population-based cohorts, Z-scores were directly inferred from the cohort.
We established lymphoblastoid cell lines from deletion and duplication carriers, as well as controls (Supplementary Table S12), by transforming peripheral blood mononuclear cells with EBV. Patients and controls were enrolled after obtaining appropriate informed consent by the physicians in charge and approval by the ethics committee of the University of Lausanne. Additional control cell lines were obtained from the Coriell Institute for Medical Research (http://www.coriell.org/) (Supplementary Table S12). SYBR Green real-time quantitative polymerase chain reaction was performed as published 29,83. Briefly, 1 μg of total RNA from lymphoblastoid cell lines was converted to cDNA using Superscript VILO (Invitrogen) primed with a mix of oligo(dT) and random hexamers. Oligos were designed using the PrimerExpress program (Applied Biosystems) with default parameters (Supplementary Table S11). Non-intron-spanning assays were tested in standard +/− RT reactions of RNA samples for genomic contamination. The amplification efficiency of each primer pair was tested in a cDNA dilution series as described 84. A full list of genes mapping within the rearranged interval and exclusion criteria are presented in Supplementary Table S1. All RT-PCR reactions were performed in a 10 μl final volume and in triplicate for each sample. Reactions were set up in a 384-well plate format using a Freedom EVO robot (TECAN) and run in an ABI 7900 Sequence Detection System (Applied Biosystems) with the following amplification conditions: 50°C for 2 min, 95°C for 10 min, and 45 cycles of 95°C for 15 sec and 60°C for 1 min. A final incubation of 95°C for 15 sec followed by 60°C for 15 sec was carried out in order to establish a dissociation curve. Each plate included the appropriate normalization genes to control for any variability between the different plate runs. Raw threshold cycle (Ct) values were obtained using SDS2.4 (Applied Biosystems).
In order to calculate the normalized relative expression ratio of individuals carrying the copy number variant and of controls, we used the Biogazelle qBase Plus software 85 including geNorm 86. This program identified appropriate normalization genes (EEF1A1, RPL13, GUSB and TBP) having a gene-stability measure of M=0.25. We note that one gene, LAT, showed an unusually high expression level in one of the duplication samples (DASYL, Supplementary Table S13), reaching a relative expression value of 27.3 (SE=1.37), compared to an average expression for other duplications of 1.89 (SE=0.51). Whilst at this point we cannot exclude that this finding is genuine (we confirmed it in a second experiment), this measurement was excluded from further analyses as an outlier in order to give a more accurate overview of expression profiles for these genes.
In silico analysis was performed to check for brain, and specifically hypothalamus, expression of genes within the rearranged 16p11.2 interval (Supplementary Table S1). This was done using Allen Brain Atlas Resources, Seattle (WA): Allen Institute for Brain Science. ©2009. Available from: http://www.brain-map.org.
Significant neurological signs were defined by (i) the presence of neurological signs such as severe hypotonia, hypertonia, ataxia, severe spasticity, hyperreflexia, hyporeflexia and/or extra-pyramidal signs; (ii) the severity of the developmental delay (e.g. no speech at age 5 and/or severe gross motor delay with walking acquisition >24 months); and (iii) the presence of epilepsy. Mental retardation, autism, psychiatric symptoms, unspecified hypotonia and mild spasticity were not considered.
A one-sided t-test was performed to test whether the mean BMI, height and weight Z-scores of duplication carriers were lower than zero. We found this analysis more suitable than linear regression analysis correcting for confounding factors such as sex and age, because these anthropometric traits have a highly non-linear dependence on these factors, as can be observed in the control population.
We used the Kruskal-Wallis test to assess differences in the gene expression pattern between deletion carriers, duplication carriers and control individuals. Since expression values are not necessarily normally distributed, this test is more appropriate than a classical one-way ANOVA. To test pairwise differences, we computed the difference in mean group rank with its 95% confidence interval (as provided by the multcompare function in Matlab). Correction for multiple testing was done using a Bonferroni adjustment.
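For readers unfamiliar with the rank-based test, the H statistic for three groups can be sketched with the Python standard library. This is an illustration only, not the Matlab routine used for the analysis: the function name and expression values are hypothetical, the p-value relies on the asymptotic chi-square approximation with 2 degrees of freedom (for which the upper tail is exactly exp(−H/2)), and the pairwise mean-rank comparison is not reproduced.

```python
import math

def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three groups of observations.
    Ties receive average ranks and H is divided by the usual tie correction."""
    if len(groups) != 3:
        raise ValueError("this sketch covers the three-group case only")
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3.0 * (n + 1)
    h /= 1.0 - tie_term / (n ** 3 - n)
    p = math.exp(-h / 2.0)  # chi-square upper tail with 2 degrees of freedom
    return h, p

# Hypothetical relative expression values for one gene.
deletion = [0.5, 0.6, 0.7]
control = [0.9, 1.0, 1.1]
duplication = [1.4, 1.5, 1.6]
h_stat, p_value = kruskal_wallis_three_groups([deletion, control, duplication])
```

With groups this small the chi-square approximation is rough; an exact or permutation-based p-value would be preferable in practice.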
We determined false discovery rate (FDR)-based association p-value thresholds for each phenotype to correct for multiple testing. For each phenotype, we replaced the observed Z-scores with numbers randomly drawn from a standard normal distribution and performed the same t-tests for the same strata. The procedure was repeated 1,000 times. For various p-value thresholds, we asked how many tests would be declared significant for the null set on average (over the 1,000 random draws). The FDR was estimated as the ratio of this number to the number of tests declared significant for the observed Z-scores. This approach accounts for the dependence between the nested tests we carried out.
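The simulation can be sketched as below. It is a simplified illustration under stated assumptions: the function names, Z-scores and strata are hypothetical, and a normal approximation to the one-sided test stands in for the t-test so that the sketch needs only the standard library. Drawing one null value per individual and reusing it in every stratum containing that individual is what preserves the dependence between nested tests.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def one_sided_p(values):
    """Approximate p-value for 'mean < 0'. The t statistic is referred to a
    standard normal here (an assumption; the text uses a t-test)."""
    t = mean(values) / (stdev(values) / math.sqrt(len(values)))
    return NormalDist().cdf(t)

def count_significant(z_scores, strata, threshold):
    """Number of strata whose one-sided test passes the p-value threshold."""
    return sum(one_sided_p([z_scores[i] for i in members]) <= threshold
               for members in strata.values())

def estimate_fdr(z_scores, strata, threshold, n_draws=1000, seed=0):
    """Average number of null discoveries divided by observed discoveries."""
    rng = random.Random(seed)
    observed = count_significant(z_scores, strata, threshold)
    if observed == 0:
        return None  # undefined when nothing is declared significant
    null_total = 0
    for _ in range(n_draws):
        # One standard normal draw per individual, shared across nested strata.
        null_z = [rng.gauss(0.0, 1.0) for _ in z_scores]
        null_total += count_significant(null_z, strata, threshold)
    return (null_total / n_draws) / observed

# Hypothetical Z-scores for 20 carriers and two nested strata (indices into the list).
z_scores = [-1.2, -0.4, -0.9, -1.5, -0.3, -0.8, -1.1, -0.6, -1.4, -0.2,
            -0.7, -1.0, -0.5, -1.3, -0.9, -0.1, -1.6, -0.8, -0.4, -1.2]
strata = {"all": list(range(20)), "subgroup": list(range(10))}
fdr = estimate_fdr(z_scores, strata, threshold=0.01)
```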
Among adults, we defined being underweight according to the WHO criteria (BMI ≤ 18.5). The estimated relative risk (RR) is the ratio of the fraction of underweight individuals among duplication carriers to that in our control group. The standard error of log(RR) and its significance were calculated as described in 87. In our control group (population-based cohorts), the frequency of underweight is 1.9% (38 males and 148 females out of 9470).
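A relative risk with a confidence interval on the log scale can be sketched as follows. The control counts (186 underweight out of 9470) are those given above; the carrier counts are hypothetical placeholders, and the standard-error formula is the common large-sample one for log(RR), assumed here rather than taken from reference 87.

```python
import math
from statistics import NormalDist

def relative_risk(cases_exposed, n_exposed, cases_control, n_control, level=0.95):
    """Return (RR, SE of log RR, confidence interval, two-sided p-value)."""
    rr = (cases_exposed / n_exposed) / (cases_control / n_control)
    # Large-sample standard error of log(RR) (assumed formula).
    se_log_rr = math.sqrt(1.0 / cases_exposed - 1.0 / n_exposed
                          + 1.0 / cases_control - 1.0 / n_control)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    z_stat = abs(math.log(rr)) / se_log_rr
    p_value = 2.0 * (1.0 - NormalDist().cdf(z_stat))
    return rr, se_log_rr, (lower, upper), p_value

# Controls from the text: 38 + 148 = 186 underweight adults out of 9470.
# Carrier counts are hypothetical and used for illustration only.
rr, se, (ci_low, ci_high), p = relative_risk(
    cases_exposed=10, n_exposed=60, cases_control=186, n_control=9470)
```

The interval is asymmetric around the RR because it is built on the log scale and then exponentiated.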
We thank the Vital-IT high-performance computing centre of the Swiss Institute of Bioinformatics. SJ is recipient of a « bourse de relève académique de la Faculté de Biologie et Medecine de l’Universite de Lausanne ». This work was supported by the Leenaards Foundation Prize (SJ, DM and AR), the Jérôme Lejeune Foundation (AR), the Telethon Action Suisse Foundation (AR), the Swiss National Science Foundation (AR, JSB, SB and SEA), a SNSF Sinergia grant (SJ, DM, SB, JSB and AR), the European Commission anEUploidy Integrated Project grant 037627 (AR, SB, XE, HGB and SEA), the Ludwig Institute for Cancer Research (AV), the Swiss Institute of Bioinformatics (SB, ZK), an Imperial College Dept of Medicine PhD studentship (JSe-SM), the Comprehensive Biomedical Research Centre, Imperial College Healthcare NHS Trust, and the National Institute for Health Research (PE), the Wellcome Trust and the Medical Research Council (AIFB and PF), the Instituto de Salud Carlos III (ISCIII)-FIS, the German Mental Retardation Network funded through a grant of the German Federal Ministry of Education and Research (NGFNplus 01GS08160) to A Reis and European Union-FEDER (PI081714, PS09/01778), SAF2008-02278 (XE, MG, FFA), the Belgian National Fund for Scientific Research - Flanders (NVA, RFK), the Dutch Organisation for Health Research and Development (ZON-MW grant 917-86-319) and Hersenstichting Nederland (BBAdV), grant 81000346 from the Chinese National Natural Science Foundation (YGY), the Simons Foundation Autism Research Initiative, Autism Speaks and NIH grant GM061354 (JFG), and the OENB grant 13059 (AK-B). YS holds a Young Investigator Award from the Children’s Tumor Foundation and Catalyst Award from Harvard Medical School, and BLW, a Fudan Scholar Research Award from Fudan University, a grant from Chinese National “973” project on Population and Health (2010CB529601) and a grant from Science and Technology Council of Shanghai (09JC1402400). 
ERS and SL, recipients of the Michael Smith Foundation for Health Research Scholar award, acknowledge the CIHR MOP 74502 operational grant. EGCUT received support from the EU Centre of Excellence in Genomics and FP7 grants #201413 and #245536, from Estonian Government SF0180142s08, SF0180026s09 and SF0180027s10 (AM, KM, AK). The Helmholtz Zentrum Munich and the State of Bavaria financed KORA, also supported by the German National Genome Research Network (NGFN-2 and NGFNPlus: 01GS0823), the German Federal Ministry of Education and Research (BMBF), and the Munich Center of Health Sciences (MC Health, LMUinnovativ). CIBEROBN and CIBERESP are initiatives of ISCIII (Spain). SWS holds the GlaxoSmithKline-Canadian Institutes of Health (CIHR) Chair in Genetics, Genomics at the University of Toronto and the Hospital for Sick Children and is supported by Genome Canada and the McLaughlin Centre. deCODE was funded in part by NIH grant MH071425 (KS), EU grant HEALTH-2007-2.2.1-10-223423 (Project PsychCNV) and EU grant IMI-JU-NewMeds. We thank for their help M. Hass, Z. Jaros, M. Jussila, M. Koiranen, P. Rantakallio, MC. Rudolf, V. Soo, O. Tornwall, S. Vaara, T. Ylitalo and the French DHOS national CGH network, as well as all participating patients and clinicians. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author contributions
SJ, AR, PF and JSB wrote the manuscript with contributions from FZ, LH, RGW, NDB, ZK, AIFB and AV. LH, AV and AR produced and analyzed the expression data. ZK, AV, RGW and NDB conducted the statistical analyses guided by SJ, AR, PF and JSB. SJ, AR, FZ, LH, DM, YS, GT, MB, SB, DC, NdL, BBAdV, BAF, FFA, MG, AG, JH, AK, ClC, KM, KO, OSP, DS, MMVH, SVG, ATVvS, FW, BLW, YY, JA, XE, JFG, AM, SWS, KS, UT, AIFB, JSB, PF and all other authors phenotyped and/or genotyped patients and/or individuals of the general population. SJ, AR and JSB designed the study. All authors commented on and approved the manuscript.