Non-DGV CNVs were Enriched in NTD Samples
A total of 85 cases (NTD-affected embryos) and 75 controls were collected. No case had the positive familial NTD history, suggesting they all were sporadic NTDs. The detailed categories of NTD cases were listed on . No known pathogenic or novel, rare mutation of MKS1/MKS3 genes was detected.
The general characteristics of the NTD cohort.
A total of 1057 CNVs in 85 cases and a total of 821 CNVs in 75 controls were detected. The per sample CNV load either by count or by size was not different between cases and controls (CNV count per sample: 12.89 in cases vs. 12.07 in controls, p>0.05; CNV size per sample: 283 kb in cases vs. 330 kb in controls, p>0.05). Stratifying CNVs by size or grouping samples by CNV count did not show any significant difference among cases and controls either ().
Summary of whole genomic copy number variants (CNVs) in the two groups; Wilcoxon rank-sum test was used for statistical analysis.
After filtering out benign CNVs reported in DGV and Chinese-specific common CNVs (four in cases and five in controls) 
, 81 non-DGV CNVs were identified, 55 in NTD cases and 26 in controls. We compared this non-DGV CNV list with the clinical diagnostic CNV dataset in Genetic Diagnostic laboratory, Boston Children’s Hospital and a Chinese CNV dataset consisting of about 300 mental retardation patients and their unaffected relatives. None of the non-DGV CNVs identified from NTD cases was present in these two datasets, suggesting that these non-DGV CNVs are very rare and/or NTD-specific. Out of 81 non-DGV CNVs, 59 CNVs (40 in NTD cases and 19 in controls) involve one or more genes (termed “genic CNVs”). Six different-sized genic CNVs validated by PCR are shown in Figure S1
for detailed primer information).
The average count of non-DGV CNVs per sample was 1.85-fold higher in NTDs than in controls (0.65 vs.
0.35) while the average count of non-DGV genic CNVs per sample was 1.88-fold higher in case than in controls (0.47 vs.
0.25). The proportions of samples with non-DGV CNVs and genic CNVs were significantly higher in cases than in controls (41.2% vs.
<0.05; 37.6% vs.
<0.05, see ). Non-DGV genic CNVs had a lower p
value than non-DGV CNVs (0.01 vs.
0.03, labeled *** in ). Logistical regression analysis showed that non-DGV CNVs and genic CNVs lead to an increased risk for NTDs (non-DGV CNVs, OR
1.10–4.74; non-DGV genic CNVs, OR
1.24–5.87). Furthermore, for samples without non-DGV CNVs, we compared the whole genomic common CNV burden in cases and controls. Neither the count nor the size of genome-wide common CNVs was different in cases and controls (CNV count: 11.61 vs.
>0.05; CNV size: 292 kb vs.
344 kb, p
>0.05). The above analysis indicated that it is the non-DGV genic CNVs, not common CNVs that are associated with increased risk of NTDs. We also compared the non-DGV CNV size in NTDs and controls and no difference was identified (Table S2
Comparison of non-DGV CNVs, non-DGV genic CNVs and ciliogenic CNVs in NTD cases and controls.
Enrichment of Ciliogenic CNVs in NTDs
The total number of genes involved in NTD-specific CNVs was about 3-fold that in controls (138 vs.
44, see Table S3
). Importantly, nine genes (CALR3
, RAB19, RABL5
gene family, ZIC
) in NTD-specific CNVs are homologous to genes that have been reported as NTD candidate genes in mouse models (labeled yellow in Table S5
), suggesting that a subset of genes in NTD-specific CNVs is directly contributing to the pathogenesis of NTDs in some samples. After referring to recently published literature reporting 828 ‘ciliome’ genes 
, forty one out of 138 genes from NTD-specific CNVs were identified as cilia genes. In controls, only twelve cilia genes were influenced by non-DGV CNVs.
IPA core analysis showed that bio-functional networks, related diseases/disorders, and canonical pathways were all different between gene lists from cases and controls (). In cases, the top-ranked network was related to “Cell Morphology, Developmental Disorder, Skeletal and Muscular Disorders”. Twenty-eight genes from cases (see , green indicates deletion, red indicates duplication) involved in this network. The top related diseases and disorders involved were “Genetic and Neurological Disease”. The canonical pathway analysis revealed that “tight junction signaling” and “protein kinase A signaling” were two significant pathways involved in genes from NTD-specific CNVs (, labeled as CP in the oval regions). The PAR3-PAR6-aPKC complex and CTNNB1 were shared components of the two canonical pathways.
Top networks, related disorders and diseases identified in NTD case, controls and known NTD candidate genes.
Top bio-functional networks in NTD-affected cases.
We further identified that 12 cilia genes (GLIS3, CENPN, CDC123, WDFY2, C16orf61, SLC17A5, CTNNA3, CLDN15, CDC16, ATP5G1, PARD3 and PARD6G) from 10 cases participate this top network (see for their locations in the network). Both the core gene of the network-HNF4A and shared components (PARD3, PAR6D6G and CTNNB1, see ) belong to cilia genes. In order to further evaluate the involvement of cilia genes in NTDs, we compared the frequency of non-DGV CNVs containing cilia genes (termed as non-DGV ciliogenic CNVs) in NTD cases and controls. In NTDs, 21 cases carry non-DGV ciliogenic CNVs, which was significant higher than that in controls. (24.7% vs. 9.3%, p<0.05, see ). Non-DGV ciliogenic CNVs led to a 3.19-fold (95% CI: 1.27–8.01) increased risk for NTDs. In order to further explore the association of non-DGV ciliogenic CNVs with subtypes of NTD (), NTD cases were stratified into cranial or spinal NTDs, isolated or systemic NTDs according to patient phenotypes. shows that the frequency of ciliogenic CNVs was not significantly higher in cranial NTDs than in spinal NTDs (21.6% vs. 29.4%, p>0.05). Meanwhile, the frequency of ciliogenic CNVs was not significantly higher in systemic NTDs compared with isolated NTDs (28.6% vs. 13.6%, p>0.05). Interestingly, the systemic NTDs with urinary/adrenal developmental anomalies showed significantly more non-DGV ciliogenic CNVs compared with those without such anomalies (42.3%% vs 18.9%, p<0.05, one-tailed Fisher exact test). The common urinary/adrenal abnormalities included renal agenesis/hypoplasia, adrenal gland agenesis/hypoplasia and polycystic kidney. This analysis revealed a possible role of cilia genes in some specific systemic NTDs. listed detailed phenotypes of 11 systemic NTD cases with non-DGV ciliogenic CNVs and abnormal urinary/adrenal development.
The Relationship of NTDs categories with non-DGV ciliogenic CNVs.
The detailed phenotypes in 11 systemic NTDs carrying ciliogenic CNVs and abnormal urinary/adrenal development.
We performed an IPA for 223 known NTD candidate genes. The top network and related diseases/disorders were similar to what was identified with the NTD-specific genes. “Embryonic Development, Neural System Development and Function, Organ Development” and “Amino Acide Metabolism, Molecular Transport, Small Molecular Biochemistry” were the top networks, and “Developmental and Neurological Diseases” were the top related diseases/disorders. “One carbon pool by folate” and “Methionine Metabolism” were the canonical pathways of “Amino Acid Metabolism”. For the network “Embryonic Development, Neural System Development and Function, Organ Development”, the canonical pathways of known importance to neural tube closure, including sonic hedgehog signaling (SHH), bone morphogenic protein (BMP) signaling, Wnt/β-catenin signaling work were identified in this top network (, labeled as CP). Also we identified tight junction signaling and protein kinase A signaling interact with SHH and BMP signaling (, labeled as CP) by sheared molecules PRKAC.
Top bio-functional networks in known NTD candidate genes.
IPA analysis was performed for 361 integrated genes consisted of known NTD candidate genes and 138 genes from NTD-specific CNVs. We observed that “Organ Development, Tissue Development, Embryonic Development” were in the top networks (), and “Developmental and Neurological Disorders” was the top related diseases/disorders. 8 genes from NTD-specific CNVs (APLF, GLIS3, PARP12, POLG, PYCR2, SUPT7L, TAF2, WNT4) involved in this top network. The SHH, BMP and Wnt/β-catenin signaling and protein kinase A signaling were identified in this network (, labeled as CP).
Top bio-functional networks in the integrated list of 361 genes.