2.1 Primer library design process
We obtained the target sequence for the human X chromosome exome from the RefSeq Genes track of the UCSC Genome Browser (hg18 build). The total reference sequence consisted of 7,427 fragments with a total size of 2,495,062 bases and included all coding and non-coding (5’ and 3’ untranslated regions) exons of human chromosome X. To enable more than 4,000 unique primer pairs within a primer library, the RainDance Primer Design Pipeline was modified to evaluate whether each primer pair could be pooled with others, up to 5 primer pairs within the same primer droplet. The number of amplicons offered in an expanded content RDT primer library ranges from 4,000 to 20,000 unique primer pairs. Because the human X chromosome exome design required fewer than 12,000 primer pairs, we chose to optimize for 3 primer pairs per droplet. The custom primer library was designed using the manufacturer’s design parameters (RainDance Technologies, Lexington, MA, USA) and the Primer3 algorithm (http://primer3.sourceforge.net/). All SNPs from dbSNP build 129 were excluded from the primer selection regions. Repeat masking was not performed on the input regions to the primer design pipeline, and the pipeline performed an exhaustive primer selection across all submitted regions.
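The choice of 3 primer pairs per droplet follows from simple arithmetic. The sketch below assumes, as the library sizes quoted above suggest (4,000 to 20,000 pairs with up to 5 pairs per droplet), that a library holds at most about 4,000 distinct primer-droplet types; `MAX_DROPLET_TYPES` and the helper names are our own. It also shows one way known SNP positions could be kept out of primer binding sites using Primer3's `SEQUENCE_EXCLUDED_REGION` tag. The text does not say how the RainDance pipeline implemented SNP exclusion, so treat this part as an illustration only.

```python
import math

# Assumption inferred from the text: an expanded-content library holds at most
# about 4,000 distinct primer-droplet types, each with 1 to 5 primer pairs.
MAX_DROPLET_TYPES = 4_000
MAX_PAIRS_PER_DROPLET = 5


def pairs_per_droplet(n_primer_pairs: int) -> int:
    """Smallest number of primer pairs per droplet that fits the design."""
    k = max(1, math.ceil(n_primer_pairs / MAX_DROPLET_TYPES))
    if k > MAX_PAIRS_PER_DROPLET:
        raise ValueError("design exceeds the 20,000-pair library limit")
    return k


def excluded_region_tag(snp_offsets: list[int]) -> str:
    """Build a Primer3 SEQUENCE_EXCLUDED_REGION tag from 0-based SNP offsets.

    Runs of adjacent SNPs are merged into one <start>,<length> interval, so
    no primer can be placed over a known dbSNP position.
    """
    intervals: list[list[int]] = []
    for pos in sorted(set(snp_offsets)):
        if intervals and pos == intervals[-1][0] + intervals[-1][1]:
            intervals[-1][1] += 1  # extend the current run of SNPs
        else:
            intervals.append([pos, 1])  # start a new excluded interval
    return "SEQUENCE_EXCLUDED_REGION=" + " ".join(f"{s},{n}" for s, n in intervals)


print(pairs_per_droplet(11_845))  # → 3
print(excluded_region_tag([40, 10, 12, 11]))
# → SEQUENCE_EXCLUDED_REGION=10,3 40,1
```

With 11,845 pairs, two pairs per droplet would need about 5,923 droplet types, above the assumed limit, so three is the smallest setting that fits.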
After the primers were selected, duplicate amplicons and their associated primers were removed from the full design and only unique regions were kept in the collapsed design. A total of 11,845 unique amplicons were required to cover the entire targeted human X chromosome exome (). Filtering of these amplicons led us to reject 27 primer pairs whose design parameters were too extreme to meet the stringent primer picking criteria used by RDT. An additional 242 amplicons and their associated primer pairs were deleted from the design because they were predicted to produce more than 3 products in the human genome.
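The collapse-and-filter bookkeeping can be sketched as follows. The `Amplicon` record and its `meets_criteria` and `genome_products` fields are hypothetical stand-ins for the RDT picking criteria and an off-target prediction such as re-PCR; the pipeline's actual data model is not described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    chrom: str
    start: int             # 0-based, inclusive
    end: int               # 0-based, exclusive
    meets_criteria: bool   # hypothetical: passed the primer-picking criteria
    genome_products: int   # hypothetical: predicted products in the genome


def collapse_and_filter(amplicons, max_products=3):
    """Collapse duplicate amplicons, then drop pairs that fail either filter.

    Returns (collapsed design, final design).
    """
    unique = {}
    for a in amplicons:
        unique.setdefault((a.chrom, a.start, a.end), a)  # keep first copy only
    collapsed = list(unique.values())
    final = [a for a in collapsed
             if a.meets_criteria and a.genome_products <= max_products]
    return collapsed, final


design = [
    Amplicon("chrX", 100, 400, True, 1),
    Amplicon("chrX", 100, 400, True, 1),    # duplicate of the first amplicon
    Amplicon("chrX", 900, 1200, False, 1),  # fails the picking criteria
    Amplicon("chrX", 1500, 1800, True, 5),  # more than 3 genomic products
    Amplicon("chrX", 2000, 2300, True, 2),
]
collapsed, final = collapse_and_filter(design)
print(len(collapsed), len(final))  # → 4 2

# The counts reported in the text follow the same bookkeeping:
print(11_845 - 27 - 242)                  # → 11576
print(round(100 * 11_576 / 11_845, 1))    # → 97.7
```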
Design summary for human chromosome X exome RDT Library
Primer pairs were pooled based on the proximity of each primer pair’s target within the genome. Each primer was evaluated for off-target products using the re-PCR algorithm for each of the 6 primer-primer interactions within each pool. Once pools of 3 primer pairs were determined, each pool was processed through the RainDance Primer Library Manufacturing process. Each pooled primer pair aliquot was then processed to create an emulsion of unique primer droplets, each containing an equal representation of the 3 primer pairs.
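A minimal sketch of proximity-based pooling, assuming primer pairs are given as hypothetical `(name, chrom, start)` tuples. Our reading of the "6 primer-primer interactions" is the non-cognate forward/reverse combinations in a pool of 3 pairs (each forward primer with the 2 other reverse primers, 3 × 2 = 6); the text does not define them, and the sketch only enumerates these combinations rather than running re-PCR.

```python
from itertools import permutations


def pool_by_proximity(pairs, pool_size=3):
    """Sort primer pairs by target position and group neighbours into pools.

    `pairs` holds hypothetical (name, chrom, start) tuples. When the total is
    not a multiple of pool_size, the last pool is smaller.
    """
    ordered = sorted(pairs, key=lambda p: (p[1], p[2]))
    return [ordered[i:i + pool_size] for i in range(0, len(ordered), pool_size)]


def cross_pair_combinations(pool):
    """Non-cognate forward/reverse primer combinations within one pool.

    For 3 pairs this gives 3 * 2 = 6 combinations, each a candidate for an
    off-target check with a tool such as re-PCR.
    """
    return [(f"{a[0]}_F", f"{b[0]}_R") for a, b in permutations(pool, 2)]


pairs = [("p4", "chrX", 5000), ("p1", "chrX", 100), ("p3", "chrX", 3000),
         ("p2", "chrX", 900), ("p5", "chrX", 7000)]
pools = pool_by_proximity(pairs)
print([[p[0] for p in pool] for pool in pools])  # → [['p1', 'p2', 'p3'], ['p4', 'p5']]
print(len(cross_pair_combinations(pools[0])))    # → 6
```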
The final human X chromosome exome RDT expanded content primer library consisted of 11,576 amplicons and their associated primer pairs, or 97.7% of the initial design (). The total length of amplified DNA was predicted to be 5,916,297 bases. In total, 98.05% of the targeted human X chromosome exome bases (2,446,304 out of 2,495,062) were covered by at least one amplicon (). After accounting for overlapping amplicons, a reference sequence consisting of 5,748 fragments with a total size of 4,723,733 bp was generated and used for mapping.
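Both the coverage figure and the non-redundant mapping reference come from merging intervals. The sketch below assumes half-open `[start, end)` intervals on a single chromosome and treats abutting amplicons as one fragment; how the authors handled abutting amplicons is not stated.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend current block
        else:
            merged.append([start, end])              # start a new block
    return [tuple(iv) for iv in merged]


def covered_bases(targets, amplicons):
    """Count target bases overlapped by at least one amplicon."""
    blocks = merge_intervals(amplicons)  # disjoint, so no double counting
    total = 0
    for t_start, t_end in merge_intervals(targets):
        for a_start, a_end in blocks:
            total += max(0, min(t_end, a_end) - max(t_start, a_start))
    return total


targets = [(100, 200), (300, 400)]
amplicons = [(50, 150), (120, 180), (350, 500)]
print(merge_intervals(amplicons))         # → [(50, 180), (350, 500)]
print(covered_bases(targets, amplicons))  # → 130

# Coverage figure reported in the text:
print(round(100 * 2_446_304 / 2_495_062, 2))  # → 98.05
```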
2.2 Sample selection
We chose 24 male samples from the SFARI Simplex Collection (New York, NY, USA). The SFARI Simplex Collection (SSC) is a core project and resource of the Simons Foundation Autism Research Initiative (SFARI). SSC maintains a permanent repository of genetic samples from approximately 3,000 families, each of which has one child affected with autism spectrum disorder (ASD), one unaffected sibling, and two parents unaffected with ASD. Two male HapMap samples, NA18500 and NA18503, were also enriched and sequenced to compare HapMap genotype calls with those from our Illumina sequencing. Prior to processing, patient and control DNA samples were quantified and assessed for purity (OD260/280) using a NanoDrop instrument. Following quantification, 100 ng of each DNA sample, as determined by NanoDrop, was run on a 0.8% agarose gel to verify that the DNA was of high molecular weight. A total of 26 genomic DNA samples (2 HapMap male controls and 24 samples from patients with autism) passed quality control and were sent to the RDT facility, where they were processed on the RDT 1000 with the RDT Sequence Enrichment Application using standard RDT procedures for genomic DNA.
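The NanoDrop quality-control arithmetic can be sketched as below. It uses the standard conversion of about 50 ng/µL of double-stranded DNA per A260 unit; the example absorbances are hypothetical, and the 1.7–2.0 acceptance window for the 260/280 ratio is our assumption, since the text does not state the thresholds used.

```python
# Widely used conversion for double-stranded DNA: an A260 of 1.0
# (10 mm path) corresponds to about 50 ng/uL.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/uL) from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def ul_for_mass(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume to load for a fixed mass, e.g. 100 ng for the agarose gel."""
    return target_ng / conc_ng_per_ul


def purity_ok(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Hypothetical acceptance window around the ~1.8 ratio of pure DNA."""
    return low <= a260 / a280 <= high


conc = dsdna_ng_per_ul(2.0)
print(conc, ul_for_mass(100, conc))  # → 100.0 1.0
print(purity_ok(2.0, 1.08))          # → True (ratio ≈ 1.85)
```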
2.3 Genomic DNA Fragmentation
Genomic DNA samples were fragmented using a nebulization kit (Invitrogen, Carlsbad, CA, USA, catalogue # K7025-05) following the manufacturer’s recommended protocol: 2.5 µg of genomic DNA was resuspended in 750 µL Shearing Buffer (TE, pH 8.0; Fisher, Worcester, MA, USA, catalogue # 50843207) containing 10% glycerol (Fisher, catalogue # AC15892) and nebulized at 6–10 pounds per square inch (psi) for 90 seconds to produce 2–4 kb DNA fragments. Fragments of 2–4 kb provide an optimal template size for PCR in droplets. Sheared genomic DNA was precipitated by adding 80 µL 3 M sodium acetate, pH 5.2 (Fisher, catalogue # 50843081), 4 µL 20 mg/mL Mussel Glycogen (Fisher, catalogue # NC9329100) and 700 µL 100% isopropanol (Fisher, catalogue # AC14932); the mixture was mixed and stored overnight at −20°C. The samples were centrifuged at maximum speed for 15 minutes at 4°C. The supernatant was removed, 500 µL of cold 80% ethanol (Fisher, catalogue # 5739852) was added as a wash, and the DNA pellet was collected by centrifugation at maximum speed for 5 minutes at 4°C. The pellet was air-dried and resuspended in 10 µL 10 mM Tris-HCl, pH 8.0 (Sigma, St. Louis, MO, USA, catalogue # T2694). Fragmented genomic DNA was run on a 0.8% agarose gel to confirm that it was in the correct size range (2–4 kb).
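The precipitation volumes can be checked with C1·V1 = C2·V2. This sketch assumes, for illustration only, that the full 750 µL of nebulized sample is recovered; the actual recovered volume is not reported and is usually lower, which would raise the effective concentrations slightly.

```python
# Assumption: the full 750 uL of nebulized sample is recovered, which
# overstates the real volume; adjust sample_ul to the measured recovery.
sample_ul = 750.0
naoac_stock_m, naoac_ul = 3.0, 80.0
glycogen_stock_ug_per_ul, glycogen_ul = 20.0, 4.0  # 20 mg/mL = 20 ug/uL
isopropanol_ul = 700.0

aqueous_ul = sample_ul + naoac_ul + glycogen_ul

# C1 * V1 = C2 * V2, solved for the concentration in the aqueous phase
naoac_m = naoac_stock_m * naoac_ul / aqueous_ul
glycogen_ug = glycogen_stock_ug_per_ul * glycogen_ul
isopropanol_volumes = isopropanol_ul / aqueous_ul

print(f"{naoac_m:.2f} M sodium acetate")                 # → 0.29 M sodium acetate
print(f"{glycogen_ug:.0f} ug glycogen carrier")          # → 80 ug glycogen carrier
print(f"{isopropanol_volumes:.2f} volumes isopropanol")  # → 0.84 volumes isopropanol
```

Under this assumption the mix is close to the commonly used 0.3 M sodium acetate and 0.7–1 volume of isopropanol.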
2.4 Genomic DNA Template Mix
To prepare the input DNA template mixture for targeted amplification, 1.0 µg of the purified fragmented genomic DNA was combined with 4.7 µL 10× High-Fidelity Buffer (Invitrogen, catalogue # 11304-029), 1.26 µL MgSO4 (Invitrogen, catalogue # 11304-029), 1.71 µL 10 mM dNTPs (New England Biolabs (NEB), Ipswich, MA, USA, catalogue # N0447S/L), 3.6 µL betaine (Sigma, catalogue # B2629-50G), 3.6 µL RDT Droplet Stabilizer (RainDance Technologies, Lexington, MA, USA, catalogue # 30-00826), 1.8 µL dimethyl sulfoxide (Sigma, catalogue # D8418-50ml) and 0.72 µL Platinum High-Fidelity Taq at 5 units/µL (Invitrogen, catalogue # 11304-029). Each sample was brought to a final volume of 25 µL with Nuclease-Free Water (Teknova; Fisher, catalogue # 50843418).
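The fixed reagents total 17.39 µL, so the DNA and water together must make up the remaining 7.61 µL. A minimal sketch of that volume calculation follows; the 250 ng/µL input concentration is hypothetical (2.5 µg resuspended in 10 µL with full recovery), and real samples should use the measured concentration.

```python
# Reagent volumes (uL) per 25 uL template mix, as listed above.
REAGENTS_UL = {
    "10x High-Fidelity Buffer": 4.7,
    "MgSO4": 1.26,
    "10 mM dNTPs": 1.71,
    "Betaine": 3.6,
    "RDT Droplet Stabilizer": 3.6,
    "DMSO": 1.8,
    "Platinum High-Fidelity Taq (5 U/uL)": 0.72,
}
FINAL_VOLUME_UL = 25.0
DNA_INPUT_NG = 1_000.0


def template_mix_volumes(dna_ng_per_ul: float) -> tuple[float, float]:
    """Return (DNA volume, water volume) in uL for one 25 uL template mix."""
    dna_ul = DNA_INPUT_NG / dna_ng_per_ul
    water_ul = FINAL_VOLUME_UL - sum(REAGENTS_UL.values()) - dna_ul
    if water_ul < 0:
        raise ValueError("DNA too dilute to fit 1.0 ug into 25 uL")
    return dna_ul, water_ul


# Hypothetical concentration: 2.5 ug recovered in 10 uL would be 250 ng/uL.
dna_ul, water_ul = template_mix_volumes(250.0)
print(f"DNA {dna_ul:.2f} uL, water {water_ul:.2f} uL")  # → DNA 4.00 uL, water 3.61 uL
print(f"{REAGENTS_UL['Platinum High-Fidelity Taq (5 U/uL)'] * 5:.1f} U Taq")  # → 3.6 U Taq
```

Any DNA stock below about 131 ng/µL cannot deliver 1.0 µg in the 7.61 µL available, which the function reports as an error.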
2.5 RDT 1000: Merge
PCR droplets were generated on the RDT1000 (RainDance Technologies, catalogue # 20-01000) using the manufacturer’s recommended protocol. To process a single sample, the user loaded onto the RDT1000 a tube containing 25 µL of Genomic DNA Template Mix, a custom primer droplet library (RainDance Technologies) and a disposable microfluidic chip (RainDance Technologies). The custom primer droplet library consists of a collection of individual primer droplets, each containing matched pairs of forward and reverse primers (5.2 µM per primer) for amplicons in the primer library. The final primer concentration in the PCR reaction is 0.53 µM per primer. The RDT1000 generated each PCR droplet by pairing a single gDNA template droplet with a single primer droplet; the paired droplets flow past an electrode embedded in the chip and are instantly merged. All of the resulting PCR droplets were automatically dispensed as an emulsion into a PCR tube and transferred to a standard thermal cycler for PCR amplification. Each sample generated more than 1,000,000 PCR droplets. After PCR amplification, the emulsion was broken to release the amplicons from the PCR droplets, and the products were purified over a Qiagen MinElute column. The purified PCR product was then run on the Agilent Bioanalyzer to confirm that the amplicon profile matched the predicted histogram profile (Supplemental Figure 1).
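The two primer concentrations imply a volume ratio between template and primer droplets. The sketch assumes each PCR droplet is exactly one primer droplet merged with one template droplet and that the template droplet carries no primer, so c_final = c_droplet · Vp / (Vp + Vt). It also estimates the average number of PCR droplets per primer pool, assuming uniform sampling of pools of 3 pairs; because the text gives "more than 1,000,000" droplets, that figure is a lower bound.

```python
import math

C_PRIMER_DROPLET_UM = 5.2   # per primer, in each primer droplet
C_FINAL_UM = 0.53           # per primer, in the merged PCR droplet


def template_to_primer_volume_ratio(c_droplet: float, c_final: float) -> float:
    """Template:primer droplet volume ratio implied by primer dilution.

    c_final = c_droplet * Vp / (Vp + Vt)  =>  Vt / Vp = c_droplet / c_final - 1
    """
    return c_droplet / c_final - 1.0


def mean_droplets_per_pool(total_droplets: int, n_pairs: int, pairs_per_pool: int = 3) -> float:
    """Average PCR droplets per primer pool, assuming uniform sampling."""
    n_pools = math.ceil(n_pairs / pairs_per_pool)
    return total_droplets / n_pools


print(round(template_to_primer_volume_ratio(C_PRIMER_DROPLET_UM, C_FINAL_UM), 1))  # → 8.8
print(round(mean_droplets_per_pool(1_000_000, 11_576)))  # → 259
```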
2.6 Multiplex Illumina library preparation
After PCR purification, the amplified fragments for each individual were repaired to blunt ends using the NEB Quick Blunting Kit (NEB, catalogue # E1201L; 15 minutes at room temperature), and the blunting enzymes were then heat-inactivated at 70°C for 10 minutes. The PCR fragments were then concatenated using the NEB Quick Ligation Kit (NEB, catalogue # M2200L), with ligation performed overnight at 25°C. Next, 5 µL of Quick T4 DNA Ligase was added and the reaction was incubated at 37°C for one hour, followed by heat inactivation of the ligase at 65°C for 15 minutes. The ligated products were brought to 100 µL with elution buffer and then sheared using a Covaris E210 (duty cycle 10%, intensity 5, 200 cycles per burst, 180 seconds). The sheared fragments were purified using a Qiagen QIAquick PCR purification column and eluted in 32 µL of elution buffer. The samples then entered the standard Illumina Genome Analyzer multiplex library preparation protocol. At the enrichment step, a 6-base index tag was attached to each sample by PCR following the standard Illumina protocol. The only exception was that adaptor-ligated products were purified using Invitrogen E-Gel SizeSelect 2% gels (Invitrogen, catalogue # G6610-02) instead of the gel purification method suggested by Illumina. Enrichment was confirmed on an Agilent Bioanalyzer DNA 7500 chip, and the library was quantified by qPCR using the KAPA Library Quantification Kit (KAPA Biosystems, Woburn, MA, USA, catalogue # KK4824).
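Because each sample carries its own 6-base index, reads from a shared lane can be assigned back to samples by comparing index reads. The sketch below is a generic Hamming-distance demultiplexer, not the Illumina pipeline's own implementation; the sample names, index sequences and the one-mismatch tolerance are hypothetical choices.

```python
# Hypothetical sample sheet: one 6-base index per sample in a lane.
SAMPLE_INDEXES = {
    "sample_1": "ATCACG",
    "sample_2": "CGATGT",
    "sample_3": "TTAGGC",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str, indexes=SAMPLE_INDEXES, max_mismatches: int = 1):
    """Return the one sample whose index is within max_mismatches, else None.

    Reads matching no index, or more than one, are left unassigned.
    """
    hits = [name for name, idx in indexes.items()
            if len(idx) == len(read_index)
            and hamming(read_index.upper(), idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


print(assign_sample("ATCACG"))  # → sample_1
print(assign_sample("ATCACT"))  # → sample_1 (one mismatch)
print(assign_sample("GGGGGG"))  # → None
```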
2.7 Illumina sequencing and data analysis
Enriched DNA was denatured and diluted to a concentration of 8 pM. Cluster generation and 70 bp single-end sequencing were performed following the standard Illumina GAII manuals using version 4 kits. We performed multiplexed single-end sequencing of three samples per lane. After sequencing, the reads were mapped and variant sites identified using EmoryMapper (Cutler and Zwick, pers. comm.) against the reference sequence consisting of 5,748 fragments covering 4,723,773 bases. This reference is larger than the targeted bases covered by the design (2,446,304) because RDT included some intronic and intergenic sequence to facilitate primer picking. Base calls obtained for HapMap samples NA18500 and NA18503 were compared with those reported by HapMap using a custom Perl script to assess the rates of data completion and accuracy. The HapMap data were assumed to be error-free when estimating data accuracy.
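The custom Perl script is not described in detail, so the following is only a sketch of one reasonable way to compute the two metrics. We assume completion is the fraction of HapMap sites that received a sequence call and accuracy is the fraction of called sites agreeing with HapMap, with HapMap treated as error-free; the genotypes and positions are hypothetical.

```python
def normalize(genotype: str) -> str:
    """Order alleles so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype.upper()))


def compare_to_hapmap(hapmap: dict, calls: dict) -> tuple[float, float]:
    """Return (completion, accuracy) of sequence calls against HapMap.

    completion: fraction of HapMap sites that received a sequence call.
    accuracy:   fraction of those called sites that agree with HapMap,
                treating HapMap as error-free.
    """
    called = [site for site in hapmap if site in calls]
    completion = len(called) / len(hapmap) if hapmap else 0.0
    agree = sum(normalize(calls[s]) == normalize(hapmap[s]) for s in called)
    accuracy = agree / len(called) if called else 0.0
    return completion, accuracy


# Hypothetical hemizygous X-chromosome genotypes keyed by position.
hapmap = {101: "A", 202: "G", 303: "T", 404: "C"}
calls = {101: "A", 202: "A", 303: "T"}
completion, accuracy = compare_to_hapmap(hapmap, calls)
print(f"{completion:.2f} {accuracy:.3f}")  # → 0.75 0.667
```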
2.8 Microarray-based Genomic Selection (MGS)
We performed MGS twice for HapMap sample NA18503 using previously published methods [6]. After MGS, each enriched DNA sample was sequenced in a single lane of an Illumina GAIIx (76 bp, non-multiplexed, single-end). The sequences obtained for the two replicates of sample NA18503 were each compared with those reported by HapMap using a custom Perl script to assess the rates of data completion and accuracy [28]. The HapMap data were assumed to be error-free when estimating data accuracy.