The pediSNP program is publicly available [23] and was based on SNPtrio [12] (available at the same website). The pediSNP website includes the software, a tutorial, and text files containing all the genotype data used in this study.
SNP data based on Illumina's 550 K SNP chip were obtained from AGRE as Illumina BeadStudio data files. SNP data of genotype calls were exported from BeadStudio as text files, and used as the input to pediSNP. Forward trio analysis was performed by steps described in SNPtrio and plotted as trio tracks. Meiotic recombination sites were identified among reverse trio tracks of identical or opposite inheritance, based on patterns and descriptions shown in Figure .
Figure shows typical output from the pediSNP analysis. The top two panels (labeled as 'Fa.Mo__B1' and 'Fa.Mo__B2') are from normal pedigree runs: the two parents with the sons B1 and B2, respectively. The next two panels (labeled as 'Rev_B1.B2__Fa' and 'Rev_B1.B2__D3') are from reverse pedigree analysis, with the two siblings treated as if they are the parents, and each of the two biological parents treated as if he or she is the child, respectively.
A unique dimension introduced by pediSNP lies in blocks of dots shown in both the OPP and the ID tracks (refer to Table ) on the two reverse pedigree plots. Each black OPP (opposite inheritance) dot signifies that the parent of this reversed child1/child2/parent trio has the AB genotype while the children of this reversed trio include an AA genotype and a BB genotype. The child with the AA genotype inherited a copy of the A from that parent, while the child with the BB genotype inherited a copy of the B from that parent. A block of black OPP dots identifies a region where two non-identical alleles from that parent were transmitted separately to the two siblings.
Because any black OPP dot in one parent's reverse trio plot indicates that parent has an AB genotype while the two siblings have one AA genotype and one BB genotype, this means that the other parent must also have an AB genotype. Therefore, the dot on the other parent's reversed trio plot must also be a black OPP dot. (An exception is the occurrence of a chromosomal anomaly such as uniparental inheritance.) Thus, for example, in Figure the two reverse trios have OPP blocks that are aligned in the region labeled "opposite inheritance."
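The OPP rule described above can be written as a per-SNP test. The following is a minimal sketch, not the pediSNP source: the function names and the "AA"/"AB"/"BB" string encoding are assumptions made for illustration. It also shows why the other parent must be heterozygous: the AA sibling received an A from each parent and the BB sibling received a B from each parent.

```python
def is_opp(sib1: str, sib2: str, parent: str) -> bool:
    """True when a reverse-trio SNP shows the OPP pattern described in the text.

    Genotypes are encoded as "AA", "AB", or "BB" (an encoding assumed here).
    """
    # The biological parent (the "child" of the reverse trio) is heterozygous,
    # and the two siblings are homozygous for different alleles.
    return parent == "AB" and {sib1, sib2} == {"AA", "BB"}


def other_parent_is_opp(sib1: str, sib2: str) -> bool:
    """Barring anomalies such as uniparental inheritance, siblings AA and BB
    imply that both parents carry A and B, so both reverse trios show OPP."""
    return {sib1, sib2} == {"AA", "BB"}


# Siblings AA and BB with a heterozygous father: OPP in the father's reverse
# trio, and necessarily OPP in the mother's reverse trio as well.
father_opp = is_opp("AA", "BB", "AB")
mother_opp = other_parent_is_opp("AA", "BB")
```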
Each contiguous red ID box region on either of the two reversed trio plots identifies a segment of that particular chromosome where one of that parent's alleles was not transmitted to either of the two siblings. For that chromosomal region, the two siblings inherited the same allele from this parent. The beginning and end points of each such chromosomal region mark two different meiotic crossover events on that parent's chromosome among these two siblings. Each crossover was transmitted to only one sibling and not to the other. Both crossover events could have been transmitted to the same child.
Note that OPP dots signify that parent1 is AB while one child is AA and the other child is BB. Consider a region of semi-identical (IBS-1) inheritance (Figure ). When one parent's reverse pedigree plot has a red ID box region (i.e., the two siblings inherited the same allele from that parent), there will be no OPP dots in the corresponding region of the reverse pedigree plot for the other parental gamete, because the two "parents" of this reverse pedigree share an allele. The matching pair of reverse pedigree plots could have overlapping red ID boxes, marking a region where each sibling inherited the opposite allele from each parent.
The SNPtrio tool can show paternal or maternal uniparental inheritance phenomena (UPI-P or UPI-M) such as hemizygous deletions, as well as Mendelian inconsistencies. SNPtrio panels are plotted as part of the pediSNP output. Similar patterns are occasionally evident on reverse trio plots. (For example, Figure panels 2 and 3 each include one dot on the UPI-M track on the short arm of chromosome 2 adjacent to the centromere, and these dots are reflected on the tracks labeled 5 of panels 4 and 5.) In reverse trios such patterns are not informative regarding meiotic recombination.
We developed a segmentation algorithm to define blocks and their overlaps. We first defined blocks having either opposite or identical patterns with a start and end position for the first and last informative SNP. We then ordered the blocks by starting position and defined overlapping blocks. For example, block 1 with ends A, B and block 2 with ends C, D overlap if A ≤ D and C ≤ B. Analyses of block overlaps were restricted to one parental gamete at a time, and were used to find n-1 overlapping blocks where n is the number of siblings. After overlapping blocks are identified, the algorithm identifies the maximum genomic distance within which the crossover event occurred. Crossovers are flagged as ambiguous if they are found to occur in two parental gametes. For overlaps in fewer than n-1 blocks, the criteria for calling a crossover are not met. All crossover calls are tabulated into a final output.
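The ordering and overlap test can be illustrated with a short sketch. This is not the pediSNP implementation: the `Block` type, its field names, and the pairwise search are assumptions; only the overlap condition (A ≤ D and C ≤ B) and the ordering by start position come from the description above.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    start: int  # position of the first informative SNP in the block
    end: int    # position of the last informative SNP in the block
    kind: str   # "ID" or "OPP" (labels assumed for illustration)


def overlaps(b1: Block, b2: Block) -> bool:
    """Blocks [A, B] and [C, D] overlap if A <= D and C <= B."""
    return b1.start <= b2.end and b2.start <= b1.end


def overlapping_pairs(blocks: List[Block]) -> List[Tuple[Block, Block]]:
    """Order blocks by start position and list every overlapping pair."""
    ordered = sorted(blocks, key=lambda b: b.start)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Blocks are sorted by start, so once one starts past the end of
            # `first`, no later block can overlap `first` either.
            if second.start > first.end:
                break
            pairs.append((first, second))
    return pairs


example = [Block(400, 500, "OPP"), Block(100, 200, "ID"), Block(150, 300, "ID")]
found = overlapping_pairs(example)
```

In this example only the two ID blocks overlap. Grouping such pairs into the n-1 overlapping blocks required for a crossover call is left out of the sketch.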
The segmentation algorithm does not include a genotype error model. Genotyping errors are no more likely to occur in parents than in children, and Mendelian inconsistencies in father/mother/child trios are analyzed in the SNPtrio panels of pediSNP. The segmentation algorithm relies on data in tracks 2 (identical inheritance) and 4 (opposite inheritance) and is thus unaffected by genotyping errors that would generate signals in tracks 1, 3, or 5. A single signal in track 2 or 4 that was due to genotyping error would not disrupt the segmentation algorithm, because the algorithm relies on locating the beginning and end of blocks to find the crossovers. A single MI-S or BPD in the middle of another block type would not generate enough points to create a block (and would be too far away from other blocks) to be considered. Furthermore, if by chance two or more dots interrupted a block, they could correspond to a true recombination event. This can be assessed by finding consensus among n-1 siblings. Even if such a block were found, an erroneous block would be ignored unless it matched in multiple individuals. For these reasons the segmentation algorithm is insensitive to single outliers (genotyping errors) and would only start to be affected when multiple points were clustered together. Even then, the clusters would only cause a problem if they were replicated between multiple siblings (e.g., the occurrence of parental errors of specific types and not sibling errors).
Synthetic data generation
Synthetic SNP data were generated to test the segmentation algorithm's sensitivity and specificity using data in which recombination sites and gamete choices were known. Data were based on a HapMap genotype scaffold (n = 1,066,825 markers) and distributed throughout the genome at their annotated coordinates. A three generation pedigree was designed to include maternal and paternal grandparents, two parents, and four children/grandchildren.
The first step in creating the synthetic pedigree was to create the grandparents' genotypes from which all individuals would be derived. The synthetic parents for each grandparent were defined as two unrelated HapMap Yoruba individuals for which SNP data were available. The father/mother for the Paternal Grandfather, Paternal Grandmother, Maternal Grandfather, and Maternal Grandmother were individuals NA18507/NA18858, NA18871/NA18517, NA19138/NA19172, and NA19160/NA19116. The HapMap founders' genotypes were initially randomly segregated into haplotypes. Location and number of crossovers were determined for each individual. Each chromosome arm could have one or two crossover events, and the probability of each scenario was dependent on chromosome arm size and sex (Table ), chosen to reflect a greater rate of crossing over in females.
Once the number of recombinations was determined, crossover locations were determined. For each crossover a 1 Mb bin was selected based on the weighted sex-specific recombination probability [14] available from the UCSC genome browser [24]. Once a 1 Mb bin was selected the exact location within the bin was selected at random. If two crossovers were chosen a 10 Mb interference region was generated on either side of the first crossover region. The second crossover was selected using the same method as the first from the remaining pool of bins.
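The bin selection step can be sketched as weighted sampling followed by a uniform draw within the chosen bin. The code below is an illustration under stated assumptions, not the generator used in this study: the function name, the flat example weights, and the choice to express the interference region as ten bins on either side of the first crossover's bin are all hypothetical. Real weights would come from the sex-specific recombination map.

```python
import random
from typing import Optional, Sequence

BIN_SIZE = 1_000_000    # 1 Mb bins, as in the text
INTERFERENCE_BINS = 10  # 10 Mb on either side, counted in bins (assumption)


def pick_crossover(weights: Sequence[float], rng: random.Random,
                   first_bin: Optional[int] = None) -> int:
    """Pick a crossover position (bp) on one chromosome arm.

    `weights[i]` is the relative recombination probability of bin i.
    If `first_bin` is given, bins inside the interference region around it
    are removed from the pool before sampling.
    """
    candidates = [
        i for i in range(len(weights))
        if first_bin is None or abs(i - first_bin) > INTERFERENCE_BINS
    ]
    if not candidates:
        raise ValueError("no bins remain outside the interference region")
    chosen = rng.choices(candidates,
                         weights=[weights[i] for i in candidates], k=1)[0]
    # The exact location within the selected bin is chosen uniformly at random.
    return chosen * BIN_SIZE + rng.randrange(BIN_SIZE)


rng = random.Random(0)
flat_weights = [1.0] * 50  # placeholder weights for a 50 Mb arm
first = pick_crossover(flat_weights, rng)
second = pick_crossover(flat_weights, rng, first_bin=first // BIN_SIZE)
```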
With crossover locations decided, the haplotypes were swapped at the appropriate positions to create a choice of two non-recombinant and two recombinant gametes for each chromosome. The choice of which gamete to transmit to progeny was independent for all chromosomes in females. Males did not recombine their X, and therefore could only transmit the X chromosome to female offspring or the Y chromosome to male offspring. The probability of selecting a non-recombinant chromosome was 35%, while the probability of selecting a recombinant chromosome was 65%.
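The haplotype swap and gamete choice can be sketched as follows. This is a minimal illustration rather than the code used to build the synthetic pedigree: the function names and the list-of-alleles representation are assumptions, marker positions are assumed to be sorted in ascending order, and choosing uniformly between the two gametes within each class is an assumption not stated in the text. The 35%/65% split is taken from the description above.

```python
import random
from typing import List, Sequence, Tuple

P_RECOMBINANT = 0.65  # probability of transmitting a recombinant gamete


def recombine(hap_a: Sequence[str], hap_b: Sequence[str],
              positions: Sequence[int],
              crossovers: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Build the two recombinant haplotypes by switching the source
    haplotype at every crossover position (positions sorted ascending)."""
    rec1, rec2 = [], []
    swapped = False
    pending = sorted(crossovers)
    idx = 0
    for pos, a, b in zip(positions, hap_a, hap_b):
        # Toggle the source haplotype for each crossover at or before this marker.
        while idx < len(pending) and pos >= pending[idx]:
            swapped = not swapped
            idx += 1
        rec1.append(b if swapped else a)
        rec2.append(a if swapped else b)
    return rec1, rec2


def choose_gamete(non_recombinant: Sequence[List[str]],
                  recombinant: Sequence[List[str]],
                  rng: random.Random) -> List[str]:
    """Pick the recombinant class with probability 0.65, then one gamete
    from that class uniformly (the uniform step is an assumption)."""
    pool = recombinant if rng.random() < P_RECOMBINANT else non_recombinant
    return rng.choice(list(pool))


hap_a, hap_b = list("AAAAA"), list("BBBBB")
positions = [10, 20, 30, 40, 50]
rec1, rec2 = recombine(hap_a, hap_b, positions, crossovers=[25])
gamete = choose_gamete([hap_a, hap_b], [rec1, rec2], random.Random(0))
```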
Once the synthetic gametes were generated for each founder pair, the gamete haplotypes were stored for the individual they created, and that individual's genotypes were derived from the inherited haplotypes. This ensured that the individuals in the first generation (the grandparents) had allele frequencies representative of the sampled population and had genotypes that were based on real data.
With the grandparents established, synthetic children were derived in a very similar manner, the only difference being that haplotypes were already established. Each crossover location and gamete choice was recorded in the output, giving a complete list of all of the crossovers that occurred to create each individual. The reported data were given as tabular genotypes annotated by chromosome and position for each member of the pedigree.