In the presence of genotyping error, it is well understood that standard family-based association tests (the Transmission Disequilibrium Test, Family-Based Association Tests, etc.) are biased under the null hypothesis and do not maintain the pre-specified α-level. Under standard genotyping error models, more transmissions of the common allele are observed than would be expected by chance under the assumption of Mendelian transmission. In a genome-wide association study, after quality control filtering of all genotyped SNPs, the genotyping error rate for each individual SNP is expected to be small, and departures of the transmission pattern from the null hypothesis that are caused by genotyping errors are unlikely to be detectable by a single-locus analysis.
In order to estimate the undetected genotyping error for an individual proband after quality control filtering, the information about the transmission patterns has to be aggregated across all of the proband's genotyped SNPs. Consequently, we define for each proband an individual transmission test statistic that can be used to infer the underlying, undetected average genotyping error rate for the selected proband.
In order to keep the notation simple, we assume that one trio is available for genotyping at $m$ bi-allelic marker loci. The variable $x_j$ denotes the number of target/minor alleles in the proband of the trio at the $j$th marker locus, based on the called genotype. It therefore reflects any errors in genotyping and is not necessarily equal to the true allele count. Similarly, the parental counts at the $j$th locus are given by $p_{j1}$ and $p_{j2}$. Then, for the $j$th marker locus, we can define the Mendelian residual by
$$U_j = x_j - E(x_j \mid p_{j1}, p_{j2}), \tag{1}$$
where $E(x_j \mid p_{j1}, p_{j2})$ is computed based on the assumption of Mendelian transmissions. When the parental genotypes are unknown and genotypic information on additional probands is available, the parental genotypes in equation (1) can be replaced by the sufficient statistic of Rabinowitz & Laird. Based on the Mendelian residuals, a genome-wide transmission score for the proband in the trio can be constructed as
$$T = \sum_{j=1}^{m} U_j.$$
By summing over the Mendelian residuals $U_j$ for all genotyped markers in the proband, the score $T$ assesses the Mendelian transmission patterns globally and evaluates the null hypothesis of no preferential transmission of the minor allele at a genome-wide level. Given the SNP density on the currently used SNP chips, some proportion of the SNPs will be in linkage disequilibrium (LD). The potential correlation between the SNPs has to be taken into account when the variance of $T$ is computed in order to standardize the test statistic. Standard approaches for the computation of the variance, as they are used, for example, in the TDT or FBAT statistic, assume independence of the Mendelian residuals $U_j$ and are therefore not applicable here.
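As an illustration of the two definitions above, the following minimal Python sketch computes the Mendelian residual at each marker (the proband's called target-allele count minus its expectation given the parental counts) and sums the residuals into the genome-wide score. The function names and the toy genotypes are hypothetical. The sketch assumes both parents are genotyped, in which case each parent transmits one of its two alleles with probability 1/2 and the Mendelian expectation is simply half the sum of the parental counts.

```python
from typing import List, Sequence


def mendelian_residuals(child: Sequence[int],
                        parent1: Sequence[int],
                        parent2: Sequence[int]) -> List[float]:
    """Return the Mendelian residual for each marker.

    Inputs are called target-allele counts (0, 1 or 2) per marker.
    Under Mendelian transmission each parent passes on one of its two
    alleles with probability 1/2, so the expected proband count given
    the parents is (parent1 + parent2) / 2.
    """
    if not (len(child) == len(parent1) == len(parent2)):
        raise ValueError("all genotype vectors must have the same length")
    return [x - (p1 + p2) / 2.0 for x, p1, p2 in zip(child, parent1, parent2)]


def transmission_score(residuals: Sequence[float]) -> float:
    """Genome-wide transmission score: the sum of the Mendelian residuals."""
    return float(sum(residuals))


# Toy trio with four markers (illustrative values only).
child = [1, 0, 2, 0]
father = [1, 1, 2, 1]
mother = [1, 0, 1, 1]

u = mendelian_residuals(child, father, mother)  # [0.0, -0.5, 0.5, -1.0]
t = transmission_score(u)                       # -1.0
```

A negative score indicates that the proband carries fewer target alleles than expected from the parental genotypes.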
However, the asymptotic properties of $T$ can be derived without knowledge of the LD structure by interpreting $T$ as a permutation test statistic. For the computation of the Mendelian residual at each SNP, an allele has to be selected as the target allele. For a bi-allelic marker locus, an exchange of the target allele implies a change in the sign of the Mendelian residual, i.e. $U_j \rightarrow -U_j$. Under the null hypothesis of no preferential transmission of either allele at a genome-wide level, the assignment of the target allele at each SNP can be considered as a random selection process, with selection probability 50% for each allele and with independent draws at each SNP locus. The absence or presence of LD between the SNPs does not affect the validity of this permutation argument, since the Mendelian residuals are treated here as fixed and the sign of the residual is selected randomly with equal probabilities. Because the signs are independent with mean zero, all cross terms vanish. Hence, under the null hypothesis of no preferential transmission, the expected value of $T$ and its variance are given by
$$E(T) = 0, \qquad \mathrm{Var}(T) = \sum_{j=1}^{m} U_j^2,$$
for any user-specified choice of target alleles at the genetic loci under consideration. Although derived in a different context, this variance estimator is similar to the empirical variance estimator that is used in the pedigree disequilibrium test. Under the null hypothesis of no preferential transmission of one allele, the standardized genome-wide transmission statistic is
$$\frac{T^2}{\mathrm{Var}(T)} = \frac{\left(\sum_{j=1}^{m} U_j\right)^2}{\sum_{j=1}^{m} U_j^2},$$
and will have an approximate $\chi^2$ distribution with 1 df when the null hypothesis (of no genotyping errors) is true. In our application of the genome-wide transmission statistic $T$, we will select the minor allele as the target allele for all SNPs. In the presence of genotyping errors across SNPs, the minor allele is expected to be under-transmitted, i.e. more negative Mendelian residuals are expected than would arise by chance. Consequently, by selecting the minor allele as target allele for all SNPs in the specification of $T$, we obtain a test statistic that will assess genotyping error across all SNPs within one proband.
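The sign-flip argument can be checked numerically on a small example. The sketch below is illustrative only; the function names and the residual values are hypothetical. It enumerates all equally likely target-allele assignments for a handful of markers, confirming that the score has mean zero and variance equal to the sum of squared residuals, and then computes the standardized statistic with its approximate 1-df chi-square p-value. The p-value uses the identity that the upper tail of a chi-square variable with 1 df at s equals erfc(sqrt(s/2)).

```python
import itertools
import math
from typing import Sequence, Tuple


def standardized_statistic(residuals: Sequence[float]) -> Tuple[float, float]:
    """Return (squared score / sum of squared residuals, approximate p-value)."""
    t = sum(residuals)
    var_t = sum(u * u for u in residuals)  # sign-flip variance of the score
    if var_t == 0.0:
        raise ValueError("all residuals are zero; the statistic is undefined")
    stat = t * t / var_t
    # Upper tail of chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def exact_sign_flip_moments(residuals: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of the score over all 2^m sign assignments.

    Only feasible for a handful of markers; used here as a sanity check.
    """
    totals = [sum(s * u for s, u in zip(signs, residuals))
              for signs in itertools.product((-1, 1), repeat=len(residuals))]
    mean = sum(totals) / len(totals)
    var = sum((x - mean) ** 2 for x in totals) / len(totals)
    return mean, var


# Hypothetical residuals for six markers.
u = [0.0, -0.5, 0.5, -1.0, -0.5, 1.0]
mean, var = exact_sign_flip_moments(u)    # mean 0, variance 2.75
stat, p_value = standardized_statistic(u)
```

The enumeration treats the residual magnitudes as fixed, so it makes no assumption about correlation between markers, which is the point of the permutation argument.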
Since the sample size for the genome-wide transmission statistic $T$ is the number of statistically independent SNPs on a particular chip, the proposed test will have a sample size of at least tens of thousands for most commercially available SNP chips. Consequently, for sample sizes, error rates and allele frequencies often observed in practice, the genome-wide transmission statistic $T$ will have sufficient power to detect small to moderate departures from the Mendelian transmission patterns that are caused by genotyping errors, even though $T$ is computed for only one proband. This theoretical property is verified and quantified in subsequent simulation studies.
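To give a rough feel for why a single proband can suffice, the following Python sketch simulates one trio at many independent SNPs, with and without genotyping error in the proband. It is not the simulation design used in this study. The error model (a truly heterozygous proband is miscalled as homozygous for the common allele with probability `err`, with error-free parents), the minor allele frequency, the number of SNPs, the seed, and all function names are assumptions chosen for illustration. Mendelian-inconsistent calls, which quality control would normally remove, are not filtered here.

```python
import math
import random
from typing import List, Tuple


def simulate_trio(m: int, maf: float, err: float, rng: random.Random) -> List[float]:
    """Simulate Mendelian residuals for one trio at m independent SNPs.

    Assumed error model (illustrative only): a truly heterozygous proband
    genotype is miscalled as homozygous for the common allele with
    probability err; parental calls are error-free.
    """
    residuals = []
    for _ in range(m):
        # Parental alleles: 1 marks the minor (target) allele.
        father = [int(rng.random() < maf) for _ in range(2)]
        mother = [int(rng.random() < maf) for _ in range(2)]
        # Mendelian transmission: each parent passes one allele at random.
        x = rng.choice(father) + rng.choice(mother)
        if x == 1 and rng.random() < err:
            x = 0  # heterozygote miscalled as common-allele homozygote
        residuals.append(x - (sum(father) + sum(mother)) / 2.0)
    return residuals


def genome_wide_test(residuals: List[float]) -> Tuple[float, float, float]:
    """Return (score, standardized statistic, approximate chi-square(1) p-value)."""
    t = sum(residuals)
    var_t = sum(u * u for u in residuals)
    stat = t * t / var_t
    return t, stat, math.erfc(math.sqrt(stat / 2.0))


rng = random.Random(2024)
clean = genome_wide_test(simulate_trio(50000, 0.3, 0.0, rng))
noisy = genome_wide_test(simulate_trio(50000, 0.3, 0.05, rng))
```

Under this assumed model each miscalled heterozygote lowers the proband's minor-allele count by one, so the expected shift in the score grows linearly with the number of SNPs while its null standard deviation grows only with the square root, which is why aggregating across the genome can expose an error rate that no single locus would reveal.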