An important methodological issue in association studies of complex disorders is differentiating between true predisposing etiological allele variants and those that are only in linkage disequilibrium (LD) with those loci [Koeleman et al., 2000; Cordell and Clayton, 2002]. Our objective here is to provide methods for selecting a subset of markers in a gene that explains the disease-marker association seen in the entire set. The relationship between LD and association is complicated, as described in detail in Nielsen et al.; strong LD between markers does not always imply redundant association results. However, when there is even one disease susceptibility locus (DSL) in the region, testing one marker at a time is likely to produce multiple significant χ² test statistics because of the LD between the markers. In our example, multiple SNPs in the IL10 gene test as significant for association with lung function. A test is needed that can evaluate the contribution of a marker while allowing for an effect at one or more nearby markers.
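To illustrate why single-marker tests become redundant under LD, the Python sketch below computes allelic 2×2 χ² statistics on expected (not randomly sampled) allele counts for two SNPs, only one of which is a DSL. The haplotype frequencies, the relative risk, the multiplicative per-allele risk model with a rare-disease approximation, and the function names are all hypothetical assumptions chosen for illustration. It is a simple case-control calculation, not one of the family-based tests proposed here.

```python
import math

# Hypothetical haplotype frequencies for two biallelic SNPs (allele 1 = minor allele).
# SNP 0 is treated as the DSL; SNP 1 is a marker in LD with it.
HAP_FREQS = {(1, 1): 0.27, (1, 0): 0.03, (0, 1): 0.07, (0, 0): 0.63}


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def expected_allelic_tests(hap_freqs, causal, rr, n_chrom):
    """Allelic chi-square per SNP on expected counts of n_chrom case and
    n_chrom control chromosomes (hypothetical illustration)."""
    # Assumed multiplicative per-allele risk model with a rare-disease approximation:
    # controls carry the population haplotype frequencies, and case frequencies
    # upweight by rr every haplotype carrying the risk allele at the DSL.
    weights = {h: f * (rr if h[causal] == 1 else 1.0) for h, f in hap_freqs.items()}
    total = sum(weights.values())
    case_freqs = {h: w / total for h, w in weights.items()}
    n_snps = len(next(iter(hap_freqs)))
    results = []
    for snp in range(n_snps):
        p_case = sum(f for h, f in case_freqs.items() if h[snp] == 1)
        p_ctrl = sum(f for h, f in hap_freqs.items() if h[snp] == 1)
        # Expected minor/major allele counts among cases (a, b) and controls (c, d).
        a, b = n_chrom * p_case, n_chrom * (1 - p_case)
        c, d = n_chrom * p_ctrl, n_chrom * (1 - p_ctrl)
        stat = chi2_2x2(a, b, c, d)
        results.append((snp, stat, p_value_1df(stat)))
    return results


for snp, stat, p in expected_allelic_tests(HAP_FREQS, causal=0, rr=2.0, n_chrom=1000):
    print(f"SNP {snp}: chi2 = {stat:.1f}, p = {p:.1e}")
```

With these assumed values, the marker that is merely in LD with the DSL also yields a very large statistic (χ² of about 35, against about 55 at the DSL itself), whereas setting `rr=1.0` drives both statistics to zero.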
Case-control methods are used in genetic studies because they can have higher power than family-based tests, but family-based tests of a main genetic effect have the advantage that they can be constructed to be completely robust to population substructure and model-free [Laird and Lange, 2006]. In family-based designs, testing a genetic effect at one marker while conditioning on another marker has been partially considered in the literature. Koeleman et al. propose a likelihood-based approach to test for one marker in the presence of another. In a more general framework, Cordell and Clayton, Cordell, and Cordell et al. suggest a model-building approach that models multiple loci using the retrospective likelihood of the genotype conditional on the trait and the parental haplotype distribution. This model-based approach is completely robust to population substructure, but it is limited to dichotomous traits and to families with both parents genotyped. Dudbridge extends this model-based likelihood approach to missing parents and to quantitative traits using a normal model, but the results can be biased if the normal model does not hold. In principle, the normal model can be extended to ascertained traits, unlike the approach we present for quantitative traits. The approach by Dudbridge is not completely robust to population stratification when there are missing parents, but in practice it performs well even in the presence of stratification. We present here methods that are applicable to arbitrary family structures, are completely robust to population stratification, and do not require distributional assumptions on the traits.
When using a test based on multiple genetic markers, we need to consider the difficulty of reconstructing the parental genotypes. When parental genotypes are effectively available, either because both parents must be genotyped, as in the tests by Cordell and Clayton, Cordell, and Cordell et al., or through nuisance parameters, as in the test by Dudbridge, the haplotype density and phase resolution are not very difficult to compute, even for larger numbers of markers. However, once parents are missing, reconstructing the haplotype density and resolving phase become more difficult and can be computationally infeasible for more than a few markers. Thus, when parents are missing and multiple markers are tested, it is advantageous whenever possible to avoid reconstructing the haplotype density of all of the markers. Our approaches are constructed with this in mind.
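A rough sense of why phase reconstruction becomes costly comes from simple counting. The Python sketch below rests on simplifying assumptions (biallelic SNPs, genotypes coded as minor-allele counts, no information from relatives) and uses a hypothetical helper name; it is a generic illustration, not part of the proposed methods. It enumerates the unordered haplotype pairs compatible with an unphased multilocus genotype: an individual heterozygous at h loci has 2^(h-1) phase resolutions, while over m markers the number of possible haplotypes grows as 2^m and the number of multilocus genotypes for an ungenotyped parent as 3^m.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: minor-allele counts (0, 1 or 2) at each biallelic SNP.
    Haplotypes are tuples of alleles coded 0/1.
    """
    per_locus = []
    for g in genotype:
        if g == 0:
            per_locus.append([(0, 0)])
        elif g == 2:
            per_locus.append([(1, 1)])
        elif g == 1:
            # Heterozygous locus: either chromosome may carry the minor allele.
            per_locus.append([(0, 1), (1, 0)])
        else:
            raise ValueError(f"invalid genotype code: {g}")
    pairs = set()
    for combo in product(*per_locus):
        h1 = tuple(first for first, _ in combo)
        h2 = tuple(second for _, second in combo)
        # Sort so that (h1, h2) and (h2, h1) count as one phase resolution.
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


for m in (1, 4, 7, 10):
    n_phase = len(compatible_haplotype_pairs([1] * m))
    print(f"{m:2d} SNPs: {2 ** m:5d} possible haplotypes, "
          f"{3 ** m:6d} possible genotypes, "
          f"{n_phase:4d} phase resolutions if fully heterozygous")
```

Genotyped parents largely resolve these ambiguities through Mendelian constraints, whereas a missing parent's unknown genotype adds a further sum over up to 3^m possibilities.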
We first propose a model-free test for any trait that is completely robust to population substructure and phenotypic model misspecification and allows for arbitrary pedigrees. We then propose separate model-based tests, based on a linear model, for dichotomous and continuous traits. The advantage of the model-based method for dichotomous traits over previous methodology is its handling of missing parents. The method for quantitative traits shares this advantage and is additionally less restrictive on the phenotypic model than previous approaches. The model-based tests are still completely robust to population substructure, but not to phenotypic model misspecification; we assess their robustness to phenotypic model misspecification via simulation. In our tests, we avoid reconstructing the full haplotype density by instead conditioning on the haplotype density of small subsets of the markers or, preferably, on the univariate densities of each of the markers. This results in more informative families, especially when parents are missing. We use the proposed test to analyze a lung function phenotype in the Childhood Asthma Management Program (CAMP) study.