All our samples were drawn from a prospective study in Kilifi District Hospital and were genotyped for 134 microsatellite markers using the ABI LD20 marker set (Applied Biosystems, USA) [see Additional file 1]. Cases (n = 148) comprised a consecutive series of children aged <13 years who died of invasive bacterial disease (variously bacteraemia, bacterial meningitis or neonatal sepsis; for details see Methods), a major contributor to childhood mortality in the developing world [33]. Controls comprised 137 randomly selected healthy children matched on age to the cases. Microsatellite traces were scrutinised carefully to ensure homozygotes were identified with high accuracy.
For the study of HFCs, a number of measures of heterozygosity have been proposed that offer potential benefits over straight heterozygosity, weighting scores variously by allele size (mean d²), allele frequencies (internal relatedness, IR) [6] and the variability of loci scored (HL) [34]. However, in automated high-throughput studies, heterozygosity assessment can sometimes be problematic, particularly where time for scrutiny of every trace is limited. Thus, null (non-amplifying) alleles, allele drop-out and, at some loci, high levels of stutter bands can all contribute to a tendency for a minority of loci to carry misleading genotypes, where heterozygotes are called as homozygotes or vice versa. Issues have also been identified with allele binning, in some cases causing single alleles to be split between two length classes [35]. In an attempt to circumvent these problems, we concentrated most of our empirical effort on ensuring that heterozygotes and homozygotes were accurately scored, and we introduce a variant of the measure Standardised Heterozygosity (SH) [2], designed to be highly conservative. SH controls for missing data by expressing heterozygosity as the ratio of the observed heterozygosity in an individual to the expected value at the markers genotyped, assuming Hardy-Weinberg equilibrium. Our measure, Standardised Observed Homozygosity (SOH), follows the same principle, but instead of calculating the expected homozygosity from the allele frequencies we use the observed homozygosity at each locus. In this way, SOH measures the extent to which any given individual is more or less homozygous relative to the level expected if all genotypes were randomized among individuals, negating the requirement for accurate allele frequency estimates and reducing the impact of allele drop-out, null alleles and other possible artefacts.
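The verbal definition of SOH above can be sketched in code. This is a minimal illustration and not the authors' implementation: it assumes SOH is the number of homozygous genotypes in an individual divided by the sum, over the loci at which that individual was typed, of the observed proportion of homozygotes at each locus. The function names and the genotype encoding (a pair of allele lengths, or None for missing data) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: a genotype is a pair of allele lengths, None if missing.
Genotype = Optional[Tuple[int, int]]


def locus_homozygosity(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Observed proportion of homozygotes at each locus, ignoring missing data."""
    n_loci = len(genotypes[0])
    proportions = []
    for j in range(n_loci):
        typed = [ind[j] for ind in genotypes if ind[j] is not None]
        n_hom = sum(1 for a, b in typed if a == b)
        proportions.append(n_hom / len(typed) if typed else 0.0)
    return proportions


def soh(individual: Sequence[Genotype], locus_hom: Sequence[float]) -> float:
    """Assumed SOH: homozygous loci observed / homozygous loci expected.

    The expectation sums the per-locus observed homozygosity over only the
    loci typed in this individual, which is how missing data are handled.
    """
    n_hom = 0
    expected = 0.0
    for genotype, h in zip(individual, locus_hom):
        if genotype is None:
            continue  # untyped locus contributes to neither sum
        if genotype[0] == genotype[1]:
            n_hom += 1
        expected += h
    return n_hom / expected if expected > 0 else float("nan")
```

Under this reading, a value above 1 marks an individual who is more homozygous than expected for the loci at which they were typed, and a value below 1 marks one who is less homozygous than expected.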
We first asked whether SOH varied significantly among disease categories by conducting a one-way ANOVA. Raw SOH values exhibit a slightly skewed distribution, but the skew is removed by a simple log transformation (Shapiro-Wilk normality test, W = 0.9941, p = 0.322). Following transformation, SOH revealed highly significant variation among disease classes (F [5,281] = 6.75, p = 5.89 × 10⁻⁶) (Figure 1). However, when the control class was excluded, the ANOVA was no longer significant (F [4,144] = 0.785, p = 0.54), indicating that the main effect is driven by a difference in homozygosity between cases and controls rather than between disease classes. The direction of the deviation is toward greater homozygosity in cases than in controls.
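The test just described, a one-way ANOVA on log-transformed SOH, can be illustrated with a small standard-library sketch. The software used in the study is not stated, so this only shows the standard F-ratio calculation; the function names are hypothetical, and converting F to a p-value would additionally need the F-distribution survival function from a statistics library.

```python
import math
from typing import List, Sequence, Tuple


def log_transform(groups: Sequence[Sequence[float]]) -> List[List[float]]:
    """Natural-log transform each value (all values must be positive)."""
    return [[math.log(x) for x in group] for group in groups]


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Each inner list would hold the SOH values of one class (the disease classes plus controls), passed through `log_transform` first. Dropping the control list and rerunning corresponds to the second analysis reported above.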
Figure 1 Analysis of variance of standardized observed homozygosity values for cases and controls. SOH is the Standardised Observed Homozygosity for an individual genotyped for i loci, calculated from Nhom, the number of homozygote genotypes in the individual ...
We next asked whether there was evidence of local effects due to chance linkage between one or more markers and a gene(s) experiencing balancing selection. To test this proposition we calculated age-adjusted odds ratios of mortality at each locus in turn (Figure 2). Most markers show either a non-significant or borderline (at alpha = 0.05) association between homozygosity and risk of mortality. However, nine markers reveal strong associations with experiment-wide significance using full Bonferroni correction (p < 0.00037, see Table 1). This is a highly conservative threshold since, where multiple associated markers are expected, the less stringent false discovery rate approach can be justified [36]. Although the spacing between markers is sufficient to ensure they behave as if unlinked, it is possible that multiple markers contribute to the same risk through linkage to related genes. Consequently, we then constructed a multivariable logistic regression model with mortality as the response and age, sex, locality, SOH and homozygosity at each of the nine largest-effect markers as explanatory variables. Sequential removal of terms that did not contribute significantly (likelihood ratio test, significance threshold p < 0.05) yielded a final model containing age, location and five markers (D12S310, D13S158, D14S275, D16S3103, D16S423). SOH was dropped as a marginally non-significant term (likelihood ratio test, p ≈ 0.07) whether fitted as a continuous variable or as a factor with five levels. By implication, it seems that genome-wide effects (inbreeding depression) are minimal or absent.
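Two pieces of arithmetic in this paragraph can be made concrete with a short sketch. The first reproduces the Bonferroni threshold for 134 markers. The second computes a crude odds ratio with a Woolf (log-scale) 95% confidence interval from a 2 × 2 table; it is not age-adjusted, unlike the estimates in the text, which would come from a logistic regression with age as a covariate. The function names are hypothetical and any counts passed in are for illustration only, not study data.

```python
import math
from typing import Tuple


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under full Bonferroni correction."""
    return alpha / n_tests


def odds_ratio_ci(
    hom_cases: int,
    het_cases: int,
    hom_controls: int,
    het_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio for homozygosity with a Woolf confidence interval.

    All four cell counts must be non-zero. z = 1.96 gives an approximate
    95% interval.
    """
    odds_ratio = (hom_cases * het_controls) / (het_cases * hom_controls)
    se_log_or = math.sqrt(
        1 / hom_cases + 1 / het_cases + 1 / hom_controls + 1 / het_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Calling `bonferroni_threshold(0.05, 134)` gives approximately 0.00037, matching the experiment-wide threshold quoted above.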
Figure 2 Odds ratios and 95% confidence intervals for mortality and homozygosity by marker, adjusting for age. Age-adjusted odds ratios (with 95% confidence intervals) of mortality at each locus. All markers were tested for significance using a chi-squared test ...
Table 1 Nine microsatellites showing the strongest association between heterozygosity and mortality due to invasive bacterial disease.
Beginning with the final model derived above, we explored further possible interactions between markers, and also between each marker and age. Among all possible pair-wise combinations of markers, two revealed significant interactions, both of which were retained in the model regardless of the order in which they were added (Table , last two columns). No significant interactions with age were detected. Our data contain broadly similar numbers of individuals who died from Gram-positive (n = 63) and Gram-negative (n = 79) infections, plus a small number who had both (n = 6), and this allowed us to ask whether our markers identify genes that impact differently on diseases caused by bacteria of different classes. We therefore repeated the logistic regression approach above on each bacterial class separately, including dual infections in both analyses. Given the smaller datasets, the criterion for initial inclusion in the model was relaxed to an OR significant at p < 0.005. The final models are summarized in Table 2 and reveal surprising complexity, with susceptibility to Gram-positive and Gram-negative infections associated mostly with non-overlapping genomic locations. Only marker D12S310 is significant in all three models. Marker D9S164 reveals an interaction with age, infants being more likely to die if homozygous (OR = 1.65) and older children less likely (OR = 0.18). Interactions between marker pairs in the whole dataset suggest that homozygosity at both markers together confers no greater risk than homozygosity at either one alone. However, in the Gram-negative model the interaction of homozygosity at two markers, D7S486 and D16S423, indicates a significantly synergistic risk of mortality (odds ratio 40.7), whereas homozygosity at either of the markers alone confers no risk.
Table 2 Age- and geographic location-adjusted odds ratios for invasive bacterial death with homozygosity at specific microsatellite markers in multivariable models restricted to cases of Gram-positive sepsis, Gram-negative sepsis or including all invasive bacterial ...
Finally, to assess the magnitude of homozygosity effects with respect to the population, we calculated the population attributable risk fraction (PARF) [37] for each marker (Table 3). PARFs indicate the proportional reduction in mortality due to bacterial infection that would result if homozygosity at the locus could be eliminated. However, such direct interpretation in our case is problematic for many reasons, including the fact that the risk is probably driven by specific alleles whose prevalence differs greatly from that of homozygotes in general. Nonetheless, the ORs of the full model indicate that the risks we describe are sizeable, particularly since the markers provide only indirect measures of homozygosity at the genes themselves. Furthermore, given that the population prevalence of homozygosity at the relevant markers is high, the population-wide effects of homozygosity are likely to impact very considerably on the total burden of invasive bacterial disease mortality, with the majority of deaths being genetically determined.
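As an illustration of how a PARF can be obtained from an adjusted odds ratio, the sketch below uses the widely used case-based formula, PARF = p_c × (OR − 1) / OR, where p_c is the proportion of cases that are homozygous at the marker and the OR stands in for the relative risk. This is an assumption: the text does not say which formula reference [37] provides, and the function name and example inputs are hypothetical.

```python
def parf_case_based(prop_cases_exposed: float, odds_ratio: float) -> float:
    """Population attributable risk fraction from an adjusted odds ratio.

    Assumes the case-based formula p_c * (OR - 1) / OR, with the odds ratio
    treated as an approximation to the relative risk.
    """
    if not 0.0 <= prop_cases_exposed <= 1.0:
        raise ValueError("prop_cases_exposed must lie between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    return prop_cases_exposed * (odds_ratio - 1.0) / odds_ratio
```

With illustrative inputs, a marker at which 60% of cases are homozygous and the adjusted OR is 2.0 gives a PARF of 0.3. An OR below 1 gives a negative value, which corresponds to a protective association.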
Table 3 Population attributable risk fractions (PARF) for homozygosity at five microsatellite markers in a final multivariable model of bacterial disease death.