In our evaluation of a population-based cohort for host genetic variation associated with IPD in children, we identified 27 tagSNPs in 11 genes (CD46, SFTPA1, SFTPB, SFTPD, IL1B, IL1R1, IL4, IL10, IL12B, FAS and PTAFR) that were associated with IPD in EA or AA at a liberal significance threshold of p≤0.05. In particular, in both EA and AA, variants in SFTPD (gene ID 6441), which encodes surfactant protein D (SP-D), are consistently underrepresented in IPD and pneumococcal bacteremia cases compared to controls, suggesting that variants in this gene, or variants in linkage disequilibrium (LD) with them, may confer protection from IPD. This is the first study linking SFTPD gene variation specifically with clinical IPD.
SP-D is a member of the collectin subgroup of the C-type lectin superfamily, which also includes surfactant protein A (SP-A) and mannose binding protein. SP-D and SP-A are found primarily in the respiratory tract and on other mucosal surfaces, and recent data suggest that they affect respiratory infections on multiple levels. Surfactant collectins broadly bind carbohydrates and lipids on the surface of bacteria and viruses, with specific binding of SP-D to S. pneumoniae. SP-D deficient (sftpd−/− knockout) mice show persistent pneumococcal colonization, decreased clearance of bacterial pathogens, and, once colonized, earlier onset and higher levels of S. pneumoniae bacteremia.
Overall, collectins exhibit both pro- and anti-inflammatory effects: SP-D stimulates phagocytosis and scavenging of apoptotic cells, with pro-inflammatory consequences. Yet SP-D and SP-A bind SIRPα, TLR2, and TLR4 through their globular carbohydrate recognition domain (CRD) to down-regulate inflammatory cytokines; knockout mice exhibit high levels of pulmonary inflammation. These findings have led to speculation that collectins have dual roles: if the collectin collagenous tail is bound in the absence of a pathogen stimulus, an anti-inflammatory response results, possibly mitigating damage from incidental environmental stimuli. However, when pathogen signals are present, pulmonary collectins may provide pro-inflammatory stimuli for pathogen phagocytosis and NF-κB-mediated cytokine release.
Further study is needed to confirm our tagSNP associations and to dissect further how protection from IPD by SFTPD variants reflects regulatory functions of SP-D.
Our analysis also identified variants in other innate immune and coagulation pathway genes (e.g., CRP) and inflammatory mediators (e.g., IL1R1) that may be associated with IPD. Because this was an exploratory study and we did not correct for multiple comparisons, definitive interpretation of these findings will require confirmation in larger cohort studies. Nevertheless, our findings support the involvement of multiple pathways in the host response to IPD. Recent studies also suggest that additional genes in the toll-like receptor-signaling pathway (e.g., NFKB) may influence response to IPD.
Furthermore, the collectin gene MBL2 had variants overrepresented in pneumococcal bacteremia and meningitis, but not in overall IPD. This suggests the possibility of syndrome-specific host genetic associations, but our study was underpowered to evaluate this definitively.
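To illustrate what correcting for multiple comparisons would involve in a replication study, the following minimal Python sketch applies two standard procedures, the Bonferroni threshold and the Benjamini-Hochberg false discovery rate step-up rule, to a list of p-values. The function names and the example p-values are hypothetical and are not taken from this study; the sketch shows only the mechanics, not the analysis that was performed here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which tests are declared significant
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four SNP tests (illustrative only).
example_p = [0.001, 0.04, 0.03, 0.5]
example_threshold = bonferroni_threshold(0.05, len(example_p))
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up inputs, only the smallest p-value survives either procedure, although each of the first three values would pass an uncorrected p≤0.05 threshold.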
For this analysis, we took an indirect association approach, selecting and genotyping SNPs that either are causative themselves or are in LD with a causative SNP. The latter situation most likely applies to the majority of SNPs found associated with IPD in this study, as 18 of the 27 (67%) associated SNPs are located in introns. Furthermore, all four SFTPD variants associated with IPD in EA, AA, or both (including rs17886286 and rs1998374) are intronic. Notably, in SeattleSNPs EA, rs17886286 and rs1998374 are in complete (r² = 1.00) or high (r² = 0.732) LD with rs3088308, a coding non-synonymous SNP, while in SeattleSNPs AA, these SNPs have little to no LD with rs3088308 (r² = 0.002 for rs17886286 and r² = 0.028 for rs1998374) and are not associated with IPD in AA. The two SFTPD variants that are associated with IPD in AA are in moderate LD with a different coding non-synonymous SNP, rs4469829, which is monomorphic in EA (r² = 0.613 for rs17878441 and r² = 0.620 for rs12219080). Furthermore, SFTPD variant rs721917, a non-synonymous SNP known to reduce serum levels of SP-D in EA, is in LD with rs1998374 in AA (SeattleSNPs r² = 0.645) but not in EA (SeattleSNPs r² = 0.096). Thus, differences in LD patterns between ancestral populations may help to explain the disparate signals observed in EA compared to AA.
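For readers less familiar with the r² statistic quoted above, the following minimal Python sketch computes it for two biallelic SNPs from phased haplotype counts, using the standard definition r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], where D = p_AB − p_A p_B. The function name and the example counts are hypothetical; the SeattleSNPs values reported here were not derived with this code.

```python
import math


def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Return r^2 between two biallelic SNPs given counts of the four
    phased haplotypes (A/a at the first SNP, B/b at the second)."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # At least one SNP is monomorphic, so r^2 is undefined.
        return math.nan
    d = p_AB - p_A * p_B           # coefficient of linkage disequilibrium D
    return d * d / denom


# Hypothetical haplotype counts (illustrative only).
complete_ld = ld_r_squared(50, 0, 0, 50)    # alleles always co-occur
no_ld = ld_r_squared(25, 25, 25, 25)        # alleles are independent
monomorphic = ld_r_squared(50, 50, 0, 0)    # first SNP has a single allele
```

The monomorphic case returns NaN because r² is undefined when one SNP does not vary in a population, as for rs4469829 in EA.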
Our primary goal was to assess the feasibility of cross-linking surveillance data with an nDBS repository to perform tagSNP genomic studies, and toward this end we were highly successful: 82% of surveillance cases were linked to an nDBS, and 88% of samples were successfully genotyped. Several key issues associated with this experience deserve emphasis. First, the completeness of IPD case surveillance in ABCs, achieved through active surveillance methods and routine audits of laboratory records, combined with the overall low incidence of IPD in the general population, supports our assumption that controls were at low risk of having had IPD outside the surveillance time-period. Second, efficient linking of surveillance cases to nDBS samples was critical to minimize bias, but this linkage depends on consent requirements for nDBS use, which differ by state and continue to evolve.
Third, nearly a quarter of individuals identified as European-American through surveillance were found to have genetic characteristics indicating >10% African ancestry. Given differences in allele frequencies and LD between EA and AA populations, misclassification of ancestry can result in confounding. Although self-reported ancestry can be accurate in some settings, our study underscores that more complete genetic evaluation of ancestry may be important to control for population stratification in others. Finally, the tagSNP approach we used is based on the assumption that most disease-causing variation will be captured through LD with common tagSNPs. Recent concern that low-frequency host variation contributes substantially to disease causation may underscore the need for large-scale gene sequencing rather than tagSNP analysis. Our earlier findings, which demonstrated that gene sequencing can be performed with high fidelity using DNA from these nDBS samples, suggest that large-scale sequencing for initial variation discovery could be useful in future replication studies.
Our findings are not corrected for multiple comparisons and should therefore be viewed as preliminary; definitive proof will require replication in additional cohorts. However, given our study results, the wealth of existing public health surveillance data, and the existing large repositories of nDBS, replication studies powered to detect associations with IPD and with invasive disease caused by other vaccine-preventable encapsulated bacteria (e.g., N. meningitidis and H. influenzae) should be feasible. Such studies could help further define host genetic risk factors for IPD and other infectious diseases, and could permit economic and attributable risk analyses to determine the usefulness of such risk factors for implementing public health prevention interventions.