Genomics is now a core element in the effort to develop a vaccine against HIV-1. Thanks to unprecedented progress in high-throughput genotyping and sequencing, in knowledge about genetic variation in humans, and in evolutionary genomics, it is finally possible to systematically search the genome for common genetic variants that influence the human response to HIV-1. The identification of such variants would help to determine which aspects of the response to the virus are the most promising targets for intervention. However, a key obstacle to progress remains the scarcity of appropriate human cohorts available for genomic research.
Despite repeated exposures, some individuals do not seem to become infected with HIV-1, and among those that do, there is marked variation in how the virus is handled and in the time-course of progression to AIDS. It is known that host genetic differences contribute to this variation, but our knowledge of the relevant host genetic factors is currently limited for two main reasons: first, many studies have suffered from suboptimal study design, which is a common theme in the genetic-association literature1; second, a limited number of host genes have been studied so far, and these studies have focused heavily on candidate genes of acquired immunity and on proteins implicated in molecular studies of HIV-1 cell entry and proliferation2,3 (Table 1).
The candidate-gene approach is based on a priori knowledge of the role or potential role of a gene in HIV pathogenesis. On selection of the candidate gene, the corresponding genomic region can be genotyped at known polymorphic positions, or re-sequenced with the purpose of identifying unknown variants. Association analysis can address the individual contributions of any single nucleotide polymorphism (SNP), or of a series of linked SNPs represented by a haplotype, to a study phenotype (Box 1). Statistical analysis needs to take into account issues such as multiple testing, which leads to a growing number of false-positive results as the number of SNPs, alleles, study endpoints, phenotypes or subgroups increases. Selected candidate SNPs are then followed up with dedicated genetic, functional and biological testing to establish causality.
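The multiple-testing problem mentioned above can be made concrete with a minimal sketch. It assumes the simplest possible correction (Bonferroni), which is only one of several options and is conservative when tests are correlated, as linked SNPs are. The function names and the study dimensions (20 SNPs, 3 endpoints, 2 subgroups) are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every null hypothesis is true
    and each test is judged at the uncorrected level alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha * n_tests


if __name__ == "__main__":
    # Hypothetical candidate-gene study: every SNP is tested against every
    # endpoint in every subgroup, so the number of tests multiplies quickly.
    n_tests = 20 * 3 * 2
    print("tests performed:", n_tests)
    print("expected false positives at 0.05:", expected_false_positives(0.05, n_tests))
    print("Bonferroni per-test threshold:", bonferroni_threshold(0.05, n_tests))
```

The point of the sketch is only that adding endpoints or subgroups multiplies the number of tests, so the per-test threshold has to shrink accordingly.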
The HapMap project (the International Haplotype Mapping Project) provides an invaluable resource for researchers interested in relating human genetic variation to human health by allowing the selection of polymorphisms that represent other common polymorphisms in the human genome. These polymorphisms are called tagging single-nucleotide polymorphisms (tag SNPs) because they ‘tag’ other polymorphisms, which then do not need to be genotyped (see figure). Increasingly, this is being done on a genome-wide scale using standardized sets of tagging SNPs. The aim is to ensure that all the SNPs in the genome are represented by at least one SNP that is included on the analysis chip. This aim has nearly been achieved in current commercially available platforms for whole-genome association studies, allowing interrogation of most common human polymorphisms either directly or indirectly through association with tag-SNP candidates. As shown in panel a of the figure, genotype data (for a gene or a region of interest) are freely available from the HapMap website. The associations among SNPs in the data are then assessed to select tag SNPs. In panel b of the figure, SNPs (coloured stars) that associate closely with each other have the same colour. The overall set of SNPs can be represented by a subset of tag SNPs, depicted in the figure as one star of each colour. Panel c of the figure shows subsequent genetic association analysis. The tag SNPs are genotyped in a population sample in which individuals vary for the trait of interest, for example, susceptibility to HIV-1. A tag SNP that correlates with resistance to infection indicates that one of the known or unknown SNPs with which it is associated (black lollipops) might influence this phenotype. In this case, a non-genotyped SNP might be responsible for the association of the blue and green tag SNPs with resistance to HIV-1, shown by the corresponding highly significant P values. From the set of SNPs that were not genotyped as tags, those in exons 2 and 3 are good candidates (coloured arrows depict a high linkage disequilibrium). Text and figure modified with permission from Nature Ref. 13 (2005) Macmillan Publishers Ltd.
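The tagging idea described in this box can be sketched as a small greedy selection. This is an illustrative heuristic under stated assumptions, not the procedure used by HapMap or by any commercial chip: pairwise r² values are assumed to be already available, the 0.8 cut-off is an assumed convention, and the SNP identifiers and r² values below are invented.

```python
from typing import Dict, FrozenSet, List, Set


def select_tag_snps(snps: List[str],
                    r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """Greedily choose tag SNPs so that every SNP is either a tag or is
    associated with some tag at r^2 >= threshold (assumed cut-off)."""

    def tagged_by(snp: str) -> Set[str]:
        # A SNP always represents itself; missing pairs count as r^2 = 0.
        return {other for other in snps
                if other == snp
                or r2.get(frozenset((snp, other)), 0.0) >= threshold}

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Pick the SNP that represents the most still-untagged SNPs.
        # max() keeps the first maximum, so ties follow the input order.
        best = max(snps, key=lambda s: len(tagged_by(s) & untagged))
        tags.append(best)
        untagged -= tagged_by(best)
    return tags


if __name__ == "__main__":
    # Invented region with two blocks of closely associated SNPs.
    snps = ["snp_a", "snp_b", "snp_c", "snp_d", "snp_e"]
    r2 = {
        frozenset(("snp_a", "snp_b")): 0.95,
        frozenset(("snp_a", "snp_c")): 0.85,
        frozenset(("snp_b", "snp_c")): 0.90,
        frozenset(("snp_d", "snp_e")): 0.90,
    }
    print(select_tag_snps(snps, r2))
```

In this toy region, one SNP from each block is enough to represent all five, which mirrors the "one star of each colour" picture in the box.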
Despite important efforts by various research groups, the candidate-gene approach has yielded limited results so far (Table 1). Only a fraction of the observed variability in the course of HIV-1 infection is explained by current knowledge2,4. The reasons for the limited success of this otherwise intuitively attractive hypothesis-driven approach are manifold, and include a limited knowledge of the attributes of candidate genes, lack of statistical power to detect small effects, incomplete knowledge of the genetic variation of the region, the inherent complexity of establishing the functionality of particular alleles and, until recently, the absence of efficient genotyping technologies. However, an optimist might point out that although only a modest number of genes have been assessed, and incompletely at that, there have nevertheless been unambiguous discoveries of important gene variants. Notable among these are the highly protective CCR5 Δ32 allele and the crucial role of selected human leukocyte antigen (HLA) alleles in disease progression.
The impact of HIV-1 infection on human health, the rapidity of expansion of the global HIV-1 epidemic, and the limitations of the candidate-gene approach argue for an immediate adoption of genome-wide association approaches. Here, genetic analyses assess the whole genome, or large genomic regions, with the goal of identifying genes associated with susceptibility to HIV-1, even in the absence of a priori knowledge of the most important genes. In this Perspective article, we discuss the new opportunities in HIV-1 research that have arisen through the development of high-throughput systems for SNP genotyping using information from the International Haplotype Mapping Project (or HapMap), and the use of comparative genomics to identify regions that are involved in host–pathogen genetic conflicts. Success in HIV-1 host genomic studies will depend on several key steps, notably, the identification of appropriate study phenotypes and the availability of large human cohorts for genetic studies5. The potential to harness genomic information for vaccine design will rest on the nature of the genes and alleles identified from the expected plethora of information emanating from host genomic projects.
The heritable nature of susceptibility to HIV-1 can be observed in the familial segregation of a phenotype of resistance to infection, in studies of infected twins, and at the cellular level6–9. A significant heritable component of a phenotype constitutes the basis for genome analysis. Whole-genome studies can use data from families or from independent individuals; analysis of families can track the transmission of susceptibility alleles through generations. This approach, known as linkage analysis, is widely used in the mapping of monogenic disorders, and uses a few hundred polymorphic markers throughout the genome to identify the chromosomal region involved (for example, in rare familial syndromes of susceptibility to infectious diseases)10. However, family approaches are not feasible in the field of HIV-1 because of the limited familial nature of HIV-1 exposure (except in the setting of mother-to-child transmission). It is also known that family-based approaches have limited power to detect gene variants that influence complex traits11. Therefore, studies of susceptibility loci in HIV-1 infection rely on population-based cohorts, using a denser map of genetic markers (hundreds of thousands) to perform association studies between markers (such as SNPs) and the phenotype of interest. Genome approaches can also use data from other species, in particular from primates that show differences in the control of HIV-1 or of other lentiviruses, especially the simian immunodeficiency virus (SIV). These approaches are discussed further below.
Over the past few years there has been a remarkable increase in our knowledge of the variation in the human genome and, today, most common human SNPs are known. In a particularly important effort, the HapMap (Box 1) identified the minimum number of SNPs that are needed to represent common human diversity. For this, the HapMap characterized patterns of association among different gene variants — the patterns of linkage disequilibrium across the genome — to select a minimal set of variants that are sufficient to represent common human variation in the context of an association study. These selected SNPs are referred to as tagging (tag) SNPs12,13 (Box 1).
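Linkage disequilibrium between two SNPs is commonly summarized by the squared correlation r². The following minimal sketch computes it from phased haplotypes; the 0/1 allele coding, the function name and the toy haplotypes are assumptions made for illustration, and real data would first require phasing and quality control.

```python
from typing import List, Tuple


def ld_r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs.

    Each haplotype is a pair (allele at SNP 1, allele at SNP 2),
    with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Toy data: alleles always travel together, so one SNP fully predicts the other.
    perfectly_linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
    # Toy data: all four allele combinations are equally common.
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(ld_r_squared(perfectly_linked))
    print(ld_r_squared(unlinked))
```

When r² between two SNPs is close to 1, genotyping one of them carries nearly all the information about the other, which is what makes a tagging strategy possible.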
In parallel with this increased knowledge, genotyping technologies have advanced significantly and now allow sufficient throughput to accommodate genome-wide approaches14. There is now a set of commercial whole-genome arrays using tag SNPs that represent variation in the human genome in non-African populations15,16. This means that the association of common variation with HIV-1 susceptibility can now in principle be satisfactorily captured by using sample sizes that are large enough to detect the effect of gene variants of modest impact. But key constraints remain. For example, it is known that rare variants will not be well represented in tagging strategies17. In addition, and of particular concern in the HIV-1 host genetics field, the current platforms for whole-genome analysis are known to represent variation in Africa inadequately. This asymmetry is both unfair and a constraint on HIV-1 genomics research and will need to be addressed directly as a priority. Not only is HIV-1 infection extremely prevalent in many parts of Africa, but it is also probable that gene variants relevant to HIV-1 are present in Africa at allele frequencies that differ from those in other populations. This is because of general patterns of population differentiation and possibly also because of recent selection16. This means that some genetic variants might be easier to find in some African populations in comparison with non-African populations and vice versa.
Fig. 1 shows the design of a whole-genome association analysis of phenotypes of susceptibility to HIV-1, using a tagging strategy based on HapMap data to represent variation in the human genome. The design must take account of the known problems in the field. At the stage of study design, these problems include the undefined influence of the environment, and the difficulty of controlling for stratification, in particular linked to population or ethnic substructuring of sample sets, which is often also confounded by socioeconomic factors that influence infection and host response18. In addition, these studies present difficulties in defining and achieving adequate power and in establishing the computational and statistical analytical routines that are necessary to distinguish true-positive from false-positive discoveries and, more generally, to prioritize candidate genes to be investigated using genetic and functional approaches.
The availability of commercial products for whole-genome association analyses means that the use of the HapMap and other resources for genetic-association analyses has shifted from the upstream tasks of selecting which SNPs to genotype to the downstream tasks of interpreting the observed genotype–phenotype associations, and prioritizing the associations on the basis of their likelihood of being real rather than chance associations. However, this transition has occurred more rapidly than most researchers seem to have expected, and there are few tools available that allow ready interpretation of the results of whole-genome association studies. Considerable development is now needed to establish a framework for post-association annotation that will allow convenient high-throughput assessment of all those SNPs that show suggestive association with target phenotypes. Because real causal variants are often likely to show associations of a magnitude similar to those that arise purely by chance (because of the many polymorphisms that have been assessed), the degree of association for each of the SNPs must be assessed relative to features such as proximity to exons, location of conserved stretches of sequence and genomic regions with predicted regulatory function. The HapMap project will be a key tool in interpreting association results by allowing researchers to assess which polymorphisms are associated with the study phenotype (Box 1).
Ge and Goldstein have recently developed one tool for this purpose (see the Duke Institute for Genome Sciences and Policy in Further information). It uses available databases to draw the predicted gene structure, including sites of alternative splicing, and puts all the SNPs and their associated significance values onto this gene structure. Patterns of linkage disequilibrium throughout the gene are also indicated using HapMap data. These gene diagrams can then be used to identify the associations that seem most likely to be biologically relevant and can also be used to determine which genomic regions need to be resequenced to identify the causal polymorphisms that might be responsible for the association.
Just as humans differ in how well they handle HIV-1, non-human primate species differ in their handling of related retroviruses19. These differences include: resistance to infection (for example, differences in the susceptibility of primate cells to various lentiviruses)20,21; the satisfactory control of viral replication when infected (for example, in most experimentally infected chimpanzees, HIV-1 replicates poorly)22; and the occurrence of infections that are characterized by high-level replication without the hallmarks of disease progression (for example, in sooty mangabey and African green monkeys)23,24. One key observation is that naturally infected African primates do not develop immunodeficiency and have lower levels of T-cell immune activation and activation-induced cell death than HIV-1-infected humans. Recently, Schindler et al. reported that differences in the accessory viral protein Nef among primate lentiviruses (HIV compared with SIV) could explain the patterns of immune activation25. Nef alleles of HIV-1 fail to downregulate the T-cell receptor CD3 from infected cells thereby maintaining the responsiveness of infected T cells to activation25, in contrast to SIVs that downregulate CD3. However, there are several apparent exceptions to this view, and resolution of the role of Nef in immune activation awaits further investigation26.
Just as comparing the genomes of the various lentiviruses might explain important differences in immune pathogenesis between humans and non-human primates, exploring differences in the host genome can provide crucial insights into defence against retroviruses. Evolutionary genomic approaches have been proposed as powerful tools to identify regions in host proteins that are relevant for host–pathogen interactions27. These methods identify signs of positive selection or negative selection in the genome. A recent report from Voight et al. presents a map of recent positive selection in the human genome that reflects the tremendous shifts experienced by modern human populations in habitats, food sources and population densities28. These signatures of selection are likely to be valuable signposts for gene variants that might influence medically relevant traits, including the susceptibility to infectious diseases, clearly one of the key selection pressures in human history. The availability of the complete chimpanzee and rhesus macaque genomes, and the ease of cross-species sequence amplification owing to the high degree of sequence conservation among humans and non-human primates, allow comparative host genome analyses.
The first applications of evolutionary genomics to the field of HIV-1 have shown remarkable success. Comparative analysis of the primate antiretroviral cellular defence genes that encode APOBEC3G and TRIM5α (discussed below) has revealed the powerful selective pressures that have emerged from a long-standing battle between retroviruses and their hosts29–31. These proteins belong to a newly described form of innate immunity, coined ‘intrinsic immunity’, that assures protection by providing an ‘always-on’ line of defence, generally through intracellular obstacles to the replication of pathogens32. This component of the immune system is a cornerstone of the resistance of mammals against several classes of retroelements and retroviruses32.
Primate APOBEC3B, 3C, 3F and 3G have antiretroviral activity associated with the hypermutation of viral DNA through cytidine deamination (for recent reviews see Refs 21,33). However, the best studied member in humans, APOBEC3G, fails to restrict HIV-1 owing to degradation induced by the HIV-1 accessory protein virion infectivity factor (Vif)34. By contrast, several primate APOBEC3G proteins show activity against HIV-1 (Refs 21,35,36). Analysis of APOBEC3G across primate species reveals many residues in the amino-terminal cytidine deaminase domain that are under positive selection, which coincides with the proposed region of interaction with Vif (Fig. 2). Analyses have also shown that amino-acid residue 128 is under positive selection, which fits with evidence that this amino acid discriminates among the various HIV and SIV Vif proteins35,36.
The tripartite motif (TRIM) family is a large family of proteins that are characterized by a structure comprising a RING domain, one or two B-box domains and a predicted coiled-coil region37. In addition, TRIM proteins have specialized carboxy-terminal domains38,39. Some TRIM proteins display antiviral properties that target retroviruses in particular20. The best studied antiviral TRIM protein, TRIM5α, is a retroviral restriction factor that targets the early steps of cellular infection20; TRIM5α specifically recognizes the viral capsid and promotes its premature disassembly40. Human TRIM5α has limited efficacy against HIV-1, whereas some primate TRIM5α proteins can potently restrict HIV-1 (for reviews see Refs 38,39). Analysis of the TRIM5α protein across primate species pinpoints a patch of amino acids that is under positive selective pressure in the variable 1 (v1) and variable 2 (v2) regions (Fig. 2). Chimeras of human TRIM5α that carry the v1 patch from other primates, or selected mutants in the variable regions, can restrict HIV-1 (Refs 41–44). The variable regions of TRIM5α might have evolved independently to recognize various retroviruses44; however, in the absence of a crystal structure of the molecule, the exact molecular mechanisms that underlie the evolutionary changes remain undefined.
Comparative and evolutionary genomic methods have been applied to the understanding of other components of innate immunity. These include the analysis of the vertebrate Toll-like receptor proteins (an example of evolutionary conservation at multiple levels)45, the DC-SIGN (CD209) family of C-type lectins46,47 and the KIR genes that encode the main functional receptors of natural killer (NK) cells in humans, in whom the evolutionary forces driving the genesis of NK receptors and their HLA ligands represent a concerted response to pathogens48. Signals of evolutionary selection have also been searched for in 168 genes that are related to immune function49.
One should expect evolutionary analysis to become routine in the initial assessment of proteins that are involved in the pathogenesis of HIV-1 and SIV infection and of genes identified through whole-genome association analysis. The identification of signs of positive selection and of patches of genetic conflict should be a criterion for biological analysis. Therefore, there is great interest in promoting the sequencing of whole genomes from many more primates. Importantly, this effort should include our most distant relatives, the New World monkeys (such as tamarins), and prosimians (such as lemurs), to provide the needed outgroup and reference to estimate the evolutionary history of the proteins. However, as indicated in a recent review50, the flood of data and analytical methods in evolutionary and comparative genomics raises many new challenges, in particular the inherent multiple-comparisons problem in searching the entire genome for specific regions that show evidence of selection. This creates a large ‘opportunity space’ for finding regions that show unusual patterns of variation in populations, or divergence among species, and the appropriate methodologies to correct for the number of possible tests that could be constructed are not obvious. Therefore, there is a need for the clear demonstration of the usefulness of a series of new dedicated statistical tests, more rigorous demonstration of evidence for natural selection and the inclusion of functional evidence for candidate loci50.
Most genetic studies so far have focused on endpoints, such as ‘time to AIDS or death’, that represent complex phenotypes resulting from many potential influences. Studies have also compared infected individuals with individuals who remain HIV-1 free despite repeated exposure; however, there is often limited accounting for the degree of exposure51. Least satisfactory of all, many studies have compared the frequencies of certain alleles in the infected population with the frequency in the general population, which is an insensitive measure of the enrichment or depletion of a genetic variant that is putatively associated with susceptibility to infection. Modest genetic influences on a disease will be hard to trace if masked by the influences of environment or concurrent disease. Therefore, there is a pressing need for study phenotypes that can be measured precisely, that are least influenced by external factors and that best reflect the basic mechanisms of pathogenesis.
The most useful insights will come from studies that focus on specific aspects of the response to HIV-1 that are sufficiently precise and narrow to guide our understanding of how the immune system controls or fails to control the virus. The overall pattern of viral replication following infection is one of an initial rapid increase to a peak level followed by the establishment of a viral load set point (at which the level of viraemia remains constant), which persists for an extended period, often many years (Fig. 3). Little is known about what determines peak viral load or exactly how it relates to disease progression in humans. In the simian HIV-1 infection model, there is a strong correlation between peak viral load and the extent of CD4+ T-cell depletion during acute infection52. Up to 80% of CD4+ T cells are infected at peak viraemia and the proportion of CD4+ T cells that are destroyed is correlated with the peak viral load52. The simple relationship between viral load and CD4+ T-cell depletion could allow prediction of the level of viral control that is required to prevent early immune damage. However, it is difficult to capture the unique window of peak viraemia in human cohorts as it requires a substantial effort in HIV-1 surveillance53.
Similarly, the level of viraemia at set point is a determinant of the natural history of SIV infection54 and of the long-term prognosis of HIV-1 infection in humans55. The viral load set point is a particularly attractive target as a study phenotype. First, it is known that individuals vary by several orders of magnitude in the amount of virus per ml of blood at set point8, as illustrated in Fig. 3. For instance, there are rare individuals whose set point is at a level of virus that is essentially not detectable56. Second, the viral load set point is a characteristic of the individual. So far, there have been no large-scale genomic studies to determine the source of this variation. The identification of gene variants that are associated with the variation in viral load set point could implicate particular aspects of immune control. Peak and set-point viraemia are also of direct interest because they predict the degree of infectiousness of the individual57.
The fact that the viral load set point is established so early in the course of HIV-1 infection and that it seems to be particularly resistant to external influences over prolonged periods suggests that the environment might be only a small contributor. Therefore, the viral load set point seems to be determined mainly by two variables: influences from the host genome and the viral strain genome. Of course, it cannot be ruled out that early environmental factors in an individual's life affect his or her immune system in such a way as to influence the set point that will be established on infection.
Another phenotype of significant interest for genetic studies is the status of exposed non-infected individuals. Indeed, the first identified gene variant that influenced susceptibility to HIV-1, CCR5 Δ32, was discovered using this framework. Several studies have examined heterosexual couples that are discordant for HIV-1 serostatus, female sex workers and men having sex with men who are highly exposed to HIV-1. The mechanisms identified or invoked to modulate susceptibility to infection in the various studies emphasize the relevance of differences in acquired immunity through the role of protective cytotoxic T-cell responses and NK activity in the context of specific HLA class I alleles, as well as differences in humoral responses at mucosal surfaces58,59. A large study is needed to compare uninfected individuals with known and quantifiable exposure to HIV-1 with a large cohort of infected individuals who are carefully matched demographically.
The importance of reliable study phenotypes is intimately linked to the creation of appropriate cohorts for genetic analysis. Several recommendations for optimal design of genetic-association studies in clinical trials and cohorts are presented elsewhere3,60–62. Examples of initiatives that have integrated genetic data for future clinical research are: the Adult AIDS Clinical Trials Group (AACTG) Protocol A5128 (Ref. 63) and the National Institute of Allergy and Infectious Diseases (NIAID)-sponsored GENOMICS protocol. Both protocols establish the conditions for storing DNA for studies that were not planned when informed consent was provided, and for future analyses.
The genome of HIV-1 co-evolves with that of the host. This is more pronounced for HIV-1 than for many other human pathogens because of the recognized capacity of retroviruses to mutate and thereby escape from the immune response and to adapt to the host environment. The need for continuous evolution for both the host and the pathogen is illustrated by the Red Queen principle.
At a population level, this host–viral interaction can be detected in several ways. For example, major histocompatibility complex (MHC)-restricted immune responses might shape viral genetic diversity over time because the immune selective pressure forces the emergence of viruses with escape mutations that result in infected cells that are no longer recognized by cytotoxic T lymphocytes64–67. However, certain host genotypes (for example, HLA-B*57) have reproducible associations with successful control of HIV-1 viral load68, implying that these genotypes exert selection pressures that are difficult for diverse viral populations to evade. The genome of HIV-1 can also be investigated using evolutionary genomics tools to identify signals of recent positive selection69,70. In the future, genomic approaches might need to simultaneously address the genome of both the host and the pathogen.
Although here we focus on the host genome, it is important to consider viral strain diversity and fitness as an equally important factor in HIV-1 susceptibility and pathogenesis71,72. The hypervariable nature of retroviruses and the simultaneous presence of quasispecies in any given patient make the goal of controlling for viral diversity a challenging one. Biological validation of newly identified host factors will, as a first step, require the use of a limited set of laboratory or clinical isolates and, increasingly, of data derived from studies using SIV.
Genomics provides many new possibilities for vaccine research. First, previous unsuccessful vaccine trials can be investigated to identify genetic variants that influenced the magnitude of an evoked immune response and thereby highlight reasons for their overall failure. For example, the failed VAXGEN trial included 5,403 HIV-negative volunteers in a randomized, placebo-controlled trial of a recombinant glycoprotein 120 vaccine73. There were 368 subjects who acquired infection after vaccination: 6.7% in the vaccine arm of the trial and 7.0% in the placebo arm. The course of HIV-1 infection was comparable between the two groups. However, the VAXGEN trial reported that titres of neutralizing antibodies in vaccine recipients varied considerably among participants, therefore constituting a potential target for genome analysis74. So, host genetic analyses allow revisiting of past vaccine trials for analysis of a new order of biological endpoints and basic questions. Whole-genome analyses can provide a unique description of how host genetic variation influences the early stages of HIV-1 infection, the exposed and uninfected state, the generation of anti-HIV-1 neutralizing antibodies and the breadth of cytotoxic T-lymphocyte responses. In addition, comparative and evolutionary genomics can complement the analysis by pointing to species-specific aspects of disease susceptibility, in particular through the standardized analysis of the mechanisms of innate immunity and intrinsic cellular defence.
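The infection rates quoted above for the two arms of the VAXGEN trial can be turned into a crude efficacy estimate. The sketch below assumes the standard definition of vaccine efficacy as one minus the ratio of attack rates, and it uses only the rounded percentages given in the text, so it yields a rough point estimate with no confidence interval. The function name is hypothetical.

```python
def vaccine_efficacy(attack_rate_vaccine: float, attack_rate_placebo: float) -> float:
    """Crude vaccine efficacy: 1 - (attack rate in vaccinees / attack rate in placebo)."""
    if attack_rate_placebo <= 0:
        raise ValueError("placebo attack rate must be positive")
    return 1.0 - attack_rate_vaccine / attack_rate_placebo


if __name__ == "__main__":
    # Rounded infection rates from the text: 6.7% (vaccine arm), 7.0% (placebo arm).
    ve = vaccine_efficacy(0.067, 0.070)
    print(f"crude efficacy estimate: {ve:.1%}")
```

An estimate of roughly 4% from rates this close together is consistent with the description of the trial as failed. A formal analysis would work from the per-arm counts and report an interval rather than a single number.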
Some of the challenges and opportunities discussed above are being pursued as part of the recently funded Center for HIV/AIDS Vaccine Immunology (CHAVI) (Box 2). The host genetics team at CHAVI is attempting a detailed genetic investigation of the control of the earlier phases of HIV infection, focusing both on viral load set point and, more ambitiously, on the control of viral build-up during the acute phase of infection. A key feature of the CHAVI programme is to use a common genetic platform across multiple cohorts allowing both replication of associations and a careful assessment of how gene variants function at different points during infection and subsequent viral dynamics. In addition to the CHAVI genomics project, another initiative, the HIV Elite Controller Study, will apply genomic techniques to the investigation of people infected with HIV-1 who have been able to maintain viral loads at or below the limits of detection. This collaborative study will address the key viral, host genetic and immunological contributions to this extraordinary outcome of infection. However, data analysis remains a significant problem: developing analytical routines that are able to deal with the unprecedented quantity of genomic information will pose a considerable challenge to the statistical genetics community which, until recently, has been only modestly involved in HIV-1 host genetics studies75.
The Center for HIV/AIDS Vaccine Immunology (CHAVI; see Further information) is a significant component of the Global HIV Vaccine Enterprise77. Based at Duke University in Durham, USA, it includes investigators from institutions across the globe. CHAVI has included genomics as a core project in the quest for a vaccine against HIV-1. The genome initiative includes the establishment of a series of cohorts with appropriate phenotypes. The first target phenotype is the viral load at set point in individuals with a known date of seroconversion. There is the potential to study between 1,000 and 2,000 qualifying patients across different cohorts. CHAVI will progressively focus on the study of at least 2,000 exposed individuals, both infected and non-infected, from several clinical sites in Africa. This study aims to identify genetic determinants of protection from infection. Genotyping will be done using chips designed explicitly for whole-genome association studies. These chips allow genotyping of polymorphisms that represent common variation in the populations studied in the HapMap project: 550,000 single nucleotide polymorphisms (SNPs) for the study of subjects of European ancestry, and approximately 650,000 SNPs in subjects of African ancestry to reflect the lower level of linkage disequilibrium in Africa.
It is both a scientific and a social priority to apply modern and powerful genomic analyses to the study of HIV-1 infection and to aid the understanding of other important human pathogens, such as malaria and tuberculosis. The population geneticist Andrew G. Clark compared these new ‘discovery sciences’ to the voyage of the HMS Beagle “…setting sail to who knows where, amassing genome data on our hard drives and pawing through it to discover things that have not been seen before.”76
Paradoxically, as technical (large-scale genotyping) and analytical issues (genetic statistics) are progressively solved, the main challenge is posed by the quality of cohorts. Despite 2006 being the 25th anniversary of the first reported case of AIDS, appropriate cohorts remain surprisingly undeveloped: there are no large acute-infection cohorts and only moderately sized seroconversion cohorts (limiting replication of data), and there is an overall lack of preparedness for genetic work (ethical and legal clearance, and appropriate informed consent for genetic studies). We should emphasize that it is not technology, but cohorts, that constitute the key limiting factor today. We believe that the solution to this problem lies in multinational collaborations to establish and pool cohorts to synergize with efforts such as the Global HIV Vaccine Enterprise that is being spearheaded by the NIH and the Bill and Melinda Gates Foundation.
We thank B. Haynes, K. Shianna, S. Antonarakis and J. Beckmann for helpful comments, M. Ortiz for assistance with Fig. 2, and B. Ledergerber (Swiss HIV Cohort) for data for Fig. 3. Funding for our work is provided by the Swiss National Science Foundation, the Center for HIV/AIDS Vaccine Immunology and the National Institutes of Health.
Competing interests statement
The authors declare no competing financial interests.
Amalio Telenti is at the Institute of Microbiology, University Hospital, University of Lausanne, 1011 Lausanne, Switzerland.
David B. Goldstein is at the Institute for Genome Sciences and Policy, Center for Population Genomics and Pharmacogenetics, and the Center for HIV/AIDS Vaccine Immunology, Duke University, Durham, North Carolina 27708, USA.