The allele frequency net website is divided into four main sections: HLA, KIR, MIC and cytokine frequencies. Each section offers different querying tools depending on the data available for each polymorphic region (i.e. haplotype, genotype or allele frequency), breakdowns that summarize the existing data for each polymorphism, and other online tools such as searches for rare HLA alleles and for the frequency of particular amino acids at a given position in a population. Each search is accompanied by a set of instructions on how to perform the query. The AFND is regularly updated to include new user submissions and information from relevant peer-reviewed publications. Additionally, because of the constant increase in the number of alleles identified by molecular methods, the database is periodically updated according to the official nomenclature from the latest releases available on the IMGT/HLA and IPD-KIR databases (12). At present, alleles on the website have been updated to follow the most recent nomenclature guidelines for allele designations (14).
Population data sets
The collection of populations available on the AFND consists of 1133 population samples from 608 813 healthy unrelated individuals. The compilation comprises more than 100 000 records at allele, haplotype and genotype level, within which HLA comprises 90% of the allele frequency entries. These populations are divided into 786 HLA populations, 181 KIR populations, 110 cytokine populations and 56 MIC populations. Populations available on the database are mainly derived from peer-reviewed publications or from direct submissions to the website from individual laboratories. Our aim is to capture all previously published studies (from 1990 to the present) and we believe that the vast majority of published data sets have been included in AFND. To date, this includes publications from more than 65 journals (a complete list of data sets and journals can be consulted via http://www.allelefrequencies.net/datasets.asp). Depending on the interest of the user, the data may be searched according to the source (e.g. published or sent directly; anthropology studies, etc.). The bibliographic reference for each study is provided so that a user may verify what type of analysis the author has used for calculating frequencies. Frequency data submitted directly by an individual has typically been obtained by direct counting. When data has been published or sent directly to the AFND in which the author has not been able to differentiate some alleles (i.e. ambiguous data), a note in the publication details describes how the frequency data was entered for one of the ambiguous alleles, for example, ‘unable to differentiate alleles that are identical over exons 2 and 3 (Class I) or identical over exon 2 (Class II)’, in which case frequencies are reported for the first allele only.
Frequency data sets by polymorphic region at AFND
Submission of data
One of the most important objectives in the design of the website was to provide users with an online submission form through which they can incorporate their own studies and which ensures consistency of demographic data. To do this, submitters are assisted during the submission process with drop-down boxes for basic demographic information, such as a descriptive name of the population (country name, geographic region and ethnic origin), sample size, polymorphic region, latitude and longitude coordinates (if known), family background, typing methods and literature references if the study has been published. If a publication uses an ethnicity code that is not included in the drop-down box, the ethnicity given by the submitter is added to the current list. As such, a list of ethnicity codes is maintained in AFND to standardize reporting, although as yet we do not map these to any wider community controlled vocabulary (see ‘Discussion’ section). Submitters are requested to input the corresponding frequencies through an online web form or by providing a pre-formatted spreadsheet containing frequency results. User submissions then undergo a data validation procedure performed by a group of AFND curators. The validations include selecting an appropriate population name that best describes its origin, i.e. name of country followed by region and ethnicity if known. If a population submitted by a second group is geographically and ethnically similar to an existing population in the database, a consecutive number is assigned to that population to differentiate the two data sets and to allow them to be compared (e.g. China Guangzhou Han, China Guangzhou Han pop 2). The system has therefore been designed to check for duplications. Other controls include verifying that the correct and current nomenclature has been used for an allele and, if not, updating the allele name.
The database contains the current definition of alleles; thus, data entered directly on the website will contain correct allele names. If necessary, the author of the data is contacted with any query or any change made by curators. For frequency data, values are summed, and for any sum greater or less than 1 the author is contacted. If the frequencies sum to >1 and this cannot be explained, the submission is rejected. Frequencies which sum to <1 are kept in the database. Unfortunately, published data is not always correct, and on many occasions the editors of the journals concerned are contacted to discuss these issues. Whilst the AFND cannot assess the typing accuracy of data provided, >90% of the data on the website has been peer-reviewed and published. Thus, the AFND relies on the accuracy of data being verified by the reviewers of the journals and acts mainly as a source for compiling data. It is our intention in the future to collect the raw data so that we can be more proactive in assessing the quality of data.
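The summation check on submitted frequencies can be illustrated with a short sketch. This is not the curators' actual procedure: the function name `check_frequency_sum`, the rounding tolerance and the returned labels are assumptions introduced only for this example.

```python
import math


def check_frequency_sum(frequencies, tolerance=1e-3):
    """Classify a submitted set of allele frequencies by their sum.

    `frequencies` maps allele names to frequencies for one locus.
    `tolerance` is a hypothetical allowance for rounding in published tables.
    """
    total = math.fsum(frequencies.values())
    if total > 1 + tolerance:
        # Author is contacted; the submission is rejected if unexplained.
        return "above_one"
    if total < 1 - tolerance:
        # Author is contacted; the data is still kept in the database.
        return "below_one"
    return "ok"
```

A sum below 1 typically arises when some alleles were left unreported, which is why such data sets can be kept, whereas a sum above 1 indicates an error in the reported values.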
Allele frequency searches
The most commonly used tool within AFND is the allele frequency search (AFS), with which users can examine the frequency of a particular allele in the existing population data sets by filtering results with a set of criteria. The AFS is available for all polymorphisms on the website. To perform the search, users usually start by selecting a locus and a particular allele to identify which populations are more likely to present the allele. To extend the search criteria, users can select one, several or all populations, a set or range of alleles, country, geographic region, ethnicity and/or the year in which data was submitted. In HLA, MIC and KIR polymorphisms, alleles can be typed at different levels of resolution (i.e. allele group, specific HLA protein, synonymous allele with a substitution within the coding region and differences in a non-coding region, in that order, e.g. HLA-A*01:01:01:01) (14). The official nomenclature available on the IMGT/HLA and IPD-KIR databases describes alleles only at the highest resolution. To ensure that high resolution data can be retrieved when a low resolution allele is selected, the search uses parsing methods to display all information that may be relevant to the user. For instance, a search for the HLA-A*02:01 allele will also display occurrences of alleles at high resolution that start with HLA-A*02:01. Additionally, users can further refine their queries by selecting populations with a sample size within a range of values and/or a specific level of resolution. Populations from recent years are more likely to contain alleles typed at a high resolution level and thus more accurate data. Furthermore, recent additions include filters to search on a specific source of data set and type of study, for example populations available in the literature oriented to anthropology studies. Results displayed in the search include the allele name, the name of the population, the allele and/or phenotype frequency and the sample size of the population, from which the number of individuals who carry the allele can be estimated. By clicking on the ‘Population Name’ hyperlink users can access demographic details of the population in which the allele is present. The list of output records can be sorted by allele or population and by frequency from highest to lowest value. Haplotype associations and graphical distributions overlaid on world maps are also among the recent options added for each record.
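The resolution-aware matching described above can be sketched as a field-by-field prefix comparison. This is only an illustration under stated assumptions, not the parsing method used by the AFND: the function names, the record layout and the example populations and frequencies are all hypothetical. Comparing colon-separated fields rather than raw strings prevents a query such as A*02:10 from also matching A*02:101.

```python
def matches_at_resolution(query, allele):
    """Return True if `allele` equals `query` or refines it with extra fields."""
    query_fields = query.split(":")
    allele_fields = allele.split(":")
    return allele_fields[:len(query_fields)] == query_fields


def search_allele(records, query):
    """Return matching (allele, population, frequency) records, highest frequency first."""
    hits = [r for r in records if matches_at_resolution(query, r[0])]
    return sorted(hits, key=lambda r: r[2], reverse=True)


# Made-up records for illustration only; not taken from the database.
example_records = [
    ("A*02:01", "Population A", 0.27),
    ("A*02:01:01:01", "Population B", 0.31),
    ("A*02:101", "Population C", 0.01),
    ("A*01:01", "Population A", 0.15),
]

for allele, population, frequency in search_allele(example_records, "A*02:01"):
    print(allele, population, frequency)
```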
Figure 1. Screen shot of the HLA AFS. The figure shows an example of a search of the HLA-A*02:01 allele sorted by highest to lowest frequencies. Other data provided includes a link to the IMGT/HLA database for sequence information of the allele, link to frequency ...
Haplotype frequency search
Following a similar multiple filter scheme, the AFND repository also includes a tool for querying haplotype frequencies from 7426 HLA haplotypes and 244 MICA-HLA-B association records from 147 325 individuals. At present, the collection of haplotypes consists of 344 globally distributed populations in 79 countries. The program permits the user to customize a frequency search by entering an allele for one or more loci and searching for associated haplotypes. Results can be filtered by a particular population, country, source of data, geographic region, ethnicity of the individual and number of loci tested for the haplotype. Haplotypic information can be more useful than information on the allele alone, especially in clinical applications. This search can therefore be used to complement haplotype searches performed in bone marrow and solid organ transplant registries in which, on some occasions, the ethnicity of the individual is unknown. Haplotypes can also be searched at lower or higher resolution and from two to eight routinely typed HLA loci (HLA-A, -B, -C, -DRB1, -DPA1, -DPB1, -DQA1 and -DQB1).
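A haplotype query of this kind, with an allele given for one or more loci and a filter on the number of loci typed, could look like the following sketch. The haplotype string format (alleles joined by hyphens, without an "HLA-" prefix), the function names and the `min_loci` parameter are assumptions made for the example and do not describe the AFND implementation.

```python
def parse_haplotype(text):
    """Split 'A*01:01-B*08:01' into {'A': '01:01', 'B': '08:01'}."""
    alleles = {}
    for part in text.split("-"):
        locus, fields = part.split("*", 1)
        alleles[locus] = fields
    return alleles


def haplotype_matches(haplotype, query, min_loci=2):
    """Return True if `haplotype` carries every allele in `query`.

    Queried alleles may be given at lower resolution than the stored ones.
    `min_loci` is the minimum number of loci the haplotype must be typed for.
    """
    typed = parse_haplotype(haplotype)
    if len(typed) < min_loci:
        return False
    for locus, fields in parse_haplotype(query).items():
        if locus not in typed:
            return False
        wanted = fields.split(":")
        if typed[locus].split(":")[:len(wanted)] != wanted:
            return False
    return True
```

For example, `haplotype_matches("A*01:01-B*08:01-DRB1*03:01", "B*08")` is true, because the stored B allele refines the queried allele group.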
Genotype frequency search
One of the most recent developments included in the website has been the compilation of an inventory of KIR genotype profiles published in the literature. This section comprises the most extensive archive of KIR genotypes and their corresponding frequencies in worldwide populations. Presently, the genotype data encompasses 2398 records, in which 368 distinct KIR genotype profiles have been identified. The KIR genotype composition consists of 16 genes, which may be present or absent in a specific genotype. In the system, users can search for a particular genotype and examine its corresponding frequency from a list of 102 KIR populations available with genotype data. The genotype search provides different approaches to find the incidence of a specific profile. A list of all genotypes and the number of populations and individuals in which each profile has been found appears on the main screen. Users are provided with a range of options including the selection of one or several populations and one or many genes that constitute the genotype. The information displayed after performing the search comprises the genotype, the id of the genotype, which is automatically assigned by the AFND as a consecutive number, the haplotypes AA or Bx (where x can be A or B) which constitute the genotype and the genotype frequency in the populations selected. If the genotype is not found in the initial search, users are provided with a list of the closest genotypes differing by one gene.
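The fallback to the closest genotypes can be sketched by treating each genotype as the set of genes present and counting the genes that differ. The function name, the genotype ids and the gene content of the example profiles are hypothetical and serve only to illustrate the idea; the example uses only a few genes rather than the full set of 16.

```python
def find_genotypes(query, known):
    """Return (exact, near) lists of genotype ids.

    `query` is the set of genes present in the genotype of interest and
    `known` maps genotype ids to sets of present genes. `near` is filled
    only when no exact match exists and holds ids differing by one gene.
    """
    query = frozenset(query)
    exact = [gid for gid, genes in known.items() if frozenset(genes) == query]
    if exact:
        return exact, []
    # The symmetric difference counts genes present in one profile only.
    near = [gid for gid, genes in known.items()
            if len(query ^ frozenset(genes)) == 1]
    return exact, near


# Made-up profiles for illustration only.
known_genotypes = {
    1: {"2DL1", "2DL3", "3DL1"},
    2: {"2DL1", "2DL3", "3DL1", "2DS2"},
    3: {"2DL1", "2DL2", "2DS2"},
}
```

Here a query of `{"2DL1", "2DL3", "3DL1", "2DS1"}` has no exact match, and genotype 1 is returned as the closest profile because it differs only by the absence of 2DS1.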
Figure 2. Screen shot of the KIR genotype frequency search. The figure displays a view of the first 10 genotypes found in three populations (China Eastern Mainland Han, Ghana and Iran) sorted by the number of individuals on which the genotype has been reported.
Other online tools
Amino acid analysis
The website provides a range of tools for other analyses, including the comparison of the existing populations at amino acid level for HLA populations. One of the approaches commonly used in disease association studies is to compare frequencies of alleles between patient and control groups. We have thus developed a tool that allows users to investigate potential molecular mechanisms, by analyzing the main differences in frequencies for a specific position of the allele at the amino acid level. A summary of frequencies for each differing amino acid is presented, allowing users to compare incidences that may be implicated in the association. In the system, users can enter their own data in a tab-separated text file or select an existing population in the database. Populations and data sets provided by users must be typed at protein level (e.g. A*01:01) to be able to perform the analysis.
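The idea behind the amino acid comparison can be sketched as follows: allele frequencies are summed per residue at the chosen position, and the resulting residue frequencies are compared between two groups. The function names, the placeholder allele names and the three-residue sequences are invented for this example and do not correspond to real HLA sequences or to the tool's implementation.

```python
from collections import defaultdict


def residue_frequencies(allele_freqs, sequences, position):
    """Sum allele frequencies by the amino acid found at a 1-based `position`."""
    totals = defaultdict(float)
    for allele, freq in allele_freqs.items():
        residue = sequences[allele][position - 1]
        totals[residue] += freq
    return dict(totals)


def residue_differences(patients, controls, sequences, position):
    """Return patient minus control frequency for each residue at `position`."""
    p = residue_frequencies(patients, sequences, position)
    c = residue_frequencies(controls, sequences, position)
    return {res: p.get(res, 0.0) - c.get(res, 0.0) for res in sorted(set(p) | set(c))}


# Toy protein-level data for illustration only.
toy_sequences = {"X*01:01": "GSH", "X*02:01": "GSY", "X*03:01": "GAY"}
toy_controls = {"X*01:01": 0.5, "X*02:01": 0.3, "X*03:01": 0.2}
toy_patients = {"X*01:01": 0.2, "X*02:01": 0.5, "X*03:01": 0.3}
```

Alleles must be resolved at protein level for this to work, since lower resolution typings do not determine a single amino acid sequence.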
Following the continuation of a project of the 15th International Histocompatibility Workshop (IHWS) related to the rarity of specific HLA alleles, a utility has been built to allow users to search for a particular allele and display the number of confirmations submitted by different data sources [AFND, IMGT/HLA, national marrow donor program (NMDP) in the US and individual laboratories] (16). A default mechanism uses criteria to classify the rarity of the alleles. However, the tool also allows individuals to decide whether an allele is considered to be rare by selecting their own criteria. In this search, users are invited to confirm an allele that has been seen in their laboratories by providing basic information concerning the rare allele.
An important feature of the portal is the availability of bidirectional links to different databases for data sharing and referencing. For example, in the AFS a complete list of all populations possessing the A*02:01 allele can be accessed using the http://www.allelefrequencies.net/hla6006a.asp?hla_selection=A*02:01 link, which could thus be implemented in other resources to link into the AFND. A complete list of reference links can be consulted on the ‘External access’ section of the website (http://www.allelefrequencies.net/extaccess.asp). The database maintains an active collaboration with other databases such as IMGT/HLA, IPD-KIR and NMDP for the update of nomenclature factors and confirmations of rare HLA alleles, respectively.
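A third-party resource could build such inbound links programmatically, as in the sketch below. It assumes that the `hla_selection` parameter accepts other allele names in the same form as the A*02:01 example; only that single link is documented here, and the function name `allele_link` is hypothetical.

```python
from urllib.parse import quote

AFS_URL = "http://www.allelefrequencies.net/hla6006a.asp"


def allele_link(allele):
    """Build an AFS link for `allele`, following the documented A*02:01 example.

    '*' and ':' are left unescaped so that the result has the same form
    as the example link; any other reserved character is percent-encoded.
    """
    return f"{AFS_URL}?hla_selection={quote(allele, safe='*:')}"


print(allele_link("A*02:01"))
```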
The site provides the option to export data to different file formats, including XML, tab-separated and comma-separated text files, for the ‘HLA Rare Allele’ section, allowing users to integrate the information available in AFND with alternative bioinformatic packages. At present, users can print results from all searches using the printer-friendly version available for each search, which can also be used to export data sets in tabular format. To complement frequency data for further analyses, the printer-friendly option includes latitude and longitude information in case users wish to plot frequencies on maps. Further download options will be developed in the future, in consultation with database users on their requirements.
The AFND has been extensively used in a wide range of contexts including clinical applications (histocompatibility, immunology, epidemiology, pharmacology, rheumatology, etc.), academic research centers, specialized research centers (cancer, HIV, bone marrow transplant, genomics, etc.), biotechnology and population genetics. The role of these users varies depending on their interest and falls mainly into three categories: (i) users performing individual allele/gene frequency queries to investigate whether an allele or haplotype from the tissue type of an individual may be frequent in a particular population, (ii) specialized users performing population genetic analyses by comparing specific frequency data sets from a particular group of populations and (iii) third-party application/database users interacting with the website through bidirectional links and data sharing. The AFND has provided a significant resource for several genetics and cell function analyses; for example, several of the frequency data sets were used for analysis of balancing selection and heterogeneity in the HLA genes (17), characterization of populations across regions (18) and many others.