The 1,000 Genomes Project will bring to light a wealth of information on human variation and should capture the vast majority of variants with a frequency of >1%. A detailed catalog of variants should aid association studies of complex traits that examine variants ranging from common to rare. It is hypothesized that rare causal variants for complex diseases are usually found in the frequency range between 0.1% and 1%, although the boundaries are not absolutely defined. The 1,000 Genomes Project will also identify very rare variants (e.g., frequency <0.5%); however, the study's ability to discover a substantial proportion of them will depend on whether very rare variants are shared across multiple populations, because each ethnic group included in the project will have a limited sample size of ~100 individuals. Many rare variants arose recently in human history and therefore may not be shared among different populations. Thus the 1,000 Genomes Project currently does not have an adequate sample size to provide a comprehensive catalog of very rare variants that could be selected for genotyping in association studies of complex traits.
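The sample-size limitation can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a random sample, independent chromosomes and error-free sequencing, and the function name `detection_probability` is hypothetical. Under those assumptions, a variant with population frequency 0.5% is observed at least once among 100 individuals (200 chromosomes) with a probability of roughly 63%, and the probability falls quickly for rarer variants.

```python
def detection_probability(freq, n_individuals):
    """Probability that a variant with population allele frequency `freq`
    is observed at least once among 2 * n_individuals sampled chromosomes.

    Assumes independent chromosomes and perfect variant calling."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    # Illustrative frequencies and sample sizes (not taken from the study).
    for freq in (0.01, 0.005, 0.001):
        for n in (100, 1000):
            p = detection_probability(freq, n)
            print(f"freq={freq:.3f}  n={n:5d}  P(detect)={p:.3f}")
```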
Although assuming equal variant frequencies is not realistic, the results are easier to interpret than when a mixture of variant frequencies is used. To also investigate a more realistic situation in which variants have a mixture of frequencies, haplotype pools were generated by coalescent simulation under a neutral Wright–Fisher model with the assumption of no recombination. The simulation of haplotypes that reflect the evolutionary history of human populations has been well researched, and a neutral Wright–Fisher model is commonly used. For genes, the impact of recombination is negligible owing to gene length, and genome-wide surveys have shown that recombination events occur unevenly across the human genome, preferentially outside gene boundaries. In reality, however, genetic regions may display distributions of variant frequencies that differ from those obtained by coalescent simulation, and rare variant discovery may therefore give different results.
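To illustrate how a coalescent simulation without recombination yields a mixture of variant frequencies, a minimal sketch follows. It is not the simulation software used in this study: it implements the standard neutral coalescent with infinite-sites mutation, and the function names, the scaled mutation rate `theta` and the sample size are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def poisson(rng, mean):
    """Draw a Poisson variate (Knuth's method; adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_variant_sites(n_haplotypes, theta, rng):
    """Neutral coalescent, no recombination, infinite-sites mutation.

    Returns one frozenset per variant site holding the indices of the
    sampled haplotypes that carry the derived allele."""
    # Each ancestral lineage is tracked by the sampled haplotypes below it.
    lineages = [frozenset([i]) for i in range(n_haplotypes)]
    sites = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence, rate k(k-1)/2.
        wait = rng.expovariate(k * (k - 1) / 2.0)
        # Mutations fall on each branch at rate theta/2 per unit time.
        for lineage in lineages:
            for _ in range(poisson(rng, theta / 2.0 * wait)):
                sites.append(lineage)
        # Merge two lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return sites


def site_frequency_spectrum(sites):
    """Map derived-allele count -> number of sites with that count."""
    return Counter(len(carriers) for carriers in sites)


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed for reproducibility
    sites = simulate_variant_sites(n_haplotypes=200, theta=5.0, rng=rng)
    spectrum = site_frequency_spectrum(sites)
    print("segregating sites:", len(sites))
    print("singletons:", spectrum.get(1, 0))
```

Under this model, most simulated sites are carried by only a few haplotypes, which is the kind of frequency mixture referred to above; real genomic regions may deviate from it.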
If very rare variants are believed to contribute to disease etiology, sequencing of the study sample will be necessary to identify them. Although causal variants will be enriched in case samples, most sequenced genomic regions will not be involved in disease etiology. If cases are sequenced and the identified rare variants are then genotyped in controls, type I error can increase, with the estimate of the OR being >1.0. Type I error can also increase if the controls are sequenced and the cases are genotyped, since the genes are not causative; in this situation the estimate of the OR will be <1.0. When different proportions of cases and controls are sequenced and the remaining samples are genotyped, type I error may also be inflated. Similarly, if the exons of a gene are sequenced in cases to identify rare variants and only those exons in which rare variants were detected are sequenced in controls, type I error can be inflated. The differences in variant frequencies between cases and controls are intrinsic to this study approach and cannot be controlled for by permutation. This inflation of type I error will not occur if the subjects used for variant discovery are not included in the association study. Whether type I error is inflated depends on the size of the initial sample that is sequenced, the variant frequencies, and the number of variants within the gene/genomic region. If the analysis is done on a specific gene/region, the level of type I error inflation is not monotonic with sample size or variant frequency as shown in . The type I error is a function of both the sample size and the difference in variant frequency between cases and controls. For small sample sizes, although the frequency difference between cases and controls is great, the power to detect the difference is low. On the other hand, for large sample sizes the variant frequency difference between cases and controls is small, and the power to detect such small differences is also low despite the large sample size. Therefore the greatest inflation of type I error occurs for a moderate sample size, and the exact sample size depends on the population frequency of rare variants. Although a monotone decrease in type I error was observed with increasing sample sizes for the examples displayed in and , monotonicity was violated when smaller sample sizes were analyzed (data not shown), demonstrating that monotonicity is not always the rule. Since neither the frequencies of variants in a population nor the number of variants within the gene/genomic region are known a priori, it is not possible to determine whether type I error has been inflated if variant discovery is carried out predominantly in cases.
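The source of this bias can be sketched with a simple expectation under the null hypothesis. The code below is an illustrative simplification, not the analysis performed in this study: it assumes equal numbers of cases and controls, that all cases are sequenced, that controls are genotyped only at sites discovered in cases, and that chromosomes are independent. The function name and the example frequencies are hypothetical. Every rare allele present in cases is counted, whereas an allele in controls is counted only if its site happened to be discovered in cases, so the expected case-to-control ratio of rare allele counts exceeds 1.

```python
def expected_case_control_ratio(freqs, n_cases):
    """Ratio of expected rare-allele counts (cases : controls) under the null
    when variants are discovered by sequencing `n_cases` cases and only the
    discovered sites are genotyped in an equal number of controls.

    `freqs` holds the population frequency of each rare variant in the region."""
    n_chromosomes = 2 * n_cases
    # In cases, any allele that is present is by definition discovered.
    case_total = sum(freqs)
    # In controls, an allele counts only if its site was seen in the cases.
    control_total = sum(p * (1.0 - (1.0 - p) ** n_chromosomes) for p in freqs)
    return case_total / control_total


if __name__ == "__main__":
    freqs = [0.001] * 10  # illustrative: ten variants, each at 0.1%
    for n in (50, 100, 500, 1000, 5000):
        ratio = expected_case_control_ratio(freqs, n)
        print(f"n_cases={n:5d}  expected ratio={ratio:.3f}")
```

This ratio shrinks steadily as the sequenced sample grows. It captures only the frequency difference; the type I error itself also depends on the power to detect that difference, which is why the inflation need not be monotonic in sample size.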
Collapsing of genotypes was used for the association tests. It is also possible to analyze each variant separately; however, for this approach to have sufficient power, extremely large sample sizes will be necessary, with the required sample size increasing as variant frequencies and genotypic RRs decrease. Power is particularly low when variants are either recent or de novo. Collapsing has been shown to be a powerful approach for analyzing rare and very rare variants, and therefore we used it in our analyses.
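One common form of collapsing can be sketched as follows: each individual is recoded as a carrier if at least one rare allele is present anywhere in the region, and carrier proportions are then compared between cases and controls. The sketch is illustrative and may differ in detail from the procedure used here; the function names, the toy genotypes and the 0.5 continuity correction in the odds ratio are assumptions.

```python
def collapse(genotypes):
    """Recode each individual as 1 if any rare-variant site in the region
    carries at least one minor allele, else 0.

    `genotypes` is a list of individuals, each a list of minor-allele
    counts (0, 1 or 2) across the rare-variant sites."""
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]


def carrier_table(case_genotypes, control_genotypes):
    """2x2 table: (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    case_flags = collapse(case_genotypes)
    control_flags = collapse(control_genotypes)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with a 0.5 continuity correction (an assumption made here
    so that empty cells do not cause division by zero)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


if __name__ == "__main__":
    # Toy data: 4 cases and 4 controls typed at 3 rare-variant sites.
    cases = [[0, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0]]
    controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
    table = carrier_table(cases, controls)
    print("table:", table)
    print("odds ratio:", round(odds_ratio(*table), 3))
```

The resulting 2x2 table can be passed to any standard test of association; the choice of test is left open in this sketch.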
Since mutation rates are unlikely to vary among populations, it might be tempting to use the data from the 1,000 Genomes Project as a reference control population for various studies of complex traits. However, the aggregate frequencies of rare variants in a genomic region may vary greatly from one ethnic group to another due to different evolutionary histories, including genetic drift and bottlenecks. There are a number of examples where rare causal variants (e.g., variants in the CFTR and BRCA2 genes) have higher frequencies within the Ashkenazi Jewish population compared to other European Jewish and non-Jewish populations. In addition to rare causal variants having varying frequencies within ethnic groups, rare neutral variants may also have diverse frequencies, which can lead to an increase of type I error if population substructure is not adequately controlled. In the study of rare variants, it is currently unknown whether a consensus panel of controls can be used (for example, a European panel for complex trait association studies of Europeans and individuals of European descent) or whether more stringent matching criteria are necessary. Additionally, it has not been investigated whether current statistical methods, for example principal components analysis using common variants, will adequately control for population substructure when analyzing rare variant data.
Studies of rare variants for complex traits are beginning to emerge, and in the near future a large number of studies will be carried out for a variety of common diseases. Although there are many challenges in understanding the involvement of rare variants in complex disease etiology, one benefit of studying rare variants rather than common variants is that rare variants have higher genotypic RRs. This not only makes it easier to implicate them in complex disease etiology but also means that their identification should have a greater impact on risk assessment, disease prevention and treatment.