Apart from a few examples of genetic mutations with high penetrance, such as found in BRCA1
, most genetic risk of breast cancer (BCa) resides at multiple low penetrance loci, more recently identified by genome-wide association studies (GWASs) 
. In general, GWASs utilize single nucleotide polymorphisms (SNPs) to tag common genetic variation in linkage disequilibrium (LD) blocks in order to identify genome-wide risk loci for complex diseases. To date, 71 replicated and independent BCa risk loci have been identified 
. There are thousands of SNPs in each LD block, and many of these SNPs are candidates to exert functionality in BCa risk. At the 71 BCa risk loci, at least 320,000 SNPs are associated with BCa risk. Due to this plethora of SNPs in LD, much of the heritability of complex diseases, such as BCa, remains unknown 
. Identification of underlying mechanisms that explain how SNPs affect risk will provide a better understanding of the genetic risk of complex diseases, such as breast cancer, which is described in this study.
In contrast to Mendelian disorders, where most disease-causing mutations result in absent or non-function proteins, many complex disease-associated variants, such as for BCa are mainly found in non-coding regions of the genome. Since >90% of the genome is non-coding and risk mechanisms of complex diseases are likely due to subtle regulation of gene expression, risk-SNPs are more often found in non-coding regions. Knowledge of the non-coding regions is rudimentary compared to the protein coding part. However, recent ENCODE data dramatically demonstrated that the non-coding part of the genome is much more than simply ‘junk’ DNA and contains well-demarcated gene regulatory regions, in particular enhancers 
We have recently formulated a roadmap to address the functionality of risk SNPs in non-coding regions by characterizing gene regulatory regions with nucleosome and transcription factor occupancy and histone modifications 
. Moreover, several research groups annotated genomic regions (coding and non-coding) to identify candidate functional SNPs involved in complex diseases 
. However, as more next generation sequencing (NGS) data (of chromatin annotations from consortia such as ENCODE), more loci (from meta and primary GWASs), and more SNPs at ever lower minor allele frequencies (from the 1000 genomes project) become available, further analyses utilizing updated data and methods are needed for specific diseases such as BCa.
In the present study, we addressed the hypothesis that BCa risk SNPs reside in functional genomic regions such as coding exons, TSS regions, and enhancers. In order to identify potentially functional SNPs, we conducted a comprehensive analysis on 656,895 SNPs from the 1000 genomes project data released in May 2012, at the 71 BCa risk loci by measuring LD and annotating them with 11 NGS datasets, all in primary breast epithelial cells. Thus, we found 1,005 potentially functional high LD SNPs. From these, we were able to frame specific hypotheses involving 547 SNPs in terms of novel biological mechanisms; 2 SNPs were at non-benign codon changes in one gene, 42 and 503 SNPs were within response elements of known transcription factors in TSS regions and enhancers, respectively. This shortlist of potentially functional SNPs will not only aid in prioritizing a manageable number of likely functional SNPs, but also reveal hidden biological mechanisms for the etiology of breast cancer.