Long intergenic non-coding RNAs (lincRNAs) are large transcripts (>200 nucleotides) derived from intergenic regions that do not give rise to proteins. lincRNAs, together with other long non-coding RNAs (lncRNAs), such as antisense, interleaved, and protein-coding-gene-overlapped lncRNAs, are actively transcribed from the genome and constitute a substantial fraction of the human transcriptome (Carninci et al., 2005; Mattick and Makunin, 2006). Approximately 20–40% of the total genome is estimated to produce lncRNAs, vs. less than 2% represented by protein-coding mRNAs (Nagano and Fraser, 2011), which has stimulated intensive interest in the gene structures, expression signatures, and functionality of lncRNAs. lincRNAs have recently come to attention as a result of the increased intergenic coverage of whole-transcriptome and sequencing analyses.
By means of various high-throughput approaches, the population of identified human lincRNA transcripts is rapidly expanding, from a class of ~3300 identified using chromatin-state mapping (Khalil et al., 2009) to a substantial catalog of over 8000 assembled from four billion RNA-seq reads (Cabili et al., 2011). This number, though reaching as much as one-fifth of the number of mRNAs, is a conservative lower estimate of lincRNAs, most of which remain unannotated. Moreover, the nucleotide sequences of lincRNAs appear to be evolutionarily conserved among mammals, especially their promoters, which exhibit strong conservation resembling that of known protein-coding genes, suggesting that they are functional despite lacking protein-coding capacity (Guttman et al., 2009). Indeed, accumulating evidence has demonstrated various functions of annotated lincRNAs, and even of specific individual lincRNAs, in mammals and humans (Gupta et al., 2010; Huarte et al., 2010; Keniry et al., 2012; Yoon et al., 2012). In addition to their tissue- and cell-specific expression signatures (Cabili et al., 2011), epigenetic regulation of gene expression, and involvement in developmental processes (Loewer et al., 2010; Guttman et al., 2011), their dysregulation in cancers and the functional mechanisms by which they contribute to tumor development, including breast tumorigenesis, have begun to be investigated. Notably, a recent finding showed that numerous lincRNAs in the HOX loci were differentially expressed in primary breast carcinomas and metastases; in particular, HOTAIR expression was up-regulated thousands-fold, leading to the proposal that it is an independent prognostic biomarker for breast cancer metastasis (Gupta et al., 2010).
Breast cancer is among the most lethal malignant diseases in women (Howlader et al., 2012). Analyses of this disease using existing large-scale whole-genome technologies have revealed complex genomic aberrations that are believed to drive its initiation or progression (Stephens et al., 2009; Banerji et al., 2012; Ha et al., 2012). In addition to point mutations, large-scale structural alterations in breast tumors, including genomic copy number variation (CNV) and loss of heterozygosity (LOH), though less well understood, appear to be major contributors to tumorigenesis.
CNV typically refers to variation in the number of copies of a DNA segment larger than 1 kb relative to a reference genome (Feuk et al., 2006). CNV regions of the genome often contain many dosage-sensitive genes, whose expression is primarily determined by copy gains or losses (Feuk et al., 2006; Zhou et al., 2011). Studies of correlations between CNVs of protein-coding genes and the corresponding expression levels have indicated that approximately 60% of cellular expression variation and expression-associated phenotypes is accounted for by CNVs (with the exception of brain) (Henrichsen et al., 2009). Recent SNP array data from a collection of over 2000 breast cancer samples indicated that inherited and somatically acquired tumor CNVs had a strong influence on the expression of approximately 40% of genes (Curtis et al., 2012). While these array analyses have largely emphasized protein-coding genes, none of the reports has investigated intergenic non-coding genes harboring CNVs. This is likely due to the lack of high-density array coverage and inadequate annotation of intergenic regions.
Since genetic structural variations of breast cancer-associated lincRNAs have not been reported, we sought to examine the CNV of lincRNA genes between tumors and matched adjacent normal host counterparts using HumanOmni5 Quad Beadchips. This platform provides high coverage of the intergenic portion of the genome for high-resolution aberration detection. Using this technology, we hope to identify lincRNA-associated genomic variations that may influence the progression of this disease.