One of the most common causes of worldwide cancer premature death is non-small cell lung carcinoma (NSCLC) with a very low survival rate of 8%-15%. Since patients with an early stage diagnosis can have up to four times the survival rate, discovering cost-effective biological markers that can be used to improve the diagnosis and prognosis of the disease is an important clinical challenge.
In the last few years, significant progress has been made to address this challenge with identified biomarkers ranging from 5-gene signatures to 133-gene signatures. However, A typical molecular sub-classification method for lung carcinomas would have a low predictive accuracy of 68%-71% because datasets of gene-expression profiles typically have tens of thousands of genes for just few hundreds of patients. This type of datasets create many technical challenges impacting the accuracy of the diagnostic prediction.
We discovered that a small set of nine gene-signatures (JAG1, MET, CDH5, ABCC3, DSP, ABCD3, PECAM1, MAPRE2 and PDF5) from the dataset of 12,600 gene-expression profiles of NSCLC acts like an inference basis for NSCLC lung carcinoma and hence can be used as genetic markers. This very small and previously unknown set of biological markers gives an almost perfect predictive accuracy (99.75%) for the diagnosis of the disease the sub-type of cancer. Furthermore, we present a novel method that finds genetic markers for sub-classification of NSCLC. We use generalized Lorenz curves and Gini ratios to overcome many challenges arose from datasets of gene-expression profiles. Our method discovers novel genetic changes that occur in lung tumors using gene-expression profiles.
While proteins encoded by some of these gene-signatures (e.g., JAG1 and MAPRE2) have been showed to involve in the signal transduction of cells and proliferation control of normal cells, specific functions of proteins encoded by other gene-signatures have not yet been determined. Hence, this work opens new questions for structural and molecular biologists about the role of these gene-signatures for the disease.