Systematic collection of phenotypes and their correlation with molecular data has been proposed as a useful method to advance in the study of disease. Although some databases for animal species are being developed, progress in humans is slow, probably due to the multifactorial origin of many human diseases and to the intricacy of accurately classifying phenotypes, among other factors. An alternative approach has been to identify and to study individuals or families with very characteristic, clinically relevant phenotypes. This strategy has shown increased efficiency to identify the molecular features underlying such phenotypes. While on most occasions the subjects selected for these studies presented harmful phenotypes, a few studies have been performed in individuals with very favourable phenotypes. The consistent results achieved suggest that it seems logical to further develop this strategy as a methodology to study human disease, including cancer. The identification and the study with high-throughput techniques of individuals showing a markedly decreased risk of developing cancer or of cancer patients presenting either an unusually favourable prognosis or striking responses following a specific treatment, might be promising ways to maximize the yield of this approach and to reveal the molecular causes that explain those phenotypes and thus highlight useful therapeutic targets. This manuscript reviews the current status of selection of extreme phenotypes in cancer research and provides directions for future development of this methodology.
Despite the fact that human beings share the vast majority of their genetic information, the few remaining variations account for an astonishingly wide range of different phenotypes. The importance of characterizing such differences is widely recognized by the scientific community and enormous efforts are being made to understand their role in disease. Indeed, a useful method to advance along this path has been to correlate molecular data with phenotypes. The rationale of this strategy is straightforward and the development of phenotype databases has already been proposed [2, 3]. Yet, although such databases are being generated for species such as yeasts or rodents [5, 6] and are under development for human beings [7, 8], progress is slow owing to the enormous complexity of this task, which is at least in part caused by the multifactorial origin of many diseases and by the intricacy of accurately classifying phenotypes.
In the meantime, a frequent approach has been to correlate molecular features in groups of patients with their phenotypes, expressed as clinical variables, such as prognosis or treatment effects. Some relevant examples of the use of this strategy in oncology are the correlation between thymidylate synthase expression and efficacy of 5-fluorouracil in digestive tumours, and the identification of gene-expression profiles of prognostic value in lymphoma or breast cancer patients. However, despite the significance of some results, the conclusions reached by many studies are of uncertain clinical significance or even contradictory. This has led to the establishment of specific guidelines to validate conclusions before their publication [12–14]. Several factors may cause these inconsistencies, including methodological issues, such as retrospective data collection or limitations in laboratory techniques, and the biology of complex diseases [12, 15], such as cancer, which present multiplex phenotypes. In addition, classifying patients into subgroups with good or bad evolution that differ only moderately from each other, for example by subtle improvements in survival, may lead to the identification of molecular features associated with modest differences of borderline clinical relevance.
A useful and intuitive approach to circumvent some of these problems has been to select individuals with very characteristic, clinically relevant phenotypes and to study the underlying causes. This strategy assumes that these patients are the most informative and thus should be studied separately, rather than being included in larger series of patients that might dissipate the information that they can provide. Even though this strategy has allowed the identification of relevant biological facts with great effectiveness, through the study of reduced numbers of subjects, and has been proposed as a methodology for the study of human disease [16–20], its use has not become widespread. This manuscript reviews the current status of extreme phenotype selection in cancer research and provides relevant examples that support its value, along with potential directions to further develop this strategy.
Apparent phenotypes present characteristic attributes and therefore can be identified by observation. Sometimes the phenotype is readily recognized, because its characteristics are obvious. This is the case of the widely employed strategy of identifying gene mutations that cause genetically inherited diseases. Paradigmatic examples in oncology include syndromes characterized by the development of multiple tumours, such as multiple endocrine neoplasia, type 1 (MEN-1). The description of parathyroid, pancreatic and pituitary tumours in autopsies of patients with acromegaly, and its familial association [23, 24], preceded by decades the identification of the MEN-1 tumour-suppressor gene and its mutations in affected individuals. Another outstanding example is the detection of mutations in BRCA-1, a gene that was identified in families that presented a high incidence of early-onset breast carcinoma.
On other occasions the phenotypes are less evident and complex epidemiological studies are required to identify them. An illustrative example is the Li-Fraumeni syndrome, described through the identification of an increased incidence of rhabdomyosarcomas in siblings following the review of over 20,000 children’s death certificates [28, 29]. As in the previous example, the identification of the phenotype allowed the formulation of the hypothesis that eventually led to the detection of germ-line p53 mutations as the cause.
Finally, sometimes characteristic phenotypes are expressed only under certain circumstances, such as after treatment administration. For example, the description of severe toxicity after 5-fluorouracil administration  allowed the identification of the biochemical  as well as the genetic underlying causes . Another relevant example is the discovery of the expression of epidermal growth factor receptor (EGFR) mutations in tumours of patients responding to EGFR tyrosine-kinase inhibitors [34, 35]. Paradoxically, in this setting the study of a relatively low number of subjects yielded very relevant and clinically useful results, in contrast to the modest conclusions obtained after studying thousands of patients through conventional randomized trials [36–39], which assume that the benefit that the treatment produces only in a group of patients is large enough to administer it to the whole unselected population. Interestingly, the selection and study of patients with extreme responses (most vs. least sensitive), extreme drug metabolisms (high vs. low) or extreme toxic effects (no toxicity at high doses vs. high toxicity at low doses) following drug therapy has been described by Nebert as a well defined methodology in clinical pharmacology, in a manuscript that reviews studies in which this strategy has been successfully employed . The same author and his collaborators have also described in detail the statistical rationale that supports this methodology .
The common factor underlying all these examples is that the initial step was the identification of the characteristic phenotypes. Subsequent study of the selected individuals led to the identification of the molecular causes. The effectiveness of this strategy is notable, because the number of subjects that need to be studied is relatively small. In addition, the clinical relevance is high, because the attributes of the selected phenotypes are significant. Therefore, we believe that this methodology should be further developed in cancer research. Relevant case-selection should include cancer patients with very characteristic and uncommon evolution. Currently, many advanced solid tumours are considered incurable and result in short survival. Nonetheless, clinical experience shows that exceptions exist even among the malignancies presenting the direst prognosis, and every oncologist treats patients that are unexpectedly cured or that live far beyond their estimated prognosis. Even though many of those cases are not published, some reports of long-term survivors of apparently incurable tumours such as pancreatic cancer [41, 42], gastric cancer [43–46], colon cancer, small [48, 49] and non-small-cell lung cancer [50, 51] or multiple myeloma can be found in the medical literature. Assuming that the diagnostic and staging work-up has been correctly performed, these individuals may represent extreme phenotypes worthy of detailed study. Similarly, patients presenting early-stage cancer that receive adequate treatment and that either do not relapse despite presenting a high risk of recurrence or relapse despite a very low risk of recurrence, might be interesting groups to study. A clinical example of such a design would be to study patients with T1G3 bladder tumours that do not relapse and patients that present low-grade papillary tumours that recur following adequate local treatment.
Patients showing extreme responses or toxic effects to a given treatment, as suggested by Nebert, also constitute interesting phenotypes to identify and to study. In fact, this methodological design deserves further consideration in the current scenario in which hundreds of drugs are being developed, but only a few obtain regulatory approval, based on conventional drug development methodology. Further development of multiple drugs might be of clinical interest, even if they just show clinical activity in a limited subgroup of patients, assuming that it would be possible to identify such patients. Studies in which patients receive a treatment in a non-randomized fashion and those patients that present marked benefit are intensively studied to elucidate specific markers of activity might become complementary to current randomized studies. The sample size of these studies should be large enough to identify a sufficient number of patients presenting clear benefit from the treatment. Such studies might become a useful tool to further develop personalized medicine, allowing identification of those patients that achieve a truly significant benefit from a specific treatment.
Multiple potential causes might explain these phenotypes, and they could be related to host and tumour factors as well as environmental causes (Fig. 1). Host factors might include regulation of immune response, angiogenesis, apoptosis or DNA repair mechanisms, ability to control metastasis or advantageous metabolism of anti-cancer treatments, among others. Tumour factors might comprise abnormalities in drug or immune resistance, cell cycle and apoptosis regulation, and so on. Since multiple hypotheses should be studied, the assessment of specific major molecular pathways and the use of high-throughput techniques, which allow the simultaneous assessment of multiple biological factors, seem necessary. Even though the interpretation of the results of these techniques is somewhat cumbersome, due to the large amount of information they produce, the use of a reduced sample population that presents marked and clinically relevant characteristics should clearly improve the efficiency of such studies.
This strategy has already been followed by some researchers. The study of melanoma patients with long-term survival has highlighted tumour immune escape as a mechanism of disease progression and shifting of T-cell responses as a response to this escape, as well as prolonged persistence of specific CD8+ cells as a potential cause of maintenance of long disease remissions [54, 55]. Interestingly, all these observations, which support the concepts of cancer immunosurveillance and immunoediting, were obtained from the study of just 4 patients. We have used this strategy in renal cell carcinoma patients treated with the antiangiogenic drug sunitinib. Sera from 3 patients showing marked responses and from 3 patients that presented clear progressions were analyzed with a Human Cytokine Array, which evaluates 174 cytokines. We identified 27 cytokines that varied significantly between the two groups, and we further selected and assessed the most relevant cytokines by ELISA in 21 evaluable patients, concluding that TNF-α and MMP-9 baseline levels were predictive of response. In a similar study performed in melanoma and renal cell carcinoma patients, vascular endothelial growth factor and fibronectin were identified as predictors of response to high-dose intravenous IL-2. Although these studies warrant confirmation, they support the relevance of selection of patients with extreme phenotypes and their study with high-throughput techniques as a valid method to identify candidate predictive factors of drug activity.
Non-apparent phenotypes are those not associated with characteristic attributes and therefore they cannot be identified by mere observation. An example is the existence of protection against developing a disease. Since one specific disease does not develop in the majority of individuals, the existence of protection cannot be distinguished from the simple absence of disease by chance, unless the risk of developing that disease is taken into consideration.
It is well known that cancer incidence under similar environmental conditions is not uniform. If cancer risk followed a normal distribution, as most biological variables do, we could hypothesize that just as some individuals present an increased incidence, other subjects may have lower incidence than expected. If these individuals exist, it would be naïve to attribute their phenotype to chance, at least until other causes have been ruled out, and their identification and study could increase our knowledge about cancer and yield useful therapeutic targets. This protection could be secondary to many factors, including specific mechanisms of DNA repair, cell cycle regulation, metabolism of carcinogens, angiogenesis, apoptosis or immunological response among others (Fig. 1).
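The tail-selection reasoning above can be illustrated with a short sketch. It is only an illustration, assuming that each subject has already been assigned a numeric risk score (hypothetical; the text does not define how such a score would be obtained) and that the scores are roughly normally distributed. The function name `select_extremes`, the `z_cutoff` parameter and its default of 2 standard deviations are assumptions made for this example.

```python
from statistics import mean, stdev


def select_extremes(risk_by_subject, z_cutoff=2.0):
    """Return (low_tail, high_tail) subject ids whose z-score exceeds the cutoff.

    risk_by_subject maps a subject id to a hypothetical numeric risk score.
    """
    values = list(risk_by_subject.values())
    if len(values) < 2:
        return [], []  # a standard deviation needs at least two scores
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [], []  # all scores identical: no subject is extreme
    low, high = [], []
    for subject, risk in risk_by_subject.items():
        z = (risk - mu) / sigma
        if z <= -z_cutoff:
            low.append(subject)   # lower risk than expected
        elif z >= z_cutoff:
            high.append(subject)  # higher risk than expected
    return sorted(low), sorted(high)
```

Under these assumptions, the low tail would hold the candidates for apparent protection. The cutoff trades the number of subjects selected against how extreme they are.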
Although it seems reasonable that selection of individuals presenting decreased risk of developing cancer might have been favoured by evolution to a certain extent, the question remains whether such subjects really exist. Looking to other diseases, we can find relevant examples, some of which have been successfully identified through selection of extreme phenotypes. One outstanding paradigm is the identification of alterations in the gene encoding the chemokine coreceptor CCR5 that confer complete protection against certain strains of the human immunodeficiency virus (HIV) [59, 60]. Since CCR5 mutations were not associated with phenotypic abnormalities, they were only identified after observing that some individuals highly exposed to HIV never developed the infection. The study of those individuals allowed the discovery of a relevant target in HIV investigation.
Returning to cancer, some preclinical studies support the biological plausibility of the existence of protective mechanisms. A relevant example is the creation of a “super-p53” mouse, carrying supernumerary p53 copies, which shows a decreased risk of developing chemically induced tumours . Although these mice and other similar models [63, 64] have been artificially developed, their relatively normal phenotypes raise the question of whether similar phenomena could spontaneously occur and be selected for in nature. It is known that small malignant tumours identified in autopsies or by high-performance imaging tests outnumber the quantity of clinically overt cancers . However, it is unknown whether this is caused by protective mechanisms or if it is due to other reasons, such as differences in tumour biology. The same questions arise when we analyse the different sensitivity of individuals to carcinogens such as tobacco or the variations in clinical aggressiveness of tumours in different patients: while large differences exist, it is difficult to establish their causes.
Even if we assume that individuals bearing protection against developing cancer may exist, the question of how to identify them remains. Several studies have assessed the protective role of enzymatic polymorphisms with inconclusive results, as reviewed elsewhere . Most compared cancer patients with normal individuals. Such a design offers the disadvantage that, rather than true protection against cancer, normal subjects may just present absence of disease with a normal risk of developing it. Instead, protection can be expected in individuals not developing cancer despite presenting an increased risk. This approach has been successfully evaluated in the study of polymorphisms of detoxifying enzymes, using elderly individuals not presenting cancer as controls, sometimes even when they were smokers [66–69]. While these studies truly select extreme phenotypes, their design could be improved by the use of high-throughput techniques, which study multiple potential causes, rather than just a few; and by selecting individuals with even more characteristic phenotypes, i.e., a markedly reduced individual or familial risk of developing cancer. Families presenting very low cancer incidence across several generations might show reduced familial risk. Subjects with high-risk cancer factors, such as extensive exposure to carcinogens or cancer familial syndromes that do not develop the disease, or in whom development is significantly delayed, may present reduced individual risk. Some sensitive models that have already been proposed might be familial adenomatous polyposis  or hereditary non-polyposis colorectal cancer , because they present high penetrance and, therefore, the likelihood that an affected individual will not develop cancer is low. The study of heavy smokers that do not develop cancer at an advanced age and of young smokers who develop the disease might also yield relevant information on cancer protective mechanisms and tumour development and growth. 
Different combinations of these strategies could be developed. Aggregation of similar phenotypes within one family would further support that their underlying cause is not random. Theoretically, the likelihood of finding clinically relevant results should be directly related to the magnitude of the discrepancy between the estimated risk of presenting cancer and the observed phenotype.
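One possible way to make the notion of discrepancy between estimated risk and observed phenotype concrete is sketched below. It is not a method taken from the studies cited here. It assumes that an estimated probability of developing cancer is available for each subject (hypothetical input) and scores each case by the surprisal of the observed outcome, so that rarer outcomes score higher. The names `discrepancy` and `rank_by_discrepancy` are illustrative.

```python
import math


def discrepancy(estimated_risk, developed_cancer):
    """Surprisal, in bits, of the observed outcome given the estimated risk.

    estimated_risk is an assumed probability (0 < p < 1) of developing cancer.
    """
    if not 0.0 < estimated_risk < 1.0:
        raise ValueError("estimated_risk must be strictly between 0 and 1")
    p_observed = estimated_risk if developed_cancer else 1.0 - estimated_risk
    return -math.log2(p_observed)


def rank_by_discrepancy(subjects):
    """Sort (subject_id, estimated_risk, developed_cancer) tuples, most surprising first."""
    scored = [(sid, discrepancy(risk, outcome)) for sid, risk, outcome in subjects]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With this scoring, a carrier of a high-penetrance syndrome who remains cancer-free ranks above a cancer-free individual of average risk, which matches the prioritization argued for in the text. Any real application would depend on how reliably the risk itself can be estimated.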
Extreme phenotype selection is a well defined biological concept that describes how environmental pressures favour, among traits exhibiting diverse phenotypes, the fittest to overcome the hazards encountered. Although it seems logical to study the favoured phenotypes to determine their mechanisms of success, the use of this strategy has often been restricted to obvious cases or to isolated observations made by discerning clinicians. The lack of a systematic approach to the identification of extreme phenotypes is what has probably precluded most of them from being studied. Therefore, a consistent methodology should be developed to maximize the potential benefit of this strategy. Specifically in the field of oncology, we propose the creation of databases compiling patient samples, together with clinical and epidemiological data from individuals presenting relevant phenotypes. The collection and intensive study of these extreme cases, rather than constituting a mere list of oddities, might provide excellent hunting grounds to discover Achilles heels of cancer.
This methodology does, however, raise some issues. Principal among these are the questions of how to classify phenotypes into quantitative groups  in order to define what constitutes an extreme phenotype, and how to identify them. Definition of phenotypes that might be clinically relevant and that can be found in clinical practice should probably be performed by consensus panels of experienced clinicians under the coordination of medical societies or cooperative groups. Case selection should be approached by training and creating awareness among medical specialists. Cancer patients with very favourable evolution, or with extreme responses or toxicities following therapy could be selected in oncology centres relatively easily, since their number is small and their characteristics are unusual. Individuals presenting cancer familial syndromes not developing cancer despite their high risk could be selected through genetic counselling units. In other cases, more complex epidemiological studies might be required to identify relevant discordances between the expected and the observed phenotypes. Since it is unlikely that an adequate number of subjects bearing extreme phenotypes can be detected in a limited number of institutions, this will require the collaboration of large cooperative groups, ideally at an international level.
Another issue is what variables should be studied in these subjects. Characteristic phenotypes may be caused by host or tumour factors, as well as by external causes. Therefore, all of these should be analyzed, and ideally samples from the tumour and from the host’s normal tissue should be collected, along with clinical and epidemiological data. The type and quantity of the samples should allow a wide variety of studies to be performed, including screening of cell genomes, epigenomic changes and transcriptomes through high-throughput techniques, as mentioned above, such as genome-wide association studies (GWAS) or full genome sequencing. The samples should also permit additional studies to be performed in the future using techniques that are not yet available. At a minimum, whole blood, including serum and DNA and, in the case of cancer patients, fresh and paraffin-embedded tumour tissue, should be collected. Obtaining sequential samples linked to relevant clinical events (e.g., start of a new treatment, evaluation of clinical response or observation of an unexpected toxicity) might provide relevant information about the evolution of different biomarkers in such situations. Even though this methodology does not avoid the inherent problems of sample collection, it does dramatically limit the number of samples to obtain and to study. Unavailability of samples might be an important limitation, which could be overcome by prospective collection of cases. The importance of an adequate infrastructure to collect and store samples and to manage the database cannot be overemphasized. Lastly, ethical issues may be another point of concern, since the use of biological material is subject to strict regulations. Ethical boards must collaborate to make these studies feasible without compromising the rights of the participants. The possibility of contacting subjects in order to obtain more information or to perform functional studies should be taken into consideration.
Well designed informed consent processes and prospective data collection should minimize these problems.
In conclusion, the selection and the study of extreme, clinically relevant phenotypes is an efficient strategy to identify their underlying causes. The creation of collaborative databases compiling biological samples and clinical information from such phenotypes might increase our knowledge of cancer and provide new therapeutic strategies. This will require close and continued collaboration between clinicians, who must identify appropriate cases, and basic scientists, who should perform adequate studies to identify and integrate the relevant targets. Even in the current age of modern molecular biology, clinical observation should remain preferred over intellectual speculation as a strategy for generating hypotheses.
Conflict of interest: The authors declare that they have no conflict of interest relating to the publication of this manuscript.
José Luis Pérez-Gracia, Medical Oncology Department, Clínica Universidad de Navarra, Universidad de Navarra, Avenida Pío XII, 36, ES-31008 Pamplona, Spain.
Alfonso Gúrpide, Medical Oncology Department, Clínica Universidad de Navarra, Universidad de Navarra, Avenida Pío XII, 36, ES-31008 Pamplona, Spain.
María Gloria Ruiz-Ilundain, Anesthesiology Department, Clínica Ubarmin, Pamplona, Spain.
Carlos Alfaro Alegría, Gene Therapy Unit CIMA, Universidad de Navarra Pamplona, Spain.
Ramon Colomer, MD Anderson International España, Madrid, Spain.
Jesús García-Foncillas, Medical Oncology Department, Clínica Universidad de Navarra, Universidad de Navarra, Avenida Pío XII, 36, ES-31008 Pamplona, Spain.
Ignacio Melero Bermejo, Gene Therapy Unit CIMA, Universidad de Navarra Pamplona, Spain. Internal Medicine Department, Clínica Universidad de Navarra, Universidad de Navarra, Pamplona, Spain.