Systems biology offers cutting-edge tools for the study of complementary and alternative medicine (CAM). The advent of ‘omics’ techniques and the resulting avalanche of scientific data have introduced an unprecedented level of complexity and heterogeneous data to biomedical research, leading to the development of novel research approaches. Statistical averaging has its limitations and is unsuitable for the analysis of heterogeneity, as it masks diversity by homogenizing otherwise heterogeneous populations. Unfortunately, most researchers are unaware of alternative methods of analysis capable of accounting for individual variability. This paper describes a systems biology solution to data complexity through the application of parsimony phylogenetic analysis. Maximum parsimony (MP) provides a data-based modeling paradigm that will permit a priori stratification of the study cohort(s), better assessment of early diagnosis, prognosis, and treatment efficacy within each stratum, and a method that could be used to explore, identify and describe complex human patterning.
Systems biology offers sophisticated objective tools for investigating how complementary and alternative medicine (CAM) treatments could result in complex and individualized effects on the body [1–5]. Most scientific research aims to identify the hidden patterns that exist within a population or among populations by decoding data complexity. These patterns could be biological variations or behavioral patterns, depending on the hypothesis being probed. Statistical methods have predominantly been employed to support the existence or absence of such patterns, whether the subject of the study is a human population, a plant family, or a cell culture.
However, recent evidence suggests that data averaging has significant limitations when dealing with heterogeneity as it masks intrapopulation diversity and homogenizes otherwise heterogeneous subpopulations. Heterogeneity at all levels is a product of evolution; it confers better fitness on individuals and thus positions populations to survive bottleneck events. Evolutionary processes produce heterogeneity at many levels, from the cellular (e.g., genes, chromosomes, genomes, epigenetics, and tissues) to the behavioral (e.g., dietary, exercise, and health promotion patterns). Variations at these levels constitute the basis of natural selection [6, 7].
Recent whole-genome sequencing projects have shown the presence of millions of variations as single-nucleotide polymorphisms (SNPs), small insertions and deletions, and copy number variations (CNVs). However, the lack of proper analytical tools has reduced the significance of genetics studies and prevented meaningful interpretation of the data [9, 10].
The scientific community is addicted to statistical and phenetic approaches, and despite their inapplicability to certain high-throughput high-dimensional biological data, statistical parameters continue to be invoked even when their usefulness is doubtful. The commonly cited reason for this is the perceived absence of an alternative; but as we will detail in this paper, such alternatives indeed exist, and they should be studied and employed. They are based on the fact that heterogeneity, whether in normal or disease conditions, is an evolution-based phenomenon that has to be dealt with by applying evolutionarily compatible methods.
Although the paradigmatic and methodological argument is broadly applicable across domains and disciplines, we will present the case for the proposed approach using biological exemplars. Heterogeneity has implications for many aspects of research and clinical practice. It necessitates compensating for individual variations that produce significant differences in rates of treatment efficacy, effectiveness, and side effects as well as responses to various therapeutic modalities, including whole systems of complementary and alternative medicine (WS-CAM) [12, 13]. For example, in a clinical trial where the study population encompasses individuals who are poor responders to a particular treatment, the treatment’s effects in good responders will be underestimated. As in other fields, this is an issue that is particularly important in WS-CAM research, where personalized intervention packages and individualized trajectories of treatment response are the norm [14, 15].
Rather than focusing on the commonalities of certain genes, metabolites, or proteins, profiling heterogeneity is better suited for dynamic systems. Prior to 1966, natural populations were assumed to be more or less genetically uniform, until Lewontin and Hubby and Harris demonstrated that polymorphisms are common throughout populations. Today, we recognize variation at several levels: variation at the gene-nucleotide level can manifest as mutations and genetic polymorphisms, while heterogeneity at the genome-chromosome level can be present as CNVs, loss of heterozygosity (LOH), and epigenetic heterogeneity such as DNA methylation, non-coding RNAs, or chromosomal folding. Additionally, there are changes that take place independent of epigenetic alterations; these are influenced by environmental factors and are affected by nutrition, stress, exposure, and immune responses.
The recent clinical trials of targeted biomedical cancer treatments are examples of the current reductionist trend that has produced mostly disappointing results. However, the failed targeted treatment approach has served the purpose of bringing the issue of heterogeneity to the forefront of scientific thought [18, 20].
More recently, recognizing the ubiquity of heterogeneity in complex systems and the negative effects of ignoring it, statisticians and researchers are calling for the two-stage study, whereby in the first stage, the study group is stratified into well-defined but broad populations using traditional experimental methods, followed by the construction of subgroups in the second stage [12, 21]. Similarly, CAM researchers previously identified a need for two-stage diagnosis, with the conventional medicine disease entity group diagnosis followed by the individualized WS-CAM diagnosis. Although the two-stage approach can be achieved fairly readily in a small non-complex situation with one to a few variables, it becomes difficult to conduct when the contributing variables are scaled up to the tens, hundreds, or thousands [22–26].
In a disease context, data heterogeneity can point to several phenomena, such as inter- and intra-specimen diversity in diseased specimens, a high rate of variability generation, and multiple pathways of disease development [18, 20]. Additionally, the disease process is further complicated by the multiphasic and dynamic nature of some pathologies, such as cancer and degenerative diseases, which raises the question of whether a multiphasic and dynamic process can be modeled by a bioinformatic paradigm.
Data interpretation requires analysis and synthesis compatible with the existing biological conceptual framework(s) and hypotheses. Thus, a biologically compatible analytical paradigm should incorporate four elements: the high-throughput data (e.g., genomics, metabolomics, proteomics), the disease phenotypes (e.g., hyperplasia, primary tumor, metastatic tumor), evolutionary theory, and bioinformatics (an analytical algorithm that processes the data). Parsimony phylogenetics offers an analytical algorithm that can bring these elements together to achieve novel multidimensional systems biology synthesis without the traditional overdependence on statistical methods.
Phylogenetics, also termed cladistics, is an analytical paradigm based on the principles of evolution. Its current codes, known as phylogenetic systematics, were laid down in the mid-1950s by the German systematist Willi Hennig. Phylogenetics differs from other systems of classification in that, rather than using overall similarity to classify objects, it utilizes shared derived similarity as evidence of relatedness. The practice has been applied in many fields such as botany, microbiology, and zoology to construct relationships among species, populations, and individuals in an evolutionary sense.
The goal of a phylogenetic analysis is to model the data to produce a hypothesis of relationships among the specimens under study that accurately reflects the biological processes that led to the diversity of specimens. Phylogenetics constructs the hypotheses of relationships by sorting data points into ancestral (normal/within the normal parameters) and derived (abnormal, above or below the selected baseline, or falling outside the normal range) categories, and then grouping together the specimens that share the same derived states [28, 29]. The process of sorting out data points into derived and ancestral states is termed polarity assessment, data polarization, or outgroup comparison. The derived states represent the aberrations or the new changes; for example, in a disease, the aberration can be an overexpression of a gene, up-regulation of a protein, or a mutation.
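Polarity assessment by outgroup comparison can be illustrated with a minimal sketch. It assumes that the "normal" baseline for each character is the closed range of values observed in the outgroup specimens; the function name `polarize`, the specimen names, and the expression values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def polarize(values: Dict[str, List[float]], outgroup: List[str]) -> Dict[str, List[int]]:
    """Score each value 0 (ancestral) if it lies within the outgroup range
    for that character, and 1 (derived) if it falls outside that range."""
    n_chars = len(next(iter(values.values())))
    # Per-character baseline: lowest and highest value seen in the outgroup.
    lows = [min(values[s][j] for s in outgroup) for j in range(n_chars)]
    highs = [max(values[s][j] for s in outgroup) for j in range(n_chars)]
    return {
        name: [0 if lows[j] <= row[j] <= highs[j] else 1 for j in range(n_chars)]
        for name, row in values.items()
    }


# Hypothetical expression values for three genes (illustrative only).
expression = {
    "benign_1": [5.0, 2.0, 7.0],
    "benign_2": [6.0, 2.5, 7.5],
    "tumor_1": [9.0, 2.2, 7.1],
    "tumor_2": [9.5, 1.0, 7.4],
}
matrix = polarize(expression, outgroup=["benign_1", "benign_2"])
for name, row in matrix.items():
    print(name, row)
```

In this toy case the outgroup specimens are scored 0 throughout by construction, and a tumor value counts as derived whether it lies above or below the benign range.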
In phylogenetic terminology, a shared derived state is termed synapomorphy (a potential biomarker); because sharing a synapomorphy is indicative of a relationship, a group of specimens that share one or more synapomorphies is called a clade. Phylogenetics presents its hypotheses in a graphical tree format called the cladogram (fig. 1), which is a map of clades (groupings) and their supporting synapomorphies.
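The notion of a synapomorphy can be made concrete with a short sketch. It assumes a strict reading: a character qualifies only if it is derived (1) in every member of the candidate clade and ancestral (0) in every other specimen. Real analyses read synapomorphies off the tree and may tolerate some conflict among characters; the function name and the toy matrix are hypothetical.

```python
from typing import Dict, Iterable, List


def synapomorphies(matrix: Dict[str, List[int]], clade: Iterable[str]) -> List[int]:
    """Return the indices of characters that are derived in all clade members
    and ancestral in all specimens outside the clade."""
    members = set(clade)
    n_chars = len(next(iter(matrix.values())))
    shared = []
    for j in range(n_chars):
        derived_in_all = all(matrix[s][j] == 1 for s in members)
        absent_outside = all(matrix[s][j] == 0 for s in matrix if s not in members)
        if derived_in_all and absent_outside:
            shared.append(j)
    return shared


# Hypothetical polarized matrix (rows: specimens, columns: characters).
polarized = {
    "benign_1": [0, 0, 0, 0],
    "primary_1": [1, 0, 1, 0],
    "primary_2": [1, 0, 0, 0],
    "met_1": [1, 1, 0, 0],
    "met_2": [1, 1, 1, 0],
}
cancer_clade = synapomorphies(polarized, ["primary_1", "primary_2", "met_1", "met_2"])
metastatic_clade = synapomorphies(polarized, ["met_1", "met_2"])
print(cancer_clade, metastatic_clade)
```

Character 2 is derived in some specimens of both groups but shared by neither group as a whole, so it supports no clade under this strict definition.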
There are several methods for constructing phylogenetic cladograms (trees); among these are parsimony, maximum likelihood, and neighbor joining. They differ in their algorithmic functions and the type of data they handle. These methods have been compared, and parsimony has turned out to be the most suitable for the purposes of dealing with heterogeneous high-throughput data of various diseases. Parsimony, also known as Occam’s razor or the ‘principle of simplicity’, is generally defined as selecting the simplest hypothesis among competing ones. In phylogenetic analysis, it is the hypothesis that requires the fewest steps to construct, i.e., the shortest tree/cladogram, which is usually called the most parsimonious tree. A parsimonious approach produces a multidimensional analytical tool that is data based, not specimen based, and that accounts for and integrates disease heterogeneity, the nature of biological data, and the principles of evolution [30, 31]. Yet, it is important that any analytical method has high predictive power; it must be able to differentiate between groups of people (e.g., those who are healthy from those with disease, or those who respond to treatment from those who do not), present the changes that distinguish between the two groups, show the transitional specimens that fall in between the two states, and stratify populations.
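The idea of the shortest tree can be sketched by counting the steps that each candidate tree requires. The sketch below uses Fitch's counting procedure on rooted binary trees written as nested tuples. This is a stand-in chosen for brevity, not the procedure of any particular program, and it compares only two hand-written topologies rather than searching tree space. All names and data are hypothetical.

```python
from typing import Dict, List, Union

Tree = Union[str, tuple]


def fitch_length(tree: Tree, states: Dict[str, int]) -> int:
    """Minimum number of state changes that one character needs on a tree."""
    changes = 0

    def visit(node: Tree) -> set:
        nonlocal changes
        if isinstance(node, str):  # leaf: the observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:  # children agree: no step needed here
            return common
        changes += 1  # children disagree: one step
        return left | right

    visit(tree)
    return changes


def tree_length(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Total steps over all characters; the shortest tree is preferred."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {s: row[j] for s, row in matrix.items()})
        for j in range(n_chars)
    )


# Hypothetical polarized data: one change shared by all tumors, two by metastases.
toy = {
    "benign": [0, 0, 0],
    "primary_a": [1, 0, 0],
    "primary_b": [1, 0, 0],
    "met_a": [1, 1, 1],
    "met_b": [1, 1, 1],
}
tree_1 = ("benign", (("primary_a", "primary_b"), ("met_a", "met_b")))
tree_2 = ("benign", (("primary_a", "met_a"), ("primary_b", "met_b")))
len_1, len_2 = tree_length(tree_1, toy), tree_length(tree_2, toy)
print(len_1, len_2)  # tree_1 needs fewer steps, so it is the more parsimonious
```

The first topology explains each character with a single change, whereas the second must assume that the two metastatic characters arose twice, which is why parsimony rejects it.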
We provide an example to illustrate the application of parsimony analysis to gene expression microarray data. We selected dataset GDS1439 from the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov/sites/entrez?db=gds), which contains 6 benign specimens, 7 primary, and 6 metastatic prostate carcinoma specimens. When dealing with large datasets that contain thousands of variables, especially datasets obtained from high-throughput microarrays or mass spectrometry, parsimony analysis is carried out in two steps. The first step is polarity assessment: through outgroup comparison, each data point is sorted as either derived (abnormal in the case of disease phenotypes) or ancestral (normal). Polarity assessment transforms the continuous quantitative gene expression values into discrete zeros (0s) and ones (1s), where zero indicates that the value is ancestral (normal) and one indicates that the value is different and therefore assumed to be derived in an evolutionary sense. The resulting matrix of polarized binary values contains only zeros and ones, and it is this matrix that is processed by the parsimony algorithm. The second step is the processing of the polarized values through a maximum parsimony algorithm to classify the specimens into a cladogram. In our example, polarity assessment was carried out by comparing the expression values of the cancerous specimens against the range of the benign specimens for every gene in the dataset. The new matrix was processed with the computer program MIX (the parsimony program of the PHYLIP package) using Wagner’s parsimony method, which produced only one most parsimonious tree/cladogram (fig. 1).
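As a rough illustration of the hand-off between the two steps, the sketch below serializes a polarized matrix into the plain-text layout that PHYLIP's discrete-character programs are generally documented to read: a header line with the number of species and characters, then one line per specimen with a name padded to 10 characters followed by its 0/1 states. The exact layout should be checked against the PHYLIP documentation for the version in use. The function name and the data are hypothetical.

```python
from typing import Dict, List


def to_phylip_discrete(matrix: Dict[str, List[int]]) -> str:
    """Format a 0/1 matrix as PHYLIP-style discrete-character input text."""
    names = list(matrix)
    n_chars = len(matrix[names[0]])
    lines = [f"{len(names):5d}{n_chars:5d}"]
    for name in names:
        # Names occupy exactly 10 columns; longer names are truncated,
        # so the caller must make sure truncated names remain unique.
        lines.append(f"{name[:10]:<10}" + "".join(str(v) for v in matrix[name]))
    return "\n".join(lines) + "\n"


# Hypothetical polarized matrix (illustrative only).
polarized_matrix = {
    "benign_1": [0, 0, 0],
    "benign_2": [0, 0, 0],
    "tumor_1": [1, 0, 0],
    "tumor_2": [1, 1, 0],
}
infile_text = to_phylip_discrete(polarized_matrix)
print(infile_text)
```

The resulting text would typically be saved as the program's input file before the parsimony run; options such as the parsimony method are chosen inside the program itself and are not part of this sketch.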
The cladogram of figure 1 is a graphical summary that showed the groupings (clades), their synapomorphies, and a topology that reflects the relationships among the clades and the direction of change accumulation among the clades and their specimens. The analysis showed the primary and metastatic specimens grouped separately from each other in two groups; while the metastatic occupied the top of the cladogram, the primary was nested in between the metastatic and benign clades. Separating the metastatic cases from the primary ones on the basis of their gene expression is an excellent outcome that confers confidence that this approach has good validity. The primary and metastatic clades shared a list of 302 synapomorphies (uniquely shared derived expressions or potential biomarkers in a biomedical sense) that separate them from the benign clades. The metastatic specimens at the top end of the cladogram are separated from the primary specimens by a list of 577 synapomorphies that are shared by their respective specimens. The cladogram topology has directionality; the specimens with the highest number of derived states occupy the upper part of the cladogram. Therefore, one could interpret the tandem arrangement of the primary and metastatic groups as a sequential relationship, where the initiation of the cancer required 302 derived gene expressions, while the transformation to a metastatic phenotype required an additional 577 changes.
As our example demonstrates, maximum parsimony has efficiently and accurately modeled the heterogeneous expression profiles of the diseased specimens, in this case, cancer with a rapid mutation rate. The analysis precisely classified the phenotypes (or genetic patterns) based on modeling of the disease genotypes (gene expressions); there was no mixing of the three phenotypes, and the gene expression data were perfectly congruent with their phenotypes.
The process of data polarization has the added advantage of reducing measurement variability. By transforming data points to distinct 1s and 0s, the comparison between specimens becomes qualitative rather than based on absolute quantitative values. Polarization of the data allows pooling of multiple experiments, and therefore facilitates intra- and inter-compatibility of the observed clades, types or classes. In this regard, the analysis is a systems biology approach that can pool data from related diseases to identify the common aberrations and differential features among them, e.g., several cancer types or WS-CAM diagnostic subgroups [14, 15, 17, 35]. For example, Alraek and Baerheim subgrouped their cystitis patients into three groups: (1) spleen yang/qi xu, (2) kidney yang/qi xu, and (3) liver qi stagnation; such subgrouping can be carried out more objectively by a phylogenetic analysis. Also, as Frei et al. have shown, subgrouping of patients with attention deficit hyperactivity disorder (ADHD) before the commencement of a trial is important in order to avoid failure, since patients vary in their response to treatment and poor responders require alternative medication (see below on the use of phylogenetics for stratification before a clinical trial).
From a practical aspect, a parsimony approach can be translated into a clinical setting for diagnosis, prognosis, and post-treatment evaluation. By constructing a comprehensive cladogram that incorporates many diseases (for example, a tree of cancer), the cladogram becomes an instrument for diagnosis. To diagnose a case, data can be entered into the comprehensive cladogram, thus placing the case on the cladogram. This approach might also facilitate more accurate prescriptive practices, like those used in homeopathy, in which the process of choosing an individualized remedy or therapeutic schema often requires categorizing each patient’s global homeopathic phenotype, i.e., remedy type, by kingdom (animal, plant, or mineral) and specific family.
Parsimony phylogenetics could also be applied to WS-CAM and integrative therapies research. CAM clinicians have often claimed that a portion of the population responds positively to a particular therapy (responders) while others seem to have little to no change in outcomes (non-responders) [14, 15, 35, 37, 38]. Clinical trial limitations, often cited in the CAM literature, have fueled a call for new methodological strategies and sophisticated analyses. The successful analytical tool is the one that does not obviate or underplay heterogeneity-driven variability. Solutions such as genomic control, which adjusts association statistics for each marker by a uniform overall inflation factor, compensate only partially for heterogeneity [39, 40].
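The limitation of genomic control mentioned above follows directly from how the correction is computed. In the minimal sketch below, the inflation factor is the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549), floored at 1 by convention, and every marker is divided by that same factor. The statistics shown are hypothetical.

```python
from statistics import median
from typing import List, Tuple

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(stats: List[float]) -> Tuple[float, List[float]]:
    """Return the inflation factor and the uniformly deflated statistics."""
    lam = max(1.0, median(stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in stats]


# Hypothetical per-marker association statistics (illustrative only).
observed = [0.2, 0.5, 0.91, 1.4, 12.0]
lam, adjusted = genomic_control(observed)
print(round(lam, 3), [round(a, 3) for a in adjusted])
```

Because a single factor rescales all markers, the adjustment cannot distinguish markers that are strongly affected by population heterogeneity from those that are not, which is the sense in which it compensates only partially.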
Parsimony phylogenetic methods can be used to differentiate among responder types by classifying participants into responder-type clades, based on shared synapomorphies or sets of intra-population characteristics. Thus, a wider set of study participants could be enrolled into CAM clinical trials, more closely aligning the trial population with those seen in clinical practice. CAM researchers could better evaluate treatment-effect variability and treatment-related risks, while predicting those persons who are most likely to benefit from a particular CAM therapy in a given situation [15, 41].
Others have suggested conducting multiple trials of treatment on each individual in an ‘n-of-one’-type design in order to minimize the data heterogeneity. Phylogenetic analysis would allow pooling of data across these types of studies to again identify responders and non-responders. Thus, information gained from clinical trials could be more informative and more easily extrapolated to the clinic.
We propose a three-stage clinical trial model that starts with a stratification of the study sample based on phenotypic and genotypic characters. A pre-trial phase using a priori stratification by parsimony phylogenetics will delimit the subpopulations that share common biological traits (classes) (fig. 2). A small blood sample subjected to high-throughput analysis could provide the data needed for stratification. Because the stratification is done without a priori weighing of variables, parsimony may also reveal the variables that define the subpopulation partitioning. Essentially, in the first stage, the recruitment could include a wide spectrum of inclusion criteria in order to embrace a heterogeneous study population that would reflect the ‘real-world’ setting. Based on the identified clades that reveal relatedness among groups and subgroups, the second stage of the trial could be implemented knowing that we reached a level of homogeneity at baseline within each clade. Thus, each clade could then be randomized to either control or intervention, depending on the clinical trial type.
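The second stage, randomization within each clade, amounts to stratified randomization with the clade as the stratum. The sketch below assumes a two-arm trial and a simple shuffle-then-alternate scheme; the clade labels, participant identifiers, and function name are hypothetical, and an actual trial would follow its own prespecified allocation procedure.

```python
import random
from typing import Dict, List


def randomize_within_clades(clades: Dict[str, List[str]], seed: int = 0) -> Dict[str, str]:
    """Assign participants to arms separately within each clade so that
    the two arms stay balanced (within one participant) in every clade."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    assignment = {}
    for members in clades.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, person in enumerate(shuffled):
            assignment[person] = "intervention" if i % 2 == 0 else "control"
    return assignment


# Hypothetical clade memberships obtained from the pre-trial stratification.
clades = {
    "clade_A": ["p01", "p02", "p03", "p04"],
    "clade_B": ["p05", "p06", "p07"],
}
arms = randomize_within_clades(clades)
for clade, members in clades.items():
    print(clade, {p: arms[p] for p in members})
```

Randomizing inside each clade, rather than across the whole sample, keeps the comparison between arms within groups that are already relatively homogeneous at baseline.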
Among the individuals of each clade there will probably be variable levels of responsiveness to the intervention, but most likely less variability than between clades. The cladogram will serve as a dynamic database for the implementation of the third stage, which corresponds to the translation of the clinical trial findings to the clinic. This means that, prior to using a new therapy or intervention, the patient will need to submit a blood sample to determine his/her clade. Clade membership determines the treatment options: what dosage the patient will need and how responsive to treatment he/she will be. Thus, the health care provider could make an informed decision when recommending a particular therapy or implementing a particular type of treatment. Potentially, this could lead to decreased treatment-related risk, improved outcomes, decreased costs, and improved treatment efficiency. Furthermore, in WS-CAM where the practitioner stratifies patients on categorical yet unifying classes (types of doshas, humors, temperaments, or imbalances in interrelationships among elements), this method could offer a modern verification of these concepts and potentially include them in the clinical design.
The advantages of this three-stage clinical trial design can be summarized in three points: (i) by employing parsimony analysis to carry out the pre-trial population stratification into natural clades, the baseline heterogeneity per clade will be significantly reduced; (ii) individuals’ positions within a clade determine their treatment options; and (iii) by using the clades as a dynamic database, the physician could prescribe the suitable treatment on the basis of the patient’s clade membership. Our proposed three-stage clinical trial design encompasses the practice of personalized medicine using a systems biology approach that addresses most of the currently debated issues of baseline heterogeneity, data heterogeneity, treatment-related risk, and translation of trial findings to the clinic.
Using parsimony phylogenetics as a means to account for heterogeneity from the subcellular to the whole-human behavioral level of function holds extraordinary promise in expanding clinical knowledge related to CAM therapies, and clinical treatment in general. Using a systems biology approach and putting data into an algorithm that can accurately model subtypes of people/phenomena in an evolutionary context offers a novel methodology to clinical design. This stands to deepen clinician understanding and confidence for matching interventions to those who are most likely to benefit.
The authors declared no conflict of interest.