Cocaine dependence (CD) is a multifactorial disorder, variable in its manifestations, and heritable. We examined the concurrent validity of homogeneous subgroups of CD as phenotypes for genetic analysis. We applied data reduction methods and an empirical cluster-analytic approach to measures of cocaine use, cocaine-related effects, and cocaine treatment history in 1393 subjects, from 660 small nuclear families. Four of the six clusters that were derived yielded heritability estimates in excess of 0.3. Linkage analysis showed genomewide significant results for two of the clusters. Here we examine the concurrent validity of the six clusters using a variety of demographic and substance-related measures. In addition to being differentiated by a variety of cocaine-related measures, the clusters differed significantly on measures that were independent of those used to generate the clusters, i.e., demographic features and prevalence rates of co-morbid substance use and psychiatric disorders. These findings support the validity of the methods used to derive homogeneous subgroups of CD subjects and the resulting CD subtypes. Independent replication of these findings would provide further validation of this approach.
Cocaine use is widespread in the U.S., producing a variety of adverse medical and neuropsychiatric effects (Wolff and O’Donnell, 2004; Karch, 2005; Nnadi et al., 2005; Substance Abuse and Mental Health Services Administration, 2004). Cocaine dependence (CD), as a broad diagnostic entity, is a complex, heterogeneous, multifactorial disorder that includes cognitive, behavioral, and physiologic features. One way to reduce the heterogeneity of CD is the use of a typologic (or subtyping) approach (Babor and Dolinsky, 1988; Epstein, 2001). The validity of such an approach can be evaluated in terms of the utility of the subtypes for understanding etiology, presentation, natural history, or response to treatment of individuals with CD.
Vulnerability to the development of CD varies among individuals. Studies in animals and humans have examined the relative contributions of environmental and genetic factors in the etiology of substance dependence (Uhl et al., 1995). Elucidating the genetic basis of CD would represent major progress in understanding the etiology of the disorder and could contribute substantially to the effort to develop efficacious medications to treat the disorder. This effort has, to date, been largely unsuccessful (Kosten et al., 2005). The failure to identify efficacious medications to treat cocaine dependence may reflect an inadequate understanding of the heterogeneity of the disorder or an incomplete understanding of the pathophysiology of the disorder, with inadequate specification of potential medication-responsive dimensions.
Twin studies have shown that dependence on cocaine and other stimulants is genetically influenced (Tsuang et al., 1996; Kendler and Prescott, 1998; Kendler et al., 2003). Although these studies considered CD as a single diagnostic entity, there appear to be multiple subtypes of CD (Weiss and Mirin, 1986). If the broader set of CD subjects could be decomposed into valid subgroups, members of which were more similar phenotypically to each other than to members of other groups, these more homogeneous subgroups could provide a basis for more powerful genetic analysis.
Univariate approaches for subtyping CD have focused on co-occurring psychopathology (Rounsaville et al., 1991), family history of substance abuse (Roehrich and Gold, 1988) and personality dimensions (Craig and Olson, 1992; Ball et al., 1998). Multivariate approaches to subtyping CD and other drug dependence (Ball et al., 1995; Feingold et al., 1996; Cohen, 1999; Basu et al., 2004) have used an empirical (k-means) clustering technique that was first applied to differentiate alcoholics into Type A (i.e., low-risk/severity) and Type B (i.e., high-risk/severity) subtypes (Babor et al., 1992). The present study describes a multivariate cluster analytic approach that has been refined from those used previously to yield cluster assignments for use in a genome-wide linkage analysis.
In our genome-wide linkage study of CD (Gelernter et al., 2005), we examined CD both as a unitary diagnostic entity and as a set of subgroups, identified by cluster analysis, that may vary in heritability and in the genes underlying their vulnerability. Using a six-cluster solution, we obtained genomewide significant linkage results for two of these clusters. In this manuscript, we provide evidence for the concurrent validity of these clusters by comparing the subtypes on a variety of cocaine-related features, as well as on the prevalence of co-morbid substance use and psychiatric disorders.
We recruited 1393 subjects, from 660 small nuclear families. Of these families, 482 had at least 2 siblings with CD, 207 had at least 2 siblings with opioid dependence, and 156 had at least 2 siblings with both CD and opioid dependence. The average age of subjects was 39.2 years (range 17-79) and 51.8% were women. The majority of subjects (57.1%) were never married, 27% were divorced, separated, or widowed, and 15.9% were married. The ethnic/racial distribution of the sample was 49.6% African-American (AA), 33.0% European-American (EA), 12.6% Hispanic, and 4.8% Native American, Pacific Islander or members of other minority groups (according to subject self-report). With respect to level of education, 7.3% had only completed grade school; 39.5% had some high school, but no diploma; 30.7% had completed high school; and 22.6% had education beyond high school.
The most common lifetime DSM-IV (American Psychiatric Association, 1994) substance use and psychiatric disorders are shown in Table 1, both in the aggregate and separately by sex. Nearly 90% of individuals were cocaine dependent, as might be expected given that ascertainment was based primarily on that diagnosis. Nicotine dependence was the next most common diagnosis, with approximately two-thirds of individuals receiving that diagnosis, followed by alcohol dependence and opioid dependence, each with a prevalence of about 45%, and cannabis dependence, occurring in just over one-quarter of the sample. Major depressive episode (MDE) was the most common psychiatric disorder, with a prevalence of nearly 15%, followed by antisocial personality disorder (ASPD), which was diagnosed in about 12% of individuals. The frequency of posttraumatic stress disorder (PTSD) and compulsive gambling each approached 10%.
There was no sex difference in the prevalence of CD or nicotine dependence, or the less commonly diagnosed sedative dependence and stimulant (other than cocaine) dependence. However, men were significantly more likely to receive diagnoses of alcohol dependence, opioid dependence, and other substance (i.e., PCP, hallucinogens, inhalants, solvents, or combinations such as “speedballs”) dependence, as well as ASPD and compulsive gambling. In contrast, women were more likely to receive a diagnosis of MDE, PTSD, panic disorder, or agoraphobia.
Small nuclear families were recruited for participation in genomewide linkage studies of CD and opioid dependence. All subjects gave written, informed consent to participate, using procedures approved by the institutional review board at each participating site. Inclusion required the participation of a sibling pair, both members of which were affected with a lifetime diagnosis of either CD, opioid dependence, or both. Additional family members, including parents and other siblings (irrespective of their having a lifetime substance dependence diagnosis), were also invited to participate. Recruitment was conducted at substance abuse treatment programs and through clinical referrals and advertisements in local media in Connecticut (through sites at Yale University and the University of Connecticut); Boston, MA (McLean Hospital) and Charleston, SC (Medical University of South Carolina). Data were submitted electronically to a database at Boston University, where analysis was conducted.
Phenotypic information was obtained through administration of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA; Pierucci-Lagha et al., 2005), which includes a separate section for the diagnosis of cocaine dependence. Questions from the Addiction Severity Index (McLellan et al., 1992) were added to the SSADDA to allow the estimation of lifetime measures of substance (including cocaine) use.
The SSADDA was formatted for computer-assisted administration, incorporating automated skip patterns and logical data-entry checks. The reliability of the SSADDA has been shown to be good-to-excellent for all of the major substance dependence diagnoses (Pierucci-Lagha et al., 2005). Specifically, the test-retest and inter-rater reliabilities for cocaine dependence diagnosed using criteria from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV; American Psychiatric Association, 1994) were 0.92 and 0.83, respectively. Items (n = 68) from the cocaine use disorder section of the SSADDA were used to generate the clusters. These included age of onset, frequency, and intensity of cocaine use, route of cocaine administration, occurrence of psychosocial and medical consequences of cocaine use, quit attempts, and cocaine abuse treatment sought and received.
All items from the cocaine use disorders section of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA; Pierucci-Lagha et al., 2005) were used initially in the development of subtypes. Multiple correspondence analysis (MCA) (Greenacre, 1984; Lebart et al., 1984), a non-parametric data reduction method, was used to identify the underlying dimensions in the data from the CD section of the SSADDA (as briefly described previously in Gelernter et al., 2005). The binary items covered cocaine use characteristics, cocaine-related effects, and cocaine treatment history. Each study participant with phenotypic data was assigned a score on each of the retained dimensions (data not shown). An iterative k-means partitioning, using nearest centroid sorting (Hirano et al., 2003), was then conducted using several different starting points and a larger than expected number of clusters (k = 50). Starting with a number of clusters far higher than any we would accept as a final solution allows the identification of stable small groups. The next stage involved cross-classification of the results of the k-means clustering and retention of the groups whose members consistently clustered together. These groups and the remaining observations were then entered into an agglomerative hierarchical clustering using Ward’s method (Ward, 1963), which at each step merges the pair of groups whose union produces the smallest increase in the within-group sum of squares, to identify the final cluster structure. Because the stable groups from the k-means stage enter the agglomerative process as intact units, this combined procedure avoids the idiosyncrasies that can arise from different starting seeds for k-means clustering (Hirano et al., 2003). It thereby retains the strengths of both types of clustering while mitigating their weaknesses. The final subgroup structure was selected by comparing the within-group to between-group variation on the items used to form the groups, as well as the group profiles on other variables.
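To make the MCA step concrete, the following is a minimal Python/NumPy sketch of correspondence analysis applied to the indicator (complete disjunctive) coding of binary items, returning row principal coordinates that play the role of the participant scores described above. It is not the SPAD implementation: the function names, the use of a plain SVD, and the simulated toy items are assumptions made for illustration.

```python
import numpy as np


def indicator_matrix(items: np.ndarray) -> np.ndarray:
    """Complete disjunctive coding: each 0/1 item becomes two columns (absent, present)."""
    items = np.asarray(items, dtype=int)
    z = np.concatenate([items == 0, items == 1], axis=1).astype(float)
    return z[:, z.sum(axis=0) > 0]  # drop categories nobody used (zero column mass)


def mca_row_scores(items: np.ndarray, n_dims: int = 2):
    """Correspondence analysis of the indicator matrix.

    Returns row principal coordinates on the first n_dims dimensions and
    the principal inertias (squared singular values) of all dimensions.
    """
    z = indicator_matrix(items)
    p = z / z.sum()                         # correspondence matrix
    r = p.sum(axis=1)                       # row masses
    c = p.sum(axis=0)                       # column masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)  # standardized residuals
    u, sv, _ = np.linalg.svd(s, full_matrices=False)
    scores = u[:, :n_dims] * sv[:n_dims] / np.sqrt(r)[:, None]
    return scores, sv ** 2


# Toy data (hypothetical): 200 people, 10 binary items, two latent severity groups.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=200)
endorse_prob = np.where(group[:, None] == 1, 0.8, 0.2)
items = (rng.random((200, 10)) < endorse_prob).astype(int)

scores, inertia = mca_row_scores(items, n_dims=2)
print(scores.shape, round(inertia.sum(), 6))
```

With J binary items and all 2J categories observed, the principal inertias sum to (2J - J)/J = 1, which is a convenient sanity check; in practice only the leading dimensions would be retained and passed on to the clustering stage.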
We considered solutions with larger numbers of clusters, though with more than six clusters, single individuals or small groups of outliers emerged, making it impossible to interpret the solution.
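The multi-stage procedure described above (many-cluster k-means from several seeds, cross-classification into stable groups, then Ward agglomeration of those groups as intact units) can be sketched as follows. This is a hypothetical, simplified NumPy illustration rather than the SPAD routines used in the study; the toy data, k = 10 instead of 50, Lloyd's algorithm with centroids initialized from randomly chosen observations, and the helper names are all assumptions.

```python
import numpy as np


def kmeans(x, k, seed, n_iter=100):
    """Lloyd's k-means (nearest-centroid sorting); returns a label per observation."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # recompute centroids; keep the old centroid if a cluster empties
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def stable_groups(x, k, seeds):
    """Cross-classify several k-means runs; identical label patterns form one stable group."""
    runs = np.column_stack([kmeans(x, k, s) for s in seeds])
    _, group_id = np.unique(runs, axis=0, return_inverse=True)
    return group_id.ravel()


def ward_merge(x, group_id, n_final):
    """Agglomerate intact groups with Ward's criterion until n_final clusters remain."""
    clusters = {g: np.flatnonzero(group_id == g) for g in np.unique(group_id)}
    while len(clusters) > n_final:
        keys = list(clusters)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                ia, ib = clusters[a], clusters[b]
                diff = x[ia].mean(axis=0) - x[ib].mean(axis=0)
                # increase in within-cluster sum of squares if a and b are merged
                cost = len(ia) * len(ib) / (len(ia) + len(ib)) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = np.concatenate([clusters[a], clusters.pop(b)])
    labels = np.empty(len(x), dtype=int)
    for new_label, idx in enumerate(clusters.values()):
        labels[idx] = new_label
    return labels


# Toy data (hypothetical): three well-separated groups in two MCA-like dimensions.
rng = np.random.default_rng(1)
true = np.repeat([0, 1, 2], 60)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x = centers[true] + rng.normal(scale=0.5, size=(180, 2))

groups = stable_groups(x, k=10, seeds=range(5))  # the study used k = 50
final = ward_merge(x, groups, n_final=3)         # the study retained 6 clusters
print(len(np.unique(groups)), "stable groups ->", len(np.unique(final)), "clusters")
```

Seeding the hierarchical stage with groups that every k-means run agreed on is what removes the dependence on any single set of starting seeds; the quadratic pair search is acceptable here only because the number of stable groups is small.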
Binary logistic regression was used to estimate the probability of cluster membership for each study participant in each of the clusters. Variables selected for clustering were used in the estimation of the probability of cluster membership. Each study participant was assigned a score on each of the retained dimensions using a procedure similar to the assignment of factor scores. The natural logarithm of the probability of membership in each group was the dependent measure in the quantitative trait analyses. Chi-square analysis was used to compare clusters on categorical measures and analysis of variance (ANOVA) was used to compare clusters on continuous measures.
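A minimal sketch of the cluster-membership phenotype: for each cluster, a binary logistic regression of membership on the clustering dimensions, with the natural log of the fitted probability used as the quantitative trait. The small ridge (L2) penalty is an assumption not stated in the report; it keeps the coefficients finite, because membership is nearly a deterministic function of the variables used to form the clusters. Function names and simulated data are hypothetical.

```python
import numpy as np


def fit_logistic(x, y, l2=1.0, n_iter=100):
    """Ridge-penalized logistic regression fitted by Newton-Raphson.

    x: (n, p) predictors; y: (n,) 0/1 outcome. Returns [intercept, slopes...].
    The intercept is not penalized.
    """
    xd = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(xd.shape[1])
    penalty = l2 * np.eye(xd.shape[1])
    penalty[0, 0] = 0.0
    for _ in range(n_iter):
        p = np.exp(-np.logaddexp(0.0, -(xd @ beta)))  # fitted probabilities, overflow-safe
        grad = xd.T @ (y - p) - penalty @ beta
        hess = xd.T @ (xd * (p * (1.0 - p))[:, None]) + penalty
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def log_membership_probability(x, labels, cluster, l2=1.0):
    """Natural log of the fitted probability that each person belongs to `cluster`."""
    y = (labels == cluster).astype(float)
    beta = fit_logistic(x, y, l2)
    eta = np.column_stack([np.ones(len(x)), x]) @ beta
    return -np.logaddexp(0.0, -eta)  # log(1 / (1 + exp(-eta))), always <= 0


# Toy data (hypothetical): two clustering dimensions, three clusters.
rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2], 50)
x = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])[labels] + rng.normal(size=(150, 2))

# One quantitative trait per cluster, as used in the heritability and linkage analyses
traits = {c: log_membership_probability(x, labels, c) for c in range(3)}
```

Working on the log scale with `np.logaddexp` avoids overflow for people with very high or very low fitted membership probabilities.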
SPAD software (DECISIA: Paris, France; http://eng.spad.eu) was used for both the MCA and the clustering algorithms [see Ambrogi et al. (2005) for an alternate, compatible approach]. SAS software (SAS Institute, 2001) was used for subsequent analyses, including cluster profiling. Heritability of the log of probability of group membership was computed using Sequential Oligogenic Linkage Analysis Routines (SOLAR; Almasy and Blangero, 1998), with sex as a covariate. We used linkage analysis methods to identify regions of chromosomes that harbor genes influencing risk for CD; those are beyond the scope of the present report and are described in detail in Gelernter et al. (2005).
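For orientation, SOLAR estimates heritability from a variance-components (polygenic) model. Assuming the standard form, with sex as the only covariate, the model for the log probability of cluster membership y_ij of member j in family i can be written as below, where a_ij is the additive genetic effect, e_ij the residual, and Φ_i the kinship matrix of family i (so 2Φ_i equals 1 on the diagonal and 1/2 for siblings or parent-offspring pairs). The exact model options used in the study are not reported here, so this is a sketch of the general approach only.

```latex
\begin{aligned}
y_{ij} &= \mu + \beta\,\mathrm{sex}_{ij} + a_{ij} + e_{ij}, \\
\operatorname{Cov}(\mathbf{y}_i) &= 2\boldsymbol{\Phi}_i\,\sigma_a^2 + \mathbf{I}\,\sigma_e^2, \\
h^2 &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}.
\end{aligned}
```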
As shown in Table 2, cluster analysis yielded a 6-cluster solution, with the following distribution of individuals (% of the total) across the 6 clusters: Cluster 1 [“Heavy, Cocaine Use Predominant,” N = 336 (24.1%)], Cluster 2 [“Heavy, Mixed Drug Injector,” N = 303 (21.8%)], Cluster 3 [“Heavy Cocaine Use, Later Onset,” N = 350 (25.1%)], Cluster 4 [“Moderate Cocaine and Opioid Use,” N = 258 (18.5%)], Cluster 5 [“Low Cocaine and Opioid Abuse,” N = 104 (7.5%)], and Cluster 6 [“Opioid Abuse,” N = 42 (3%)]. Comparison of the clusters on demographic features showed significant differences among clusters on age, sex, race, and marital status. Clusters 1 and 3 contain predominantly AA women who were never married. Cluster 2 has the oldest subjects, contains the fewest women, and, together with Clusters 5 and 6, the fewest AAs. Clusters 2 and 6 also contain the largest percentage of individuals who were divorced, separated, or widowed. Although Clusters 4, 5, and 6 are all roughly evenly divided by sex, with the majority of individuals in each having never married, these clusters differ substantially from one another on race, with Cluster 5 containing the largest proportion of Hispanics.
The clusters were clearly differentiated on cocaine use characteristics, cocaine-related effects, and on cocaine treatment history (Table 3).
Consistent with the prevalence of CD in the clusters, Clusters 1, 2, and 3 each included a high percentage of individuals who reported having used cocaine daily or almost daily (96.2%, 96.4% and 92.0%, respectively). However, members of Clusters 1 and 2 reported earlier ages of initial cocaine use and of heaviest use than members of Cluster 3. Members of Cluster 1 were more likely than those of Clusters 2 or 3 to endorse having gotten higher and stayed higher longer than others when they first started to use cocaine. With respect to other measures of cocaine-related effects, including adverse ones, Clusters 1 and 2 were comparable to one another and generally higher than all other clusters, including Cluster 3. Subjects in Clusters 1 and 2 were more likely than those in Cluster 3 to have received formal treatment for cocaine abuse, but comparable proportions of the three clusters reported ever having attended a self-help group meeting due to cocaine abuse.
Cluster 2 members were predominantly intravenous cocaine users who progressed to their periods of heaviest cocaine use sooner after initiating cocaine use than did the members of Clusters 1 or 3 (6.8 vs. 8.3 years for both Clusters 1 and 3). The high rate of intravenous drug use is consistent with the high rate of opioid dependence in this cluster.
As shown in Table 3, members of Cluster 3 were more likely than those in Clusters 4, 5, or 6 to endorse a variety of cocaine use characteristics and cocaine-related effects. However, Cluster 3 individuals reported a later age of initial cocaine use than did Cluster 4 members, and their heaviest use occurred later than it did for members of Clusters 4, 5, or 6.
Although more than 90% of Cluster 4 members endorsed having used cocaine daily or almost daily, the prevalence of cocaine dependence was lower in this cluster than for the heavy cocaine use clusters. However, Cluster 4 had a higher prevalence of opioid dependence and a higher rate of intravenous cocaine use than did Clusters 1 or 3. More than one-third of Cluster 5 members met criteria for CD, though fewer than 3% endorsed daily or almost daily use of cocaine. However, more than 40% of this cluster met criteria for opioid dependence and nearly 20% endorsed intravenous cocaine use. Both Clusters 4 and 5 contained substantial proportions of individuals who endorsed cocaine-related effects and cocaine treatment histories.
Members of Cluster 6 reported the latest onset of both cocaine use and heavy cocaine use among the clusters. Cluster 6 also consistently had the lowest proportion of members who endorsed cocaine-related effects and cocaine treatment. However, this cluster had the second-highest prevalence of opioid dependence.
As can be seen in Table 4, more than 99% of individuals in Clusters 1-3 met lifetime DSM-IV diagnostic criteria for CD, a significantly higher percentage than that in Clusters 4, 5, or 6 (78%, 38%, and 7%, respectively, all significantly different from one another). Based on these findings and other characteristics described below, we identified these groups as Heavy Cocaine Use (Clusters 1-3), Low-to-Moderate Cocaine Use (Clusters 4 and 5), and Occasional or No Cocaine Use (Cluster 6) clusters.
Significant differences exist among the clusters on the lifetime prevalence of all other substance dependence diagnoses (Table 4). The Heavy Cocaine Use clusters showed a higher prevalence of nearly all substance dependence diagnoses than the other three clusters (which did not differ consistently from one another). Of the Heavy Cocaine Use clusters, Cluster 2 showed the highest prevalence of all categories of substance dependence, particularly opioid dependence; Clusters 4, 5, and 6 also showed a high prevalence of opioid dependence, despite having a lower prevalence of the other substance dependence diagnoses. The prevalence of cannabis dependence was highest in Clusters 1 and 2, with Clusters 3 and 4 being intermediate, and Clusters 5 and 6 having the lowest prevalence of this disorder.
Subjects in Clusters 1 and 2 also had the highest prevalence of major depressive episode (MDE), antisocial personality disorder (ASPD), and posttraumatic stress disorder (PTSD), with no consistent pattern of differences among the other clusters on these disorders. Differences in the prevalence of compulsive gambling, panic disorder, and agoraphobia showed a similar pattern, but were less pronounced, possibly due to the lower overall prevalence of these disorders.
Although not a co-morbid disorder, cocaine-induced paranoia (CIP) is a psychopathologic feature that occurs commonly among individuals with CD (Satel et al., 1991a; Brady et al., 1991; Rosse et al., 1995; Bartlett et al., 1997; Cubells et al., 2005). There is evidence that CIP is genetically influenced (Gelernter et al., 1994; Cubells et al., 2000). The prevalence of CIP varied significantly across clusters in a pattern similar to that of the co-morbid substance use and psychiatric disorders (Cluster 1: 87.5%, Cluster 2: 78.2%, Cluster 3: 52.9%, Cluster 4: 38.4%, Cluster 5: 35.6%, Cluster 6: 4.8%) [χ²(5) = 295.8, p < 0.001].
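As an illustration of the chi-square comparison of clusters on a categorical measure, the sketch below applies a Pearson test of independence to CIP by cluster. The counts are back-calculated from the percentages above and the cluster sizes reported with the six-cluster solution, so they are approximate: the result (roughly 287) approximates, but does not reproduce, the reported 295.8, and the exact counts behind that statistic are not given. The closed-form tail probability for 5 degrees of freedom avoids a SciPy dependency; both helper names are hypothetical.

```python
import math

import numpy as np


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    stat = float(((table - expected) ** 2 / expected).sum())
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, df


def chi2_sf_df5(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 5 df (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * (
        math.sqrt(x) + x ** 1.5 / 3.0
    )


# Cluster sizes from the cluster description; CIP counts back-calculated from the
# reported percentages (approximate, not the study's raw data).
cluster_n = np.array([336, 303, 350, 258, 104, 42])
cip_yes = np.array([294, 237, 185, 99, 37, 2])
table = np.vstack([cip_yes, cluster_n - cip_yes])  # rows: CIP yes / no; columns: clusters

stat, df = chi_square_independence(table)
print(f"chi2({df}) = {stat:.1f}, p = {chi2_sf_df5(stat):.1e}")
```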
All of the clusters showed significant heritability, with the estimate for each of the first four clusters exceeding 30% (Table 5). Although heritability was not fully explained by cluster size, the two smallest clusters (Clusters 5 and 6) showed the lowest heritability estimates.
As reported previously (Gelernter et al., 2005), we conducted linkage analysis on a sub-sample of 986 individuals from the present study sample using DSM-IV CD diagnosis, CIP, and cluster membership as phenotypes. Of these phenotypes, the strongest linkage results were observed for cluster membership. These findings included a lod score of 4.66 for membership in Cluster 1 on chromosome 12 (in EAs only) and a lod score of 3.35 for membership in Cluster 4 on chromosome 18 (in AAs only). In addition, there was suggestive evidence of linkage for Cluster 1 on chromosome 3 (in EAs only), for Clusters 1 or 3 on chromosome 6 (in the total sample), and for Cluster 3 on chromosome 2 (in AAs only).
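The lod scores above are not accompanied by p-values. Under one common convention for variance-components linkage (an assumption here, not a description of the cited analysis), the likelihood-ratio statistic 2 ln(10) × LOD is referred to a 50:50 mixture of a point mass at zero and a chi-square with 1 df, because the linkage variance is tested at its boundary of zero. The hypothetical helper below gives pointwise p-values only; genomewide significance additionally accounts for testing across the whole genome.

```python
import math


def lod_to_pvalue(lod):
    """Pointwise asymptotic p-value for a variance-components LOD score (assumed convention).

    LR statistic = 2 * ln(10) * LOD, referred to a 50:50 mixture of a point mass at 0
    and a chi-square with 1 df, so p = 0.5 * P(chi2_1 > LR).
    """
    lr = 2.0 * math.log(10.0) * max(lod, 0.0)
    # For 1 df, P(chi2_1 > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2))
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))


for label, lod in [("Cluster 1, chr 12 (EA)", 4.66), ("Cluster 4, chr 18 (AA)", 3.35)]:
    print(f"{label}: LOD {lod} -> pointwise p ~ {lod_to_pvalue(lod):.1e}")
```

Under this convention a LOD of 3 corresponds to a pointwise p of about 1e-4.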
DSM-IV CD may not represent the optimal CD-related phenotype for genetic mapping; other phenotypes might identify more genetically homogeneous sets of subjects. Based on cocaine use histories, cocaine-related effects, and cocaine treatment histories, we used a cluster analytic approach to identify phenomenologically distinct CD clusters. Subjects in Clusters 1, 2, and 3 were characterized by a history of heavy cocaine use and a high degree of CD severity, as reflected by the vast majority of individuals in these clusters having endorsed enough cocaine dependence symptoms to receive a diagnosis of CD. A preponderance of individuals in these clusters also reported having participated in both self-help recovery programs and formal treatment for cocaine abuse. In addition, although these measures were not included as variables in the cluster analysis, members of the heavy cocaine use clusters (particularly those in Clusters 1 and 2) were substantially more likely to meet diagnostic criteria for a variety of other substance dependence and co-morbid psychiatric disorders.
Cluster analysis, which is used to classify a set of observations into two or more mutually exclusive groups based on a set of interval measures, results in a solution in which members of the groups share properties in common (Stockburger, 1998). Cluster analysis allows many choices about the nature of the algorithm for combining groups and, although it will always produce a grouping, the structure of the resultant solution will vary with the method employed, the cases included in the analysis, and the quality of the data used in the analysis. To address these concerns, we applied a multi-staged, iterative approach to identify cluster membership, which effectively repeated the clustering to achieve stable groups. The advantages of the modified strategy are that the clustering is repeated with several starting points and two different methods are used to obtain stable clusters, which should increase the generalizability of the cluster results. In addition, careful quality assurance procedures were employed in data collection using the SSADDA (Pierucci-Lagha et al., 2005), which is likely to have reduced variance attributable to poor reliability of the assessment. These factors may help to explain why the six-cluster solution obtained here differs from the two-cluster solution obtained previously in samples of cocaine- and other drug-dependent individuals analyzed using a simple k-means clustering procedure (Ball et al., 1995; Feingold et al., 1996; Ball et al., 1998; Basu et al., 2004). In addition to the different analytic approach, the current study sample differs from the clinical samples recruited in prior studies, in that the family-based recruitment approach led to the inclusion of a substantial number of individuals without drug dependence diagnoses.
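The stability argument above rests on repeating the clustering from several starting points. One common way to quantify agreement between two such runs, not used in the study and swapped in here only for illustration, is the adjusted Rand index (ARI), which equals 1 for identical partitions (however the labels are numbered) and is close to 0 for chance-level agreement. A minimal NumPy sketch, with a hypothetical function name:

```python
from math import comb

import numpy as np


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two partitions of the same observations."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=int)
    np.add.at(table, (a_idx, b_idx), 1)  # contingency table of the two labelings
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    total = comb(len(a), 2)
    expected = sum_rows * sum_cols / total  # expected index under random labeling
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # one cluster split in two
```

Applied to the labels from k-means runs started from different seeds, values near 1 would indicate that the partition does not depend on the starting points.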
The validity of the cluster solution was supported by its capacity to discriminate among subjects on variables that were not used to create the grouping (Stockburger, 1998). That is, the clusters differed significantly not only on demographic features but also, and particularly, on the prevalence of co-morbid substance use and psychiatric disorders. The differences among the clusters were further confirmed by estimates of the heritability of the trait embodied in cluster membership, which was greatest for the heavy cocaine use clusters. Because the sample was ascertained using an affected sib pair strategy, the heritability estimates may be inflated; consequently, they were treated as directional and were used to select traits to examine with a genomewide linkage analysis (Gelernter et al., 2005). The genomewide significant linkage findings obtained when cluster membership was used as the phenotype, in a sub-sample of the individuals described here, provide further validation of the subtyping procedure.
These findings lend support to the use of a data reduction and cluster-analytic strategy to provide more homogeneous subgroups of CD subjects for linkage analysis. Because the choice and implementation of the clustering algorithm, as well as data quality and the stability of the underlying groups, can influence the likelihood of replicating the results of cluster analyses, we applied a multi-staged approach to identify cluster membership, in which repeating the clustering from several starting points and combining two clustering methods yields stable clusters and should increase the generalizability of the results.
This novel application of clinical subtyping to a molecular genetic investigation of drug dependence is consistent with prior efforts to explain the phenotypic variability among alcoholics (Cloninger et al., 1981; Babor et al., 1992) and drug abusers (Ball et al., 1995; Feingold et al., 1996; Ball et al., 1998; Basu et al., 2004). The approach satisfied a stringent criterion for validity of the cluster structure by demonstrating its utility as a phenotype for linkage analysis. Further analysis of the subtypes provided evidence of their concurrent validity. By reducing genetic heterogeneity, this subtyping approach represents a powerful and useful refinement of the diagnosis of CD. Replication of the findings reported here in a second sample of individuals with CD would provide further evidence of the validity and utility of this approach.
This work was supported by NIDA grants R01 DA12849, R01 DA12690, K24 DA15105, and K02 DA00326, and NIAAA grant K24 AA13736. We appreciate the efforts of the following SSADDA interviewers: Michelle Slivinsky, Michelle McKain, Deborah Pearson, Kevin Young (at Univ. CT); Alisha Pollastri, Yari Nunez, Matthew Madura, and Melyssa Pokrywa (at Yale); Victoria deMenil and Catherine Cogley (at McLean); and Heather Remy (at MUSC). Jennifer Blesso and John Farrell provided excellent database support. Alisa Manning, Deborah Cebrik, and Carolien Panhuysen assisted in statistical analyses and data management.