The purpose of this study is to identify population subgroups of adolescents who are homogenous with respect to sociodemographic factors and potentially modifiable risk and protective factors related to overweight status in a nationally representative sample of adolescents ages 12–17. Methods: The data used for this study are from the Centers for Disease Control and National Center for Health Statistics' National Survey of Children's Health, 2003 (NSCH). Classification and Regression Trees (CART) were used to identify population segments of adolescents based on risk and protective factors for obesity.
In the final CART model, 12 variables remained, including: poverty level, race, gender, participation in sports, number of family meals, family educational attainment, child physical activity, participation in free lunch programs, neighborhood safety and connectedness, TV viewing time, and child age in years. Poverty level was determined to be the most important variable related to weight status in this sample of adolescents. Adolescents living in households below approximately the 300% poverty level were subject to a different constellation of predictors than adolescents living in homes above the 300% poverty level.
Our results demonstrate how risk and protective factors related to obesity emerge differently among sociodemographic subgroups and the relative importance of these risk and protective factors in relation to adolescent overweight status. Interventions that work for one population subgroup may not work for another.
Overweight and obesity have been on the rise over the past 3 decades, with the prevalence among adolescents more than doubling during this time. The overweight epidemic among U.S. youth is of public health concern because of its immediate and prolonged negative psychosocial and physical health outcomes. In the psychosocial domain, youth obesity is linked to weight concerns, negative self-evaluations, and problems with peer relationships. Regarding physical health outcomes, youth obesity is linked to cardiovascular disease risk, abnormal glucose tolerance, elevated risk for type 2 diabetes, and hypertension. Given that youth obesity persists into adulthood, overweight youth are at increased risk of developing obesity-related morbidities in adulthood.
Youth obesity is caused by multiple factors working collectively, including individual-, family-, and community-level factors. Sociodemographic factors such as race/ethnicity and risk and protective factors such as physical and sedentary activity, caloric intake, family-level socioeconomic status [8,9], family meal frequency, parent physical activity, and neighborhood conditions, to the extent that they influence dietary and activity patterns, are known correlates of adolescent weight status. However, the extent to which these factors co-act to confer differential risk or protection for subsets of the population (e.g., black females) is unknown. Thus, identifying the most relevant risk or protective factors to target in efforts to prevent or reduce obesity in diverse groups of adolescents can be challenging.
One approach to improving the effectiveness of prevention and health promotion efforts designed to address adolescent obesity is audience segmentation. Audience segmentation refers to the process of partitioning a large, heterogeneous population into homogeneous subgroups of individuals based on shared factors related to an outcome of interest. This approach has been used for identifying high-risk groups for physical activity and dietary supplement use, based on a constellation of risk factors. Once the homogeneous subgroups are identified, health promotion activities can be tailored to their specific needs. A previous study using an audience segmentation approach to understanding risk factors for overweight in more than 4,000 German children ages 4 to 7 years highlighted a number of individual (e.g., weight gain from birth to 2 years), parental (e.g., parental weight status), and sociodemographic (e.g., parental education) factors that were predictive of overweight at school entry. We employed audience segmentation to identify homogenous adolescent population subgroups with respect to sociodemographic factors and potentially modifiable family and neighborhood risk and protective factors related to overweight status in a nationally representative sample of adolescents. This approach may provide better information regarding risk and protective factors associated with youth obesity to develop effective preventive efforts, particularly given the multifactorial nature of obesity.
We relied on the Centers for Disease Control and National Center for Health Statistics' National Survey of Children's Health, 2003 (NSCH). This survey is a module of the State and Local Area Integrated Telephone Survey, and includes data on physical, behavioral health indicators, and information on family and neighborhood environments for children ages 2 to 18 in the United States; data from adolescents, ages 12 to 17 years are used in the present study. A total of 102,535 surveys were completed, with a response rate of 55.3%. Data were sampled within each of the 50 states. One child was randomly selected from each identified household. Only respondents who indicated that they were the child's mother or father were included in the analysis.
We selected measures that are known in the literature to be associated with obesity in youth. We focused on adolescent, family, and neighborhood level factors known to relate positively or negatively to youth body mass index (BMI).
We calculated adolescents' BMI from parent-reported height and weight using the Centers for Disease Control BMI calculator for youth ages 2 to 19. This calculator supplies age- and gender-specific BMI and the corresponding BMI percentiles based on standardized reference criteria. We further dichotomized BMI classification as overweight (= 2) and not overweight (= 1). Adolescents were classified as overweight at BMI ≥ 95th percentile and not overweight at BMI < 95th percentile.
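The dichotomization described above can be sketched as follows. This is only an illustration of the coding rule stated in the text; the function name `bmi_class` is hypothetical, and the BMI percentile is assumed to have been obtained already from the age- and gender-specific reference criteria.

```python
def bmi_class(bmi_percentile: float) -> int:
    """Return 2 (overweight) at or above the 95th percentile, else 1 (not overweight)."""
    if not 0 <= bmi_percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    return 2 if bmi_percentile >= 95 else 1
```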
Parents rated adolescents' overall health; responses were coded as 1 (excellent, very good, or good) or 2 (fair or poor). Parents also reported whether the adolescent had been diagnosed with diabetes (yes = 1/no = 0).
Physical activity was estimated by asking how many days in the past week the adolescent spent in physical activity for at least 20 minutes that made him/her breathe hard and sweat (e.g., basketball, running, or fast bicycling); responses ranged from 0–7. Sports participation was estimated by asking whether the adolescent was on a sports team or took sports lessons after school or on the weekends in the past 12 months (yes = 1/no = 0). We also assessed the amount of time (hours/day) adolescents spend on an average school day: (a) watching television/videos or playing video games, and (b) using a computer for purposes other than schoolwork.
Sociodemographic measures included the adolescent's age (in years), race/ethnicity (white = 1, black = 2, or Hispanic = 3), and gender (male = 1 or female = 2).
Parents reported whether they regularly exercised or played sports in the past month vigorously enough to make them breathe hard or make their heart beat fast (yes = 1/no = 0).
Family educational attainment was coded as less than high school (1), high school (2), or greater than high school (3).
This measure is based on the Federal Poverty Level (FPL), defined as the minimum amount of income that a family needs for food, shelter, clothing, and other necessities; it varies by household size. The Department of Health and Human Services (DHHS) sets the FPL for families residing in the United States. FPLs are updated, adjusted for inflation, and reported annually by DHHS. Percent of FPL expresses household income relative to the poverty level. For example, for a family of four, 300% of the poverty level would equate to a gross yearly income of $63,600. In this study, poverty level was expressed as percent of FPL in eight ordered levels (coded 1–8) and treated as a continuous variable: 1 = less than 100% poverty level; 2 = 100% to 133% poverty level; 3 = 133% to 150% poverty level; 4 = 150% to 185% poverty level; 5 = 185% to 200% poverty level; 6 = 200% to 300% poverty level; 7 = 300% to 400% poverty level; and 8 = ≥ 400% poverty level.
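As a rough illustration of this coding, the sketch below converts a household income to percent of FPL and then to the eight-level variable. The function names are hypothetical. Because the listed categories share endpoints (e.g., 133% appears in both level 2 and level 3), the sketch assumes each level includes its lower bound and excludes its upper bound; the survey's actual boundary handling is not stated here. The threshold of $21,200 for a family of four is simply the value implied by the $63,600 example.

```python
# Exclusive upper bounds (percent of FPL) for levels 1-7; level 8 is >= 400%.
# Assumption: each level includes its lower bound and excludes its upper bound.
_UPPER_BOUNDS = [100, 133, 150, 185, 200, 300, 400]


def percent_of_fpl(household_income: float, fpl_threshold: float) -> float:
    """Express household income as a percentage of the poverty threshold."""
    if fpl_threshold <= 0:
        raise ValueError("poverty threshold must be positive")
    return household_income / fpl_threshold * 100


def poverty_level_category(percent_fpl: float) -> int:
    """Map percent of FPL to the eight-level code (1 = poorest, 8 = >= 400%)."""
    if percent_fpl < 0:
        raise ValueError("percent of FPL cannot be negative")
    for level, upper in enumerate(_UPPER_BOUNDS, start=1):
        if percent_fpl < upper:
            return level
    return 8
```

Under these assumptions, a family of four earning $63,600 sits exactly at 300% of FPL and falls in level 7, the first level above the split that the tree later identifies.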
Parents reported the frequency with which all family members who live in the household ate a meal together in the past week; response options ranged from 0–7.
Parents reported whether they were: (a) biological or adoptive parents, or a two-parent step-family, (b) a single parent, or (c) other.
Parents reported whether their family participated in any of the federal food (food stamps, received free/reduced school meals, or WIC) or cash assistance programs in the past 12 months (yes = 1/no = 0).
We created two variables to measure parent perceptions of neighborhood characteristics:
Parents responded to four questions that we averaged to create a variable that indicated sense of connectedness in their neighborhood. Items included, “People in this neighborhood help each other out”; “We watch out for each other's children in this neighborhood”; “There are people I can count on in this neighborhood”; and “If my child were outside playing and got hurt or scared, there are adults nearby whom I trust to help my child.” Mean scores ranged from 1 (definitely agree) to 4 (definitely disagree), with lower scores indicating more positive perceptions of the connectedness in their neighborhood.
Parents responded to four questions that we averaged to create a variable indicating parent perceptions of neighborhood safety. Items included: “There are people in this neighborhood who might be a bad influence on my child/children”; “How often do you feel your child is safe in your community or neighborhood?”; “How often do you feel your child is safe at school?” and “How often do you feel your child is safe at home?” Response options for the first question (which we reverse coded) ranged from 1 (definitely agree) to 4 (definitely disagree) and from 1 (never) to 4 (always) for the remaining three questions. Higher scores indicate more positive perceptions of neighborhood safety.
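The mechanics of building such composite scores (averaging four items on a 1 to 4 scale, with optional reverse coding) can be sketched as below. The helper names are hypothetical, and the sketch only shows the arithmetic: which raw items actually need to be flipped depends on how the survey stored the responses, which is an assumption here.

```python
from statistics import mean


def reverse_code(value: float, low: int = 1, high: int = 4) -> float:
    """Flip a response on a low..high scale (1 <-> 4, 2 <-> 3)."""
    if not low <= value <= high:
        raise ValueError("response outside the scale range")
    return low + high - value


def composite_score(items, reverse_indices=()):
    """Average the item responses, reverse coding the items at the given positions."""
    scored = [
        reverse_code(value) if index in reverse_indices else value
        for index, value in enumerate(items)
    ]
    return mean(scored)
```

For example, `composite_score([1, 2, 1, 2])` averages four connectedness items without any flipping, whereas `composite_score(items, reverse_indices={0})` flips the first item before averaging.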
We used descriptive statistics to examine the distribution of risk factors among the sample of adolescents and employed chi-square tests to evaluate bivariate relationships between BMI class and risk/protective factors. We used Stata version 9.1 survey procedures to account for survey weights and sampling design (Stata Corporation, College Station, TX).
Classification and Regression Trees (CART) were used to identify population (audience) segments of adolescents based on risk and protective factors for obesity. (CART analyses were conducted in CART 6.0 software. CART is a registered trademark of California Statistical Software, Inc., and is exclusively licensed to Salford Systems.) We selected CART to examine complex interactions among multiple risk factors that may not be apparent or may be difficult to interpret in a traditional regression analysis, and for its ability to identify and segment homogeneous and possibly high-risk subgroups of the population, based on similar characteristics, that may benefit from different or tailored intervention strategies. CART generates a multivariable description of individuals who are members of a subgroup, whereas regression is based on the identification of variables as they relate to the outcome, averaged over all individuals.
Specifically, CART was used to develop a classification and regression tree to stratify the study sample into meaningful homogenous subgroups in relation to a particular target variable. In our case, the target or dependent variable is overweight status (overweight or not overweight). Predictors include the child, parent, family, and neighborhood measures described in the measurement section.
Development of the tree involves several steps, including growing the tree, pruning the tree, and finally, validating the tree structure. The entire sample forms the root node. Through binary recursive partitioning, the root node is split into smaller daughter nodes. Daughter nodes represent the most homogeneous split from the root node or from daughter nodes in a previous layer. Each node in a layer is a subset of the root node.
Numeric and ordinal variables are split differently from categorical (nominal) variables. The number of possible splits for a numeric or ordinal variable is one fewer than the number of distinct observed values. The number of possible splits for a nominal variable is based on the number of ways its $k$ levels can be divided into two groups (i.e., number of splits = $2^{k-1} - 1$).
This splitting continues until homogenous terminal nodes are established. Although complete homogeneity in the terminal nodes is ideal, it is not typically achieved. The goal of CART is to partition the sample into the purest possible nodes. The Gini index was employed as the indicator of node purity. This index identifies the independent variable and the corresponding cutpoint that lead to the most homogeneity in the two groups that result from the split. When splitting a node, two factors are considered: the goodness of the split and the amount of impurity in the daughter nodes.
Each terminal node was set to require a minimum of 100 individuals. We then pruned the tree, creating simpler trees by cutting off unimportant nodes. Finally, we selected the optimal tree, which is the best fit from our pruned trees and does not overfit the data. We pruned the tree based on 10-fold crossvalidation. We used the weight option to invoke the NSCH survey weights. CART procedures have built-in methods to impute missing data based on the pattern of other variables in the dataset. (Details on how CART deals with missing values can be found in Breiman 1984, p. 142–146.)
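A small sketch may make these counts concrete. The function names are hypothetical; the examples use the variable codings described in the measures (poverty level coded 1 to 8, race/ethnicity with three levels).

```python
def ordered_split_count(values) -> int:
    """Candidate binary splits for a numeric or ordinal variable:
    one fewer than the number of distinct observed values."""
    return max(len(set(values)) - 1, 0)


def nominal_split_count(k: int) -> int:
    """Candidate binary splits for a nominal variable with k levels:
    ways to divide the levels into two non-empty groups, 2**(k - 1) - 1."""
    if k < 1:
        raise ValueError("a nominal variable needs at least one level")
    return 2 ** (k - 1) - 1
```

If all eight poverty levels are observed, there are 7 candidate cutpoints; a three-level race/ethnicity variable allows 3 distinct two-group partitions.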
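To illustrate how the Gini index can guide the choice of a cutpoint, the following is a minimal, unweighted sketch rather than the procedure implemented in the CART software. The function names and the toy data in the usage note are hypothetical; candidate cutpoints are taken as midpoints between adjacent distinct values, which is a common convention and an assumption here.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the binary split
    `value <= threshold` that minimizes the size-weighted Gini impurity
    of the two daughter nodes, or None if no split is possible."""
    n = len(labels)
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # midpoint between adjacent values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best
```

With toy data such as `best_threshold([1, 2, 7, 8], [2, 2, 1, 1])`, the split that separates the two classes perfectly is chosen, and the weighted impurity of the daughter nodes is zero.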
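The idea behind 10-fold crossvalidation can be sketched generically: the sample is divided into ten roughly equal folds, and each fold in turn is held out for testing while the tree is grown on the remaining nine. The sketch below shows only the partitioning step; it is not the CART software's internal routine, it ignores survey weights, and the function names and fixed seed are illustrative assumptions.

```python
import random


def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists, holding out each fold once."""
    folds = k_fold_indices(n, k, seed)
    for held_out, test in enumerate(folds):
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test
```

The misclassification error of each candidate pruned tree would then be averaged over the ten held-out folds, and the tree with the best crossvalidated performance retained.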
A total of 35,184 adolescents who met the age criteria were included in the sample; 50% of the sample was male. Seventy-six percent of adolescents were white, 17.9% were black, and 5.7% were Hispanic. Approximately 67% had an adult who completed high school or greater. Approximately 15% of the families lived below the federal poverty level. More boys were overweight compared to girls (16.1% and 8.6%, respectively, p < .05). More blacks were overweight (21.2%) compared to whites (10.8%) and Hispanics (15.6%).
Table 1 displays the distribution of selected risk and protective factors by BMI classification. Compared to nonoverweight adolescents, overweight adolescents were less likely to participate in sports (48.1% vs. 60.2%, p < .0001) and regular physical activity (3.9 days/week vs. 3.4 days/week, p < .0001). Overweight adolescents were more likely to spend time watching TV (1.7 hours vs. 2.0 hours, p < .05) and receive free lunch at school (51.7% vs. 60.8%, p < .0001) compared to their nonoverweight counterparts. Overweight adolescents were also more likely to live in low-income homes (5.5 vs. 4.6, p < .0001) and single-parent homes (36.6% vs. 26.6%, p < .0001). Mothers of overweight adolescents were less likely to participate in regular physical activity (56.8% vs. 60.4%, p < .05), and more likely to be in fair to poor health versus good/very good or excellent health (80.9 vs. 88.9, p < .0001) compared to parents of nonoverweight adolescents. Overweight adolescents were also less likely to live in a home where at least one adult had finished high school, compared to their nonoverweight counterparts (55.0% vs. 69.1%, p < .0001). Parents of overweight adolescents were also more likely to report lower ratings of perceived neighborhood safety (3.19 vs. 3.26, p < .0001) and connectedness (1.58 vs. 1.75, p = .0001) compared to parents of nonoverweight adolescents.
The final model contained 19 nodes and had a misclassification error of 30.01%. The target class was set as overweight. After pruning the tree, 12 variables remained in the model, including poverty level, race, gender, participation in sports, number of family meals, family educational attainment, child physical activity, participation in free lunch programs, neighborhood safety and connectedness, TV viewing time, and child age.
Terminal nodes ranged in size from N = 123 to 8,156. The most important variable for determining overweight status was poverty level, which split from the root node. Poverty level split between a value of 6 and 7 or at approximately 300% of poverty level. For the distribution of poverty level among the sample of adolescents' families, see Table 2. Figure 1 represents the overall tree structure and the size and position of each node.
Adolescents residing in households below the ~300% poverty level were subject to a different constellation of predictors than adolescents living in homes above the ~300% poverty level. Among adolescents below the ~300% poverty level, being male contributed to overweight (terminal node). No additional predictors influenced overweight in adolescent males living below the ~300% poverty level.
For female adolescents living below the ~300% poverty level, gender, race/ethnicity, free lunch status, family education, neighborhood connectedness, and family meals were important contributors to overweight status. Specifically, for white females receiving free lunch, living in a low-income and low-educational attainment household, residing in a neighborhood with limited connectedness, and eating more than an average of 3.5 family meals per week in combination with living close to the poverty level was related to increased probability of being overweight. Black and Latina females were more likely to be overweight (18.4%) compared to white females (9.8%). No predictor variables explained overweight status in minority female adolescents living below the ~300% poverty level.
For adolescents living above the ~300% poverty level, gender, race, TV viewing time, physical activity, family meals, age, neighborhood safety, and neighborhood connectedness were found to be important contributors to overweight status.
For females, exercising vigorously was protective against overweight. Female adolescents who exercised at least 2.5 days/week were less likely to be overweight (4.2%) compared to those who exercised less than 2.5 days/week (6.1%). Among female adolescents who exercised less than 2.5 days/week, TV viewing of more than 1.5 hours/day was associated with increased likelihood of being overweight. Lower neighborhood connectedness in combination with viewing TV more than 1.5 hours/day further increased the overweight risk.
For male adolescents, being either black or Latino was associated with being overweight (19.4% of black and Latino adolescents: terminal node) compared to whites (11.3%). For white males, TV viewing less than 1.5 hours/day was associated with a lower chance of being overweight (9.5%) compared to white male adolescents who watched TV more than 1.5 hours/day (12.5%) or 2.5 hours/day (18.9%: terminal node). For white males who watched TV less than 1.5 hours/day, participation in sports provided further protection from being overweight. White male adolescents were more likely to be overweight during puberty (ages 12–14.5) when they watched fewer than 1.5 hours/day of TV and did not play sports. Among white males who watched TV 1.5 to 2.5 hours/day, low overweight risk was associated with having more than 1.5 family meals/week and further protection was provided when they lived in connected neighborhoods. Overall, protective factors against overweight for white male adolescents include: watching less than 1.5 hours/day of TV, having more than 1.5 family meals/week, living in a neighborhood perceived to be safe, and being beyond pubertal age.
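The upper layers of the tree described in this section can be read as nested if-then rules. The sketch below is a partial, illustrative reading of the narrative results, not a reproduction of Figure 1: the function name is hypothetical, the order of the gender and race splits beneath the poverty split is inferred from the text, and deeper splits are only named in the returned labels. Variable codings follow the measures section (poverty level 1 to 8; male = 1, female = 2; white = 1, black = 2, Hispanic = 3).

```python
def top_level_segment(poverty_level: int, gender: int, race: int) -> str:
    """Assign an adolescent to one of the upper-level segments of the tree."""
    below_300 = poverty_level <= 6  # root split reported between levels 6 and 7
    is_male = gender == 1
    is_white = race == 1

    if below_300:
        if is_male:
            return "below ~300% FPL: male (terminal node)"
        if not is_white:
            return "below ~300% FPL: black or Latina female (no further predictors)"
        return ("below ~300% FPL: white female (further splits on free lunch, "
                "family education, connectedness, family meals)")

    if is_male:
        if not is_white:
            return "above ~300% FPL: black or Latino male (terminal node)"
        return ("above ~300% FPL: white male (further splits on TV viewing, "
                "sports, age, family meals, connectedness)")
    return ("above ~300% FPL: female (further splits on physical activity, "
            "TV viewing, connectedness)")
```

Reading the tree this way makes explicit that the same predictor (for example, family meals) enters the rules only within particular segments.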
Classification and regression trees were used to identify obesity-related risk profiles for subgroups of adolescents. Our results demonstrate that complex combinations of obesity-related risk factors differ among subgroups of adolescents. Similar to other studies, we found that poverty level was an important risk factor for obesity [24,25]. In addition, our study found that different risk constellations emerged for adolescents above and below 300% of poverty level and for males versus females.
Few studies have assessed the combined effect of risk factors for adolescent obesity to identify high-risk subgroups using nationally representative samples. Boone-Heinonen and colleagues (2008) used cluster analysis to identify population subgroups similar in obesogenic behavior. This study found that dietary and physical activity behaviors such as participation in sports and restrictive dieting clustered differently for males versus females. Similarly, Singh et al. employed joint association estimation techniques to investigate the independent effect of child and neighborhood factors and joint effects of race/ethnicity, socioeconomic status, TV viewing, and physical activity on adolescent obesity. The authors reported that the prevalence of obesity differed among clusters of risk groups.
Our study adds to this developing body of literature by employing a tree-based regression method to identify (a) population subgroups with similar obesity risk-related profiles and (b) the relative importance of risk factors for obesity among these population subgroups. Furthermore, this method provides a simple visual in the form of logical if–then statements that displays how risk factors are interrelated. In our study, for adolescents living below the 300% poverty level, risk and protective factors parsed differently for male and female adolescents. For white female adolescents, living in a low-income and low-education household and eating more than three family meals per week increased the risk of obesity, whereas living in similar low-resource households and a highly connected neighborhood conferred protection against overweight. Because female adolescents are often less likely than male adolescents to participate in organized sports, perhaps living in a neighborhood with a greater sense of collective efficacy may encourage unorganized physical activity. No variables explained risk for overweight in male adolescents living in a low poverty level environment, suggesting that poverty plays a major role for obesity in this subgroup.
Our finding that greater family meals conferred greater risk for white females near the poverty level is one particular finding that highlights the utility of an audience segmentation approach. Previous research shows that eating more family meals is protective of overweight in adolescence [29–31]; however, in the current study, eating frequent family meals was protective for white males but was counterproductive for white females near the poverty level. Perhaps eating family meals together is only protective when they are of high dietary quality. Thus, an obesity prevention program focused on increasing family meals would need to tailor the program for families at different poverty levels. Family meals may differ drastically in families that can afford to provide healthful meals for their children compared to families that cannot afford to purchase similar meals. Further studies are needed to delineate the specific mechanisms by which family meals are associated with weight status in youth.
For both male and female adolescents living above the 300% poverty level, TV viewing and physical activity were predictive of overweight and obesity. Interestingly, these factors did not parse to be as relevant for adolescents living below the 300% poverty level. Our findings are consistent with Singh et al., who reported a greater risk reduction for both TV viewing and physical activity among more affluent adolescents as compared to less affluent adolescents.
Our model provided limited information regarding factors that affect black and Latino adolescents. In fact, the tree terminated shortly after black and Latino adolescents split from white adolescents. Although we studied common correlates of childhood obesity, other variables that are not captured or measured in this study likely explain overweight in black and Latino adolescents. Cultural issues, such as beliefs about food, exercise, and body image, and sociostructural and environmental factors may be more important correlates of overweight in black and Latino adolescents [32–35]. These findings in particular highlight the need to consider the complexity of risk factors in different racial/ethnic groups when developing prevention programs.
This study has several limitations. First, the survey consisted of single-item measures and was based on parental reports; thus, the items may not fully represent the constructs of interest. However, parental reports of child height and weight have been found to be reliable. Second, objective measures of body composition, food intake, and physical activity were not available in the dataset, limiting the validity of parents' reports of these characteristics and behaviors. Possibly, parents of overweight youth may have over-reported or under-reported such factors.
Furthermore, parental weight status is an important determinant of youth overweight. Parents contribute both genetic and environmental influences to children's weight status. Overweight in youth may partly result from families with overweight parents whose lifestyle factors differ from those of nonoverweight parents. We did not have data on parents' weight status in this study, which precluded us from examining parental weight status as a risk factor for overweight in this sample of adolescents.
Additionally, the CART method is not based on a probabilistic model but driven solely by the data. As a result, confidence intervals are not calculated; crossvalidation methods are the only means of establishing the predictive power of the model for new datasets. Also, compared to standard regression models, CART may miss variables that have relatively weak effects but a uniform effect across the entire sample. Furthermore, in CART, misclassification error is parallel to sensitivity in a binary classification test. For classification problems where identifying someone who has the condition is critical (e.g., HIV or cancer), sensitivity would need to be higher. When casting an overall net for health promotion-based intervention strategies, a misclassification error around 30% is not uncommon and may be higher.
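One way to see how misclassification error and sensitivity relate is to compute both from the same set of predictions. The sketch below is generic and unweighted; the function name and toy labels are hypothetical, and it is an assumption (not stated in the text) whether the reported 30.01% refers to the overall error or to the error within the target class. Labels follow the study's coding (2 = overweight, the target class; 1 = not overweight).

```python
def classification_metrics(actual, predicted, target=2):
    """Return overall error, target-class error, and sensitivity for a binary outcome."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == target and p == target for a, p in pairs)
    false_neg = sum(a == target and p != target for a, p in pairs)
    false_pos = sum(a != target and p == target for a, p in pairs)
    n_target = true_pos + false_neg
    return {
        # Share of all cases assigned to the wrong class.
        "overall_error": (false_pos + false_neg) / len(pairs),
        # Share of truly overweight cases that the tree misses.
        "target_class_error": false_neg / n_target if n_target else None,
        # Share of truly overweight cases that the tree identifies.
        "sensitivity": true_pos / n_target if n_target else None,
    }
```

In this formulation the target-class error is the complement of sensitivity, so a target-class error near 30% would correspond to a sensitivity near 70%.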
Our results demonstrate how constellations of risk and protective factors related to obesity emerge differently among different sociodemographic groups and the relative importance of these risk and protective factors in relation to adolescent overweight status. Multiple factors interact to confer risk differently for different population subgroups. Interventions that work for one population subgroup may not work for another. In sum, adolescent obesity health promotion and intervention strategies may benefit from accounting for risk factor complexity among population subgroups.