Multilevel modeling techniques were used with a sample of 643 students enrolled in 37 secondary school classrooms to predict future student achievement (controlling for baseline achievement) from observed teacher interactions with students in the classroom, coded using the Classroom Assessment Scoring System—Secondary. After accounting for prior year test performance, qualities of teacher interactions with students predicted student performance on end-of-year standardized achievement tests. Classrooms characterized by a positive emotional climate, with sensitivity to adolescent needs and perspectives, use of diverse and engaging instructional learning formats, and a focus on analysis and problem solving were associated with higher levels of student achievement. Effects of higher quality teacher–student interactions were greatest in classrooms with fewer students. Implications for teacher performance assessment and teacher effects on achievement are discussed.
Improving the quality of teacher–student interactions within the classroom depends upon a solid understanding of the nature of effective teaching for adolescents. A number of descriptions of classroom environments or quality teaching have been put forth in the educational and developmental literatures listing factors likely to be related to student learning (e.g., Brophy, 1999; Eccles & Roeser, 1999; Pressley et al., 2003; Soar & Soar, 1979). Hamre and Pianta (Hamre & Pianta, 2010; Hamre, Pianta, Burchinal, & Downer, 2010) developed an assessment approach that organizes features of teacher–student interactions into three major domains: emotional supports, classroom organization, and instructional supports. This approach to assessing classroom interaction qualities has been tested and validated in grades prekindergarten through five, with evidence supporting this latent structure of dimensions and domains across grades and across content areas (Hamre et al., 2010).
The Classroom Assessment Scoring System—Secondary (CLASS-S; Pianta, Hamre, Hayes, Mintz, & LaParo, 2008) was developed for secondary schools as an upward extension of previous work. Each domain comprises specific dimensions of classroom interactions that past research suggests are likely to be important to student learning (see Figure 1). The importance of qualities of emotional and relational support in the classroom is suggested by both attachment and self-determination theories (Bowlby, 1969/1982; Connell & Wellborn, 1991; Pianta, 1999), and is captured via assessments of the dimensions of positive classroom climate, teacher sensitivity, and regard for adolescent perspectives. Although the need for emotional support of students is perhaps more self-evidently important in the lower grades (Ladd, Birch, & Buhs, 1999), adolescents are highly sensitive to the emotional rapport they establish with adults in school settings, and the experience of strong connections to adults has been consistently linked to long-term academic success (Allen, Kuperminc, Philliber, & Herre, 1994; Bell, Allen, Hauser, & O’Connor, 1996). The organizational support domain encompasses dimensions of classroom management, productivity, and use of varied instructional learning formats, which facilitate the development of adolescent self-regulation skills and enhance learning (Blair, 2002; Cameron, Connor, & Morrison, 2005; Emmer & Stough, 2001; Paris & Paris, 2001; Raver, 2004). The instructional support domain reflects teachers’ content understanding, focus on analysis and problem solving, and quality of feedback, areas long recognized as enabling students to learn at a deep level (Marton & Saljo, 1976; National Research Council, 2005).
The domain approach used by the CLASS-S aligns well with constructs from several existing theoretical and practical approaches. For example, Brophy (1999) described 12 principles of effective teaching encompassing many of the same dimensions as the CLASS-S, including supportive classroom climate, opportunities to learn, curricular alignment, thoughtful discourse, scaffolding engagement, and achievement expectations. Similarly, Pressley and colleagues (2003) draw from their studies of effective teachers (e.g., Bogner, Raphael, & Pressley, 2002; Wharton-McDonald, Pressley, & Hampston, 1998) to suggest that effective teaching strategies can be organized into decisions regarding motivational atmosphere, classroom management, and curriculum and instruction. Eccles and Roeser (1999) suggest that schooling is optimally characterized by organizational, social, and instructional processes that help regulate children and adolescents’ development across cognitive, social-emotional, and behavioral domains. A similar approach has been taken by McCaslin, Burross, and Good (2005) in their examination of classroom setting effects on student motivation. The CLASS-S draws upon constructs from each of these frameworks; however, a key challenge raised by all of these theoretical frameworks is how best to operationalize and assess the constructs they delineate in ongoing teacher–student interactions. The CLASS-S seeks to meet this challenge by conceptualizing and operationalizing these constructs in terms of observable, ongoing qualities of teacher–student interactions.
In terms of actual assessment approaches, a long line of research suggests the value of observation in classrooms as a means for capturing the social assets in those settings (Brophy & Good, 1986; Pianta & Hamre, 2009; Shinn & Yoshikawa, 2008; Tseng & Seidman, 2007). The CLASS-S builds on a rich tradition of approaches to observing instructional environments. The Instructional Environment System—II (Ysseldyke & Christenson, 1987, 1993) outlines an array of qualities of instructional support and notes the role of positive emotional climate, although it has not been linked to achievement outcomes in typical classroom environments. Similarly, the Ecobehavioral Assessment Systems Software (Greenwood, Carta, & Dawson, 2000; Greenwood, Carta, Kamps, & Terry, 1994), initially developed for use in alternative school classrooms, addresses many constructs similar to those in the CLASS-S (Wallace, Anderson, Bartholomay, & Hupp, 2002; Watson, Gable, & Greenwood, 2011). One key distinction, however, is that the CLASS-S focuses on broad patterns of interaction assessed at a molar level, as opposed to the time-sampling and counting of discrete behaviors used in the Ecobehavioral Assessment Systems Software. Thus, the CLASS-S is in keeping with principles of developmental psychology suggesting the importance of a focus on the broader organization of molar patterns of behavior as a means to get at subtle processes not easily captured via counts of discrete behaviors (Sroufe, Egeland, Carlson, & Collins, 2005), such as emerging self-regulation skills and resilience. At the younger grades, there is emerging evidence that a broad range of classroom interaction qualities can be observed and linked to student learning gains using this more global yet standardized approach to assessment (Landry et al., 2006; Pianta, Belsky, Vandergrift, Houts, & Morrison, 2008).
Similarly, an intervention approach based on the CLASS-S framework has demonstrated evidence of efficacy along with evidence that program effects may be at least partially mediated via a single global measure of teacher–student interaction qualities from this system (Allen, Pianta, Gregory, Mikami, & Lun, 2011).
Unfortunately, outcome-based research on classroom observation as a predictor of actual student achievement is still relatively rare in secondary education. A modest amount of research beyond the elementary grades has successfully observed teaching in specific content areas or for specific lessons (Johnson, Kahle, & Fargo, 2007; Roth et al., 2006; Weiss, Pasley, Smith, Banilower, & Heck, 2003). Seidel and Shavelson (2007) conducted a broader meta-analysis of 125 studies using a variety of methodologies to link teacher behavior to student achievement. They did not break out the small body of observational research from nonobservational research at the secondary level, but did note that, overall, assessments of teaching relied heavily upon distal and proxy variables (e.g., survey data) of questionable reliability and validity. They concluded that effects on secondary student academic performance from current approaches to measuring teaching quality were too small to be of practical significance (e.g., effect sizes ≤ 0.04). Thus, despite the strong theoretical interest in identifying qualities of teacher–student interactions linked to student achievement, scientific evidence is quite sparse regarding our capacity to identify and observe the critical features of teacher–student interactions that actually predict student learning within the secondary school classroom. Virtually no evidence exists regarding the effectiveness of assessment systems designed to capture broad interactional patterns and apply across diverse content areas at the secondary level.
One significant potential confound in efforts to identify qualities of effective teacher–student interactions linked to student achievement is the likelihood that high-quality interactions may come more easily among students who are already academically motivated and successful. Given the likelihood that students are to some degree tracked into higher and lower achieving groups in secondary schools (either explicitly or implicitly), different teachers are likely to face students with very different characteristics at the start of an academic year. End-of-year student test scores are highly dependent on preexisting student levels of academic proficiency and are typically highly correlated with prior year test scores. Failure to account for prior year test scores would thus misattribute variance in student achievement that would be more directly accounted for by background student proficiencies. In keeping with the growing recognition of the importance of “value-added” approaches to assessing student learning (Hanushek, Rivkin, Figlio, & Jacob, 2010; Rothstein, 2010), we assess end-of-year test scores after first accounting for prior year test scores, which we consider to be an indicator of student academic proficiency independent of the current classroom environment.
Some qualities of teacher–student interactions may primarily reflect student characteristics as they enter class at the start of the year, particularly to the extent that students are implicitly or explicitly grouped into higher and lower achieving classes. Without awareness of this possibility, it would be all too easy to misattribute the qualities of classroom interactions to teacher skill levels, rather than recognizing that they may primarily reflect the academic characteristics of the students being taught. Thus, to provide the most helpful and balanced information to school personnel, this study also focused on identifying qualities of teacher–student interaction that are linked to preexisting student characteristics at the beginning of a school year, qualities that might otherwise be misattributed to teacher skill. By identifying such student-driven qualities, this study seeks to provide appropriate contextual balance to our emerging picture of the role of teacher–student interactions: showing not only where these interactions predict future achievement, but also where they may simply reflect preexisting student characteristics rather than teacher skill.
This study addressed three overarching goals: to determine whether global domains of observed teacher–student interaction quality predicted student achievement after accounting for prior achievement; to identify the specific dimensions of interaction driving any such predictions; and to assess the extent to which observed interaction qualities were themselves predictable from students’ prior levels of achievement.
In pursuing these goals, we account for key contextual and background factors as both covariates and potential moderators (e.g., class size, and student gender, grade level, and family poverty status). Analyses also considered whether the predictive efficacy of the CLASS-S might be moderated by classroom content domain (e.g., math/science vs. English/social studies).
Participants were 643 students enrolled in 37 classrooms (across 11 schools in six districts), which served as the control, teaching-as-usual condition classrooms in a larger study of an intervention to improve classroom interaction qualities. The larger intervention study included 78 classrooms altogether and 1,267 students in Year 1 and is further described in Allen, Pianta, Gregory, Mikami, and Lun (2011). Students were eligible for participation if they (a) were in a classroom participating in the control condition of the intervention study, (b) had parental consent, and (c) provided their own informed assent to participate. Teacher informed consent to participate in the study was also obtained. Each teacher was asked to identify one class that he or she considered “challenging” to teach, and only one classroom per teacher was assessed in the study. Classes averaged 23 students in size (SD = 6.1), and an average of 76.4% of students in each class participated. Classrooms were approximately equally divided between math or science courses (N = 17) and history, social studies, or English courses (N = 20). Table 1 presents descriptive information regarding both students and teachers; both groups were racially and ethnically diverse, and the participating teachers reflected a broad range of experience levels.
The CLASS-S was the primary source of standardized observational data on teacher–student interactions in the classroom. The pre-K and elementary versions of the CLASS are among the most current and widely used standardized assessments of social and instructional interactions in classrooms (Burchinal et al., 2008; Howes et al., 2008; McCaslin et al., 2005; Rimm-Kaufman, Curby, Grimm, Nathanson, & Brock, 2009). The CLASS-S version was modified to capture precisely those aspects of classroom interactions that we hypothesize to be critical resources for educational achievement in adolescence. The CLASS-S consists of a set of global 7-point rating scales (one rating scale for each dimension below) with behaviorally anchored scale points providing detailed descriptions of a specific dimension of classroom process and its scaling from low to high. The CLASS-S scales are organized into three overarching domains, similar to those reported in factor analyses of the elementary version:
The Emotional Support domain is composited from subscales for Positive Climate, reflecting warmth and sense of connectedness in classroom; Negative Climate, reflecting expressed negativity in classroom; Teacher Sensitivity, reflecting responsiveness to student academic/emotional needs; and Regard for Adolescent Perspectives, reflecting the teacher’s ability to recognize and capitalize on student needs for autonomy, active roles, and peer interaction in the classroom.
The Classroom Organization domain is composited from subscales for Behavior Management, reflecting teacher ability to use effective methods to encourage desirable behavior and prevent/redirect misbehavior; Productivity, reflecting teacher ability to manage the classroom so as to maximize instructional time; and Instructional Learning Formats, reflecting teacher use of varied and interesting materials and teaching techniques in an organized fashion.
The third domain is Instructional Support, which is composited from subscales for Content Understanding, reflecting teacher presentation of content within a broader intellectual framework; Analysis and Problem Solving, reflecting emphasis upon engaging students in higher order thinking skills; and Quality of Feedback, reflecting provision of contingent feedback designed to challenge students and expand their understanding of a concept.
Student academic achievement was assessed using the Standards of Learning (SOL; Commonwealth of Virginia, 2005). The SOL is the state-mandated accountability measure for the Commonwealth of Virginia, administered to meet the requirements of No Child Left Behind. It was first used in 1998, with a 7-year period for schools to align their curricula and adjust to the testing requirement before the school accreditation process began (Commonwealth of Virginia, 2005). The SOL program has now been in place for almost a decade, making it one of the oldest such programs in the nation. Students take SOL tests (which consist of between 45 and 63 multiple-choice questions, depending upon the specific test used) at the end of the course in core subjects taught by their teacher, and each test is standardized on a 200–600 point scale, with 400 defined as a passing score that carries real-world implications both for student graduation and school accreditation. In terms of validity, the subject tests have strong unidimensionality and correlate between .50 and .80 with the Stanford 9 achievement tests (Hambleton et al., 2000). In terms of reliability, high school assessments had KR-20 coefficients of .87 and .91 (Hambleton et al., 2000).
External reviewers have also reviewed the tests and found that the “reliability evidence for the SOL assessments is solid and typical of high quality assessments” (Hambleton et al., 2000, p. 8). Although tests use the same scales across subject matter, student scores for a given test were adjusted using statewide normative data for each test to assure that scores could be fully equated across different tests. We obtained each student’s baseline SOL score, taken from a similar course the year before, and each student’s end-of-year SOL score, which was directly linked to the instructional content of the classrooms under examination (i.e., math courses were paired with prior year math courses, social studies courses were paired with prior year social studies courses, and so on).
School records were used to identify students’ gender, race/ethnicity, and grade level. Records also indicated whether students came from low-income families (coded based on student eligibility for free and reduced-price lunch, which extends to families with incomes up to 185% of federal poverty line). Teachers reported on their gender, race/ethnicity, years of experience teaching, and education level. Class size was obtained from teachers’ enrollment rosters.
Student SOL data were collected via standardized end-of-course assessments both for the study year and for the year prior to the study. Classroom observation data were collected within a limited window between the 4th and 8th weeks of the school year. Teachers were asked to record a typical classroom session, and coding was performed on the video recording of this session, with the camera positioned to capture both the teacher and a significant number of students. Teachers were told to record a class session in which they were actually teaching (as opposed to giving a test, watching a video, and so on). The window of observation at the beginning of the school year was deliberately selected to minimize the degree to which qualities of student–teacher interactions might be influenced by ongoing student engagement and achievement (as opposed to predicting such achievement).
A team of advanced undergraduate and graduate student coders were trained in a two-day workshop on the CLASS-S system. Coders learned to rate each of the ten specific CLASS-S dimensions along a 1–7 scale, with a 1 or 2 indicating low quality; 3, 4, or 5 indicating midrange quality; and 6 or 7 indicating high quality. At the end of the workshop, each coder passed a reliability test, in which they scored within one point of the master-coded tapes on 80% of scores, across five video segments. The master coders had extensive knowledge of the CLASS-S instrument. In addition, team members met regularly during the year to jointly code master tapes in order to prevent drift and increase coding agreement. The recorded classroom session was divided into two 20-min segments and each segment was coded independently by two trained coders, with a different pair of coders used for each of the two segments.
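To make the passing criterion concrete, it can be expressed as a simple within-one-point agreement check. The function and the scores below are invented for illustration; they are not part of the study's materials.

```python
# Hypothetical sketch of the coder reliability criterion: a trainee passes
# if at least 80% of their ratings fall within one scale point of the
# master-coded ratings across the segments scored.
def within_one_agreement(coder, master):
    """Fraction of ratings within one scale point of the master codes."""
    matches = sum(abs(c - m) <= 1 for c, m in zip(coder, master))
    return matches / len(master)

# Invented example: master codes and a trainee's ratings on the ten
# CLASS-S dimensions (1-7 scale) for one video segment.
master = [5, 4, 6, 3, 5, 2, 4, 6, 5, 3]
coder = [5, 5, 4, 3, 6, 2, 4, 7, 5, 3]

rate = within_one_agreement(coder, master)
print(rate, rate >= 0.80)  # 0.9 True -> this trainee would pass
```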
Scores were averaged across raters. Reliabilities between the two raters for domains, assessed via intraclass correlations, using Cicchetti’s (1981) norms for interpreting intraclass correlation coefficients, ranged from good (.73 for Instructional Support) to excellent (.77 and .82, for Emotional Support and Classroom Organization, respectively). For the specific dimensions, with the exception of Negative Climate (intraclass correlation coefficient [ICC] = .50—in the “fair” range), all other dimensions also had ICCs in the good to excellent range (ranging from .64 to .78).
This study used a nested design that included multiple students within each classroom. Initial analyses revealed a significant ICC at the classroom level and related design effect (.52 and 9.43, respectively). Classrooms were also nested within schools, although numbers of schools and classrooms within them were too small to permit adequate modeling of school level effects. Hierarchical linear modeling (Raudenbush & Bryk, 2002) was thus used as the conceptual and analytic framework for specifying two-level models that examined the association between measures of classroom quality and individual-level child outcomes (end-of-year achievement test scores), after accounting for student grade level, gender, family poverty status, and classroom size. PROC Mixed in SAS, using restricted maximum likelihood estimation, was used to specify the models derived from the following equations (Singer, 1998). In the first level of the two-level model (Equation 1), an end-of-year test score (Y) for a student (i) who is in classroom (j) is a function of the mean post-test score for students in this class (β0j) after adjusting for pretest scores (β01) and demographic characteristics of students (β02–4), and the error term associated with this estimated mean (rij).
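Written out, the Level 1 model just described takes the following form; this is a reconstruction from the verbal description, with predictor labels of our choosing.

```latex
Y_{ij} = \beta_{0j} + \beta_{01}(\mathrm{PRETEST}_{ij}) + \beta_{02}(\mathrm{GRADE}_{ij})
       + \beta_{03}(\mathrm{GENDER}_{ij}) + \beta_{04}(\mathrm{POVERTY}_{ij}) + r_{ij}
\quad (1)
```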
Equation 2 specifies in the second-level model that the adjusted mean post-test score for students in each classroom (β0j) is a function of the grand mean post-test score (γ00), classroom interaction quality (γ01), and class size (γ02) and the error term associated with this estimated mean (u0j).
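Written out, with labels matching the description above (our reconstruction):

```latex
\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{QUALITY}_{j}) + \gamma_{02}(\mathrm{SIZE}_{j}) + u_{0j}
\quad (2)
```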
Because all Level 1 coefficients for the controls were fixed, the only substitution is for β0j, resulting in Equation 3:
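Substituting the Level 2 equation for β0j into the Level 1 equation yields the combined model; this is our reconstruction from the descriptions of the two levels, with the same labels.

```latex
Y_{ij} = \gamma_{00} + \gamma_{01}(\mathrm{QUALITY}_{j}) + \gamma_{02}(\mathrm{SIZE}_{j})
       + \beta_{01}(\mathrm{PRETEST}_{ij}) + \beta_{02}(\mathrm{GRADE}_{ij})
       + \beta_{03}(\mathrm{GENDER}_{ij}) + \beta_{04}(\mathrm{POVERTY}_{ij})
       + u_{0j} + r_{ij}
\quad (3)
```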
All Level 1 coefficients aside from β0j were fixed, meaning that they were not allowed to vary across classes. For ease of interpretation, the outcome variable was standardized at the grand mean for all analyses.
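Returning to the clustering statistics reported above, the design effect can be roughly reproduced from the classroom-level ICC and the average cluster size, assuming the common formula deff = 1 + (m − 1) × ICC; this is a sketch, and the published value of 9.43 presumably reflects the exact cluster sizes rather than a simple average.

```python
# Approximate the reported design effect from the classroom-level ICC,
# assuming deff = 1 + (m - 1) * ICC with m taken as the average number
# of participating students per classroom.
icc = 0.52        # classroom-level ICC reported in the text
m = 643 / 37      # ~17.4 participating students per classroom
deff = 1 + (m - 1) * icc
print(round(deff, 2))  # lands near the reported 9.43
```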
We conducted three sets of primary analyses. The first examined whether each global domain of teacher–student interaction quality predicted student achievement, after first taking into account student-level and classroom-level demographic characteristics. Second, we conducted a fine-grained analysis of each dimension within each global domain of teacher–student quality as earlier, entering student and classroom demographic factors first in the model. This identified which specific dimensions were significant predictors of achievement. Finally, we ran similar analyses to assess the extent to which certain observed qualities of student–teacher interactions were actually predictable from students’ prior levels of achievement.
For each set of these primary analyses, we also tested to see whether the findings obtained might be moderated by classroom or student characteristics. Equations 4 and 5 provide examples of the models used for testing moderation of the relation of CLASS-S scores to achievement via variables at the classroom level (Equation 4) and at the student level (Equation 5). Equation 4 adds an interaction term to the second-level model, so that the adjusted mean post-test score for students in each classroom (β0j) is a function of the grand mean post-test score (γ00), classroom interaction quality (γ01), class size (γ02), the interaction between both Level 2 main effects (γ03), and the error term associated with this estimated mean (u0j).
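A reconstruction of the classroom-level moderation model from the verbal description (labels ours, matching the earlier Level 2 equation):

```latex
\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{QUALITY}_{j}) + \gamma_{02}(\mathrm{SIZE}_{j})
           + \gamma_{03}(\mathrm{QUALITY}_{j} \times \mathrm{SIZE}_{j}) + u_{0j}
\quad (4)
```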
Means and standard deviations for all primary variables of interest in the study are reported in Table 1. Table 2 presents a matrix of the bivariate correlations between the three overarching domains of Emotional Support, Classroom Organization, and Instructional Support along with the specific dimensional scales for measures of teacher–student interaction quality. Confirmatory factor analysis, using maximum likelihood estimation, supported the loading of scales onto the domains as presented earlier, with all scales loading above 0.4 on the relevant factor (χ2 = 39.60, p = .06, comparative fit index = 0.97, root mean square error of approximation = 0.07; exact factor loadings are available from the authors upon request). As expected, prior year achievement test scores were substantially correlated with current year achievement test scores (r = .69, p < .001). All analyses also accounted for a range of potentially confounding variables: contextual factors, including student grade level, gender, and family poverty status, and classroom size. Potential moderating effects of these contextual factors were also examined (via interaction terms created from centered versions of these variables) and yielded no significant interactions, with the exception of a moderating effect of classroom size, discussed further below.
Table 3 presents results of three hierarchical linear models, one for each of the three overarching domains assessing teacher–student interaction quality. In these models, end-of-year student achievement test scores are predicted from baseline scores, from student grade in school, gender, and family poverty status (at the child level), and from classroom size (at the classroom level), followed by the classroom interaction quality variable of interest. The domains of teacher–student interactions overlapped sufficiently that when analyses examined all three simultaneously as predictors, it was not possible to identify significant unique predictive variance from any single domain after accounting for the others. Thus, analyses were conducted separately by domain.
Results indicate that each of the three domains of teacher–student interaction was predictive of higher student achievement test scores at the end of the year, even after covarying prior-year scores and relevant student and classroom characteristics. Using the approach suggested by Peugh (2010), we examined the proportional reduction in classroom-level variance obtained by adding each domain composite, relative to the classroom variance remaining after considering the effects of student grade, gender, family poverty status, and classroom size. Results indicated proportional reductions in classroom-level variance in student outcomes of 12.8%, 5.3%, and 8.9%, respectively, from the single assessments of emotional, organizational, and instructional support. For ease of general interpretation, we examined effects not simply in terms of raw score equivalents, but in percentile terms (examining the change in test-score percentile associated with a one standard deviation increment in an observed domain score). The magnitude of the strongest prediction, from the Emotional Support domain, indicates that, after accounting for other measured variables, a student entering with average prior test scores (i.e., 50th percentile) in a class that was one standard deviation below the mean in Emotional Support would on average place in the 41st percentile on end-of-year tests, whereas an average student with the same background characteristics in a class that was one standard deviation above the mean in Emotional Support would on average place in the 59th percentile.
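Under a normality assumption, the percentile contrast reported above maps onto a standardized effect size; the sketch below (ours, using only the Python standard library) backs out the implied effect and reproduces the two percentiles.

```python
from statistics import NormalDist

nd = NormalDist()
# A class at +1 SD on Emotional Support places the average student at the
# 59th percentile; under normality this implies a standardized effect of
# roughly 0.23 outcome SDs per SD of observed Emotional Support.
effect = nd.inv_cdf(0.59)
low_pct = nd.cdf(-effect) * 100   # class 1 SD below the mean
high_pct = nd.cdf(+effect) * 100  # class 1 SD above the mean
print(round(low_pct), round(high_pct))  # 41 59
```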
Class size interacted with Emotional Support (B = −4.81, SE = 2.00, p = .02) and with Instructional Support (B = −3.54, SE = 1.78, p = .046). These interactions are depicted in Figure 2 using standardized scores for all variables and plotting lines for classrooms that were one standard deviation above and below the mean in class size (reflecting class sizes of approximately 29 and 17, respectively). Figure 2 shows that measured Emotional and Instructional Support in the classroom was of greatest predictive value for student academic achievement in smaller as compared to larger classrooms.
No moderator effects of course content area were found, which indicates that relations between past and current test scores were not significantly different for math/science versus English/social studies courses and that results did not significantly differ depending on the type of course being taught. (There were also no differences in the predictive power of prior years’ test scores for math versus science courses or English versus social studies courses, indicating it was appropriate to collapse into these two broad categories for analytic purposes).
For descriptive purposes, analyses next followed up on the findings regarding the three global domains of teacher–student interaction in finer grained detail, by assessing the specific dimensions that were coded and composited into each construct. As with the domain scores discussed earlier, the specific dimensions overlapped sufficiently that when analyses examined them simultaneously as predictors, it was not possible to identify significant unique predictive variance from any single dimension after accounting for the others. These analyses were conducted separately for each dimension and should be interpreted as post hoc follow-up tests to the overall results noted earlier demonstrating significance at the domain level. Results are presented in Table 4. These results indicate significant predictions of achievement from observed positive climate, teacher sensitivity, and regard for adolescent perspectives in the Emotional Support domain, instructional learning formats in the Classroom Organization domain, and analysis and problem solving in the Instructional Support domain.
Finally, analyses sought to determine the extent to which measures of observed teacher–student interactions could be predicted from levels of student achievement prior to entering the class. Such predictions can identify which aspects of teacher–student interaction may be influenced heavily by student entry characteristics. These results are presented in Table 5. Higher quality behavior management, instructional learning formats, content understanding, and quality of feedback were significantly predicted by students’ baseline achievement test scores, along with the domain-level scales for Classroom Organization and Instructional Support. However, no effects of baseline levels of student achievement on Emotional Support were observed.
To see whether one might utilize these findings to construct a measure of observed teacher–student interactions that would optimize prediction (at least in this somewhat small sample) on a post hoc basis, we next created a composite composed solely of the five individual dimensions of observed interaction that were significantly predictive of end-of-year test scores: positive climate, teacher sensitivity, regard for adolescent perspectives, instructional learning formats, and analysis and problem solving. When considered in models identical to those in Table 3, this composite yielded a significant B of 30.2 (SE = 8.91, p < .001). Results indicated a 16.3% reduction in unexplained variance at the classroom level from this composite measure, even after accounting for the other covariates in the model. Expressed as an effect size, this association is equivalent to a finding that a student entering with average test scores (i.e., 50th percentile) in a classroom one standard deviation below the mean on this composite of overall teacher–student interaction quality would on average place in the 37th percentile on end-of-year tests, whereas a student entering with average test scores in a class one standard deviation above the mean on this scale would on average place in the 63rd percentile.
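The same normality-based back-calculation applied to the 37th/63rd percentile contrast gives the standardized effect implied for the post hoc composite; this is our sketch for comparison purposes, not the authors' computation.

```python
from statistics import NormalDist

nd = NormalDist()
# The 37th/63rd percentile contrast implies an effect of roughly 0.33
# outcome SDs per SD of the five-dimension composite, under normality --
# larger than the ~0.23 implied for the Emotional Support domain alone.
composite_effect = nd.inv_cdf(0.63)
print(round(composite_effect, 2))  # 0.33
```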
This study identified specific features of teacher–student interactions in the classroom, observed using standardized measurements, that were directly linked to student achievement over the course of an academic year, even after accounting for prior levels of student achievement, demographic characteristics, and classroom size. Moreover, these findings applied regardless of student grade levels and content area of instruction; in short, “good teaching” was good regardless of content or grade level. These results have implications not only for theories regarding the role and value of interactions and proximal processes in social settings serving youth (Tseng & Seidman, 2007), but also for contemporary policies and practices related to the assessment and improvement of teacher performance (Weisberg, Sexton, Mulhern, & Keeling, 2009).
These findings are consistent with theoretically driven expectations regarding the mechanisms through which secondary school instruction affects adolescents’ learning, as operationalized via the CLASS-S observations. These results suggest that the “value-added” of classroom settings may in part be attributable to qualities of teacher–student interactions. These results also extend similar findings obtained in prekindergarten and elementary school settings, in which observational approaches using the elementary version of the CLASS have been used to predict measures of student learning (e.g., Hamre & Pianta, 2005; Rimm-Kaufman et al., 2009). Collectively, these studies provide empirical support for the argument that critical teacher behaviors can be measured in a standardized observational assessment.
Although studies of student achievement have been important in laying a foundation for inquiry into classroom effects (Ladd, 2008; Nye, Konstantopoulos, & Hedges, 2004; Rivkin, Hanushek, & Kain, 2005), they had not yet succeeded in identifying specific processes that may lead to student learning and positive social adjustment across an array of content areas. As a result, the field has thus far been left to rely largely upon an atheoretical, post hoc approach as reflected in Hanushek’s (2002) definition of teacher quality: “Good teachers are ones who get large gains in student achievement for their classes; bad teachers are just the opposite” (p. 3). This definition, although apt as far as it goes, provides only limited guidance to efforts to produce effective teaching (Cochran-Smith & Zeichner, 2005). The present study extends our understanding of the connection between specific observed teacher–student interactions and student achievement into secondary school classrooms and supports the theoretical proposition that properties of these interactions have value for student learning and development regardless of the content being taught.
In the present study, aspects of teacher–student interactions reflecting instructional support and classroom organization were predicted by the baseline level of achievement of students in a given class (although these still went on to predict achievement after accounting for these baseline levels). In contrast, however, features of teachers’ emotional support appeared independent of students’ baseline levels of achievement. These findings have implications for the observation of instruction (i.e., teacher oversight) in contexts in which value-added achievement test analyses cannot be readily conducted. These findings suggest that observed levels of instructional support and classroom organization, although important in predicting future student achievement, are likely to reflect both teacher skill and student background. Thus, to the extent students are implicitly or explicitly grouped or tracked into higher achieving and lower achieving classes, it would be important not to assume that observed qualities of instructional support and classroom organization were solely a result of teacher skills or efforts. Observed emotional support, in contrast, also predicts future student achievement after accounting for baseline achievement but appears relatively independent of student background characteristics, and thus may be more likely to be determined by individual teacher qualities. Notably, there is evidence now from experimental studies that teachers’ behavior in all three CLASS domains can be improved with teacher training that targets these interactive behaviors via training either through college courses (Hamre et al., 2010) or through ongoing, job-embedded consultation (Allen, Pianta, Gregory, Mikami, & Lun, 2011; Brown, Jones, LaRusso, & Aber, 2010; Pianta, Belsky et al., 2008; Raver et al., 2008).
With regard to emotional support, attachment theorists posit that when adults provide emotional support and respond contingently, children develop self-reliance and are better able to explore (Ainsworth, Blehar, Waters, & Wall, 1978; Bowlby, 1969/1982), a premise that has been repeatedly validated in school environments (Birch & Ladd, 1998; Hamre et al., 2010; Howes, Hamilton, & Matheson, 1994). Self-determination theory (Connell & Wellborn, 1991; Skinner & Belmont, 1993) posits that motivation to learn is in part related to adults’ support for competence, relationships, and autonomy (Roeser, Eccles, & Sameroff, 1998). Instructional interactions have been put in the spotlight in recent years as more emphasis has been placed on the translation of cognitive science, learning, and developmental research to educational environments (Bransford, Brown, & Cocking, 1999; Carver & Klahr, 2001; National Research Council, 2005). Teachers who use strategies that focus students on higher order thinking skills, give consistent, timely, and process-oriented feedback, and work to extend students’ language skills tend to have students who achieve more academically (Hamre & Pianta, 2005; Justice, Meier, & Walpole, 2005; Meehan, Hughes, & Cavell, 2003; Taylor, Pearson, Peterson, & Rodriguez, 2003).
Emotional support may be particularly important in instructional settings serving adolescents. Unlike in elementary education, in which most students can be expected to have a disposition toward seeking to please teachers and comply with authority, engaging adolescent students emotionally may be critical to maximizing their academic motivation in the classroom. Further, as adolescents seek greater autonomy with respect to parents, having other settings in which they can receive emotional support from adults may provide a powerful motivation to engage in those settings (National Research Council, 2004).
That adolescents achieve at higher levels across a range of content areas in the context of more positive classroom interactions, although not yet well documented in the educational literature, is actually quite consistent with developmental theory. In contrast to assessments targeted at specific content areas (e.g., seventh-grade math instruction), this assessment broadly targeted instruction across the full range of core secondary instructional content areas. Moderator analyses provided no evidence that predictions differed in magnitude across different content areas, suggesting that the generality of the approach to assessing instruction across widely different content areas was successful in this instance.
Follow-up descriptive analyses shed light on specific dimensions of teacher–student interaction that were most predictive of student outcomes. These analyses were explicitly descriptive and exploratory in nature; of necessity, the dimensions examined contained a degree of redundancy, both with the larger domains examined and with the other dimensions observed. Although we present them in terms of conventional significance levels, we recommend they be interpreted with caution, pending further replication. In the Emotional Support domain, teachers’ ability to establish a positive emotional climate (Positive Climate), their sensitivity to student needs (Teacher Sensitivity), and their structuring of their classroom and lessons in ways that recognize adolescents’ needs for a sense of autonomy and control, for an active role in their learning, and for opportunities for peer interaction (Regard for Adolescent Perspectives) were all associated with higher relative student achievement, after covarying baseline levels of such achievement. Similarly, use of instructional learning formats that encouraged active participation by students and that provided variety in classroom approaches (Instructional Learning Formats) was also predictive of student achievement, as were lessons that required high levels of analysis and problem solving by students (Analysis and Problem Solving).
Overall, the particular constellation of interactions that was most linked to future achievement seemed to focus upon tailoring a classroom experience to be maximally emotionally and intellectually engaging to the adolescent. In post hoc analyses, this constellation could account for a difference in student achievement test performance spanning the 37th–63rd percentiles, a magnitude that has considerable ramifications for teachers, classrooms, and schools in terms of accountability. Stated differently, on the particular tests used, the swing in scores reflected in moving from an average classroom to one that was one standard deviation above the mean score in overall teacher–student interaction quality would be sufficient in the abstract to reduce failure rates on these high-stakes tests from 17% to 11%.
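The reported failure-rate change can be approximated with the same normal-distribution arithmetic used for the percentile figures: if 17% of students in an average classroom fall below the passing cutoff, that cutoff sits at roughly one standard deviation below the student mean, and shifting the class distribution upward by the implied effect size lands the expected failure rate near the reported figure. A hedged sketch (the normality assumption and the back-calculated ~0.33 effect size are ours; the actual tests need not be normal, so the result is approximate):

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal: mean 0, sd 1

# Per-classroom-SD effect on student scores, back-calculated from the
# reported 37th/63rd percentile equivalents:
d = norm.inv_cdf(0.63)  # roughly 0.33 student-level SDs

# Passing cutoff implied by a 17% failure rate in an average classroom:
cutoff = norm.inv_cdf(0.17)  # roughly -0.95 SDs

# Expected failure rate when the whole class distribution shifts up by d
# (i.e., a classroom 1 SD above the mean on interaction quality):
shifted_failure = norm.cdf(cutoff - d)
print(f"{shifted_failure:.0%}")  # ~10%, close to the reported 11%
```

The small gap between the ~10% computed here and the 11% reported in the text is consistent with the actual test-score distributions departing somewhat from normality.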
Classroom observations have been used as measurement tools in educational research for more than three decades (Gage & Needels, 1989). In most studies using observational methods, approaches typically focused on specific teacher pedagogical behaviors. The present study results and other related findings (Pianta, Belsky et al., 2008) demonstrate the predictive utility of a focus on interactions, given that they can be assessed in secondary classrooms via 40 min of video, and that global ratings of interactions, capturing patterns of behavior and response over the segment, can be coded reliably and account in part for the value of that classroom setting for student learning. Notably, two of the domains of teacher–student interaction quality that were assessed, emotional and instructional support, were more strongly related to achievement in smaller as opposed to larger classrooms. One explanation for these findings is that qualities such as sensitivity to student needs or provision of high-quality feedback to students might have the greatest effect when they are concentrated among relatively fewer students. Conversely, the effect of these factors might be relatively diluted in very large classrooms.
The particular methods for observation used in this study also warrant discussion as they have implications for the use of standardized observations as measures of teacher performance in states and in districts. Results were obtained based on observations of just 40 min of a single classroom session early in the school year using global, judgment-based standardized ratings of teacher–student interaction. Under these conditions, the CLASS-S is likely capturing only a modest portion of the true variance in the quality of teacher–student interactions that exists across an entire year (Mashburn, Downer, Rivers, Brackett, & Martinez, 2013). Thus, the estimates of the effects of classroom interaction quality in this study, although substantial, should be taken not only as a likely minimum estimate of the effects of the specific qualities observed, but also as an indication that stable and reliable between-teacher differences in interaction can be observed in a fairly efficient manner.
At the same time, it should also be acknowledged that this was not an experimental study, and that even longitudinal analyses of end-of-year achievement controlling for baseline levels are not logically sufficient to demonstrate causal relations. It could be, for example, that other unmeasured factors were promoting both student achievement and high-quality instruction in the classroom. Even if this were the case, however, the results of this study would still indicate that the domains assessed were sensitive proxies of important classroom processes related to student achievement. Also, as achievement data could not be obtained on 24% of students in classrooms (because of a lack of parental consent), it remains possible that such additional data might have altered study findings, although it is not clear whether they would have biased findings in any particular direction. Teachers were also asked to select one of their more challenging classes to teach for this study, and thus results should properly be generalized to such classrooms. It is also possible that bias existed in which classroom session teachers selected for video recording, thus introducing unmeasured bias into the assessments obtained. Finally, it was not possible to assess effects of classroom nesting at the school level, which introduces an additional potential source of unmeasured bias into the results.
In sum, it is striking that, given the importance of classroom settings as vehicles for the transmission of knowledge and skill in our system of education, little-to-no population-level data exist pertaining to adolescents’ exposure to specific practices in the classroom that are (a) known to relate to academic success or failure, (b) desired on the basis of certain policies or values, or (c) even hypothetically expected to relate to outcomes. No current work provides national-level observational data on secondary classroom environments. The Measures of Effective Teaching (MET) Study (Bill and Melinda Gates Foundation, 2010) will likely produce a number of important findings in this regard in coming years. The MET study includes observations of over 3,000 teachers in Grades 4–9 and is using several observational protocols; several of these, like the CLASS-S, focus on global instructional processes, and some focus specifically on content. We argue that designing standardized observation protocols into current value-added state-standards tests and large-scale student assessments could leverage not only our understanding of classroom effects (Early et al., 2005; National Institute of Child Health and Human Development Early Child Care Research Network, 2005; Pianta et al., 2005; Pianta, La Paro, & Hamre, 2008) but also the capacity to systematically produce effective teaching.
In the midst of concern about standardized tests, block versus regular class scheduling, school size, and budgets, this study emphasizes the fundamental importance of the emotional quality of the classroom as a key to adolescent learning. Although we can think of them as “learners,” adolescents are first and foremost highly social and emotional beings. The role of school psychologists and administrators in both supporting this recognition and in helping teachers understand and enact it is potentially paramount. The elements of a positive classroom climate—teacher sensitivity to adolescents’ needs, and recognition of their desire for peer interaction and for a sense of autonomy regarding classroom activities—are not simply niceties in a functional classroom; they are key predictors of adolescent learning, even regarding otherwise “dry” subjects such as algebra or geometry. An advantage of the CLASS-S is that it provides highly specific markers of these qualities, which can be used not only by those evaluating teacher–student interactions, but also by teachers seeking to enhance them. For example, teachers may enhance the positive climate of a class by laughing with students, engaging them as they enter the room (vs. sitting behind a desk at the front of the room), and asking about events outside of class at the beginning and end of a lesson period. They may give students choices (within a prespecified range) of ways of approaching learning assignments. They may use varied instructional learning formats—moving beyond lecturing to a largely silent class, for example, to asking students to work briefly in small teams, present ideas to the class, and so on—and focus on student analysis and problem solving by asking “why” questions, not simply factual questions. All of these elements were part of the ratings of interactions linked to greater student learning in this study.
Equally important, for those charged with evaluating teachers, this study identified qualities of teacher–student interactions that may appear to be linked to quality teaching because they occur more in high-achieving classrooms, but which appear primarily to reflect qualities of students entering the classroom rather than to predict future achievement gains of those students. The quality of behavior management in a classroom is associated with student achievement, for example; however, the association is primarily to preexisting levels of student achievement rather than to future gains in achievement. This is not to say that behavior management is unimportant, but rather to suggest that great care be used in assessing teachers on this quality unless one also accounts for the qualities of the students being taught.
Perhaps the most important reason to conduct observational assessment of classrooms is for the purposes of professional development. Professional development typically occurs in the absence of a direct link to actual teaching behavior in classrooms, particularly for already-trained and certified teachers (Caspary, 2002). Systematic classroom observation systems provide a standardized approach to measuring and noting teachers’ strengths and weaknesses and evaluating whether professional development activities are actually helping improve the classroom interactions responsible for learning. The current study suggests that professional development focused specifically on teachers’ emotional, organizational, and instructional interactions with students may enhance teacher effectiveness in ways that have a direct effect on student learning (Mashburn, Downer, Hamre, Justice, & Pianta, 2010), while clearly raising important questions for future research to consider.
Joseph Allen, PhD, is the Hugh P. Kelly Professor of Psychology at the University of Virginia, specializing in the study of adolescent social development.
Anne Gregory, PhD, is an associate professor of psychology in the Graduate School of Applied and Professional Psychology at Rutgers University, specializing in the study of disproportionality in implementation of disciplinary procedures across racial/ethnic groups in schools.
Amori Mikami, PhD, is an assistant professor of psychology at the University of British Columbia, specializing in the study of ways in which school and home environments influence social development.
Janetta Lun, PhD, is currently a research associate in the Department of Psychology at the University of Maryland, specializing in the study of intergroup relations and cultural psychology.
Bridget Hamre, PhD, is a research scientist at the Curry School of Education at the University of Virginia and currently serves as associate director of the Center for Advanced Study of Teaching and Learning at the University of Virginia.
Robert Pianta, PhD, is the Novartis U.S. Foundation Professor of Education and currently serves as dean of the Curry School of Education and director of the Center for Advanced Study of Teaching and Learning at the University of Virginia.