Search tips
Search criteria 


Logo of cbelifesciLink to Publisher's site
CBE Life Sci Educ. 2017 Spring; 16(1): ar18.
PMCID: PMC5332044

How Is Science Being Taught? Measuring Evidence-Based Teaching Practices across Undergraduate Science Departments

Michelle Smith, Monitoring Editor


While there is a wealth of research evidencing the benefits of active-learning approaches, the extent to which these teaching practices are adopted in the sciences is not well known. The aim of this study is to establish an evidential baseline of teaching practices across a bachelor of science degree program at a large research-intensive Australian university. Our purpose is to contribute to knowledge on the adoption levels of evidence-based teaching practices by faculty within a science degree program and inform our science curriculum review in practical terms. We used the Teaching Practices Inventory (TPI) to measure the use of evidence-based teaching approaches in 129 courses (units of study) across 13 departments. We compared the results with those from a Canadian institution to identify areas in need of improvement at our institution. We applied a regression analysis to the data and found that the adoption of evidence-based teaching practices differs by discipline and is higher in first-year classes at our institution. The study demonstrates that the TPI can be used in different institutional contexts and provides data that can inform practice and policy.


There is a large pool of evidence for the benefits of active-learning techniques (Freeman et al., 2014 blue right-pointing triangle). This evidence, in conjunction with a strong call for change in the classroom (American Association for the Advancement of Science, 2009 blue right-pointing triangle), highlights the need for instructors to drastically alter their teaching methods. However, there is also significant difficulty in getting teaching faculty to fully adopt and properly implement active-learning techniques (Gess-Newsome et al., 2003 blue right-pointing triangle; Wilson, 2010 blue right-pointing triangle). The evidence supporting active learning is not enough for instructors to implement it (Andrews and Lemons, 2015 blue right-pointing triangle). Instead, there are a number of interrelated issues that contribute to the lack of active-learning implementation. These include instructors seeing teaching as conflicting with their “professional identity” (Brownell and Tanner, 2012 blue right-pointing triangle) or as a “necessary evil” (Anderson et al., 2011 blue right-pointing triangle) and a lack of support (Henderson et al., 2011 blue right-pointing triangle, 2012 blue right-pointing triangle). Furthermore, many instructors find themselves pressured to place a lower priority on teaching due to the lack of incentives when compared with research (Wilson, 2010 blue right-pointing triangle).

There is a lack of alignment between education research and teaching practice (Dolan, 2015 blue right-pointing triangle) that the issues described above only partly explain. Moving forward, Wieman (2015) blue right-pointing triangle argues strongly that a necessary condition for change is to have reliable and objective measures of teaching quality. This focus on teaching quality is essential given the evidence about effective teaching approaches (National Research Council, 2012 blue right-pointing triangle). To this end, one of the main challenges in measuring the implementation of active-learning techniques is finding a suitable method to gauge such practices. Our knowledge of the adoption levels of these teaching approaches at the broad institutional level is limited (Wieman and Gilbert, 2014 blue right-pointing triangle), but there is a growing body of literature that we now describe.

Some early studies of active-learning uptake involved directly interviewing teachers regarding their teaching (Trigwell et al., 1994 blue right-pointing triangle). These approaches often used qualitative methods, focusing on the intentions and ideals of the teachers rather than actual practice. A popular approach employed in 97% of departments (Berk, 2005 blue right-pointing triangle) in gauging teaching quality is student course evaluations, but many concerns have been raised about their reliability. The major concern is that the evaluations are biased by factors unrelated to the teaching, such as class size, difficulty or workload, interest in the subject, and personality of the teacher (Wachtel, 1998 blue right-pointing triangle; Marsh, 2007 blue right-pointing triangle). Both Wachtel and Marsh conclude that student evaluations are generally reliable, but some studies find a clear bias in terms of personality (e.g., Shevlin et al., 2000 blue right-pointing triangle). It is for these reasons, among others explained by Wieman (2015) blue right-pointing triangle, that student course evaluations may not be an appropriate tool for the objective measure of active-learning implementation.

The most direct approach is for independent observers to attend classes and record the use of different active-learning methods by instructors. Several different protocols have been proposed for this purpose. The Reformed Teaching Observation Protocol (RTOP; Sawada et al., 2002 blue right-pointing triangle) measures 25 qualities of the instruction with a five-point Likert scale: there was a strong correlation between the RTOP score and learning gains for the classes where pre/post testing was available. The RTOP was also used by Ebert-May et al. (2011) blue right-pointing triangle. The Classroom Observation Protocol for Undergraduate STEM (COPUS; Smith et al., 2013 blue right-pointing triangle) takes a quantitative approach, measuring how much time instructors (and optionally students) spend on different categories of activity. Analysis of the COPUS results for 50 science, technology, engineering, and mathematics (STEM) courses showed that there was a wide, continuous range of teaching practices that did not cluster into traditional and active approaches (Smith et al., 2014 blue right-pointing triangle). Lund et al. (2015) blue right-pointing triangle used a clustering analysis combining both RTOP and COPUS data for 73 instructors and identified 10 broad teaching styles (ranging from lecturing through to group work). Hora and Ferrare (2013) blue right-pointing triangle combined interviews with observations (using the Teaching Dimensions Observation Protocol) in a study of 57 instructors, revealing qualitative differences in teaching methods between different disciplines. The physics, chemistry, and biology instructors in their study used a wider range of teaching approaches in class than the mathematics and geology instructors.

An alternative approach is to ask instructors to self-report on their teaching practices. This approach has the advantage that all aspects of a course can be measured and there is no need to fund and train observers. The Higher Education Research Institute Faculty Survey (Eagan et al., 2014 blue right-pointing triangle) has collected a very wide range of data—including teaching practices—since 1989. It demonstrates a slow decrease in lecture-dominated teaching in favor of more active approaches. Marbach-Ad et al. (2014) blue right-pointing triangle asked teaching assistants and faculty members how often they implement active-learning techniques in the classroom: there was still a high level of “extensive lecturing” reported. While this approach focused more directly on teaching, many of the questions had qualitative-response options (such as “most class sessions”). Teaching practice surveys have also been applied to specific disciplines: geoscience (Macdonald et al., 2005 blue right-pointing triangle), engineering (Borrego et al., 2010 blue right-pointing triangle), and physics (Henderson et al., 2012 blue right-pointing triangle).

The Teaching Practices Inventory (TPI; Wieman and Gilbert, 2014 blue right-pointing triangle) is a new survey instrument specifically designed for the rapid measurement of research-based teaching practices. It attempts to be more objective than some previous work by asking yes/no questions and requiring definite number ranges for activities. Most importantly, it uses a scoring rubric that weights activities according to the research evidence linking them to learning gains. For instance, the use of a personal response system (e.g., “clickers”) contributes to the score only if it is “followed by student–student discussion.” The TPI also offers a more extensive range of questions than some previous surveys. There has been some discussion about the accuracy of self-reporting approaches such as the TPI, but Smith et al. (2014) blue right-pointing triangle compared TPI results with independent COPUS observations and found that instructors were reporting reliably on their approaches.


Our broad aim is twofold, focused on both research and practice. First, we want to contribute new knowledge by using the TPI to measure the use of evidence-based teaching in the bachelor of science (BSc) program at a large research-intensive university. Our work extends the Wieman and Gilbert (2014) blue right-pointing triangle study to a new context. Such a study would begin to address the limited literature on the uptake of evidence-based teaching practices at the discipline and degree-program levels, which is needed in the sciences as we shift from individual practices to whole-program approaches to enhance the student learning experience (Fairweather, 2008 blue right-pointing triangle). Second, we seek to capture quantitative data on teaching practices within our institution to inform a BSc curriculum review (McManus and Matthews, 2015 blue right-pointing triangle). These data will allow us to establish an evidential baseline and prioritize areas offering the highest return on our resourcing of teaching development (see Knight, 2001 blue right-pointing triangle; Wieman et al., 2010 blue right-pointing triangle; Mercer-Mapstone and Matthews, 2017 blue right-pointing triangle).

We address the following questions in this paper:

  • To what extent are evidence-based teaching approaches used in the BSc courses at our institution?
  • How do these compare with the corresponding data from a Canadian university?
  • How does the use of evidence-based teaching approaches differ, within our institution, by class size, year level, and discipline?

Throughout this paper we use the term “course” to refer to a single unit of study or subject.



The study was situated within an Australian research-intensive university ranked in the top 100 universities worldwide (in the Times Higher Education World University Rankings and the Quacquarelli Symonds World University Rankings, among others). The university is a large public university with a student body just under 49,000 (74% undergraduates, 25% postgraduate course work, and 1% research higher-degree students). It has a research budget of AU$380 million per annum. The faculty of science is one of six faculties within the university and includes eight multidisciplinary departments. The BSc degree program is administered by the faculty of science with all eight departments (plus five departments from other faculties) teaching courses in the degree program.

The BSc degree program comprises 3 years of undergraduate study with an optional fourth year for honors, and consistently attracts applicants straight from high school. The BSc is marketed as a flexible, generalist degree that prepares students for a range of postgraduate opportunities. Students have a wide range of elective choices, although they must complete a first-year statistics course and meet requirements of a major in a specific discipline to graduate. BSc courses typically involve 3 hours of lectures in a week plus a 2- or 3-hour laboratory (either every week or biweekly) or tutorial sessions (more common in mathematics/statistics).

Survey Instrument

We used the TPI (Wieman and Gilbert, 2014 blue right-pointing triangle) to measure the use of evidence-based teaching approaches in our institution. The TPI is a 72-item questionnaire that asks an instructor specific questions about the use of selected evidence-based practices within a designated course. Typical questions are “Students asked to read/view material on upcoming class session” (yes/no) or (what is the) “Average number of times per class: show demonstrations, simulations, or video clips.” The survey groups the questions into eight different categories of teaching activity (see Table 1, which also gives example questions) and gives a score in each category as well as the total score for a course. The scores are calculated by weighting the activities according to their learning impact (see Wieman and Gilbert, 2014 blue right-pointing triangle).

Teaching categories measured by the TPI

We made minor modifications to the TPI to clarify certain terms for use in our local context, then tested the survey twice. First, staff from the teaching and learning institute of our university confirmed the language was appropriate for an Australian context. We then tested the survey on members of our science teaching and learning committee as a key stakeholder group. They gave useful advice about the survey implementation, notably to offer prizes to incentivize participation (10 gift vouchers were offered). They also suggested we clarify the type of teaching involved (lectures, not laboratory classes) with the explanatory statement we added to the preamble, to keep in line with the intended scope of the TPI (Wieman and Gilbert, 2014 blue right-pointing triangle). The list of all the changes we made is given in Table 2. We present our modified survey in the Supplemental Material for this paper.

Changes made to the text of the TPI for the local Australian context


We asked the staff member coordinating each course to complete the TPI, following the practice of Wieman and Gilbert (2014) blue right-pointing triangle. We selected courses satisfying all the following criteria:

  1. Courses contributing to BSc degree in semester 1, 2015.
  2. Courses with more than 15 BSc students enrolled. This cutoff is arbitrary, but removes very small courses while including the core third-year courses.
  3. Courses that are delivered in class rooms (i.e., excluding fieldwork, laboratory-based, and online courses).

There were 178 courses contributing to the BSc, with 136 remaining after the other criteria were applied, drawn from 13 departments (multidisciplinary organizational units; see Table 3). These courses had student enrollments ranging from 15 (the minimum criterion) to 1425. Ethics approval was given by the local ethics committee (approval number 2015000345).

Numbers of courses, TPI completion rates, and mean results for the eight departments with major contributions to the bachelor of science degree

Data Collection

We opened the survey at the start of week 5 of the 13-week semester 1 teaching period in 2015 via an email inviting instructors coordinating courses in the BSc to complete the online survey. An incentive was offered (a drawing to win a $100 gift voucher) and two reminder emails were sent to encourage participation in the study. We closed the survey at the start of week 9 of the semester, at which point we had 129 responses, giving an overall completion rate of 95%. The average reported completion time was 11.4 minutes and only one instructor raised concerns about the time taken. We calculated TPI scores using the spreadsheet provided by Wieman and Gilbert (2014) blue right-pointing triangle and present the individual scores for all courses in Supplemental Table S1.

Analysis Methods

To explore our first research question—the use evidence-based practice at our institution—we calculated descriptive statistics (e.g., frequencies, means, standard deviations) of the results for our institution, including overall scores (Table 3 and Figure 1), the adoption rates of selected practices (Table 4), and scores by category (Table 5, Supplemental Table S2, and Figure 2).

Comparison of total TPI scores for our institution (129 courses) and the comparison sample (93 courses; Wieman and Gilbert, 2014 blue right-pointing triangle). The graph shows the percentage of courses falling into the score ranges displayed on the horizontal axis. Note that the ...
For each of the eight teaching categories, we compare the TPI scores for our institution (distributions and means) with the mean scores of the comparison institution (Wieman and Gilbert, 2014 blue right-pointing triangle). The horizontal axis for each category shows the maximum ...
Percentage of courses adopting the most robust evidence-based teaching practices for each category of the TPI
Comparison of total and category TPI scores across two institutionsa

We compared our institution with the Wieman and Gilbert (2014) blue right-pointing triangle data from a Canadian institution—our second research question—by testing for different mean scores in the TPI categories. This involved multiple t tests, which increases the probability of false detections based on the individual p values. We controlled for the false discovery rate over multiple tests by applying the Benjamini and Hochberg (1995) blue right-pointing triangle correction to the p values (using the p.adjust function in the R programming language). We considered two results to be significantly different in a set of tests when the adjusted p value was p´ < 0.05. The Benjamini and Hochberg (1995) blue right-pointing triangle correction means that the probability of making a false detection with this condition across a set of tests is less than 0.05. We use these adjusted p values in all of our analyses given here.

Wieman and Gilbert (2014) blue right-pointing triangle noted a possible bias in their data if the responding instructors were not representative of their whole departments. We minimized any bias by limiting any comparisons to the three departments in the Wieman and Gilbert data with completion rates greater than 70% (n = 93 courses). Such bias in our data is likely to be small given our high overall completion rate: all our departments were more than 80% complete (Table 3).

We used a multiple linear regression analysis to address our third research question: how the TPI scores within our institution vary by class size, year level, and discipline. The advantage of a regression approach is that it determines the relative contributions of the different parameters simultaneously (see Theobald and Freeman, 2014 blue right-pointing triangle).

We defined discipline by classifying all the courses into the following four broad, cognate discipline clusters (a common classification of disciplines for government reporting in Australia):

  1. Biological (biology and life sciences)
  2. Physical (chemistry, physics, and earth sciences)
  3. Mathematical (mathematics, statistics, and computer science)
  4. Psychology

Our research question defines three predictor variables: discipline and year level (both categorical variables) and class size (a continuous variable). We report the TPI results for subsamples split by these variables in Supplemental Table S2. We tested the independence of these variables using a chi-square test between the categorical variables (discipline and year level) and an analysis of variance test between each of the categorical variables and the continuous class size variable. We used the chisq.test and aov functions in the R programming language, respectively, for these tests and present the results in Supplemental Table S3. There is evidence of correlations between all the variables. There is one extremely significant correlation (adjusted p value < 10−9) between class size and year level. We therefore discounted class size as an independent variable, leaving discipline and year level as the two predictor variables for our analysis.

The possible outcome variables are the eight individual TPI category scores, but as we discuss below, our institution has a mandatory policy that ensures a high score in the first TPI category, so we discount this category from our model. Our model for the linear regression is therefore seven relations that predict the TPI scores for the categories course information (II) through to collaboration (VIII). Each relation has the form

equation image(1)

where ε is an error term and the coefficients βni are found by multiple linear regression. The variable i ranges from 2 to 8 for the seven TPI categories predicted. The independent category variables Dbiol, Dmath, and Dpsych have the value of 1 for courses in the biological, mathematical, or psychology disciplines, respectively, and zero otherwise (the reference category is physical). The independent category variables Y1 and Y2 have a value of 1 for first- and second-year courses, respectively, and zero otherwise (the reference category is the third-year courses). The choice of reference category is arbitrary: we chose those associated with lowest total TPI scores in Supplemental Table S2 so that any significant coefficients would reflect increased TPI scores. The values of the fitted coefficients are shown in Table 6, with those significantly different from zero highlighted.

Multiple linear regression on category TPI scoresa


Extent of Use of Evidence-Based Teaching Practices in BSc at a Research-Intensive Australian University

We show the extent of uptake of the most robust evidence-based teaching practices at our institution in Table 4. We define robust as the “practices for which there is evidence suggesting they provide particularly large and robust benefits” (Wieman and Gilbert, 2014 blue right-pointing triangle, p. 556). These practices are indicated by a weight of 2 or more in the TPI scoring rubric. We did not include category II (supporting information), as all the activities in this category had weights less than 2. We also give the percentage of courses that used at least one of the robust activities in each category. The most notable observations are as follows:

  1. Category I (course information): The adoption of a “list of topic-specific competencies” was very high (91%). This was the only category in which the Australian institution scored significantly higher than the Canadian institution.
  2. Category III (in-class activities): 74% of courses used at least one robust practice in this category, but many were not using these practices effectively. For example, 82% used demonstrations, simulations, or video, but only 14% used the better approach, in which “students first record predicted behavior and then afterwards explicitly compare observations with predictions.” The data revealed several such examples of staff adopting practices but not including the scaffolding that made the practices most effective.
  3. Category V (feedback): This category had the lowest adoption of high-impact activities (32%). This is consistent with the low overall low scores for the category at the Australian institution.
  4. Category VI (other, diagnostics): This category was notable for the very low (8%) use of independent pre–post tests. This may be due to a perceived lack of such instruments, since many are available, as reviewed by Libarkin (2008) blue right-pointing triangle.
  5. Category VII (tutors): The only robust practice (regular meetings with tutors) was adopted by less than half the courses. This result is inconsistent with the tutor training at our institution, which implies staff will meet regularly with tutors.
  6. Category VIII (collaboration): A considerable number of staff (56%) consult the literature about their teaching, indicating many instructors adopt a scientific approach to teaching (Handelsman et al., 2004 blue right-pointing triangle). However, only 28% were actively visiting a colleague’s classes.

Comparison of Results across Two Research-Intensive Universities

We compare the total TPI scores for our courses with the Wieman and Gilbert (2014) blue right-pointing triangle sample in Figure 1. We used the mean total scores and standard deviations published by Wieman and Gilbert (2014) blue right-pointing triangle and only used their departments with completion rates greater than 70%. Our mean score of 30.3 (SD = 7.5)1 is lower than that of the comparison data, with a mean of 32.4 (SD = 8.9). This difference in mean scores is not statistically significant (see Table 5), even though the distribution in Figure 1 shows that our institution has fewer high-scoring courses than the comparison institution.

We compare the scores in each category for our institution with the comparison sample in Figure 2. The scores in most categories have broad distributions at our institution, but Figure 2 shows that the course information category (I) is skewed to high values and the diagnostics category (VI) is skewed to low values. We compare the numerical values with the comparison data in Table 5. The mean scores demonstrated a statistically significant difference in three categories. Our institution scored higher than the comparison institution in the course information category (I) and lower for the in-class activities (III) and feedback (V) categories (see Table 5).

Variation within an Australian Institution by Year Level, Class Size, and Discipline

We split the data from our institution into separate subsamples based on year level, class size, and discipline. We present the mean TPI results for these different subsamples in Supplemental Table S2.

As discussed earlier, we used a multiple regression model to separate the effects of the different predictor variables on the TPI scores after removing class size from the model, as it is extremely highly correlated with year level. We present the results of the regression model in Table 6. Our null hypothesis is that none of the predictor variables (discipline and year level) contribute to any of the TPI category scores II–VIII (supporting information through to collaboration). Table 6 shows that five of the regression coefficients have values significantly different from zero (based on the adjusted p values), so our null hypothesis is rejected.

The predictor variables with significant (positive) contributions according to the model are as follows. First, courses in the biological discipline have higher scores for collaboration. This includes activities such as attending lectures by colleagues. Second, psychology courses have a higher score for tutor (teaching assistant) training (e.g., tutors have at least half a day of training). Third, independent of discipline, first-year courses have higher scores for feedback (V), diagnostics (VII), and collaboration (VIII). Feedback activities include the release of midsemester exam answers and diagnostic activities include the use of pre and post student surveys.


Our aim was to contribute to both research and practice on evidence-based teaching practices in the sciences by adopting the TPI for use in our local context. Having measured practice across our whole degree program, we can now determine which institutional policies and practices have contributed positively to the TPI scores at our institution. We have also used the data to identify priorities for future development.

Central Policies That Contribute to Stronger TPI Results

We have identified three central policies at our institution that have contributed to improved TPI results across our BSc program.

The courses at our institution had consistently high TPI scores for category I (course information), the only category that was significantly higher than the comparison institution (Table 5). More than 90% of our courses provide a “list of topic-specific competencies,” so students are given a clear sense of the course aims, objectives, learning activities, and assessments. The importance of this information is grounded in educational literature on constructive alignment and student-centered learning (Biggs, 1996 blue right-pointing triangle). We can explain the high result in this TPI category by a strong central policy at our institution. The policy requires that all courses have an electronic “course profile” document that is publicly accessible and updated each semester. The policy is strongly supported by an institutional system that captures examination details from these profiles such that examinations will not be scheduled for a course unless its profile has been published. Staff create the profiles by completing a fixed-format electronic form that requires the inclusion of aims and objectives. Implemented in 2006, the system is a common source of complaints among academics but is used across all courses, demanded by students, and now accepted by staff.

The regression analysis of our results (Table 6) revealed that our first-year courses have significantly higher TPI scores for feedback, diagnostics, and collaboration. This is contrary to our expectation that teaching practices become more personalized and aligned with evidence-based approaches in smaller, upper-level courses. The first-year courses in our institution have large enrollments and are typically characterized by hundreds of students sitting in a lecture theater with one or two instructors. Given the high levels of students leaving science within the first year, tremendous energy and resources have been devoted to effective teaching practices in our first-year courses. Following the previous large-scale review of the BSc program (Foster et al., 2008 blue right-pointing triangle), the faculty of science made first-year course development the top priority for internal teaching and learning grants. There was also an institution-wide focus on “first-year experience” projects. The TPI data suggest that these central policies have had a measurable effect on improving our first-year courses.

The institution being studied also has a central policy on tutor training. The faculty of science runs 5 hours of training on topics such as the expectations of being a tutor, strategies for tutoring effectively, and how to provide assessment. This training is mandatory for all new tutors. Many departments (multidisciplinary organizational units) also require tutors to attend further training sessions specific to their classes (e.g., laboratory training). This method of having individual departments manage further tutor training is a common theme among Australian universities (Chan et al., 2007 blue right-pointing triangle; Mocerino et al., 2009 blue right-pointing triangle). This policy should have contributed to stronger scores for category VII (tutor training), but 55% of respondents responded “no” when asked if tutors “receive ½ day or more of training in teaching.” This result from the TPI survey revealed a communication problem at our institution, since effective policies and practice for tutor training are in place and have been for 5 years.

Contribution of Discipline Practice to Stronger TPI Results

Given the body of research highlighting the role of discipline on teaching (Becher and Trowler, 2001 blue right-pointing triangle; Mårtensson et al., 2012 blue right-pointing triangle) and the broad nature of our BSc, which includes majors in psychology, mathematics, and biosciences, we expected to find disciplinary differences. We have identified local practices that are associated with higher TPI scores for individual disciplines.

The regression analysis (Table 6) showed that courses in the biology and life sciences discipline scored higher for collaboration (category VIII), even after controlling for year level. Courses in this discipline are taught by two departments at our institution. While neither department has a specific policy regarding collaboration, both departments have regular staff meetings (one to two times per semester) to discuss assessment and curriculum. There may also be a greater need for collaboration in this discipline, because a small number of large first-year biology courses provide foundation material for a large number of different specialty areas across the life sciences. The staff teaching the higher-level courses need to coordinate with those teaching the foundation courses to ensure the appropriate prerequisite material is taught.

The regression analysis (Table 6) also showed that, after controlling for other variables, courses in the psychology discipline had higher scores for the tutor-training category. This follows from a long tradition of supporting and training tutors (teaching assistants) in this discipline at our institution. The tutor-training program used by the institution was originally developed in this discipline.

Future Policy and Practice Resulting from TPI Data

We can also identify areas to prioritize for future development from our TPI results, especially in comparison with other institutions.

Wieman and Gilbert (2014) blue right-pointing triangle piloted the TPI at a university known for its effort in enhancing science teaching and learning. Both their institution and ours are large, public, research-intensive universities. While changing science teaching practices is notoriously challenging in research-intensive universities (Wieman et al., 2010 blue right-pointing triangle), the comparative nature of this study, drawing on the TPI, indicates that improvement is possible. Furthermore, comparing results helps to make sense of the findings and provides evidence to have focused conversations among instructors, which Cooper et al. (2015) blue right-pointing triangle proposed as the key to transforming tertiary science education.

The comparison between institutions (Table 5) shows that scores at our institution were significantly lower than the comparison institution in the categories of in-class activity, assignments, and feedback. Our institution has adopted these as priority areas for our BSc program, by setting “high impact learning activities such as assessment and feedback” as the top priority for internal teaching and learning grants.

Our central policy has had a strong effect in some areas (see Central Policies That Contribute to Stronger TPI Results above), but it may be limited in its further effect on personal teaching practice given the importance of the local, department-level culture. It is well established that academics are most strongly influenced in their teaching practices by their immediate peers (Becher and Trowler, 2001 blue right-pointing triangle; Handelsman et al., 2004 blue right-pointing triangle; da Silva et al., 2009 blue right-pointing triangle; Mårtensson et al., 2012 blue right-pointing triangle). This is consistent with the department-specific practices we have identified at our institution. The TPI results also reveal exemplary courses already using a range of active-learning activities that could be shared within and across departments. Focusing professional development at the department and discipline levels, drawing on informal social learning and evidence of effective practice, is known to be effective in changing teaching practices (Handelsman et al., 2004 blue right-pointing triangle; Mårtensson et al., 2012 blue right-pointing triangle). A local approach can also help address the tendency of science academics not to engage with generic teaching development (Matthews et al., 2014 blue right-pointing triangle). As a result, our faculty of science has funded locally based teams to upgrade a few courses in each department: this funding is specifically tied to the adoption of high-impact practices listed in the TPI (see Table 4).

While strong local peer networks drive teaching practice, there is a danger they can become isolated, limiting opportunities to introduce new ideas, innovation, or change (e.g., Roxå and Mårtensson, 2009 blue right-pointing triangle). Matthews et al. (2015) blue right-pointing triangle argue that it is therefore vital to have links beyond the immediate local networks to spread innovation. For this reason, it will not be sufficient to work within the existing departmental networks, but it will be necessary to expose them to external connections. Matthews et al. (2015) blue right-pointing triangle discuss the role of academic developers who can join local networks to act as “weak ties” to bring in ideas from other networks. This connection is something that the central faculty body could support at a policy level, while still effecting change in the local environment.

Limitations of the TPI Data

Several possible limitations should be considered when interpreting our study. First, there might be some bias in how instructors self-report on their courses. Wieman and Gilbert (2014) blue right-pointing triangle conducted several tests of the TPI measurements. They did not find any significant differences when the data about the course structure were independently measured. They also tested the in-class activity data by direct observation in a number of courses using the COPUS (Smith et al., 2013 blue right-pointing triangle) and found no systematic difference in the results. Smith et al. (2014) blue right-pointing triangle conducted a more extensive comparison of TPI and COPUS results in 51 courses and also found they were consistent. Second, two instructors contacted us (after receiving a summary of results for their course) to report errors they had made when completing the TPI. As busy instructors rushed to complete the TPI, some misread questions and thus entered the wrong responses. Such errors in survey-type instruments are to be expected. Third, some terminology in the TPI initially confused instructors, particularly those not active in scholarly teaching approaches. We addressed this by piloting the instrument with colleagues to identify such terminology. It is particularly important to test the terminology when using the TPI in a different cultural/country context.

Conclusion and Recommendations

The complexities of changing teaching practices in undergraduate STEM courses are well articulated by Henderson et al. (2011) blue right-pointing triangle. They remind us that change has to align with the institutional and disciplinary norms and personal beliefs. The TPI was developed with an appreciation for the belief systems of scientists and provides data on teaching practices at the individual, departmental, and degree-program levels. As a research tool, it offers much-needed insight into how science courses are being taught. Furthermore, it provides data that can be used to inform and guide undergraduate curricular reviews in the sciences focused on “how teaching happens.” The data are actionable. If we shift our view from teaching to learning, from academic to student (Barr and Tagg, 1995 blue right-pointing triangle), then future research and practice has to view science curricula from the perspective of students who experience many courses over several years. As a result of this study, we make the following recommendations to engage instructors across multiple departments in a more integrative approach to promoting effective teaching practice:

  1. Focus the “unit of analysis” on a whole degree program, extending beyond departments, to characterize how students are being taught at the program level.
  2. Before administering the TPI, consult with faculty on the purpose of collecting such data.
  3. Analyze the TPI data to measure the effect of central policy and local department practice, as well as the use of teaching practices in individual courses.
  4. When planning staff development based on the TPI results, consider both central and local policy approaches that allow for the way staff form their beliefs about teaching practices.

Instruments like the TPI that can measure teaching practices across a whole degree program, as opposed to individual teaching champions, are what is needed to move undergraduate science education forward into the 21st century. This study has demonstrated that the TPI can be used in different institutional contexts and that it provides data that can inform both policy and practice.

Supplementary Material

Supplemental Materials:


We thank Carl Wieman and Sarah Gilbert for helpful advice about this study and for providing additional TPI data. We thank our colleagues in our institution’s central teaching and learning unit and our science teaching and learning committee for piloting the instrument and offering helpful feedback. We thank Gwen Lawrie (University of Queensland) for helpful comments on the draft paper. We particularly thank our anonymous referees and the editor for very insightful suggestions that have greatly enhanced this article. Finally, we are very grateful to those instructors who completed the TPI and engaged in workshops on the TPI results. This work was funded by a teaching fellowship to M.J.D. from the Institute for Teaching and Learning Innovation and the Faculty of Science, University of Queensland.


1We quote standard deviations of all values unless stated otherwise.


  • American Association for the Advancement of Science. Vision and Change: A Call to Action. Washington, DC: 2009.
  • Anderson WA, Banerjee U, Drennan CL, Elgin SCR, Epstein IR, Handelsman J, Hatfull GF, Losick R, O’Dowd DK, Olivera BM, et al. Changing the culture of science education at research universities. Science. 2011;331:152–153. [PubMed]
  • Andrews TC, Lemons PP. It’s personal: biology instructors prioritize personal evidence over empirical evidence in teaching decisions. CBE Life Sci Educ. 2015;14:ar7. [PMC free article] [PubMed]
  • Barr RB, Tagg J. From teaching to learning—a new paradigm for undergraduate education. Change. 1995;27:1–12.
  • Becher T, Trowler PR. Academic Tribes and Territories. 2nd ed. Ballmoor, Buckingham, UK: Open University Press; 2001.
  • Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B. 1995;57:289–300.
  • Berk R. Survey of 12 strategies to measure teaching effectiveness. Int J Teach Learn High Educ. 2005;17:48–62.
  • Biggs J. Enhancing teaching through constructive alignment. J High Educ. 1996;32:347–364.
  • Borrego M, Froyd JE, Hall TS. Diffusion of engineering education innovations: a survey of awareness and adoption rates in U.S. engineering departments. J Eng Educ. 2010;99:185–207.
  • Brownell SE, Tanner KD. Barriers to faculty pedagogical change: lack of training, time, incentives, and tensions with professional identity. CBE Life Sci Educ. 2012;11:339–346. [PMC free article] [PubMed]
  • Chan S, Crossley P, Deer L, Maguire D, Samson A, Anaid A. The role of the casual tutor in design and delivery of courses: experiences from teaching geopolitics in 2006. 2007. (accessed 8 September 2015)
  • Cooper MM, Caballero MD, Ebert-May D, Fata-Hartley CL, Jardeleza SE, Krajcik JS, Laverty JT, Matz RL, Posey LA, Underwood SM. Challenge faculty to transform STEM learning. Science. 2015;350:281–282. [PubMed]
  • Da Silva KB, Fawcett R, Hunter N, Buckley P, Roberts M, Dent L, Wood D, Gannaway D. Raising the Profile of Teaching and Learning: Scientists Leading Scientists. Canberra: Australian Government, Office for Learning and Teaching; 2009. (accessed 25 November 2015)
  • Dolan EL. Biology education research 2.0. CBE Life Sci Educ. 2015;14:ed1. [PMC free article] [PubMed]
  • Eagan MK, Stolzenberg EB, Berdan Lozano J, Aragon MC, Suchard MR, Hurtado S. Undergraduate Teaching Faculty: The 2013–2014 HERI Faculty Survey. Los Angeles: Higher Education Research Institute, University of California–Los Angeles; 2014.
  • Ebert-May D, Derting TL, Hodder J, Momsen JL, Long TM, Jardeleza SE. What we say is not what we do: effective evaluation of faculty professional development programs. BioScience. 2011;61:550–558.
  • Fairweather J. Linking Evidence and Promising Practices in Science, Technology, Engineering, and Mathematics (STEM) Undergraduate Education. Washington, DC: Board of Science Education, National Research Council, National Academies; 2008.
  • Foster J, Matthews KE, Mattick LE, McManus ME, Strong J. In: Self-review for Higher Education Institutions. Melbourne: Australian Universities Quality Agency; 2008. Self-review in higher education: experience from the University of Queensland; pp. 47–70.
  • Freeman S, Eddy SL, McDonough M, Smith MK, Okoroafor N, Jordt H, Wenderoth MP. Active learning increases student performance in science, engineering, and mathematics. Proc Natl Acad Sci USA. 2014;111:8410–8415. [PubMed]
  • Gess-Newsome J, Southerland SA, Johnston A, Woodbury S. Educational reform, personal practical theories, and dissatisfaction: the anatomy of change in college science teaching. Am Educ Res J. 2003;40:731–767.
  • Handelsman J, Ebert-May D, Beichner R, Bruns P, Chang A, DeHaan R, Gentile J, Lauffer S, Stewart J, Tilghman SM, Wood WB. Scientific teaching. Science. 2004;304:521–522. [PubMed]
  • Henderson C, Beach A, Finkelstein N. Facilitating change in undergraduate STEM instructional practices: an analytic review of the literature. J Res Sci Teach. 2011;48:952–984.
  • Henderson C, Dancy M, Niewiadomska-Bugaj M. Use of research-based instructional strategies in introductory physics: where do faculty leave the innovation-decision process. Phys Rev Spec Top Phys Educ Res. 2012;8:1–15.
  • Hora MT, Ferrare JJ. Instructional systems of practice: a multidimensional analysis of math and science undergraduate course planning and classroom teaching. J Learn Sci. 2013;22:212–257.
  • Knight P. Complexity and curriculum: a process approach to curriculum-making. Teach High Educ. 2001;6:369–381.
  • Libarkin J. Concept inventories in higher education science. 2008. Prepared for the National Research Council Promising Practices in Undergraduate STEM Education Workshop 2, held 13–14 October 2008, in Washington, DC.
  • Lund TJ, Pilarz M, Velasco JB, Chakraverty D, Rosploch K, Undersander M, Stains M. The best of both worlds: building on the COPUS and RTOP observation protocols to easily and reliably measure various levels of reformed instructional practice. CBE Life Sci Educ. 2015;14:ar18. [PMC free article] [PubMed]
  • Macdonald RH, Manduca CA, Mogk DW, Tewksbury BJ. Teaching methods in undergraduate geoscience courses: results of the 2004 on the cutting edge survey of U.S. faculty. J Geoscience Educ. 2005;53:237–252.
  • Marbach-Ad G, Ziemer KS, Orgler M, Thompson KV. Science teaching beliefs and reported approaches within a research university: perspectives from faculty, graduate students, and undergraduates. Int J Teach Learn High Educ. 2014;26:232–250.
  • Marsh HW. In: The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective. Netherlands: Springer; 2007. Students’ evaluations of university teaching: dimensionality, reliability, validity, potential biases and usefulness; pp. 319–383.
  • Mårtensson K, Roxå T, Stensaker B. From quality assurance to quality practices: an investigation of strong microcultures in teaching and learning. Stud High Educ. 2012;39:1–12.
  • Matthews KE, Crampton A, Hill M, Johnson ED, Sharma MD, Varsavsky C. Social network perspectives reveal strength of academic developers as weak ties. Int J Acad Dev. 2015;20:238–251.
  • Matthews KE, Lodge JM, Bosanquet A. Early career academic perceptions, attitudes and professional development activities: questioning the teaching and research gap to further academic development. Int J Acad Dev. 2014;19:112–124.
  • McManus ME, Matthews K. In: Transforming Institutions: Undergraduate STEM Education for the 21st Century. West Lafayette, IN: Purdue University Press; 2015. Review of the undergraduate science curriculum at University of Queensland.
  • Mercer-Mapstone LD, Matthews KE. Student perceptions of communication skills in undergraduate science at an Australian research-intensive university. Assess Eval High Educ. 2017;42:98–114.
  • Mocerino M, Yeo S, Zadnik M. Preparing demonstrators for first year science laboratories. 2009. In: Proceedings of The Australian Conference on Science and Mathematics Education, held 30 September–2 October 2009, in Sydney, Australia.
  • National Research Council. Discipline-Based Education Research: Understanding and Improving Learning in Undergraduate Science and Engineering. Washington, DC: National Academies Press; 2012.
  • Roxå T. Significant conversations and significant networks—exploring the backstage of the teaching arena. Stud High Educ. 2009;34:547–559.
  • Sawada D, Piburn MD, Judson E, Turley J, Falconer K, Benford R, Bloom I. Measuring reform practices in science and mathematics classrooms: the Reformed Teaching Observation Protocol. School Sci Math. 2002;102:245–253.
  • Shevlin M, Banyard P, Davies M, Griffiths M. The validity of student evaluation of teaching in higher education: love me, love my lectures. Assess Eval High Educ. 2000;25:397–405.
  • Smith MK, Jones FHM, Gilbert SL, Wieman CE. The Classroom Observation Protocol for Undergraduate STEM (COPUS): a new instrument to characterize university STEM classroom practices. CBE Life Sci Educ. 2013;12:618–627. [PMC free article] [PubMed]
  • Smith MK, Vinson EL, Smith JA, Lewin JD, Stetzer MR. A campus-wide study of STEM courses: new perspectives on teaching practices and perceptions. CBE Life Sci Educ. 2014;13:624–635. [PMC free article] [PubMed]
  • Theobald R, Freeman S. Is it the intervention or the students? Using linear regression to control for student characteristics in undergraduate STEM education research. CBE Life Sci Educ. 2014;13:41–48. [PMC free article] [PubMed]
  • Trigwell K, Prosser M, Taylor P. Qualitative differences in approaches to teaching first year university science. High Educ. 1994;27:75–84.
  • Wachtel HK. Student evaluation of college teaching effectiveness: a brief review. Assess Eval High Educ. 1998;23:191–212.
  • Wieman C. A better way to evaluate undergraduate teaching. Change. 2015 (accessed 16 December 2015)
  • Wieman C, Gilbert S. The Teaching Practices Inventory: a new tool for characterizing college and university teaching in mathematics and science. CBE Life Sci Educ. 2014;13:552–569. [PMC free article] [PubMed]
  • Wieman C, Perkins K, Gilbert S. Transforming science education at large research universities: a case study in progress. Change. 2010;42:6–14.
  • Wilson R. Why teaching is not priority no. 1. Chron High Educ. 2010, September 5

Articles from CBE Life Sciences Education are provided here courtesy of American Society for Cell Biology