Undergraduate biology education reform aims to engage students in scientific practices such as experimental design, experimentation, and data analysis and communication. Graphs are ubiquitous in the biological sciences, and creating effective graphical representations involves quantitative and disciplinary concepts and skills. Past studies document student difficulties with graphing within the contexts of classroom or national assessments without evaluating student reasoning. Operating under the metarepresentational competence framework, we conducted think-aloud interviews to reveal differences in reasoning and graph quality between undergraduate biology students, graduate students, and professors in a pen-and-paper graphing task. All professors planned and thought about data before graph construction. When reflecting on their graphs, professors and graduate students focused on the function of graphs and experimental design, while most undergraduate students relied on intuition and data provided in the task. Most undergraduate students meticulously plotted all data with scaled axes, while professors and some graduate students transformed the data, aligned the graph with the research question, and reflected on statistics and sample size. Differences in reasoning and approaches taken in graph choice and construction corroborate and extend previous findings and provide rich targets for undergraduate and graduate instruction.
Graphs are a central component of scientific language, because they condense and summarize large data sets. The result is a symbolic representation of experimental findings that scientists use for communication (Beichner, 1994; Tairab and Al-Naqbi, 2004; Wainer, 2013). Developing the skill to create appropriate and clear graphs is necessary for the scientifically literate individual (Padilla et al., 1986). Indeed, recent calls to reform the undergraduate curriculum include incorporating aspects of data literacy into the science, technology, engineering, and mathematics disciplines. Within the discipline of biology, there is an emphasis on infusing quantitative reasoning into the classroom, including creating and interpreting graphical representations (Association of American Medical Colleges, 2009; American Association for the Advancement of Science, 2011). The increasing implementation of course-based undergraduate research experiences (CUREs) underscores the importance of understanding how students grapple with data and data presentation to facilitate their mastery of this skill (see Figure 1 in Auchincloss et al., 2014). Furthermore, current studies in biology education have shown that students who engage in research practices feel more included in the learning process and gain stronger science process skills, such as data analysis and graphing (Bangera and Brownell, 2014; Brownell et al., 2015; Linn et al., 2015).
The purpose of a graph is to communicate observational or numerical data in a visual format (Tufte, 1983; Leinhardt et al., 1990), with the hope that the graph is interpreted in the same manner and with the same take-home message as the graph constructor intended. Extensive research has documented student difficulties with graph interpretation. Tairab and Al-Naqbi (2004) showed that 10th-grade students had difficulty understanding that the x- and y-axes illustrate the relationship between the independent and dependent variables. Other studies show similar difficulties with interpreting interactions and the slope of a line (Preece and Janvier, 1992; Picone et al., 2007; Colon-Berlingeri and Burrowes, 2011).
While these studies focused on graph interpretation, the concepts and skills that they studied are integral to graph construction as well. Before constructing the graph, the graph constructor should have a clear purpose in mind, along with an adequate understanding of variables and graph types (Berg and Smith, 1994; Friel and Bright, 1996; Clase et al., 2010; Grunwald and Hartman, 2010; Angra and Gardner, 2016). For a graph to be an effective communication piece for both the creator and the observer, four main components should be considered: 1) data form, 2) graph choice, 3) graph mechanics, and 4) aesthetics and visuospatial aspects. While these are four distinct components, they are all interrelated and influence the type and quality of the message communicated by the graph (Table 1). For example, the form of the plotted data (e.g., raw data vs. averages) can influence the type of graph and labeling used to clearly display those data.
Owing to its complexity, choosing and constructing an appropriate graph for data can be considered a problem-solving task (Angra and Gardner, 2016). Our previous findings (see Figure 1 in Angra and Gardner, 2016) on the steps taken by expert professors during a pen-and-paper graph-construction task resembled the four steps of Polya's problem-solving cycle in mathematics (Polya, 1945). Based on the data and trends that have emerged from our work, we adapted Polya's model to explain expert graph-construction behavior; it can be distilled into three phases: planning, execution, and reflection (for a detailed description, see Angra and Gardner, 2016). During the planning phase, before the graph is constructed, the data to be plotted are evaluated, understood, and characterized. Specifically, the purpose for graphically displaying the data is clarified, ways to organize the data on the graph are considered, decisions on data transformation are made, and a graph type is chosen (Friel and Bright, 1996; Ainley et al., 2000; Patterson and Leonard, 2005; Angra and Gardner, 2016). During the execution phase, the graph is constructed with the elements of graph mechanics needed for clear communication (e.g., descriptive title, variables on axes, scales appropriate for the data, key), and the data are plotted (Angra and Gardner, 2016). Finally, during the reflection phase, the constructed graph is critiqued, the graph choice is evaluated, and the graph is checked for alignment with the intended purpose (Angra and Gardner, 2016).
As noted, current trends in biology education engage students in data analysis and graphing; however, students across the K–16 continuum struggle with many fundamental concepts and skills relevant to graphing, including scaling axes, using a best-fit line, and assigning variables to axes (Padilla et al., 1986). Further, while there are standards and recommendations for K–16 education in areas related to quantitative literacy (Aliaga et al., 2005), standards for graduate education have been lacking. There have been increased efforts to formalize quality training for graduate students as instructors (Schussler et al., 2008; Reeves et al., 2016) and scholars (National Institutes of Health [NIH], 2016; National Science Foundation [NSF], 2016). However, specific objectives for concepts and skills that all graduate students should master have not been widely implemented outside the activities of funded programs, such as training grants (NIH, 2016). Quantitative skills related to data representation are most likely developed by graduate students through experience reading primary literature, analyzing and presenting their own data, and guidance from their research mentors. However, graphing difficulties exist and have been documented in individuals who possess advanced and/or terminal degrees, that is, professors (Bowen and Roth, 2005), professionals (Rougier et al., 2014; Weissgerber et al., 2015), and medical doctors (Cooper et al., 2001, 2002; Schriger and Cooper, 2001; Schriger et al., 2006).
Previous studies share suggestions and sample data sets to encourage practice with graph creation (Tairab and Al-Naqbi, 2004; Patterson and Leonard, 2005; Bray-Speth et al., 2010). For instance, Patterson and Leonard (2005) advocate training students to use software for graph construction, using a balance of analytical thought and creative artistry. However, before letting students use software, they suggest that students should focus on the message they want to communicate in a graph, explain the appropriate statistics, and sketch a graph by hand so they know what the end product produced by the software should look like (Patterson and Leonard, 2005). Other suggestions to remediate graphing difficulties include incorporating graphing into the science classroom, which provides more opportunities, repetition, and student–instructor feedback to tackle graphing difficulties and increase student competency with graphing (Roth and McGinn, 1997; Roth and Bowen, 2001; McFarland, 2010; Harsh and Schmitt-Harsh, 2016).
The best methods and techniques for translating raw data into a graph are still unknown, which can lead to challenges for undergraduate and graduate students as well as active research scientists. The underlying thought processes used by graph constructors when choosing and constructing graphs are not fully understood; thus, one problem we face is an incomplete understanding of the reasoning that occurs during graph choice and construction. While constructing a graph using software is useful and replicates the authentic graph-making processes that occur in classrooms and laboratories, it can interfere with thoughtful and reflective decision making: software programs overload the graph constructor with multiple graphing choices without prompting reflection on decisions regarding variables, data, graph choice, and the purpose of the graph. In this study, we aim to uncover the reasoning that occurs during graph choice and construction, and the attributes of the resulting graphs, by using the pen-and-paper mode of graph construction.
Our study design and data analysis are guided by the metarepresentational competence (MRC) framework (diSessa and Sherin, 2000). This framework outlines the knowledge and reflective reasoning practices that an individual competent in creating external representations (e.g., graphs), such as an expert scientist, would exhibit. As such, implicit in the MRC framework are expert-like knowledge and skill (diSessa, 2004), which can provide helpful benchmarks when studying student MRC (National Research Council, 2000; diSessa, 2004) and can inform classroom practices. The components of the MRC framework can be leveraged to reveal a person's areas of competence and difficulty with graph choice, construction, and critique. Specifically, these components are invention, critique, functioning, and learning or reflection (diSessa and Sherin, 2000; summarized in Table 2). In our study, the MRC component of invention is assumed, because all participants created a graph. Therefore, we use the last three components of MRC to define graph-construction reasoning as a person's reflection on graph choice and construction: understanding the function of different types of graphs and being able to thoughtfully analyze a graph based on the type of data it represents, the variables, and the overall advantages and disadvantages of the chosen graph. As diSessa (2004) argues, creating a graph is not a difficult task; rather, the act of being critical and reflecting on the task and the graph itself is what needs to be practiced to gain automaticity and independence with graphing.
The overarching research objective of this study is to elucidate the differences in graph-construction reasoning that may exist among undergraduate students, graduate students, and professors in the biological sciences. To accomplish this objective, we sought to answer two questions:
In this study, we used a pen-and-paper graphing task in the context of think-aloud interviews to describe the reasoning behind graph choice and construction and the final graph artifacts. All interviews were conducted between March 2013 and October 2014. The LiveScribe pen was used to collect data, as it synchronizes written notes with recorded audio and has an embedded infrared camera that detects pen strokes when used with the LiveScribe dot paper (LiveScribe, 2015). Participants were randomly presented one of two scenarios (i.e., bacteria or plant scenario; Supplemental Material, Table 1) predetermined before the interview. Participants were asked to read the scenario prompt aloud and were then instructed to create a graph from the data in the scenario, narrating their thought process during this graph-construction task. Constructing a graph by hand may not be an everyday activity for most participants, nor is thinking aloud while performing a task. To account for this, the interviewer gently probed the participants to articulate their thinking, especially if there were prolonged silences during graph construction. The think-aloud format provided insight into thought processes and reasoning, which was then used to characterize and delineate differences between experts and novices (Angra and Gardner, 2016). Think-aloud interviews are reliable sources of data, because they reveal the thought processes that occur and the sequences of thought (Ericsson, 2006). Several studies have found no evidence for differences in the accuracy of performance between those who completed a task silently and those who verbalized their thoughts (Ericsson and Simon, 1993; Ali and Peebles, 2011). This gave us confidence that active narration would not influence performance on the graphing task. After the participants finished their graph construction, the interviewer intervened and asked them to reflect on the following questions:
The graphing task and associated interview lasted between 10 and 30 minutes.
The development of the scenarios used in our think-aloud interviews involved outside validation and literature review. Knowing that some of our participants would have had at most a partial semester of introductory biology at the time of the interview, we consulted an award-winning high school teacher for her opinion on biological scenarios that would be familiar to students who had taken ninth-grade biology. We used two scenarios, bacterial growth and plant growth (Table 1 of the Supplemental Material), because we wanted to minimize two threats to internal validity: instrumentation and diffusion of treatment (Drost, 2011). The bacteria and plant scenarios are isomorphic, each consisting of a dependent variable, an independent variable, and two treatments with three replicates per treatment. Simple numbers were used so that participants could easily manipulate the data if they chose to do so (Konold et al., 2015). In four sentences, the scenario provided the participants with a brief background and a data table that organized the elements mentioned earlier. We organized data in a table instead of a paragraph with numbers because, in scientific practice, data are often initially organized in a table so that it is easy for the graph constructor to visualize the raw values (Wainer, 2013). To validate the graph-construction prompts, we piloted the plant and bacteria scenarios with two undergraduate biology students and one professor. Pilot interviews were conducted in Fall 2012 to solidify the interview protocol and prompts and to gauge the amount of time it took to construct a graph (Seidman, 2013). To ensure that the graphing scenario and the task of constructing a graph while thinking aloud aligned with our research questions, pilot interviews were transcribed and memoed (Patton, 2001) to look for ideas previously reported in the graphing literature.
As part of a larger, multipart graphing study, undergraduate students, graduate students, and professors were recruited from the biological sciences department at a large, midwestern research university. A stratified, purposeful sampling method was used to obtain the target population (Hatch, 2002). To obtain a heterogeneous and representative sample of the undergraduate student population, we sent recruitment emails to faculty teaching large biology courses. Personal recruitment emails were sent to graduate students and biology faculty from diverse biological subdisciplines. All recruitment methods were approved by the Institutional Review Board (protocol no. 1210012775). Recruitment criteria for undergraduate students were based on 1) their status as or intention to be a biology major and 2) their current enrollment in or successful completion of the introductory biology lecture and laboratory course. At the time of recruitment, undergraduate research experience was not one of our criteria, but it was incorporated postinterview, based on literature outlining data representation skills and concepts students learn while engaged in research (Auchincloss et al., 2014). In this paper, we report data from undergraduate students who did not have research experience at the time of the interview (UGNRs) and undergraduate students who did have research experience (UGRs). Recruitment criteria for graduate students (GSs) were based on 1) their enrollment in the graduate program—all graduate students were pursuing a PhD degree; 2) successful completion of their qualifier examination taken at the end of their first year; and 3) their having held a teaching assistantship or having mentored undergraduate students.
Criteria for professors were based on 1) their credentials—all professors held a PhD in a subdiscipline of biology; 2) their having an active research laboratory with postdocs, graduate students, and/or undergraduate students; and 3) their having taught for at least 1 year.
Our initial pool of participants included seven professors, 13 graduate students, and 39 undergraduate students. This pool was reduced based on the following inclusion criteria. To minimize threats to internal validity, we eliminated the six undergraduate and one graduate student interviews that were conducted early in the project by an interviewer who did not follow the think-aloud protocol with high fidelity. From the remaining 33 undergraduate student interviews conducted by the first author (A.A.), we further eliminated students who spontaneously constructed multiple graphs in response to the first graph-construction prompt, as they did not articulate their reflection on graph choice for all the graphs they constructed, and the interviewer felt it was inappropriate to interrupt the flow of thought during graph construction. Although these data are interesting and will be analyzed in future work, for this study we chose to exclude them to ensure uniformity across all participant groups. The same criteria were applied to graduate students and professors. Our final participant pool consisted of five professors, eight graduate students, and 15 undergraduate students. Of the 15 undergraduate students, 10 reported having no research experience and five reported having research experience. In this study, we categorized our most novice participants as those who reported not having any research experience, followed by undergraduate students who reported research experience, graduate students, and finally the professors, who each had more than 10 years' experience conducting research and constructing graphs. Participants in our study represented many subdisciplines in biology: professors' specialties ranged from cellular neurobiology to behavioral ecology, while graduate students' research interests ranged from virology to avian behavior. The Supplemental Material, Tables 2–4, provides demographic information for our participants.
Because undergraduate research experiences vary immensely, we found the relative approach described here, which groups professors as experts, graduate students as advanced, undergraduate students with research experience as intermediates, and undergraduate students without research experience as novices (Chi, 2006), to be a useful method of analysis.
Think-aloud interviews were transcribed verbatim and systematically organized and coded using inductive analysis to address the first research question (Strauss and Corbin, 1998; Patton, 2001). This initial step of transcript segmentation began the process of open coding within each phase of thought (planning, execution, and reflection phases). Selective coding was then used to organize the codes into a story that described the complex network of themes that emerged (Creswell, 2013). For the final step, themes from the selective coding step were aligned to the categories present in the MRC framework. The first author (A.A.) independently coded all transcripts from the think-aloud interviews and compared her codes with 20% of those coded by the second author (S.M.G.). Both authors met regularly to compare and discuss the coding until a consensus was reached on the final codes and themes.
To determine whether the participant groups differed in the time they took to plan, construct, and reflect on the graph, we conducted independent-samples t tests using the Statistical Package for the Social Sciences, version 22 (SPSS v. 22; IBM, 2013). Levene's test for the equality of variance was conducted, and equal variances were not assumed when reporting the p value (α < 0.05; SPSS v. 22; IBM, 2013). Because we were interested in differences across participant groups, we did not perform inferential statistics across phases of the graph interview. We also noticed that professors used more words than undergraduate students in their thought processes and explanations. Roth and Bowen (2003) used word analysis to understand how experts interpreted graphs; we used a similar method to quantify and characterize the number of words spoken by participants during each phase and to standardize the time each participant spent talking. Transcripts were coded in Microsoft Word by placing portions of the interview transcript under specific codes in our codebook. Words mentioned multiple times within a given phase were counted and coded once. The words for each code were counted, and that number was divided by the total number of words uttered by the participant and multiplied by 100 to obtain the percentage of words uttered for particular codes within a particular phase. Final results are displayed in Figure 1.
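The word-analysis normalization described above can be sketched in a few lines of code; the code names and word counts below are hypothetical illustrations, not data from the study.

```python
# Minimal sketch of the word-analysis normalization: per-code word
# counts (unique words counted once per phase) are converted into
# percentages of all words a participant uttered.

def percent_words_per_code(code_word_counts, total_words):
    """Return the percentage of a participant's total words that
    fell under each code. Inputs here are hypothetical."""
    return {code: 100.0 * n / total_words
            for code, n in code_word_counts.items()}

# Example: a participant uttered 250 words in the planning phase;
# 15 unique words fell under "data type", 5 under "graph choice".
percentages = percent_words_per_code(
    {"data type": 15, "graph choice": 5}, total_words=250)
print(percentages)  # {'data type': 6.0, 'graph choice': 2.0}
```

Dividing by each participant's total word count, as in the paper, makes more and less talkative participants comparable.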
Owing to the small sample size in each participant group, statistics for themes on the qualitative interview data are not reported, but the absence or presence of themes and the occurrence of the MRC categories between the three participant groups are summarized in Figure 2.
To address the second research question, graphs constructed by professors, GSs, and the two undergraduate groups (UGNR and UGR) were described qualitatively based on four broad categories: graph mechanics, data form, graph choice, and aesthetics. The evaluation categories are listed in Table 1.
To answer our first research question, we identified the themes that emerged from the transcripts from our think-aloud graph-construction interviews for each phase of the graph-construction process (planning, construction, and reflection). We mapped the emergent themes to the categories of the MRC framework.
The planning phase occurred after participants were presented with the task and before they began graph construction, as indicated by the drawing of the axes. Figure 1 displays the amount of time the participants spent talking in each of the interview phases. Looking across the three phases and at the four participant groups, we notice that, relative to the other two phases in the interview, participants spent the smallest amount of time planning. Within the planning phase, almost everyone took time to think about the scenario and data before proceeding with graph construction. This is indicated by the sample size in the second column in Figure 2.
Three out of the four categories from the MRC framework map onto the planning phase: function, invention, and learning/reflection (Figure 2 and Table 3). The definitions of the themes, example quotes, and the alignment of the themes to the MRC categories can be found in Table 3. Within the MRC category invention, the themes of data type and graph construction were prevalent across the multiple participant groups. However, the theme data type was seen only for one UGR, and the theme graph construction was seen for only one UGNR. Within the MRC category function, the themes purpose and graph choice emerged. In the planning phase, the theme purpose was observed only for professors and UGRs. The theme graph choice was observed for multiple GSs and UGRs, but only for one UGNR. Professors were unique in that they were the only group who did not explicitly state the graph choice in the planning phase.
Finally, within the MRC category learning/reflection, the theme data table appeared with multiple subjects and across all participant groups.
The construction phase followed the planning phase and began with the drawing of the axes and ended when a participant signaled that he or she had finished constructing the graph. Relative to the planning phase, most participants spent more time constructing their graphs (Figure 1). However, professors spent less time than the other three participant groups. This is consistent with the graphs they created (see Graph Attributes). Although each participant constructed a graph, some of the participants regurgitated the information presented in the data table and focused on plotting points, labeling axes, titling the graph, making a key, and scaling the axes.
All four MRC categories were present in the construction phase, with a focus on invention (Figure 2). Ideally, as participants were constructing their graphs, they also should have been reflecting on their graph choice, critiquing the data provided, and ending with a take-home message of the data they just plotted. A summary of the MRC categories, themes, and examples from transcripts is displayed in Table 4. Compared with the planning phase, there was more diversity in the distribution of themes across the MRC categories and across the participant groups during the construction phase (Figure 2).
Themes within the MRC category invention were similar across the participant groups and were data type, statistics, and graph construction. However, the theme statistics was seen for only one UGR. Within the MRC category critique, sample size was seen for multiple professors but only one UGNR. Professors critiqued the data presented and indicated that a bigger sample size would be preferable for running inferential statistics (Figure 2). However, one professor connected the small sample size to a possible real-life situation a biologist could encounter, saying, “With 3 plants in each, I guess you could put a standard error on that, n = 3 is pretty small but sometimes in biology, you are stuck with pretty small.” The theme aesthetics emerged for multiple UGNRs and one UGR but did not appear for professors or GSs. Within the MRC category function, only one theme, graph choice, emerged for GSs, UGRs, and UGNRs. Within the MRC category learning/reflection, the themes technology and evaluation emerged. While evaluation was prevalent for multiple participants across all groups, technology was present only for GSs and UGRs.
The reflection phase followed the construction phase and began when the interviewer intervened and probed the participants to elaborate on their graph choice and what they plotted. Figure 1 displays the amount of time the participants spent answering the reflection question “Why did you choose to make this type of graph?” There was a significant difference in the amount of time spent reflecting between GSs and UGRs (p < 0.05; independent-samples t test, SPSS v. 22) and GSs and UGNRs (p < 0.01; independent-samples t test, SPSS v. 22).
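The comparisons above use independent-samples t tests with equal variances not assumed (i.e., Welch-style tests, per Levene's test in SPSS). A minimal stand-alone sketch of that statistic, using made-up reflection times rather than the study's data, is:

```python
import math

def welch_t(a, b):
    """Welch's independent-samples t statistic and Welch–Satterthwaite
    degrees of freedom (equal variances not assumed), mirroring the
    SPSS analysis described above. Inputs here are hypothetical."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb                        # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical reflection times (seconds) for two participant groups
t, df = welch_t([10, 12, 14], [20, 22, 24])
print(round(t, 3), round(df, 1))  # -6.124 4.0
```

The resulting t and df would then be compared against the t distribution to obtain the p value, as SPSS does internally.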
All four MRC categories were present in the reflection phase, which specifically targeted the learning and reflection category. We expected participants to elaborate on graph choice, using the graph created in the construction phase (invention) to provide a reflection and critique. A summary of the MRC categories, themes, and examples from transcripts is displayed in Table 5.
All participants provided an answer for this phase, and the most prevalent theme across the participant groups was evaluation, which is not surprising, because the participants were probed to reflect on their graph choice. However, there were different reasoning categories under this theme. Four UGNRs and four GSs used their personal experiences and intuitions when reflecting on their graph choice; two UGNRs, two GSs, and the professors justified their graph choice by explaining why bar, pie, or scatter plots would not accurately display the data; two UGNRs and one GS formulated the take-home message for the graph; and one UGR and one GS used the data table to justify their reasoning for constructing a line graph, a theme that was not seen in the professor group. It is also interesting to note that the themes purpose and variables were present only in the GS and professor populations. The professors stated the purpose of the experiment and aligned it with the message portrayed by their graph.
All of the participants who mentioned time in their reflection constructed line graphs. We did not notice differences in the participants’ graph reflection themes and the graphing scenarios.
The distribution of themes within the MRC categories and across the participant groups was most diverse in the construction and reflection phases (Figure 2). Across all the MRC categories, there were multiple instances in all participant groups in which all three themes under the MRC category invention were mentioned in the planning, construction, or reflection phase (see Figure 2). In the MRC category function, the theme graph choice appeared for all participant groups, and multiple times, in the planning, construction, or reflection phase. Another theme that was well represented across the construction and reflection phases for all participant groups was evaluation, which fell under the MRC learning and reflection category. A second theme under this same category, data table, was common across all participant groups, but only in the planning phase. The remaining themes under the MRC categories critique, function, and learning and reflection were less frequent.
To address our second research question aimed at characterizing the quality and attributes of graphs constructed by participants, we described the graphs qualitatively based on similarities and differences that emerged across participants and participant groups (Table 1, Figure 3, and Figures 2–8 in the Supplemental Material).
Graphs constructed by undergraduate students (UGRs and UGNRs) and graduate students (GSs), but not professors, followed basic graph conventions and included meticulously labeled axes, titles, tick marks, scales, and keys. Ten of the 15 undergraduate students titled their graphs, whereas only one of the eight GSs and one of the five professors did so. In terms of axis labels, all participants labeled their axes appropriately based on the data they chose to plot, with time on the x-axis and either number of leaves or cells on the y-axis. However, one UGNR struggled with labeling the axes, initially having difficulty deciding how to organize and label them and placing the independent variable, time, on the y-axis instead of the x-axis. Almost all participants indicated time in either minutes or hours. All participants had an appropriate scale, except for Professor 2, who did not scale the y-axis. Two students did not plan ahead concerning the space they needed for the scale; realizing midway through the scaling process that they were running out of space, they decided to add axis breaks (Figures 1–8 of the Supplemental Material). In contrast to the undergraduate and graduate students, professors tended to sketch their graphs, omitting detailed axis labels and meticulous plotting (Figures 1–8 of the Supplemental Material).
Of the 15 undergraduate students, eight plotted all of the raw data points, four plotted some of the raw data, and three plotted averages. In contrast, graduate students, professors, and three undergraduate students collapsed the data, plotted transformed data values, and sketched error bars (descriptive statistics) or mentioned a statistical test they would run (inferential statistics) to show meaningful trends and changes.
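The kind of data transformation described here can be sketched in a few lines of code. The replicate values below are hypothetical (they are not the study's task data); the sketch simply illustrates collapsing raw replicates into the mean ± standard deviation values that experts plotted with error bars.

```python
from statistics import mean, stdev

# Hypothetical replicate data (not the study's actual data set):
# number of leaves on three replicate plants per treatment at each time point (days).
raw = {
    "low water":  {0: [4, 5, 4], 7: [6, 7, 6], 14: [8, 8, 9]},
    "high water": {0: [4, 4, 5], 7: [9, 8, 9], 14: [13, 12, 14]},
}

def transform(data):
    """Collapse raw replicates into (mean, standard deviation) per time point,
    the descriptive statistics experts sketched as points with error bars."""
    return {
        treatment: {t: (round(mean(reps), 2), round(stdev(reps), 2))
                    for t, reps in series.items()}
        for treatment, series in data.items()
    }

summary = transform(raw)
for treatment, series in summary.items():
    for t, (m, sd) in series.items():
        print(f"{treatment}, day {t}: {m} ± {sd} leaves")
```

Plotting the six summary points per treatment, rather than all 18 raw values, is exactly the collapse that distinguished the expert graphs in this study.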
Participants who were randomly assigned the bacteria scenario generally constructed a line graph, except for three graduate students who constructed either a scatter or a bar graph (Figures 1–8 of the Supplemental Material). Line graphs represent the general consensus for this scenario in biology textbooks (e.g., Freeman et al., 2017 ) and primary literature (e.g., Ratkowsky et al., 1982 ; Zwietering et al., 1990 , 1991 ), because they are associated with either logistic or exponential growth models. There are also studies that report data on bacterial growth with temperature in bar graphs, box-and-whisker plots, and categorical dot plots (e.g., Seel et al., 2016 ). There was greater variety among the graphs constructed by participants who were randomly assigned the plant scenario (Figures 2, 4, 6, and 8 of the Supplemental Material). These results are similar to the bar and line graphs displayed by Mayak et al. (2004) , who examined how water affects plant growth. In our study, we did not see specific themes that were exclusive to either the bacteria or the plant scenario. We did notice that some of the participants who constructed a line graph used the theme time in their graph reflection.
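For reference, the two growth models underlying the line-graph convention take the following standard forms (these are textbook formulations, not part of the interview task):

```latex
% Exponential growth: population size N increases at a constant per-capita rate r.
\frac{dN}{dt} = rN, \qquad N(t) = N_0 e^{rt}

% Logistic growth: growth slows as N approaches the carrying capacity K.
\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)
```

Because both models describe a continuous trend of population size over time, a line graph of N against t is the natural representation, which is consistent with the consensus choice in the bacteria scenario.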
The graphs constructed by all participants were, in general, aesthetically sound, and the presence of gestalt principles (i.e., proximity, continuity, and connectedness) enabled easy observation of the general data trends and take-home message. The ink-to-white-space ratio was appropriate, and what was plotted was clear, without extraneous elements. However, five graphs had so many lines with overlapping data point labels that the take-home message was difficult to discern. In particular, the graph constructed by UGNR3 was sufficiently unclear that a viewer would find it difficult to identify the data points and formulate a clear take-home message (Figures 1–4 of the Supplemental Material).
An important purpose of graphs that summarize data is to align the data presented and the graph chosen with the research question and/or hypothesis. In our interview task, this meant addressing either how temperature affects the growth of bacteria or how the amount of water influences plant growth. The graphs of four undergraduate students did not align with the research question or hypothesis, because only a subset of the data was plotted (e.g., data from one treatment). All graphs constructed by graduate students aligned with the research question posed in the task.
In this study, we used the MRC framework to understand how undergraduate students, graduate students, and professors reason with graph choice, data, and graph construction and how the attributes of the graphs constructed by the study participants might differ.
Implicit in the MRC framework is expert competence with creating and understanding external representations. While all participants engaged in reasoning within all MRC categories, there is evidence for expert–novice differences across our participant groups (Figure 2). All professors took time to understand the data before proceeding with graph construction, and all but one graduate student planned, whereas only some of the undergraduate students planned before proceeding with graph construction. Generally, we saw that, when reflecting on their graphs, expert professors focused on the function of the graph and showcased their understanding with concepts related to experimental design, while novice undergraduate students generally relied on their intuition and data given to them in the task. We also saw expert–novice differences in the data plotted in the graphs of undergraduate students, graduate students, and professors. Most undergraduate students meticulously plotted all raw data, whereas most professors and graduate students plotted transformed data values. Our data are reminiscent of an expert–novice study conducted in the context of neurobiology that also noted differences in drawing of neurons by undergraduate students, graduate students, and laboratory leaders (professors; Hay et al., 2013 ). Undergraduate students’ representations were meticulous reproductions of neurons illustrated in textbooks. Neuron drawings by graduate and postdoctoral students closely resembled images seen under the microscope and were influenced by observations from their research projects, whereas the expert laboratory leaders used years of research experience to create imaginative drawings based on hidden hypotheses. Findings reported by Hay et al. 
(2013) and our graphing study are supported by the National Research Council (2000) , which states that experts organize their knowledge in a way that reflects a deep understanding of the subject matter and that expert knowledge is not recalled as a set of isolated facts but is applied to the context or problem being solved. Deep understanding is evident in professors’ graph reflections as they talk about the purpose of the graph, experimental design, and relevant concepts that are absent from the reasoning of the undergraduate students. Jordan et al. (2011) found that, when solving a task, experts were more likely to use their prior knowledge and discuss ideas in a broader context than novices, who solved the task with only the information given to them. Likewise, in the Hay et al. (2013) study, neuron drawings by the laboratory leaders were original and unlike those found in textbooks, because the experts’ drawings were informed by years of experience and accumulated knowledge.
Our study revealed that, while all participant groups showed evidence of reasoning within all MRC categories, the nature of that reasoning often differed in a manner consistent with the expected expert–novice differences highlighted earlier. Further, the graphs produced by participants in the study also varied along the novice–expert continuum. Figure 4 summarizes the graph-construction reasoning, behaviors, and graphs that we observed in the most novice and most expert participants. The distinctions summarized in this figure highlight the beginning of hypothetical learning trajectories and potential target areas for instructors to promote more expert-like reflective data handling and graphing practices. As more undergraduate students are encouraged to engage in inquiry and research project–based biology labs and to seek research apprenticeship opportunities during their higher education, they will be engaged in the scientific practice of data analysis and presentation. Therefore, it is important to provide students with targeted instruction that not only advances their biology content knowledge but also facilitates their data handling and representation skills toward expertise. While students have experience with graphing dating back to elementary school, our data suggest that refocusing and scaffolding their data handling and graphing activities in the context of their undergraduate learning experiences is needed. Kim and Hannafin (2011) suggest designing and implementing instructional scaffolds that target student difficulties with conceptual, procedural, metacognitive, and strategic knowledge.
Conceptual scaffolds, as they relate to graphing, can structure students’ understanding of the purpose of a graph and allow them to gauge their graph knowledge. Sketching a graph to visualize concepts in experimental design is an approach suggested by Dasgupta et al. (2014) . Procedural scaffolds help students learn the stepwise procedures that underlie graph choice and construction. There are many published examples that emphasize taking a procedural approach to graphing (Kosslyn, 1994 ; Paniello et al., 2011 ; Webber et al., 2014 ; Duke et al., 2015 ). Metacognitive scaffolds allow students to monitor their problem-solving processes with a focus on constant reflection (Kim and Hannafin, 2011 ). We published a tool (see Step-by-Step Guide in Angra and Gardner, 2016 ) that helps students plan their data, construct graphs, and then reflect on their graphs in a methodical manner. This tool is a metacognitive scaffold (Kim and Hannafin, 2011 ), because it includes a reflection piece after graph construction. In this study, too, the interviewer followed up with participants with reflective questions about graph choice. In a classroom setting, instructors can include reflective prompts throughout multiple assignments to help students develop their metacognitive abilities. The last scaffolding strategy, strategic scaffolds, challenges students to consider other options as they solve problems. Although previously published graphing materials provide students with many examples of graphs, these resources do not provide explicit strategic scaffolding, because they do not ask students to consider other options.
Using these tools and scaffolding strategies to emphasize graph choice and construction skills will encourage students to think critically about data and graphs in and outside the classroom. This is important, because students are rarely asked to reflect critically on the affordances and limitations of representations that they choose (diSessa and Sherin, 2000 ). Incorporating these and other graphing materials during teacher education may provide teachers with tools to guide students successfully and confidently toward proper graph construction. This would be useful in undergraduate curricula as well, as has been suggested by a continuing education approach for biologists teaching statistical concepts (Weissgerber et al., 2016 ).
Four main study design features bounded the scope of our conclusions. First, data were collected from students and professors at a single midwestern U.S. research university, a unique environment with its own curriculum and student population. Furthermore, our study consisted of a small group of participants, so the claims we present are not broad generalizations about what all professors or students do or think. However, many of our findings are consistent with and extend previous work by others. To verify our findings, future work is needed at other types of institutions, in different disciplinary fields, and with different participant populations to fully understand and appreciate the reasoning behind graph choice and construction.
Second, we provided all participants with a simple data set with one independent variable, one dependent variable, and two treatments with three replicates each. For our study to be replicated in a different disciplinary context, the bacteria and plant scenarios would need to be modified to fit the appropriate purpose, with data and experimental methods that conform to the disciplinary norms and practices. However, the simple data set did confirm some previous difficulties documented in the literature. UGNR4 and UGR3 showed difficulty with scaling axes (Figures 1–8 of the Supplemental Material; Padilla et al., 1986 ; Li and Shen, 1992 ; Brasell and Rowe, 1993 ; Ainley, 2000 ), as indicated by the awkward positioning of the axis breaks, and UGNR3 showed difficulty with variables, as indicated by the graph produced (Tairab and Al-Naqbi, 2004 ; Figures 1–8 of the Supplemental Material). However, the simplicity of the data set may have caused Professor 2 to go into “teacher mode” and quickly sketch the data to illustrate how temperature influences bacteria growth, instead of taking time to plot data.
Third, participants in our study were given a data set. Previous studies have shown that, when students use their own data to perform advanced tasks, they show deeper reasoning than when they use someone else’s data (Kanari and Millar, 2004 ). A future study can examine graph choice and construction with a more elaborate data set and with data the participants collected themselves in CUREs or inquiry lab classes or with data from simulations.
Finally, participants in this study constructed graphs manually using a LiveScribe pen and paper instead of the now-conventional method of graph construction on the computer. Having participants narrate their thought processes during manual construction allowed us to fully understand their reasoning. If we had asked participants to construct graphs using software programs, that request might have influenced their graph choice by biasing them toward the options presented by the software package. Manual construction allowed us to slow participants down and probe their graph-construction reasoning fully. We acknowledge that biologists at all levels of expertise rarely construct graphs for formal presentation by hand. However, informal communication with peers during instruction often involves the generation of quick, sometimes simplified, graphs (Roth and Bowen, 2003 ). We saw evidence of this in our professor population: one professor, in particular, studied the data table and then sketched the data with error bars to answer the research question quickly. With the data from our simple task, we can now move to more complex data sets and digital environments to further reveal areas of difficulty and competence with graphing.
We thank Dr. Kathryn Obenchain for her qualitative research expertise and constructive feedback on early drafts of this article. We thank Ms. Janetta Greenwood for helping us decide on the plant and bacteria scenarios. We are indebted to all of the biology undergraduate students and professors who participated in this study. We also thank our research group, PIBERG, for their feedback on this project. This project emerged from ideas initiated within the Biology Scholars Research Residency program (S.M.G.). The interpretation of this work benefited from the ACE-Bio Network (NSF RCN-UBE 1346567).