Investigating Small-Group Cognitive Engagement in General Chemistry Learning Activities using Qualitative Content Analysis and the ICAP Framework

The level of students’ engagement during active learning activities conducted in small groups is important to understanding the effectiveness of these activities. The Interactive-Constructive-Active-Passive (ICAP) framework is a way to...


Introduction
Active learning (AL) strategies have been shown to enhance student success beyond traditional methods (Kuh et al., 2005; National Research Council, 2012; Freeman et al., 2014), often improving outcomes for students who have been historically underrepresented within science, technology, engineering, and mathematics (STEM) fields (Lorenzo et al., 2006; Haak et al., 2011; Eddy and Hogan, 2014). For these reasons, AL strategies have been at the center of national calls for the adoption of evidence-based instructional practices to transform education in STEM fields (National Research Council, 2012; President's Council of Advisors on Science and Technology (PCAST), 2012).
At the same time, evidence supporting the effectiveness of a given strategy can be inconsistent (e.g., Andrews et al., 2011) and simply adding AL strategies to a learning environment does not necessarily lead to the same performance outcomes across groups (Shortlidge et al., 2019). Likewise, a 2019 meta-analysis of peer-reviewed studies on the effectiveness of a wide range of AL strategies within chemistry found that the effect size of these practices varied widely, in some cases resulting in no positive impact (Rahman and Lewis, 2020). As Cooper (2016) points out, the umbrella of AL also covers a wide range of classroom practices, making it difficult to define what specific aspects of AL are effective and under what conditions such strategies work.
At a minimum, the effectiveness of any AL strategy depends on learners' meaningful cognitive engagement with the learning materials (Bonwell and Eison, 1991). While there is little dispute that learners benefit more from active compared to passive learning (Freeman et al., 2014), a broader hierarchy of cognitive engagement has been proposed (Chi, 2009;Chi and Wylie, 2014). The ICAP framework (Chi, 2009;Chi and Wylie, 2014) offers a way to understand the varied outcomes in AL through a hierarchy of four levels of cognitive engagement: Interactive, Constructive, Active, and Passive. In this framework, simply being Active is one of the lower levels of engagement and is less likely to foster students' understanding than the higher level Constructive or Interactive modes (Chi and Wylie, 2014). In the ICAP framework, students' level of cognitive engagement is evaluated based on their overt physical and verbal behaviors (Fig. 1). For example, behaviors related to receiving information, such as reading a text or listening to instructions, would indicate Passive engagement. Active engagement would involve physical manipulations of information while learning, such as highlighting or underlining text. During Constructive engagement, students would perform the same physical manipulations that occur in Active engagement; in addition, they would generate output beyond the information provided in the learning materials. Examples of Constructive engagement include summarizing a text or taking notes in one's own words. Similar to the Constructive mode, during Interactive engagement, students would generate new information; however, this generation would occur through dialoguing among students or between students and instructors.
Studies within the ICAP framework have operationalized cognitive engagement by observing students' physical behaviors (Villalta-Cerdas and Sandi-Urena, 2014; Wiggins et al., 2017), categorizing activities by their broad instructional design features (Wiggins et al., 2017; Henderson, 2019; Lim et al., 2019; Menekse and Chi, 2019), and analyzing student conversations (Chi, 2009; Menekse and Chi, 2019; Liyanage et al., 2021). Each of these approaches has strengths and weaknesses. ICAP studies that examine engagement in terms of students' physical behaviors have used large-scale observation of overt behaviors at regular intervals at a distance in order to capture whole-class data (i.e., an observer seated in the back of the room with a chart, such as the "live coding" used by Wiggins et al. (2017) or the observation procedures used in Villalta-Cerdas and Sandi-Urena (2014)). While this approach may be able to distinguish Passive engagement from higher ICAP levels, the differences between Active, Constructive, and Interactive engagement are difficult to tease out at this level of granularity. For example, if a student is writing something on a worksheet, this could be simply Active engagement if it involves identifying relevant information on a graph and recording the answer. However, if the student is making inferences based on trends observed in the same graph, this student would be engaging at a Constructive level. What students are saying while engaging in these physical behaviors is essential to determining what level of engagement they reflect.
ICAP studies that rely on the instructional design features of the activity as a whole are based on the idea that the structure of the activity itself will constrain the ways that students can engage with it. For example, to assess the impact of cognitive engagement on learning, Henderson (2019) used a series of instructional conditions designed to reflect various ICAP levels, in which a lecture-based condition was used for Passive engagement, an individual writing activity was used to elicit Constructive engagement, and a peer instruction format was used to prompt Interactive engagement. This focus on coding engagement based on instructional design features assumes, however, that all students in a group will engage at the same level throughout an activity and does not distinguish among the levels of cognitive engagement required for different types of questions or phases within an activity.
These assumptions merit greater scrutiny. Research has shown that the type of activity students participate in can affect the nature of their conversation when working in small groups (Young and Talanquer, 2013). These differences in group conversations may reflect different modes of engagement. A study on small group activities using Peer-Led Guided Inquiry (PLGI) found that students' construction of arguments varied based on the number of students participating. It is possible that these students were engaging at different levels. Variations in conversation may also be important in Process-Oriented Guided Inquiry Learning (POGIL) activities (Farrell et al., 1999; Hanson et al., 2018). Through the lens of the ICAP framework, not every part of an activity may elicit the same level of cognitive engagement. For example, POGIL activities involve a three-step learning cycle (Atkin and Karplus, 1962) where students first explore information provided in a model, then identify trends and patterns during the concept invention step, and finally apply the learned concept to new situations (Hanson et al., 2018). The direct questions about a model during the exploration stage of this cycle are meant to ensure that students understand the model on which later parts of the activity are based. In terms of the ICAP framework, many of these questions rely primarily on Active engagement because they ask students to identify and/or reflect on information in a model that is provided for them and do not require the generation of additional information. By contrast, questions from the concept invention and application stages are more likely to elicit Constructive or Interactive engagement because they require students to make inferences that go beyond the information provided in the original model. This type of variation might be expected in any type of scaffolded learning activity.
ICAP studies that examine student conversations have generally used discourse analysis as a means for understanding student engagement during AL activities. Discourse analysis examines texts and talk in context in order to understand participants' actions (Wood and Kroger, 2000), and in education research, discourse analysis focuses on the role of spoken language in teaching and learning (Cole et al., 2014). Discourse analysis in chemistry education research has largely focused on patterns of interaction or argumentation in various instructional settings (Xu and Talanquer, 2013; Young and Talanquer, 2013; Warfa et al., 2014; Current and Kowalske, 2016; Moon et al., 2016; Repice et al., 2016; Shultz and Li, 2016; Stanford et al., 2016; Dohrn and Dohn, 2018). The use of discourse analysis in ICAP studies both within and outside of chemistry education research has generally been oriented toward the coding of individual student conversational turns, for example, the frequency of specific discourse moves (e.g., claim, accept, oppose) (Menekse and Chi, 2019), or the frequency, distribution, and engagement level evident in student conversational turns during small-group discussions (Liyanage et al., 2021).
Fig. 1 Modes of cognitive engagement (in bold) and characteristic behavior (in italics) according to the ICAP framework (Chi et al., 2018).
Discourse analysis can also be applied at a broader level, beyond individual turns. Because the highest two engagement levels outlined in the ICAP theory rely on distinctions that relate not just to what individual students are doing but to how they respond to one another during small-group conversations, coding longer exchanges is especially useful for distinguishing between Constructive and Interactive engagement. As noted above, there is a need to examine the extent to which actual student engagement in an activity matches the planned level of engagement based on the instructional design features of the activity itself. Therefore, using these ICAP levels as coding categories for both the activity design features and for students' observed engagement as evident in their conversations across different parts of an activity can provide a systematic way of investigating this alignment.
Whereas discourse analysis is useful in understanding how students interact with one another, an alternative method is needed to investigate what is being said, i.e., the content of the conversation. Qualitative content analysis (QCA) is well suited to filling this gap. QCA offers a method for systematically coding the content of textual data, whether verbal or written, to identify patterns (Schreier, 2012). QCA includes both deductive approaches (directed content analysis) and inductive approaches (conventional content analysis) (Hsieh and Shannon, 2005). Conventional content analysis can provide insights into phenomena that are not yet well described (Hsieh and Shannon, 2005). Because little research to date has explored the alignment between the instructional design features of individual parts of an activity and the actual level of engagement that they generate, an inductive approach is better suited to developing an understanding of instances where mismatches occur. Where mismatches between the planned and actual levels of engagement are found, conventional content analysis can be used to examine the content of students' discussions during these parts of an activity in order to identify patterns or themes that explain these mismatches. Therefore, conventional content analysis can be used to identify patterns as to which specific aspects of question design seem to foster higher or lower engagement across different groups as well as any other relevant themes that arise in students' conversations.

Research questions
The purpose of this study is to investigate cognitive engagement during small-group activities at the question level. To do so, we used qualitative content analysis and the ICAP framework to answer the following research questions.
(1) What range of engagement modes are expected during a general chemistry AL activity based on the question design?
(2) What range of engagement modes are observed during a general chemistry AL activity based on students' physical and verbal behaviors during group conversations?
(3) If mismatches occur between the expected and observed levels of cognitive engagement, what themes account for this mismatch?

Setting
Students from the first and second terms of a three-term General Chemistry sequence at Portland State University in the Pacific Northwest of the United States participated in this study. This course consisted of 20-30 students who were enrolled in the Honors College. Students in these courses come from a variety of STEM majors, including biology, chemistry, physics, and the pre-professional tracks, such as pre-medical and pre-dental. The first term occurred during fall quarter 2020, the second term occurred during winter quarter 2021, and the fall and winter term courses were taught by two different instructors. Classes met three times per week for 65 minutes and were conducted remotely through Zoom due to the COVID-19 pandemic. Each activity day began with a short lecture introducing the new material. Students were then placed in groups of 3-4 students in breakout rooms to work collaboratively on an activity worksheet. These groups remained consistent over the course of the term.
Activity worksheets were developed in house and structured using a format which included a model containing conceptual material followed by key questions, exercises, and problems. Key questions (KQ) generally asked about information explicitly presented in the model, providing an opportunity for students to gain familiarity with the content. Exercises (EX) included questions which required students to apply the content and infer an answer either conceptually or by performing a calculation. Problems (P) were similar to exercises but tended to be more complex, generally involving multiple steps or novel applications of the model content. The completed activity worksheets were turned in through the learning management system, and a nominal number of points were awarded for participation and attendance during the activity.

Data collection
Institutional Review Board (IRB) approval for this research study was received from Portland State University (HRRP# 2007004-18). Students were recruited at the beginning of each term by author S. Y. E. During the fall term, seven students consented to participate and were divided into two groups: Group A consisted of four students and Group B consisted of three students (Table 1). Three students from the fall also consented to participate during winter term and formed a new group: Group C. All student names reported in this manuscript are pseudonyms.
Three activities were observed during fall term and one activity was observed during winter term. The activities during the fall were evenly spaced, with the first one covering the concepts of mole and molar mass occurring near the beginning of the term, the second one covering concepts involving solutions and dilutions occurring near the midway point of the term, and the third activity covering electronegativity and polarity occurring near the end of the term. During winter term, the single activity occurred near the beginning of the term and covered concepts surrounding thermal energy and calorimetry. Each breakout room session was audio and video recorded. These recordings were transcribed verbatim by a transcription service. Transcripts were then reviewed and edited as needed by author S. Y. E. and pertinent physical actions from the participants (e.g., nod of agreement) were added to the transcripts. Unclear conversation was denoted by [XXX] in the transcripts.

Data analysis
Most of the prior work done using ICAP to investigate engagement during group activities assumed a single engagement mode over an entire activity (Menekse et al., 2013; Wiggins et al., 2017; Henderson, 2019). As these activities may contain different types of questions, this assumption may not be correct. Therefore, for the four activities observed, a finer grain size was used. The unit of analysis was each question within an activity. At this level of analysis, each question was first coded according to the ICAP framework, where the intended engagement mode of students was identified based on the question design.
Previous work investigating group conversations using ICAP looked at quantitative measures such as frequency of conversational turns or discourse moves (Wiggins et al., 2017; Menekse and Chi, 2019); however, this type of analysis does not provide insight into the relation between the group conversation and the question design. To address this gap, a second round of coding applied the ICAP framework to the group responses to each question in an activity. Each group's response to a question was coded based on the content of the conversation and the definition of each of the ICAP modes. The codebook for both types of coding is presented in Table 2. Each question and group response in the transcripts was coded deductively based on features of the levels of engagement outlined in the ICAP framework (Chi, 2009; Chi and Wylie, 2014).
Question coding. Three of the four engagement modes of ICAP (Fig. 1) were applied to each question in an activity (Table 2). For multi-part questions, each part was assigned a separate code. Passive engagement was not used to code questions because the questions were designed to be used in a group activity with the intent for students to engage actively at a minimum. Questions were coded as Active (A) if the information to answer the question could be found in the presented materials; it was assumed that students would use this information in their response. For the higher engagement modes (i.e., Constructive and Interactive), the difference between these modes is determined by whether the generation of new information occurs through dialogue. Since it is not possible to distinguish this difference based on the structure of the questions alone, Constructive and Interactive engagement were collapsed into a single code, Constructive/Interactive (C/I).
Group response coding. Each group response to a question (or part of a question, for multi-part problems) was coded separately, resulting in a response code for each question answered in each activity. Passive engagement was not used as a code because by virtue of conversation simply occurring, students were manipulating information, and therefore, the lowest mode of engagement students could participate in at the whole-group level would be Active. Although it is possible for individual students to be engaging passively, the group response code was based on the conversation that occurred among all group members. The response was coded as Active (A) if the students in a group explicitly referred to the information presented in the activity in their response. The Constructive (C) code was defined by the conversation generating new information to respond to the question; this new information was generated by a single student. Conversation may still occur between students with other students agreeing with the student generating information; however, this type of dialogue does not constitute co-generation of information and therefore would still be coded as Constructive. This contrasts with the Interactive (I) code, where new information is generated through dialogue between two or more students. During the dialogue, each student contributed new information and each contribution built upon information previously generated in the conversation.
Table 2 Codebook for question and group response codes
Question codes
Active (A): Information to answer the question can be found in the provided materials
Constructive/Interactive (C/I): New information needs to be generated to answer the question prompt
Group response codes
Active (A): Conversation reflects that an answer was taken from information provided
Constructive (C): One person provides the answer, generating new information. Can include forms of agreement from other group members (e.g., head nods, "yeah", "uh-huh", etc.)
Interactive (I): Participants generate information to answer the question based on one another's responses. Other participants' contributions of off-topic talk or forms of agreement are not included in this code
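The group response codes in Table 2 can be summarized as a small decision rule. The following is a minimal sketch only: in the study, coding was done by human coders reading transcripts, and the function name and its two inputs (`new_info_contributors`, `builds_on_others`) are hypothetical simplifications introduced here for illustration.

```python
def code_group_response(new_info_contributors: int, builds_on_others: bool) -> str:
    """Return 'A', 'C', or 'I' for one group response (illustrative only).

    new_info_contributors: number of students who generated information
        beyond the provided materials. Agreement ("yeah", a nod) and
        off-topic talk do not count as contributions.
    builds_on_others: True if the contributions built on information
        previously generated in the conversation.
    """
    if new_info_contributors < 0:
        raise ValueError("new_info_contributors must be non-negative")
    if new_info_contributors == 0:
        return "A"  # answer taken from the provided materials
    if new_info_contributors >= 2 and builds_on_others:
        return "I"  # new information co-generated through dialogue
    # Assumption of this sketch: anything else that generates new
    # information is treated as Constructive.
    return "C"


print(code_group_response(0, False))
print(code_group_response(1, False))
print(code_group_response(2, True))
```

Passive is deliberately absent from the return values, mirroring the codebook, which treats Active as the lowest possible group-level mode.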
Mismatch between question and group response codes. Across all four activities and three groups, group responses were observed, coded, and compared to the corresponding question code. When the question code and the group response code were not the same, this was identified as an instance of mismatch. Since Constructive and Interactive engagement were a single code (i.e., C/I) for the questions, if the corresponding group response was coded as Constructive or Interactive, either of these was considered a match. For each case of mismatch, the group conversation was examined inductively using conventional content analysis (Hsieh and Shannon, 2005) to determine if there were any themes that may explain the cause of the mismatch. To identify potential causes, each question and group response showing mismatch was read by two researchers. The researchers then independently identified specific phrases which were thought to contribute to the cause of the mismatch. The researchers then discussed these mismatch causes and combined common causes into themes.
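The match rule described above can be stated compactly. This is a hedged sketch, not the authors' tooling: the function name `is_match` and the string labels are assumptions chosen to mirror the codes in Table 2.

```python
def is_match(question_code: str, response_code: str) -> bool:
    """Compare a question code ('A' or 'C/I') with a response code ('A', 'C', 'I')."""
    if response_code not in {"A", "C", "I"}:
        raise ValueError(f"unknown response code: {response_code!r}")
    if question_code == "A":
        return response_code == "A"
    if question_code == "C/I":
        # Constructive and Interactive were collapsed for questions,
        # so either response code counts as a match.
        return response_code in {"C", "I"}
    raise ValueError(f"unknown question code: {question_code!r}")


print(is_match("C/I", "C"))
print(is_match("A", "I"))
```

Under this rule, an Active question answered at a Constructive or Interactive level is flagged as a mismatch and would then be examined inductively for themes.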
Trustworthiness. Trustworthiness of the findings in this study was established through the evaluation of quality criteria such as qualitative reliability and credibility (Korstjens and Moser, 2018;O'Connor and Joffe, 2020). To enhance reliability in coding the questions and responses, a secondary coder was employed to evaluate the application of the codes in a two-stage process. The author S. Y. E. developed the codebook (Table 2), and both author S. Y. E. and the secondary coder first each individually coded each question and group response in a single activity. The coders met, discussed and resolved differences in coding, and came to consensus. Through the discussion to achieve consensus, the coders agreed that no modifications to the codebook were needed. The two coders then coded all the questions and group responses across the remaining activities. Inter-rater reliability (IRR) at each stage was evaluated by calculating Cohen's kappa (Cohen, 1960). During the first stage, the IRR values for question and group response coding of the single activity were 0.88 and 0.56, respectively. The IRR values for the subsequent question and group response coding across all remaining activities during the second stage were 1.00 and 0.99, respectively. Kappa values greater than 0.8 are generally considered to have good reliability (Landis and Koch, 1977). For the identification of themes related to mismatched engagement levels between the questions and group responses, investigator triangulation (Lincoln and Guba, 1985) was used to establish credibility. Two of the authors (S. Y. E. and A. J. H.) used conventional content analysis to identify patterns in the transcripts and worked together to combine these patterns into themes.
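For readers unfamiliar with Cohen's kappa, the statistic compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The sketch below is a plain-Python illustration under stated assumptions: the function name is hypothetical, and the example labels are invented for demonstration and are not data from this study.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(rater1) != len(rater2) or len(rater1) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater1)
    # Observed proportion of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal code frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1:
        # Both raters used one identical code throughout; kappa is formally
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Invented example codes (NOT study data).
coder_1 = ["A", "C", "I", "I", "C", "A", "I", "C"]
coder_2 = ["A", "C", "I", "C", "C", "A", "I", "I"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Because kappa corrects for chance agreement, it can be noticeably lower than the raw percentage of matching codes, which is one reason it is preferred over simple percent agreement.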

Question coding
Questions were coded as either Active (A) or Constructive/Interactive (C/I) based on how the information to answer would be derived (Table 2). Fig. 2 presents a portion of the model from the Solutions and Dilutions (SD) activity.
For example, Key Question 6 from the Solutions and Dilutions activity (SD-KQ6) was coded as Active because the information in the model (Fig. 2) explicitly states the required information in the text blurb and in the equation in the gray box at the top of the table.
(SD-KQ6) When making a dilute solution, which of the following remains constant? (i) The concentration (ii) The moles of solute (iii) The volume of the solution.
However, Key Question 9 from the same activity (SD-KQ9) asks students to provide an algebraic expression for M_D (i.e., the molarity of the dilute solution). Since this question asks students to manipulate the equation in the model (Fig. 2), they would be generating new information. Therefore, SD-KQ9 was coded as a Constructive/Interactive question.
(SD-KQ9) In preparing for an experiment, you need to know what the concentration of a dilute solution (M_D) will be. Provide an algebraic solution using the relation in the model for this concentration.
In total, 68 questions were coded across the four activities (Table 3). Since the groups did not complete the activities in their entirety during the time allotted, the data includes only those activity questions which had a corresponding group response. Additionally, questions which were answered by both Groups A and B were counted only once. The overall results show that 13 questions were Active and 55 questions were Constructive/Interactive. In general, the majority of questions (81%) were Constructive/Interactive questions across all activities. Table 3 shows that within the different activities, the percentage of questions coded as Active can vary, consisting of up to around one quarter of the total coded questions. Such variation was not captured in previous studies which coded at the activity level (Menekse et al., 2013; Wiggins et al., 2017; Henderson, 2019).

Group response coding
Group responses were coded as Active (A), Constructive (C), or Interactive (I) based on if more than one student contributed to the answer and whether their response(s): (1) generated new information, and (2) involved students building upon each other's statements to develop a final answer. In the conversation excerpts that follow, line numbers are used to allow for easy identification of pertinent portions of the text, information in parentheses refers to non-verbal actions, and information in square brackets has been added to the transcripts for clarity.
Excerpt 1 illustrates a group response that was coded as Active. In this excerpt, members of Group A are responding to SD-KQ6. Beth's comment (line 261) mentions looking at the equation which is a reference to the model (Fig. 2); therefore, this group response was coded as Active. Excerpt 2, on the other hand, illustrates a group response that was coded as Constructive. This excerpt focuses on Group C's response to Key Question 3 from the Thermal Energy and Calorimetry activity (TEC-KQ3), where students are asked to explain the difference in heat capacity between two blocks. Fig. 3 presents a portion of the model from the Thermal Energy and Calorimetry (TEC) activity.
In Excerpt 2, Helen provides the answer to the question associated with this portion of the model (line 52), and the contributions from Nani and Grace are forms of agreement (lines 53 and 54). Therefore, Helen is the only student generating new information and this group response was coded as Constructive.
(TEC-KQ3) How does the difference in specific heat capacity between blocks 2 and 3 relate to their final temperature? Briefly explain.
Excerpt 2: Group response to TEC-KQ3, coded as Constructive
51 GRACE: So, "How does the difference in specific heat capacity between blocks two and three relate to their final temperature?"
52 HELEN: So it, it's the same as mass, right? So, like a greater specific heat capacity will result in a lower final temperature.
53 GRACE: Yeah.
54 NANI: (nods).
55 HELEN: So, so block two will have a greater final temperature.
56 GRACE: Mm hmm.
Excerpt 3 gives an example where the coding of the group response was ambiguous. In this excerpt, students from Group A respond to Key Question 7 from the Solutions and Dilutions activity (SD-KQ7). Although the answer is present in the model (Fig. 2), and Katie gives the correct answer (line 271), it is unclear from the conversation whether Katie's response was based on the information in the model (Active) or she generated new knowledge (Constructive). In the absence of evidence that the response came from the model, it was assumed that she generated new knowledge and the group response was coded as Constructive.
(SD-KQ7) When making a dilute solution, which of the following decreases? Circle your response. (i) The concentration (ii) The moles of solute (iii) The volume of the solution
The frequency of group response codes by activity and group is given in Table 4. Groups A and B have a different number of response codes for each activity because they moved at different speeds and therefore did not answer the same number of questions. As with the question coding, since students did not complete the activities during the time allotted, coded responses are only for completed questions, not all questions in the activity. Overall, group responses were distributed across the three engagement modes with 8 responses coded as Active, 32 responses coded as Constructive, and 61 responses coded as Interactive. Results indicate that Interactive group responses ranged from 64% to 87% for Group A and from 39% to 77% for Group B across the Mole and Molar Mass, Solutions and Dilutions, and Electronegativity and Polarity activities. Only Group C completed the Thermal Energy and Calorimetry activity, and only 58% of their responses during this activity reached the level of Interactive engagement. Overall, observed engagement levels across groups and across questions within an activity varied widely.

Matches between question and group response codes
A total of 68 questions (Table 3) and 101 group responses (Table 4) were coded across the three groups and four activities. We began the comparison between coding groups by examining the questions coded as Constructive/Interactive and their corresponding group responses. Table 5 shows the breakdown of the frequency of Constructive/Interactive coded questions by activity and group. It also shows how the group responses were distributed across the Constructive and Interactive codes. These results indicate that when the question was coded as Constructive/Interactive, all the group response codes were either Constructive or Interactive, indicating a match with this question code but different levels of engagement. Across all groups and activities, the portion of group responses coded as Interactive ranged from 40% to 90%. In total, just over two-thirds of the responses were coded at the level of Interactive engagement.
Table 4 Frequency of group response codes by activity and group. Percentages of group response codes by activity and group are given in parentheses
In addition to variation in response coding seen across activities, variation was also observed across groups (Table 6). For Groups A and B, who completed the same three activities, several of the response codes differed across the two groups on the questions that both groups completed. For example, Table 6 shows that on the 8 completed questions coded as Constructive/Interactive in the Mole and Molar Mass activity, the responses of Groups A and B matched on only 6 questions, all coded as Interactive. The fewest matches between groups were observed on the 11 Electronegativity and Polarity questions, with only 5 of the response codes matching.
Upon comparison of question codes to the response codes of each group, mismatches were found exclusively in questions coded as Active. A breakdown of the frequency of questions and group responses coded as Active is shown in Table 7. While 19 total questions were coded as Active (counted per group, so a question answered by both Groups A and B appears twice), only 8 responses were also coded as Active, a 42% match. This means that more than half of the questions coded as Active had a mismatch with their corresponding group response codes, where students were responding at a higher engagement mode than was indicated by the question design. Among the 11 Active questions which showed a higher group response engagement mode, the responses split almost evenly between Constructive (5) and Interactive (6) engagement.
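The counts reported in this section can be cross-checked with a few lines of arithmetic. This sketch uses only numbers stated in the text. The two derived quantities (responses to Constructive/Interactive questions, and the Interactive share among them) are computed here on the stated basis that all mismatches occurred on Active questions, and the variable names are illustrative.

```python
# Counts reported in the text (Tables 4 and 7); nothing here is new data.
active, constructive, interactive = 8, 32, 61
total_responses = active + constructive + interactive

# Responses to questions coded as Active, counted per group.
active_question_responses = 19
mismatch_constructive, mismatch_interactive = 5, 6

# Matched plus mismatched responses should account for all Active questions.
assert active + mismatch_constructive + mismatch_interactive == active_question_responses

active_match_rate = active / active_question_responses

# Derived: responses to questions coded as Constructive/Interactive.
ci_question_responses = total_responses - active_question_responses
interactive_on_ci = interactive - mismatch_interactive
interactive_share_on_ci = interactive_on_ci / ci_question_responses

print(total_responses)
print(f"{active_match_rate:.0%}")
print(f"{interactive_share_on_ci:.0%}")
```

The derived share agrees with the statement that just over two-thirds of responses to Constructive/Interactive questions reached the Interactive level.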
To further investigate these mismatches, conventional content analysis was used to identify the potential causes by examining each mismatched question and group response for specific phrases that identified the source of the mismatch. Causes were then collected into common themes. Table 8 summarizes these results. Each of the questions in these mismatched cases was coded as Active because the information to answer the question was explicitly available in the activity.

Themes relating to mismatch
Conventional content analysis was used to investigate each of the group responses for details that explain the higher level of engagement displayed by the conversation compared to the question. The analysis suggested three possible themes: model use, unfamiliar vocabulary, and molecular representations. Although Key Question 7 from the Solutions and Dilutions activity (SD-KQ7) and Key Question 4 from the Electronegativity and Polarity activity (EP-KQ4) showed a mismatch, our inductive analysis did not suggest that the cause of mismatch in these cases falls into one of the identified themes. The group responses on these items were deemed to be ambiguous because it was not clear from the conversation if the students' response was taken from the activity material.
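As a bookkeeping check, the per-theme counts reported in this section, together with the two ambiguous items, should account for all 11 mismatches. The sketch below uses only counts stated in the text; the dictionary labels are shorthand chosen for this illustration.

```python
# Counts as reported in the text; labels are shorthand for this sketch only.
mismatch_causes = {
    "model use": 3,
    "unfamiliar vocabulary": 2,
    "molecular representations": 4,
    "ambiguous (SD-KQ7, EP-KQ4)": 2,
}

total = sum(mismatch_causes.values())
# The text reports 11 mismatched Active questions in total.
assert total == 11

# Print each cause with its share of all mismatches.
for cause, count in mismatch_causes.items():
    print(f"{cause}: {count} of {total} ({count / total:.0%})")
```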
Theme 1: Model use. Three of the 11 instances of mismatch were due to improper model use. These cases occurred during the Thermal Energy and Calorimetry (TEC) and the Mole and Molar Mass (MM) activities. Because the answers to these questions were explicitly stated in the model, it was expected that the students would use the model to answer these questions, and that the group conversation would show evidence of this.
For example, in Excerpt 5, Group C responds to Key Questions 4 and 5 from the Thermal Energy and Calorimetry activity (TEC-KQ4 and TEC-KQ5). Since the answers to both these questions are explicitly stated in the model (Fig. 3), these questions are coded as Active. Although the group response to TEC-KQ4 did refer to the model and was coded as Active, the response was incomplete. The correct response should have included ΔT and q, but Helen and Grace used the model to decide that the answer should only include ΔT (lines 68-71). Because of this incomplete use of the model, Helen and Grace engaged interactively to answer the next question in the activity, TEC-KQ5, which built upon the aspects of the model highlighted in TEC-KQ4. This interaction starts at line 72 with Grace's realization that they need two variables. From there, Helen builds upon this, suggesting the two variables are T_i and T_f (line 73).
a Groups A and B have different numbers of Active questions because Key Questions 1-4 were assigned prior to class, and Group A did not discuss them while Group B went over them as a group before proceeding.
Although the final answer they come to is incorrect, one can see that it is the incomplete use of the model in TEC-KQ4 which prompts the Interactive engagement in TEC-KQ5.
TEC-KQ4: When mathematically determining q, which variables can be positive or negative?
TEC-KQ5: How are the two variables in KQ4 related?
if the final's higher than the initial, then you get a positive number. If the initial's higher than the final, you get a negative number. So I suppose that's how it's related...right.
Theme 2: Unfamiliar vocabulary. Two of the 11 instances of mismatch involved students' use of unfamiliar vocabulary, specifically the scientific term "aliquot" in Key Question 8 of the Solutions and Dilutions activity (SD-KQ8). Although this question is coded as Active because the information to answer the question is explicit in the model (Fig. 2), responses from both Groups A and B display a higher mode of engagement due to unfamiliarity with the term "aliquot". For example, in Excerpt 6, the higher engagement mode of Group B's response is prompted by Helen's question about the meaning of "aliquot." Jacob responds and Grace looks up the definition, ostensibly on Google (lines 166-168). It is evident that the Interactive engagement resulted from unfamiliarity with the term "aliquot".
(SD-KQ8) In a dilution, which is always larger? Circle your response. (i) The volume of the aliquot (ii) The volume of the final solution.
Excerpt 6: Example of unfamiliar vocabulary
166 HELEN: I know it's the second one, but what exactly is the ali-aliquot? Cause I know [XXX] fairly small, so small sample or whatever.
167 JACOB: I guess the aliquot would be, do you think it would be the given volume?
168 GRACE: I'm just looking it up.
169 JACOB: Fair enough.
170 HELEN: What does Google say?
171 GRACE: A portion of a larger whole, a specific sample taken for chemical analysis or other treatment. I think it's like a portion of the sample. So the portion is obviously going to have less.
172 JACOB: So in a dilution, which is yeah, the volume of final solution will be larger.
Theme 3: Molecular representations. Communicating complex scientific ideas is dependent on using multiple "languages of science", which may include symbolic, graphical, or mathematical representations (Osborne, 2010). Four of the 11 instances of mismatch involved students' struggles in moving between different representations in the Electronegativity and Polarity activity. Fig. 4 depicts a portion of the model from this activity.
Key Question 8 from the Electronegativity and Polarity activity (EP-KQ8) asks students to explain why DL2 is a polar molecule. Since this information is depicted in the model (Fig. 4), this question is coded as Active. Students in Groups A and B seemed to have difficulty moving between the Lewis structure representation and bond dipole representation of molecules. In Excerpt 7, the Interactive engagement of Group A is prompted by Beth asking about the number of arrows that ...
While the Interactive engagement in Excerpt 7 was prompted by difficulty in translating between the Lewis structure and the representation depicting bond dipoles, in Excerpt 8 the trigger for the Interactive engagement is a desire to understand more deeply the role of specific features of the Lewis structure (i.e., lone pairs of electrons) in the dipole representation. In Excerpt 8, Group B engages interactively to try to gain a deeper understanding of what the vector model of dipoles represents. Their response to the same question begins with a discussion of the Lewis structure to identify the molecular geometry (lines 368-371). From there, they reference the model to determine how to draw the components of the bond dipoles (lines 373-386). Lines 387-396 show the group generating new information as they attempt to make the connection between the lone pairs of electrons in the Lewis structure and the bond dipoles. In lines 385 and 386, both Helen and Jacob directly refer to Fig. 4 in the model, stating that the answer is there (Active engagement). However, Grace's desire to understand how the lone pairs fit into the vector representation causes the group to engage at the higher Interactive mode (lines 387 and 393). In both groups' conversations, it is apparent that the students' attempts to move from the Lewis structure representation of the molecule to the vector model of bond dipoles are the trigger for the higher mode of engagement.
Excerpt 8: Example of molecular representations (Group B)
368 GRACE: Oh, and this one has lone pairs. What kind of structure does that make?
369 JACOB: The chart's...DL2, lone pairs.
370 JACOB: It's bent.

Conclusion
Previous studies using the ICAP framework of cognitive engagement to investigate active learning environments assumed a single engagement mode for the entire activity (Wiggins et al., 2017; Henderson, 2019). However, the data examined above suggest that students may engage differently with different parts of an activity. In addition, some studies have also assumed an engagement mode based on the activity design instead of overt student behavior (Menekse et al., 2013; Wiggins et al., 2017). ICAP identifies engagement modes based on student behavior, and as seen above, it may not be accurate to assume the expected engagement mode based on activity design would be the same as the observed engagement mode based on student behaviors. To address these concerns, we used ICAP to investigate cognitive engagement of student groups during AL activities in answering the following research questions.
RQ1: What range of engagement modes are expected during a general chemistry AL activity based on the question design?
This study used a finer grain size, i.e., identifying engagement modes at the question level rather than the activity level. Results indicated that across the four activities observed, the majority of questions (81%) were designed to elicit Constructive or Interactive engagement. Investigation at this finer grain size confirms that not all questions were designed with the same mode of engagement in mind, and therefore studies which assume a single engagement mode for the entire activity may miss insights that can be seen when looking at engagement at the question level.
RQ2: What range of engagement modes are observed during a general chemistry AL activity based on students' physical and verbal behaviors during group conversations?
The study also identified observed engagement modes of student groups by using ICAP to examine group conversations. Results indicated that within a single activity, the engagement of the group based on their conversation varied from Active to Interactive, with the majority of the group responses (60%) showing Interactive engagement. Additionally, within each group, the percentage of Interactive responses was not consistent across all activities (64-88% for Group A; 39-77% for Group B). These results provide further evidence that coding engagement at the question level for both questions and responses can give insight into students' engagement which is lost when coding at the activity level.
RQ3: If mismatches occur between the expected and observed levels of cognitive engagement, what themes account for this mismatch?
By comparing the expected engagement mode based on the question design with the observed engagement mode based on the group responses, cases of mismatch were identified. The group conversations were then further investigated using qualitative content analysis for common themes that caused the mismatches. Results suggested that the causes of the higher than expected observed engagement levels were related to three themes: model use, unfamiliar vocabulary, and struggles with different molecular representations.

Limitations
Due to the small sample size used in this study, these results are not generalizable to large populations. Additional studies are being conducted in author Barbera's research group to provide more generalizable insights into students' engagement in small group learning activities. Since the observed groups were recorded through Zoom, we were unable to see what students were writing unless papers were held up to the camera. Because of this limitation, engagement modes of groups were based solely on the group conversation. However, being able to see what students were writing on their worksheets could have provided additional insight into their cognitive engagement. Future data collections will take place in person and will be able to account for these actions. Finally, the coding of activity questions according to ICAP was based solely on design features present in each question and not explicitly on any stated intention on the part of the activity designers. Therefore, although the activity questions may have been written to elicit a specific type of thinking or engagement on the part of the students, the questions could only be coded based on specific features that were present in the questions themselves.

Implications for instructors
Results of this study showed that there were multiple instances of Constructive or Interactive engagement occurring in Key Questions where Active engagement was expected. Incomplete use of the model, or a lack of model use, was one reason for this. In some cases, this resulted in students engaging at a higher level but obtaining an incorrect answer. While many instructors discuss the structure of and expectations for these types of learning activities at the start of a term, we would suggest that instructors regularly remind students to read through the model prior to answering any questions in the worksheet and to refer back to it in their responses. This would reinforce the purpose of the models and may focus the groups' conversations on the data and details within the materials.
Use of new and potentially unfamiliar scientific terms may promote students' curiosity and lead to higher modes of engagement. This idea was supported in this study, where use of the unfamiliar term "aliquot" resulted in more conversation and a higher engagement mode. Although there is the danger that discussion of such vocabulary could result in unhelpful, tangential conversations, group discussions around the term "aliquot" seemed to help students reason out an answer to the question. In addition, learning relevant new vocabulary is essential to students' growth as scientists. Therefore, use of unfamiliar vocabulary that is relevant to the concept being taught can be a useful tool to promote student learning.
It should be noted that although ICAP states that cognitive engagement increases as one moves from Passive to Active to Constructive to Interactive, it should not be inferred that Interactive engagement is always the most desirable. As shown in this study, these higher than expected modes were due to a variety of factors that could provide insight into future improvements in the activities or instructional practices. Worksheets for these activities were structured such that students begin with Key Questions, which are designed to orient students to the pertinent information in the model (i.e., Active engagement), followed by Exercises and Problems, which allow students to manipulate and apply the information in a more advanced manner (i.e., Constructive or Interactive engagement). By scaffolding worksheets in such a manner, students use knowledge gained at the lower engagement modes to foster a deeper understanding during the more complex Exercises and Problems.

Implications for research
Investigation of student conversations using qualitative content analysis has opened avenues of further exploration. While this study looked at the engagement mode of the group as a whole, it is apparent that not all participants within a group are engaging to the same degree. For example, in Group A, Nani was a very quiet student who rarely contributed to conversations but was always writing on her worksheet and nodding along with other students' statements. Exploring individual students' engagement could provide insight into how a student's engagement correlates with learning outcomes. Other factors, such as group dynamics and how these dynamics change over time, may also be understood by analyzing the engagement of each individual. In addition, the root causes of the identified mismatch themes can be explored further. For example, the unfamiliar vocabulary theme could be due to differences in prior knowledge that students bring to the activity. Research in this area could increase understanding of how prior knowledge affects students' engagement in small-group activities.

Conflicts of interest
There are no conflicts of interest to declare.