A resource for understanding and evaluating outcomes of undergraduate field experiences

Abstract Undergraduate field experiences (UFEs) are a prominent element of science education across many disciplines; however, empirical data regarding the outcomes are often limited. UFEs are unique in that they typically take place in a field setting, are often interdisciplinary, and include diverse students. UFEs range from courses, to field trips, to residential research experiences, and thereby have the potential to yield a plethora of outcomes for undergraduate participants. The UFE community has expressed interest in better understanding how to assess the outcomes of UFEs. In response, we developed a guide for practitioners to use when assessing their UFE that promotes an evidence‐based, systematic, iterative approach. This essay guides practitioners through the steps of: identifying intended UFE outcomes, considering contextual factors, determining an assessment approach, and using the information gained to inform next steps. We provide a table of common learning outcomes with aligned assessment tools, and vignettes to illustrate using the assessment guide. We aim to support comprehensive, informed assessment of UFEs, thus leading to more inclusive and reflective UFE design, and ultimately improved student outcomes. We urge practitioners to move toward evidence‐based advocacy for continued support of UFEs.


| Background
Conducting research, collecting data, and teaching students outside of a laboratory or classroom setting are commonplace across disciplines. For many scientists, being "in the field" is paramount to the work that they do (Cutter, 1993; Rudwick, 1996; Wilson, 1982). Therefore, in numerous disciplines, engaging undergraduates in experiences that take place in the field is not only expected and intuitive (Dressen, 2002), but also considered central to training goals (Fleischner et al., 2017; Giles et al., 2020; Gold et al., 1994). For the purposes of this paper, we borrow from the work of colleagues (Fleischner et al., 2017; Morales et al., 2020; O'Connell et al., 2021) to define what we consider to be a UFE. UFEs are designed explicitly with student learning in mind and occur in a field setting where students engage with the natural world, or through a virtual experience meant to mimic an experience in the field. UFEs can take place in a variety of settings and durations, including immersive, residential courses or programs at field stations and marine laboratories, short field trips as part of traditional on-campus university courses, or long, multi-day field trips. The COVID-19 pandemic has further encouraged the development of remote UFEs and challenged us to reflect on how lessons in field education design might apply beyond in-person settings (e.g., Barton, 2020). The discussion that follows largely applies to in-person as well as remote UFEs. Further, we do not limit our discussion of UFEs to a few prominent disciplines, as we are aware of the wide range of UFEs and aim to be inclusive.
Some have argued that a student's undergraduate experience in disciplines such as biology, ecology, and the geosciences is not complete without a UFE (Cutter, 1993; Klemow et al., 2019; Nairn, 1999; Petcovic et al., 2014). A survey of participants at the Geological Society of America meetings (2010 and 2011) showed that the majority (89%) of survey participants felt that field experiences were vital to geoscience education and that the bulk of the value lies in cognitive gains, and to a lesser degree, sustained interest in the field (Petcovic et al., 2014). The Governing Board of the Ecological Society of America showed strong support of UFEs by including fieldwork and the ability to apply natural history approaches as two of the ecology practices in the recently adopted Four-Dimensional Ecology Education Framework (Klemow et al., 2019).
Participating in a UFE can spark students' interest in the scientific topic being explored in the field (Dayton & Sala, 2001; LaDue & Pacheco, 2013; Petcovic et al., 2014), increase student cognitive gains in disciplinary content (Easton & Gilburn, 2012; Scott et al., 2012), improve student understanding of the process of science (Patrick, 2010), foster development of discipline-specific technical skills (Peasland et al., 2019), and increase persistence in STEM fields (Jelks & Crain, 2020). UFEs can also have far-reaching impacts, even changing the trajectory of students' lives by influencing career choices, or solidifying long-term commitments to the environment (Barker et al., 2002; Palmer & Suggate, 1996). UFEs have been identified as critical contributors to students' development of a sense of place (Billick & Price, 2010; Semken, 2005; Semken et al., 2017; Van Der Hoeven Kraft et al., 2011) as well as fostering a resonance with Indigenous peoples and Traditional Ecological Knowledge (Cajete, 2000; Riggs, 2005).
Despite these key outcomes, some have voiced fears about field experiences going "extinct" and have sounded alarm bells for stakeholders to consider how to gain further support for such experiences (Barker et al., 2002; Swing et al., 2021; Whitmeyer et al., 2009a).
There is a widespread occurrence of, and in many cases, fervent advocacy for undergraduates learning in the field. Yet, there is a lack of systematically collected data on specific outcomes resulting from the diversity of possible field experiences (Mogk & Goodwin, 2012).
Practitioners (field instructors, directors, coordinators, and staff) want to understand the efficacy of their individual programs, while universities and funding agencies require evidence of success for continued support of undergraduate field programs. Stakeholders across disciplines have made it clear that more empirical studies that test claims of positive student outcomes are needed for continued support of UFEs (Clift & Brady, 2005; NRC, 2014; O'Connell et al., 2018; Smith, 2004). This is particularly true as it relates to improving equity, access, and inclusion in the field (NRC, 2003; Brewer & Smith, 2011; Wieman, 2012; Morales et al., 2020). Collecting evidence of student outcomes will help to identify opportunities and challenges for supporting the inclusion of all students in UFEs and aid in tackling some of the challenges with inclusion that we already know exist.
Practitioners report an interest in collecting evidence of outcomes from their UFEs for iterative improvement, to demonstrate value of their programs, and to contribute to broader understanding of field learning, but do not feel confident in their ability to measure student outcomes, given that it is not their expertise (O'Connell et al., 2020). Indeed, most of the studies that have measured outcomes from UFEs are conducted by education researchers, trained in quantitative and/or qualitative research methods. To meet practitioners where they are, and support mindful, efficacious assessment of UFEs, we: (1) present a resource for practitioners to use when they want to assess UFE outcomes and improve their programs and courses, (2) address how assessment and evaluation of UFE outcomes can help practitioners better design inclusive field experiences, and (3) identify an existing pool of instruments that align with intended student outcomes of UFEs.

| Conceptualization of this paper
The authors of this paper are members and founders of the Undergraduate Field Experiences Research Network (UFERN; www.ufern.net), an NSF-funded Research Coordination Network focused on fostering effective UFEs. UFERN brings together diverse perspectives and expertise to examine the potentially distinctive learning and personal growth that happens for students when they engage in UFEs across the range of disciplines and formats. During a UFERN meeting (2019), it became apparent that undergraduate field educators from across disciplines were frequently requesting help in how to collect empirical evidence about complex student outcomes from UFEs (O'Connell et al., 2020). The work presented here emerged from conversations at that UFERN meeting and is a collaboration between STEM education researchers, social scientists, and undergraduate field educators from multiple disciplines, to directly address calls for guidance on assessing UFEs.

| Strategies for assessing UFEs
We advocate that stakeholders work to understand and evaluate their UFEs or UFE programs in clear alignment with the unique goals of each individual field experience. Reflecting best practices in designing learning environments that support student gains, we draw from the process described as "backwards design" (Wiggins et al., 1998). Importantly, this method emphasizes the alignment of UFE design to the outcomes being measured. We build from a "how to" guide designed for assessing course-based undergraduate research experiences (CUREs) presented by Shortlidge and Brownell (2016) and have expanded and tailored the guide to be specific to UFEs. Figure 1 is to be used as a guide and a mechanism for reflection, allowing practitioners to refine a UFE to better serve the students, meet the intended outcomes, and/or change and build upon data collection methods already in place.
We provide a guide that is inclusive of those who intend to assess, evaluate, and/or conduct education research on UFEs, and therefore describe how these are separate but interrelated and likely overlapping actions. To prevent misunderstandings, we explain the language that we use regarding assessment, evaluation, and research.
We use the word assessment when we are referring to measuring student learning outcomes from UFEs. Assessment tools refer to the instruments that are used to collect the outcome data (e.g., a survey, rubric, or essay). Assessment may be qualitative (e.g., interviews), quantitative (e.g., surveys), or a mix of approaches (Creswell, 2013).
A programmatic evaluation might aim to holistically understand the experience that all or individual stakeholders have in a UFE; the evaluation could include students, instructors, program directors, and community partners. To evaluate something is to determine its merit, value, or significance (Patton, 2008), and program evaluation has been described as "the systematic assessment of the operation and/or outcomes of a program or policy, compared to a set of explicit or implicit standards as a means of contributing to the improvement of the program or policy" (Shackman, 2008). Thus, an evaluation of a UFE would determine the appropriate assessment methodology and identify whether programmatic goals are being met. Such information can inform how a UFE can be improved. Evaluation is often conducted by an external evaluator who may work with the UFE leadership team to develop a plan, often through the creation and use of a site-specific logic model (Taylor-Powell & Henert, 2008). An evaluation may target a range of UFEs, from a single disciplinary program to an entire field station's season of hosted UFEs.
Empirical evidence about a UFE that is gathered through assessment and evaluation, and that adds new knowledge, could potentially be used for education research. Towne and Shavelson state that: "…education research serves two related purposes: to add to fundamental understanding of education-related phenomena and events, and to inform practical decision making… both require researchers to have a keen understanding of educational practice and policy, and both can ultimately lead to improvements in practice." (Towne & Shavelson, 2002, p. 83).
If the aim is to publish research outcomes from a UFE, practitioners will likely need to submit a proposal to an Institutional Review Board (IRB). The IRB can then determine whether a human subjects' research exemption or expedited protocol will be necessary. If an IRB protocol is needed, approval should be obtained before data collection begins. Gaining IRB approval is contingent on researchers having been certified in human subjects' research and on a robust and detailed research plan that follows human subjects' research guidelines. Thus, conducting education research on UFEs requires advance planning and ideally would be conducted in partnership with, or with advisement from, education researchers. Typically, if a study is IRB approved, participants of the study need to consent to their information being used for research purposes.
Publishing outcomes may be desirable, but not all data will be collected in a way that yields publishable results; even so, those results may be highly informative to practitioners and UFE programs. Designing effective formative assessments to understand and modify a UFE might be the most appropriate workflow before engaging in intentional research studies on the outcomes of a UFE. Importantly, we do not advocate that one method is better, or more or less appropriate than another; the approach should depend on the aims and intentions of the stakeholders and the resources available.
FIGURE 1 Guide for Assessing Undergraduate Field Experiences (UFEs). The figure presents a guide to walk practitioners through assessing their UFE. The green arrows signify that each box informs the other, and iterative reflection and refinement are a key aspect of informed evaluation and assessment.

| Identify the intended outcomes from the UFE
The main focus of this work is to provide the tools and resources needed such that stakeholders can confidently assess whether students are meeting expected learning outcomes from UFEs. Such learning outcomes could be: students expand their knowledge of endemic amphibians, or students report an increased interest in environmental sustainability. Programmatic outcomes and goals (e.g., participants are involved in community engagement and scientific knowledge-building activities) are also critical components of this type of learning environment, and thus are also represented in example vignettes (Figure 2).
We draw upon Bloom's Taxonomy of Learning (Anderson et al., 2001; Bloom & Krathwohl, 1966) […] interactions, accurately identify geological formations, or solve a problem using an interdisciplinary lens (Bauerle & Park, 2012; Fuller et al., 2006; Tripp et al., 2020). Affective outcomes could include: a newfound interest in a subject, such as conservation; motivation to continue seeking out field learning experiences; or development of a connection to place (Boyle et al., 2007; Scott et al., 2019; Simm & Marvell, 2015). Outcomes in the psychomotor domain could include: the improved ability to geolocate, collect and measure sediment in a lake with the appropriate instrumentation and accuracy, or use established methodology to sample stream invertebrates (Arthurs, 2019; Scott et al., 2012). In addition to considering these three fundamental learning domains, UFEs may promote student outcomes that cross domains and/or enter the social realm, such as developing communication skills (Bell & Anscombe, 2013), building friendships and collaborations (Jolley et al., 2019; Stokes & Boyle, 2009), or developing a sense of belonging to a discipline or place (Kortz et al., 2020; Malm et al., 2020; O'Brien et al., 2020). Lastly, student participation in UFEs could result in broader, societal-level outcomes, such as: students pursuing conservation efforts; contributing to citizen science projects; increased awareness of social justice issues; or support for sustainability efforts (Bell & Anscombe, 2013; Ginwright & Cammarota, 2015; Grimberg et al., 2008).
In Table 1, we present a list of common intended student outcomes from UFEs. The list of outcomes was generated by UFE practitioners: outcomes were first identified through a UFERN landscape study (O'Connell et al., 2020) and by participants at the 2018 UFERN meeting.
O'Connell et al. (2020) surveyed practitioners on expected student outcomes from their UFEs. We then refined the list of outcomes by removing outcomes that were redundant, not measurable, or linked to very specific contexts (not field universal), and then grouped them by what we call "primary aim." The primary aim category is an umbrella category by which to group similar intended outcomes. Thus, students gaining content knowledge and skills is a prominent goal for practitioners of UFEs, but content can also be learned in many contexts. We and others propose that the distinctive impact of participation in a UFE may actually be more in the affective domain (Kortz et al., 2020; Van Der Hoeven Kraft et al., 2011). Thus, we encourage practitioners to consider focusing less on content-level outcomes and more on the full spectrum of possible outcomes.
FIGURE 2 Vignettes of Undergraduate Field Experiences (UFEs). These vignettes (a-d) represent actual examples of UFEs and illustrate how to apply the components of Figure 1 (Strategy for Assessment of Undergraduate Field Experiences (UFEs)) to assess each UFE. Figure 2d was based on Gilley et al., 2015.
[…] (Lonergan & Andresen, 1988; O'Connell et al., 2020; Whitmeyer et al., 2009b). For example, some are strictly disciplinary, others interdisciplinary (Alagona & Simon, 2010); they might occur locally (Peacock et al., 2018), in short duration (Hughes, 2016), over an entire course (Thomas & Roberts, 2009), or as a summer research experience held at a residential field station (Hodder, 2009; Wilson et al., 2018). O'Connell et al. (2021) comprehensively describe and organize the evidence for how student factors such as student identity, prior knowledge, and prior experience and design factors such as setting and social interaction influence learning in the variety of UFE formats. In this paper, we urge practitioners to consider student factors (e.g., prior knowledge, skills and experiences, motivation and expectations, social identity, and personal needs) and design factors (e.g., setting, timing, instructional models, and activities) when determining an appropriate assessment approach. These contextual factors should inform assessment decisions as well as data interpretation, and how to use the data to make decisions about next steps in assessment or evaluation. The intention is for practitioners to use the guide (Figure 1) to inform iterative change and improvement and reflective practice, not as static scaffolding.

| Student factors
As with any learning environment, it is critical for instructors and staff to have a good idea of who the participating students are, and to anticipate what information may be pertinent to their experiences as practitioners plan to understand the outcomes of a UFE (Fakayode et al., 2014; Ireland et al., 2018; Pender et al., 2010; Stokes et al., 2019). […] This effect can occur when a large proportion of subjects begin a study with very high scores on the measured variable(s), such that participation in an educational experience yields no significant gains among these learners (Austin & Brunner, 2003; Judson, 2012). In this case, instead of the survey, the practitioner might learn more by crafting an essay assignment that probes the physiology students' environmental values. This option would demonstrate consideration of the student population in the assessment strategy.
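The effect described above, in which most students start near the top of a scale, can be screened for before committing to a pre/post survey. The following is a minimal sketch only: the function name `share_at_ceiling`, the 0.5-point margin, and the scores are all hypothetical choices made for illustration, not values from any published instrument.

```python
def share_at_ceiling(pre_scores, scale_max, margin=0.5):
    """Fraction of students whose pre-survey score is within `margin` of the scale maximum.

    `margin` is an arbitrary illustrative cutoff (an assumption), not a published standard.
    """
    if not pre_scores:
        raise ValueError("pre_scores must contain at least one score")
    near_top = [s for s in pre_scores if s >= scale_max - margin]
    return len(near_top) / len(pre_scores)


# Hypothetical pre-survey means on a 5-point scale (illustrative data only)
pre = [4.8, 5.0, 4.6, 3.9, 4.7, 5.0, 4.5, 4.9]
share = share_at_ceiling(pre, scale_max=5.0)
print(f"{share:.0%} of students begin within 0.5 points of the scale maximum")
```

A high share suggests that the survey leaves little room to detect gains in this population, which is one signal that a different assessment tool, such as the essay assignment mentioned above, may be more informative.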
Other factors to consider might include student motivation and expectations. An assessment of students in a pair of geoscience UFEs in New Zealand showed that study abroad students were more intrinsically motivated, pro-environmental, and had a stronger sense of place than local students in a similar field experience, although they were held in the same place. This assessment highlighted the need to adapt the design of the field experience to be more applied, environmentally focused, and place-based, rather than simply applying the same curricula unchanged to a different student population. Here, future assessments could be targeted toward investigating whether the revised UFE design for study abroad students effectively captured their motivation and interest. Alternatively or additionally, a deeper qualitative investigation could be conducted to characterize their field experiences in relation to the environmental and place-based content.
Prior experiences and identity are also critical to consider […] have the opportunity to ask questions, and are free from coercion.
In some cases, this may mean having someone who is not the course instructor conduct the assessment. Although questions like these would be addressed if the study requires approval through an IRB or similar, we encourage their consideration regardless as they have a bearing on student comfort and perceptions of safety.
Programmatic processes such as recruitment efforts or selection criteria can also influence student factors (e.g., O'Connell et al., 2021; Zavaleta et al., 2020). Are all students enrolled in a class participating in the UFE (as in a CURE), do they self-select, or are they chosen to participate based on certain criteria? It is important to keep in mind that any outcomes from a UFE are only representative of the students who actually participated, and thus not broadly representative of any student who might participate. In summary, when applying the assessment strategy presented in this paper, one must consider the following: Are the UFE outcomes reasonable to achieve and measure given the specific student population? Student factors must be considered in UFE design and will likely moderate or even become the subject of assessment efforts.
In the vignettes, we identify various factors that may inform pro- […] all majors (e.g., Figure 2c).

| Setting and timing
Fundamental to the definition of UFEs is that they are immersive, communal, and somewhat unstructured (even if conducted remotely) (Posselt, 2020, pp. 56-57). This distinctive learning environment should be considered when picking an assessment approach and interpreting assessment data. If a practitioner wanted to evaluate how a UFE impacts student knowledge of a particular concept, then a two-week, on-campus UFE focused on urban greenspaces may yield less deep learning about forest ecology than a semester-long field course held in a live-in forest field station. Thus, a summative assessment on forest ecology concepts should be reflective of the amount of time and depth the students have had to amass relevant cognitive gains.

Previous work indicates that instructors and students place high value on UFEs where participants live and work together in the field (Jolley et al., 2019). However, cohabitation and isolation may also present challenges in the way of mental health stressors (John & Khan, 2018) and unfamiliar and overstimulating environments (Kingsbury et al., 2020). In an almost opposite, yet timely and relevant example, Barton (2020) describes how remote UFEs need to reduce or change expected learning outcomes specific to being "in the field" to outcomes more relevant to a remote setting. Considering how the UFE setting might impact student learning should be factored into determining intended student outcomes, and subsequently how to test whether those outcomes are being met. Figure 2 illustrates how factors such as residential/non-residential settings, length of the UFE, and accessibility of the setting can inform assessment strategies.

| Contextual factors can intersect
The student experience (and thus the student outcomes) is influenced by the intersection of setting and timing factors, making interpretation of the results complex. For example, perhaps a student is a primary caregiver for someone at home and is distracted by irregular or absent cellular service, and is therefore unable to establish a connection to place due to distraction and worry. Some students may identify that eating as a community helps them to establish a sense of belonging among peers and instructors, whereas eating in a group setting may cause a student with a complex relationship with food to experience extreme discomfort. These examples are provided to highlight how residential or community settings may have contradictory impacts on different students in the same UFE; thus, it may not always be appropriate or meaningful to solely look at assessment findings on an average or "whole-class" scale.
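To illustrate why a whole-class average can mask contradictory impacts, the sketch below compares an overall mean change with mean changes computed per subgroup. It is illustrative only: the group labels, the scores, and the function name `mean_change_by_group` are hypothetical, and in practice any subgroup variable would need to be collected with consent and reported in a way that protects students in small groups.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_group(records):
    """Mean post-minus-pre change for each subgroup.

    `records` is a list of (group_label, pre_score, post_score) tuples.
    """
    changes = defaultdict(list)
    for group, pre, post in records:
        changes[group].append(post - pre)
    return {group: mean(values) for group, values in changes.items()}


# Hypothetical sense-of-place scores on a 5-point scale (illustrative data only)
records = [
    ("not_caregiver", 3.0, 4.0),
    ("not_caregiver", 3.5, 4.5),
    ("caregiver", 3.5, 3.0),
    ("caregiver", 4.0, 3.5),
]

overall = mean(post - pre for _, pre, post in records)
by_group = mean_change_by_group(records)
print("whole-class mean change:", overall)
print("mean change by group:", by_group)
```

In this made-up data set the modest positive whole-class change conceals a gain in one group and a loss in the other, which is the pattern the examples above warn about.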

| Instructional model and activities
As with any learning experience, working backwards from the specific learning outcomes will help instructors to ascertain whether the curriculum is in alignment with those goals, or whether there are activities that are not aligned or extraneous. If intended student outcomes are to increase skills with research practices (e.g., Figure 2a), then the actual activities should support this outcome. In this vignette, students are supported to develop a research project, aligning the instructional model and activities to the outcome.

Similarly, an intended outcome of the Humanities Course at a Field Station vignette (Figure 2c) was to develop stronger connections to place in Northern Michigan, and the course curriculum included activities focused on exposure to place and fostering a sense of place.
In the Urban Field CURE vignette (Figure 2b), an intended outcome was for students to engage with relevant stakeholders; accordingly, the students received feedback on their experimental design from the stakeholders. There are multiple options for designing curriculum or activities that will allow practitioners to gauge the participant experience, thus acting as a form of formative assessment. For example, designing a written reflection activity that probes the student experience or their learning in that particular environment, or collecting student artifacts from the UFE, can yield information regarding how a student experiences the UFE, and can in turn inform UFE stakeholders.

| Accessibility and inclusion
As illustrated previously, basic characteristics of the location and pedagogy of the UFE can have an impact on the physical, cognitive, and/or emotional accessibility of the learning environment for various students. In efforts to include as many students as possible, it is important to consider factors such as physical space (e.g., restroom availability, non-gendered housing, housing for students with physical, emotional, or psychological concerns), quality of Internet connection (if remote), sleeping arrangements, skills needed to participate (e.g., training in swimming), or other health concerns (e.g., allergies). Additionally, social isolation/inclusion can be especially prevalent in UFEs for students who do not share the same identities with other participants and/or are from underrepresented groups (Morales et al., 2020). One of the vignettes (Figure 2d) is specifically tied to accessibility and demonstrates the importance of directly working with students and faculty with disabilities on a field trip in order to address the intended outcomes of the UFE.

| Assessment approach
Key to choosing an assessment approach is first asking: What is the motivation for collecting the data? As discussed earlier, there are a number of reasons one might assess a UFE, including: to identify whether students are meeting specific learning goals; to collect publishable data on students' sustained interest in a topic; or to identify whether the UFE is meeting programmatic goals in order to report back to a funding agency or university. Regardless of stakeholders' motivations, using backward design to clarify and align program goals, activities, and assessments will provide a solid platform for improvement and evaluation.
We recommend that practitioners consider both formative and summative assessments. A formative assessment might be a UFE student completing a written reflection or keeping a "reflective diary" (Maskall & Stokes, 2008; Scott et al., 2019) regarding an aspect of their learning experience. This strategy would provide students a chance to reflect on their learning process and their changing experience and competencies in their own words. Further, such a formative assessment would allow instructors/stakeholders to better understand how programming, or more specifically a particular aspect of programming, may impact student perceptions and possibly how to adjust the learning experience. A summative assessment strategy could be employed if practitioners wanted to know whether students have gained a greater appreciation for the natural world as a result of a UFE, which could be measured, for example, by conducting a pre/post survey designed to measure this specific construct (e.g., Table 1; Primary Aim: Connection to Place; Assessment Tool: Place Attachment Inventory (PAI), Williams & Vaske, 2003). Figure 1 is meant to be useful in planning assessment strategies but could also serve as a helpful communication tool when engaging with funders and stakeholders.
It may also be appropriate to hire an external evaluator. An advantage of external evaluation is that it presumably provides an unbiased view of the program, as the evaluator will assess the impacts of programming on participants and report findings in an objective manner. From the evaluator's perspective, is the program meeting its intended goals? For whom does the UFE appear to be "working," and are there certain student groups that are not being impacted in the way designers of the experience had intended? An external evaluator will often work with the team to identify goals and then conduct a holistic programmatic evaluation, including all stakeholders. The caveat regarding external evaluation is cost. If grant-funded, external evaluation may be encouraged or even required; if not grant-funded, finding funding would be necessary in order to hire the evaluator or evaluation team.

| Data collection and analysis
Deciding what type of data to collect will require having a reasonable idea of the program's goals and anticipated outcomes, as well as an awareness of the time it will take to collect and analyze the type of data collected. Practitioners may consider using quantitative measures such as surveys, or qualitative methods such as interviews or open-ended questions. A mixed methods approach can employ both qualitative and quantitative methodology, allowing for a more nuanced understanding (Creswell & Clark, 2007).
Identifying if the intention is to publish the data (requiring IRB review), or to use it internally to gain a better understanding of an aspect of programming should play a key role in determining the approach and the "rigor" with which one collects and interprets the data.
Using best practices in research will help to avoid conflicts of interest, and better ensure that valid and reliable data are collected (Ryan et al., 2009). If, for example, a program recruits students for interviews after they participate in a UFE, someone outside of the UFE leadership or instructional team should be the interviewer. This practice would minimize the power differential between participant and researcher, thereby ensuring that UFE interview participants feel that they can be honest about their experiences, and not worry about pleasing or offending those involved in the program (Kvale & Brinkman, 2009). Further, the interview questions should be vetted by others (similar to the target population) before the interviews begin to ensure that the questions are interpreted by the participants as they are intended.
Using appropriate methodology in planning data collection and conducting analyses will allow for apt interpretation of the results (Clift & Brady, 2005). As illustrated in the vignettes (Figure 2d), deeply understanding the lived experiences of participants may call for knowledge of qualitative methodology. One may not want to conduct numerous interviews with students and staff without the resources to hire researchers, or ample time to analyze the data.
Analyzing rich qualitative data typically involves iterative "coding" by multiple trained researchers who develop and revise codebooks and then apply those codes to the transcribed text, regularly checking for coding reliability among researchers (Belotto, 2018; O'Connor & Joffe, 2020; Saldaña, 2011). Coding processes vary: they may be guided by a theoretical framework or a priori ideas, and they may be inductive, deductive, or a combination of both (see Saldaña, 2015 for a comprehensive manual on coding).
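One common way to check coding reliability between two researchers is Cohen's kappa, which compares observed agreement with the agreement expected by chance. It is offered here as one possible statistic, not as the method used in the cited work. The sketch below is a minimal illustration; the code labels and excerpt assignments are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per excerpt."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of excerpts")
    n = len(coder_a)
    # Proportion of excerpts on which the two coders chose the same code
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal code frequencies
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes applied to eight interview excerpts (illustrative only)
coder_a = ["place", "belonging", "skills", "place", "skills", "belonging", "place", "skills"]
coder_b = ["place", "belonging", "skills", "skills", "skills", "belonging", "place", "place"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Interpretation thresholds for kappa vary across sources, so teams may prefer to treat a low value mainly as a prompt to discuss disagreements and revise the codebook.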
Similar to qualitative data, quantitative data collection and analysis require planning and expertise. Researchers will want to ensure that the research aims are well-aligned with the data collection methods or tools, which in turn allows for appropriate interpretation of the data. Comparing pre-post survey responses would be one seemingly straightforward way to measure change over time in participant learning (e.g., Figure 2c). Yet, we caution against pulling a tool from Table 1 or elsewhere and simply assuming that by using it, it "worked." We recommend collaborating with experts who are familiar with quantitative methods. Using a survey tool may yield quickly quantifiable results, but if the survey has not undergone vetting with individuals similar to the population of study, or it has not previously been shown to collect valid data in very similar populations, one cannot assume that the data collected are valid or reliable (Barbera & VandenPlas, 2011;Fink & Litwin, 1995). Just as we do not use micropipettes to measure large volumes of lake water, we would not use a tool developed to measure academic motivation in suburban elementary school students to measure motivation of college students participating in a residential UFE and expect to trust the survey results outright. If a tool seems appropriate for a given UFE and the student population, we encourage first testing the tool in that population and working to interpret the results using best practices (for a comprehensive resource on these practices, see American Educational Research Association (AERA) 2014). As described previously, Table 1 consists of several assessment tools which are potentially relevant for measuring UFE outcomes. We only included in the table tools that have been peer-reviewed and published. We strongly recommend reviewing the associated peer-reviewed paper before using a tool, as well as looking in the literature to see whether others have used the tool and published their findings.
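To make the pre-post comparison concrete, the following sketch summarizes matched pre/post scale scores using the mean change, a paired t statistic, and a standardized effect size. It is illustrative only and rests on assumptions not drawn from the vignettes: each student has one pre score and one post score matched by position, the scores are treated as interval data, and the numbers are hypothetical. It uses only the Python standard library and deliberately stops short of a p-value, since the choice of test and its interpretation are decisions best made with a quantitative methods collaborator.

```python
import math
from statistics import mean, stdev


def paired_change(pre, post):
    """Summarize change between matched pre and post scores for the same students."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("Need at least two students with matched pre and post scores.")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the change scores
    if sd_diff == 0:
        raise ValueError("Every student changed by the same amount; t and d are undefined.")
    n = len(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic, df = n - 1
    d_z = mean_diff / sd_diff  # standardized mean change for paired data
    return {"n": n, "mean_change": mean_diff, "t": t_stat, "df": n - 1, "d_z": d_z}


# Hypothetical mean scale scores (1-5) for six students, before and after a UFE.
pre = [3.0, 2.5, 4.0, 3.5, 2.0, 3.0]
post = [3.5, 3.5, 4.0, 4.0, 3.0, 3.5]

result = paired_change(pre, post)
print(result)
```

A summary like this describes change in the scores only. Whether that change reflects the intended outcome still depends on the validity evidence for the tool in the population being studied.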
It is also possible that one would want to measure an outcome for which a tool has not yet been developed. In this case, working on an attuned assessment strategy based on iterative adaptations and using lessons learned may be appropriate (Adams & Wieman, 2011). There are many steps involved with designing and testing a new assessment tool that is capable of collecting valid and reliable data. Therefore, if stakeholders deem it necessary to create a new tool to measure a particular outcome, or develop or modify theory based on a UFE, we recommend working with psychometricians or education researchers.
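One of the many steps in piloting a new or adapted multi-item scale is checking its internal consistency. The sketch below computes Cronbach's alpha from pilot responses as a simple example of such a check. It is our illustration under stated assumptions, not a procedure taken from the cited sources: the responses are hypothetical, every respondent answered every item, and all items are scored in the same direction.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one row of item scores per respondent."""
    if len(responses) < 2:
        raise ValueError("Need responses from at least two people.")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("Need at least two items.")
    if any(len(row) != n_items for row in responses):
        raise ValueError("Every respondent must answer every item.")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("Total scores do not vary; alpha is undefined.")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical pilot data: five students answering a three-item scale (1-5).
pilot = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]

alpha = cronbach_alpha(pilot)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Alpha speaks only to internal consistency. It does not establish that a tool measures the intended construct, which is one reason to work with psychometricians or education researchers.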

| What are the next steps?
We encourage treating evaluation and assessment as a reflective, cyclical, iterative process of improvement as it relates to UFE design and implementation. There are inevitably going to be aspects of any learning experience that could be improved, and this guide to assessment (Figure 1) can help practitioners visualize alignment between intended outcomes, programming, assessment, and evaluation, and how each informs the other. The next steps for many UFEs might be to first report to stakeholders (funders, the institution, etc.) on the outcomes of the UFE. Or, if the goal of the assessment effort was to conduct novel research, then the next steps might be to analyze, write up, and submit the results of the study for peer review, thereby contributing to the growing literature of empirical outcomes from UFEs. For example, one vignette (Figure 2b) describes how the assessment strategy will provide pilot data for ongoing publishable projects. Other vignettes (Figure 2a,c)  be paid at the start of their experience and identified field research projects that were located in student communities, and in another case, accommodations were made for the student's family to join them as part of the residential field experience (Ward et al., 2018). This is just one example of how assessment data can be used to inform the design of future UFEs and highlights how the assessment process can be both informative and iterative.

| EXPANDED VIGNETTES
Here, we provide detailed narratives that more fully illustrate two of the vignettes introduced in Figure 2 (Figure 2a,c). The expanded vignettes are intended to translate the collective ideas presented here and summarized in Figure 1 into concrete examples that can guide assessment of diverse UFEs.

| Vignette a-
Primary student outcomes included (1) increased understanding of and proficiency with research practices and processes; (2) increased understanding of discipline-specific concepts and content; and (3) stronger skills in discipline-specific methods and procedures.
Secondary student outcomes included (1) expanded professional networks; (2) greater sense of belonging in the scientific community; (3) more refined career goals; and (4) stronger professional skills. During the internship, students were assigned to one of three long-term projects at the TMU Biology Field Station and conducted this research as part of a small group of students and one faculty mentor. In addition, students were required to conduct a small-scale independent-study project of their own choosing, in collaboration with a faculty mentor. For the independent-study project, students were required to conduct a literature search, write a proposal, and carry out the project within the course of their summer internship. At the conclusion of the summer, students gave an oral presentation on their group work and a poster presentation on their independent project.

| Course and field station context
In addition, student interns were required to attend a summer seminar series during which professionals presented their research and spent a day observing the students in action. Lastly, students participated in field trips and tours to laboratories at the EPA, USFW, and local governmental agencies and served as mentors for a weeklong STEM camp for high school students.
The TMU Biology Field Station is a residential field station, where students live together in houses. In addition to the residential structures, there are three laboratories, four classrooms, and a STEM Outreach Center. Students, staff, and faculty eat meals together and socialize together in both formal and informal activities throughout the summer.

| Data collection
In order to assess change (increases in perceived ability or value), the field station director used a pre/postsurvey to identify student perceptions before they began the internship and after they completed it. The survey included measures about research practices and processes, discipline-specific concepts and content, and discipline-specific methods and procedures. The survey also included measures about career goals and professional skills. The field station director also conducted mid-summer and exit interviews with each student intern to explore perceptions about their knowledge and skills gained through the program. While this assessment was created for an institutional annual report, the director also used these data to support grant applications for additional external funding and compared the findings to previous years' surveys.

| Next steps
Findings from the survey responses and interviews indicated that students in the internship program gained knowledge and skills in research practices and in discipline-specific content, methods, and procedures. Further, students reported more refined career goals and stronger professional skills, namely oral and written skills. Students in the internship also reported increased confidence in their ability to communicate about science and an expanded scientific network.
Future assessment work will consist of additional surveys and interviews with students a year later to explore how the internship experience impacted their academic work in the subsequent school year and career development. Lastly, attempts are being made to contact student interns from previous years to determine their specific career path and status.

| Development of student outcomes
During the humanities course development, UMBS staff, including the program manager and program evaluation coordinator, discussed outcomes that they wanted to explore with this particular class to include in their annual program assessment. These outcomes were informed by discussions with the faculty as well as through reviewing syllabi. The intended student outcomes included (1)

| Data collection
In order to assess change (increases in perceived ability or value), the program evaluation coordinator used a pre/postsurvey to identify student perceptions before they began the course and after they ended the course. The survey included measures about sense of place, sense of connection to larger-scale problems or issues, and ability to communicate with scientists about scientific work. The program evaluation coordinator also conducted a focus group with students in the course to explore perceptions about their value of the interdisciplinary nature of science, ability to communicate, and connections to place in more detail. Interviews with the instructor and a focus group with the TA for the course also provided insight into change in student perceptions about these topics and how these changes developed in their time taking this course at UMBS.
While this assessment was created for an annual report, the program evaluation coordinator was interested in sharing this information with the larger field education community, and so all assessment of this course (and of all courses at UMBS) had IRB approval. In addition, the program evaluation coordinator selected published measures that had been tested in college populations to include on the pre/postsurveys. The program evaluation coordinator intentionally conducted the focus groups herself because students had no interaction with her until this meeting and she was not associated with their grades or evaluation for the course.

| Next steps
Findings from the first year of survey responses and focus groups indicated that students in the course formed extremely close-knit bonds. Future assessment work will consist of interviews with students, faculty, and TA to explore how connections to others (sense of belonging in the class) impact learning and understanding of different course topics.
In addition, findings from surveys and focus groups indicated that students in the course perceived increases in the value of the interdisciplinary nature of science and increased confidence in their ability to communicate about science. Findings from faculty interviews supported student responses and also indicated that faculty had a strong interest in doing more intentional collaboration with biophysical courses in the future. After discussing all of the assessment data, UMBS staff decided to expand their assessment for the next year. Specifically, they wanted to know whether students from biophysical courses who interacted with students in the humanities course also experienced increases in perceived value of the interdisciplinary nature of science and ability to communicate about science.
The program evaluation coordinator intends to add assessment approaches to examine interactions between this course and other courses at the station. These may include observations of structured and unstructured activities with the humanities and biophysical courses, as well as additional survey questions and/or focus group questions for all students who are taking courses at UMBS.
Thus, the results of the assessment of the humanities course not only addressed whether the student outcomes were achieved in the humanities course, but also highlighted changes in the program that would happen in future iterations, and informed additional assessment of all UMBS courses in the next year.

| CONCLUSIONS
We encourage using contextual information about a UFE to iteratively inform assessment strategies and, in turn, improve the value and inclusivity of the UFE for the full spectrum of participants and stakeholders. We encourage practitioners to use the supports pro- Through a thoughtful assessment approach along with consideration of student context factors, practitioners may begin to unravel which design factors of their UFE are specifically leading to which student outcomes for which students. Future work could model which design factors lead to specific outcomes, as demonstrated by work to better understand how CURE elements influence student outcomes (Corwin et al., 2015).
We believe that the process of informed assessment and reflection will improve the accessibility and inclusivity of UFEs. Morales et al. (2020, p. 7) call for continuing a "conversation about creating student-centered field experiences that represent positive and formative experiences for all participants while removing real or imagined barriers to any student participating in field research." Explicit attention to diversity, equity, access, and inclusion regarding who gets to participate in UFEs and the learning that results from the experiences are key conversations with important implications (Carabajal et al., 2017;Demery & Pipkin, 2021;Giles et al., 2020;Morales et al., 2020;Nairn, 1999;Stokes et al., 2019;Zavaleta et al., 2020). As illustrated in Figure 2d, for example, authentically considering what it means to be accessible and inclusive is an important question, and we suggest that practitioners begin to systematically evaluate who is served by their UFE and who is not served and why, thus deeply investigating how the UFE may become more inclusive for diverse individuals. It will be necessary to work across disciplines to learn what is needed to support and advocate for accessible and inclusive UFEs such that as many students as possible can participate and have a positive experience.
The recent COVID-19 pandemic has brought to the forefront vital questions about the role of virtual field experiences (Arthurs, 2021;Swing et al., 2021), as well as aligned assessment practices. We suggest that this is one area where novel assessment tools are needed to effectively measure impact and to ask such questions as: What are the characteristics defining a virtual or remote UFE? As it relates to outcomes, what can we learn about the impacts of in-person vs. remote experiences on a student's affect, such as their sense of belonging?
Here, we meet a call from the community to aid practitioners and stakeholders in using best practices to assess, evaluate, and/or research the spectrum of UFEs. UFEs are widespread and diverse, yet unique and complex. As we consider more deeply the outcomes that are specific to UFEs, we urge practitioners to move toward evidence-based advocacy and improvement for the continued support of UFEs.

ACKNOWLEDGMENTS
This work was supported by the National Science Foundation under RCN-UBE grant #1730756 (Awarded to K.O.). We would like to thank participants of a UFERN panel and Erin Dolan for their helpful feedback on previous drafts of this paper.

CONFLICT OF INTEREST
The authors declare no competing interests.