Published In

Physical Review Physics Education Research

Document Type

Article

Publication Date

1-1-2025

Abstract

[This paper is part of the Focused Collection in Artificial Intelligence Tools in Physics Teaching and Physics Education Research.] We present a study in which a version of a common conservation of mechanical energy introductory physics problem, an object released on an inclined plane, is given to OpenAI’s GPT-4 large language model (LLM). We investigate how different permutations of the object, the action verb, and a property of the incline affect the LLM’s responses. The problem setup and prompting were left purposefully minimal, requiring the LLM to state multiple assumptions to justify its final answer. We specifically studied which keywords lead the LLM to analyze the system as rolling versus sliding, and how this behavior may differ from that of physics experts and novice learners. We found that domain-specific terminology may affect the LLM differently than it affects students. Even when its answers were correct, the LLM generally did not state the assumptions required to arrive at the solution, falling short of what would be expected from an expert instructor. When conflicting information was provided, the LLM generally did not flag the conflict in its responses. Both weaknesses could be remedied by additional prompting; nevertheless, they remain shortcomings in the context of physics teaching. While specific to introductory physics, this study provides insight into how LLMs respond to variations of a problem within a specific topic area and how their strengths and weaknesses may differ from those of humans. Understanding these differences, and tracking them as LLMs’ capabilities change, is crucial for assessing the impact of artificial intelligence on education.

DOI

10.1103/PhysRevPhysEducRes.21.010153

Included in

Physics Commons
