First Advisor

Ameeta Agrawal

Date of Award

Spring 6-14-2026

Document Type

Thesis

Degree Name

Bachelor of Science (B.S.) in Computer Science and University Honors

Department

Computer Science

Language

English

Subjects

Multilingual LLMs, Multilingual Reasoning, non-English Reasoning, Prompting

DOI

10.15760/honors.1857

Abstract

Multilingual LLMs reason more accurately in English than in other languages, and recent work links part of this gap to reasoning behavior: native-language traces contain fewer of the cognitive behaviors (verification, backtracking, subgoal setting, backward chaining) that support effective problem solving. We test whether prompting for these behaviors at inference time narrows the gap, using seven conditions that vary chain-of-thought, instruction and reasoning language, and cognitive-behavior descriptions, across two models and three languages. We find that English-scaffolded reasoning is the strongest single strategy on both models, closing the Hindi gap on Qwen, though the explicit scaffold's value over plain chain-of-thought is model-dependent. Beyond aggregate accuracy, no single strategy captures a model's full reasoning capability: the fraction of questions solved by at least one strategy exceeds the accuracy of the best single strategy by as much as 20-25 points on the weaker model, with a meaningful share recovered only by reasoning natively. Inference-time prompting is thus a real but partial strategy for closing the multilingual reasoning gap.
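
The comparison between "solved by at least one strategy" and "best single strategy" can be illustrated with a minimal Python sketch. This is not the thesis's evaluation code: the strategy names, the function names, and the five-question correctness table are all hypothetical, chosen only to show how the two quantities, and the share recovered only by one strategy, might be computed from per-question results.

```python
from typing import Dict, List

# Hypothetical per-question correctness for one model and one language.
# Each list has one entry per question; True means the strategy solved it.
# These values are illustrative only and are not results from the thesis.
results: Dict[str, List[bool]] = {
    "english_scaffold": [True, True, False, False, True],
    "plain_cot": [True, False, False, True, False],
    "native_reasoning": [False, False, True, False, True],
}


def single_strategy_accuracy(table: Dict[str, List[bool]]) -> Dict[str, float]:
    """Accuracy of each strategy taken on its own."""
    return {name: sum(flags) / len(flags) for name, flags in table.items()}


def union_accuracy(table: Dict[str, List[bool]]) -> float:
    """Fraction of questions solved by at least one strategy."""
    solved = [any(flags) for flags in zip(*table.values())]
    return sum(solved) / len(solved)


def solved_only_by(table: Dict[str, List[bool]], strategy: str) -> float:
    """Fraction of questions solved by `strategy` and by no other strategy."""
    target = table[strategy]
    others = [flags for name, flags in table.items() if name != strategy]
    only = [t and not any(rest) for t, *rest in zip(target, *others)]
    return sum(only) / len(only)


accuracies = single_strategy_accuracy(results)
best_single = max(accuracies.values())
union = union_accuracy(results)

# Gap between the union over strategies and the best single strategy.
gap = union - best_single
native_only = solved_only_by(results, "native_reasoning")
```

Under these assumptions, `gap` measures how much capability a single-strategy evaluation would miss, and `native_only` is the share of questions that only native-language reasoning recovers.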

Persistent Identifier

https://archives.pdx.edu/ds/psu/44791

Available for download on Tuesday, September 01, 2026
