First Advisor
Ameeta Agrawal
Date of Award
Spring 6-14-2026
Document Type
Thesis
Degree Name
Bachelor of Science (B.S.) in Computer Science and University Honors
Department
Computer Science
Language
English
Subjects
Multilingual LLMs, Multilingual Reasoning, non-English Reasoning, Prompting
DOI
10.15760/honors.1857
Abstract
Multilingual LLMs reason more accurately in English than in other languages, and recent work links part of this gap to reasoning behavior: native-language traces contain fewer of the cognitive behaviors (verification, backtracking, subgoal setting, backward chaining) that support effective problem solving. We test whether prompting for these behaviors at inference time narrows the gap. We use seven conditions that vary chain-of-thought, instruction and reasoning language, and cognitive-behavior descriptions, on two models and three languages. We find that English-scaffolded reasoning is the strongest single strategy on both models and closes the Hindi gap on Qwen, though the explicit scaffold's value over plain chain-of-thought is model-dependent. Beyond aggregate accuracy, no single strategy captures a model's full reasoning capability. On the weaker model, the fraction of questions solved by at least one strategy exceeds the best single strategy by up to 20 to 25 percentage points, and a meaningful share of questions is recovered only by reasoning natively. Inference-time prompting is thus a real but partial strategy for closing the multilingual reasoning gap.
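The coverage comparison in the abstract can be illustrated with a minimal sketch. It compares the best single strategy's accuracy with the fraction of questions solved by at least one strategy, and it counts questions solved only by a chosen subset of strategies. The data layout (a mapping from strategy name to per-question correctness), the function names, and the strategy names are all hypothetical. The toy values are made up for illustration and are not results from the thesis.

```python
from typing import Dict, List, Set

# Hypothetical layout: strategy name -> per-question correctness (True = solved).
# Toy values for illustration only; NOT results from the thesis.
TOY_RESULTS: Dict[str, List[bool]] = {
    "native_cot":       [True, False, False, True],
    "english_scaffold": [True, True, False, False],
    "direct_answer":    [False, False, True, False],
}


def best_single_accuracy(results: Dict[str, List[bool]]) -> float:
    """Accuracy of the strongest individual strategy."""
    return max(sum(col) / len(col) for col in results.values())


def coverage(results: Dict[str, List[bool]]) -> float:
    """Fraction of questions solved by at least one strategy."""
    columns = list(results.values())
    n = len(columns[0])
    solved = sum(1 for i in range(n) if any(col[i] for col in columns))
    return solved / n


def coverage_gap_points(results: Dict[str, List[bool]]) -> float:
    """How far coverage exceeds the best single strategy, in percentage points."""
    return 100 * (coverage(results) - best_single_accuracy(results))


def solved_only_by(results: Dict[str, List[bool]], subset: Set[str]) -> float:
    """Fraction of questions solved by some strategy in `subset` and by none outside it."""
    n = len(next(iter(results.values())))
    count = 0
    for i in range(n):
        inside = any(results[s][i] for s in subset)
        outside = any(col[i] for name, col in results.items() if name not in subset)
        if inside and not outside:
            count += 1
    return count / n


if __name__ == "__main__":
    print("best single:", best_single_accuracy(TOY_RESULTS))
    print("coverage:", coverage(TOY_RESULTS))
    print("gap (points):", coverage_gap_points(TOY_RESULTS))
    print("native-only share:", solved_only_by(TOY_RESULTS, {"native_cot"}))
```

On this toy data, two strategies tie at 0.5 accuracy, but every question is solved by some strategy, so coverage is 1.0. One of the four questions is solved only by the hypothetical native-language strategy. This is the kind of complementarity that a single aggregate accuracy number hides.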
Persistent Identifier
https://archives.pdx.edu/ds/psu/44791
Recommended Citation
Mistry, Harshiv, "Beyond the Best Prompt: A Coverage View of Multilingual Reasoning" (2026). University Honors Theses. Paper 1820.
https://doi.org/10.15760/honors.1857