First Advisor

Ameeta Agrawal

Term of Graduation

Winter 2026

Date of Publication

3-12-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.) in Computer Science

Department

Computer Science

Language

English

Subjects

Cross-Lingual Retrieval, Large Language Models, Long-Context Reasoning, Neurosymbolic Reasoning, Reasoning-as-Code, Verifiable Inference

Physical Description

1 online resource (ix, 144 pages)

Abstract

Large Language Models (LLMs) are increasingly deployed as general-purpose reasoners, yet their reliability degrades in three settings that frequently arise in practice: multilingual inputs, long contexts, and symbolic or formally constrained domains. In multilingual settings, uneven training coverage produces substantial performance disparities and uncertain generalization to languages with little or negligible pretraining exposure. In long-context settings, relevant evidence may be sparsely distributed, and models exhibit the "lost-in-the-middle" phenomenon, undermining retrieval and multi-step synthesis. In symbolic settings such as mathematics, small arithmetic or logical slips invalidate solutions, and prose rationales are difficult to verify automatically.

This dissertation first characterizes these failure modes empirically. We quantify multilingual performance drivers across more than 200 languages by separating languages with substantive pretraining exposure (SEEN) from those with negligible exposure (UNSEEN). Pretraining exposure and data volume dominate SEEN performance, while UNSEEN generalization is mediated by cross-lingual transfer signals such as script and language family; token similarity and country similarity further emerge as consistent predictors across classification and translation. We also evaluate long-context retrieval and reasoning over multilingual documents, showing sharp degradation as contexts grow and when key evidence is placed mid-context. Finally, we analyze mathematical reasoning failures and show that many errors arise from unverifiable intermediate steps that can be detected—and often prevented—through execution-based checks.

Building on these insights, the dissertation introduces three complementary frameworks that make reasoning more robust and checkable. CROSS improves long-context cross-lingual reasoning by retrieving a small, semantically ranked set of candidate sentences from very long multilingual documents, mitigating mid-context failures while remaining cost-efficient. NSAR strengthens multi-target reasoning by extracting symbolic facts and executing generated code, producing more interpretable and verifiable conclusions than purely neural prompting. Finally, SymCode reframes mathematical problem solving as the generation of self-contained SymPy programs with an iterative self-debugging loop, improving accuracy and token efficiency on challenging math benchmarks.
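The reasoning-as-code pattern behind SymCode can be illustrated with a minimal sketch: execute a generated, self-contained SymPy program and, on failure, loop back for a repair attempt. All names here (`solve_with_sympy`, the `answer` convention, the retry count) are illustrative assumptions, not the dissertation's actual implementation, and the LLM repair step is stubbed out.

```python
# Minimal sketch of a reasoning-as-code loop in the spirit of SymCode.
# Hypothetical names and conventions; the real system prompts an LLM
# to regenerate the program from the error message ("self-debugging").
import sympy as sp

def solve_with_sympy(program: str, max_retries: int = 2):
    """Execute a generated SymPy program; by convention the program
    assigns its result to a variable named `answer`."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            scope = {"sp": sp}
            exec(program, scope)      # run the self-contained program
            return scope["answer"]    # convention: program sets `answer`
        except Exception as err:
            # Placeholder for the LLM repair step: a real system would
            # feed `err` back to the model and retry with a revised program.
            last_error = err
    raise RuntimeError(f"all attempts failed: {last_error}")

# Example "generated" program: solve x**2 - 5x + 6 = 0 symbolically.
generated = """
x = sp.symbols('x')
answer = sp.solve(sp.Eq(x**2 - 5*x + 6, 0), x)
"""
print(solve_with_sympy(generated))
```

Because the program's output is produced by symbolic execution rather than free-form text, each intermediate step is checkable, which is the verifiability property the abstract emphasizes.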

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/44572
