First Advisor

Fei Xie

Term of Graduation

Summer 2024

Date of Publication

7-3-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.) in Computer Science

Department

Computer Science

Language

English

Physical Description

1 online resource (xv, 132 pages)

Abstract

Scripting languages, such as JavaScript and Lua, are increasingly popular. They are typically easy to learn and use, making them accessible to a wide range of developers, even those with limited programming experience. Lua, for instance, is a lightweight, efficient, and versatile scripting language. It is designed to be easy to integrate into other systems and is often used as an embedded scripting language in larger applications such as NMap, a network scanning tool.

As another example, JavaScript (JS) is a popular choice for web front-end development due to its ability to add interactivity to websites. JavaScript has also evolved into a versatile programming language used not only for front-end development but also for a wide range of server-side and client-side applications. With such popularity, there is a great demand for thorough testing of scripting language applications.

We propose to develop a holistic framework for applying concolic testing to applications in scripting languages. Concolic testing synergistically integrates concrete and symbolic execution for test generation, which alleviates the path explosion problem often encountered in symbolic execution by only exploring symbolically along a concrete execution path. Under this framework, scripts are executed concretely in their native execution engines instead of in modeled test environments. These executions are efficiently traced in OS-level virtual machines (VMs) and analyzed in a customized manner within symbolic engines, and the new test inputs generated by the symbolic engines are fed back into the concrete execution in the native environment to drive new iterations of test generation. As a result, the generated test cases reflect realistic usage to the full extent.
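The concrete-then-symbolic loop described above can be illustrated with a deliberately small sketch. It is not the dissertation's implementation: the function under test (`program`), the branch-recording helper, and the brute-force `solve` routine are all hypothetical stand-ins, and a real system would trace a native engine and hand the path constraints to a symbolic engine rather than search a small integer range.

```python
# Toy concolic loop. Assumptions: `program` is a hypothetical function under
# test, and `solve` is a brute-force stand-in for a real constraint solver.

def program(x, trace):
    """Run concretely on x, recording (predicate, outcome) for each branch."""
    def branch(pred):
        outcome = pred(x)
        trace.append((pred, outcome))
        return outcome

    if branch(lambda v: v > 10):
        if branch(lambda v: v % 7 == 0):
            return "deep"
        return "mid"
    return "shallow"


def solve(constraints, domain=range(-50, 51)):
    """Return the first value in `domain` meeting every constraint, or None."""
    for candidate in domain:
        if all(pred(candidate) == want for pred, want in constraints):
            return candidate
    return None


def concolic(seed):
    """Explore paths by negating one branch at a time along each concrete run."""
    worklist, seen_paths, results = [seed], set(), {}
    while worklist:
        x = worklist.pop()
        trace = []
        result = program(x, trace)
        path = tuple(outcome for _, outcome in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        results[path] = (x, result)
        # Keep the prefix of the path, flip branch i, and ask for a new input.
        for i, (pred, outcome) in enumerate(trace):
            new_input = solve(trace[:i] + [(pred, not outcome)])
            if new_input is not None:
                worklist.append(new_input)
    return results


if __name__ == "__main__":
    for path, (value, result) in sorted(concolic(0).items()):
        print(path, value, result)
```

Starting from a single seed input, each concrete run yields the branch decisions actually taken, and only those decisions are negated to derive the next inputs, which is what keeps the exploration tied to feasible concrete paths.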

First, we present an approach for applying concolic execution to attacking scripts written in Lua for NMap in order to automatically generate lightweight fake versions of the targeted services that can fool or slow down the attacking scripts. The behavior of the attacking scripts is captured within their native execution environments for later symbolic analysis. By doing so in an automated and scalable manner, this approach enables rapid deployment of custom honeyfarms that leverage the results of concolic execution to trick an attacker’s script into returning a result chosen by the honeyfarm, making the script unreliable for the attacker.

Second, for JavaScript, we present an approach to applying concolic testing to JS scripts in-situ, i.e., JS scripts are executed in their native environments as part of concolic execution, and the generated test cases are directly replayed in these environments. We implemented this approach in the context of Node.js, a JS runtime built on top of Chrome’s V8 JS engine, and evaluated its effectiveness and efficiency by applying it to 180 Node.js libraries with heavy use of string operations. For 85% of these libraries, it achieved statement coverage between 75% and 100%, a close match with the coverage of the hand-crafted unit test suites accompanying their NPM releases. Our approach detected numerous exceptions in these libraries. We analyzed the exception reports for 12 representative libraries and found 6 bugs, 4 of which were previously undetected. The bug reports and patches that we filed for these bugs have been accepted by the library developers on GitHub.

Third, we present a novel approach to concolic testing of front-end JavaScript web applications based on in-situ concolic testing. This approach leverages widely used JavaScript testing frameworks such as Jest and Puppeteer and conducts concolic execution on JavaScript functions in web applications for unit testing. The seamless integration of concolic testing with these testing frameworks allows the injection of symbolic variables within the native execution context of a JavaScript web function and the precise capture of concrete execution traces of the function under test. Such concise execution traces greatly improve the effectiveness and efficiency of the subsequent symbolic analysis for test generation. We have implemented our approach on both Jest and Puppeteer. Applying our Jest implementation to Metamask, one of the most popular crypto wallets, uncovered 3 bugs and 1 test suite improvement; the corresponding reports have been accepted by Metamask developers on GitHub. We also applied our Puppeteer implementation to 21 GitHub projects and detected 4 bugs.

Finally, we improved how execution traces of scripts are captured concretely and analyzed symbolically. Previously, these traces were captured through OS-level VMs at the binary level, which is time-consuming and complex. Symbolic analysis of binary traces was also less efficient than analyzing higher-level traces. We have developed a new execution tracer leveraging V8’s Sparkplug baseline compiler to improve the tracing process, and a new assembly-to-LLVM-IR translation built on the remill libraries. Together they improve the efficiency and effectiveness of the infrastructure for execution tracing and trace translation for JavaScript while keeping the native execution environments for the JS scripts under test. We evaluated effectiveness and efficiency by comparing coverage, bug detection, and time consumption with the in-situ approach on the same test set, 160 Node.js libraries that heavily utilize the String type and its operations. The results show that our approach achieves similar statement coverage on these libraries, differing by no more than 10% on average, and detects all bugs detected by the in-situ approach while using only a fraction of the time the in-situ approach needs.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/42520
