Ph.D. Dissertation Defense - Ali Ahad

Tuesday, April 21, 2026
1:30 p.m.
IRB-4109

ANNOUNCEMENT: Ph.D. Dissertation Defense
 
Name: Ali Ahad
 
Committee:
Professor Yonghwi Kwon (Chair)
Professor Ang Li
Professor Dana Dachman-Soled
Professor Tudor Dumitraș
Professor Michelle Mazurek (Dean's Representative)

Date/Time: Tuesday, April 21, 2026 at 1:30 PM - 3:30 PM

Location: IRB-4109

Title: Improving Binary Decompilation: From Forensically Equivalent Transformations to LLM-Driven Decompilation

Abstract: Reverse-engineering malware is a crucial capability for investigating the details of cyberattacks. In particular, a decompiler is a highly desirable reverse-engineering tool that can translate a malware binary into a human-readable source code representation. Despite decades of research, decompilers remain unreliable: they either fail outright on certain inputs or, worse, silently produce incorrect output that misleads downstream analysis. This dissertation improves binary decompilation reliability and quality through two complementary, decompiler-agnostic techniques that operate before and after the decompilation process, respectively.

First, we address decompilation of Python binaries, which have become a prevalent malware vector. We systematically analyze over 2,000 decompilation failures across five Python decompilers and identify four root causes: missing parsing rules, conflicting parsing rules, unsupported instructions, and implementation bugs. To address these failures without modifying any decompiler, we propose Forensically Equivalent Transformation (FET), a new class of binary transformation that relaxes the semantics-preserving constraint to enable decompilation while retaining sufficient semantics for forensic analysis. We implement FET in Pyfet, a system that transforms error-inducing Python binaries into decompilable forms. We are also the first to systematically detect and repair implicit errors, that is, cases where a decompiler silently produces logically incorrect code. We evaluate Pyfet on 17,117 real-world Python malware samples, identifying and fixing all 77,022 decompilation errors across five decompilers.

Second, we tackle decompilation of ARM binaries, where both traditional and LLM-based decompilers suffer from structural errors: traditional tools produce convoluted control flows, while LLM-based tools hallucinate incorrect structures. We present DeARM, a structure-aware decompilation framework that decomposes binaries into structural slices and applies structure-specialized LLMs (for loops, conditionals, and switches). DeARM then iteratively repairs the decompiled output through a CFG-guided structural repair pipeline that enforces topological faithfulness with the original binary, complemented by a dataflow validation and repair stage that maps variables across decompiled outputs and corrects inconsistencies against the ground-truth assembly using dataflow graph (DFG) similarity. We evaluate DeARM on 3,816 functions from real-world and malware datasets against Ghidra, Nova, and LLM4Decompile, achieving 89.9% CFG similarity and 79.3% DFG similarity overall. We further show that DeARM’s repair pipeline serves as a universal, model-agnostic refinement layer, improving CFG similarity by 10.17% on average when applied to four external decompilers.

Audience: Graduate, Faculty
