Event
Ph.D. Dissertation Defense: Kapil Anand
Tuesday, July 23, 2013
1:00 p.m.
2168, AV Williams Building
Maria Hoo
301 405 3681
mch@umd.edu
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Kapil Anand
Committee:
Professor Rajeev Barua, Chair/Advisor
Professor Shuvra Bhattacharya
Professor Manoj Franklin
Professor Angelos Keromytis
Professor Alan Sussman, Dean's representative
Date/Time: Tuesday, July 23, 2013 at 1 pm
Location: 2168, AV Williams Building
Title: A COMPILER LEVEL INTERMEDIATE REPRESENTATION BASED BINARY ANALYSIS SYSTEM AND ITS APPLICATIONS
Abstract:
"There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations. The vision of this dissertation to advance the field of static analysis of executables and to bridge the gap between source-level analysis and executable analysis.
In spite of a significant overlap in the overall goals of several source-code methods and executable-level techniques, several analyses that are well-understood and implemented in source-level infrastructures have yet to become available in executable frameworks. One of the prime reasons behind the limitations of existing executable frameworks is that current executable frameworks define their own intermediate representations (IR) which are significantly more constrained than an IR used in a compiler. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable.
In the first part of this dissertation, we present techniques to convert the binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We have integrated our techniques with a prototype x86 binary framework called SecondWrite that uses LLVM as IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, including several real world programs.
In the next part of this work, we demonstrate that several well-known source-level analysis frameworks such as symbolic analysis have limited effectiveness in the executable domain since executables typically lack higher-level semantics. Our first work of recovering a compiler-level representation addresses this limitation by recovering several higher-level semantics information from executables. In the next part of this work, we propose methods to handle the scenarios when such semantics cannot be recovered.
First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Frameworks hitherto fail to simultaneously maintain the properties of correct representation and precise memory model and ignore memory-allocated variables while defining symbolic analysis mechanisms. We exemplify that our framework is robust, efficient and it significantly improves the performance of various traditional analyses like global value numbering and alias analysis for executables.
Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a static analysis framework. DemandFlow, for identifying information flow security violations in executables. DemandFlow employs a demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses which are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular ftp and internet relay chat clients.
Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache - this feature is called cache locking. We employ instruction cache locking as a mechanism for improving the average-case run-time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at the time of deployment when all the details of the memory hierarchy are availaible."
