Presented at the Spring 2016 SERENE-RISC Workshop.
Assembly code analysis is one of the critical processes for mitigating the exponentially increasing threats from malicious software. It is also a common practice for detecting and justifying software plagiarism and software patent infringements when the source code is unavailable. However, it is a manually intensive and time-consuming process even for experienced reverse engineers. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process, since it can identify the cloned parts that have been previously analyzed. By closely collaborating with reverse engineers and Defence Research and Development Canada (DRDC), we have implemented an award winning assembly clone search engine called Kam1n0. It is the first clone search engine that can efficiently identify the given query assembly function’s subgraph clones from a large assembly code repository. Kam1n0 is built upon the Apache Spark computation framework and Cassandra-like key-value distributed storage. Extensive experimental results suggest that Kam1n0 is accurate, efficient, and scalable for handling large volume of assembly code. We will give a live demonstration of Kam1n0 in the SERENE-RISC workshop.