Sophia d'Antoine — asm2vec: Binary Learning for Vulnerability Discovery

This talk will present a novel application of a machine learning model and a corresponding tool, asm2vec, for vulnerability discovery. Treating program disassembly as a natural language, we construct embeddings of identifiers at scale using a concept similar to word2vec, in which the output is a vector of related identifiers and their proximity.
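To make "related identifiers and their proximity" concrete, here is a minimal sketch of a nearest-neighbor query over embeddings. It is not asm2vec's code: the function names `cosine` and `nearest` are hypothetical, the three-dimensional vectors are invented toy values rather than trained output, and cosine similarity is assumed as the proximity measure, as is common with word2vec-style models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(embeddings, query, k=3):
    """Return the k identifiers closest to `query`, most similar first."""
    q = embeddings[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Invented toy vectors for illustration only; a real model would learn these.
toy_embeddings = {
    "memcpy": [0.9, 0.1, 0.0],
    "strcpy": [0.8, 0.2, 0.1],
    "malloc": [0.1, 0.9, 0.2],
    "free":   [0.0, 0.8, 0.3],
}

neighbors = nearest(toy_embeddings, "memcpy", k=2)
```

With trained embeddings, a query like this would list the identifiers the model considers most closely related to a given one, together with a proximity score for each.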

Identifiers in assembly vary, but for this talk they include function contexts, variables, data flow, memory cells, and operations of interest (reads and writes). Unique tokens or features are extracted from these identifiers and mapped into a co-occurrence matrix. This matrix is then used to train our model and produce embeddings. The trained model is then used to map identifiers, and their vector associations, to bug patterns or, more simply, to discover code anomalies that may be of interest.
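The token-to-matrix step can be sketched as follows. This is an illustrative assumption, not the talk's pipeline: the whitespace-and-comma tokenizer, the fixed sliding window, the helper names `tokenize` and `cooccurrence`, and the three-line listing are all hypothetical stand-ins for the richer identifier extraction described above.

```python
from collections import Counter

def tokenize(line):
    """Split one disassembly line into coarse tokens (mnemonic and operands)."""
    return line.replace(",", " ").split()

def cooccurrence(token_stream, window=2):
    """Count unordered token pairs that occur within `window` positions."""
    counts = Counter()
    for i, left in enumerate(token_stream):
        for right in token_stream[i + 1 : i + 1 + window]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) share one matrix cell.
                counts[tuple(sorted((left, right)))] += 1
    return counts

# Hypothetical listing: load a stack variable, increment it, store it back.
listing = [
    "mov rax, [rbp-8]",
    "add rax, 1",
    "mov [rbp-8], rax",
]

tokens = [tok for line in listing for tok in tokenize(line)]
counts = cooccurrence(tokens, window=2)
```

The resulting `Counter` is a sparse form of the co-occurrence matrix, where each key is a pair of tokens and each value is how often the pair appeared close together. A sparse representation is a natural choice here because most token pairs never co-occur.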

This work builds on Facebook’s StarSpace project, as well as TensorFlow’s Swivel for calculating the co-occurrence matrix. (Source: Jailbreak Brewing Company)

