Executable Code Golf: Making Tiny Binaries for Constrained Systems

C2 | Thu 24 Jan | 4:45 p.m.–5:30 p.m.


Presented by

  • Nathan Egge

    Dr. Nathan E. Egge is a Senior Research Engineer at Mozilla and a member of the non-profit Xiph.Org foundation. Nathan works on video compression research with the goal of producing best-in-class, royalty-free open standards for media on the Internet. He is a co-author of the AV1 video format from the Alliance for Open Media and contributed to the Daala project before that. Nathan received dual BS and MS degrees from Virginia Tech in 1999 and 2001 respectively, in both Computer and Mathematics, and received his Ph.D. in Computer Science from George Mason University in 2014.

Abstract

Ever had to rewrite an algorithm or data structure to keep your binary within a certain size? Then you've played executable code golf! For whole programs, a common practice is to use an executable packer like UPX. However as your binary size gets smaller, the overhead of the decoding stub becomes significant and the compression efficiency of the LZ-based algorithms go down. Better compression can be had using a compressing linker like Crinkler, but this closed source project only supports 32-bit Windows targets. For Linux targets or 64-bit / non-x86 / obsolete hardware there is no good alternative. The XLINK project is an open-source compressing linker implementing a PAQ-based compression algorithm similar to that in Crinkler. It currently targets older DOS embedded platforms where both disk space and memory are limited and the notoriously long decode times of Crinkler are unacceptable. An experimental 32-bit ELF target is in progress which will bring the same executable compression tools to Linux for use in embedded or IOT applications. This talk will describe the PAQ compression algorithm in detail and give a short overview of the assembly code for decompression. It will provide an overview of how XLINK works and is able to trade-off start-up time for compression by varying the decoder algorithm, including the use of multiple entropy segments and an alternate lower complexity hashing function. You will learn why applying compression while linking outperforms even the best post-link-stage executable compressors and what extra size-optimization tricks are available at link-time. Finally, we will look at the open research problems in executable compression and other future work like individual function compression for patching existing binaries.