Optimizing Google’s Compression Algorithm Zopfli (Stage 1)

The final project for my software portability and optimization class consists of two stages. The first stage involves finding an open source software package that includes hashing functionality written in a language that compiles to machine code (C, C++, Assembler).

Still learning the ropes of the open source community, I figured this was a great opportunity to use my experience from past projects to my advantage. Boy, do I wish it were that easy. The difference this time around was that we weren't necessarily looking to tackle an existing bug in a particular project; it was up to us to dig through source code looking for some sort of hashing functionality.

I figured the best place to start searching was GitHub's Trending page for popular C/C++ projects. Although there are hundreds of great projects written in C/C++, I had almost forgotten how overwhelming new source code can be. It's like entering a foreign land where you speak the language, but a different dialect.

Projects can contain hundreds of thousands of lines of code, and understanding their entire functionality is next to impossible. Searching for file names on GitHub is relatively simple, but this also returns third-party libraries and dependencies. I started downloading various packages to perform a project-wide search for any reference to hashing algorithms.

I spent hours digging through endless repos and directories trying to find a great function I could optimize for this assignment. Then I stumbled upon this project from Google and instantly knew it would be a great fit.

The project is named Zopfli, a compression library written in C that favors compression density over speed. Zopfli compresses to the deflate, gzip, and zlib output formats, but it won't decompress; existing gzip or deflate libraries can decompress the data.

What's great about this project is that it's not overly large, and there's a variety of hashing functionality to explore.

For this project I will be focusing on the ZopfliResetHash function, which resets the hash state (the head, prev, and hashval chains) before a new block is processed.


void ZopfliResetHash(size_t window_size, ZopfliHash* h) {
  size_t i;
  h->val = 0;
  for (i = 0; i < 65536; i++) {
    h->head[i] = -1;  /* -1 indicates no head so far. */
  }
  for (i = 0; i < window_size; i++) {
    h->prev[i] = i;  /* If prev[j] == j, then prev[j] is uninitialized. */
    h->hashval[i] = -1;
  }
  for (i = 0; i < window_size; i++) {
    h->same[i] = 0;
  }
  h->val2 = 0;
  for (i = 0; i < 65536; i++) {
    h->head2[i] = -1;
  }
  for (i = 0; i < window_size; i++) {
    h->prev2[i] = i;
    h->hashval2[i] = -1;
  }
}


In terms of performance benchmarking for Zopfli, I plan on isolating a few key features. I'll be focusing on restructuring the ZopfliResetHash function for better optimization at build time and during execution.

As of right now, running the time make command yields the following results:

  • real: Elapsed real (wall clock) time used by the process, in seconds.
  • user: Total number of CPU-seconds that the process used directly (in user mode), in seconds.
  • sys: Total number of CPU-seconds used by the system on behalf of the process (in kernel mode), in seconds.

real 0m2.613s
user 0m2.322s
sys 0m0.286s

To isolate the function from the rest of the program, I've wrapped it in a simple timing routine that measures its execution:

clock_t start, end;
start = clock();
ZopfliResetHash(window_size, h);
end = clock();
printf("Number of seconds: %f\n", (end - start) / (double)CLOCKS_PER_SEC);

Wrapping ZopfliResetHash this way yields a consistent result of:

Number of seconds: 0.00231


After taking some time to analyze the functionality of the hash.c file, I've started to assemble a strategy to optimize the hash functionality. I will work on adding altered build options to the makefile and improving the loop structure to trigger compiler optimizations such as auto-vectorization.

Stay tuned for updates as I progress through this project.

Update 1/8/2018

Optimizing Google's Compression Algorithm Zopfli (Stage 2)

