Test Building
Post Notes: All tests and optimizations can be downloaded from my GitHub repo.
- github.com/marcobeltempo/zopfli/tree/optimization_tester
- To build the source code run the
make
command from the main directory - Execute/compress file
./zopfli your_file_name
In continuation of Stage 1 – Optimizing Google’s Compression Algorithm, it was time to begin optimizing our compression application Zopfli.
Re-cap
Zopfli, a compression algorithm library written in C designed with performance over speed.
I went back to the drawing board on my original “time benchmark” and felt I needed to implement something a little bit more flexible. To start off I created the following files to store my test declarations and implementations:
zopfli/src/zopfli/functionTimer.h
zopfli/src/zopfli/functionTimer.c
Which contains the following functions:
void setTotalTime(double tt); //sums the total execution time after each funciton call
void getFunctionTimerStats(); //displays the results of the tests
void startTimer(); //starts the timer
void stopTimer(); //stops the timer
In order to keep the testing as consistent and simple as possible, I had executed my tests on two identical copies of Zopfli.
One named zopfli
(master branch) and the other zopfli_hacked
(optimization_testing branch).
I used a 10MB test file from engineerhammad.com. This site is great for downloading files of various sizes to use during your testing.
Each test was run by compressing to a new file as opposed to appending an existing .gz
. This ensures that the same routes are executing during testing.
To test my newly created tests (test inception) I wrapped a call to the ZopfliResetHash
function with startTimer()
and stopTimer()
.
Source:marcobeltempo/zopfli/blob/optimization_tester/src/zopfli/squeeze.c
if (instart == inend) return 0;
startTimer();
ZopfliResetHash(ZOPFLI_WINDOW_SIZE, h);
stopTimer();
In order to get the test results, execute:
time ./zopfli filename
Which will display every instance of the function and a summary of results upon succesfull compression of the file.
Below is the benchmark of the unmodified Zopfli program.
[mabeltempo@aarchie zopfli]$ time ./zopfli ../test1Mb.db
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 1 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 2 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 3 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 4 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 5 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 6 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 7 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 8 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 9 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 10 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 11 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 12 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 13 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 14 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 15 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 16 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 17 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 18 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 19 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 20 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 21 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 22 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 23 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 24 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 25 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 26 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 27 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 28 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 29 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 30 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 31 -[EXEC TIME#]: 0.000231 sec.
-------------RESULTS--------------
Average Execution Time: 0.000231 sec.
----------------------------------
Total Time: 0.007176 sec.
real 0m0.947s
user 0m0.926s
sys 0m0.020s
If we look at the results we can see that each function call would execute between 0.000231 - 0.000232 sec.
Total time spent executing in the program was 0.007176 sec.
I ran this multiple times and yielding fairly consistent results.
Optimizations
Now for the purpose of these test’s we’ll be focusing on the following function:
Source:marcobeltempo/zopfli/blob/master/src/zopfli/hash.c#L45-L89
/* Resets all fields of ZopfliHash. */
void ZopfliResetHash(size_t window_size, ZopfliHash* h) {
size_t i;
h->val = 0;
for (i = 0; i < 65536; i++) { h->head[i] = -1; /* -1 indicates no head so far. */
}
for (i = 0; i < window_size; i++) { h->prev[i] = i; /* If prev[j] == j, then prev[j] is uninitialized. */
h->hashval[i] = -1;
}
#ifdef ZOPFLI_HASH_SAME
for (i = 0; i < window_size; i++) { h->same[i] = 0;
}
#endif
#ifdef ZOPFLI_HASH_SAME_HASH
h->val2 = 0;
for (i = 0; i < 65536; i++) { h->head2[i] = -1;
}
for (i = 0; i < window_size; i++) { h->prev2[i] = i;
h->hashval2[i] = -1;
}
#endif
}
The purpose of this function is to ensure that the hash is in a safe state before continuing compression.
At first glance, we can pinpoint a few potential optimizations.
This loop simply loops through a static number of 65536 and resets all the elements in head
to -1
.
for (i = 0; i < 65536; i++) { h->head[i] = -1; /* -1 indicates no head so far. */
}
If we work our way down we notice that this same method is repeatedly used.
Since this value is statically declared, why not create a constant #define
?
#define HASH_SHIFT 5
#define HASH_MASK 32767
#define HASH_LENGTH 65536
Now since we’ve already confirmed that this head array is of a fixed size, is it really necessary to loop through each element?
I replaced the loop with a memset function that sets the newly defined constant size to -1
“`
memset(h->head, -1, sizeof(int) * HASH_LENGTH);
Now let's rebuild with the program using the `make` command and run our tests to see if we've gained any performance improvements.
—————-RESULTS—————–
Original Average Execution Time: 0.000231 sec.
Hacked Average Execution Time: 0.000184 sec.
Original Total Time: 0.007187 sec.
Hacked Total Time: 0.005693 sec.
Difference of : 0.001494 sec.
Optimized by: 20.79%
WOW! Just from those two minor changes, we were able to optimize execution time by 21%!
At this point, I was curious to see what would happen if applied this same change to the other loops within the ZopfliResetHash function.
To my surprise, it actually reduced performance around 3%. This could be due to duplicate calls since `memset` is called at the beginning of the function.
Another optimization candidate that I discovered was in the <a href="https://github.com/google/zopfli/blob/master/Makefile#L4-L5" rel="noopener">Makefile</a>.
By default, Zopfli was being built using the `-O2` optimization flag.
CFLAGS = -W -Wall -Wextra -ansi -pedantic -lm -O2 -Wno-unused-function
CXXFLAGS = -W -Wall -Wextra -ansi -pedantic -O2
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options. Source
Let's see what happens if we bump it up to `-O3` along with our recent changes.
CFLAGS = -W -Wall -Wextra -ansi -pedantic -lm -O3 -Wno-unused-function
CXXFLAGS = -W -Wall -Wextra -ansi -pedantic -O3
I honestly couldn't believe the results at this point...
—————-RESULTS—————–
Original Average Execution Time: 0.000231 sec.
Hacked Average Execution Time: 0.000079 sec.
Original Total Time: 0.007187 sec.
Hacked Total Time: 0.002439 sec.
Difference of : 0.004748 sec.
Optimized by: 66.06%
“`
Simply changing the compile flag along with our previous optimizations, we experience a 66% increase!
Results
Being able to analyze a project such as Zopfli and pinpoint areas for potential optimization as definitely a valuable learning experience. This gave me the opportunity to explore unfamiliar code and combine various optimization methods to enhance overall performance. I also learned that not all “enhancements” have to be huge changes. The smallest tweaks can make all the difference. Successfully optimizing Zopfli without hindering its functionality came with a sense of pride and reward. Considering there are plenty of other opportunities for optimization, I plan to continue my research on Zopfli and experiement with new ideas.