A brief history of speed
During the development of PyAMF, three things were (and still are!) highly valued: compatibility (primarily with the Flash Player), stability and features. We now have a pretty substantial test suite that ensures we don't break anything from commit to commit (and if we do, the buildbots will let us know).
One thing that was not given a whole lot of thought during all this was performance; everyone knows the danger of premature optimisation. With the release of PyAMF 0.6 (currently still in beta), I want to show the performance improvements that have been made since the release of 0.4.
PyAMF provides pure Python AMF codecs as well as an optional C extension. The pure Python version allows PyAMF to run on App Engine, Jython and PyPy, but it runs 100% within the interpreter. This means all byte packing (AMF is a binary format), reference checking (AMF supports object references) and type checking are done relatively inefficiently (when compared to C). If you are using CPython then you will be able to take advantage of CPyAMF, a Cython package developed specifically to relieve the interpreter of these responsibilities.
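To give a feel for what "byte packing" means in pure Python, here is a minimal sketch that encodes two AMF0 values with the standard `struct` module. This is an illustration only and not PyAMF's actual code: the function names are made up for this example, and only the AMF0 number and short string types are covered.

```python
import struct

AMF0_NUMBER = 0x00  # AMF0 type marker for a number
AMF0_STRING = 0x02  # AMF0 type marker for a short string


def encode_number(value):
    """Marker byte followed by an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">Bd", AMF0_NUMBER, float(value))


def encode_string(value):
    """Marker byte, 16-bit big-endian byte length, then the UTF-8 bytes."""
    data = value.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("too long for an AMF0 short string")
    return struct.pack(">BH", AMF0_STRING, len(data)) + data


def decode_number(payload):
    """Reverse of encode_number; checks the marker before trusting the value."""
    marker, value = struct.unpack(">Bd", payload[:9])
    if marker != AMF0_NUMBER:
        raise ValueError("not an AMF0 number")
    return value
```

Every value in a payload goes through a handful of interpreted function calls like these, which is where a C implementation can save a lot of time.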
CPyAMF was initially introduced in 0.4 and has been expanded in each new major release.
- Provided a C version of pyamf.util.BufferedByteStream which handles all the byte un/packing. It used the C equivalent of cStringIO.
- Reworked cpyamf.util.BufferedByteStream away from cStringIO - it was not fast enough for our needs. Reference checking is now also done in C.
- The final part - all type checking is now done in C. The Python interpreter is almost never (directly) invoked.
AMF allows a wide variety of object graphs to be serialised. Different types of object graphs will stress different areas of the library, for example an object graph that contains a lot of references will stress the reference checking code.
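As a rough picture of what reference checking involves, the sketch below keeps a table of objects that have already been seen, keyed by `id()`, and counts how often a container turns up again while walking a graph. The class and function names are hypothetical and the approach is a simplification chosen for this example; it is not how PyAMF or CPyAMF implement it.

```python
class ReferenceTable:
    """Maps already-seen objects to the index they were first seen at."""

    def __init__(self):
        self._index_by_id = {}
        self._objects = []  # keeps objects alive so id() values stay unique

    def get_or_add(self, obj):
        """Return (index, seen_before) for obj."""
        key = id(obj)
        if key in self._index_by_id:
            return self._index_by_id[key], True
        index = len(self._objects)
        self._index_by_id[key] = index
        self._objects.append(obj)
        return index, False


def count_references(graph):
    """Walk nested lists/dicts and count how often a container repeats."""
    table = ReferenceTable()
    repeats = 0
    stack = [graph]
    while stack:
        node = stack.pop()
        if not isinstance(node, (list, dict)):
            continue  # scalars are not tracked in this sketch
        _, seen = table.get_or_add(node)
        if seen:
            repeats += 1
            continue  # do not descend again; this also stops cycles
        children = node.values() if isinstance(node, dict) else node
        stack.extend(children)
    return repeats
```

An encoder would write a short back-reference instead of the full object whenever `seen_before` is true, so a graph with many shared objects spends most of its time in this lookup.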
To that end, a benchmarking tool called AMFBench has been developed that uses four different types of object graphs to stress all parts of the library (the details of which are probably best kept for another blog post - this one is long enough already!). The only important 'builder' to highlight is called 'complex', which attempts to mimic a real world scenario - that is, a mix of scalar and complex types with a couple of references thrown in for good measure.
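For readers who want to try something similar, here is a small sketch of the general idea: build a graph that mixes scalars, containers and a shared reference, then time an encode/decode round trip. It is not AMFBench. The `build_complex` and `time_round_trip` names are invented for this example, and `pickle` is used purely as a stand-in codec so that the snippet runs with only the standard library; swap in whichever encode and decode functions you actually want to measure.

```python
import pickle
import time


def build_complex(size):
    """Hypothetical builder: scalars, nested containers and a shared object."""
    shared = {"label": "shared", "values": [1, 2.5, "three"]}
    return [
        {"id": i, "name": "item-%d" % i, "active": i % 2 == 0, "ref": shared}
        for i in range(size)
    ]


def time_round_trip(encode, decode, graph):
    """Return (encode_seconds, decode_seconds, decoded), rounded to 6 places."""
    start = time.perf_counter()
    payload = encode(graph)
    encoded_at = time.perf_counter()
    decoded = decode(payload)
    finished = time.perf_counter()
    return round(encoded_at - start, 6), round(finished - encoded_at, 6), decoded


if __name__ == "__main__":
    graph = build_complex(10000)
    # pickle is only a placeholder codec for this sketch
    enc, dec, _ = time_round_trip(pickle.dumps, pickle.loads, graph)
    print("encode: %.6f  decode: %.6f" % (enc, dec))
```

A single run like this is noisy, so repeat it a few times and take the best figure before comparing versions.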
All runs used the 'complex' builder with a graph size of 10000, on the same machine (MacBook Pro 2.8GHz Intel Core 2 Duo, 4GB, Mac OS X 10.6.4), running the same version of Python:
    Python 2.5.4 (r254:67916, Feb 11 2010, 00:50:55)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
All timings are in seconds, limited to 6 decimal places. 'pure' means PyAMF is running in pure Python mode (no C extensions enabled); 'ext' means PyAMF is using its bundled C extensions.
0.6 is the best performing release thus far (even if you're not using the C extensions)! 0.4.2 performance was atrocious - if you're still using it this should be a compelling reason for you to upgrade!
Interestingly, the 0.4.2 timings for the pure Python and C versions are almost the same, making cpyamf virtually useless for this version :D Since both use cStringIO under the hood, this is not surprising.
Whilst it makes sense that decoding would take longer than encoding, the discrepancy between the two seems large and warrants further investigation.
Some things to take away:
- You'll get better performance if you use AMF3. 0.6 now uses this version by default.
- Make sure you are using the C extensions! There is now around an order of magnitude difference between the pure Python and extension timings.
I would like to take this opportunity to say a huge "thank you" to the developers of the Cython project, without which CPyAMF might never have existed. Being able to stand on the shoulders of giants has allowed me to concentrate on the issues at hand, rather than battling with the Python C-API or handling reference counting correctly. It has been a lot of fun to work with.