Mina and Twisted Matrix benchmarks
2007-08-08 17:03:02 by Fabio Forno
Please also read the follow-up to see the updated results with a better Mina setup.
Recently Alessandro, one of my business partners, suggested to me an interesting Java project for massive scalability with respect to the number of network connections: Mina.
As you may imagine, this is one of the crucial aspects of real-time communication and presence-based services: XMPP, and any similar protocol, requires a large number of simultaneous long-lived connections with very little traffic on average. Most network applications, instead, have been built on the thread pool concept, that is, a limited number of threads serving short-lived connections (the HTTP model).
With IM and, more recently, with Comet, we had to radically change the approach, and most application containers based on thread pools couldn't deal with it. Hence the importance of frameworks like Mina, which can exploit java.nio to handle thousands of connections within a single thread.
So we decided to benchmark it against Twisted Matrix, the framework we are currently using. The test is very simple. We have written:
- a server accepting any number of connections and logging to the console every message received and the latency between two consecutive messages
- a client opening n connections, which picks a random connection every 5ms and sends a short message identifying the client
With this setup we're pretty sure that any additional CPU time depends on the number of connections and not on the computational load of the applications (we just send one message every 5ms).
The test machine has an Intel Centrino at 1.8 GHz, 1.2GB of RAM, and Ubuntu Feisty with kernel 2.6.20-16-386.
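To make the setup concrete, here is a minimal sketch of the same idea using only the Python standard library. It is not the code used for the results below (that was written with Mina and Twisted); the function names `serve` and `run_benchmark`, the message payload, and the default parameters are all assumptions chosen for illustration.

```python
import random
import selectors
import socket
import threading
import time


def serve(listener, stop, gaps):
    """Accept any number of connections and record the gap between consecutive messages."""
    sel = selectors.DefaultSelector()  # picks epoll on Linux when available
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    last = None
    while not stop.is_set():
        for key, _ in sel.select(timeout=0.05):
            sock = key.fileobj
            if sock is listener:
                conn, _ = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                continue
            data = sock.recv(1024)
            if not data:  # peer closed the connection
                sel.unregister(sock)
                sock.close()
                continue
            now = time.perf_counter()
            if last is not None:
                gaps.append(now - last)
            last = now
    # Close the accepted connections; the caller owns the listener.
    for key in list(sel.get_map().values()):
        if key.fileobj is not listener:
            key.fileobj.close()
    sel.close()


def run_benchmark(n_connections=50, n_messages=100, interval=0.005):
    """Open n connections, send on a random one every `interval` seconds, return the gaps."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    stop, gaps = threading.Event(), []
    server = threading.Thread(target=serve, args=(listener, stop, gaps))
    server.start()
    conns = [socket.create_connection(listener.getsockname())
             for _ in range(n_connections)]
    for _ in range(n_messages):
        random.choice(conns).sendall(b"client-1\n")
        time.sleep(interval)
    time.sleep(0.1)  # give the server time to drain the last messages
    stop.set()
    server.join()
    for conn in conns:
        conn.close()
    listener.close()
    return gaps


if __name__ == "__main__":
    gaps = run_benchmark()
    print("gap between messages: %.1f-%.1f ms"
          % (min(gaps) * 1000, max(gaps) * 1000))
```

Since the client sleeps for the interval between sends, the measured gap cannot drop much below 5ms in this sketch; what matters is how far it drifts above that as the number of connections grows.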
Java version: Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode, sharing)
Twisted setup: Python 2.5 and Twisted 2.5 using the epoll reactor.
And now the results for Mina:
| #connections | Server Load | Client Load | Latency |
| 100 | <1% | <1% | 6-8ms |
| 1000 | <1% | <1% | 6-8ms |
| 2500 | 32% | 7% | 6-10ms |
| 5000 | 60% | 25% | 5-15ms |
| 10000 | 55% | 35% | 5-20ms |
| 15000 | 90% | 5% | 8-15ms |
and for Twisted:
| #connections | Server Load | Client Load | Latency |
| 100 | <1% | <1% | 0.1-3ms |
| 1000 | <1% | <1% | 0.1-3ms |
| 10000 | <1% | <1% | 0.1-3ms |
| 15000 | <1% | <1% | 0.1-4ms |
Mina's figures are not bad, but Twisted's are really impressive: absolutely no impact from the number of connections! I think the main reason is the adoption of epoll in Twisted. I don't know which select implementation java.nio uses on Linux, but I think it's not epoll; in fact, 90% of the time spent by the Mina client and server was system time.
Update: I've added the code used in the tests so that you can try it and suggest improvements:
Just discovered that Java now supports epoll, so Mina could too...
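The reasoning behind the epoll guess is that poll() has to scan every registered descriptor on each call, while epoll only reports the ones that are ready. A rough way to see this is the sketch below, which registers many idle sockets plus one readable socket and times repeated polling. The helper `time_selector` and its parameters are assumptions made for this illustration, and the numbers it prints will vary by machine.

```python
import selectors
import socket
import time


def time_selector(selector_cls, n_idle=200, rounds=200):
    """Return (ready_count, seconds_per_select) with n_idle idle sockets registered."""
    sel = selector_cls()
    pairs = [socket.socketpair() for _ in range(n_idle + 1)]
    for a, _ in pairs:
        sel.register(a, selectors.EVENT_READ)
    pairs[0][1].send(b"x")  # make exactly one registered socket readable
    sel.select(timeout=1)   # warm-up: wait until the byte is visible
    start = time.perf_counter()
    for _ in range(rounds):
        ready = sel.select(timeout=0)  # level-triggered: still ready each round
    elapsed = time.perf_counter() - start
    sel.close()
    for a, b in pairs:
        a.close()
        b.close()
    return len(ready), elapsed / rounds


if __name__ == "__main__":
    # Not every platform provides every selector class, so look them up defensively.
    for name in ("PollSelector", "EpollSelector"):
        cls = getattr(selectors, name, None)
        if cls is not None:
            ready, per_call = time_selector(cls)
            print("%s: %d ready, %.1f us per select()"
                  % (name, ready, per_call * 1e6))
```

Keep `n_idle` below the per-process file descriptor limit (often 1024 by default), since each socket pair uses two descriptors.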
epoll shouldn't matter with 100 connections, yet in that case Twisted is still much faster as far as latency goes (6-8ms vs. 0.1-3ms)...
Sure, Mina pays for some of the inefficiencies of the JVM, which is much heavier than Python. However, business logic implemented in Java could be much more efficient than in Python. For example, for XMPP (which is the reason we are doing these tests) we also need an XML parser, and I bet that any Java parser outperforms the parser in Twisted. Finally, there's the problem of the global interpreter lock, which prevents Python from taking advantage of multicore architectures...
Java certainly is faster than Python; I'm surprised the Mina code is slower. I'd expect java.nio code to be just as fast as the Twisted reactor using the same polling method (poll(), probably). As far as XML parsers go, most Python XML parsers are implemented in C; I forget if the one in Twisted's Jabber support is, but if not, it ought to be (and coding that may be easier than switching to Java). I have suggested to the maintainer that the code be switched to use lxml, which is built on the very fast libxml2 library (and would thus also have a fast XPath implementation, in addition to a fast parser). As far as scaling to multiple processors goes, Twisted's approach is typically to distribute work across multiple processes. This also allows scaling across multiple machines. However, you can still use threads in Python and release the global interpreter lock when running C code. In a Jabber server the expensive C code would be XML parsing, so that could be passed to a thread pool and take advantage of multicore machines.
In fact, the idea I have of a highly scalable XMPP server is based on small cooperating processes ;) The only centralized part is a lookup service telling the other processes who is handling a particular session. This task is very light, so it can easily scale, and with some caching and some heuristics (e.g. migrating the sessions of users who share the same roster into the same session manager) such a server could handle any number of clients.
The XML parser of Twisted is based on expat, which is very fast, but the partial DOM of stanzas is built in Python, which slows everything down. I don't remember the exact figures, but I once did some testing and you can parse just a few hundred messages per second per processor, which is at least one order of magnitude slower than Java or C based parsers. However, there is one feature of the present parser that I would like to keep: the simplified DOM interface it uses is much more convenient than DOM. Accessing child elements with dot notation and attributes as dictionary items is so handy that I wouldn't go back.
Re-run with JDK 6; it uses epoll by default if you have a 2.6 kernel, otherwise it uses poll: http://java.sun.com/javase/6/docs/technotes/guides/io/enhancements.html
Thanks, just posted a follow-up.
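To illustrate the kind of interface meant here, the sketch below builds a tiny element tree on top of expat, with child elements reachable through dot notation and XML attributes through dictionary-style access. It is not Twisted's implementation: the `Element` class and the `parse` function are hypothetical names written for this example, and details such as namespaces and incremental parsing are left out.

```python
import xml.parsers.expat


class Element:
    """Hypothetical minimal element: children via dot notation, attributes via [key]."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []
        self.text = ""

    def __getitem__(self, key):
        return self.attrs[key]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the instance
        # attributes set in __init__ take precedence over child elements.
        for child in self.__dict__.get("children", []):
            if child.name == name:
                return child
        raise AttributeError(name)


def parse(xml_text):
    """Parse a complete stanza with expat and return its root Element."""
    stack, roots = [], []

    def start(name, attrs):
        elem = Element(name, attrs)
        if stack:
            stack[-1].children.append(elem)
        stack.append(elem)

    def end(name):
        elem = stack.pop()
        if not stack:
            roots.append(elem)

    def chars(data):
        if stack:
            stack[-1].text += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return roots[0]


if __name__ == "__main__":
    msg = parse("<message to='a@b' type='chat'><body>hi</body></message>")
    print(msg["to"], msg.body.text)
```

The three handler functions run in Python for every element and text node, which is where the per-stanza overhead described above would come from even though expat itself is written in C.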
You should be using time.clock() to perform Python benchmarks, for more accuracy.