Mina and Twisted Matrix benchmarks
2007-08-08 17:03:02 by Fabio Forno
Please also read the follow-up to see the updated results with a better Mina setup.
Recently Alessandro, one of my business partners, suggested an interesting Java project to me, designed for massive scalability in the number of network connections: Mina.
As you may imagine, this is one of the crucial aspects of real-time communication and presence-based services: XMPP, like any similar protocol, requires a large number of simultaneous, long-lived connections carrying very little traffic on average. Most network applications, instead, have been built around the thread pool concept, that is, a limited number of threads serving short-lived connections (the HTTP model).
With IM and, more recently, with Comet, we had to change the approach radically, and most application containers based on a thread pool couldn't cope. Hence the importance of frameworks like Mina, which can exploit java.nio to handle thousands of connections within a single thread.
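The core idea, one thread watching many mostly idle connections and servicing only those that actually have data, can be sketched with Python's standard `selectors` module. This is an illustrative sketch, not Mina's or Twisted's code: `serve_once` and `demo` are hypothetical names, and local socket pairs stand in for real network clients.

```python
import selectors
import socket


def serve_once(selector, timeout=0.1):
    """Run one pass of the event loop: read from every connection that is ready.

    All connections are serviced by the calling thread; idle ones cost nothing
    here because the selector only reports sockets that have data waiting.
    """
    received = []
    for key, _events in selector.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)  # won't block: the selector reported it readable
        if data:
            received.append(data)
        else:  # an empty read means the peer closed the connection
            selector.unregister(conn)
            conn.close()
    return received


def demo(n_connections=100):
    """Open many connections, let only a few of them talk, serve them in one thread."""
    # DefaultSelector picks the most efficient mechanism available (epoll on Linux).
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_connections)]
    try:
        for server_side, _client_side in pairs:
            server_side.setblocking(False)
            sel.register(server_side, selectors.EVENT_READ)
        # Mostly idle connections: only three clients send anything.
        for i in (3, 50, 99):
            pairs[i][1].sendall(b"hello from client %d" % i)
        return sorted(serve_once(sel))
    finally:
        sel.close()
        for a, b in pairs:
            a.close()
            b.close()


if __name__ == "__main__":
    print(demo())
```

The thread never blocks on a single connection: it asks the selector which sockets are readable and touches only those, which is the property that lets one thread hold thousands of long-lived connections.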
So we decided to benchmark it against Twisted Matrix, the framework we currently use. The test is very simple; we have written:
- a server that accepts any number of connections and logs to the console every message received and the latency between two consecutive messages
- a client that opens n connections, picks a random connection every 5 ms and sends on it a short message identifying the client
With this setup we're fairly sure that any additional CPU time depends on the number of connections and not on the computational load of the application (we just send one message every 5 ms).
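The two test programs can be approximated in a single process. The sketch below is hypothetical (the original test code is not reproduced here): `run_client` picks a random connection every 5 ms and sends a message carrying a client id, while `run_server` handles all connections in one thread and logs each message with the gap since the previous one. Unlike the real test, it assumes local socket pairs and a background thread instead of TCP between separate client and server processes.

```python
import random
import selectors
import socket
import threading
import time


def run_client(client_socks, n_messages, client_id="client-1", interval=0.005):
    """Every `interval` seconds, send a short message on a randomly chosen connection."""
    for seq in range(n_messages):
        conn = random.choice(client_socks)
        conn.sendall(f"{client_id} msg {seq}\n".encode())
        time.sleep(interval)


def run_server(server_socks, n_messages, log=print, max_wait=10.0):
    """Single-threaded server: log each message and the latency since the previous one.

    Returns (messages_received, gaps_in_ms).
    """
    sel = selectors.DefaultSelector()
    for s in server_socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
    count, gaps_ms, last = 0, [], None
    deadline = time.monotonic() + max_wait
    try:
        while count < n_messages and time.monotonic() < deadline:
            for key, _events in sel.select(timeout=0.5):
                data = key.fileobj.recv(4096)
                if not data:  # peer closed: stop watching this socket
                    sel.unregister(key.fileobj)
                    continue
                now = time.monotonic()
                # One read may hold several messages if they arrived back to back.
                for line in data.splitlines():
                    if last is not None:
                        gaps_ms.append((now - last) * 1000.0)
                        log(f"{line.decode()} (+{gaps_ms[-1]:.1f} ms)")
                    else:
                        log(line.decode())
                    last = now
                    count += 1
    finally:
        sel.close()
    return count, gaps_ms


def benchmark(n_connections=100, n_messages=50, log=print):
    """Wire a client thread and the single-threaded server over n local connections."""
    pairs = [socket.socketpair() for _ in range(n_connections)]
    client = threading.Thread(target=run_client, args=([b for _, b in pairs], n_messages))
    client.start()
    try:
        return run_server([a for a, _ in pairs], n_messages, log=log)
    finally:
        client.join()
        for a, b in pairs:
            a.close()
            b.close()
```

Calling `benchmark()` prints one line per message; `n_connections` is the knob the real test varies. Each socket pair uses two file descriptors, so large values may require raising the open-file limit (`ulimit -n`).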
The test machine has an Intel Centrino at 1.8 GHz and 1.2 GB of RAM, running Ubuntu Feisty with kernel 2.6.20-16-386.
Java version: Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode, sharing)
Twisted setup: Python 2.5 and Twisted 2.5 using the epoll reactor.
And now the results for Mina:
| #connections | Server load | Client load | Latency |
| 100 | <1% | <1% | 6-8ms |
| 1000 | <1% | <1% | 6-8ms |
| 2500 | 32% | 7% | 6-10ms |
| 5000 | 60% | 25% | 5-15ms |
| 10000 | 55% | 35% | 5-20ms |
| 15000 | 90% | 5% | 8-15ms |
and for Twisted:
| #connections | Server load | Client load | Latency |
| 100 | <1% | <1% | 0.1-3ms |
| 1000 | <1% | <1% | 0.1-3ms |
| 10000 | <1% | <1% | 0.1-3ms |
| 15000 | <1% | <1% | 0.1-4ms |
Mina's figures are not bad, but Twisted's are really impressive: the number of connections has absolutely no impact on load, and hardly any on latency! I think the main reason is that Twisted uses epoll. I don't know which select implementation java.nio uses on Linux, but I suspect it's not epoll: in fact, 90% of the time spent by the Mina client and server was system time.
Update: I've added the code used in the tests so that you can try it and suggest improvements:
Just discovered that Java now supports epoll, so Mina could too...
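The hypothesis that the selector mechanism explains the difference can be probed directly. The sketch below uses Python's standard `selectors` module (not java.nio or Twisted; `time_select` is a hypothetical helper): it keeps many idle connections and one active connection registered and times a single `select()` call with `poll` versus `epoll`. With `poll`, the whole descriptor list is handed to the kernel on every call, so the cost grows with the number of idle connections; with `epoll`, the interest set lives in the kernel and only ready descriptors are returned. Actual timings depend on the machine, so none are claimed here.

```python
import selectors
import socket
import time


def time_select(selector_cls, n_idle=300, rounds=200):
    """Average duration of one select() call with n_idle silent connections
    plus one connection that always has data waiting.

    Returns (seconds_per_call, only_active_reported).
    """
    sel = selector_cls()
    pairs = [socket.socketpair() for _ in range(n_idle + 1)]
    try:
        for server_side, _ in pairs:
            sel.register(server_side, selectors.EVENT_READ)
        active_server, active_client = pairs[0]
        active_client.sendall(b"x")  # never read, so this socket stays readable
        ready = []
        start = time.perf_counter()
        for _ in range(rounds):
            ready = sel.select(timeout=0)
        per_call = (time.perf_counter() - start) / rounds
        only_active = [key.fileobj for key, _events in ready] == [active_server]
        return per_call, only_active
    finally:
        sel.close()
        for a, b in pairs:
            a.close()
            b.close()


if __name__ == "__main__":
    for name in ("PollSelector", "EpollSelector"):
        cls = getattr(selectors, name, None)  # EpollSelector exists only on Linux
        if cls is None:
            print(f"{name}: not available on this platform")
            continue
        for n_idle in (100, 300):
            per_call, _ = time_select(cls, n_idle=n_idle)
            print(f"{name}, {n_idle} idle connections: {per_call * 1e6:.1f} us per select()")
```

If the suspicion is right, the `PollSelector` figure should rise with `n_idle` while the `EpollSelector` figure stays roughly flat.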
epoll shouldn't matter with 100 connections, yet in that case Twisted is still much faster as far as latency goes (6-8ms vs. 0.1-3ms)...
Sure, Mina pays for some of the JVM's inefficiencies, since the JVM is much heavier than Python. However, business logic implemented in Java could be much more efficient than the same logic in Python. For XMPP, for example (that's the reason we are running these tests), we also need an XML parser, and I bet any Java parser outperforms the one in Twisted. Finally, there's the global interpreter lock, which prevents Python from taking advantage of multicore architectures...