If you haven't heard by now, TorqueBox 2.x is powered by JBoss AS7, which claims to be blazingly fast and lightweight. So, naturally, we want to put those claims to the test and see how TorqueBox 2.x stacks up against the competition:
- TorqueBox on JRuby 1.6.4
- Trinidad 1.2.3 on JRuby 1.6.4
- Passenger 3.0.9 standalone on REE and Ruby 1.9.2
- Unicorn 4.1.1 on REE and Ruby 1.9.2
Even if you're not a fan of JRuby, stick around to see how Ruby 1.9.2 compares to REE. From round 2 we know REE outperforms Ruby 1.8.7, but how does it compare to 1.9.2?
Spree is a well-known Rails 3 application that can run under Ruby 1.8, 1.9, and JRuby. Based on feedback from our Redmine benchmarks, we wanted to make sure the next application could run under Ruby 1.9 for an accurate comparison of JRuby vs C Ruby performance.
Spree is nice enough to ship with a set of sample data that we used for benchmarking. The benchmark script simulates users browsing around a few pages of the site, starting with a small number of concurrent users and gradually increasing until it finishes after 80 minutes.
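For the curious, a Tsung load ramp like the one described is driven by a series of arrival phases in its XML config. This is a minimal sketch, not our actual benchmark config; the hostnames, rates, durations, and URLs are placeholders:

```xml
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice">
  <clients>
    <client host="localhost" maxusers="2000"/>
  </clients>
  <servers>
    <!-- placeholder host/port for the Spree app server under test -->
    <server host="spree.example.com" port="8080" type="tcp"/>
  </servers>
  <load>
    <!-- each phase spawns new users faster than the last,
         gradually increasing concurrency over the run -->
    <arrivalphase phase="1" duration="10" unit="minute">
      <users arrivalrate="1" unit="second"/>
    </arrivalphase>
    <arrivalphase phase="2" duration="10" unit="minute">
      <users arrivalrate="2" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <!-- simulated user browsing a few pages of the site -->
    <session name="browse" probability="100" type="ts_http">
      <request><http url="/" method="GET"/></request>
      <request><http url="/products" method="GET"/></request>
    </session>
  </sessions>
</tsung>
```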
More details about the benchmark and links to the raw results are at the bottom of the post.
Ignoring the latency graph for a minute, it's obvious that the runtime (JRuby vs Ruby 1.9.2) is the differentiator in throughput, CPU usage, and free memory. TorqueBox and Trinidad have no appreciable difference in these categories but both clearly outperform Passenger and Unicorn. If you're concerned with maximizing throughput, minimizing CPU usage, or minimizing memory usage under load then you can't go wrong choosing either JRuby server.
However, what about the latency graph? It shows the average time taken for each request, in other words the average time a user would have to wait for a page on the site to load. This is where the difference in web servers, not runtimes, is readily apparent.
At peak load, TorqueBox has a lower latency than the nearest competitor, Passenger, by a factor of 8 and beats out Trinidad by a factor of 32. Note that the latency graph's y-axis has a logarithmic scale. To help illustrate this point, here's the same latency graph with a linear y-axis and Unicorn removed because its latency is so bad at the end of the test.
Now consider a common real-world scenario: assume our application must keep its average response time under 1 second. How many requests per second can each server handle while staying under that mark? Looking at the latency and throughput graphs, Trinidad can handle 45 requests per second, Passenger 90, Unicorn 100, and TorqueBox 130. At its peak load of 130 requests per second, TorqueBox's average response time is only 256ms, well under our 1 second requirement.
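As a sanity check on those figures, Little's law (average in-flight requests = throughput × average latency) relates the two graphs. A quick sketch, using TorqueBox's peak numbers from above:

```ruby
# Little's law: average number of requests in flight =
# arrival rate (req/s) * average time in system (s).
def in_flight(throughput_rps, latency_ms)
  (throughput_rps * latency_ms / 1000.0).round
end

# TorqueBox at peak: 130 req/s with a 256ms average response time,
# i.e. roughly 33 requests being served concurrently.
puts in_flight(130, 256)
```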
If you were still skeptical about the performance benefits of switching to JRuby, the above graphs should be convincing enough to give it a shot.
TorqueBox 2.x vs TorqueBox 1.1.1
We've seen how TorqueBox 2.x stacks up against the competition, but how does it compare to the latest 1.x stable release, TorqueBox 1.1.1? Thanks in large part to AS7, TorqueBox 2.x has lower latency, higher peak throughput, less CPU usage, and less memory usage than TorqueBox 1.1.1.
REE vs Ruby 1.9.2
Ruby 1.9.2 gives Passenger and Unicorn lower latency, higher throughput, lower CPU usage, and more free memory than REE. From a performance standpoint, there's no reason not to use 1.9.2 if you must use a C Ruby.
All benchmarks were run on Amazon EC2 using an m1.large Tsung client instance, a c1.xlarge server instance, and a db.m1.large MySQL database instance. All instances were started in the same availability zone, and every benchmark started from a clean database loaded with Spree's sample data. Each benchmark was run twice on separate days, and the better of the two runs was used for the graphs.
TorqueBox and Trinidad were set to use a 2GB heap and a maximum of 100 HTTP threads to match the database connection pool size. Unicorn and Passenger were both started with 50 workers. From testing, 50 was the sweet spot for maximum throughput; anything more just increased CPU and memory usage without any further gain in throughput.
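For Unicorn, the worker count is a one-line setting in its config file. This is only a minimal sketch of that setting, not our exact benchmark config; the preload and timeout values here are illustrative assumptions:

```ruby
# config/unicorn.rb -- minimal sketch
worker_processes 50   # the sweet spot from our testing; more only burned CPU and memory
preload_app true      # load the app once, then fork workers; pairs well with REE's
                      # copy-on-write-friendly GC to keep memory usage down
timeout 60            # kill workers stuck on a request longer than 60s (assumed value)
```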
Tsung Reports and Raw Results
If you'd prefer to take the raw data and analyze it yourself, the Tsung-generated reports and raw Tsung results are available for each server below.
Passenger w/ REE
Passenger w/ 1.9.2
Unicorn w/ REE
Unicorn w/ 1.9.2
Questions? Comments? Leave a comment on this post or get in touch via our mailing lists, IRC, or Twitter.