Wednesday, October 27, 2010

Profiling is fun - or - getting munin-cgi-graph fast

I have worked with operations and infrastructure in and around the field of Linux for ages, and in such a job I've not gotten to program a lot. But some. Today's post may be trivial for those who program more than I do, but I got a rush out of it.

I've been using some spare time to help out with the munin 2.0 alpha, and I have been profiling it to optimize performance. The original objective was to find out what munin-cgi-graph was spending more than a second on when making a single graph with rrdgraph, especially since the rrdgraph command itself appeared to take significantly less than a second to run.

I did the work on a site with about 150 hosts and some thousands of "services" (fewer actual plugins, since there were quite a few multigraph plugins). This makes a lot of stuff heavier: the nested $config hash that holds all the configuration (250K lines) takes 10-20 seconds to read from a file on disk.

There are many things in Munin that can be optimized, but for munin-cgi-graph the very worst single thing was unneeded use of regular expressions. Regular expressions are very, very handy, but in many, many cases they are not needed. Plain string and substring matching is a lot faster than regular expressions, but more tedious to type. In the case of munin_find_field, regular expressions were definitely not needed in any way: every call to the procedure turned out to be a simple full string match. I found a few other places with unneeded regular expressions and made some simple replacements like this:

-            next if $key =~ /^#%#/;
+            next if substr($key,0,3) eq '#%#';

This went a very long way.
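
To see for yourself how the replacement above behaves, here is a minimal sketch. The sample keys are made up for illustration; only the '#%#' prefix comes from the diff. It first checks that the regex and the substr test agree on every key, then times both with the core Benchmark module. The numbers depend a lot on Perl version and data, so treat this as a way to measure on your own setup rather than as proof of any particular speedup.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample keys; only the '#%#' prefix is taken from the diff above.
my @keys = ('#%#parent', '#%#name', 'graph_title', 'load.value', '#%', '');

# The original test: anchored regular expression.
sub is_internal_regex  { return $_[0] =~ /^#%#/ ? 1 : 0 }

# The replacement: compare the first three characters as a plain string.
sub is_internal_substr { return substr($_[0], 0, 3) eq '#%#' ? 1 : 0 }

# Both predicates must agree on every key, including short and empty ones.
for my $key (@keys) {
    die "mismatch on '$key'"
        if is_internal_regex($key) != is_internal_substr($key);
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { my $n = grep { $_ =~ /^#%#/ } @keys },
    substr => sub { my $n = grep { substr($_, 0, 3) eq '#%#' } @keys },
});
```

The agreement check matters more than the timing: a substr replacement is only safe where the regex was a fixed prefix (or full string) match to begin with.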

But the main thing I realized was that munin_find_field was being called far too often. After restructuring how the work handed to it was looped over, everything went a lot faster. In the end, graphing time went from over 1 second to under 0.1 second per graph. Pages now load smoothly enough in Firefox that any remaining regrets I may have had about removing munin-graph from trunk some weeks ago evaporated. It's fast enough.
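
I'm not showing the actual Munin change here, so the following is only a generic illustration of the idea and not Munin's code: the data structure and the names find_by_name and build_index are hypothetical. It contrasts walking a nested hash once per lookup with walking it once up front and answering every later lookup from an index.

```perl
use strict;
use warnings;

# Hypothetical nested config, loosely shaped like a tree of hashes.
my $config = {
    'example.com' => {
        load => { load => { label => 'load' } },
        cpu  => { user => { label => 'user' }, system => { label => 'system' } },
    },
};

# Slow pattern: every lookup walks the whole tree again.
sub find_by_name {
    my ($node, $name) = @_;
    my @found;
    for my $key (keys %$node) {
        next if ref($node->{$key}) ne 'HASH';
        push @found, $node->{$key} if $key eq $name;
        push @found, find_by_name($node->{$key}, $name);
    }
    return @found;
}

# Faster pattern: walk the tree once and index every hash node by its key.
sub build_index {
    my ($node, $index) = @_;
    for my $key (keys %$node) {
        next if ref($node->{$key}) ne 'HASH';
        push @{ $index->{$key} }, $node->{$key};
        build_index($node->{$key}, $index);
    }
    return $index;
}

my $index = build_index($config, {});

# Both approaches must find the same number of nodes for each name.
for my $name (qw(user system load)) {
    my @slow = find_by_name($config, $name);
    my @fast = @{ $index->{$name} || [] };
    die "mismatch for $name" unless @slow == @fast;
}
```

With thousands of services, the difference between one tree walk per lookup and one tree walk in total adds up quickly.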

We're looking to put the same optimizations in Munin 1.4. Tonight a guy here is going to profile a full run of munin-graph to see how it works out and what new sore thumbs might stick out. The full run he's doing normally takes 40 minutes(!) without profiling. Yes, I know, 40 minutes is too long, and this is why we're switching to cgi graphing. We just have to make sure the cgi graphing is fast!

For profiling I can recommend Devel::NYTProf. It was very easy to use and made it easy to see our hotspot(s).

1 comment:

Matthew said...

Having just setup munin 1.4 with munin-fastcgi-graph I'm looking forward to your improvements!