School’s Out is a RoR app running on a cluster of Mongrels. Every so often, on high-traffic days, one of the Mongrel processes would go goofy: it would start chewing through memory and become unresponsive. When that happened, I would run the top command and sort processes by memory usage, and the bad process would be at the top, using over 20% of system memory. This only happened once or twice a month, and I could never reproduce it in my development environment.
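If you would rather script that check than eyeball top, here is a minimal sketch in Ruby. The helper name top_memory_processes is made up for illustration, and it assumes input shaped like the output of `ps -eo pid,pmem,args` (PID, memory percentage, command). The sample text reuses the two PIDs from this post, but the memory figures, ports, and the httpd row are invented.

```ruby
# Hypothetical helper: rank processes by memory share from `ps`-style text.
# Assumes each data line looks like "PID %MEM COMMAND".
def top_memory_processes(ps_output, limit = 5)
  rows = ps_output.lines.map do |line|
    pid, mem, command = line.strip.split(/\s+/, 3)
    next unless pid =~ /\A\d+\z/ # skips the header row and blank lines
    { pid: pid.to_i, mem: mem.to_f, command: command.to_s }
  end
  # Drop skipped lines, then put the biggest memory users first.
  rows.compact.sort_by { |row| -row[:mem] }.first(limit)
end

# Made-up sample data standing in for real `ps` output.
sample = <<~PS
    PID %MEM COMMAND
   9498  3.1 mongrel_rails start -p 8001
   9489 21.4 mongrel_rails start -p 8000
   1203  0.4 httpd
PS

top_memory_processes(sample, 2).each do |row|
  puts format("%-6d %5.1f%%  %s", row[:pid], row[:mem], row[:command])
end
```

On a real box you could pass in the output of `ps -eo pid,pmem,args` instead of the sample string; a runaway Mongrel should then show up in the first row.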
Mongrel has a neat feature built in where you can turn on debug mode by sending a USR1 signal to that process. In this case the only information I got from debug mode was that there were requests that were hung up somewhere in my Rails code. Not super useful, but it was a start. Now I needed to figure out where and why this hangup was happening.
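For anyone curious how a signal can flip a switch inside a running Ruby process, here is a rough sketch of the idea. This is not Mongrel's actual code; the $debug_enabled flag is a name I made up, and the script signals itself only so the example is self-contained. Against a real Mongrel you would run `kill -USR1 <pid>` from a shell instead.

```ruby
# Sketch only: a USR1 handler that toggles a debug flag.
# This is NOT how Mongrel implements it, just the general technique.
$debug_enabled = false

Signal.trap("USR1") do
  # Keep trap handlers tiny; just flip the flag.
  $debug_enabled = !$debug_enabled
end

# Normally another process sends the signal (`kill -USR1 <pid>`).
# Here the script signals itself to demonstrate the round trip.
Process.kill("USR1", Process.pid)
sleep 0.1 # give the interpreter a moment to run the handler

puts "debug mode: #{$debug_enabled}"
```

The appeal of this approach is that nothing has to be restarted: the process keeps serving requests and only changes how much it logs.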
Normally I would restart all the Mongrels to get the site running again. This time I just took the offending process out of the load balancer (I am using Apache httpd + mod_proxy_balancer).
Now the question is, how do I try and debug a running process?
I had no clue, but luckily Jamis Buck did.
I attached gdb to the bad mongrel and followed Jamis’ instructions to figure out where the process was stuck.
    [root@www iwarshak]# gdb /opt/local/bin/ruby 9489
    ...
    Attaching to program: /opt/local/bin/ruby, process 9489
    ...
    (gdb) set $ary = (int)backtrace(-1)
    (gdb) set $count = *($ary+8)
    (gdb) set $index = 0
    (gdb) while $index < $count
     >x/1s *((int)rb_ary_entry($ary, $index)+12)
     >set $index = $index + 1
     >end
    0x9653a50: "/opt/ruby/lib/ruby/gems/1.8/gems/postgres-pr-0.4.0/lib/buffer.rb:64:in `read'"
    (gdb)
OK, it looks like it had something to do with the postgres-pr driver. I ran this several times and always got the same result. Just to compare, I did the same thing with the working Mongrels, and got something like this for every one of them:
    [root@www iwarshak]# gdb /opt/local/bin/ruby 9498
    ...
    Attaching to program: /opt/local/bin/ruby, process 9498
    ...
    (gdb) set $ary = (int)backtrace(-1)
    (gdb) set $count = *($ary+8)
    (gdb) set $index = 0
    (gdb) while $index < $count
     >x/1s *((int)rb_ary_entry($ary, $index)+12)
     >set $index = $index + 1
     >end
    0x9bc2768: "/opt/ruby/lib/ruby/gems/1.8/gems/mongrel-0.3.14/lib/mongrel/configurator.rb:274:in `sleep'"
    (gdb)
I am no Mongrel expert, but the responsive Mongrels looked like they were sleeping, waiting for a request to come in.
After digging around, the only potential solution I found was to use the native postgres gem, instead of the pure Ruby postgres-pr driver. So that’s what I did.
    [root@www iwarshak]# gem install postgres
    [root@www iwarshak]# gem uninstall postgres-pr
I am hoping that this solves the problem.