About This Blog: Memcached

posted Friday, December 14, 2007 by topfunky

An interview at RubyInside recently suggested that Rails developers who want to be hired should maintain a Rails-related blog.

I’ll add to this and say that every beginning Rails developer should write their own blog software. It’s a great learning experience and you can try things that aren’t possible with just an app running on localhost. It’s also a great environment for learning without the pressure of a mission-critical app. When you’re working for a client and deploying an important application, you’ll have made all the beginner mistakes on your own time (hopefully).

This blog started on shared hosting with Typo software. I later switched to a VPS with 128 MB RAM and currently run on a RailsMachine VPS with 512 MB RAM.

In the process, I’ve learned a lot about

  • Running Rails on a shared host (don’t do it!)
  • Deployment and automation with Capistrano (do it!)
  • Unix server administration
  • Process monitoring and uptime
  • Rails development
  • Merb development
  • Application design and asynchronous processes
  • Optimizing for speed
  • Reporting and log parsing

If you need a blog for your business or for a client, I would definitely check out Mephisto, Radiant, or Simplelog. But if you’re thinking about starting a blog for yourself, you should write your own from scratch. It only requires 2 controllers (Articles and Comments) and apparently some guy even filmed a screencast showing you how to start one in only 15 minutes.

Every developer’s favorite topic: Performance

A WordPress lecture mentioned that a standard WordPress install on a given server achieves only about 8 requests per second. They get it up to 300 req/sec not by profiling loops and optimizing method calls, but by caching as much as possible.

The biggest speed benefit you can get is from page caching. However, that entails writing a bunch of code to expire the cached pages when articles are edited, comments are made, or templates are changed.

Expiration by Key Name with Memcached

Recently, I decided to try out some caching strategies I had read about [1]. The idea is to use memcached to store objects with keys that will automatically expire when the item changes.

For example, this page you are reading needs to expire if the article is edited or if new non-SPAM comments are posted.

Figure A This page has several sections that could cause the cache to become stale.

So I’m caching the entire rendered page in memcached, with the article’s ID, timestamp, and number of comments in the key. In memory, that’s something like

"Articles:show:11097:1197661574:11"
Figure B The memcached key for this page includes several values corresponding to items which will invalidate the cache.

This way, there’s no need to explicitly delete cached items. The application will ask for a new page when the data changes, and if memcached is full it will clear out the older, unused items.

Figure C Adding a comment causes a new key to be calculated, which prompts a new page to be generated.
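
To make the idea concrete, here is a minimal sketch in plain Ruby with no database or memcached involved. `ArticleStub` and its `ham_comment_count` field are hypothetical stand-ins for the real model, used only to show how the key moves when the underlying data changes.

```ruby
# Hypothetical stand-in for the Article model; not the blog's actual code.
ArticleStub = Struct.new(:id, :updated_at, :ham_comment_count) do
  # Same ingredients as the key described above: id, timestamp, comment count.
  def cache_key
    "Article:#{id}:#{updated_at.to_i}:#{ham_comment_count}"
  end
end

article = ArticleStub.new(11097, Time.at(1197661574), 11)
key_before = article.cache_key

# A new non-spam comment arrives: the count changes, so the key changes.
article.ham_comment_count += 1
key_after_comment = article.cache_key

# Editing the article bumps updated_at, which also changes the key.
article.updated_at = Time.at(1197661574 + 60)
key_after_edit = article.cache_key

puts key_before
puts key_after_comment
puts key_after_edit
```

The old keys are never deleted by the application. Nothing asks for them again, so they sit unused until memcached evicts them.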

Implementation

NOTE: You’ll need to have a memcached server running. I wrote a previous article about how to do that.

I’m using this technique both on my blog and at PeepCode.

First, I’ve put most of the key logic into the Article model.

class Article < ActiveRecord::Base
  # Changes whenever the article is edited or a non-spam comment is added.
  def cache_key
    "#{self.class.name}:#{id}:#{updated_at.to_i}:#{ham_comments.length}"
  end
end

For this blog (using Merb), the controller is pretty simple since render returns the HTML that was generated.

# Articles controller
def show
  @article = Article.find params[:id]
  Cache.get("Articles:show:#{@article.cache_key}", 1.hour) do
    render
  end
end
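
The controller relies on the cache call behaving as "return the stored value, or run the block and store its result". Here is a rough sketch of that behavior using an in-memory Hash. `FakeCache` is a hypothetical stand-in written for illustration, not the memcached client used on this blog.

```ruby
# Hypothetical in-memory stand-in for a memcached client (illustration only).
class FakeCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @store = {}
  end

  # Return the cached value for key if it has not expired. Otherwise run
  # the block, store its result for ttl seconds, and return it.
  def get(key, ttl, now = Time.now)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + ttl)
    value
  end
end

cache = FakeCache.new
render_count = 0
render_page = lambda do
  render_count += 1
  "<html>rendered #{render_count}</html>"
end

first  = cache.get("Articles:show:Article:1:100:2", 3600) { render_page.call }
second = cache.get("Articles:show:Article:1:100:2", 3600) { render_page.call }
# A new comment changes the key, so the block runs again.
third  = cache.get("Articles:show:Article:1:100:3", 3600) { render_page.call }
```

The second call never touches the renderer, and the third call misses only because the key itself changed.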

I’ve also wrapped it in some extra logic that doesn’t use the cached version if there is a flash message to display.

For PeepCode (using Rails), I wrote a render_cached method in the ApplicationController that handles more of this automatically. It takes a key, an expiration time, and a Hash of options that will be sent to the Rails render method.

Here’s how it’s called:

render_cached(@article.cache_key, 1.hour, :action => "show")

The render_cached method does several things:

  • It builds a key that includes the controller name, action name, and format being rendered. This ensures that HTML is cached separately from XML.
  • It uses the render_to_string method to generate the template as a string, or render :text to send back the cached version.
  • It checks a should_cache? method to see if caching should happen. This is different from Rails’ internal development/production caching. Instead, it’s a controller-specific method that can turn off caching in some circumstances. By default, I use no caching if someone is logged in or if there is a flash message to display.

def render_cached(key, expiration, render_options)
  # Fall back to a normal render when Rails caching is turned off.
  return render(render_options) unless perform_caching
  if should_cache?
    combined_key = [
      'controller',
      controller_name,
      action_name,
      params[:format] || 'html',
      key
    ].join(':')
    output = Cache.get(combined_key, expiration) do
      render_to_string render_options
    end
    render :text => output
  else
    render render_options
  end
end
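
The key-building step can be tried on its own, outside a controller. In this sketch `combined_cache_key` is a hypothetical helper that mirrors the array-and-join logic in render_cached, with the controller name, action name, and format passed in as plain strings.

```ruby
# Hypothetical helper mirroring how render_cached assembles its key.
def combined_cache_key(controller_name, action_name, format, key)
  ['controller', controller_name, action_name, format || 'html', key].join(':')
end

article_key = 'Article:11097:1197661574:11'

# No format in the request falls back to 'html'.
html_key = combined_cache_key('articles', 'show', nil, article_key)
# An XML request gets its own entry, so it never receives cached HTML.
xml_key  = combined_cache_key('articles', 'show', 'xml', article_key)

puts html_key
puts xml_key
```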

The should_cache? method looks like this:

def should_cache?
  (current_user.nil? && current_order.nil? && flash[:notice].blank?)
end

Summary

Figure D Performance chart of an application using this technique (values are in requests per second).

Overall, this has worked quite well. The most frequently accessed pages turn out to be pretty responsive. The cached versions perform much faster and there’s not a single line of cache-expiring code.

Raw performance numbers from pl_analyze (requests per second, controller and action names omitted):

And a word from our sponsor…

I’ve got several authors working on some great PeepCode PDF books. They will be published in the next few months.

Currently, many people have found Ryan Daigle’s Rails2 PDF to be useful for getting up to speed with Rails 2.0.1. It’s even available in Español and will soon be available in a few other languages, too.

I refilmed the Capistrano 2 screencast from scratch. It’s mostly the same content as the first Capistrano screencast, but is up to date for the new namespaces, callbacks, and other features of Capistrano 2.

The Rails from Scratch series has been updated with code and notes about Rails 2.0.1 compatibility. It’s a free update if you purchased it in the past.

Finally, I published a screencast on Git a while ago. I’m using Git wherever possible and really love the speed, easy branching, and flexibility in areas that were frustrating in Subversion.

Resources

1 Articles about auto-expiring memcached keys:

28 comments


  • Where have you been? We missed your articles!

    It is funny you mention all ruby devs should build their own blog software. All my talks and videos that I have planned in 2008 revolve around this blogging platform I’m writing.

  • What about releasing a version of your blog engine? That would be great for those who want to increase their merb_fu

  • topfunky

    @bryanl I’m back! It’s tough to work and blog at the same time! ;-)

    @Matt: Merb is still under heavy development. I hear that things should settle down a bit after the 0.5 release and I’ll be glad to make the source to this blog available.

  • topfunky

    Experimental Rails plugin with the render_cached method is here:

    http://topfunky.net/svn/plugins/autoexpire/

  • Just to make sure I understand what’s going on here, the show method of your action controller always hits the database every request because of the Article.find, right? I assume that’s not a big performance issue? I’ve always thought the goal of caching was to avoid hitting the DB altogether, but I guess your strategy here is that just the one lookup by primary key isn’t that costly, so it’s not worth the added complexity of writing cache sweepers to totally avoid hitting the DB at all.

  • topfunky

    @Paul: Exactly. There is a database hit every time, but it’s pretty cheap compared to the cost of rendering the entire HTML page.

  • ellis

    Paul, I think that’s what this plugin is for: http://blog.evanweaver.com/articles/2007/12/13/better-rails-caching/

  • This usually works quite well to get a base cache version for any model ..

    class ActiveRecord::Base

    def cache_version
      attributes.has_key?('updated_at') ? updated_at.to_i : attributes.values.join(":")
    end

    end

  • topfunky

    @ellis: Interesting. I’m looking at it now and it seems like it’s a different strategy than what I’m talking about here. I’ll try it out.

  • Great post as usual, Geoff. Having written several blog engines myself (and a habitual blog engine tourist), I agree it’s a great exercise; right along with writing a chat server… the standard pre-canned code katas sometimes yield some of the most interesting approaches.

  • @Paul: Take a look at my blog post that describes an approach that uses action caching and caches the database check.

  • I just had time to get my mephisto fixed recently.
    Over time I have developed a vague fear of Capistrano (I don’t know why, probably a lack of knowledge of rake and server admin), but I will put hard work into it over the holidays. Great article Geoffrey, thanks! And Merry Christmas to everyone!

  • That’s a very interesting idea to make the cache key based on the attributes of the data you’re caching. It does make expiration unnecessary, I never thought about it that way! So simple.

  • If a user is logged in but the “main” content is the same (i.e. the user’s name is just displayed in the sidebar), is there a significant difference in performance when you cache the “main” page?

    You can use render :text => output, :layout => 'application'. The view still needs to be converted to HTML so there might not be any gain in performance. Let’s say there are about 5 DB queries (a few joins, not really complicated) in the action.

  • I was considering spending some time with memcached a while back. But then I checked out a presentation by Chris Wanstrath (that Err guy) where he said that memcached would actually slow down small sites instead of speeding them up. After reading your post, I’m assuming that Chris’ statement is only intended in the context of using memcached to reduce database load (not as a full page cacher)?

  • topfunky

    @meekish: Ok, this is a great point. For sessions, memcached may not be faster than DB-backed sessions if your database server is on the same machine as Mongrel. As I understand it, this is because your app has to make a separate connection to memcached, so why not just use the existing connection to the database and store the content there?

    But it can definitely help if it’s replacing more resource-intensive operations. For example, I have a reporting page that shows a table and graphic representing the same data. I store the query in memcached so it doesn’t need to make the same SQL call twice for that information.

    Benchmarking is key. For any change, I run httperf against the site with and without memcached. If the memcached version isn’t faster, then it’s not worth using it.

  • “Running Rails on a shared host (don’t do it!)” I have to seriously disagree here. I too had really bad experience with textdrive, but now that we have switched to Byte everything has been working perfectly for almost a full year. See here for more information: http://www.movesonrails.com/articles/2007/04/14/byte-for-all-your-rails-hosting

  • Blogs can be written very easily. I recently wrote a very minimal ruby based blog

  • Wow, what a great post. I’ve been really digging into cache stuff lately. I like the tips with auto expiring cache. Thanks so much.

  • Interesting idea. It’s a great thread.

  • This made me think of the way that git identifies its commits with md5 hashes (or the like). Maybe something like that would be possible to do?

    Either way, this is awesome and I will be using it soon.

  • topfunky

    @Nathan: The only problem with that would be the CPU power needed to calculate an md5 hash. Using a straight ID or updated_at timestamp is much faster.

  • When using the keys on several objects shown on the page, one has to watch out for the combined key length. After 256 characters Memcache gets a buffer overflow and must be restarted. I use the method above, but for safety I make an md5 sum if the key is too long.

  • Speaking of Typo, did you try the last 5.0 release ?

  • yep, i’ve tried the new 5.12 release and it’s killed my blog; i can’t add another post without it keeling over.

    plus it seems to blossom on bootup from 50mb used to 120+, not a good thing.

    i’m going to write my own, call it Matilda and donate any proceeds or thankful donations to Heath Ledger’s charity.

    think it a fitting tribute for one of my, now passed, idols.

    all the best, love the site Geoff ;-)

  • Hey, totally resurrecting the post… where’s the blog source code now that Merb reached 1.0? ;)

  • Geoffrey Grosenbach

    I upgraded to 1.0 but am not ready to release the code. I’ll probably be rewriting it back in Rails since Merb is being merged into Rails for the next major release.

    There are several other Merb blogging engines you could check out such as Feather:

    http://github.com/mleung/feather/wikis/getting-started

  • Ok, no problem. Thank you! :)


Nuby on Rails

Geoffrey Grosenbach / Ruby / Code / Graphics / Design / Rails / Merb / Javascript / CSS
