Tuesday, April 23, 2013

NDB Caching Queries Tips & Best Practices - Google App Engine

Update: Since keys-only queries are now free, I prefer to cache only the keys of a query's results (by querying with keys_only=True) and then retrieve the entities themselves with ndb.get_multi(keys), which can serve them from NDB's own cache.
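Here is a minimal sketch of that keys-only flow. It does not use the App Engine SDK: the FakeStore class, its query_keys and get_multi methods, and the plain dict standing in for memcache are hypothetical stand-ins so the pattern can run anywhere. The shape is what matters: cache the list of keys, resolve them with one batch get, and drop any that come back as None (deleted since caching), the same way filter(None, ndb.get_multi(...)) does in get_by_user.

```python
# Hypothetical in-memory stand-ins for the datastore and memcache, used only
# to illustrate the "cache keys, then batch-get entities" pattern.

class FakeStore(object):
    def __init__(self):
        self.entities = {}      # key -> entity (a dict here)
        self.query_calls = 0    # how many times a query actually ran

    def put(self, key, entity):
        self.entities[key] = entity

    def delete(self, key):
        self.entities.pop(key, None)

    def query_keys(self, user_key):
        """Keys-only query: keys of the comments written by user_key."""
        self.query_calls += 1
        return sorted(k for k, e in self.entities.items() if e["user"] == user_key)

    def get_multi(self, keys):
        """Batch get that returns None for missing keys, like ndb.get_multi."""
        return [self.entities.get(k) for k in keys]


def comments_for_user(store, cache, user_key):
    cache_id = "comments_%s" % user_key
    keys = cache.get(cache_id)
    if keys is None:
        keys = store.query_keys(user_key)
        cache[cache_id] = keys  # cache only the keys, not the entities
    # Resolve keys to entities and drop ones deleted since the keys were cached.
    return [e for e in store.get_multi(keys) if e is not None]


store = FakeStore()
cache = {}
store.put("c1", {"user": "u1", "message": "first"})
store.put("c2", {"user": "u1", "message": "second"})
store.put("c3", {"user": "u2", "message": "other"})

print([c["message"] for c in comments_for_user(store, cache, "u1")])  # ['first', 'second']
store.delete("c1")
print([c["message"] for c in comments_for_user(store, cache, "u1")])  # ['second']
print(store.query_calls)  # 1
```

The second call never runs the query; it only pays for the batch get, which is the trade the update above is describing.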

If you are building a read-heavy App Engine app that lists or queries a lot of entities, it's a good idea to cache those queries so you aren't charged for the same datastore reads on every request. But you also want the results to stay up to date without having to worry about invalidating the cache yourself.

Here are some of the things I've done to cache queries. This approach won't fit every query, but it should work for most, and the same idea extends to more complex queries.

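The idea is to keep an updated timestamp on the entity you are filtering by (here, the user) and make it part of the cache key. Whenever that entity is saved, the timestamp changes, so the cache key changes too and the stale cached result is simply never read again.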
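To see why no explicit invalidation is needed, here is a small pure-Python sketch. The dict standing in for memcache, the run_query callback, and the cached_page helper are hypothetical; only the key layout mirrors get_by_user (entity key, updated timestamp, page cursor). Once updated changes, every page is looked up under a new key, and the old entries are left for memcache to evict.

```python
import datetime


def cache_key(user_key, updated, cursor):
    # Same layout as get_by_user: entity key + last-updated time + page cursor.
    return "get_by_user_%s_%s_%s" % (user_key, updated, cursor)


def cached_page(cache, user_key, updated, cursor, run_query):
    key = cache_key(user_key, updated, cursor)
    if key not in cache:
        cache[key] = run_query(cursor)  # cache miss: run the real query once
    return cache[key]


calls = []


def run_query(cursor):
    # Hypothetical query: records that it ran and returns a fake page.
    calls.append(cursor)
    return ["page-for-%s" % cursor]


cache = {}
t0 = datetime.datetime(2013, 4, 23, 12, 0, 0)
cached_page(cache, "user1", t0, None, run_query)
cached_page(cache, "user1", t0, None, run_query)   # served from the cache
cached_page(cache, "user1", t0, "abc", run_query)  # next page gets its own entry
print(len(calls))  # 2

# Posting a comment re-puts the user, which bumps updated (auto_now=True).
t1 = t0 + datetime.timedelta(seconds=1)
cached_page(cache, "user1", t1, None, run_query)   # new key, so the query runs again
print(len(calls))  # 3
```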

Here is sample code that shows how to display a user's comments using cached queries.


from google.appengine.ext import ndb

class User(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    updated = ndb.DateTimeProperty(auto_now=True, indexed=False)

    email = ndb.StringProperty()
    # It's always good to keep a total of everything if you are displaying it
    total_comments = ndb.IntegerProperty(default=0, indexed=False)


class Comment(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    updated = ndb.DateTimeProperty(auto_now=True, indexed=False)

    user = ndb.KeyProperty(required=True)
    message = ndb.TextProperty()

    @classmethod
    @ndb.transactional(xg=True)
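    def post_comment(cls, user, message):
        # Re-read the user inside the transaction so concurrent posts
        # don't overwrite each other's total_comments increment.
        user = user.key.get()
        user.total_comments += 1
        comment = cls(user=user.key, message=message)
        # Saving the user also bumps its auto_now updated field,
        # which changes the cache key used in get_by_user.
        ndb.put_multi([user, comment])
        return comment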

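    @classmethod
    def get_by_user(cls, user, cursor=None):
        """Return (comments, next_cursor, more) for one page of a user's comments.

        cursor is the urlsafe string returned by the previous call, or None.
        """
        ctx = ndb.get_context()
        # Every new comment re-puts the user, which bumps user.updated,
        # so the cache key changes and stale pages are never read again.
        cache_id = 'get_by_user_%s_%s_%s' % (user.key.urlsafe(), user.updated, cursor)
        cache = ctx.memcache_get(cache_id).get_result()

        if cache:
            keys, next_cursor, more = cache
            # Only keys are cached (your call): each cached value stays small and
            # ndb.get_multi can serve the entities from NDB's own cache.
            # filter(None, ...) drops comments deleted since the page was cached.
            result = filter(None, ndb.get_multi(keys))
        else:
            qry = cls.query(cls.user == user.key)
            start = ndb.Cursor(urlsafe=cursor) if cursor else None
            result, next_cursor, more = qry.fetch_page(20, start_cursor=start)
            # Return the cursor as a urlsafe string, the same type the caller passes in.
            next_cursor = next_cursor.urlsafe() if next_cursor else None
            # Again your decision: you can cache whole entities if slightly stale
            # comment contents don't matter. No expiration is needed for a key this
            # simple; old keys are never requested again and memcache evicts them.
            # memcache_set is asynchronous; decorate the request handler with
            # @ndb.toplevel so it finishes before the request ends.
            ctx.memcache_set(cache_id, ([r.key for r in result], next_cursor, more))

        return result, next_cursor, more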