Andy Hulstkamp

about creating online experiences

12. April 2011

Using Google App Engine

I had a look at Google App Engine and some of the new features in Spring 3 (MVC).

I decided to create a platform for city ratings and rankings. I polished the whole thing a bit in the hope of attracting some visitors. I wonder whether I can get some peaks in traffic to see how this service performs on GAE under real load rather than in a stress test.

To see what I’m talking about, a first version is up and running at CityClash. To read about my experience with Google App Engine, read on after the screenshot.

CityClash

Building CityClash.org touched on a couple of interesting aspects of Google App Engine, the datastore and Spring MVC:

  • The usual CRUD ops
  • Transactions
  • Joins & aggregation
  • Long running processes
  • REST-style with JSON response
  • External API calls
  • Developing for scalability

I used the SpringSource Tool Suite with the GAE plugin for development. Setting up Spring with GAE was straightforward and worked as expected. Add the required jars and configure the Spring project as you normally would. Note that I’m referring to the MVC part of Spring and can’t speak for the other parts right now.

Spring 3 MVC

The REST-features and Ajax-simplifications in Spring MVC worked in GAE just fine. This stuff is really well done in Spring and a joy to work with.

Task queues in GAE

Task queues in Google App Engine are a great way to speed up the user experience by doing expensive work later instead of during a user’s request. If some calculations do not need to be done immediately, task queues are worth considering.

When a city rating in CityClash comes in, I defer the calculation of some statistical figures to a task queue. The user does not need to see these figures immediately. It’s good enough if they get updated a couple of minutes later, and the user gets a quick response.

Task queues are easy to set up. In a couple of lines you get a Queue from the factory, create and configure the task, and add it to the queue. The queue will later issue an HTTP request based on the task. The actual work is then done in the handler that serves that request.
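To picture the flow described above, here is a minimal plain-Java stand-in. It is not the App Engine API: `DeferredTaskQueue`, the `/tasks/city-stats` URL and the `cityId` parameter are made-up names for illustration. It only shows the shape of the idea: a task is a description of a later request (URL plus parameters), adding it is cheap, and the handler mapped to that URL does the real work afterwards.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

/** In-memory stand-in for a task queue (hypothetical, not the GAE API). */
final class DeferredTaskQueue {

    /** A task only describes a later request: a URL plus parameters. */
    static final class Task {
        final String url;
        final Map<String, String> params;

        Task(String url, Map<String, String> params) {
            this.url = url;
            this.params = params;
        }
    }

    private final Queue<Task> pending = new ArrayDeque<>();
    private final Map<String, Consumer<Map<String, String>>> handlers = new HashMap<>();

    /** Maps a URL to the handler that will do the expensive work. */
    void registerHandler(String url, Consumer<Map<String, String>> handler) {
        handlers.put(url, handler);
    }

    /** Called during the user's request: records the task and returns at once. */
    void add(String url, Map<String, String> params) {
        pending.add(new Task(url, new HashMap<>(params)));
    }

    /** Called later, outside the user's request; returns the number of tasks run. */
    int drain() {
        int executed = 0;
        Task task;
        while ((task = pending.poll()) != null) {
            Consumer<Map<String, String>> handler = handlers.get(task.url);
            if (handler != null) {
                handler.accept(task.params);
                executed++;
            }
        }
        return executed;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In the real service the "drain" step is done by App Engine itself, which calls the task URL over HTTP. The point of the sketch is only that nothing expensive happens inside `add`.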

URL Fetch: calling external APIs

For internationalization purposes I need to translate the city names into the local language. CityClash.org currently supports German and English. Via geocoding I get the English name, but I need the German name as well. For Berne, Switzerland I need Bern, Schweiz too.

In Google Maps I could issue another request asking for the German name, but I was a bit worried about exceeding the quota or getting blocked for sending too many requests at too high a rate. For this reason, and for the fun of it, I decided to use the Google Translate API to get the localized name.

If CityClash gets a rating for a city that has not been rated before, the Google Translate API is invoked once to fetch the German name. This is simple and straightforward: use java.net.URL to open a connection and parse the stream. To avoid blocking the user’s request, I put this step in a task queue as well.
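The "open a connection and parse the stream" step could look roughly like the sketch below. It uses only java.net and java.io. The class name `TranslationFetcher` is hypothetical, the request URL is left to the caller, and the `translatedText` field in the JSON response is an assumption about the translation service's response format, so check it against the API version you actually call. The regex is a shortcut that does not handle escaped quotes; a real JSON parser would be the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fetch a URL and pull one field out of the response. */
final class TranslationFetcher {

    // Assumed response field name; adjust to the service you call.
    private static final Pattern TRANSLATED_TEXT =
            Pattern.compile("\"translatedText\"\\s*:\\s*\"([^\"]*)\"");

    /** Opens the URL and returns the whole response body as a string. */
    static String fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return readAll(in);
        }
    }

    /** Reads a UTF-8 stream line by line into a single string. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder body = new StringBuilder();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    /** Returns the first translatedText value, or null if there is none. */
    static String extractTranslatedText(String json) {
        Matcher matcher = TRANSLATED_TEXT.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

A null result should be treated as "no translation yet" so the task can simply be retried later.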

Memcache

CC uses the cache as much as possible. I’m using the low-level API. There’s not much to say about this: put and get Serializable objects, use expiration, and be sure to have a fallback if values in the cache get lost. It’s a cache, not a permanent store.
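The put/get/expire/fallback pattern can be sketched in plain Java as follows. This is an in-memory stand-in, not the App Engine memcache API. `ExpiringCache` and its methods are names invented for this example, and the current time is passed in explicitly to keep the behaviour easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** In-memory stand-in for a cache with expiration and a fallback loader. */
final class ExpiringCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns the cached value if it is present and still fresh. Otherwise
     * loads it from the fallback (e.g. the permanent store) and caches it.
     */
    V get(K key, long nowMillis, Function<K, V> fallback) {
        Entry<V> entry = entries.get(key);
        if (entry != null && nowMillis < entry.expiresAtMillis) {
            return entry.value;
        }
        V loaded = fallback.apply(key);
        entries.put(key, new Entry<>(loaded, nowMillis + ttlMillis));
        return loaded;
    }

    /** Simulates the cache dropping a value on its own. */
    void evict(K key) {
        entries.remove(key);
    }
}
```

The caller never assumes a hit: every read goes through `get` with a fallback, so a lost or expired entry only costs one extra load.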

Datastore

If you come from an SQL background, the datastore is probably the most difficult aspect to get used to. It took me a while to think outside the box, accept the limitations and embrace the blessings.

Here are a few key things to note from my newbie point of view and as of now:

  • Lose the SQL mindset. Forget about complex joins, nested queries and aggregation.
  • What you get is sort of a distributed Hashtable, optimized for reads. The datastore is organized in a way that lets it scan and read data efficiently. Filters (<, <=, = etc.), order by, ancestors or keys can be applied to queries, but there are limits on how you can combine filters and sort orders.
  • As expected, reads are fast, especially when using keys.
  • Don’t spoil this by trying to work around the limitations regarding joins and aggregation. Doing loads of calculations in memory to mimic such behavior won’t scale well and will cost you cycles.
  • Instead store the data in a fashion that embraces the usage pattern.
  • Try to do calculations once, when writing. Prepare the data for simple, efficient reads.
  • If writing takes too long try to delegate some stuff to task queues and do the hard work later.
  • Accept and embrace denormalization.
  • Use task queues and cron jobs where possible to do long running processes in the background. In CC lots of aggregation is done in the background.
  • Writes are expensive, since they may need reorganization. Be aware of contention.
  • If writes have the potential to cause contention, consider sharding or use smaller entity groups.
  • Transactions are there, but they only span a single entity group. I tried to stay away from transactions altogether and denormalize instead.
  • I stayed away from JDO and JPA because I could. What’s the point of having another abstraction layer over a HashMap? I used the low-level API in conjunction with Objectify, whose documentation is itself a very good introduction to the datastore.
  • There’s no full text search. The most you will get is a filter-order combination to mimic a scaled-down LIKE on crutches.
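The "LIKE on crutches" from the last bullet is essentially a prefix match on a sorted property: filter for values greater than or equal to the prefix and less than the prefix followed by a very high character, ordered by that property. The sketch below imitates this with a sorted set in plain Java. It is a stand-in for a datastore index, not datastore code, and `PrefixIndex` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Sorted in-memory stand-in for an index on a lower-cased name property. */
final class PrefixIndex {

    private final NavigableSet<String> names = new TreeSet<>();

    /** Stores names lower-cased so that matching is case-insensitive. */
    void add(String name) {
        names.add(name.toLowerCase(Locale.ROOT));
    }

    /**
     * Range scan: name >= prefix AND name < prefix + U+FFFD, in sorted order.
     * This only finds names that START with the prefix, never "contains".
     */
    List<String> startingWith(String prefix) {
        String lower = prefix.toLowerCase(Locale.ROOT);
        String upper = lower + '\uFFFD';
        return new ArrayList<>(names.subSet(lower, true, upper, false));
    }
}
```

With "Bern", "Berlin", "Bergen" and "Basel" stored, `startingWith("ber")` yields the three names beginning with "ber" in sorted order and skips "basel". A search for "ern" finds nothing, which is exactly the limitation compared to a real LIKE.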

Again, these are not necessarily best practices; they just reflect my take on the App Engine datastore right now.
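For the sharding advice in the list above, a common shape is the sharded counter: instead of one hot counter that every write fights over, keep several shards, bump a randomly chosen one on write and sum them on read. The sketch below shows the idea with an in-memory array. In the datastore each shard would be its own entity, which is an assumption about how you would map it and not something shown here. `ShardedCounter` is a made-up name.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/** In-memory illustration of a sharded counter. */
final class ShardedCounter {

    private final AtomicLongArray shards;

    ShardedCounter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shards = new AtomicLongArray(shardCount);
    }

    /** Write path: touch only one randomly chosen shard. */
    void increment() {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(shard);
    }

    /** Read path: sum over all shards. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade-off is the usual one for the datastore: writes get cheaper and contend less, reads get a bit more expensive, and the summed total is a good candidate for caching.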

Mail

Sending mail is a no-brainer and works straight out of the box. The only restriction is that the sender address of a message must be the email address of an administrator of the application.

Development Server

I have yet to discover any inconsistencies in behaviour between the development server and App Engine. The test skeleton is pretty good too.

Flex

I didn’t bother to use the Spring BlazeDS Integration. For the couple of requests and the small amount of data, I simply used the HTTPService. The new RESTful features paired with the JSON response made it easy to integrate Flex and Spring. The same applies to the Ajax calls through jQuery.

JSTL

In my experience there are some minor issues with JSTL inside GAE. For example, I could not get hold of an object reference via the var attribute inside the forEach tag. That is possibly my fault, and I didn’t bother too much since the index-based approach using the varStatus attribute works fine. Further, the formatNumber tag didn’t work, since its implementation apparently needs a reference to the session. I did not want to turn on sessions (backed by memcache and the datastore) just for this, so I formatted the numbers in advance, before handing them to the page.
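Formatting the numbers in advance can be done with plain java.text before the values go into the model. The sketch below is one possible way to do it. `RatingFormatter` and the choice of one fraction digit are assumptions for the example, not the code used in CityClash.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical helper that formats numbers for the view ahead of time. */
final class RatingFormatter {

    /** Formats a value with exactly one fraction digit for the given locale. */
    static String format(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        format.setMinimumFractionDigits(1);
        format.setMaximumFractionDigits(1);
        return format.format(value);
    }
}
```

The controller would put the resulting string into the model, and the JSP only prints it, so no formatNumber tag and no session are needed.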

Conclusion

So far I’m pretty impressed by GAE. It has restrictions and it took me some time to get used to the datastore first. But once you get the hang of it, things work out pretty straightforward.

And you get these advantages:

  • Scalability
  • Pay for what you use
  • Easy app versioning
  • No server management and maintenance

While I can’t really judge the first as of now, the last point especially is a real blessing for me.

In the end, it depends on whether your app can live within the constraints of GAE (mainly the datastore). I’d say Google App Engine is not the right tool for complex business applications as of now, but it is a great environment for simple to moderately complex web apps. For independent developers or startups the PaaS approach might very well be the right thing.

Others don’t agree, and I might change my opinion in three months or so after having seen CC run inside GAE for a while, but for now my first experience with Google App Engine was pretty pleasant.
