Rails on PostgreSQL: PGCon 2010 talk on Rails and PostgreSQL

Aug 03, 2010

A while back I posted a link to a talk by Gleb Arshinov that he gave at the SF PUG. This talk was on “PostgreSQL for high performance Rails apps”, and was full of fine suggestions from their experiences with their Rails apps.

Gleb is back again, this time at PGCon on May 21, 2010, where he and Alexander Dymo talked about PostgreSQL as a secret weapon for Rails apps. Some of the same ground is covered (use SQL DDL vs ActiveRecord create_table, etc.), but there’s lots of new information too. Here are some notes:

  • 1:10 They’re using PostgreSQL 8.4, nginx, and mongrel
  • 4:00-6:00 Talks about dropping down into SQL via ActiveRecord
  • 6:30 Use include to eliminate N+1 queries.
  • 7:30 Watch for things like acts_as_tree that reintroduce lots of queries in exchange for the improvement in abstraction.
  • 9:00 One query, 12 joins – complicated, but query time goes from 8 seconds to 60 ms.
  • 14:00-17:00 A technique for recording SQL queries; this helps ensure you’re not running unexpected queries
  • 19:00 Suggests using straight SQL for DDL rather than the ActiveRecord DSL
  • 20:00 Use constraints, FKs, etc to preserve data integrity – “anything you don’t have a constraint on will get corrupted”
  • 23:00 Don’t use CASCADE since the app won’t know about the deletions
  • 28:00 Keep a log of times for the most frequent user requests. Alex suggests using integration tests for this; code is at 29:10 and 29:30.
  • 32:30 A technique for loading data with ActiveRecord’s select option with PostgreSQL arrays to save on object creation. Questions from the audience about normalization vs efficiency.
  • 38:50 Role/user/privilege checking can be slow; shows a technique for using PostgreSQL’s bool_or and GROUP BY to get the data in one fell swoop. Query time went from 2+ seconds to 64 ms.
  • 42:00 Do analytics in the database. Saw speed improve from 90s to 5s and saved tons of RAM.
  • 44:40 Some excellent new PostgreSQL features that are either here now or are on the way (replication, windowing functions)
  • 46:30 Demonstrates a problem with PostgreSQL’s LIMIT and OFFSET when used with subselects. Some discussion of pagination with the audience. Here’s an excellent discussion of pagination alternatives written by Justin French.
  • 50:30 How to force PostgreSQL to use a subselect vs a join; the example goes from 605ms to 325 ms.
  • 52:20 Be careful with generate_series. Apparently set-returning functions like this one cannot generate hints for the planner.
  • 55:30 General props to PostgreSQL community.
  • 59:40 Need to test queries both in cold state and hot state; they saw 14x speed difference.
  • 1:01:40 Tune PostgreSQL – shared_buffers, work_mem, autovacuum, etc. Rely on community knowledge for initial configuration.
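
The query-recording idea from 14:00-17:00 pairs nicely with the N+1 point at 6:30, so here's a minimal Ruby sketch of how the two fit together. This is not the code shown in the talk: FakeConnection, RecordingConnection, and both loader methods are hypothetical names I made up, and the fake connection never touches a real database. The point is only that wrapping execute lets you count the statements a piece of code issues.

```ruby
# Hypothetical sketch: none of these names come from the talk or from Rails.

# Stand-in for a real database connection; it only pretends to run SQL.
class FakeConnection
  def execute(_sql)
    [] # a real adapter would return rows here
  end
end

# Wraps any object that responds to #execute and remembers each statement.
class RecordingConnection
  attr_reader :queries

  def initialize(connection)
    @connection = connection
    @queries = []
  end

  def execute(sql)
    @queries << sql
    @connection.execute(sql)
  end

  # Returns only the statements issued while the block ran.
  def record
    start = @queries.length
    yield
    @queries[start..-1]
  end
end

# N+1 pattern: one query for the posts, then one more per post.
def load_comments_one_by_one(conn, post_ids)
  conn.execute("SELECT id FROM posts")
  post_ids.each do |id|
    conn.execute("SELECT * FROM comments WHERE post_id = #{Integer(id)}")
  end
end

# Batched pattern: two queries no matter how many posts there are.
def load_comments_batched(conn, post_ids)
  conn.execute("SELECT id FROM posts")
  id_list = post_ids.map { |id| Integer(id) }.join(", ")
  conn.execute("SELECT * FROM comments WHERE post_id IN (#{id_list})")
end

conn = RecordingConnection.new(FakeConnection.new)
slow = conn.record { load_comments_one_by_one(conn, [1, 2, 3]) }
fast = conn.record { load_comments_batched(conn, [1, 2, 3]) }

puts "one by one: #{slow.length} queries"
puts "batched:    #{fast.length} queries"
```

In a test you could assert on the length of the recorded list, so an accidental extra query fails the build instead of showing up later as a slow page.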

Lots of good stuff there, enjoy!

Rails on PostgreSQL: Pivotal Labs Talk – Scaling a Rails App with Postgres

Jul 24, 2010

I’m slowly catching up with my podcast backlog and came across a Pivotal Labs talk from May 2009. In this talk Josh Susser and Damon McCormick present on Scaling a Rails App with Postgres. It’s a little dated now – the talk was given when PostgreSQL 8.4 was in beta – but, still, lots of good stuff. Here are some notes:

  • They started with an existing Rails app with lots of data, so they had some constraints – not greenfield development.
  • Around the 5-6 minute mark there’s a good discussion of PostgreSQL’s query optimizer and how it analyzes a table’s data distribution. One takeaway (mentioned around 16:20) is to run vacuum more often on a particular table if there are a lot of writes.
  • 10:00 How to set STATISTICS for a particular table.
  • 11:00 Using partial indexes.
  • 14:00 Indexing on expressions.
  • 18:10-23:00 A nice discussion of the EXPLAIN output.
  • 23:45 Here they talk about wide columns. I’ve seen this in MySQL as well, where splitting text data out into a separate table yielded some good speedups.
  • 26:10 Some discussion of pgbench.
  • 35:30 How long does it take to add an index to large tables? They saw times of up to an hour for tables with millions of rows.
  • 36:30 Clustering your data in order to get PostgreSQL to write it more efficiently.
  • 37:30-48:00 A thorough discussion of partitioning tables via table inheritance. They used an ActiveRecord model (39:23) with a bunch of utility methods. They also had a cron job to periodically create new partitions. At 45:15 they make a nice distinction between using partial indexes and partitions – one advantage is that a partition’s indexes can be different from its parent’s indexes. At 49:00 they mention maybe doing a plugin, not sure if that happened.
  • 52:00 Some discussion of full text search via tsearch.
  • 53:00 PostgreSQL’s lack of built-in replication outside of WAL shipping, Slony, etc. Thank goodness 9.0 will address this!
  • 54:00 Some props to Engine Yard on their PostgreSQL support.
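
To make the partitioning notes a bit more concrete, here's a rough Ruby sketch of the kind of helper a cron job might call to build the DDL for next month's partition. It is my own illustration, not the utility methods from the talk: monthly_partition_ddl is a hypothetical name, and the events table and created_at column are assumptions. The SQL uses the table inheritance form (a child table with a CHECK constraint that INHERITS from the parent).

```ruby
require "date"

# Hypothetical helper: builds the DDL for one monthly child table using
# PostgreSQL table inheritance. Table and column names are assumptions.
def monthly_partition_ddl(parent, column, month_start)
  first = Date.new(month_start.year, month_start.month, 1)
  next_first = first >> 1 # Date#>> moves forward by whole months
  child = format("%s_%04d_%02d", parent, first.year, first.month)

  <<~SQL
    CREATE TABLE #{child} (
      CHECK (#{column} >= DATE '#{first}' AND #{column} < DATE '#{next_first}')
    ) INHERITS (#{parent});
  SQL
end

puts monthly_partition_ddl("events", "created_at", Date.new(2010, 8, 15))
```

The helper only returns a string, so it's easy to test and you decide separately how to execute it. Each child table would still need its own CREATE INDEX statements, which is where the "a partition's indexes can differ from its parent's" advantage comes in.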

Good stuff all around, and thanks to Pivotal for posting these great talks!