Avoiding memory leaks with Backbone.Collection

I was recently auditing our site for memory leaks and discovered a way to leak Backbone.Collection instances that is less than obvious, so I thought it might be worth documenting here in case others run into it.

The leak.

Here's a simple way to reproduce the leak:

window.someModel = new Backbone.Model();

// Create a collection that contains a model and listens to an event.
var SomeCollection = Backbone.Collection.extend({
  initialize: function() {
    this.listenTo(Backbone, 'some_event', this.eventHandler);
  },
  eventHandler: function() {}
});
var collection = new SomeCollection([window.someModel]);

// When done with the collection, stop listening to events to remove
// any references to the collection so the collection may be garbage
// collected.
collection.stopListening();

See the leak?

Tracking it down.

Here's what I got when profiling this code:

[Image: heap snapshot showing the leaked collection]

Indeed, the heap snapshot shows our collection still lingering in memory.
Looking at the retainers, we see someModel as a culprit: the collection is registered as a listener for an 'all' event on a Backbone.Model.

However, we added no such event binding in our code, so what's binding that event and why is stopListening not removing it?

The answer became clear when I remembered that Backbone.Collection provides the ability to listen to events on the models it contains.

For example:

this.listenTo(collection, 'change:some_model_attribute', /* ... */);

Indeed, if you dive into the latest implementation of Backbone (1.1.2 as of this writing), you'll see the following code for Backbone.Collection:

// Internal method to create a model's ties to a collection.
_addReference: function(model, options) {
  this._byId[model.cid] = model;
  if (model.id != null) this._byId[model.id] = model;
  if (!model.collection) model.collection = this;
  model.on('all', this._onModelEvent, this);
},

There you have it: model.on('all', this._onModelEvent, this).

Backbone binds the collection to the model automatically behind the scenes in order to propagate model events up to listeners on the collection. And because the model is bound with model.on instead of this.listenTo, this.stopListening won't unbind that event, so the collection continues to be referenced by the model as a listener. As long as the model is around and holds that reference, the collection can't be garbage collected.

In addition, the line if (!model.collection) model.collection = this; doesn't help the cause: it adds another reference to the collection from the model, which would also need to be removed to make the collection eligible for garbage collection.
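
You can confirm both references from the console. Note that this pokes at _events, Backbone's internal listener registry, so treat it as a debugging aid rather than a public API:

collection.stopListening();

// The model still holds the collection's 'all' binding added by _addReference...
console.log(window.someModel._events.all.length); // 1 (the collection's _onModelEvent)

// ...as well as the back-reference to the collection itself.
console.log(window.someModel.collection === collection); // true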

The fix.

Okay, so now that we see what's leaking, how can we plug it?

Since adding models to the collection is the cause, removing them would seem like the natural answer. Indeed, removing a model from the collection causes the collection to remove its references from that model.

The code then becomes:

window.someModel = new Backbone.Model();

// Create a collection that contains a model and listens to an event.
var SomeCollection = Backbone.Collection.extend({
  initialize: function() {
    this.listenTo(Backbone, 'some_event', this.eventHandler);
  },
  eventHandler: function() {}
});
var collection = new SomeCollection([window.someModel]);

// When done with the collection, stop listening to events to
// remove any references to the collection so the collection
// may be garbage collected.
collection.stopListening();
// ...And remove all models from the collection so the models
// no longer contain references to the collection so the
// collection may be garbage collected.
collection.reset();

The takeaway.

When done with a collection, be sure to remove all models from it to avoid a potential memory leak, or make sure the models aren't referenced by anything other than the collection itself, in which case the models and the collection can be garbage collected together.
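
If you need this in more than one place, it's worth wrapping both steps in a small helper. A minimal sketch (disposeCollection is my own name for it, not a Backbone API):

// Tear down a collection so nothing references it and it can be collected.
function disposeCollection(collection) {
  collection.stopListening(); // unbind everything the collection bound via listenTo
  collection.reset();         // drop each model's 'all' binding and .collection back-reference
}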

Lambda: Bees with Frickin' Laser Beams

Amazon Web Services' recently released Lambda service might be the crown jewel of their toolset.

Lambda runs your code so far in the cloud that it almost takes the computers out of computing. Developers can run their Node.js code in an isolated environment that includes some basic modules plus any number of custom-provided ones. The instance runs through the script before disappearing silently into the night -- though it can also report results! You can find a fuller Lambda overview here.

Love at first invocation

We were lucky enough to use this while it was still in "preview" mode, and we found it so useful that we were using it in production within the month. We also plan to open source our work, so keep an eye out for Imagineer. Imagineer aims to sever the tie between image sizing and developer effort, so that we can ask for whatever we need and either get it from S3 or have it created, served, and persisted to S3.

What makes Lambda most attractive to us is its explosive scalability. To demonstrate the power, I'll walk you through building a simple load testing project with the AWS Node SDK. This was heavily inspired by the Chicago Tribune's super cool Bees with Machine Guns!. BwMG, written in Python, helps users spin up several EC2 instances to "attack" an endpoint and test its ability to handle the load. I used BwMG in load testing Imagineer, and found it as sweet as honey. The bit that stung me was the need to both spin up and spin down servers. They even include this warning in their README:

Please remember to [spin down the servers]—we aren’t responsible for your EC2 bills.

Another pain point is that EC2 instances are limited per account, and I had trouble getting enough to make it useful.

Setting goals

An example flow with BwMG might look like:

bees up -s 4 -g public -k frakkingtoasters
bees attack -n 10000 -c 250 -u http://www.ournewwebbyhotness.com/
bees down

This spins up 4 servers to send 10,000 requests, 250 at a time, to the destination.

Our goal is to imitate this flow with a single line:

node bees.js -n 2000 -c 100 -u https://log.roadtrippers.com/

Assembling the hive

You will need authentication for AWS. I chose to use a credentials file at ~/.aws/credentials, but there are a number of options outlined here.
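
If you go the credentials-file route, it's a small INI-style file that looks like this (with your own values in place of the placeholders):

[default]
aws_access_key_id = << your access key id >>
aws_secret_access_key = << your secret access key >>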

Create a directory for the project. I have lovingly named my project beeswithfrickinlaserbeams. We'll have 3 dependencies in our main script:

aws-sdk - Official API for interacting with AWS services, including Lambda.

adm-zip - Compression tools, because Lambda requires code to be provided in zip format.

bluebird - Promises will help us better organize the code.

Here's how my package.json looks:

{
  "name": "beeswithfrickinlaserbeams",
  "version": "1.0.0",
  "description": "Bees with frickin' laser beams",
  "dependencies": {
    "adm-zip": "^0.4.7",
    "aws-sdk": "^2.1.26",
    "bluebird": "^2.9.25"
  }
}

You can create that file in your project directory and install the dependencies with npm install.

Gathering the swarm

Create a file called bees.js. This is what we'll interact with through the command line.

At the top of the file, we'll include our dependencies. Lambda requires a region to be specified, so I'm using us-east-1.

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda({region: 'us-east-1'});
var Promise = require('bluebird');
var AdmZip = require('adm-zip');

Next, we'll parse out the command line arguments. This isn't very important to the demonstration, so I didn't spend much time making it robust.

var CONCURRENT_JOB_LIMIT = 50;
var config = process.argv.reduce(function(memo, arg, index) {
  switch (arg) {
    case '-n':
      memo.totalRequests = process.argv[index + 1];
      break;
    case '-c':
      memo.concurrentRequests = process.argv[index + 1];
      memo.beamsPerBee = Math.ceil(
        1.0 * memo.concurrentRequests / CONCURRENT_JOB_LIMIT);
      break;
    case '-u':
      memo.url = process.argv[index + 1];
      break;
  }
  return memo;
}, {});
if (config.concurrentRequests < CONCURRENT_JOB_LIMIT) {
  config.beeCount = config.concurrentRequests;
} else {
  config.beeCount = CONCURRENT_JOB_LIMIT;
}
config.iterations = Math.ceil(
  1.0 * config.totalRequests / config.concurrentRequests);
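
To sanity-check the math, here's what that works out to for the invocation we're targeting (these numbers will show up again in the output at the end):

// node bees.js -n 2000 -c 100 -u <url>
// beamsPerBee = ceil(100 / 50)   = 2   concurrent requests per bee
// beeCount    = min(100, 50)     = 50  bees per wave
// iterations  = ceil(2000 / 100) = 20  waves of attacks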

Next, we'll create a configuration for a new Lambda function. You can create your function using their GUI, CLI tools, or directly from other code. You may notice we're working toward two other files, bee.js and laser.js.

// Create a zip file from the code for Lambda to consume
var zip = new AdmZip();
zip.addLocalFile('bee.js');
zip.addLocalFile('laser.js');

// Configure bee Lambda function
var createFunctionParams = {
  Code: {
    ZipFile: zip.toBuffer()
  },
  FunctionName: 'bee',
  Handler: 'bee.handler',
  Role: '<< your IAM ARN role >>',
  Runtime: 'nodejs',
  MemorySize: 1024,
  Timeout: 3
};

See the full explanation of options for createFunction here. Take note of the MemorySize and Timeout options. Processes that grab too much memory or run too long are swatted, but cost grows with memory and runtime.

Now, create the function. We just want to confirm it's been created and calculate some parameters for our jobs to use. They need to know the target (url) and how many connections to make (beamsPerBee). Lastly, we kick off our recursive bee-invoking method. We'll hop to that next.

var createFunction = Promise.promisify(lambda.createFunction, lambda);
createFunction(createFunctionParams).then(function() {
  console.log('Releasing',
    config.beeCount,
    'bee(s) with',
    config.beamsPerBee,
    'frickin\' laser beam(s) each for',
    config.iterations,
    'attack(s).');

  var invokeParams = {
    FunctionName: 'bee',
    Payload: JSON.stringify({
      url: config.url,
      beamsPerBee: config.beamsPerBee
    })
  };

  // Start sending waves of requests
  sendInTheBees(config.iterations - 1, invokeParams);
});

The last part of this file is the sendInTheBees method. It will start a wave of jobs and perform calculations on the results. To match the createFunction earlier, we call deleteFunction at the end of our code. An ideal workflow would probably offer commands to create and delete the job.

It's not pretty, but let's get it over with so you can see the job itself!

var invoke = Promise.promisify(lambda.invoke, lambda);
function sendInTheBees(iterations, invokeParams, totals) {
  // Kick off all the jobs
  var invokedBees = [];
  for (var i = 0; i < config.beeCount; i++) {
    invokedBees.push(invoke(invokeParams));
  }
  var payload;

  Promise.all(invokedBees).then(function(results) {
    // Run calculations on our results
    totals = results.reduce(function(memo, result) {
      payload = JSON.parse(result.Payload);
      // Merge in any status codes we haven't seen before
      payload.codes.forEach(function(code) {
        if (memo.codes.indexOf(code) === -1) memo.codes.push(code);
      });
      memo.time += payload.time * 1.0 / config.beeCount / config.iterations;
      memo.hits += config.beamsPerBee;
      return memo;
    }, totals || { time: 0, hits: 0, codes: [] });

    if (iterations > 0) {
      console.log('Sending in', iterations, 'more swarms.');
      sendInTheBees(iterations - 1, invokeParams, totals);
    } else {
      console.log('Sent', totals.hits, 'hits');
      console.log('Received codes:', totals.codes);
      console.log('Mean request time:', parseInt(totals.time), 'ms');
      lambda.deleteFunction({FunctionName: 'bee'}).send();
    }
  }).catch(function(e) {
    console.log(e);
    lambda.deleteFunction({FunctionName: 'bee'}).send();
  });
}

And that's it! Only 107 lines so far.

Anatomy of a bee

Make a file named bee.js. This will hold the code that runs the Lambda job in the necessary handler format. The handler is passed an event payload and a context variable. We don't do much here except spawn child processes for each request and keep tabs on some runtime stats.

Ideally we would handle errors by calling context.fail, but we'll leave that for another day (there's a small sketch of it after the code below).

var cp = require('child_process');

exports.handler = function handler(event, context) {
  var count = event.beamsPerBee;
  var codes = [];
  var time = 0;

  for(var i = 0; i < event.beamsPerBee; i++) {
    var child = cp.fork('laser.js').
      on('message', function(m) {
        if (codes.indexOf(m.code) === -1) codes.push(m.code)
        time += m.time * 1.0 / event.beamsPerBee;
        count--;
        if(count === 0) {
          context.succeed({
            time: time,
            codes: codes
          });
        }
      });
    child.send({url: event.url});
  }
};
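
As promised, here's a minimal sketch of how context.fail could slot into the loop above, chaining an 'error' listener onto the fork. This is a hypothetical addition, not part of the shipped code:

    // Inside the for loop of bee.js -- hypothetical error handling:
    var child = cp.fork('laser.js').
      on('error', function(err) {
        context.fail(err); // end the invocation and report the failure
      }).
      on('message', function(m) {
        // ...same stats handling as above...
      });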

One file to go! Create laser.js if you haven't already. This code lives only to make a request to the target.

process.on('message', function(m) {
  var explicitProtocol = m.url.match(/^https|^http/);
  var protocol = require(explicitProtocol ? explicitProtocol[0] : 'http');
  var start = new Date();

  protocol.get(m.url, function(res) {
    process.send({ code: res.statusCode, time: new Date() - start });
    process.exit();
  });
});
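
Before zipping everything up, you can smoke-test laser.js locally by forking it the same way bee.js does. A hypothetical test script (run with node test-laser.js from the project directory):

// test-laser.js -- local smoke test for laser.js
var cp = require('child_process');

var child = cp.fork('laser.js');
child.on('message', function(m) {
  console.log('Got status', m.code, 'in', m.time, 'ms');
});
child.send({url: 'http://example.com/'});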

Send in the bees

Let's put it to the test! I haven't quite nailed the power of Bees with Machine Guns, but we can still run it as we set out to:

$ node bees.js -n 2000 -c 100 -u http://log.roadtrippers.com
Releasing 50 bee(s) with 2 frickin' laser beam(s) each for 20 attack(s).
Sending in 19 more swarms.
Sending in 18 more swarms.
Sending in 17 more swarms.
Sending in 16 more swarms.
Sending in 15 more swarms.
Sending in 14 more swarms.
Sending in 13 more swarms.
Sending in 12 more swarms.
Sending in 11 more swarms.
Sending in 10 more swarms.
Sending in 9 more swarms.
Sending in 8 more swarms.
Sending in 7 more swarms.
Sending in 6 more swarms.
Sending in 5 more swarms.
Sending in 4 more swarms.
Sending in 3 more swarms.
Sending in 2 more swarms.
Sending in 1 more swarms.
Sent 2000 hits
Received codes: [ 200 ]
Mean request time: 76 ms

Translation:

As you can see, we were able to send several waves of requests to the target service. If you choose to follow along or use this project, please remember to point it only at servers you are responsible for. As mentioned in the BwMG README:

If you decide to use the Bees, please keep in mind the following important caveat: they are, more-or-less a distributed denial-of-service attack in a fancy package

Conclusion

We looked at what Lambda has to offer developers, looked at how Roadtrippers currently uses it, and walked through a simple load testing project to get acquainted with the API.

If you are interested in the source code, it is available on our repo. If you have questions about how we use Lambda for Imagineer, or about anything else we do at Roadtrippers, don't hesitate to say hello. Thank you for reading!

Setting up Ruby for a project on OS X using brew and rbenv.

For those of us who don't set up Ruby projects regularly and find our memory foggy when the time comes, here's a quick cheatsheet of the process on Mac OS X using brew and rbenv.

This assumes you have brew, ruby-build, and rbenv already installed.

  1. Change to your project's directory: cd /path/to/my/project
  2. Update brew: brew update
  3. Pull in latest ruby version availability: brew upgrade ruby-build
  4. List all available ruby versions: rbenv install --list
  5. Install a given ruby version: rbenv install <ruby-version>
  6. Set the ruby version for the current directory: rbenv local <ruby-version>
  7. Install bundler: gem install bundler
  8. Install project dependencies: bundle install
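
Step 6 writes the chosen version to a .ruby-version file in the project directory, which rbenv reads automatically from then on. For example (version number hypothetical):

$ rbenv local 2.2.2
$ cat .ruby-version
2.2.2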

The Road To Perfect Coffee

Brain Juice

Coffee is critical to the engineering team here at Roadtrippers. It keeps all of us warm and happily coding from morning to night. So, why not show what starts our engines in the morning and kick off our engineering blog by showcasing how we make our coffee?!

Setup

The Roadtrippers' coffee setup is made up of four parts: the beans, the grinder, the brewer, and a simple kitchen scale.

First and foremost, Deeper Roots Coffee, a fantastic local coffee roaster, provides us with a freshly roasted supply every couple weeks.

The other pieces have changed over time as our team grew, and, being great fans of scalable architecture, we've made sure our current setup is just that.

Our first grinder, the Baratza Encore, worked well when we were a small team of about a dozen people (total, not all highly caffeinated engineers). Our brewer at the time was a simple home brewing machine (not worth mentioning), supplemented by the occasional Chemex. We have since upgraded to a Baratza Vario that allows us to grind coffee for 3L brews.

The brewer, highly recommended by our favorite local coffee shops, is a Fetco 2131XTS-3L. It brews up to 3L at a time, and adding more capacity (within reason, of course) is as simple as buying an extra carafe or two. The carafes' portability is great for taking one into a meeting or moving between floors.

There's nothing special about the scale we use. Just remember, making coffee by volume or "just eyeballing it" is strictly forbidden. You must weigh the beans!

The Method

Our brewing process is a pretty straightforward take on the SCAA brewing guidelines. We use 165 grams of coffee for a 3L brew. For off-peak hours, or if someone wants a carafe to themselves, we have a 1L brew setting that calls for 65g of coffee. Increasing the ratio of coffee to water for the smaller brew is something we found important, since there is slightly less extraction time: less water buildup leads to less time steeping.

I'm told every Baratza Vario is different, and the measurements are so fine that you might need to adjust to taste. That said, we've settled on a P7 setting for all drip brews.

Winning Over The Non Techies

Engineers pay attention to detail and therefore understand the importance of weighing the beans. After all, we make sense of "" != nil. We love precision and admire the process. Relaying this to others can be a challenge, so one of our designers whipped up this infographic to explain the process in amazing detail but with simplicity.

[Infographic: Roadtrippers Presents: How To Brew Coffee The Right Way]

Conclusion

I hope this helps everyone in their quest for better coffee and highly caffeinated engineering performance!

Now that we're all on the same highly caffeinated page, stay tuned for more posts about Roadtrippers' technology.