Platform as a service provider Heroku posted a lengthy update on Friday to its blog addressing the routing performance issues affecting Ruby on Rails applications running on its Bamboo and Cedar stacks.
This post comes after one customer alleged last week that Heroku had “swindled” Ruby on Rails customers by quietly changing the way it distributes tasks across Amazon EC2, a change that resulted in longer queue times.
Heroku acknowledged the issues quickly, and promised to follow-up with an in-depth technical review. The timeline provides a good example of effectively communicating with customers around an issue by providing a quick response turnaround and specific commitments to solving the problem.
According to Heroku, as its traffic grew, it added new nodes to the routing cluster, which resulted in the per-node request queues becoming less and less efficient. Customers were then presented with confusing metrics from Heroku and New Relic.
“Our router logs captured the service time and the depth of the per app request queue and present that to customers, who in turn were relying on these metrics to determine scaling needs. However, as the cluster grew, the time-and-depth metric for an individual router was no longer a relevant way to determine latency in your app,” Jesper Jorgensen, senior director, product management at Heroku said. “As a result, customers experienced what were effectively random load balancing applied to their Bamboo applications. This was not caused by an explicit change to the Bamboo routing code. Nor was it related to the new routing logic on Cedar. It was a pure side-effect of the expansion of the routing cluster.”
Heroku explained that that issue with applications running on the Bamboo stack is the nature of routing on the stack coupled with gradual, horizontal expansion of the routing cluster.
On the Cedar stack, the root cause is the fact that Cedar is optimized for concurrent request routing, while some frameworks, like Rails, are not concurrent in their default configurations.
In the post, Heroku promised to improve its documentation so it accurately reflects how its service works across Bamboo and Cedar stacks, remove “incorrect and confusing metrics”, add metrics around queuing impact on application response times and provide tools for developers to augment latency and queuing metrics. It also promised to better support concurrent-request Rails apps on Cedar.
“The Rap Genius family appreciates Heroku’s extensive apology and technical explanation of the queuing problems its Rails customers are facing. This is a solid first step towards rectifying this situation – but it’s only a first step,” Rap Genius said in a blog post. “In particular, Rap Genius thinks that Heroku’s customers would appreciate if Heroku went into more depth about what they knew about the problem over the last few years. In addition, Heroku should offer its customers at least some of their money back. We love Heroku, they allowed us to focus on building an amazing site, they hooked us up, but they promised a level of service they did not provide.”
Talk back: Do you think Heroku’s explanation of the performance issues is adequate? Do you think that Rap Genius is right in seeking financial compensation for the performance issues? Let us know in a comment.