On Why I Don't Like Auto-Scaling in the Cloud - O'Reilly Broadcast
Dynamic scaling is better than auto scaling in the cloud - George Reese
Good rant about why auto-scaling is not dynamic scaling: two of the most widely cited advantages of the cloud. Not entirely convinced by the argument, though. I suspect governors can be used to manage scaling in a sensible way.
Sometimes traffic is truly unexpected. But not as often as you think. If you know you are getting coverage in some publication, marketing should have done an ROI projection on the campaign and be able to provide you with expected response rates. But you don't want it to auto-scale. Auto-scaling cannot differentiate between valid traffic and nonsense. You can. If your environment is experiencing a sudden, unexpected spike in activity, the appropriate approach is to have minimal auto-scaling with governors in place, receive a notification from your cloud infrastructure management tools, then determine what the best way to respond is going forward. Here, the auto-scaling is simply a band-aid to enable a human to use dynamic scaling to define an appropriate, temporary capacity to support the unexpected change in demand. If you know you have a batch window from midnight to 3am, set your cloud infrastructure management tools to add capacity at 11:30 and throttle back at 3:30.
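A rough Python sketch of the idea in the quote: capacity is planned on a schedule, and unplanned auto-scaling is capped by a governor and raises a notification. The server counts, the `GOVERNOR_MAX` ceiling and all function names are assumptions made up for illustration; only the 11:30 to 3:30 window comes from the quote.

```python
from datetime import time

# Hypothetical schedule entries: (start, end, servers). The 23:30-03:30 window
# mirrors the batch example in the quote; the server counts are invented.
SCHEDULE = [
    (time(23, 30), time(3, 30), 8),
]
BASELINE = 4       # assumed normal capacity
GOVERNOR_MAX = 6   # assumed ceiling for unplanned auto-scaling


def in_window(now, start, end):
    """True if `now` is in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def planned_capacity(now):
    """Capacity that a human scheduled in advance (dynamic scaling)."""
    for start, end, servers in SCHEDULE:
        if in_window(now, start, end):
            return servers
    return BASELINE


def target_capacity(now, demand_estimate):
    """Return (servers, notify_human) for the current time and estimated demand."""
    planned = planned_capacity(now)
    if demand_estimate <= planned:
        return planned, False
    # Unexpected spike: auto-scale only up to the governor and ask a human.
    return max(planned, min(demand_estimate, GOVERNOR_MAX)), True
```

With these numbers, a midday estimate of 20 servers yields 6 servers plus a notification, leaving the decision about anything larger to a person.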
Makes good sense. I'd probably disagree on a few points if the cloud is owned/controlled by one's own organization.
sccache - Google Code
The SHOP.COM Cache System is an object cache system that...
RailsLab .:. Scaling Rails - Scaling Rails Screencasts
Learn everything you need to know about Scaling your Rails app through 13 informative Screencasts produced by Gregg Pollack with the support of New Relic.
Nice screencast series on the performance and scalability of Rails apps by Gregg Pollack
Scaling Rails screencasts produced by Gregg Pollack and supported by New Relic
Scaling Digg and Other Web Applications | High Scalability
Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.
Scaling Strategies
Rails Lab .:. Expert advice on tuning and optimizing your Rails app
Rails Performance Resources - Expert advice on tuning and optimizing your Rails app.
Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way - Axon Flux - A Ruby on Rails Blog
primer on ruby
acts_as_ferric : Caching with Ruby on Rails
Interesting article about MySQL scalability problems.
Dare Obasanjo aka Carnage4Life - Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes
"Database sharding is the process of splitting up a database across multiple machines to improve the scalability of an application. The justification for database sharding is that after a certain scale point it is cheaper and more feasible to scale a site horizontally by adding more machines than to grow it vertically by adding beefier servers."
SELECT Name, Address FROM Customers WHERE CustomerID= ?", conn);
nkallen's cache-money at master — GitHub
Active record memory cache.
A Write-Through Cacheing Library for ActiveRecord
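For anyone unsure what "write-through" means here, a minimal Python sketch of the pattern: every write goes to the backing store and to the cache in the same step, so reads can trust the cache. This is not cache-money's API; the class, its methods and the dict stand-ins for the database and memcached are all hypothetical.

```python
class WriteThroughCache:
    """Keeps a cache in step with a backing store by updating both on every write."""

    def __init__(self, store):
        self.store = store   # stand-in for the database (any dict-like object)
        self.cache = {}      # stand-in for memcached

    def write(self, key, value):
        self.store[key] = value   # write to the database first
        self.cache[key] = value   # then refresh the cached copy

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]   # miss: fall back to the database
        self.cache[key] = value
        return value
```

The trade-off is slightly slower writes in exchange for reads that rarely touch the database.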
class Message < ActiveRecord::Base
Facebook's photo storage rewrite
"Facebook will complete its roll-out of a new photo storage system designed to reduce the social network's reliance on expensive proprietary solutions from NetApp and Akamai."
Agile Testing: Experiences deploying a large-scale infrastructure in Amazon EC2
Experiences deploying a large-scale infrastructure in Amazon EC2. At OpenX we recently completed a large-scale deployment of one of our server farms to Amazon EC2. Here are some lessons learned from that experience.
* Expect failures; what's more, embrace them
* Fully automate your infrastructure deployments
* Design your infrastructure so that it scales horizontally
* Establish clear measurable goals
* Be prepared to quickly identify and eliminate bottlenecks. As you carefully watch various indicators of your systems' health, be prepared to....
* Play whack-a-mole for a while, until things get stable
InfoQ: Facebook: Science and the Social Graph
facebook structure
RabbitMQ - A Fast, Reliable Queuing Option for Rubyists
When it comes to developing large systems with many interdependent parts, it’s common nowadays to use “queues.”
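To make the "queues" idea concrete, here is a small in-process Python sketch using only the standard library. One part of the system does the essential work and enqueues the rest; a worker picks the job up later. It stands in for a real broker such as RabbitMQ and does not use RabbitMQ's API; `handle_upload` and `make_thumbnail` are hypothetical names.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def make_thumbnail(photo_name):
    # Hypothetical slow task; here it only builds a file name.
    return "thumb_" + photo_name


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        func, args = job
        results.append(func(*args))
        tasks.task_done()


def handle_upload(photo_name):
    """Do the essential work now and queue everything else."""
    record = {"name": photo_name, "status": "saved"}   # essential, synchronous
    tasks.put((make_thumbnail, (photo_name,)))          # deferred to the worker
    return record
```

To run it, start `worker` in a `threading.Thread`, call `handle_upload`, then put `None` on the queue to shut the worker down. A real deployment would put the queue in a separate daemon so the two sides can live in different processes or machines.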
Runs as a daemon to link separate apps via a queue. Article includes suggested links.
Some Notes on Distributed Key Stores « random($foo)
Distributed Key Stores
(Anti RDBMS) Key-value stores
Engineering @ Facebook's Notes | Facebook
article about data architecture for facebook's photo system. Seems interesting
The Photos application is one of Facebook’s most popular features. Up to date, users have uploaded over 15 billion photos which makes Facebook the biggest photo sharing website. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 60 billion images and 1.5PB of storage. The current growth rate is 220 million new photos per week, which translates to 25TB of additional storage consumed weekly. At the peak there are 550,000 images served per second. These numbers pose a significant challenge for the Facebook photo storage infrastructure.
NFS photo infrastructure
The old photo infrastructure consisted of several tiers:
* Upload tier receives users’ photo uploads, scales the original images and saves them on the NFS storage tier.
* Photo serving tier receives HTTP requests for photo images and serves them from the NFS storage tier.
* NFS storage tier built on top of commercial storage appliances.
Since each image is stored in its own file, there is an enormous amount of metadata generated on the storage tier due to the namespace directories and file inodes. The amount of metadata far exceeds the caching abilities of the NFS storage tier, resulting in multiple I/O operations per photo upload or read request. The whole photo serving infrastructure is bottlenecked on the high metadata overhead of the NFS storage tier, which is one of the reasons why Facebook relies heavily on CDNs to serve photos. Two additional optimizations were deployed in order to mitigate this problem to some degree:
distributed systems primer :: snax
I've been reading a bunch of papers about distributed systems recently, in order to help systematize for myself the thing that we built over the last year. Many of them were originally passed to me by Toby DiPasquale. Here is an annotated list so everyone can benefit. It helps if you have some algorithms literacy, or have built a system at scale, but don't let that stop you.
Ruby Proxies for Scale and Monitoring - igvita.com
Including transparent sending of production traffic to staging.
Maybe for Omniture testing?
Lift the curtain behind any modern web application and you will find at least a few proxy servers orchestrating the show. Caching proxies such as Varnish and Squid help us take the load off our application servers; reverse proxies such as Haproxy and Nginx help us partition and distribute the workload to multiple workers, all without revealing the underlying architecture to the user. In the Ruby world, Rack middleware and Rails Metal are sister concepts: both allow the programmer to inject functionality in the pre or post-processing step of the HTTP request.
Three clusters: Production (Huge!!), Staging (one), Benchmarking (same as staging)
Drop ACID and Think About Data | High Scalability
nice summary of different data stores...
Facebook | Engineering @ Facebook's Notes
Needle in a haystack: efficient storage of billions of photos
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
One MILLION connections!
Amazon Web Services Blog: New Features for Amazon EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch
And this is what we wanted a year ago. At least.
define
SQL Databases Are An Overapplied Solution (And What To Use Instead)
Official Google Research Blog: Large-scale graph computing at Google
I want one of these! "We have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel). Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. "
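The "about 15 lines" claim is easier to picture with a toy version. The Python sketch below imitates the vertex-centric superstep loop on a single machine: in each iteration every vertex sends messages along its outgoing edges, then updates its own value from what it received. It is not Pregel's API (which the post does not show); the function name, the damping factor of 0.85 and the assumption that every vertex has at least one outgoing edge are mine.

```python
def pagerank(graph, supersteps=30, damping=0.85):
    """graph maps each vertex to the list of vertices it links to.

    Assumes every vertex has at least one outgoing edge and that every
    link target is itself a key of `graph`.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Message phase: each vertex sends rank / out_degree to its neighbours.
        inbox = {v: 0.0 for v in graph}
        for v, out_edges in graph.items():
            for target in out_edges:
                inbox[target] += rank[v] / len(out_edges)
        # Update phase: each vertex recomputes its value from received messages.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in graph}
    return rank
```

In a real system the two phases would run in parallel across machines, with a barrier between supersteps; here they are plain loops.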
So many things to learn and apply in business deals.
http://spinn3r.com/rank
Adding Simplicity - An Engineering Mantra: Shard Lessons
No, not SHARED lessons, I mean SHARD lessons. I have to admit that until about a year ago I didn't really know the term shards in relation to databases. Now don't confuse that with not understanding how databases can be horizontally scaled. I was introduced to that concept and helped to define the various ways it can be done but we just called it splits. Regardless of what you call it, there are some interesting challenges that are introduced. The well known challenges of consistency are discussed ad nauseam, even by me, so I'm not going there with this article. But besides that, there are some other lessons to learn when applying the pattern to your data.
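For readers who, like the author a year earlier, have not met the term: a shard (or "split") lookup can be as small as the Python sketch below, which routes each key to one of a fixed set of databases using a stable hash. The shard names and the modulo scheme are assumptions for illustration, not something taken from the article.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connections.
SHARDS = ["customers_db_0", "customers_db_1", "customers_db_2", "customers_db_3"]


def shard_for(customer_id):
    """Pick a shard from a stable hash of the key, so lookups are repeatable."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

With plain modulo routing, changing the number of shards remaps most keys, which is one reason the choice of shard count deserves thought up front.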
Worth reading just for the section on intelligently designing shard counts. Great discussion on picking counts that smooth your cost step function
Stack Overflow Architecture | High Scalability
Stack Overflow is a much loved programmer question and answer site written by two guys nobody has ever heard of before. Well, not exactly. The site was created by top programmer and blog stars Jeff Atwood and Joel Spolsky. In that sense Stack Overflow is like a celebrity owned restaurant, only it should be around for a while. Joel estimates 1/3 of all the programmers in the world have used the site so they must be serving up something good.
Django in the Real World
A talk given at OSCON 2009 on July 21st, 2009.
This tutorial examines how best to cope when the Real World intrudes on your carefully designed website.
There’s plenty of material (documentation, blogs, books) out there that’ll help you write a site using Django… but then what? You’ve still got to test, deploy, monitor, and tune the site; failure at deployment time means all your beautiful code is for naught.
NoSQL: If Only It Was That Easy « Marked As Pertinent
Interesting: a study of the various alternative databases from a scalability angle
data store scaling technologies
The Rails Way: Do it Later With Delayed Job.
Ryan's meta-programming approach for declaring background tasks doubles-up as documentation, and puts the focus on the individual methods rather than requiring you to create separate classes for your jobs.
Do it Later With Delayed Job.
jQuery maxImage plugin: Demo
This plugin will resize and scale targeted images to their max width according to the image ratio, the browser size and some simple options. Change the size of your browser to see its effect.
smart photo scaling
Digg the Blog » Blog Archive » Looking to the future with Cassandra
answer is 3TB database???
"The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes."
Wow, Cassandra uses a lot of disk space. Trade-offs!
ongoing · Ravelry
Casey: We’ve got 430,000 registered users, in a month we’ll see 200,000 of those, about 135,000 in a week and about 70,000 in a day. We peak at 3.6 million pageviews per day. That’s registered users only (doesn’t include the very few pages that are Google accessible) and does not include the usual API calls, RSS feeds, AJAX. Actual requests that hit Rails per day is 10 million. 900 new users sign up per day. The forums are very active with about 50,000 new posts being written each day. Some various numbers — 2.3 million knitting/crochet projects, 19 million forum posts, 13 million private messages, 8 million photos (the majority are hosted by Flickr).
Dare Obasanjo aka Carnage4Life - Building Scalable Databases: Denormalization, the NoSQL Movement and Digg
As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.
bit on why non-SQL dbs are used in social networking sites
How Ravelry Scales to 10 Million Requests Using Rails | High Scalability
Very interesting article about a website built in Rails (www.ravelry.com) that has reached truly remarkable traffic volumes. The article explains how the idea for the site came about (it is a site for sewing and knitting enthusiasts), how it reached those numbers (10 million requests a day hit Rails, 3.6 million pageviews per day, 430,000 registered users. 70,000 active each day. 900 new sign ups per day), the architecture adopted, and the lessons learned.
Tim Bray has a wonderful interview with Casey Forbes, creator of Ravelry, a Ruby on Rails site supporting a 400,000+ strong community of dedicated knitters and crocheters.
HowToLearnMoreScalability - memcached - Learn more about scalablity - Project Hosting on Google Code
scalability
SQL Databases Don't Scale
a hot discussion about Drupal
Are you choosing a Content Management System for your next site? Allow me to throw in my two cents against Drupal. In theory, Drupal is a CMS that lets you control your site out of the box. In practice, it’s a nightmare to configure and maintain.
Article by Mariya Lysenkova on Drupal's weak points as a content management system.
How Google Taught Me to Cache and Cash-In | High Scalability
A user named Apathy in this thread on how Reddit scales some of their features, shares some advice he learned while working at Google and other major companies. To be fair, I [Apathy] was working at Google at the time, and every job I held between 1995 and 2005 involved at least one of the largest websites on the planet. I didn't come up with any of these ideas, just watched other smart people I worked with who knew what they were doing and found (or wrote) tools that did the same things. But the theme is always the same: # Cache everything you can and store the rest in some sort of database (not necessarily relational and not necessarily centralized). How do you go about applying this strategy?
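One common way to apply "cache everything you can and store the rest in a database" is to check the cache first and only fall through to the database on a miss. The Python sketch below shows that shape with a time-to-live on each entry. It is an illustration only, not the approach described in the thread; `CacheAside`, `load_from_db` and the 60-second default are hypothetical.

```python
import time


class CacheAside:
    """Look in the cache first; on a miss or expiry, load from the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db   # hypothetical slow lookup function
        self.ttl = ttl_seconds
        self.clock = clock                 # injectable so expiry can be tested
        self.entries = {}                  # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                       # cache hit
        value = self.load_from_db(key)            # miss or expired entry
        self.entries[key] = (self.clock() + self.ttl, value)
        return value
```

The time-to-live bounds how stale a cached value can get, at the cost of an occasional extra database read.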
ing caches is a classic strategy for milking your servers as much as possible. First look for an exact match. If that's not foun
High Scalability - High Scalability - How Ravelry Scales to 10 Million Requests Using Rails
Cogent
High Scalability - High Scalability - Why are Facebook, Digg, and Twitter so hard to scale?
Why are Facebook, Digg, and Twitter so hard to scale? Is it right to conclude that it is all because their data is updated in real time?
Facebook | Engineering @ Facebook's Notes
A software-based distributed caching system such as memcached is an important piece of today's largest Internet sites that support millions of concurrent users and deliver user-friendly response times. The distributed nature of memcached design transforms 1000s of servers into one large caching pool with gigabytes of memory per node. This blog entry explores single-instance memcached scalability for a few usage patterns.
Dare Obasanjo aka Carnage4Life - Facebook Seattle Engineering Road Show: Mike Shroepfer on Engineering at Scale at Facebook
Article summarizing presentation by Facebook on some of their scaling challenges and solutions.
Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest | High Scalability
This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real-time.
Rackspace Cloud Computing & Hosting | NoSQL Ecosystem
Good introduction to the "NoSQL" space (initially not a fan of the term, but I guess it is going to stick...), highlighting the different designs used by the options in the space, and the benefits/drawbacks of those designs.
Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as “NoSQL databases.”
SQL Databases Don't Scale
"Sharding kills most of the value of a relational database."
sql database db
Presentation Summary “High Performance at Massive Scale: Lessons Learned at Facebook” « Idle Process
Summary of the Facebook architecture and the bottlenecks they have had to work around
After considering a variety of data clustering algorithms, found that there was very little win for the additional complexity of clustering. So at Facebook, user data is randomly partitioned across individual databases and machines across the cluster. Hence, each user access requires retrieving data corresponding to user state spread across hundreds of machines. Intra-cluster network performance is hence critical to site performance. Facebook employs memcache to store the vast majority of user data in memory spread across thousands of machines in the cluster. In essence, nodes maintain a distributed hash table to determine the machine responsible for a particular user's data. Hot data from MySQL is stored in the cache. The cache supports get/set/incr/decr and
Evaluating Django Caching Options | codysoyland.com
Good overview of Django Caching Techniques
denormalization
Paul Stadig: Clojure + Terracotta = Yeah, Baby!
"These two seem like an interesting combination. Imagine the possibilities...kill your database, simple POJO applications, free distributed transactions, clustered JVMs with limitless memory...it would make your hair grow back, you'd get women, and become filthy rich...well...maybe not, but at least you'd have more fun writing software." -- Paul Stadig
Mr. Moore gets to punt on sharding - (37signals)
I guess the conclusion is that there’s no use in preempting the technological progress of tomorrow. Machines will get faster and cheaper all the time, but you’ll still only have the same limited programming resources that you had yesterday. If you can spend them on adding stuff that users care about instead of prematurely optimizing for the future, you stand a better chance of being in business when that tomorrow finally rolls around.
Gamma error in picture scaling
"Photographs that have been scaled with these software have been degradated. The degradation is often faint but probably most pictures contain at least an array where the degradation is clearly visible. I suppose this happens since the first versions of these software, maybe 20 years ago."
found via reddit. Should consider this the next time I edit a bunch of photos
Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL
RT @kvz: Why Twitter is dropping MySQL in favor of Cassandra: http://bit.ly/dyeiXF
RT @DZone "Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL" http://dzone.com/WbTY
MyNoSQL: Please include anything I’ve missed.
Dennis Forbes on Software and Technology - Getting Real about NoSQL and the SQL-Isn't-Scalable Lie
SQL is Scalable and NoSQL Isn’t For Everyone. The point is one that I think all rational people already realize: The ACID RDBMS isn’t appropriate for every need, nor is the NoSQL solution.
"[Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to do with the debate. It would be more clearly called NoACID]"
High Scalability - High Scalability - Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
Scaling practices turn a relational database into a non-relational database. To scale at Digg they followed a set of practices very similar to those used at eBay. No joins, no foreign key constraints (to scale writes), primary key look-ups only, limited range queries, and joins were done in memory. When implementing the comment feature a 4,000 percent increase in performance was created by sorting in PHP instead of MySQL. All this effort required to make a relational database scale basically meant you were using a non-relational database anyway. So why not just use a non-relational database from the start?
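What "joins were done in memory" and "sorting in PHP instead of MySQL" look like in practice, sketched in Python rather than PHP: fetch rows by primary key, stitch them together in application code, and sort there. The sample rows, field names and `comments_for_display` function are invented for illustration and say nothing about Digg's actual schema or the measured speed-up.

```python
from operator import itemgetter

# Hypothetical rows, as they might come back from two primary-key lookups.
comments = [
    {"id": 1, "user_id": 10, "score": 5},
    {"id": 2, "user_id": 11, "score": 9},
    {"id": 3, "user_id": 10, "score": 7},
]
users = {10: {"name": "ann"}, 11: {"name": "bob"}}


def comments_for_display(comment_rows, users_by_id):
    """Join comments to their authors in memory, then sort by score, highest first."""
    joined = [
        {**row, "author": users_by_id[row["user_id"]]["name"]}
        for row in comment_rows
    ]
    return sorted(joined, key=itemgetter("score"), reverse=True)
```

The database only ever answers simple key look-ups; the join and the ordering cost is moved to the application tier, which is easier to scale out.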
As Digg started out with a MySQL oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you find useful:
RT @Sebdz: RT: @programmateur: Digg: 4000 % performance increase by sorting in PHP rather than MySQL (via @mrboo) - http://bit.ly/ckma10
♻ @n1k0: "Scaling practices turn a relational database into a non-relational database" http://n1k.li/4v (via @nsilberman)
Typically for relatively static data sets, relatively low query volumes, and relatively high latency requirements.
InfoQ: Joe Armstrong About Erlang
In this interview filmed during QCon London 2008, Joe Armstrong, designer of Erlang, speaks on various aspects of the Erlang language, presenting its roots, how it compares with other languages and why it has become popular these days due to its native ability to scale on multi core systems.
Joe Armstrong
Steve Huffman on Lessons Learned at Reddit | Carsonified
Steve Huffman on Lessons Learned at Reddit, by Keir Whitaker
High Scalability - High Scalability - 7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
Alecco Locco: A Gazillion-user Comet Server With libevent, Part 0
A test with 200,000 sockets (note it's 100,000 pairs) showed a process size of 2MB, so far so good
Second Life Architecture - The Grid | High Scalability
Exploring the software behind Facebook, the world’s largest site. Posted in Main on June 18th, 2010 by Pingdom.
At the scale that Facebook operates, a lot of traditional approaches to serving web content break down or simply aren’t practical. The challenge for Facebook’s engineers has been to keep the site up and running smoothly in spite of handling close to half a billion active users. This article takes a look at some of the software and techniques they use to accomplish that.
Software Behind Facebook
A List Apart: Articles: Supersize that Background, Please!
Create properly scaling backgrounds.
Good use of background cover property and @media queries
Instead of using one fixed background size, a better solution would be to scale the image to make it fit within different window sizes. Unfortunately, CSS 2.1 has no means of scaling background images. There are a couple of workarounds, however these all rely on the HTML img element (instead of CSS backgrounds). They also use absolute positioning for layering and tables or scripting to enable resizing. Additionally, not all of these techniques preserve the image’s ratio, which results in unrealistically stretched backgrounds.
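The arithmetic behind ratio-preserving scaling is short enough to write down. This Python sketch computes the scaled size for a given window, using the CSS3 `background-size` keywords `cover` and `contain` as mode names; the function itself and its parameters are hypothetical, not taken from the article.

```python
def scaled_size(image_w, image_h, window_w, window_h, mode="cover"):
    """Return (width, height) scaled to the window with the aspect ratio kept."""
    scale_w = window_w / image_w
    scale_h = window_h / image_h
    # "cover" fills the whole window (and may crop); "contain" fits inside it.
    scale = max(scale_w, scale_h) if mode == "cover" else min(scale_w, scale_h)
    return round(image_w * scale), round(image_h * scale)
```

For a 1600x900 image in an 800x600 window, `cover` scales by the larger factor so the height matches and the width overflows, while `contain` scales by the smaller factor so the whole image is visible.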