Andrew Kirch | 1 May 20:18 2011

Re: Amazon diagnosis

On 5/1/2011 2:07 PM, Mike wrote:
> I am still waiting for proof that single points of failure can
> realistically be completely eliminated from any moderately complicated
> network environment / application. So far, I think Murphy is still
> winning on this one.

Sure they can, but, as a thought exercise, full 2N redundancy is
difficult on a small scale for anything web-facing.  I've seen a very
simple implementation for a website requiring 5 9's that consumed over
$50k in equipment, and this wasn't even geographically diverse.  I have
to believe that scaling up the concept of "doing it right" results in
exponential cost increases.  To illustrate the problem, here is the
first step in the thought exercise: find two datacenters with diverse
carriers that aren't on the same regional power grid.  As we learned in
the (IIRC) 2003 power outage, New York and DC won't work, nor will Ohio,
so you need redundant teams to cover a very remote site.
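To put rough numbers on "5 9's" and 2N redundancy, here is a minimal sketch of the standard availability arithmetic. It assumes component failures are statistically independent, which is exactly the assumption a shared regional power grid breaks. The function names and the example availabilities (99.9% per site, 99.99% for a shared grid) are illustrative assumptions, not figures from the thread.

```python
# Availability arithmetic behind "five nines" and 2N redundancy.
# Assumption: redundant copies fail independently unless they share a
# common dependency (modelled explicitly below).

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(a: float, n: int = 2) -> float:
    """Availability of n redundant copies where any one copy suffices,
    assuming the copies fail independently."""
    return 1.0 - (1.0 - a) ** n


def correlated_parallel_availability(a: float, shared: float, n: int = 2) -> float:
    """Same as parallel_availability, but every copy also depends on one
    shared component (e.g. the same regional power grid) whose
    availability is `shared`; that component is a single point of failure."""
    return shared * parallel_availability(a, n)


if __name__ == "__main__":
    print(f"{downtime_minutes_per_year(0.99999):.2f}")  # ~5.26 minutes/year
    # Two hypothetical 99.9% sites, independent vs. sharing a 99.99% grid:
    print(downtime_minutes_per_year(parallel_availability(0.999)))
    print(downtime_minutes_per_year(correlated_parallel_availability(0.999, 0.9999)))
```

Two independent 99.9% sites multiply their unavailability (0.001 squared), but once both hang off a common dependency, the combined figure can be no better than that dependency's own availability. That is why diverse carriers and separate power grids matter more than the box count.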

Jeff Wheeler | 1 May 21:29 2011

Re: Amazon diagnosis

On Sun, May 1, 2011 at 2:18 PM, Andrew Kirch <trelane <at> trelane.net> wrote:
> Sure they can, but, as a thought exercise, full 2N redundancy is
> difficult on a small scale for anything web-facing.  I've seen a very
> simple implementation for a website requiring 5 9's that consumed over
> $50k in equipment, and this wasn't even geographically diverse.  I have

What it really boils down to is this: if application developers are
doing their jobs, a given service can be easy and inexpensive to
distribute to unrelated systems/networks without a huge infrastructure
expense.  If the developers are not, you end up spending a lot of
money on infrastructure to make up for code, databases, and APIs which
were not designed with this in mind.
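As a rough illustration of "designed with this in mind" at the client layer, the sketch below tries a list of independently hosted endpoints in order and falls through on network errors. The endpoint URLs and function names are hypothetical. The approach only helps if the application's data and APIs are actually available from every site, which is the developers' half of the job described above.

```python
# Minimal client-side failover across independent deployments.
# Hypothetical: the endpoint URLs and function names are illustrative only.
import urllib.request
from typing import Callable, List, Sequence


def http_get(url: str, timeout: float = 2.0) -> bytes:
    """Fetch a URL with a short timeout so a dead site fails fast."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(endpoints: Sequence[str],
                        fetch: Callable[[str], bytes] = http_get) -> bytes:
    """Return the first successful response, trying endpoints in order.

    URLError, HTTPError and socket timeouts are all OSError subclasses,
    so one except clause covers the usual network failures."""
    errors: List[str] = []
    for url in endpoints:
        try:
            return fetch(url)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


# Example usage (hypothetical hosts on unrelated providers):
# body = fetch_with_failover([
#     "https://app.provider-a.example/api/item/1",
#     "https://app.provider-b.example/api/item/1",
# ])
```

Passing `fetch` as a parameter keeps the failover logic independent of the transport, so it can be tested without a network or reused with a different client library.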

These same developers who do not design and implement services with
diversity and redundancy in mind will fare little better with AWS than
any other platform.  Look at Reddit, for example.  This is an
application/service which is utterly trivial to implement in a cheap,
distributed manner, yet they have failed to do so for years, and
suffer repeated, long-duration outages as a result.  They probably buy
a lot more AWS services than would otherwise be needed, and truly have
a more complex infrastructure than such a simple service should.

IT managers would do well to understand that a few smart programmers,
who understand how all their tools (web servers, databases,
filesystems, load-balancers, etc.) actually work, can often do more to
keep infrastructure cost under control, and improve the reliability of
services, than any other investment in IT resources.


-- 
Jeff S Wheeler <jsw <at> inconcepts.biz>

