Eugen Leitl | 13 Apr 11:53 2012

Ask For Forgiveness Programming - Or How We'll Program 1000 Cores


http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html

Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Tuesday, March 6, 2012 at 9:15AM

The argument for a massively multicore future is now familiar: while clock
speeds have leveled off, device density is increasing, so the future is cheap
chips with hundreds and thousands of cores. That’s the inexorable logic
behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind
is: how on earth are we to program these things? For problems that aren’t
embarrassingly parallel, we really have no idea. IBM Research’s David Ungar
has an idea. And it’s radical in the extreme...

Grace Hopper once advised “It's easier to ask for forgiveness than it is to
get permission.” I wonder if she had any idea that her strategy for dealing
with human bureaucracy would be the same strategy David Ungar thinks will help
us tame the technological bureaucracy of 1000+ core systems?

You may recognize David as the co-creator of the Self programming language,
inspiration for the HotSpot technology in the JVM and the prototype model
used by Javascript. He’s also the innovator behind using cartoon animation
techniques to build user interfaces. Now he’s applying that same creative
zeal to solving the multicore problem.

During a talk on his research, Everything You Know (about Parallel
Programming) Is Wrong! A Wild Screed about the Future, he called his approach

Brown, John Mickey | 13 Apr 16:18 2012

Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Nice write-up. It conveys the same topics I thought of during the conference. I told David Ungar that I
mostly agreed with his premise because at a macro scale (distributed computing), we've been dealing with
this issue of scaling and data staleness for decades. During requirements gathering I've often asked the
business what its tolerance is for receiving correct, timely data. Purely synchronized solutions in the
enterprise are extremely costly.

Although the application may be different and have different constraints at the micro level, there's
opportunity for the two camps (micro and enterprise) to learn practices, patterns, and innovation from
each other that can impact both fields.

I'm particularly interested in how to apply Actor-language concepts to enterprise business
applications so they become more mainstream (perhaps in the ESB and Fabric space).

Glad to find someone else that recognized the same parallels (pardon the pun).

John Mickey Brown - Application Architect

"In times of drastic change, it is the learners who inherit the future. The learned usually find themselves
equipped to live in a world that no longer exists" - Eric Hoffer

-----Original Message-----
From: fonc-bounces <at> vpri.org [mailto:fonc-bounces <at> vpri.org] On Behalf Of Eugen Leitl
Sent: Friday, April 13, 2012 5:53 AM
To: tt <at> postbiota.org; info <at> postbiota.org; forkit!
Cc: Fundamentals of New Computing
Subject: [fonc] Ask For Forgiveness Programming - Or How We'll Program 1000 Cores


http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html



Miles Fidelman | 13 Apr 16:35 2012

Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Brown, John Mickey wrote:
> Nice write-up. It conveys the same topics I thought of during the conference. I told David Ungar that
I mostly agreed with his premise because at a macro scale (distributed computing), we've been dealing
with this issue of scaling and data staleness for decades. During requirements gathering I've often asked
the business what its tolerance is for receiving correct, timely data. Purely synchronized solutions in
the enterprise are extremely costly.
>
> Although the application may be different and have different constraints at the micro level, there's
opportunity for the two camps (micro and enterprise) to learn practices, patterns, and innovation from
each other that can impact both fields.
>
> I'm particularly interested in how to apply Actor-language concepts to enterprise business
applications so they become more mainstream (perhaps in the ESB and Fabric space).
>
> Glad to find someone else that recognized the same parallels (pardon the pun).
>
FYI: Carl Hewitt, who coined the actor concept, has been doing
some interesting writing along similar lines; cf.
http://knol.google.com/k/carl-hewitt/indeterminacy-in-computation/pcxtp4rx7g1t/32#

There seems to be a growing school of thought that large, distributed
systems are always inconsistent, and we have to find new metaphors and
design patterns that take this into account.  A lot of the work in the
NoSQL space, around MVCC (multi-version concurrency control) and
"eventual consistency" as embodied in, for example, CouchDB, or for that
matter distributed version control systems like Git, is a step in this
direction.  Erlang's implementation of the Actor model is a particularly
interesting platform in this regard.

Miles Fidelman

David Goehrig | 13 Apr 17:34 2012

Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

There's a very simple concept that most of the world embraces in everything from supply chain management to personnel allocation to personal relationships.

We call it *slack*

What David is talking about amounts to introducing slack into distributed models.  Particularly this version of the definition of slack:

": lacking in completeness, finish, or perfection <a very slack piece of work>"

That is a more realistic version of computation in a universe with propagation delay (a finite speed of light). It also introduces a concept familiar to anyone who has worked with ropes: you can't tie a knot without some slack (computation being an exercise in binary sequence knot making). Finishing a computation is analogous to pulling the rope taut.

Dave




On Apr 13, 2012, at 5:53 AM, Eugen Leitl <eugen-WaCMhLcPx2TYtjvyW6yDsg@public.gmane.org> wrote:


http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html

Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Tuesday, March 6, 2012 at 9:15AM

The argument for a massively multicore future is now familiar: while clock
speeds have leveled off, device density is increasing, so the future is cheap
chips with hundreds and thousands of cores. That’s the inexorable logic
behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind
is: how on earth are we to program these things? For problems that aren’t
embarrassingly parallel, we really have no idea. IBM Research’s David Ungar
has an idea. And it’s radical in the extreme...

Grace Hopper once advised “It's easier to ask for forgiveness than it is to
get permission.” I wonder if she had any idea that her strategy for dealing
with human bureaucracy would be the same strategy David Ungar thinks will help
us tame the technological bureaucracy of 1000+ core systems?

You may recognize David as the co-creator of the Self programming language,
inspiration for the HotSpot technology in the JVM and the prototype model
used by Javascript. He’s also the innovator behind using cartoon animation
techniques to build user interfaces. Now he’s applying that same creative
zeal to solving the multicore problem.

During a talk on his research, Everything You Know (about Parallel
Programming) Is Wrong! A Wild Screed about the Future, he called his approach
“anti-lock” or “race and repair” because the core idea is that the only way
we’re going to be able to program the new multicore chips of the future is to
sidestep Amdahl’s Law and program without serialization, without locks,
embracing non-determinism. Without locks calculations will obviously be
wrong, but correct answers can be approached over time using techniques like
fresheners:

   A thread that, instead of responding to user requests, repeatedly selects
a cached value according to some strategy, and recomputes that value from its
inputs, in case the value had been inconsistent. Experimentation with a
prototype showed that on a 16-core system with a 50/50 split between workers
and fresheners, fewer than 2% of the queries would return an answer that had
been stale for at least eight mean query times. These results suggest that
tolerance of inconsistency can be an effective strategy in circumventing
Amdahl’s law.
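The freshener idea can be sketched with ordinary threads. Everything below is an illustrative toy, not Ungar's prototype: a worker answers queries straight from an unlocked cache while a freshener thread repeatedly recomputes the cached value from its inputs, so answers may be briefly stale but converge over time.

```python
import threading

# Hypothetical shared state; names are invented for illustration.
inputs = {"a": 1, "b": 2}   # ground-truth inputs, updated without locks
cache = {"sum": 3}          # derived value, allowed to drift out of date

def recompute():
    # Re-derive the cached value from its inputs (deliberately racy).
    cache["sum"] = inputs["a"] + inputs["b"]

def freshener(stop):
    # A thread that, instead of serving requests, repeatedly recomputes
    # cached values "in case the value had been inconsistent".
    while not stop.is_set():
        recompute()

def worker(results):
    # A worker answers a query from the cache without synchronizing.
    results.append(cache["sum"])

stop = threading.Event()
f = threading.Thread(target=freshener, args=(stop,))
f.start()

results = []
for i in range(100):
    inputs["a"] = i          # unsynchronized update
    worker(results)          # this answer may be stale for a short window
stop.set()
f.join()
recompute()                  # once updates cease, the cache converges
print(cache["sum"])          # 99 + 2 = 101
```

Individual entries in `results` may lag the inputs; that staleness window, not a lock, is the price paid for scalability.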

During his talk David mentioned that he’s trying to find a better name than
“anti-lock” or “race and repair” for this line of thinking. Throwing my hat
into the name game, I want to call it Ask For Forgiveness Programming (AFFP),
based on the idea that using locks is “asking for permission” programming, so
not using locks along with fresheners is really “asking for forgiveness.” I
think it works, but it’s just a thought.

No Shared Lock Goes Unpunished

Amdahl’s Law is used to understand why simply having more cores won’t save us
for a large class of problems. The idea is that any program is made up of a
serial fraction and a parallel fraction, and more cores only help with the
parallel portion. If an operation takes 10 seconds, for example, and one
second of it is serial, then infinitely many cores can only shrink the
parallelizable part; the serial code will always take one second. With that
10% serial fraction, Amdahl says you can never speed the job up by more than
a factor of ten. As long as your code has a serial portion, there is a hard
ceiling on how much faster it can go.
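The arithmetic is easy to check. Amdahl's formula gives the speedup as 1 / (s + (1 − s)/n) for serial fraction s and n cores:

```python
def amdahl_speedup(serial_fraction, cores):
    # Amdahl's Law: speedup = 1 / (s + (1 - s) / n)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# The example above: a 10-second job with 1 second serial (s = 0.1).
print(round(amdahl_speedup(0.1, 16), 2))    # 6.4
print(round(amdahl_speedup(0.1, 1000), 2))  # 9.91
# Even a billion cores never reach the 10x ceiling set by the serial 10%:
print(amdahl_speedup(0.1, 10**9) < 10.0)    # True
```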

Jakob Engblom recounts a similar line of thought in his blog:

   They also had the opportunity to test their solution [for parallel
Erlang] on a Tilera 64-core machine. This mercilessly exposed any
scalability limitations in their system, and proved the conventional wisdom
that going beyond 10+ cores is quite different from scaling from 1 to 8… The
two key lessons they learned were that no shared lock goes unpunished, and
that data has to be distributed as well as code.

   It seems that all big-system parallelization efforts turn into a hunt
for locks and the splitting up of code and data into units that can run in
parallel without having to synchronize. The “upper limit” of this process is
clearly a system with no synchronization points at all.

Sherlock Holmes says that when you have eliminated the impossible, whatever
remains, however improbable, must be the truth, so the truth is: removing
serialization is the only way to use all these cores. Since synchronization
is the core serial component to applications, we must get rid of
synchronization.

A Transaction View Isn’t as Strong as it Once Was

Getting rid of locks/synchronization may not be the radical notion it once
was. Developments over the last few years have conditioned us to deal with
more ambiguity, at least for distributed systems.

Through many discussions about CAP, we’ve come to accept a non-transactional
view of databases as a way to preserve availability in the face of
partitions. Even if a read doesn’t return the last write, we know it will
eventually and that some merge logic will make them consistent once again.

The idea of compensating transactions as a way around the performance
problems due to distributed coordination has also become more familiar.

Another related idea comes from the realm of optimistic concurrency control
that lets multiple transactions proceed in parallel without locking and then
checks at commit time if a conflict requires a rollback. The application then
gets to try again.
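Optimistic concurrency control fits in a few lines. This is a toy single-process sketch (the class and method names are invented for illustration, not any particular database's API): a transaction reads a version alongside the value, proceeds without locking, and validates the version at commit time; a mismatch signals a conflict and the application retries.

```python
class VersionedStore:
    # A toy optimistic-concurrency store with one value and a version.
    def __init__(self):
        self.value = 0
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Commit-time validation: reject if anyone committed since our read.
        if self.version != read_version:
            return False           # conflict detected; caller must retry
        self.value = new_value
        self.version += 1
        return True

def increment(store):
    # The retry loop: no locks, just roll back and try again on conflict.
    while True:
        value, version = store.read()
        if store.commit(value + 1, version):
            return

store = VersionedStore()
for _ in range(5):
    increment(store)
print(store.value)  # 5
```

memcache's compare-and-set works on the same principle: the token returned by a read plays the role of `version`, and a mismatched token at write time forces the client to retry.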

And for some time now memcache has supported a compare-and-set operation that
allows multiple clients to avoid writing over each other by comparing time
stamps.

As relaxed as these methods are, they all still require that old enemy:
synchronization.

Your Answers are Already Wrong

The core difficulty with abandoning synchronization is coming to terms with
the notion that results may not be correct at all times. It’s certain that we
love certainty, so even considering that we could get wrong answers for any
window of time is heresy.

David says we need to change our way of thinking. The idea is to get results
that are “good enough, soon enough”: answers that arrive quickly and may be
wrong, but are still right enough to be useful. Perfect is the enemy of the
good, and perfect answers simply take too long at scale.

David emphasises repeatedly that there’s a fundamental trade-off between
correctness and performance. Without locks, operations happen out of order,
which gives the wrong answer. Race conditions happen. To get correct answers
we effectively add in delays, making one thing wait for another, which kills
performance; and we don’t want to kill performance, we want to use all those
cores.

But consider an aspect of working on distributed systems that people don’t
like to think about: your answers are already always wrong.

Unless you are querying a read-only corpus or use a global lock, in
distributed systems any answer to any query is potentially wrong, always.
This is the hardest idea to get into your brain when working in distributed
systems with many cores that experience simultaneous updates. The world never
stops. It’s always in flux. You can never assume for a single moment there’s
a stable frame of reference. The answer from a query you just made could be
wrong in the next instant. A query to see if a host is up, for example, can’t
ever be assumed right. That host may be up or down in the next instant and
your code won’t know about it until it finds a conflict.

So are we really that far away from accepting that all answers to queries
could be wrong?

Lessons from Nature

One strategy for dealing with many cores is to move towards biological models
instead of mathematical models, where complicated behaviours emerge without
global determinism. Bird flocks, for example, emerge from three simple rules:
avoid crowding, steer towards the average heading of neighbors, and steer
towards the average position of neighbors. No pi-calculus required; it works
without synchronization or coordination. Each bird is essentially its own
thread: it simply looks around and makes local decisions. This is a more
cellular automaton view of the world.
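The three rules can be written as a single purely local update step. The weights and distances below are made up for illustration; the point is that each bird's update reads only its own neighborhood, with no global state and no synchronization between birds.

```python
def step(pos, vel, neighbors, crowd_dist=1.0, gain=0.1):
    """One local update for one bird; neighbors is a list of (pos, vel)."""
    if not neighbors:
        return (pos[0] + vel[0], pos[1] + vel[1]), vel
    n = len(neighbors)
    avg_pos = (sum(p[0] for p, _ in neighbors) / n,
               sum(p[1] for p, _ in neighbors) / n)
    avg_vel = (sum(v[0] for _, v in neighbors) / n,
               sum(v[1] for _, v in neighbors) / n)
    # Rule 1: avoid crowding -- steer away if the neighbors are too close.
    away = (pos[0] - avg_pos[0], pos[1] - avg_pos[1])
    too_close = (away[0] ** 2 + away[1] ** 2) ** 0.5 < crowd_dist
    sep = away if too_close else (0.0, 0.0)
    # Rule 2: steer towards the average heading of neighbors.
    align = (avg_vel[0] - vel[0], avg_vel[1] - vel[1])
    # Rule 3: steer towards the average position of neighbors.
    coh = (avg_pos[0] - pos[0], avg_pos[1] - pos[1])
    new_vel = (vel[0] + gain * (sep[0] + align[0] + coh[0]),
               vel[1] + gain * (sep[1] + align[1] + coh[1]))
    return (pos[0] + new_vel[0], pos[1] + new_vel[1]), new_vel

# One bird at the origin heading right, with one neighbor ahead of it:
new_pos, new_vel = step((0.0, 0.0), (1.0, 0.0), [((2.0, 0.0), (0.0, 1.0))])
```

Run over a whole flock, each bird calls `step` with only what it can see; the flocking emerges with no coordinator thread anywhere.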

Race and Repair - Mitigating Wrong Results

The idea is that errors created by data races won’t be prevented, they will
be repaired. A calculation made without locks will be wrong under concurrent
updates. So why not use some of our many cores to calculate the right answers
in the background, in parallel, and update the values? This approach:

   Uses no synchronization.

   Tolerates some wrong answers.

   Probabilistically fixes the answers over time.

Some obvious questions: how many background threads do you need? What order
should values be recalculated? And how wrong will your answers be?

To figure this out an experiment was run, described in Inconsistency
Robustness for Scalability in Interactive Concurrent‑Update In-Memory MOLAP
Cubes, which tests updates on a complicated spreadsheet.

With locking the results were correct, but scalability was limited. Without
locking, results were usually wrong. Both results are as might be expected.
And when they added freshener threads they found:

   Combining the frequency with the duration data suggests that in a 16-core
system with a 50/50 split between workers and fresheners, fewer than 2% of
the queries would return an answer that had been stale for at least eight
mean query times.

I found this result quite surprising. I would have expected the answers to be
wrong more of the time. Could this actually work?

Breadcrumbs

The simple idea of Race and Repair is open to a lot of clever innovations.
Breadcrumbs are one such innovation that attempts to be smarter about which
values need recalculating. Metadata is attached to a value indicating that an
entity is recalculating the value or has changed a dependent value such that
this value is now out of date. Any entity that might want to use this data
can wait until a “valid” value is calculated and/or decline to insert a
calculated value if it is out of date. This narrows the window of time in
which errors are introduced. It’s a mitigation.
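As a rough sketch (the class and field names here are hypothetical, not taken from David's work), a breadcrumb can be as small as one flag of metadata riding alongside the value:

```python
class Crumbed:
    # A cached value carrying a "breadcrumb": a flag that marks it
    # in-doubt while an input has changed or a recomputation is underway.
    def __init__(self, value):
        self.value = value
        self.in_doubt = False   # the breadcrumb

    def mark_stale(self):
        self.in_doubt = True    # a dependent value changed; flag, don't lock

    def repair(self, fresh_value):
        self.value = fresh_value
        self.in_doubt = False   # background repair clears the crumb

total = Crumbed(3)
total.mark_stale()              # an input changed somewhere
assert total.in_doubt           # readers can see the value is suspect
total.repair(7)                 # a freshener recomputes and clears the flag
print(total.value, total.in_doubt)  # 7 False
```

Readers that care can skip or wait on flagged values; readers that don't can use them anyway, which is exactly the narrowed-but-nonzero error window the article describes.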

There are endless variations of this. I can imagine remembering calculations
that used a value and then publishing updated values so those calculations
can be rerun. The result is a roiling, event-driven sea that is constantly
bubbling with updates that are trying to bring values towards correctness,
but probably never quite getting there.

Probabilistic Data Structures

Another area David has researched is hashtables that can be inserted into
without synchronization. This would allow entries to be added to a hashtable
from any number of cores without slowing down the entire system with a lock.
Naive insertion into a hashtable in parallel will mess up pointers, which can
result in the loss of values or in corrupted insertions, but it’s possible to
work around these issues and create a lock-free hashtable.
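To see the kind of pointer damage meant here, consider a deterministic, single-threaded simulation (names are illustrative) of two cores interleaving a naive insert into the same bucket of a chained hashtable. Both read the bucket head before either writes it back, so the second write silently discards the first insertion:

```python
class Node:
    # One entry in a chained-hashtable bucket.
    def __init__(self, key, nxt):
        self.key, self.next = key, nxt

bucket_head = None                 # one bucket, initially empty

# Both "cores" read the current head before either writes...
seen_by_core_a = bucket_head
seen_by_core_b = bucket_head

# ...so the second head-swap clobbers the first: a value is lost.
bucket_head = Node("from-core-a", seen_by_core_a)
bucket_head = Node("from-core-b", seen_by_core_b)

keys = []
node = bucket_head
while node:
    keys.append(node.key)
    node = node.next
print(keys)  # ['from-core-b'] -- "from-core-a" was lost
```

A CAS on the head would catch this particular race, which is why CAS-based designs are common; David's objection, per the article, is that the atomic instruction itself becomes the choke point under contention.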

His presentation goes into a lot of detail on how this might work. He rejects
lightweight locking schemes like CAS because these are still a choke point
and there’s a penalty for atomic instructions under contention. They won’t
scale.

He thinks there’s a big research opportunity in probabilistic data structures
that work without synchronization and that work with mitigation.

Is this the Future?

This is just a lightweight introduction. For more details please read all
the papers and watch all the videos. But I think it’s important to talk about
and think about how we might make use of all these cores for traditional
programming tasks, though the result may be anything but traditional.

A bad experience on one project makes me somewhat skeptical that human nature
will ever be comfortable in accepting wrong answers as the norm. My
experience report was on an event driven system with a large number of nodes
that could generate events so fast that events had to be dropped. Imagine a
city undergoing an earthquake and a huge sensor net spewing out change events
to tell the world what’s going on in real-time.

The original requirement was events could never be dropped, ever, which made
a certain amount of sense when the number of sensors was small. As the number
of sensors expands it’s simply not possible. My proposal was to drop events
and have background processes query the actual sensors in the background so
that the database would be synced over time. Very much like the proposals
here.

It was a no go. A vehement no go. Titanic arguments ensued.  Management and
everyone on down simply could not accept the idea that their models would be
out of sync with the sensors (which of course they were anyway). My eventual
solution took a year to implement and radically changed everything, but that
was simpler than trying to convince people to deal with uncertainty.

So there might be some convincing to do.

Related Articles

   Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed
About the Future (slides, video)

   Renaissance Project

   On Hacker News

   David Ungar: It is Good to be Wrong

   We Really Don't Know How To Compute!

   Harnessing emergence for manycore programming: early experience
integrating ensembles, adverbs, and object-based inheritance

   Inconsistency Robustness for Scalability in Interactive Concurrent‑Update
In-Memory MOLAP Cubes


_______________________________________________
fonc mailing list
fonc-uVco7kAcSAQ@public.gmane.org
http://vpri.org/mailman/listinfo/fonc
My</span><br><span>experience report was on an event driven system with a large number of nodes</span><br><span>that could generate events so fast that events had to be dropped. Imagine a</span><br><span>city undergoing an earthquake and a huge sensor net spewing out change events</span><br><span>to tell the world what&rsquo;s going on in real-time.</span><br><span></span><br><span>The original requirement was events could never be dropped, ever, which made</span><br><span>a certain amount of sense when the number of sensors was small. As the number</span><br><span>of sensors expands it&rsquo;s simply not possible. My proposal was to drop events</span><br><span>and have background processes query the actual sensors in the background so</span><br><span>that the database would be synced over time. Very much like the proposals</span><br><span>here.</span><br><span></span><br><span>It was a no go. A vehement no go. Titanic arguments ensued. &nbsp;Management and</span><br><span>everyone on down simply could not accept the idea that their models would be</span><br><span>out of sync with the sensors (which of course they were anyway). 
My eventual</span><br><span>solution took a year to implement and radically changed everything, but that</span><br><span>was simpler than trying to convince people to deal with uncertainty.</span><br><span></span><br><span>So there might be some convincing to do.</span><br><span></span><br><span>Related Articles</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed</span><br><span>About the Future (slides, video)</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Renaissance Project</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;On Hacker News</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;David Ungar: It is Good to be Wrong</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;We Really Don't Know How To Compute!</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Harnessing emergence for manycore programming: early experience</span><br><span>integrating ensembles, adverbs, and object-based inheritance</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Inconsistency Robustness for Scalability in Interactive Concurrent&#8209;Update</span><br><span>In-Memory MOLAP Cubes</span><br><span></span><br><span></span><br><span>_______________________________________________</span><br><span>fonc mailing list</span><br><span><a href="mailto:fonc@...">fonc@...</a></span><br><span><a href="http://vpri.org/mailman/listinfo/fonc">http://vpri.org/mailman/listinfo/fonc</a></span><br>
</div></blockquote>
</div>
Miles Fidelman | 13 Apr 17:58 2012

Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

And then the financial types take advantage of it by practicing 
arbitrage - using programmed trading to identify inconsistencies just a 
few microseconds before the next guy.

Hmmm... maybe these are implementations of "race-and-repair" at a macro 
level.

David Goehrig wrote:
> There's a very simple concept that most of the world embraces in 
> everything from supply chain management, to personnel allocations, to 
> personal relationships.
>
> We call it *slack*
>
<snip>
> Which is a more realistic version of computation in a universe with 
> propagation delay (finite speed of light). But it also introduces a 
> concept familiar to anyone who has worked with ropes. You can't tie a knot 
> without some slack. (computation being an exercise in binary sequence 
> knot making). Finishing a computation is analogous to pulling the rope 
> taut.
>
>
> On Apr 13, 2012, at 5:53 AM, Eugen Leitl <eugen@... 
> <mailto:eugen@...>> wrote:
>
>>
>> http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html
>>
>> Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
>>
<snip>
>>
>> Race and Repair - Mitigating Wrong Results
>>

--

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra

David Barbour | 14 Apr 08:03 2012

Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Another option would be to introduce slack in the propagation delay itself. E.g. if I send you a message indicating the meeting has been moved to 1:30pm, it would be a good idea to send it a good bit in advance - perhaps at 10:30am - so you can schedule and prepare. 


With computers, the popular solution - sending a message *when* we want something to happen - seems analogous to sending the message at 1:29:58pm, leaving the computer always on the edge with little time to prepare even for highly predictable events, and making the system much more vulnerable to variations in communication latency.

On Fri, Apr 13, 2012 at 8:34 AM, David Goehrig <dave-VzFlwLIqKc6rG/iDocfnWg@public.gmane.org> wrote:
There's a very simple concept that most of the world embraces in everything from supply chain management, to personnel allocations, to personal relationships. 

We call it *slack*

What Dan is talking about amounts to introducing slack into distributed models.  Particularly this version of the definition of slack:

": lacking in completeness, finish, or perfection <a very slack piece of work>"

Which is a more realistic version of computation in a universe with propagation delay (finite speed of light). But it also introduces a concept familiar to anyone who has worked with ropes. You can't tie a knot without some slack. (computation being an exercise in binary sequence knot making). Finishing a computation is analogous to pulling the rope taut. 

Dave




On Apr 13, 2012, at 5:53 AM, Eugen Leitl <eugen-WaCMhLcPx2TYtjvyW6yDsg@public.gmane.org> wrote:


http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html

Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Tuesday, March 6, 2012 at 9:15AM

The argument for a massively multicore future is now familiar: while clock
speeds have leveled off, device density is increasing, so the future is cheap
chips with hundreds and thousands of cores. That’s the inexorable logic
behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind
is: how on earth are we to program these things? For problems that aren’t
embarrassingly parallel, we really have no idea. IBM Research’s David Ungar
has an idea. And it’s radical in the extreme...

Grace Hopper once advised “It's easier to ask for forgiveness than it is to
get permission.” I wonder if she had any idea that her strategy for dealing
with human bureaucracy would be the same strategy David Ungar thinks will help
us tame the technological bureaucracy of 1000+ core systems?

You may recognize David as the co-creator of the Self programming language,
inspiration for the HotSpot technology in the JVM and the prototype model
used by Javascript. He’s also the innovator behind using cartoon animation
techniques to build user interfaces. Now he’s applying that same creative
zeal to solving the multicore problem.

During a talk on his research, Everything You Know (about Parallel
Programming) Is Wrong! A Wild Screed about the Future, he called his approach
“anti-lock” or “race and repair” because the core idea is that the only way
we’re going to be able to program the new multicore chips of the future is to
sidestep Amdahl’s Law and program without serialization, without locks,
embracing non-determinism. Without locks calculations will obviously be
wrong, but correct answers can be approached over time using techniques like
fresheners:

   A thread that, instead of responding to user requests, repeatedly selects
a cached value according to some strategy, and recomputes that value from its
inputs, in case the value had been inconsistent. Experimentation with a
prototype showed that on a 16-core system with a 50/50 split between workers
and fresheners, fewer than 2% of the queries would return an answer that had
been stale for at least eight mean query times. These results suggest that
tolerance of inconsistency can be an effective strategy in circumventing
Amdahl’s law.

During his talk David mentioned that he’s trying to find a better name than
“anti-lock” or “race and repair” for this line of thinking. Throwing my hat
into the name game, I want to call it Ask For Forgiveness Programming (AFFP),
based on the idea that using locks is “asking for permission” programming, so
not using locks along with fresheners is really “asking for forgiveness.” I
think it works, but it’s just a thought.

No Shared Lock Goes Unpunished

Amdahl’s Law is used to understand why simply having more cores won’t save us
for a large class of problems. The idea is that any program is made up of a
serial fraction and a parallel fraction. More cores only help you with the
parallel portion. If an operation takes 10 seconds, for example, and one
second of it is serial, then even infinitely many cores can only speed up the
parallelizable part; the serial code will always take one second. Amdahl says
you can never beat that serial 10%: the operation can never finish in less
than one second, for a maximum speedup of 10x, no matter how many cores you
add. As long as your code has a serial portion it’s impossible to go faster.
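
Amdahl's bound is easy to check numerically. A minimal sketch using the
10-second/10%-serial figures from the example above (`amdahl_speedup` is an
illustrative helper, not from the article):

```python
def amdahl_speedup(serial_fraction, cores):
    """Maximum speedup with `cores` processors when `serial_fraction`
    of the work cannot be parallelized (Amdahl's Law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# The example above: a 10-second operation with 1 serial second (10%).
for cores in (1, 8, 64, 1000):
    print(cores, round(amdahl_speedup(0.10, cores), 2))

# However many cores you add, speedup is capped at 1 / 0.10 = 10x:
# the operation can never finish in less than its one serial second.
```

Running this shows the speedup curve flattening out well below 10x even at a
thousand cores, which is the whole motivation for attacking serialization
itself.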

Jakob Engblom recounts a similar line of thought in his blog:

   They also had the opportunity to test their solution [for parallel
Erlang] on Tilera 64-core machines. This mercilessly exposed any
scalability limitations in their system, and proved the conventional wisdom
that going beyond 10+ cores is quite different from scaling from 1 to 8… The
two key lessons they learned were that no shared lock goes unpunished, and
data has to be distributed as well as code.

   It seems that all big-system parallelization efforts turn into a hunt
for locks and the splitting up of code and data into units that can run in
parallel without having to synchronize. The “upper limit” of this process is
clearly a system with no synchronization points at all.

Sherlock Holmes says that when you have eliminated the impossible, whatever
remains, however improbable, must be the truth, so the truth is: removing
serialization is the only way to use all these cores. Since synchronization
is the core serial component of applications, we must get rid of
synchronization.

A Transaction View Isn’t as Strong as it Once Was

Getting rid of locks/synchronization may not be the radical notion it once
was. Developments over the last few years have conditioned us to deal with
more ambiguity, at least for distributed systems.

Through many discussions about CAP, we’ve come to accept a non-transactional
view of databases as a way to preserve availability in the face of
partitions. Even if a read doesn’t return the last write, we know it will
eventually and that some merge logic will make them consistent once again.

The idea of compensating transactions as a way around the performance
problems due to distributed coordination has also become more familiar.

Another related idea comes from the realm of optimistic concurrency control
that lets multiple transactions proceed in parallel without locking and then
checks at commit time if a conflict requires a rollback. The application then
gets to try again.

And for some time now memcache has supported a compare-and-set operation that
allows multiple clients to avoid writing over each other by comparing time
stamps.
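
The compare-and-set retry pattern is easy to sketch. Below is a toy in-memory
stand-in (memcache's real `gets`/`cas` protocol differs in detail; the
`CasStore` class and its internal lock, which stands in for the server's
atomicity, are assumptions for illustration):

```python
import threading

class CasStore:
    """Toy compare-and-set store in the spirit of memcache's gets/cas:
    a write succeeds only if the version token is unchanged since the read."""
    def __init__(self):
        self._data = {}                 # key -> (value, version)
        self._lock = threading.Lock()   # stands in for the server's atomicity

    def gets(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        with self._lock:
            if self._data.get(key, (None, 0))[1] != version:
                return False            # someone else wrote first: caller retries
            self._data[key] = (value, version + 1)
            return True

def increment(store, key):
    while True:                         # optimistic retry loop, no client-side lock
        value, version = store.gets(key)
        if store.cas(key, (value or 0) + 1, version):
            return

store = CasStore()
threads = [threading.Thread(target=increment, args=(store, "hits"))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(store.gets("hits")[0])  # prints 8: no increment is lost
```

Note that even this "relaxed" scheme still serializes at the store, which is
exactly the objection raised below.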

As relaxed as these methods are, they all still require that old enemy:
synchronization.

Your Answers are Already Wrong

The core difficulty with abandoning synchronization is coming to terms with
the notion that results may not be correct at all times. It’s a certainty we
love certainty, so even considering we could get wrong answers for any window
of time is heresy.

David says we need to change our way of thinking. The idea is to get results
that are “good enough, soon enough”: get answers quickly that may be wrong but
are still right enough to be useful. Perfect is the enemy of the good, and perfect
answers simply take too long at scale.

David emphasises repeatedly that there’s a fundamental trade-off between
correctness and performance. Without locks operations happen out of order,
which gives the wrong answer. Race conditions happen. To get correct answers
we effectively add in delays, making one thing wait for another, which kills
performance, and we don’t want to kill performance, we want to use all those
cores.

But consider an aspect of working on distributed systems that people don’t
like to think about: your answers are already always wrong.

Unless you are querying a read-only corpus or use a global lock, in
distributed systems any answer to any query is potentially wrong, always.
This is the hardest idea to get into your brain when working in distributed
systems with many cores that experience simultaneous updates. The world never
stops. It’s always in flux. You can never assume for a single moment there’s
a stable frame of reference. The answer from a query you just made could be
wrong in the next instant. A query to see if a host is up, for example, can’t
ever be assumed right. That host may be up or down in the next instant and
your code won’t know about it until it finds a conflict.

So are we really that far away from accepting that all answers to queries
could be wrong?

Lessons from Nature

One strategy for dealing with many cores is to move towards biological models
instead of mathematical models, where complicated behaviours emerge without
global determinism. Bird flocks, for example, emerge from three simple rules:
avoid crowding, steer towards the average heading of neighbors, and steer
towards the average position of neighbors. No pi-calculus required; it works
without synchronization or coordination. Each bird is essentially its own
thread; it simply looks around and makes local decisions. This is a more cellular
automaton view of the world.
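
Those three rules fit in a few lines. A minimal sketch (the radii and
steering weights are arbitrary assumptions, not taken from any particular
flocking paper):

```python
import math

def step(birds, radius=5.0, crowd=1.0):
    """One local update per bird; no shared mutable state, no synchronization.
    Each bird is a dict with position (x, y) and velocity (vx, vy)."""
    new = []
    for b in birds:
        neighbors = [o for o in birds if o is not b and
                     math.dist((b["x"], b["y"]), (o["x"], o["y"])) < radius]
        vx, vy = b["vx"], b["vy"]
        if neighbors:
            n = len(neighbors)
            # Rule 2: steer towards the average heading of neighbors.
            vx += sum(o["vx"] for o in neighbors) / n * 0.1
            vy += sum(o["vy"] for o in neighbors) / n * 0.1
            # Rule 3: steer towards the average position of neighbors.
            cx = sum(o["x"] for o in neighbors) / n
            cy = sum(o["y"] for o in neighbors) / n
            vx += (cx - b["x"]) * 0.05
            vy += (cy - b["y"]) * 0.05
            # Rule 1: avoid crowding -- move away from birds that are too close.
            for o in neighbors:
                if math.dist((b["x"], b["y"]), (o["x"], o["y"])) < crowd:
                    vx += (b["x"] - o["x"]) * 0.1
                    vy += (b["y"] - o["y"]) * 0.1
        new.append({"x": b["x"] + vx, "y": b["y"] + vy, "vx": vx, "vy": vy})
    return new
```

Each bird reads its neighbors' current state and writes only its own next
state; run `step` repeatedly and the headings drift together with no global
coordinator anywhere.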

Race and Repair - Mitigating Wrong Results

The idea is that errors created by data races won’t be prevented, they will
be repaired. A calculation made without locks will be wrong under concurrent
updates. So why not use some of our many cores to calculate the right answers
in the background, in parallel, and update the values? This approach:

   Uses no synchronization.

   Tolerates some wrong answers.

   Probabilistically fixes the answers over time.
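
The mechanics fit in a short threaded sketch (an illustration of the pattern,
not Ungar's implementation; the names here are invented). A worker mutates
base data with no locks while a freshener repeatedly recomputes a cached
aggregate; queries against the cache may be stale mid-run, but the cache
converges once updates quiesce:

```python
import threading

inputs = [1] * 100             # base data, updated concurrently without locks
cache = {"sum": sum(inputs)}   # derived value that queries read; may be stale
done = threading.Event()

def worker():
    # Mutates the base data with no synchronization; the cached sum goes stale.
    for i in range(100):
        inputs[i] += 1
    done.set()

def freshener():
    # Repeatedly recomputes the cached value from its inputs,
    # repairing races rather than preventing them.
    while not done.is_set():
        cache["sum"] = sum(inputs)
    cache["sum"] = sum(inputs)  # one final repair pass after updates stop

t1 = threading.Thread(target=worker)
t2 = threading.Thread(target=freshener)
t1.start(); t2.start()
t1.join(); t2.join()
assert cache["sum"] == sum(inputs) == 200  # repaired once updates quiesce
```

While the worker is running, a read of `cache["sum"]` can return any value
between 100 and 200; the design accepts that window of wrongness in exchange
for never blocking.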

Some obvious questions: how many background threads do you need? What order
should values be recalculated? And how wrong will your answers be?

To figure this out an experiment was run and described in Inconsistency
Robustness for Scalability in Interactive Concurrent‑Update In-Memory MOLAP
Cubes, which tests updates on a complicated spreadsheet.

With locking the results were correct, but scalability was limited. Without
locking, results were usually wrong. Both results are as might be expected.
And when they added freshener threads they found:

   Combining the frequency with the duration data suggests that in a 16-core
system with a 50/50 split between workers and fresheners, fewer than 2% of
the queries would return an answer that had been stale for at least eight
mean query times.

I found this result quite surprising. I would have expected the answers to be
wrong more of the time. Could this actually work?

Breadcrumbs

The simple idea of Race and Repair is open to a lot of clever innovations.
Breadcrumbs are one such innovation that attempts to be smarter about which
values need recalculating. Meta-data is attached to a value indicating that an
entity is recalculating the value or has changed a dependent value such that
this value is now out of date. Any entity that might want to use this data
can wait until a “valid” value is calculated and/or decline to insert a
calculated value if it is out of date. This narrows the window of time in
which errors are introduced. It’s a mitigation.
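
One way to sketch the idea (a hypothetical design with invented names; the
actual breadcrumb scheme may differ): attach a stamp to each value, bump it
whenever a dependency changes, and refuse to install a recomputed result
whose stamp has gone stale underneath it.

```python
class Cell:
    """A cached value with a breadcrumb: a stamp bumped whenever a dependency
    changes. A recomputation only installs its result if nothing changed
    underneath it while it ran."""
    def __init__(self, value):
        self.value = value
        self.stamp = 0            # bumped on every dependency change

    def mark_stale(self):
        self.stamp += 1

    def begin_refresh(self):
        return self.stamp         # remember what the world looked like

    def end_refresh(self, token, new_value):
        if token != self.stamp:
            return False          # out of date again: drop it (mitigation)
        self.value = new_value
        return True

cell = Cell(10)
token = cell.begin_refresh()
cell.mark_stale()                               # a dependency changed mid-recompute
assert cell.end_refresh(token, 42) is False     # stale result is not installed
assert cell.value == 10
token = cell.begin_refresh()
assert cell.end_refresh(token, 42) is True      # a clean recompute lands
```

The window for wrong answers shrinks but never fully closes: a dependency can
still change in the instant after `end_refresh` succeeds.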

There are endless variations of this. I can imagine remembering calculations
that used a value and then publishing updated values so those calculations
can be rerun. The result is a roiling, event-driven sea that is constantly
bubbling with updates that are trying to bring values towards correctness,
but probably never quite getting there.

Probabilistic Data Structures

Another area David has researched is hashtables that can be inserted into
without synchronization. This would allow entries to be added to a hashtable
from any number of cores without slowing down the entire system with a lock.
Naive parallel insertion into a hashtable will mess up pointers, losing values
or corrupting the structure, but it’s possible to work around these issues and
create a lock-free hashtable.
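
The failure mode is easy to reproduce by hand. The sketch below stages the
race explicitly in a single thread, interleaving two unsynchronized inserts
into a toy linear-probing table so that one entry silently vanishes:

```python
table = [None] * 8   # toy open-addressing hashtable, no locking

def probe(key):
    """Linear probing: first empty slot starting at the key's home bucket."""
    i = key % 8
    while table[i] is not None:
        i = (i + 1) % 8
    return i

# Stage the race: both "threads" probe before either one writes.
slot_a = probe(0)          # thread A: key 0 hashes to slot 0, which is empty
slot_b = probe(8)          # thread B: key 8 also hashes to slot 0, still empty
table[slot_a] = (0, "A")   # thread A writes its entry
table[slot_b] = (8, "B")   # thread B overwrites the same slot: A is lost

assert (8, "B") in table and (0, "A") not in table
```

A real lock-free design has to make the probe-and-write step safe without a
lock; the text notes that Ungar rejects doing so with CAS because contended
atomic instructions become their own choke point.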

His presentation goes into a lot of detail on how this might work. He rejects
light-weight locking schemes like CAS because these are still a choke point
and there’s a penalty for atomic instructions under contention. They won’t
scale.

He thinks there’s a big research opportunity in probabilistic data structures
that work without synchronization and that work with mitigation.

Is this the Future?

This is just a lightweight introduction. For more details please read all
the papers and watch all the videos. But I think it’s important to talk about
and think about how we might make use of all these cores for traditional
programming tasks, though the result may be anything but traditional.

A bad experience on one project makes me somewhat skeptical that human nature
will ever be comfortable in accepting wrong answers as the norm. My
experience report was on an event driven system with a large number of nodes
that could generate events so fast that events had to be dropped. Imagine a
city undergoing an earthquake and a huge sensor net spewing out change events
to tell the world what’s going on in real-time.

The original requirement was events could never be dropped, ever, which made
a certain amount of sense when the number of sensors was small. As the number
of sensors expanded, it was simply not possible. My proposal was to drop events
and have background processes query the actual sensors in the background so
that the database would be synced over time. Very much like the proposals
here.

It was a no go. A vehement no go. Titanic arguments ensued. Management and
everyone on down simply could not accept the idea that their models would be
out of sync with the sensors (which of course they were anyway). My eventual
solution took a year to implement and radically changed everything, but that
was simpler than trying to convince people to deal with uncertainty.

So there might be some convincing to do.

Related Articles

   Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed
About the Future (slides, video)

   Renaissance Project

   On Hacker News

   David Ungar: It is Good to be Wrong

   We Really Don't Know How To Compute!

   Harnessing emergence for manycore programming: early experience
integrating ensembles, adverbs, and object-based inheritance

   Inconsistency Robustness for Scalability in Interactive Concurrent‑Update
In-Memory MOLAP Cubes


_______________________________________________
fonc mailing list
fonc-uVco7kAcSAQ@public.gmane.org
http://vpri.org/mailman/listinfo/fonc





--
bringing s-words to a pen fight
Bird flocks, for example, emerge from three simple rules:</span><br><span>avoid crowding, steer towards average heading of neighbors, &nbsp;steer towards</span><br><span>average position of neighbors. No pi-calculus required, it works without</span><br><span>synchronization or coordination. Each bird is essentially its own thread, it</span><br><span>simply looks around and makes local decisions. This is a more cellular</span><br><span>automaton view of the world.</span><br><span></span><br><span>Race and Repair - Mitigating Wrong Results</span><br><span></span><br><span>The idea is that errors created by data races won&rsquo;t be prevented, they will</span><br><span>be repaired. A calculation made without locks will be wrong under concurrent</span><br><span>updates. So why not use some of our many cores to calculate the right answers</span><br><span>in the background, in parallel, and update the values? This approach:</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Uses no synchronization.</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Tolerates some wrong answers.</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Probabilistically fixes the answers over time.</span><br><span></span><br><span>Some obvious questions: how many background threads do you need? What order</span><br><span>should values be recalculated? And how wrong will your answers be?</span><br><span></span><br><span>To figure this out an experiment was run and described in Inconsistency</span><br><span>Robustness for Scalability in Interactive Concurrent&#8209;Update In-Memory MOLAP</span><br><span>Cubes, which test updates on a complicated spreadsheet.</span><br><span></span><br><span>With locking the results were correct, but scalability was limited. Without</span><br><span>locking, results were usually wrong. 
Both results are as might be expected.</span><br><span>And when they added freshener threads they found:</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Combining the frequency with the duration data suggests that in a 16-core</span><br><span>system with a 50/50 split between workers and fresheners, fewer than 2% of</span><br><span>the queries would return an answer that had been stale for at least eight</span><br><span>mean query times.</span><br><span></span><br><span>I found this result quite surprising. I would have expected the answers to be</span><br><span>wrong more of the time. Could this actually work?</span><br><span></span><br><span>Breadcrumbs</span><br><span></span><br><span>The simple idea of Race and Repair is open to a lot of clever innovations.</span><br><span>Breadcrumbs are one such innovation that attempts to be smarter about which</span><br><span>values need recalculating. Meta-data is attached on a value indicating that a</span><br><span>entity is recalculating the value or has changed a dependent value such that</span><br><span>this value is now out of date. Any entity that might want to use this data</span><br><span>can wait until a &ldquo;valid&rdquo; value is calculated and/or not to insert a</span><br><span>calculated value if it is out of data. This narrows the window of time in</span><br><span>which errors are introduced. It&rsquo;s a mitigation.</span><br><span></span><br><span>There are endless variations of this. I can imagine remembering calculations</span><br><span>that used a value and then publishing updated values so those calculations</span><br><span>can be rerun. 
The result is a roiling &nbsp;event driven sea that is constantly</span><br><span>bubbling with updates that are trying to bring values towards correctness,</span><br><span>but probably never quite getting there.</span><br><span></span><br><span>Probabilistic Data Structures</span><br><span></span><br><span>Another area David has researched are hashtables that can be inserted into</span><br><span>without synchronization. This would allow entries to be added to a hashtable</span><br><span>from any number of cores without slowing down the entire system with a lock.</span><br><span>Naive insertion into a hashtable in parallel will mess up pointers, which can</span><br><span>either result in the loss of values or the insertion of values, but it&rsquo;s</span><br><span>possible to work around these issues and create a lock free hashtable.</span><br><span></span><br><span>His presentation goes into a lot of detail on how this might work. He rejects</span><br><span>light-weight locking schmes like CAS because these are still a choke point</span><br><span>and there&rsquo;s a penalty for atomic instructions under contention. They won&rsquo;t</span><br><span>scale.</span><br><span></span><br><span>He thinks there&rsquo;s a big research opportunity in probabilistic data structures</span><br><span>that work without synchronization and that work with mitigation.</span><br><span></span><br><span>Is this the Future?</span><br><span></span><br><span>This is just a light weight introduction. For more details please read all</span><br><span>the papers and watch all the videos. But I think it&rsquo;s important to talk about</span><br><span>and think about how we might make use of all these cores for traditional</span><br><span>programming tasks, though the result may be anything but traditional.</span><br><span></span><br><span>A bad experience on one project makes me somewhat skeptical that human nature</span><br><span>will ever be comfortable in accepting wrong answers as the norm. 
My</span><br><span>experience report was on an event driven system with a large number of nodes</span><br><span>that could generate events so fast that events had to be dropped. Imagine a</span><br><span>city undergoing an earthquake and a huge sensor net spewing out change events</span><br><span>to tell the world what&rsquo;s going on in real-time.</span><br><span></span><br><span>The original requirement was events could never be dropped, ever, which made</span><br><span>a certain amount of sense when the number of sensors was small. As the number</span><br><span>of sensors expands it&rsquo;s simply not possible. My proposal was to drop events</span><br><span>and have background processes query the actual sensors in the background so</span><br><span>that the database would be synced over time. Very much like the proposals</span><br><span>here.</span><br><span></span><br><span>It was a no go. A vehement no go. Titanic arguments ensued. &nbsp;Management and</span><br><span>everyone on down simply could not accept the idea that their models would be</span><br><span>out of sync with the sensors (which of course they were anyway). 
My eventual</span><br><span>solution took a year to implement and radically changed everything, but that</span><br><span>was simpler than trying to convince people to deal with uncertainty.</span><br><span></span><br><span>So there might be some convincing to do.</span><br><span></span><br><span>Related Articles</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed</span><br><span>About the Future (slides, video)</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Renaissance Project</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;On Hacker News</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;David Ungar: It is Good to be Wrong</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;We Really Don't Know How To Compute!</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Harnessing emergence for manycore programming: early experience</span><br><span>integrating ensembles, adverbs, and object-based inheritance</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Inconsistency Robustness for Scalability in Interactive Concurrent&#8209;Update</span><br><span>In-Memory MOLAP Cubes</span><br><span></span><br><span></span><br><span>_______________________________________________</span><br><span>fonc mailing list</span><br><span><a href="mailto:fonc@..." target="_blank">fonc@...</a></span><br><span><a href="http://vpri.org/mailman/listinfo/fonc" target="_blank">http://vpri.org/mailman/listinfo/fonc</a></span><br>
</div></blockquote>
</div></div>
</div>
<br>_______________________________________________<br>
fonc mailing list<br><a href="mailto:fonc@...">fonc@...</a><br><a href="http://vpri.org/mailman/listinfo/fonc" target="_blank">http://vpri.org/mailman/listinfo/fonc</a><br><br>
</blockquote>
</div>
<br><br clear="all"><div><br></div>-- <br>bringing s-words to a pen fight<br>
</div></div></div></div>
</div>
Alan Kay | 14 Apr 13:22 2012

Re: Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

This is a good idea (and, interestingly, was a common programming style in the first Simula) ...

Cheers,

Alan

From: David Barbour <dmbarbour <at> gmail.com>
To: Fundamentals of New Computing <fonc-uVco7kAcSAQ@public.gmane.org>
Sent: Friday, April 13, 2012 11:03 PM
Subject: Re: [fonc] Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Another option would be to introduce slack in the propagation delay itself. E.g. if I send you a message indicating the meeting has been moved to 1:30pm, it would be a good idea to send it a good bit in advance - perhaps at 10:30am - so you can schedule and prepare. 

With computers, the popular solution - sending a message *when* we want something to happen - seems analogous to sending the message at 1:29:58pm, leaving the computer always on the edge with little time to prepare even for highly predictable events, and making the system much more vulnerable to variations in communication latency.

On Fri, Apr 13, 2012 at 8:34 AM, David Goehrig <dave-VzFlwLIqKc6rG/iDocfnWg@public.gmane.org> wrote:
There's a very simple concept that most of the world embraces in everything from supply chain management, to personnel allocations, to personal relationships. 

We call it *slack*

What Dan is talking about amounts to introducing slack into distributed models.  Particularly this version of the definition of slack:

": lacking in completeness, finish, or perfection <a very slack piece of work>"

Which is a more realistic version of computation in a universe with propagation delay (finite speed of light). But it also introduces a concept familiar to anyone who has handled ropes. You can't tie a knot without some slack. (Computation being an exercise in binary sequence knot making.) Finishing a computation is analogous to pulling the rope taut.

Dave




On Apr 13, 2012, at 5:53 AM, Eugen Leitl <eugen-WaCMhLcPx2TYtjvyW6yDsg@public.gmane.org> wrote:


http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html

Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

Tuesday, March 6, 2012 at 9:15AM

The argument for a massively multicore future is now familiar: while clock
speeds have leveled off, device density is increasing, so the future is cheap
chips with hundreds and thousands of cores. That’s the inexorable logic
behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind
is: how on earth are we to program these things? For problems that aren’t
embarrassingly parallel, we really have no idea. IBM Research’s David Ungar
has an idea. And it’s radical in the extreme...

Grace Hopper once advised “It's easier to ask for forgiveness than it is to
get permission.” I wonder if she had any idea that her strategy for dealing
with human bureaucracy would be the same strategy David Ungar thinks will help
us tame the technological bureaucracy of 1000+ core systems?

You may recognize David as the co-creator of the Self programming language,
inspiration for the HotSpot technology in the JVM and the prototype model
used by JavaScript. He’s also the innovator behind using cartoon animation
techniques to build user interfaces. Now he’s applying that same creative
zeal to solving the multicore problem.

During a talk on his research, Everything You Know (about Parallel
Programming) Is Wrong! A Wild Screed about the Future, he called his approach
“anti-lock” or “race and repair” because the core idea is that the only way
we’re going to be able to program the new multicore chips of the future is to
sidestep Amdahl’s Law and program without serialization, without locks,
embracing non-determinism. Without locks, calculations will obviously be
wrong, but correct answers can be approached over time using techniques like
fresheners:

   A thread that, instead of responding to user requests, repeatedly selects
a cached value according to some strategy, and recomputes that value from its
inputs, in case the value had been inconsistent. Experimentation with a
prototype showed that on a 16-core system with a 50/50 split between workers
and fresheners, fewer than 2% of the queries would return an answer that had
been stale for at least eight mean query times. These results suggest that
tolerance of inconsistency can be an effective strategy in circumventing
Amdahl’s law.

During his talk David mentioned that he’s trying to find a better name than
“anti-lock” or “race and repair” for this line of thinking. Throwing my hat
into the name game, I want to call it Ask For Forgiveness Programming (AFFP),
based on the idea that using locks is “asking for permission” programming, so
not using locks along with fresheners is really “asking for forgiveness.” I
think it works, but it’s just a thought.

No Shared Lock Goes Unpunished

Amdahl’s Law explains why simply having more cores won’t save us for a large
class of problems. The idea is that any program is made up of a serial
fraction and a parallel fraction, and more cores only help with the parallel
portion. If an operation takes 10 seconds, for example, and one second of it
is serial, then infinitely many cores can only shrink the parallelizable nine
seconds; the serial code will always take one second. Amdahl says you can
never beat that one second, so the speedup is capped at 10x no matter how many
cores you add. As long as your code has a serial portion, that portion puts a
hard ceiling on how much faster you can go.
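The ceiling above is easy to compute. A minimal sketch of Amdahl's formula (the function name is my own):

```python
def amdahl_speedup(serial_fraction, cores):
    """Maximum speedup of a program whose serial_fraction cannot be
    parallelized, per Amdahl's Law: S = 1 / (f + (1 - f) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# A job that is 10% serial tops out at 10x, no matter the core count.
print(amdahl_speedup(0.1, 16))     # 6.4 on 16 cores
print(amdahl_speedup(0.1, 10**9))  # just under 10.0: the hard ceiling
```

Even a billion cores cannot push the 10%-serial job past a 10x speedup, which is the whole motivation for attacking the serial portion itself.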

Jakob Engblom recounts a similar line of thought in his blog:

   They also had the opportunity to test their solution [for parallel
Erlang] on a Tilera 64-core machine. This mercilessly exposed any
scalability limitations in their system, and proved the conventional wisdom
that going beyond 10+ cores is quite different from scaling from 1 to 8… The
two key lessons they learned were that no shared lock goes unpunished, and
data has to be distributed as well as code.

   It seems that all big-system parallelization efforts turn into a hunt
for locks and the splitting up of code and data into units that can run in
parallel without having to synchronize. The “upper limit” of this process is
clearly a system with no synchronization points at all.

Sherlock Holmes says that when you have eliminated the impossible, whatever
remains, however improbable, must be the truth, so the truth is: removing
serialization is the only way to use all these cores. Since synchronization
is the core serial component of applications, we must get rid of
synchronization.

A Transaction View Isn’t as Strong as it Once Was

Getting rid of locks/synchronization may not be the radical notion it once
was. Developments over the last few years have conditioned us to deal with
more ambiguity, at least for distributed systems.

Through many discussions about CAP, we’ve come to accept a non-transactional
view of databases as a way to preserve availability in the face of
partitions. Even if a read doesn’t return the last write, we know it will
eventually and that some merge logic will make them consistent once again.

The idea of compensating transactions as a way around the performance
problems due to distributed coordination has also become more familiar.

Another related idea comes from the realm of optimistic concurrency control
that lets multiple transactions proceed in parallel without locking and then
checks at commit time if a conflict requires a rollback. The application then
gets to try again.

And for some time now memcache has supported a compare-and-set operation that
allows multiple clients to avoid writing over each other by comparing time
stamps.
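The compare-and-set pattern can be sketched with a toy in-memory store; the class and method names below are mine, not memcached's actual API:

```python
class CasStore:
    """Toy in-memory store with a versioned compare-and-set."""
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def gets(self, key):
        # Return (version, value); a missing key reads as version 0.
        return self._data.get(key, (0, None))

    def cas(self, key, expected_version, value):
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            return False  # someone else wrote first; caller must retry
        self._data[key] = (version + 1, value)
        return True

def increment(store, key):
    """Optimistic retry loop: read, compute, attempt to write, repeat."""
    while True:
        version, value = store.gets(key)
        if store.cas(key, version, (value or 0) + 1):
            return
```

Note that even this lock-free-looking retry loop depends on the compare-and-set itself being atomic, which is exactly the residual synchronization the article is complaining about.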

As relaxed as these methods are, they all still require that old enemy:
synchronization.

Your Answers are Already Wrong

The core difficulty with abandoning synchronization is coming to terms with
the notion that results may not be correct at all times. It’s a certainty
that we love certainty, so even considering that we could get wrong answers
for any window of time is heresy.

David says we need to change our way of thinking. The idea is to get results
that are “good enough, soon enough”: get answers quickly, even wrong ones, as
long as they are still right enough to be useful. Perfect is the enemy of the good, and perfect
answers simply take too long at scale.

David emphasizes repeatedly that there’s a fundamental trade-off between
correctness and performance. Without locks, operations happen out of order,
which gives the wrong answer. Race conditions happen. To get correct answers
we effectively add in delays, making one thing wait for another. That kills
performance, and we don’t want to kill performance; we want to use all those
cores.

But consider an aspect of working on distributed systems that people don’t
like to think about: your answers are already always wrong.

Unless you are querying a read-only corpus or use a global lock, in
distributed systems any answer to any query is potentially wrong, always.
This is the hardest idea to get into your brain when working in distributed
systems with many cores that experience simultaneous updates. The world never
stops. It’s always in flux. You can never assume for a single moment there’s
a stable frame of reference. The answer from a query you just made could be
wrong in the next instant. A query to see if a host is up, for example, can’t
ever be assumed right. That host may be up or down in the next instant and
your code won’t know about it until it finds a conflict.

So are we really that far away from accepting that all answers to queries
could be wrong?

Lessons from Nature

One strategy for dealing with many cores is to move towards biological models
instead of mathematical models, where complicated behaviours emerge without
global determinism. Bird flocks, for example, emerge from three simple rules:
avoid crowding, steer towards the average heading of neighbors, and steer
towards the average position of neighbors. No pi-calculus required; it works
without synchronization or coordination. Each bird is essentially its own
thread: it simply looks around and makes local decisions. This is a more cellular
automaton view of the world.
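Those three rules fit in a few lines. In this sketch the rule weights and data layout are my own arbitrary choices, not from the talk; each bird updates itself from purely local information, with no global coordinator:

```python
def step(birds, separation=0.5, alignment=0.05, cohesion=0.01):
    """birds: list of dicts with 'pos' and 'vel' (x, y) tuples.
    Returns the flock's next state after one local decision per bird."""
    out = []
    for b in birds:
        others = [o for o in birds if o is not b]
        n = len(others)
        # Rule 1: avoid crowding (steer away from neighbors' positions).
        sep = [sum(b['pos'][i] - o['pos'][i] for o in others) for i in (0, 1)]
        # Rule 2: steer towards the average heading (velocity) of neighbors.
        ali = [sum(o['vel'][i] for o in others) / n - b['vel'][i] for i in (0, 1)]
        # Rule 3: steer towards the average position of neighbors.
        coh = [sum(o['pos'][i] for o in others) / n - b['pos'][i] for i in (0, 1)]
        vel = tuple(b['vel'][i] + separation * sep[i] + alignment * ali[i]
                    + cohesion * coh[i] for i in (0, 1))
        out.append({'pos': tuple(b['pos'][i] + vel[i] for i in (0, 1)),
                    'vel': vel})
    return out

flock = [{'pos': (0.0, 0.0), 'vel': (0.0, 0.0)},
         {'pos': (1.0, 0.0), 'vel': (0.0, 0.0)}]
flock = step(flock)  # every bird decided locally; nothing was synchronized
```

In a real many-core version each bird would be its own thread reading possibly stale neighbor state, and the flock would still cohere.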

Race and Repair - Mitigating Wrong Results

The idea is that errors created by data races won’t be prevented, they will
be repaired. A calculation made without locks will be wrong under concurrent
updates. So why not use some of our many cores to calculate the right answers
in the background, in parallel, and update the values? This approach:

   Uses no synchronization.

   Tolerates some wrong answers.

   Probabilistically fixes the answers over time.
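The worker/freshener split can be sketched as a deterministic, single-threaded simulation (all names below are mine; a real system would run workers and fresheners as concurrent threads):

```python
class Cube:
    """Toy stand-in for a MOLAP-style cache: inputs plus one derived value."""
    def __init__(self, inputs):
        self.inputs = dict(inputs)
        self.total = sum(self.inputs.values())  # cached derived value

    def worker_update(self, key, value):
        # No lock and no recompute: the cached total is now possibly wrong.
        self.inputs[key] = value

    def freshen(self):
        # Freshener: recompute the derived value from its inputs, repairing
        # any staleness the workers left behind.
        self.total = sum(self.inputs.values())

cube = Cube({"a": 1, "b": 2})
cube.worker_update("a", 10)
stale = cube.total     # 3: wrong, but available immediately
cube.freshen()
repaired = cube.total  # 12: eventually corrected
```

Queries that land between the update and the freshen see a stale answer, which is exactly the window the experiment below measures.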

Some obvious questions: how many background threads do you need? What order
should values be recalculated? And how wrong will your answers be?

To figure this out an experiment was run and described in Inconsistency
Robustness for Scalability in Interactive Concurrent‑Update In-Memory MOLAP
Cubes, which tests updates on a complicated spreadsheet.

With locking the results were correct, but scalability was limited. Without
locking, results were usually wrong. Both results are as might be expected.
And when they added freshener threads they found:

   Combining the frequency with the duration data suggests that in a 16-core
system with a 50/50 split between workers and fresheners, fewer than 2% of
the queries would return an answer that had been stale for at least eight
mean query times.

I found this result quite surprising. I would have expected the answers to be
wrong more of the time. Could this actually work?

Breadcrumbs

The simple idea of Race and Repair is open to a lot of clever innovations.
Breadcrumbs are one such innovation that attempts to be smarter about which
values need recalculating. Meta-data is attached to a value indicating that an
entity is recalculating the value or has changed a dependent value such that
this value is now out of date. Any entity that might want to use this data
can wait until a “valid” value is calculated and/or decline to insert a
calculated value if it is out of date. This narrows the window of time in
which errors are introduced. It’s a mitigation.
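A minimal breadcrumb sketch (the class and field names are mine, not from the research): a value carries meta-data saying whether someone is recomputing it or an input changed underneath it, so readers can decide how much to trust a read.

```python
class Breadcrumb:
    """A cached value plus the 'breadcrumb' meta-data left on it."""
    def __init__(self, value):
        self.value = value
        self.stale = False        # a dependent value changed underneath it
        self.recomputing = False  # some entity is recalculating it

    def read(self):
        # Return the value along with whether it can currently be trusted;
        # a caller may use it anyway, wait, or skip writing results based on it.
        return self.value, not (self.stale or self.recomputing)

cell = Breadcrumb(42)
cell.stale = True                 # a dependent value just changed
value, trustworthy = cell.read()  # the value is usable, but flagged
```

The breadcrumb doesn't prevent the wrong read; it just lets the reader narrow the window in which a wrong value propagates.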

There are endless variations of this. I can imagine remembering calculations
that used a value and then publishing updated values so those calculations
can be rerun. The result is a roiling event-driven sea that is constantly
bubbling with updates that are trying to bring values towards correctness,
but probably never quite getting there.

Probabilistic Data Structures

Another area David has researched is hashtables that can be inserted into
without synchronization. This would allow entries to be added to a hashtable
from any number of cores without slowing down the entire system with a lock.
Naive parallel insertion into a hashtable will mess up pointers, which can
either result in the loss of values or in spurious insertions, but it’s
possible to work around these issues and create a lock-free hashtable.
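The lost-value hazard can be shown deterministically by simulating the bad interleaving by hand (a toy bucket structure; the names and the hand-scheduled "threads" are mine):

```python
class Node:
    """One entry in a hashtable bucket's linked list."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

bucket_head = None

# Two "threads" both read the bucket head before either writes it back:
# the classic read-modify-write race on an unsynchronized insert.
seen_by_t1 = bucket_head
seen_by_t2 = bucket_head

bucket_head = Node("from-thread-1", seen_by_t1)  # thread 1 links its node
bucket_head = Node("from-thread-2", seen_by_t2)  # thread 2 overwrites the link

values = []
node = bucket_head
while node is not None:
    values.append(node.value)
    node = node.next
# "from-thread-1" has silently vanished from the bucket.
```

A race-and-repair design accepts this loss at insert time and relies on a background pass (or a probabilistic structure) to restore the missing entries later.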

His presentation goes into a lot of detail on how this might work. He rejects
light-weight locking schemes like CAS because these are still a choke point
and there’s a penalty for atomic instructions under contention. They won’t
scale.

He thinks there’s a big research opportunity in probabilistic data structures
that work without synchronization and that work with mitigation.

Is this the Future?

This is just a light weight introduction. For more details please read all
the papers and watch all the videos. But I think it’s important to talk about
and think about how we might make use of all these cores for traditional
programming tasks, though the result may be anything but traditional.

A bad experience on one project makes me somewhat skeptical that human nature
will ever be comfortable in accepting wrong answers as the norm. My
experience report was on an event driven system with a large number of nodes
that could generate events so fast that events had to be dropped. Imagine a
city undergoing an earthquake and a huge sensor net spewing out change events
to tell the world what’s going on in real-time.

The original requirement was events could never be dropped, ever, which made
a certain amount of sense when the number of sensors was small. As the number
of sensors expands, it’s simply not possible. My proposal was to drop events
and have background processes query the actual sensors in the background so
that the database would be synced over time. Very much like the proposals
here.

It was a no-go. A vehement no-go. Titanic arguments ensued. Management and
everyone on down simply could not accept the idea that their models would be
out of sync with the sensors (which of course they were anyway). My eventual
solution took a year to implement and radically changed everything, but that
was simpler than trying to convince people to deal with uncertainty.

So there might be some convincing to do.

Related Articles

   Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed
About the Future (slides, video)

   Renaissance Project

   On Hacker News

   David Ungar: It is Good to be Wrong

   We Really Don't Know How To Compute!

   Harnessing emergence for manycore programming: early experience
integrating ensembles, adverbs, and object-based inheritance

   Inconsistency Robustness for Scalability in Interactive Concurrent‑Update
In-Memory MOLAP Cubes


_______________________________________________
fonc mailing list
fonc-uVco7kAcSAQ@public.gmane.org
http://vpri.org/mailman/listinfo/fonc





--
bringing s-words to a pen fight



<div><div>
<div><span>This is a good idea (and, interestingly, was a common programming style in the first Simula) ...</span></div>
<div>
<br><span></span>
</div>
<div><span>Cheers,</span></div>
<div>
<br><span></span>
</div>
<div><span>Alan<br></span></div>
<div>
<br><blockquote>  <div> <div> <div dir="ltr">  <span>From:</span> David Barbour &lt;dmbarbour <at> gmail.com&gt;<br><span>To:</span> Fundamentals of New Computing &lt;fonc@...&gt; <br><span>Sent:</span> Friday, April 13, 2012 11:03 PM<br><span>Subject:</span> Re: [fonc] Ask For Forgiveness Programming - Or How We'll Program 1000 Cores<br> </div> <br><div>Another option would be to introduce slack in the propagation delay itself. E.g. if I send you a message indicating the meeting has been moved to 1:30pm, it would be a good idea to send it a good bit in advance - perhaps at 10:30am - so you can schedule and prepare.&nbsp;<div>
<br>
</div>
<div>With computers, the popular solution - sending a message *when* we want something to happen - seems analogous to sending the message at 1:29:58pm, leaving the computer always on the edge with little time to prepare even for highly predictable events, and making the system much more vulnerable to variations in communication latency.</div>
<div><br></div>
<div><div><div><div>
<div class="yiv1561138434gmail_quote">On Fri, Apr 13, 2012 at 8:34 AM, David Goehrig <span dir="ltr">&lt;<a rel="nofollow" ymailto="mailto:dave@..." target="_blank" href="mailto:dave <at> nexttolast.com">dave@...</a>&gt;</span> wrote:<br><blockquote class="yiv1561138434gmail_quote">
<div>
<div>There's a very simple concept that most of the world embraces in everything from supply chain management, to personnel allocations, to personal relationships.&nbsp;</div>
<div><br></div>
<div>We call it *slack*≤/div>
<div><br></div>
<div>What Dan is talking about amounts to introducing slack into distributed models. &nbsp;Particularly this version of the definition of slack:</div>
<div><br></div>
<div>":&nbsp;lacking in completeness, finish, or perfection&nbsp;<span>&lt;a very slack piece of work&gt;"</span>
</div>
<div><br></div>
<div>Which is a more realistic version of computation in a universe with propagation delay (finite speed of light). But it also introduces a concept similar to anyone familiar with ropes. You can't tie a knot without some slack. (computation being an exercise in binary sequence knot making). Finishing a computation is analogous to pulling the rope taunt.&nbsp;</div>
<div><br></div>
<div>Dave</div>
<div><br></div>
<div><br></div>
<div><br></div>
<div>
<br><br>-=-=- <a rel="nofollow" ymailto="mailto:dave@..." target="_blank" href="mailto:dave <at> nexttolast.com">dave@...</a> -=-=-</div>
<div><div class="yiv1561138434h5">
<div>
<br>On Apr 13, 2012, at 5:53 AM, Eugen Leitl &lt;<a rel="nofollow" ymailto="mailto:eugen@..." target="_blank" href="mailto:eugen <at> leitl.org">eugen@...</a>&gt; wrote:<br><br>
</div>
<div></div>
<blockquote type="cite"><div>
<span></span><br><span>http://highscalability.com/blog/2012/3/6/ask-for-forgiveness-programming-or-how-well-program-1000-cor.html</span><br><span></span><br><span>Ask For Forgiveness Programming - Or How We'll Program 1000 Cores</span><br><span></span><br><span>Tuesday, March 6, 2012 at 9:15AM</span><br><span></span><br><span>The argument for a massively multicore future is now familiar: while clock</span><br><span>speeds have leveled off, device density is increasing, so the future is cheap</span><br><span>chips with hundreds and thousands of cores. That&rsquo;s the inexorable logic</span><br><span>behind our multicore future.</span><br><span></span><br><span>The unsolved question that lurks deep in the dark part of a programmer&rsquo;s mind</span><br><span>is: how on earth are we to program these things? For problems that aren&rsquo;t</span><br><span>embarrassingly parallel, we really have no idea. IBM Research&rsquo;s David Ungar</span><br><span>has an idea. And it&rsquo;s radical in the extreme...</span><br><span></span><br><span>Grace Hopper once advised &ldquo;It's easier to ask for forgiveness than it is to</span><br><span>get permission.&rdquo; I wonder if she had any idea that her strategy for dealing</span><br><span>with human bureaucracy would the same strategy David Ungar thinks will help</span><br><span>us tame &nbsp;the technological bureaucracy of 1000+ core systems?</span><br><span></span><br><span>You may recognize David as the co-creator of the Self programming language,</span><br><span>inspiration for the HotSpot technology in the JVM and the prototype model</span><br><span>used by Javascript. He&rsquo;s also the innovator behind using cartoon animation</span><br><span>techniques to build user interfaces. Now he&rsquo;s applying that same creative</span><br><span>zeal to solving the multicore problem.</span><br><span></span><br><span>During a talk on his research, Everything You Know (about Parallel</span><br><span>Programming) Is Wrong! 
A Wild Screed about the Future, he called his approach</span><br><span>&ldquo;anti-lock&rdquo; or &ldquo;race and repair&rdquo; because the core idea is that the only way</span><br><span>we&rsquo;re going to be able to program the new multicore chips of the future is to</span><br><span>sidestep Amdahl&rsquo;s Law and program without serialization, without locks,</span><br><span>embracing non-determinism. Without locks calculations will obviously be</span><br><span>wrong, but correct answers can be approached over time using techniques like</span><br><span>fresheners:</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;A thread that, instead of responding to user requests, repeatedly selects</span><br><span>a cached value according to some strategy, and recomputes that value from its</span><br><span>inputs, in case the value had been inconsistent. Experimentation with a</span><br><span>prototype showed that on a 16-core system with a 50/50 split between workers</span><br><span>and fresheners, fewer than 2% of the queries would return an answer that had</span><br><span>been stale for at least eight mean query times. These results suggest that</span><br><span>tolerance of inconsistency can be an effective strategy in circumventing</span><br><span>Amdahl&rsquo;s law.</span><br><span></span><br><span>During his talk David mentioned that he&rsquo;s trying to find a better name than</span><br><span>&ldquo;anti-lock&rdquo; or &ldquo;race and repair&rdquo; for this line of thinking. 
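The freshener idea in the quoted passage can be sketched in a few lines. This is a toy illustration, not Ungar's prototype: the names (inputs, cache, freshen, query) and the single-aggregate setup are my own, and a final freshen pass is added so the cached value converges exactly once updates stop.

```python
import threading

# Toy sketch of a "freshener": workers query a cached aggregate without
# taking any lock, while a background thread keeps recomputing it from
# the raw inputs "in case the value had been inconsistent". Each writer
# owns its own slot so the only race on display is staleness of the cache.

NUM_WRITERS = 4
UPDATES_EACH = 1000
inputs = [0] * NUM_WRITERS     # raw values, updated concurrently
cache = {"total": 0}           # derived value that is allowed to go stale

def writer(slot):
    for _ in range(UPDATES_EACH):
        inputs[slot] += 1      # races with queries: no lock taken

def freshen():
    cache["total"] = sum(inputs)   # recompute the derived value

def freshener(stop):
    while not stop.is_set():   # background repair loop
        freshen()

def query():
    return cache["total"]      # lock-free read: possibly stale, never blocks

stop = threading.Event()
fresh = threading.Thread(target=freshener, args=(stop,))
writers = [threading.Thread(target=writer, args=(i,)) for i in range(NUM_WRITERS)]
fresh.start()
for w in writers: w.start()
for w in writers: w.join()
stop.set()
fresh.join()
freshen()                      # once updates quiesce, one pass converges
print(query())                 # prints 4000
```

During the run, query() may return stale totals; the point is that it never blocks, while the freshener continuously drags the cache towards the truth.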
Throwing my hat</span><br><span>into the name game, I want to call it Ask For Forgiveness Programming (AFFP),</span><br><span>based on the idea that using locks is &ldquo;asking for permission&rdquo; programming, so</span><br><span>not using locks along with fresheners is really &ldquo;asking for forgiveness.&rdquo; I</span><br><span>think it works, but it&rsquo;s just a thought.</span><br><span></span><br><span>No Shared Lock Goes Unpunished</span><br><span></span><br><span>Amdahl&rsquo;s Law is used to understand why simply having more cores won&rsquo;t save us</span><br><span>for a large class of problems. The idea is that any program is made up of a</span><br><span>serial fraction and a parallel fraction. More cores only help you with the</span><br><span>parallel portion. If an operation takes 10 seconds, for example, and one</span><br><span>second of it is serial, then having infinitely many cores will only help you</span><br><span>make the parallelizable part faster; the serial code will always take one</span><br><span>second. Amdahl says you can never do better than a 10x speedup. As long as</span><br><span>your code has a serial portion it&rsquo;s impossible to go faster than the serial</span><br><span>part allows.</span><br><span></span><br><span>Jakob Engblom recounts a similar line of thought in his blog:</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;They also had the opportunity to test their solution [for parallel</span><br><span>Erlang] on a Tilera 64-core machine. 
This mercilessly exposed any</span><br><span>scalability limitations in their system, and proved the conventional wisdom</span><br><span>that going beyond 10+ cores is quite different from scaling from 1 to 8&hellip; The</span><br><span>two key lessons they learned were that no shared lock goes unpunished, and</span><br><span>data has to be distributed as well as code.</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;It seems that all big system parallelization efforts turn into a hunt</span><br><span>for locks and the splitting up of code and data into units that can run in</span><br><span>parallel without having to synchronize. The &ldquo;upper limit&rdquo; of this process is</span><br><span>clearly a system with no synchronization points at all.</span><br><span></span><br><span>Sherlock Holmes says that when you have eliminated the impossible, whatever</span><br><span>remains, however improbable, must be the truth, so the truth is: removing</span><br><span>serialization is the only way to use all these cores. Since synchronization</span><br><span>is the core serial component of applications, we must get rid of</span><br><span>synchronization.</span><br><span></span><br><span>A Transaction View Isn&rsquo;t as Strong as it Once Was</span><br><span></span><br><span>Getting rid of locks/synchronization may not be the radical notion it once</span><br><span>was. Developments over the last few years have conditioned us to deal with</span><br><span>more ambiguity, at least for distributed systems.</span><br><span></span><br><span>Through many discussions about CAP, we&rsquo;ve come to accept a non-transactional</span><br><span>view of databases as a way to preserve availability in the face of</span><br><span>partitions. 
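As an aside, circling back to Amdahl's Law from the previous section: the arithmetic is easy to check. A quick sketch using the standard formulation, speedup = 1 / (s + (1 - s)/n) for serial fraction s on n cores:

```python
# Checking Amdahl's Law for the earlier example: a 10-second job with
# 1 second (10%) of serial work. The speedup formula is standard; the
# function name is just for illustration.

def speedup(s, n):
    return 1.0 / (s + (1.0 - s) / n)

s = 0.1  # 1 serial second out of 10
for n in (1, 8, 64, 1000):
    print(n, round(speedup(s, n), 2))
# 1 1.0
# 8 4.71
# 64 8.77
# 1000 9.91
# The speedup approaches, but never reaches, 1/s = 10x: the serial
# second is a floor that no number of cores can remove.
```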
Even if a read doesn&rsquo;t return the last write, we know it will</span><br><span>eventually and that some merge logic will make them consistent once again.</span><br><span></span><br><span>The idea of compensating transactions as a way around the performance</span><br><span>problems due to distributed coordination has also become more familiar.</span><br><span></span><br><span>Another related idea comes from the realm of optimistic concurrency control,</span><br><span>which lets multiple transactions proceed in parallel without locking and then</span><br><span>checks at commit time if a conflict requires a rollback. The application then</span><br><span>gets to try again.</span><br><span></span><br><span>And for some time now memcache has supported a compare-and-set operation that</span><br><span>allows multiple clients to avoid writing over each other by comparing time</span><br><span>stamps.</span><br><span></span><br><span>As relaxed as these methods are, they all still require that old enemy:</span><br><span>synchronization.</span><br><span></span><br><span>Your Answers are Already Wrong</span><br><span></span><br><span>The core difficulty with abandoning synchronization is coming to terms with</span><br><span>the notion that results may not be correct at all times. It&rsquo;s a certainty</span><br><span>that we love certainty, so even considering we could get wrong answers for</span><br><span>any window of time is heresy.</span><br><span></span><br><span>David says we need to change our way of thinking. The idea is to get results</span><br><span>that are &ldquo;good enough, soon enough.&rdquo; Get answers quickly that may be wrong,</span><br><span>but are still right enough to be useful. Perfect is the enemy of the good,</span><br><span>and perfect answers simply take too long at scale.</span><br><span></span><br><span>David emphasises repeatedly that there&rsquo;s a fundamental trade-off between</span><br><span>correctness and performance. 
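As a brief aside, the compare-and-set style mentioned above amounts to a race, detect, retry loop. A minimal sketch against a hypothetical in-memory VersionedStore (illustrative only, not memcache's actual API); note the lock inside cas standing in for the server's atomicity, which is exactly the residual synchronization being complained about:

```python
import threading

# Sketch of optimistic concurrency via compare-and-set. VersionedStore
# is invented for illustration; it mimics the gets/cas pattern: read a
# value with its version, then write back only if no one wrote meanwhile.

class VersionedStore:
    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # stands in for server-side atomicity

    def gets(self, key):
        return self._data.get(key, (0, None))

    def cas(self, key, version, value):
        with self._lock:
            cur_version, _ = self._data.get(key, (0, None))
            if cur_version != version:
                return False           # lost the race; caller retries
            self._data[key] = (version + 1, value)
            return True

def increment(store, key):
    while True:                        # ask forgiveness: race, detect, retry
        version, value = store.gets(key)
        if store.cas(key, version, (value or 0) + 1):
            return

store = VersionedStore()
threads = [threading.Thread(target=increment, args=(store, "hits"))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(store.gets("hits")[1])           # all 8 increments survive: prints 8
```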
Without locks operations happen out of order,</span><br><span>which gives the wrong answer. Race conditions happen. To get correct answers</span><br><span>we effectively add in delays, making one thing wait for another, which kills</span><br><span>performance, and we don&rsquo;t want to kill performance; we want to use all those</span><br><span>cores.</span><br><span></span><br><span>But consider an aspect of working on distributed systems that people don&rsquo;t</span><br><span>like to think about: your answers are already always wrong.</span><br><span></span><br><span>Unless you are querying a read-only corpus or using a global lock, in</span><br><span>distributed systems any answer to any query is potentially wrong, always.</span><br><span>This is the hardest idea to get into your brain when working in distributed</span><br><span>systems with many cores that experience simultaneous updates. The world never</span><br><span>stops. It&rsquo;s always in flux. You can never assume for a single moment there&rsquo;s</span><br><span>a stable frame of reference. The answer from a query you just made could be</span><br><span>wrong in the next instant. A query to see if a host is up, for example, can&rsquo;t</span><br><span>ever be assumed right. That host may be up or down in the next instant and</span><br><span>your code won&rsquo;t know about it until it finds a conflict.</span><br><span></span><br><span>So are we really that far away from accepting that all answers to queries</span><br><span>could be wrong?</span><br><span></span><br><span>Lessons from Nature</span><br><span></span><br><span>One strategy for dealing with many cores is to move towards biological models</span><br><span>instead of mathematical models, where complicated behaviours emerge without</span><br><span>global determinism. 
Bird flocks, for example, emerge from three simple rules:</span><br><span>avoid crowding, steer towards the average heading of neighbors, and steer</span><br><span>towards the average position of neighbors. No pi-calculus required: it works</span><br><span>without synchronization or coordination. Each bird is essentially its own</span><br><span>thread; it simply looks around and makes local decisions. This is a more</span><br><span>cellular automaton view of the world.</span><br><span></span><br><span>Race and Repair - Mitigating Wrong Results</span><br><span></span><br><span>The idea is that errors created by data races won&rsquo;t be prevented; they will</span><br><span>be repaired. A calculation made without locks will be wrong under concurrent</span><br><span>updates. So why not use some of our many cores to calculate the right answers</span><br><span>in the background, in parallel, and update the values? This approach:</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Uses no synchronization.</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Tolerates some wrong answers.</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Probabilistically fixes the answers over time.</span><br><span></span><br><span>Some obvious questions: how many background threads do you need? What order</span><br><span>should values be recalculated in? And how wrong will your answers be?</span><br><span></span><br><span>To figure this out an experiment was run and described in Inconsistency</span><br><span>Robustness for Scalability in Interactive Concurrent&#8209;Update In-Memory MOLAP</span><br><span>Cubes, which tested updates on a complicated spreadsheet.</span><br><span></span><br><span>With locking the results were correct, but scalability was limited. Without</span><br><span>locking, results were usually wrong. 
Both results are as might be expected.</span><br><span>And when they added freshener threads they found:</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Combining the frequency with the duration data suggests that in a 16-core</span><br><span>system with a 50/50 split between workers and fresheners, fewer than 2% of</span><br><span>the queries would return an answer that had been stale for at least eight</span><br><span>mean query times.</span><br><span></span><br><span>I found this result quite surprising. I would have expected the answers to be</span><br><span>wrong more of the time. Could this actually work?</span><br><span></span><br><span>Breadcrumbs</span><br><span></span><br><span>The simple idea of Race and Repair is open to a lot of clever innovations.</span><br><span>Breadcrumbs are one such innovation that attempts to be smarter about which</span><br><span>values need recalculating. Meta-data is attached to a value indicating that an</span><br><span>entity is recalculating the value or has changed a dependent value such that</span><br><span>this value is now out of date. Any entity that might want to use this data</span><br><span>can wait until a &ldquo;valid&rdquo; value is calculated and/or decline to insert a</span><br><span>calculated value if it is out of date. This narrows the window of time in</span><br><span>which errors are introduced. It&rsquo;s a mitigation.</span><br><span></span><br><span>There are endless variations of this. I can imagine remembering calculations</span><br><span>that used a value and then publishing updated values so those calculations</span><br><span>can be rerun. 
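One hypothetical shape the breadcrumb metadata might take (all names invented for illustration): the value carries a flag marking it as in-flux, and a tolerant reader takes the value anyway but learns whether it might be stale.

```python
# Hypothetical sketch of a "breadcrumb": a cached value carries a marker
# saying someone is recomputing it, so readers know it may be out of
# date. The class and function names are illustrative only.

class Breadcrumbed:
    def __init__(self, value):
        self.value = value
        self.in_flux = False   # the breadcrumb: True while being repaired

def refresh(cell, compute):
    cell.in_flux = True        # leave the crumb before recomputing
    cell.value = compute()
    cell.in_flux = False       # crumb cleared: value is fresh again

def read(cell):
    # A tolerant reader: take the value regardless, but report whether
    # the breadcrumb says it might be stale.
    return cell.value, cell.in_flux

cell = Breadcrumbed(41)
refresh(cell, lambda: 42)
value, maybe_stale = read(cell)
print(value, maybe_stale)      # 42 False once the refresh completes
```

A reader that cares more about freshness than latency could instead spin until in_flux clears; that choice is exactly the mitigation knob the article describes.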
The result is a roiling event-driven sea that is constantly</span><br><span>bubbling with updates that are trying to bring values towards correctness,</span><br><span>but probably never quite getting there.</span><br><span></span><br><span>Probabilistic Data Structures</span><br><span></span><br><span>Another area David has researched is hashtables that can be inserted into</span><br><span>without synchronization. This would allow entries to be added to a hashtable</span><br><span>from any number of cores without slowing down the entire system with a lock.</span><br><span>Naive insertion into a hashtable in parallel will mess up pointers, which can</span><br><span>result either in the loss of values or in spurious insertions, but it&rsquo;s</span><br><span>possible to work around these issues and create a lock-free hashtable.</span><br><span></span><br><span>His presentation goes into a lot of detail on how this might work. He rejects</span><br><span>light-weight locking schemes like CAS because these are still a choke point</span><br><span>and there&rsquo;s a penalty for atomic instructions under contention. They won&rsquo;t</span><br><span>scale.</span><br><span></span><br><span>He thinks there&rsquo;s a big research opportunity in probabilistic data structures</span><br><span>that work without synchronization and that work with mitigation.</span><br><span></span><br><span>Is this the Future?</span><br><span></span><br><span>This is just a lightweight introduction. For more details please read all</span><br><span>the papers and watch all the videos. But I think it&rsquo;s important to talk about</span><br><span>and think about how we might make use of all these cores for traditional</span><br><span>programming tasks, though the result may be anything but traditional.</span><br><span></span><br><span>A bad experience on one project makes me somewhat skeptical that human nature</span><br><span>will ever be comfortable in accepting wrong answers as the norm. 
My</span><br><span>experience report was on an event-driven system with a large number of nodes</span><br><span>that could generate events so fast that events had to be dropped. Imagine a</span><br><span>city undergoing an earthquake and a huge sensor net spewing out change events</span><br><span>to tell the world what&rsquo;s going on in real-time.</span><br><span></span><br><span>The original requirement was that events could never be dropped, ever, which</span><br><span>made a certain amount of sense when the number of sensors was small. As the</span><br><span>number of sensors expanded it was simply not possible. My proposal was to drop</span><br><span>events and have background processes query the actual sensors so that the</span><br><span>database would be synced over time. Very much like the proposals here.</span><br><span></span><br><span>It was a no go. A vehement no go. Titanic arguments ensued. Management and</span><br><span>everyone on down simply could not accept the idea that their models would be</span><br><span>out of sync with the sensors (which of course they were anyway). 
My eventual</span><br><span>solution took a year to implement and radically changed everything, but that</span><br><span>was simpler than trying to convince people to deal with uncertainty.</span><br><span></span><br><span>So there might be some convincing to do.</span><br><span></span><br><span>Related Articles</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Everything You Know (About Parallel Programming) Is Wrong!: A Wild Screed</span><br><span>About the Future (slides, video)</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Renaissance Project</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;On Hacker News</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;David Ungar: It is Good to be Wrong</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;We Really Don't Know How To Compute!</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Harnessing emergence for manycore programming: early experience</span><br><span>integrating ensembles, adverbs, and object-based inheritance</span><br><span></span><br><span> &nbsp;&nbsp;&nbsp;Inconsistency Robustness for Scalability in Interactive Concurrent&#8209;Update</span><br><span>In-Memory MOLAP Cubes</span><br><span></span><br><span></span><br><span>_______________________________________________</span><br><span>fonc mailing list</span><br><span><a rel="nofollow" ymailto="mailto:fonc@..." target="_blank" href="mailto:fonc@...">fonc@...</a></span><br><span><a rel="nofollow" target="_blank" href="http://vpri.org/mailman/listinfo/fonc">http://vpri.org/mailman/listinfo/fonc</a></span><br>
</div></blockquote>
</div></div>
</div>
</blockquote>
</div>
</div></div></div></div>
</div>
</div> </div> </blockquote>
</div>   </div></div>
