Magnus | 31 Jul 2006 15:57

isconf deprecates infrastructures.org?

I am reading up on isconf at http://trac.t7a.org/isconf/ and a lot of
what I'm reading would seem to contradict or deprecate the bootstrap
checklist for infrastructures.org.

Does the site need to be revamped to reflect this development?
Steve Traugott | 14 Aug 2006 08:08

Re: isconf deprecates infrastructures.org?

On Mon, Jul 31, 2006 at 09:57:07AM -0400, Magnus wrote:
> I am reading up on isconf at http://trac.t7a.org/isconf/ and a lot of
> what I'm reading would seem to contradict or deprecate the bootstrap
> checklist for infrastructures.org.

Good eyes.  ;-)

> Does the site need to be revamped to reflect this development?

The short answer is yes.  Right now I'm up to my eyeballs in building
out the new infrastructure for t7a.org; this time around it's
Kerberos, AFS, LDAP, Xen, isconf, and a few other things...  All of
our web sites, including infrastructures.org, are getting moved into
the new world after that.  In the case of infrastructures.org, it's
moving into a wiki market -- that's a Trac site with the tracmarket
plugin added (http://trac-hacks.org/wiki/MarketPlugin)...  If you
haven't read up on decision markets, now would be a good time to
start.  ;-)

As for the discrepancies between infrastructures.org and isconf 4, see
http://www.infrastructures.org/papers/turing/turing.html in case you
haven't stumbled across it yet; we got a lot of things right in the
earlier 'bootstrapping' paper (which infrastructures.org is based on),
but I think we were just plain lucky we didn't shoot ourselves in the
foot a few times.  If nothing else, read the Foreword in 'turing'.

Steve
-- 
Stephen G. Traugott  (KG6HDQ)
Managing Partner, TerraLuna LLC

Florian Heigl | 17 Aug 2006 12:16

Re: isconf deprecates infrastructures.org?

Hi Steve,

2006/8/14, Steve Traugott <stevegt <at> terraluna.org>:
> On Mon, Jul 31, 2006 at 09:57:07AM -0400, Magnus wrote:
> > I am reading up on isconf at http://trac.t7a.org/isconf/ and a lot of
> > what I'm reading would seem to contradict or deprecate the bootstrap
> > checklist for infrastructures.org.

> The short answer is yes.  Right now I'm up to my eyeballs in building
> out the new infrastructure for t7a.org; this time around it's
> Kerberos, AFS, LDAP, Xen, isconf, and a few other things...  All of
> our web sites, including infrastructures.org, are getting moved into
> the new world after that.  In the case of infrastructures.org, it's
> moving into a wiki market -- that's a Trac site with the tracmarket
> plugin added (http://trac-hacks.org/wiki/MarketPlugin)...  If you
> haven't read up on decision markets, now would be a good time to
> start.  ;-)

Would you consider documenting a little of your planning work?
I'm currently building a new xen host with a lot of scripts for
automated building of domU's and (slowly) trying out the different
tools around, and while I'm slowly making isconf work for it, it would
be great to see how someone else does the same thing...

Florian
Jordan Curzon | 17 Aug 2006 16:38

Re: Re: isconf deprecates infrastructures.org?

This might be interesting for you, although it is not isconf stuff.

These are three scripts that I use to build my xen guests with Ubuntu.
create-rootfs.sh makes a base image and prepare-xen.sh copies that to
an LVM disk and gives the machine its own identity (SSH and
hostname). setup-dev.sh is an example of the setup script that gets
run to provision a server for a specific role. setup-xen.sh is a
script that will set up Ubuntu as a xen dom0 host.

With these scripts and a proxy for downloading the packages, I can
reimage a xen guest and have it fully configured in about 3 minutes.
It is wonderful for testing configurations and making sure that what
I have is (almost) always reproducible.

On 8/17/06, Florian Heigl <florian.heigl <at> gmail.com> wrote:
> Hi Steve,
>
> 2006/8/14, Steve Traugott <stevegt <at> terraluna.org>:
> > On Mon, Jul 31, 2006 at 09:57:07AM -0400, Magnus wrote:
> > > I am reading up on isconf at http://trac.t7a.org/isconf/ and a lot of
> > > what I'm reading would seem to contradict or deprecate the bootstrap
> > > checklist for infrastructures.org.
>
> > The short answer is yes.  Right now I'm up to my eyeballs in building
> > out the new infrastructure for t7a.org; this time around it's
> > Kerberos, AFS, LDAP, Xen, isconf, and a few other things...  All of
> > our web sites, including infrastructures.org, are getting moved into
> > the new world after that.  In the case of infrastructures.org, it's
> > moving into a wiki market -- that's a Trac site with the tracmarket
> > plugin added (http://trac-hacks.org/wiki/MarketPlugin)...  If you

Matthew Palmer | 17 Aug 2006 23:42

Re: Re: isconf deprecates infrastructures.org?

On Thu, Aug 17, 2006 at 08:38:17AM -0600, Jordan Curzon wrote:
> This might be interesting for you, although it is not isconf stuff.
> 
> These are three scripts that I use to build my xen guests with Ubuntu.
> create-rootfs.sh makes a base image and prepare-xen.sh copies that to
> an LVM disk and gives the machine its own identity (SSH and
> hostname). setup-dev.sh is an example of the setup script that gets
> run to provision a server for a specific role. setup-xen.sh is a
> script that will set up Ubuntu as a xen dom0 host.

There's Steve Kemp's xen-tools package, too, which does the same thing, in
what is probably a more generalised manner.  It's cross-distro, now, too.

- Matt
Steve Traugott | 19 Sep 2006 05:42

Re: Re: Re: isconf deprecates infrastructures.org?

On Fri, Aug 18, 2006 at 07:42:18AM +1000, Matthew Palmer wrote:
> On Thu, Aug 17, 2006 at 08:38:17AM -0600, Jordan Curzon wrote:
> > This might be interesting for you, although it is not isconf stuff.
> > 
> > These are three scripts that I use to build my xen guests with Ubuntu.
> > create-rootfs.sh makes a base image and prepare-xen.sh copies that to
> > an LVM disk and gives the machine its own identity (SSH and
> > hostname). setup-dev.sh is an example of the setup script that gets
> > run to provision a server for a specific role. setup-xen.sh is a
> > script that will set up Ubuntu as a xen dom0 host.
> 
> There's Steve Kemp's xen-tools package, too, which does the same thing, in
> what is probably a more generalised manner.  It's cross-distro, now, too.

Kris Buytaert also did a writeup on using systemimager for Xen guest
images:

    http://howto.x-tend.be/AutomatingVirtualMachineDeployment/

Steve
-- 
Stephen G. Traugott  (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt <at> TerraLuna.Org -- http://www.t7a.org
Mark Ferlatte | 13 Aug 2006 07:07

Re: isconf deprecates infrastructures.org?

Magnus said on Mon, Jul 31, 2006 at 09:57:07AM -0400:
> I am reading up on isconf at http://trac.t7a.org/isconf/ and a lot of
> what I'm reading would seem to contradict or deprecate the bootstrap
> checklist for infrastructures.org.
> 
> Does the site need to be revamped to reflect this development?

Perhaps this is no longer the correct list, but:

Something that I don't see isconf doing is recovering from user error
(at least, not very well).

Let's say that I have an environment with a bunch of developers, and those
developers, for example, have root on a set of machines.  Being
developers, they want to be able to install things temporarily "as
needed", and I want to be able to restore the machines back to baseline
quickly and easily.

isconf seems to indicate that it will just totally fail in that
scenario, requiring a full machine rebuild in order to bring the
machine back under control.

In fact, it seems that isconf will blow up if anybody forgets to make
changes using isconf at all (vs. restoring the machine to the known good
state).

Am I missing something?

M

Steve Traugott | 19 Sep 2006 00:31

Re: isconf deprecates infrastructures.org?

(*Way* behind on my mail...)

Looks like everyone else covered this thread pretty well; just wanted
to inject a few more points:

On Sat, Aug 12, 2006 at 10:07:37PM -0700, Mark Ferlatte wrote:
> Let's say that I have an environment with a bunch of developers, and those
> developers, for example, have root on a set of machines.  Being
> developers, they want to be able to install things temporarily "as
> needed", and I want to be able to restore the machines back to baseline
> quickly and easily.

People always want to be able to roll back to a previous baseline
while running within the context of the root filesystem that they're
modifying.  As a sysadmin, you quite frequently do rollbacks, for
instance by running a package uninstaller.  This works most of the
time, and this leads you to believe that a general rollback tool
is possible.  It's a myth.

The changes you've made are irreversible -- you can't run the code
backwards; you can only run other code that claims to be able to undo
those changes.  The "undo" outcome can't be predicted computationally,
so a general-purpose tool that can do this reliably cannot be
written.  Any general rollback tool has to run outside the context of
the root filesystem being modified.

Use systemimager, running in a miniroot during reboot, if you want to
get outside the root filesystem and do clean rollbacks.  Then use
isconf to add the last few deltas on since the last image snapshot.
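
A minimal sketch of that workflow, in Python rather than with the real
tools. The disk is modeled as a dict of path to content, `restore_image`
stands in for a re-image done from the miniroot, and `replay` stands in
for applying the journaled deltas in order. Every name here is
hypothetical and is not part of the systemimager or isconf interfaces.

```python
from typing import Callable, Dict, List, Tuple

Disk = Dict[str, str]            # path -> file content (toy model of a root filesystem)
Delta = Callable[[Disk], None]   # a change that mutates the disk in place


def restore_image(image: Disk) -> Disk:
    """Stand-in for a clean re-image done from outside the root filesystem."""
    return dict(image)  # fresh copy; nothing from the old disk survives


def replay(disk: Disk, journal: List[Tuple[int, Delta]], since: int) -> Disk:
    """Apply, in order, every journaled delta recorded after snapshot `since`."""
    for seq, delta in sorted(journal, key=lambda entry: entry[0]):
        if seq > since:
            delta(disk)
    return disk


def add_ntp(disk: Disk) -> None:
    disk["/etc/ntp.conf"] = "server ntp.example.org"


def set_motd(disk: Disk) -> None:
    disk["/etc/motd"] = "managed host"


# The image was taken after delta 1; delta 2 was journaled later.
image_at_1 = {"/etc/hostname": "dev1", "/etc/ntp.conf": "server ntp.example.org"}
journal = [(1, add_ntp), (2, set_motd)]

# A developer with root left something behind that the journal knows nothing about.
dirty = {**image_at_1, "/usr/local/bin/scratch-tool": "binary"}

# Rollback never touches `dirty`: re-image, then replay only the newer deltas.
clean = replay(restore_image(image_at_1), journal, since=1)
print(sorted(clean))
```

The point of the sketch is that nothing tries to reverse the unknown
change; the dirty disk is simply discarded.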


Daniel Hagerty | 13 Aug 2006 09:36

Re: isconf deprecates infrastructures.org?

 > In fact, it seems that isconf will blow up if anybody forgets to make
 > changes using isconf at all (vs. restoring the machine to the known good
 > state).
 >
 > Am I missing something?

    No, you aren't.

    It's a pretty standard problem for sysadmin tools in this space.
You'd have to detect what was done behind the tool's back and either
pretend the missing delta was performed by the tool, or undo what was
done outside the tool.  You're not going to get this kind of behavior
from the isconf model of how you do things.
Steve Traugott | 19 Sep 2006 01:03

state machines

On Sun, Aug 13, 2006 at 03:36:55AM -0400, Daniel Hagerty wrote:
>  > In fact, it seems that isconf will blow up if anybody forgets to make
>  > changes using isconf at all (vs. restoring the machine to the known good
>  > state).
>  >
>  > Am I missing something?
> 
>     No, you aren't.
> 
>     It's a pretty standard problem for sysadmin tools in this space.

Dan, there's got to be some general way of saying this; I think while
either lambda calculus or turing machines can *illustrate* it, they
still don't say *why*.

I'm starting to think that a closer explanation might be something
like:

    A machine can be described as a directed graph.  The disk states
    are nodes, the changes are edges.  You can't go backwards along an
    edge -- there can be no "undo".  You might theoretically be able
    to reach a prior node by some other path, but there is no general
    solution for generating the code that implements those reverse
    edges.  Each edge transition -- in any direction -- must be
    individually tested to verify that it has reached the desired
    node.  In the case of normal transitions, this is known as
    "testing before production rollout".  In the case of reverse
    transitions, the resulting disk state must be inspected to ensure
    that it is indeed the prior state, and not some new node in the
    directed graph.  Creating the transition code for each reverse
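    edge, and performing the inspection to ensure that the code
    re-creates the prior state, will always be more expensive than
    just hitting the big "reset" button and rebuilding the machine
    back to the starting state, then replaying the forward edges until
    the desired node is reached.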
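
As a rough illustration of the graph model, the Python sketch below
treats a fingerprint of the disk contents as the node and each change as
an edge. The `install` and `uninstall` functions are invented for the
example. The uninstaller leaves a config file behind, so the "reverse"
edge lands on a third node instead of the starting one, and only an
inspection of the result reveals that.

```python
import hashlib
import json
from typing import Dict

Disk = Dict[str, str]  # path -> file content (toy model)


def node_id(disk: Disk) -> str:
    """Fingerprint of the whole disk state; equal fingerprints mean the same node."""
    canonical = json.dumps(disk, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def install(disk: Disk) -> Disk:
    """Forward edge: adds a binary and a config file."""
    after = dict(disk)
    after["/usr/bin/tool"] = "tool v1"
    after["/etc/tool.conf"] = "defaults"
    return after


def uninstall(disk: Disk) -> Disk:
    """Claimed reverse edge: removes the binary but forgets the config file."""
    after = dict(disk)
    after.pop("/usr/bin/tool", None)
    return after


baseline = {"/etc/hostname": "dev1"}
installed = install(baseline)
rolled_back = uninstall(installed)

# Three distinct nodes: the "undo" did not return to the baseline.
print(node_id(baseline), node_id(installed), node_id(rolled_back))

# Reset and replay: start again from the baseline and walk forward edges only.
rebuilt = install(dict(baseline))
print(node_id(rebuilt) == node_id(installed))
```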

Wesley Craig | 19 Sep 2006 15:44

Re: state machines

Stated so generally, I think I can come up with counter examples.   
For instance, if you make a full disk image before pushing out a  
change, you can in fact "undo" the change, by restoring the backup.

:wes

On 18 Sep 2006, at 19:03, Steve Traugott wrote:
>     A machine can be described as a directed graph.  The disk states
>     are nodes, the changes are edges.  You can't go backwards along an
>     edge -- there can be no "undo".  You might theoretically be able
>     to reach a prior node by some other path, but there is no general
>     solution for generating the code that implements those reverse
>     edges.  Each edge transition -- in any direction -- must be
>     individually tested to verify that it has reached the desired
>     node.  In the case of normal transitions, this is known as
>     "testing before production rollout".  In the case of reverse
>     transitions, the resulting disk state must be inspected to ensure
>     that it is indeed the prior state, and not some new node in the
>     directed graph.  Creating the transition code for each reverse
>     edge, and performing the inspection to ensure that the code
>     re-creates the prior state, will always be more expensive than
>     just hitting the big "reset" button and rebuilding the machine
>     back to the starting state, then replaying the forward edges until
>     the desired node is reached.
Matt S Trout | 19 Sep 2006 23:36

Re: state machines

Wesley Craig wrote:
> Stated so generally, I think I can come up with counter examples.  For 
> instance, if you make a full disk image before pushing out a change, you 
> can in fact "undo" the change, by restoring the backup.

This is precisely what Steve proposes as the *one* way that you can do a 
reliable undo.

This does, of course, assume that only the contents of the disk ever vary
between machines of the same type, and that everything else (BIOS etc.) is 
configured identically before deployment and never changed afterwards.

-- 
      Matt S Trout       Offering custom development, consultancy and support
   Technical Director    contracts for Catalyst, DBIx::Class and BAST. Contact
Shadowcat Systems Ltd.  mst (at) shadowcatsystems.co.uk for more information

+ Help us build a better perl ORM: http://dbix-class.shadowcatsystems.co.uk/ +
Wesley Craig | 19 Sep 2006 23:52

Re: state machines

On 19 Sep 2006, at 17:36, Matt S Trout wrote:
> This is precisely what Steve proposes as the *one* way that you can  
> do a reliable undo.

And if you believe that (speaking practically, I certainly do),  
there's a lot more you can do.  If you're able to efficiently  
snapshot systems, capture changes, etc., then all (or maybe only most)
of the rest is not necessary.  Take, for example, the idea of installing
Cadence.  If you knew the state of the system before Cadence was  
installed, you could capture the changes that installing Cadence made.
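
One way to picture "capturing the changes" is a before/after file
manifest. The following is only a sketch under assumptions: the
"installer" is two hypothetical file writes in a temporary directory,
and the manifest records file contents only, so it says nothing about
ownership, permissions, or any state that lives off the disk.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple


def manifest(root: str) -> Dict[str, str]:
    """Map each regular file under `root` (relative path) to a SHA-256 of its content."""
    result: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            result[os.path.relpath(full, root)] = digest
    return result


def diff(before: Dict[str, str], after: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) paths between two manifests."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, modified


def write(root: str, relpath: str, text: str) -> None:
    """Create or overwrite a file under `root`, creating parent directories."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as root:
    write(root, os.path.join("etc", "profile"), "PATH=/usr/bin\n")
    before = manifest(root)

    # Hypothetical installer: drops a binary and edits an existing file.
    write(root, os.path.join("opt", "vendor", "bin", "app"), "binary\n")
    write(root, os.path.join("etc", "profile"), "PATH=/usr/bin:/opt/vendor/bin\n")
    after = manifest(root)

    added, removed, modified = diff(before, after)
    print(added, removed, modified)
```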

> This does, of course, assume that only the contents of the disk  
> ever vary between machines of the same type, and that everything
> else (BIOS etc.) is configured identically before deployment and  
> never changed afterwards.

Sure.  In fact, "machines of the same type" is often roulette.  Those  
sorts of problems make any solution less than 100% reliable.  I know
I've certainly swapped out a machine that was "of the same type" as a  
failed machine, only to find the small underlying differences  
impacted the services the machine was meant to provide.

:wes
u+infra-terraluna-jmto | 19 Sep 2006 17:36

Re: state machines

Wesley,

On Tue, Sep 19, 2006 at 09:44:12AM -0400, Wesley Craig wrote:
> Stated so generally, I think I can come up with counter examples.   
> For instance, if you make a full disk image before pushing out a  
> change, you can in fact "undo" the change, by restoring the backup.

What you propose means reserving a lot of space for rollbacks, which
is going to be expensive, and even then a change that erases
the code for reading the saved image will be irreversible.

Of course testing and debugging the reversal for each change makes it
_possible_ but that does not seem to be the point, if we look at
the original statement, which you happened to cite:

> On 18 Sep 2006, at 19:03, Steve Traugott wrote:
> >    ... Creating the transition code for each reverse
> >    edge, and performing the inspection to ensure that the code
> >    re-creates the prior state, will always be more expensive than
> >    just hitting the big "reset" button and rebuilding the machine

Rune.
Brandon S. Allbery KF8NH | 19 Sep 2006 17:18

Re: state machines


On Sep 19, 2006, at 9:44 AM, Wesley Craig wrote:

> Stated so generally, I think I can come up with counter examples.   
> For instance, if you make a full disk image before pushing out a  
> change, you can in fact "undo" the change, by restoring the backup.

...unless said change affects something other than disk --- consider  
PC BIOS, or more significantly the SPARC/PPC "eeprom" command.

-- 
brandon s. allbery    [linux,solaris,freebsd,perl]     allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
Wesley Craig | 19 Sep 2006 17:40

Re: state machines

On 19 Sep 2006, at 11:18, Brandon S. Allbery KF8NH wrote:
> On Sep 19, 2006, at 9:44 AM, Wesley Craig wrote:
>> Stated so generally, I think I can come up with counter examples.   
>> For instance, if you make a full disk image before pushing out a  
>> change, you can in fact "undo" the change, by restoring the backup.
>
> ...unless said change affects something other than disk ---  
> consider PC BIOS, or more significantly the SPARC/PPC "eeprom"  
> command.

Sure.  Think you can come up with a solution for that situation?   
State machines are just that.  If you are able to record the state,  
you can restart.  It's that simple.

:wes
Brandon S. Allbery KF8NH | 19 Sep 2006 17:44

Re: state machines


On Sep 19, 2006, at 11:40 AM, Wesley Craig wrote:

> On 19 Sep 2006, at 11:18, Brandon S. Allbery KF8NH wrote:
>> On Sep 19, 2006, at 9:44 AM, Wesley Craig wrote:
>>> Stated so generally, I think I can come up with counter  
>>> examples.  For instance, if you make a full disk image before  
>>> pushing out a change, you can in fact "undo" the change, by  
>>> restoring the backup.
>>
>> ...unless said change affects something other than disk ---  
>> consider PC BIOS, or more significantly the SPARC/PPC "eeprom"  
>> command.
>
> Sure.  Think you can come up with a solution for that situation?   
> State machines are just that.  If you are able to record the state,  
> you can restart.  It's that simple.

Sure --- assuming you know all of the state that is ever affected by  
any change.  Which is in some sense the fundamental issue here; I do  
*not* reliably know everything that e.g. Cadence installs will  
affect, and once or twice we've been caught by surprise as a result.   
State machines are only useful when *all* possible states are known  
beforehand.

-- 
brandon s. allbery    [linux,solaris,freebsd,perl]     allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu

Wesley Craig | 19 Sep 2006 17:59

Re: state machines

On 19 Sep 2006, at 11:44, Brandon S. Allbery KF8NH wrote:
> Sure --- assuming you know all of the state that is ever affected  
> by any change.  Which is in some sense the fundamental issue here;  
> I do *not* reliably know everything that e.g. Cadence installs will  
> affect, and once or twice we've been caught by surprise as a  
> result.  State machines are only useful when *all* possible states  
> are known beforehand.

Oh, you want to talk about practicalities?  I thought we were talking  
about theorems & proofs.

If you want to talk about practicalities, do you think Cadence is
reprogramming the firmware?  Let's assume for the moment that it only
installs files in the filesystem.  Do you agree that it's *possible*
for you to know everything that a Cadence install has affected?

:wes
Brandon S. Allbery KF8NH | 19 Sep 2006 18:14

Re: state machines


On Sep 19, 2006, at 11:59 AM, Wesley Craig wrote:

> On 19 Sep 2006, at 11:44, Brandon S. Allbery KF8NH wrote:
>> Sure --- assuming you know all of the state that is ever affected  
>> by any change.  Which is in some sense the fundamental issue here;  
>> I do *not* reliably know everything that e.g. Cadence installs  
>> will affect, and once or twice we've been caught by surprise as a  
>> result.  State machines are only useful when *all* possible states  
>> are known beforehand.
>
> Oh, you want to talk about practicalities?  I thought we were  
> talking about theorems & proofs.

If practicalities disagree with the theory, then something's wrong  
with the theory.  In this case, the theory is that we can know every  
modification made to a system by any given action --- to which I must  
first ask "at what level?"  Clearly it's false at the quantum level,
the question being whether that is relevant.  Unfortunately, I can
imagine cases where it *is* at least in part relevant, and in those  
cases you have a significant problem.

When it comes down to it, your thesis relies on the answers to:
(a) do you know all the levels at which any possible action can  
modify the system?
(b) can you reliably record *and later restore* the state at *all* of  
those levels?  (keeping in mind that this may require actions to be  
performed in a particular order, so simply thwacking the eeprom after  
doing your disk restore might not completely restore the state if the  
eeprom controls something that can affect the restore....)

Wesley Craig | 19 Sep 2006 22:50

Re: state machines

On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote:
> (b) can you reliably record *and later restore* the state at *all*  
> of those levels?

So, you're arguing that backup and restore don't work?  Is that  
because of the quantum effects you mention?

It's like the entire history of computing is wrong.  How do you  
practically deploy a few hundred machines, given that the theory more  
or less says that it's impossible?

:wes
Brandon S. Allbery KF8NH | 20 Sep 2006 00:47

Re: state machines


On Sep 19, 2006, at 16:50 , Wesley Craig wrote:

> On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote:
>> (b) can you reliably record *and later restore* the state at *all*  
>> of those levels?
>
> So, you're arguing that backup and restore don't work?  Is that  
> because of the quantum effects you mention?

I'm saying in (hopefully) rare cases it could matter.  But even in  
the general case your machine's state can involve more than just the  
disk, and unless you're doing extra work the backups only save the  
state of the disk.

-- 
brandon s. allbery    [linux,solaris,freebsd,perl]     allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
Tracy R Reed | 20 Sep 2006 00:31

Re: state machines


I have been following this thread (and this list) for a few months 
trying to glean wisdom from the gurus and have managed to keep quiet 
thus far but this message tickled me.

Wesley Craig wrote:
> On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote:
>> (b) can you reliably record *and later restore* the state at *all*  
>> of those levels?
> 
> So, you're arguing that backup and restore don't work?  Is that  
> because of the quantum effects you mention?

I'm not sure if you are joking here or not. If a photon hits your 
computer the quantum state of the computer is changed. I, for one, don't 
care about that as far as my software goes. Unless it gets to the point 
where my computer melts. But then I have problems other than software.

> It's like the entire history of computing is wrong.  How do you  
> practically deploy a few hundred machines, given that the theory more  
> or less says that it's impossible?

Theory may say it is impossible to do it *perfectly* but in practice all 
most of us need is "good enough" and that is the only thing that allows 
any of us to actually get any real work done.

-- 
Tracy R Reed                  http://ultraviolet.org
A: Because we read from top to bottom, left to right
Q: Why should I start my reply below the quoted text

Wesley Craig | 20 Sep 2006 04:42

Re: state machines

On 19 Sep 2006, at 18:31, Tracy R Reed wrote:
> Wesley Craig wrote:
>> On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote:
>>> (b) can you reliably record *and later restore* the state at  
>>> *all*  of those levels?
>> So, you're arguing that backup and restore don't work?  Is that   
>> because of the quantum effects you mention?
>
> I'm not sure if you are joking here or not. If a photon hits your  
> computer the quantum state of the computer is changed. I, for one,  
> don't care about that as far as my software goes. Unless it gets to  
> the point where my computer melts. But then I have problems other  
> than software.

I guess you can say I'm joking.  I'm pretty sure that backup &  
restore work in most cases.  Brandon seemed to be saying that backup  
& restore don't work.  I'm perfectly willing to discuss the edge  
cases where backup & restore are imperfect, just so long as we can  
acknowledge that we're doing backups *so that we can restore*.  And  
that we're going through the pain of backing up because restores *do  
work*, where "work" is defined as accomplishing something useful in  
the realm of managing systems.

> Theory may say it is impossible to do it *perfectly* but in  
> practice all most of us need is "good enough" and that is the only  
> thing that allows any of us to actually get any real work done.

Amen.

:wes

Brandon S. Allbery KF8NH | 20 Sep 2006 04:46

Re: state machines


On Sep 19, 2006, at 22:42 , Wesley Craig wrote:

> On 19 Sep 2006, at 18:31, Tracy R Reed wrote:
>> Wesley Craig wrote:
>>> On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote:
>>>> (b) can you reliably record *and later restore* the state at  
>>>> *all*  of those levels?
>>> So, you're arguing that backup and restore don't work?  Is that   
>>> because of the quantum effects you mention?
>>
>> I'm not sure if you are joking here or not. If a photon hits your  
>> computer the quantum state of the computer is changed. I, for one,  
>> don't care about that as far as my software goes. Unless it gets  
>> to the point where my computer melts. But then I have problems  
>> other than software.
>
> I guess you can say I'm joking.  I'm pretty sure that backup &  
> restore work in most cases.  Brandon seemed to be saying that  
> backup & restore don't work.

You are misunderstanding; they work for what they do, but you'd best  
be aware of the parts of your infrastructure that aren't represented  
by local disk.  The recent mention of dependency on CNAMEs was a  
better example of what I was getting at.

-- 
brandon s. allbery    [linux,solaris,freebsd,perl]     allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH

Wesley Craig | 20 Sep 2006 04:51

Re: state machines

On 19 Sep 2006, at 22:46, Brandon S. Allbery KF8NH wrote:
> You are misunderstanding; they work for what they do, but you'd  
> best be aware of the parts of your infrastructure that aren't  
> represented by local disk.  The recent mention of dependency on  
> CNAMEs was a better example of what I was getting at.

Oh sure, I agree with that.  But let me ask you: What is more likely  
to have the CNAME dependency problem?  A recent backup restored, or  
an old system image with months or years worth of changes applied?   
Perhaps you see what I'm getting at.

:wes
Daniel Hagerty | 20 Sep 2006 05:52

Re: state machines

 > Oh sure, I agree with that.  But let me ask you: What is more likely
 > to have the CNAME dependency problem?  A recent backup restored, or
 > an old system image with months or years worth of changes applied?
 > Perhaps you see what I'm getting at.

    You're presumably suggesting that the backup (since it's just data
at this level of abstraction) is more robust than a series of hand
crafted imperative statements that you execute in proper order.

    The two both have their places.  A backup is a large relatively
opaque blob; code that supposedly reproduces the backup is
introspectable in a way the backup is not.  One being better than the
other is dependent on the context of use.
Brendan Strejcek | 20 Sep 2006 15:39

Re: state machines

On 9/19/06, Daniel Hagerty <hag <at> linnaean.org> wrote:

>  > Oh sure, I agree with that.  But let me ask you: What is more likely
>  > to have the CNAME dependency problem?  A recent backup restored, or
>  > an old system image with months or years worth of changes applied?
>  > Perhaps you see what I'm getting at.
>
>     You're presumably suggesting that the backup (since it's just data
> at this level of abstraction) is more robust than a series of hand
> crafted imperative statements that you execute in proper order.
>
>     The two both have their places.  A backup is a large relatively
> opaque blob; code that supposedly reproduces the backup is
> introspectable in a way the backup is not.  One being better than the
> other is dependent on the context of use.

Actually, I think I can make this a little more concrete with another
example, which incorporates the backup scenario and brings the
discussion back to the original presentation of the state machine
model of management.

You have a deterministic backup (data and code that can reinstantiate
it) which behaves exactly as expected. However, between the time the
backup was taken and the time restored, some UID mappings were changed
on an external NIS or LDAP server, so files no longer have the correct
ownership. Note that this is not a problem with NIS or LDAP: if you
store the usernames instead, you could still have the same problem,
since the canonical data, by definition, will always be that given by
the directory server.
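
A toy version of this scenario, as a sketch only: the backup stores
numeric owners, and two plain dicts stand in for the UID-to-name data
served by the directory at backup time and at restore time. The paths,
UIDs, and usernames are all made up for illustration.

```python
from typing import Dict, Tuple

# Toy backup: each file is recorded with the numeric UID of its owner.
backup: Dict[str, int] = {
    "/home/alice/notes.txt": 1001,
    "/home/bob/todo.txt": 1002,
}

# What the (external) directory answered when the backup was taken.
directory_at_backup: Dict[int, str] = {1001: "alice", 1002: "bob"}

# What it answers at restore time, after UIDs were reassigned.
directory_at_restore: Dict[int, str] = {1001: "carol", 1002: "alice", 1003: "bob"}


def owners(files: Dict[str, int], directory: Dict[int, str]) -> Dict[str, str]:
    """Resolve each file's numeric owner to a name through the directory."""
    return {path: directory.get(uid, str(uid)) for path, uid in files.items()}


def mismatches(
    files: Dict[str, int], before: Dict[int, str], after: Dict[int, str]
) -> Dict[str, Tuple[str, str]]:
    """Files whose effective owner differs between backup time and restore time."""
    then = owners(files, before)
    now = owners(files, after)
    return {path: (then[path], now[path]) for path in files if then[path] != now[path]}


# The restored bytes are identical, yet both files now belong to someone else.
print(mismatches(backup, directory_at_backup, directory_at_restore))
```

The restore itself is bit-for-bit correct here; the difference comes
entirely from state held outside the machine.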


Wesley Craig | 20 Sep 2006 21:01

Re: state machines

On 20 Sep 2006, at 09:39, Brendan Strejcek wrote:
> You have a deterministic backup (data and code that can reinstantiate
> it) which behaves exactly as expected. However, between the time the
> backup was taken and the time restored, some UID mappings were changed
> on an external NIS or LDAP server, so files no longer have the correct
> ownership.

This is a great example problem.  Thank you for a positive  
contribution to the discussion.

If the backed up system had been running when the UID mapping  
changed, how would that have been handled?

> The essential problem is defining the boundaries of the system to be
> managed.

Indubitably.

> The state of the network is not an edge case.

I guess you're revealing where you place the boundary. :)

:wes
Daniel Hagerty | 20 Sep 2006 21:53

Re: state machines

 > > The state of the network is not an edge case.
 >
 > I guess you're revealing where you place the boundary. :)

    Let's turn the question back at you then.

    Suppose we have two apache web servers that are front ends to two
different systems with "identical" behavior.  These servers, by
nature, have different names; one is "foo", the other is "bar".  Both
are hiding behind a NAT system so that no address of either machine
will reveal that the correct names for generating redirects are "foo"
and "bar".

    Draw a boundary such that apache still generates correct redirects
for http 1.0 without having different configuration files that
expressly mention the correct names on "foo" and "bar".
Wesley Craig | 21 Sep 2006 02:37

Re: state machines

Is this a configuration management / infrastructure question, or an
Apache configuration question?

:wes

On 20 Sep 2006, at 15:53, Daniel Hagerty wrote:
>     Suppose we have two apache web servers that are front ends to two
> different systems with "identical" behavior.  These servers, by
> nature, have different names; one is "foo", the other is "bar".  Both
> are hiding behind a NAT system so that no address of either machine
> will reveal that the correct names for generating redirects are "foo"
> and "bar".
>
>     Draw a boundary such that apache still generates correct redirects
> for http 1.0 without having different configuration files that
> expressly mention the correct names on "foo" and "bar".
Daniel Hagerty | 21 Sep 2006 07:20

Re: state machines

 > From: Wesley Craig <wes <at> umich.edu>
 > Date: Wed, 20 Sep 2006 20:37:27 -0400
 >
 > This is a configuration management / infrastructure question, or an
 > apache configuration question?

    The former of course.  The particular example isn't the best, but
I had hoped you'd see what I was driving at.  The pattern of a
boundary that you don't control imposing constraints on you that force
you to do "unreasonable" things in configuration is a general one.

    HTTP 1.1 has the client give the server enough information for
generating proper redirects without the administrator configuring the
server with it, but it's hardly the first time a protocol, piece of
software, etc has demonstrated issues of this sort.
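
    To make the HTTP 1.0 versus 1.1 point concrete, here is a minimal
Python sketch.  The function name and its arguments are made up for
illustration and are not apache's API.  With a Host header the server
can build the redirect from what the client sent; without one it has
to fall back on a name somebody put in a config file.

```python
def redirect_location(headers, path, configured_name=None):
    """Build an absolute redirect URL for `path`.

    headers: dict of request headers, keys already lower-cased.
    configured_name: per-host name from a config file, if any (hypothetical).
    """
    host = headers.get("host")  # HTTP/1.1 clients are required to send this
    if host:
        return "http://%s%s" % (host, path)
    if configured_name:
        # HTTP/1.0 case: only the configuration knows the right name
        return "http://%s%s" % (configured_name, path)
    raise ValueError("no Host header and no configured server name")
```

In the 1.0 case the "foo" versus "bar" difference has to live in
configured_name, which is exactly the per-host configuration the
exercise asked you to avoid.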
Daniel Hagerty | 20 Sep 2006 20:50

Re: state machines

 > Both the previous DNS alias example and the above UID example have a
 > similar nature: data is stored in an external directory which is
 > critical to expected functionality. The problem is a hard one because
 > it incorporates aspects of federation (you may not control the
 > directory servers; thus, the directory servers may be unreliable
 > and/or malicious).
 >
 > The essential problem is defining the boundaries of the system to be
 > managed. If you can't do that, you can't construct the state machine
 > digraph.

    Well, not strictly true, but harder.

    The above examples, and the network in general, fall under the
following math problem:

eval(expression, context) yields a value.

If you change the context (a CNAME, the LDAP server, etc, etc), it's
quite possible that you're changing the value you produce.  If you can
capture the way that expression relies on its context, you can
possibly separate its dependence on the context so that simple,
obvious changes will cause you to generate the same value given
different contexts.
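
A rough Python sketch of that idea, under the assumption that the
context can be modelled as a plain dict and the expression as a
function of it.  All names here (evaluate, restrict, the context keys)
are invented for illustration:

```python
def evaluate(expression, context):
    """eval(expression, context) -> value; the expression sees only `context`."""
    return expression(context)

def make_vhost_config(context):
    # Hypothetical expression: relies on the context only via "frontend_name".
    return "ServerName %s\n" % context["frontend_name"]

def restrict(context, keys):
    """Keep only the parts of the context an expression is known to rely on."""
    return {k: context[k] for k in keys}

# Two contexts that differ only in something the expression does not use.
test_ctx = {"frontend_name": "www-test", "ldap_server": "ldap1"}
moved_ctx = {"frontend_name": "www-test", "ldap_server": "ldap2"}

deps = ["frontend_name"]
same = (evaluate(make_vhost_config, restrict(test_ctx, deps)) ==
        evaluate(make_vhost_config, restrict(moved_ctx, deps)))
```

Once the dependency list is explicit, a change to anything outside it
(here the LDAP server) provably cannot change the value.  The hard
part in practice is finding out what belongs in deps.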

An ex-employer does something effectively like this for managing the
development/testing/production installation of their in house product.
That's exactly the sort of situation where some amount of the context
(e.g. what's the name of the front end webserver?) is an immutable
given that changes between test and production, but you don't want to

Wesley Craig | 20 Sep 2006 06:11

Re: state machines

On 19 Sep 2006, at 23:52, Daniel Hagerty wrote:
>> Oh sure, I agree with that.  But let me ask you: What is more likely
>> to have the CNAME dependency problem?  A recent backup restored, or
>> an old system image with months or years worth of changes applied?
>> Perhaps you see what I'm getting at.
>
>     You're presumably suggesting that the backup (since it's just data
> at this level of abstraction) is more robust than a series of hand
> crafted imperative statements that you execute in proper order.

Without getting into "which is better," which is more likely to have  
the CNAME dependency problem?

>     The two both have their places.  A backup is a large relatively
> opaque blob; code that supposedly reproduces the backup is
> introspectable in a way the backup is not.  One being better than the
> other is dependent on the context of use.

On the one hand, it's hard to disagree that different solutions are  
useful in different situations.  Your opacity statement is just hand  
waving, tho.  Discussing which specific situations are more amenable  
to which specific solutions would be useful.

:wes
Daniel Hagerty | 20 Sep 2006 08:32

Re: state machines

 > Without getting into "which is better," which is more likely to have
 > the CNAME dependency problem?

    I believe we've covered this already.  Either can demonstrate the
problem in the end result.

    Any path involving execution has an additional peril of
demonstrating this flavor of problem during execution.  The math that
demonstrates the relative complexity of the two is trivial, should
we need to see it.

    In truth, even the straight up restore image has an execution
phase subject to all the usual perils, but I think we can take it as a
given that it works in practice.

 > On the one hand, it's hard to disagree that different solutions are
 > useful in different situations.  Your opacity statement is just hand
 > waving, tho.  Discussing which specific situations are more amenable
 > to which specific solutions would be useful.

    That is not a handwave, I just write coming from a highly
abstracted thought process.

    An image backup is a representation of "what" where the "how" that
produced the what is lost.  This is both its strength, and its
weakness.  By contrast, an execution method produces that "what" from
the "how", leaving both to be inspected, debugged, etc.

    Do you disagree that the removal of potentially essential
information increases an image's opacity while also making it simpler?

Wesley Craig | 20 Sep 2006 21:13

Re: state machines

On 20 Sep 2006, at 02:32, Daniel Hagerty wrote:
>> Without getting into "which is better," which is more likely to have
>> the CNAME dependency problem?
>
>     I believe we've covered this already.  Either can demonstrate the
> problem in the end result.
>
>     Any path involving execution has an additional peril of
> demonstrating this flavor of problem during execution.  The math that
> demonstrates the relative complexity of the two is trivial, should
> we need to see it.
>
>     In truth, even the straight up restore image has an execution
> phase subject to all the usual perils, but I think we can take it as a
> given that it works in practice.

I've read the three paragraphs about three times now.  I get "running  
the log is more likely to produce the CNAME problem."  Please do  
correct me if I've read that wrong.

>     That is not a handwave, I just write coming from a highly
> abstracted thought process.
>
>     An image backup is a representation of "what" where the "how" that
> produced the what is lost.  This is both its strength, and its
> weakness.  By contrast, an execution method produces that "what" from
> the "how", leaving both to be inspected, debugged, etc.

The execution method starts with "what-1", executes "how" to produce  
"what-2".  Without being able to more or less fully inspect "what-1",  

Daniel Hagerty | 20 Sep 2006 22:03

Re: state machines

 > The execution method starts with "what-1", executes "how" to produce
 > "what-2".  Without being able to more or less fully inspect "what-1",
 > you're not going to have too much idea about "what-2", despite the
 > relative clarity of "how".  What if "how" were a simple patch to
 > "what-1"?  I will grant you that having "how" as a clear delta
 > between "what-1" and "what-2" can be handy for deeper analysis,
 > particularly if it's reversible.

    You're assuming that I'm of the isconf school, and that being in
state what-1 is a strict precondition.  Really I prefer to avoid that,
or at least have the delta function perform enough introspection to
recognize when it can and when it can't do the right thing.  The problem is
obviously undecidable from far enough out, but in practice, you won't
arrive here often.
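
    A minimal Python sketch of a delta function that does this kind
of introspection.  The state is modelled as a dict of path to content,
and the delta keys ("path", "expect", "new") are hypothetical, chosen
just for this example:

```python
class PreconditionError(Exception):
    """Raised when the delta cannot tell that it is safe to proceed."""

def apply_delta(state, delta):
    """Apply `delta` to `state` (a dict of path -> content) only if safe.

    delta: dict with hypothetical keys "path", "expect" (the old
    content the delta was written against) and "new" (the target content).
    """
    current = state.get(delta["path"])
    if current == delta["new"]:
        return state  # already in the target state, nothing to do
    if current != delta["expect"]:
        # Something changed behind our back; refuse rather than guess.
        raise PreconditionError("unexpected content in %s" % delta["path"])
    updated = dict(state)
    updated[delta["path"]] = delta["new"]
    return updated
```

The delta no longer requires that the machine be in exactly what-1;
it only requires that the part it touches look like something it
recognizes, and it says so when it doesn't.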

    I don't think we need to go down this relatively off topic route
further.  We're coming from very different backgrounds and
illuminating the differences between them is probably better reached
through another path.
Daniel Hagerty | 19 Sep 2006 21:00

Re: state machines

 > If practicalities disagree with the theory, then something's wrong
 > with the theory.  In this case, the theory is that we can know every

    Investigating the converse is also useful -- if you can't
practically do what the theory says you can, what are you doing that
creates the practical obstacles?  "doctor, it hurts when I do this..."

 > When it comes down to it, your thesis relies on the answers to:
 > (a) do you know all the levels at which any possible action can
 > modify the system?
 > (b) can you reliably record *and later restore* the state at *all* of
 > those levels?  (keeping in mind that this may require actions to be
 > performed in a particular order, so simply thwacking the eeprom after
 > doing your disk restore might not completely restore the state if the
 > eeprom controls something that can affect the restore....)

    There are more questions.  Note that one of the problems a user
had was deleting a CNAME and discovering that his journal produced
different results depending on the existence of the CNAME.

 > You can do this with full machine virtualization, and perhaps someday
 > that will be a best practice.  Otherwise, unless you've carefully
 > inspected and dissected *everything* that touches your system, it's
 > not clear to me that you can say yes to both of the above questions.

    In the particular cadence example, you don't even need full
machine virtualization to see what's being done -- the program's only
means of interacting with the outside world is through the syscall
interface, which can be instrumented.  Concerned that it's touching
the eeprom?  You can prove that it isn't directly doing so by showing

Daniel Hagerty | 19 Sep 2006 11:53

state machines

    I'm supposed to be avoiding thought at present.  Shame on you!

 > Dan, there's got to be some general way of saying this; I think while
 > either lambda calculus or turing machines can *illustrate* it, they
 > still don't say *why*.

    Turing machines aren't actually used for anything outside of "such
and such is provably turing equivalent, and this, that, and the other
theorem have been proven w.r.t. turing machines".

    Lambda calculus is provably turing equivalent (surprise), and more
expressive to the point that several programming languages
(e.g. scheme, ML, haskell) are thinly veiled lambda calculus.  It's
what's most commonly used for real work of this sort.

    As to the question at hand, you probably aren't going to get the
illustration you want.  The problem is one of practicality, rather
than actual mathematical intractability.  The pie in the sky "what we
want" won't be practical for some time, as opposed to being
impossible.  Counter examples exist, if you look for the right thing.

    Haskell is the most direct citation that leaps to mind.  It's a
purely functional language (no first order side effects), and yet:

* It can express non-terminating programs
* It's compilable in sub-infinite time
* Programs can behave in a non-functional fashion, even though they
  can't be written in anything other than a functional form.

    (none of these are surprising properties)

Wesley Craig | 13 Aug 2006 22:51

Re: isconf deprecates infrastructures.org?

On 13 Aug 2006, at 03:36, Daniel Hagerty wrote:
>     It's a pretty standard problem for sysadmin tools in this space.
> You'd have to detect what was done behind the tool's back and either
> pretend the missing delta was performed by the tool, or undo what was
> done outside the tool.  You're not going to get this kind of behavior
> from the isconf model of how you do things.

This is precisely the basis of how radmind is used to manage systems:

	http://radmind.org

radmind detects changes, a la tripwire, captures them, and allows an  
admin to replicate the captured changes to other machines.  Or, if  
the changes are not desirable, roll the machine back to a known good  
state.
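
The detect step can be pictured with a short Python sketch.  This is
only an illustration of checksum-based change detection in general; it
is not radmind's code or file format, and the function names are
invented:

```python
import hashlib

def checksum(data):
    """Checksum of a file's contents (bytes)."""
    return hashlib.sha1(data).hexdigest()

def diff_against_baseline(baseline, current):
    """Compare two manifests (dicts of path -> checksum).

    Returns (added, removed, changed) as sorted lists of paths.
    """
    added = sorted(p for p in current if p not in baseline)
    removed = sorted(p for p in baseline if p not in current)
    changed = sorted(p for p in current
                     if p in baseline and current[p] != baseline[p])
    return added, removed, changed
```

From there the admin has the two choices described above: accept the
difference as a change to replicate elsewhere, or restore the listed
paths from the baseline.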

:wes
Mark Ferlatte | 13 Aug 2006 23:07

Re: isconf deprecates infrastructures.org?

Wesley Craig said on Sun, Aug 13, 2006 at 04:51:16PM -0400:
> On 13 Aug 2006, at 03:36, Daniel Hagerty wrote:
> >    It's a pretty standard problem for sysadmin tools in this space.
> >You'd have to detect what was done behind the tool's back and either
> >pretend the missing delta was performed by the tool, or undo what was
> >done outside the tool.  You're not going to get this kind of behavior
> >from the isconf model of how you do things.
> 
> This is precisely the basis of how radmind is used to manage systems:
> 
> 	http://radmind.org
> 
> radmind detects changes, a la tripwire, captures them, and allows an  
> admin to replicate the captured changes to other machines.  Or, if  
> the changes are not desirable, roll the machine back to a known good  
> state.

Yep.  Knew about radmin, but it wasn't available when I started
building the current infrastructure.  

Dang.  It's come a long way.  Perhaps I should look into that; radmin +
svn for configuration files may do the trick.

M
Wesley Craig | 13 Aug 2006 23:44

Re: isconf deprecates infrastructures.org?

On 13 Aug 2006, at 17:07, Mark Ferlatte wrote:
> Yep.  Knew about radmin, but it wasn't available when I started
> building the current infrastructure.

Careful not to confuse radmin and radmind.

:wes
Wil Cooley | 13 Aug 2006 21:39

Re: isconf deprecates infrastructures.org?

On Sun, 2006-08-13 at 03:36 -0400, Daniel Hagerty wrote:

>     It's a pretty standard problem for sysadmin tools in this space.
> You'd have to detect what was done behind the tool's back and either
> pretend the missing delta was performed by the tool, or undo what was
> done outside the tool.  You're not going to get this kind of behavior
> from the isconf model of how you do things.

This has long been a complaint of mine with the tools I've looked at.
I'd really like to be able to inform the tool, make local changes, then
check my changes back into the central repository, easily and without a
lot of fuss.  Because, invariably, it takes more than one try to get a
particular configuration right and the iteration of "change in repo,
manually run tool to update host, reload server, see if it worked" is
frustratingly long--even if it's only 30 seconds or so.

Wil

-- 
Wil Cooley <wcooley <at> nakedape.cc>
Naked Ape Consulting, Ltd. <http://nakedape.cc>
Steve Traugott | 19 Sep 2006 04:20

Re: isconf deprecates infrastructures.org?

On Sun, Aug 13, 2006 at 12:39:19PM -0700, Wil Cooley wrote:
> This has long been a complaint of mine with the tools I've looked at.
> I'd really like to be able to inform the tool, make local changes, then
> check my changes back into the central repository, easily and without a
> lot of fuss.  Because, invariably, it takes more than one try to get a
> particular configuration right and the iteration of "change in repo,
> manually run tool to update host, reload server, see if it worked" is
> frustratingly long--even if it's only 30 seconds or so.

The whole point of isconf 4 is to get rid of this cycle.  There is no
longer any gold server; no central repository.  You just do something
like this on one of the machines you want to change:

    # lock isconf on all hosts so nobody else can make changes 
    isconf -m "upgrade mutt" lock

    # take a snapshot of the new mutt package
    isconf snap /tmp/mutt_1.5.9-2_i386.deb

    # install it
    isconf exec dpkg -i /tmp/mutt_1.5.9-2_i386.deb

    # check it in, unlocking other hosts
    isconf ci

...then do this on other hosts to update them (I also put this in rc,
and sometimes in cron):

    # replay the above 'snap' and 'exec', as well as anything else queued up
    isconf up  
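
For readers who think better in code, the replay idea behind 'up' can
be modelled in a few lines of Python.  This is a toy model with
made-up names, assuming an ordered journal and a per-host position; it
is not how isconf is actually implemented:

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.position = 0   # how many journal entries this host has replayed
        self.applied = []   # record of what was replayed, in order

    def up(self, journal):
        """Replay, in order, every journal entry this host has not seen yet."""
        for entry in journal[self.position:]:
            self.applied.append(entry)  # a real tool would execute it here
        self.position = len(journal)

# The journal as it would look after the check-in above.
journal = [
    ("snap", "/tmp/mutt_1.5.9-2_i386.deb"),
    ("exec", "dpkg -i /tmp/mutt_1.5.9-2_i386.deb"),
]
```

Running up() twice on the same host replays nothing the second time,
which is why it is safe to call from rc or cron.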

Daniel Hagerty | 13 Aug 2006 22:47

Re: isconf deprecates infrastructures.org?

 > This has long been a complaint of mine with the tools I've looked at.
 > I'd really like to be able to inform the tool, make local changes, then
 > check my changes back into the central repository, easily and without a
 > lot of fuss.  Because, invariably, it takes more than one try to get a

    Well, depending on what you mean here, this is kind of what isconf
4 (or whatever it is) is trying to give you, provided that you work
within its constraints.  But by the same token, undoing "oops" as part
of your edit/compile/debug cycle isn't really in scope.  You'd need a
completely different model.  It all depends on what you really mean
when you say the tools aren't that good -- it's true, they aren't, but
what would you like to see them do better, spelled out more precisely?
(and this is the wrong list for such a discussion)

 > particular configuration right and the iteration of "change in repo,
 > manually run tool to update host, reload server, see if it worked" is
 > frustratingly long--even if it's only 30 seconds or so.

    If your edit/compile/debug cycle is 30 seconds, you're doing
peachy.  99.7% of the world has to deal with much longer.
Mark Ferlatte | 13 Aug 2006 09:48

Re: isconf deprecates infrastructures.org?

Daniel Hagerty said on Sun, Aug 13, 2006 at 03:36:55AM -0400:
>     It's a pretty standard problem for sysadmin tools in this space.
> You'd have to detect what was done behind the tool's back and either
> pretend the missing delta was performed by the tool, or undo what was
> done outside the tool.  You're not going to get this kind of behavior
> from the isconf model of how you do things.

Dang.  That's too bad.  I'd kind of like to be able to use isconf
instead of the in-house system I'm using now (basically, systemimager's
updateclient + cvsup to overlay configurations), but we use the "reset
the system back to known baseline" functionality a lot.

From reading more, it also seems like isconf assumes that your
environment never changes?  At least, there doesn't seem to be any way
to "collapse" the journal into a new base image so that you don't have
to replay the whole thing every time you image a new host.  In my case,
our current images are 2+ years old (Debian sarge), and there have been
a lot of things done to them in that period; having to replay 2 years of
changes (security patching apache multiple times, etc) every time I want
to install another rack of hosts doesn't seem like a good idea,
especially if someone removes a CNAME that hasn't been used for two
years but an early step in the journal depends on.

M
Steve Traugott | 19 Sep 2006 04:00

Re: isconf deprecates infrastructures.org?

On Sun, Aug 13, 2006 at 12:48:20AM -0700, Mark Ferlatte wrote:
> Daniel Hagerty said on Sun, Aug 13, 2006 at 03:36:55AM -0400:
> >     It's a pretty standard problem for sysadmin tools in this space.
> > You'd have to detect what was done behind the tool's back and either
> > pretend the missing delta was performed by the tool, or undo what was
> > done outside the tool.  You're not going to get this kind of behavior
> > from the isconf model of how you do things.
> 
> Dang.  That's too bad.  I'd kind of like to be able to use isconf
> instead of the in-house system I'm using now (basically, systemimager's
> updateclient + cvsup to overlay configurations), but we use the "reset
> the system back to known baseline" functionality a lot.

Systemimager during reboot running from a miniroot is always going to
be a reliable rollback -- it's what I use with isconf.  But
systemimager's updateclient script is a different animal, since it
runs in the context of the machine it's modifying.  Updateclient is
not going to be reliable in those cases where it (or anything else)
modifies systemimager or any of its prereqs, such as rsync, perl,
libc, init scripts, the kernel, etc.  While this might be fine in
development environments, I don't use it in production.  Since I
always manage production and development machines the same way, this
means I don't use updateclient at all.

By the way, I just noticed a serious bug in updateclient; the rsync
command is missing the -H and -S flags.  I had the rsync guys fix this
in getimage and the miniroot years ago; looks like they never fixed it
in updateclient.  

So, when running updateclient, you are in fact guaranteed to *not* get

Daniel Hagerty | 13 Aug 2006 22:32

Re: isconf deprecates infrastructures.org?

 > From reading more, it also seems like isconf assumes that your
 > environment never changes?  At least, there doesn't seem to be any way
 > to "collapse" the journal into a new base image so that you don't have

    Steve's original papers had allusions to a snapshotting process he
used to basically play forward some amount of the log so that new
installs didn't have to start from the very first delta.  He never
really elaborated (that I saw) on how he went about this.

    In any event, you can probably arrive at a workable process for
what you do to prevent this particular problem given what you're
already using.  Play some amount of the delta that gets played to
everything and image it.  Install new machines from the image and play
new delta from the point of imaging forward and you should arrive at
the same place.
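
    As a sanity check on that claim, here is a small Python sketch.
It assumes each delta is a pure function of the machine state (a dict
of path to content); the helper names and file paths are invented for
the example:

```python
def replay(state, deltas):
    """Apply each delta (a function state -> new state) in order."""
    for delta in deltas:
        state = delta(state)
    return state

def set_file(path, content):
    """Build a delta that sets `path` to `content`."""
    def delta(state):
        new_state = dict(state)
        new_state[path] = content
        return new_state
    return delta

journal = [
    set_file("/etc/motd", "v1"),
    set_file("/etc/apache/version", "patched-1"),
    set_file("/etc/apache/version", "patched-2"),
    set_file("/etc/motd", "v2"),
]

base = {}
full = replay(base, journal)               # replay everything from scratch
image = replay(base, journal[:3])          # play part of the journal, image it
from_image = replay(image, journal[3:])    # new machine: image + later deltas
```

Here full and from_image come out equal.  That only holds while every
delta depends on nothing but the state; a delta that consults
something external, such as a CNAME, breaks the equivalence, which is
the problem raised earlier in the thread.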
Mark Ferlatte | 13 Aug 2006 22:40

Re: isconf deprecates infrastructures.org?

Daniel Hagerty said on Sun, Aug 13, 2006 at 04:32:01PM -0400:
>     In any event, you can probably arrive at a workable process for
> what you do to prevent this particular problem given what you're
> already using.  Play some amount of the delta that gets played to
> everything and image it.  Install new machines from the image and play
> new delta from the point of imaging forward and you should arrive at
> the same place.

That's a good point.  There's no reason to maintain a giant journal
forever.

Here's the thing, though; I've more or less got my infrastructure under
control (server/admin ratio of ~ 350:1 currently), but I'm spending time
doing application management, not systems.  For example, mysql _sucks_
at scale; there are no tools to automate replication setup amongst a
spare pool of mysqlds.

This sort of application specific management is what would be the most
useful to me, but most of it seems to require modifications to the code
itself; many software packages seem to require a _lot_ of effort to
shoehorn them into this kind of environment, and it sucks, although far
less than doing this by hand.

I'm continuing to keep an eye on isconf 4, though; it has a lot of
things going for it that I like a lot, and it plus some other glue may
prove to be better than what I've got right now.

M
