Niall Douglas | 24 Feb 2012 18:45

New merges for BE

Hi,

Some new merges for BE on gitorious:

http://gitorious.org/be/be/merge_requests/8 (Much improved Windows 
support)

Changes:

Added BE_INPUT_ENCODING and BE_OUTPUT_ENCODING to tell BE to override 
how it reads and writes from stdin/stdout. This fixes the extant 
unicode handling fatal exception bug.

Hacked in support for running from inside a ZIP archive. This is to 
enable BE to be bbfreezed into a self-contained directory. This means 
I can finally ditch needing a Python install or indeed needing 
several easy_install package dependencies for BEurtle.
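
A minimal sketch of what ZIP-aware plugin discovery could look like, assuming the frozen application puts its modules in an archive such as `library.zip`. The helper names `split_zip_path` and `module_names` are hypothetical and are not taken from `libbe/util/plugin.py`; the sketch is written for Python 3.

```python
import os
import zipfile


def split_zip_path(path):
    """Split path into (zip_file, inner_path); zip_file is None if no archive is on the path."""
    head = os.path.abspath(path)
    parts = []
    while True:
        if os.path.isfile(head) and zipfile.is_zipfile(head):
            return head, "/".join(reversed(parts))
        parent, tail = os.path.split(head)
        if not tail:  # reached the filesystem root without finding an archive
            return None, path
        parts.append(tail)
        head = parent


def module_names(package_dir):
    """List module names in package_dir, whether it is a real directory or inside a ZIP."""
    archive, inner = split_zip_path(package_dir)
    if archive is None:
        names = os.listdir(package_dir)
    else:
        prefix = inner.rstrip("/") + "/" if inner else ""
        with zipfile.ZipFile(archive) as zf:
            # Keep only direct children of the package directory.
            names = [n[len(prefix):] for n in zf.namelist()
                     if n.startswith(prefix) and "/" not in n[len(prefix):]]
    return sorted(n[:-3] for n in names
                  if n.endswith(".py") and not n.startswith("_"))
```

Calling `module_names` on a path like `.../library.zip/libbe/command` would then list the command modules without unpacking the archive.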

Niall

-- 
Technology & Consulting Services - ned Productions Limited.
http://www.nedproductions.biz/. VAT reg: IE 9708311Q. Company no: 
472909.
My Work Portfolio: http://careers.stackoverflow.com/nialldouglas/
W. Trevor King | 24 Feb 2012 20:53

Re: New merges for BE

On Fri, Feb 24, 2012 at 05:45:11PM +0000, Niall Douglas wrote:
> Added BE_INPUT_ENCODING and BE_OUTPUT_ENCODING to tell BE to override 
> how it reads and writes from stdin/stdout. This fixes the extant 
> unicode handling fatal exception bug.

Which bug would that be?  I don't see one in your branch's BE
repository.  In general, Python should set up the input and output
encodings appropriately on its own.  We use those defaults for stdin
and stdout, but they should only get used for stdin and stdout
themselves.  With the StringInputOutput class, the encoding is
hardcoded to UTF-8, and you're on your own converting that to
something that gets printed to a terminal.

I just cleaned up the StringInputOutput.get_stdout() method in my
gitorious branch, perhaps that fixes your original problem?

> Hacked in support for running from inside a ZIP archive. This is to 
> enable BE to be bbfreezed into a self-contained directory. This means 
> I can finally ditch needing a Python install or indeed needing 
> several easy_install package dependencies for BEurtle.

I informally cherrypicked your version fix in setup.py, thanks!

I'd like to merge your zipfile changes to `libbe/util/plugin.py`, but
I'm not comfortable with all of the things you've bundled in that
commit yet.  Would it be possible for you to refactor your Git history
so that it is a separate commit?  If you don't want to bother with
rebasing, you could just use `git format-patch` to generate the patch
for that commit.  Then strip out the parts about other files and
change the commit message so that it's appropriate for the slimmed

Niall Douglas | 25 Feb 2012 21:35

Re: New merges for BE

On 24 Feb 2012 at 14:53, W. Trevor King wrote:

> On Fri, Feb 24, 2012 at 05:45:11PM +0000, Niall Douglas wrote:
> > Added BE_INPUT_ENCODING and BE_OUTPUT_ENCODING to tell BE to override 
> > how it reads and writes from stdin/stdout. This fixes the extant 
> > unicode handling fatal exception bug.
> 
> Which bug would that be?  I don't see one in your branch's BE
> repository.  In general, Python should set up the input and output
> encodings appropriately on its own.  We use those defaults for stdin
> and stdout, but they should only get used for stdin and stdout
> themselves.  With the StringInputOutput class, the encoding is
> hardcoded to UTF-8, and you're on your own converting that to
> something that gets printed to a terminal.

If you remember many moons ago you were adamant that BE ought to 
interpret stdin and stdout according to what the OS tells it. Well, 
on Windows the only official way to change the charset for 
stdin/stdout for a child process is to get it to change it for you. 
Rather than arsing around with Windows codepage APIs, it's easier to 
just ignore Windows and force your own interpretation. It hardly 
matters as Windows passes through 8-bit data unchanged anyway.
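
A minimal sketch of the override described above, assuming the environment variable simply takes precedence over whatever encoding the stream reports. Only the variable names `BE_INPUT_ENCODING` and `BE_OUTPUT_ENCODING` come from this thread; `pick_encoding` and `wrap_output` are hypothetical helpers, and the sketch targets Python 3 rather than the Python 2 that BE used at the time.

```python
import codecs
import io
import os


def pick_encoding(env_var, stream_encoding, default="utf-8"):
    """Return the encoding named in env_var, else the stream's own, else default."""
    name = os.environ.get(env_var) or stream_encoding or default
    codecs.lookup(name)  # raises LookupError early on a misspelt encoding name
    return name


def wrap_output(binary_stream, stream_encoding=None):
    """Wrap a binary stream in a text writer, honouring BE_OUTPUT_ENCODING if set."""
    encoding = pick_encoding("BE_OUTPUT_ENCODING", stream_encoding)
    return io.TextIOWrapper(binary_stream, encoding=encoding, write_through=True)
```

A parent process such as BEurtle could then set `BE_OUTPUT_ENCODING=utf-8` in the child's environment and decode the pipe as UTF-8 regardless of the console codepage.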

The bug fixed is that now BEurtle can load the full 
bugseverywhere.org repo, including all closed issues (some of which 
contain unicode which was tripping the XML parser).

> I just cleaned up the StringInputOutput.get_stdout() method in my
> gitorious branch, perhaps that fixes your original problem?


W. Trevor King | 26 Feb 2012 01:44

Re: New merges for BE

On Sat, Feb 25, 2012 at 08:35:21PM +0000, Niall Douglas wrote:
> On 24 Feb 2012 at 14:53, W. Trevor King wrote:
> > On Fri, Feb 24, 2012 at 05:45:11PM +0000, Niall Douglas wrote:
> > > Added BE_INPUT_ENCODING and BE_OUTPUT_ENCODING to tell BE to override 
> > > how it reads and writes from stdin/stdout. This fixes the extant 
> > > unicode handling fatal exception bug.
> > 
> > Which bug would that be?  I don't see one in your branch's BE
> > repository.  In general, Python should set up the input and output
> > encodings appropriately on its own.  We use those defaults for stdin
> > and stdout, but they should only get used for stdin and stdout
> > themselves.  With the StringInputOutput class, the encoding is
> > hardcoded to UTF-8, and you're on your own converting that to
> > something that gets printed to a terminal.
> 
> If you remember many moons ago you were adamant that BE ought to 
> interpret stdin and stdout according to what the OS tells it.

Oh right ;).  I remember that now.

> Well, on Windows the only official way to change the charset for
> stdin/stdout for a child process is to get it to change it for you.
> Rather than arsing around with Windows codepage APIs, it's easier to
> just ignore Windows and force your own interpretation. It hardly
> matters as Windows passes through 8-bit data unchanged anyway.

I just spent a bit of time looking at `callBEcmd` in your BEurtle
code, and then read
  http://mail.python.org/pipermail/python-list/2010-August/1252631.html
after which I completely agree with you about the need for

Niall Douglas | 27 Feb 2012 17:28

Re: New merges for BE

On 25 Feb 2012 at 19:44, W. Trevor King wrote:

> > Well, on Windows the only official way to change the charset for
> > stdin/stdout for a child process is to get it to change it for you.
> > Rather than arsing around with Windows codepage APIs, it's easier to
> > just ignore Windows and force your own interpretation. It hardly
> > matters as Windows passes through 8-bit data unchanged anyway.
> 
> I just spent a bit of time looking at `callBEcmd` in your BEurtle
> code, and then read
>   http://mail.python.org/pipermail/python-list/2010-August/1252631.html
> after which I completely agree with you about the need for
> `BE_INPUT_ENCODING` and `BE_OUTPUT_ENCODING` to avoid silly system
> charsets.  I just cherry picked that commit.

Cool. Thanks.

> > Ah, well I was actually kinda hoping you might take as an inspiration 
> > to rewrite running-inside-ZIP support so it's correct? ;) I only say 
> > this because what I've done is a temporary hack at best.
> 
> I can go through and clean things up, but I'd like to start with your
> version and keep the commit history correct for proper attribution.
> That means that I need a running-inside-ZIP patch to start with, which
> is distinct from the other changes.

Alright I'll see what I can do.

> I doubt that the dynamic module loading is the cause of the slowness.
> Profiling `be --no-pager list` (as specified in `doc/hacking.txt`),

W. Trevor King | 28 Feb 2012 13:10

Re: New merges for BE

On Mon, Feb 27, 2012 at 04:28:39PM +0000, Niall Douglas wrote:
> > I think that it would be better to spend time getting BE running as a
> > service than to spend it trying to squeeze a second or two out of the
> > load time.
> 
> I think BE already has a HTTP service which allows writes though I 
> can't find much documentation on it. I know the Boost guys want a 
> RESTful API though.

The current `be serve` provides HTTP access at the storage level (as a
backend for `libbe.storage.http`).  This was easy to implement but
means that parsing requirements are unchanged.  If there was a
command-level BE server, it could keep the already-parsed `BugDir` in
memory/swap, and only flush changes to the disk, which should be much
faster.

> My current thinking is to write an XML <=> BE disc representation 
> converter with RESTful API. That eliminates needing BE at all. I can 
> see there might be a problem for say someone like Github to allow a 
> plugin exporting a project's issues in BE format if every invocation 
> took two seconds and/or an additional custom daemon needs to be 
> operated. A simple CGI with RESTful api would surely be better for 
> integration into other systems.

It's reading from the disk and parsing the YAML that's the slow part,
although if you implement it in a compiled language, the speedup might
be enough to make it palatable.  If you don't want to cache the
`BugDir` in memory/swap, another approach would be to convert from
BE's format (which is designed for easy versioning) to a database
(which is designed for speedy access).  Then you could serve requests

Niall Douglas | 1 Mar 2012 11:31

Re: New merges for BE

On 28 Feb 2012 at 7:10, W. Trevor King wrote:

> It's reading from the disk and parsing the YAML that's the slow part,
> although if you implement it in a compiled language, the speedup might
> be enough to make it palatable.  If you don't want to cache the
> `BugDir` in memory/swap, another approach would be to convert from
> BE's format (which is designed for easy versioning) to a database
> (which is designed for speedy access).  Then you could serve requests
> from the database, and queue changes to periodically sync with the
> underlying BE repository.  A post-update hook could trigger your
> server to reload the underlying BE repository.  Good luck with
> whatever you decide to do!

One of the big non-obvious advantages of XML is its in-built database 
support, so with a suitable schema you can mark things like uuid and 
in-reply-to as key joins. Queries to anything keyed are obviously 
O(1) and you can very quickly sort comments into their proper reply 
order. And of course it's all implemented in C, so from Python it's 
faster than using a native solution.

I actually did a lot of work last year with a NoSQL database called 
BaseX. It's basically a giant XML database and while every query 
takes no less than 150ms, they don't get slower and you can do lots 
of them in parallel. An amazing bit of technology.

I've been busy these last few days on pure maths homework - I'll get 
onto that patch you wanted today.

Niall


W. Trevor King | 24 Aug 2012 16:30

BE command server

On Tue, Feb 28, 2012 at 07:10:42AM -0500, W. Trevor King wrote:
> On Mon, Feb 27, 2012 at 04:28:39PM +0000, Niall Douglas wrote:
> > > I think that it would be better to spend time getting BE running as a
> > > service than to spend it trying to squeeze a second or two out of the
> > > load time.
> > 
> > I think BE already has a HTTP service which allows writes though I 
> > can't find much documentation on it. I know the Boost guys want a 
> > RESTful API though.
> 
> The current `be serve` provides HTTP access at the storage level (as a
> backend for `libbe.storage.http`).  This was easy to implement but
> means that parsing requirements are unchanged.  If there was a
> command-level BE server, it could keep the already-parsed `BugDir` in
> memory/swap, and only flush changes to the disk, which should be much
> faster.

I've stubbed out a new command (serve-commands) to do this.  It's in
my public repo's master branch now, but I haven't thought through a lot
of the user-identification and mapping issues yet, so definitely don't
bind it to a public address yet ;).  The client (be --server URL
COMMAND ...) parses the command-line arguments, serializes them with
YAML, and POSTS that to the server.  The server runs the command, and
returns stdout to the client.  Lots of things are unchecked or missing
(e.g. client's stdout, EDITORS, etc.), but basic stuff should work.

Hopefully this will break through the scaling bottleneck on the stock
BE.
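
A minimal sketch of the request/response cycle described above, under two loudly stated assumptions: JSON is swapped in for YAML so that the example needs only the standard library, and the HTTP POST transport is left out so that only the serialization and dispatch are shown. `encode_request`, `handle_request`, `BUGS` and `add` are hypothetical names, not the `serve-commands` implementation.

```python
import contextlib
import io
import json


def encode_request(argv):
    """Client side: serialize the parsed command line for the POST body."""
    return json.dumps({"command": argv[0], "args": argv[1:]}).encode("utf-8")


def handle_request(body, commands):
    """Server side: run the named command and return whatever it printed."""
    request = json.loads(body.decode("utf-8"))
    func = commands[request["command"]]  # KeyError for unknown commands
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        func(*request["args"])
    return captured.getvalue()


# Hypothetical stand-in for a BE command working on an in-memory bug list.
BUGS = []


def add(summary):
    BUGS.append(summary)
    print("Created bug %d: %s" % (len(BUGS), summary))
```

Because `BUGS` stays in memory between requests, repeated calls to `handle_request` avoid re-parsing anything from disk, which is the point of a command-level server.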

Cheers,

Niall Douglas | 26 Aug 2012 20:17

Re: BE command server

On 24 Aug 2012 at 10:30, W. Trevor King wrote:

> > The current `be serve` provides HTTP access at the storage level (as a
> > backend for `libbe.storage.http`).  This was easy to implement but
> > means that parsing requirements are unchanged.  If there was a
> > command-level BE server, it could keep the already-parsed `BugDir` in
> > memory/swap, and only flush changes to the disk, which should be much
> > faster.
> 
> I've stubbed out a new command (serve-commands) to do this.  It's in
> my public repo's master branch now, but I haven't thought through a lot
> of the user-identification and mapping issues yet, so definitely don't
> bind it to a public address yet ;).  The client (be --server URL
> COMMAND ...) parses the command-line arguments, serializes them with
> YAML, and POSTS that to the server.  The server runs the command, and
> returns stdout to the client.  Lots of things are unchecked or missing
> (e.g. client's stdout, EDITORS, etc.), but basic stuff should work.
> 
> Hopefully this will break through the scaling bottleneck on the stock
> BE.

Useful to know, thanks.

BEurtle/BEXML takes a lot of care to ensure it works correctly if the 
user is simultaneously using the BE command on the same repo, mainly 
by being ultra-paranoid and way overusing stat(), which is slow. 
Right now I watch id-cache on the assumption that if BE writes, it'll 
sometimes get updated - unfortunately this misses comment updates, so 
every time BEurtle/BEXML touches a BE repo it has to recursively scan 
the .be directory and hash the stat() output to detect changes.
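
A minimal sketch of the scan-and-hash check described above, assuming that path, size and modification time are enough to notice a change. `stat_fingerprint` is a hypothetical name; the actual BEurtle/BEXML code is not shown in this thread and, being Windows-based, does not literally call `stat()`.

```python
import hashlib
import os


def stat_fingerprint(root):
    """Hash the relative path, size and mtime of every file under root."""
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            record = "%s\0%d\0%d\n" % (
                os.path.relpath(path, root), st.st_size, st.st_mtime_ns)
            digest.update(record.encode("utf-8"))
    return digest.hexdigest()
```

Comparing the fingerprint from the previous visit with a fresh one tells the caller whether anything under `.be` changed, at the cost of one `stat()` per file.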

W. Trevor King | 27 Aug 2012 01:51

Re: BE command server

On Sun, Aug 26, 2012 at 07:17:55PM +0100, Niall Douglas wrote:
> BEurtle/BEXML takes a lot of care to ensure it works correctly if the
> user is simultaneously using the BE command on the same repo, mainly
> by being ultra-paranoid and way overusing stat(), which is slow.

The new command server is single threaded, so acquiring a connection
acts as an effective lock.  This is not ideal, but it's fine for
proof-of-concept.  In a hypothetical asynchronous server, changes to
the single in-memory Storage instance should be effectively atomic
without needing a lock.

> Right now I watch id-cache on the assumption that if BE writes, it'll
> sometimes get updated - unfortunately this misses comment updates, so
> every time BEurtle/BEXML touches a BE repo it has to recursively scan
> the .be directory and hash the stat() output to detect changes.

For Linux systems, pyinotify might be a better approach [1].  For
other systems, some sort of locking system might be the best you can
do.

> The lockfile could live in the .be directory, and be held whenever a
> reading or writing operation is being performed. An additional
> "writelog" file would also live in .be and contain an incremental
> list of paths written, perhaps of the form:
>
> --- cut ---
> MaxAge: 2592000
>
> <iso8601 datetime>:
>   <path>

Niall Douglas | 27 Aug 2012 14:23

Re: BE command server

On 26 Aug 2012 at 19:51, W. Trevor King wrote:

> On Sun, Aug 26, 2012 at 07:17:55PM +0100, Niall Douglas wrote:
> > BEurtle/BEXML takes a lot of care to ensure it works correctly if the
> > user is simultaneously using the BE command on the same repo, mainly
> > by being ultra-paranoid and way overusing stat(), which is slow.
> 
> The new command server is single threaded, so acquiring a connection
> acts as an effective lock.  This is not ideal, but it's fine for
> proof-of-concept.  In a hypothetical asynchronous server, changes to
> the single in-memory Storage instance should be effectively atomic
> without needing a lock.

I think you're missing my intended point: right now if two BE 
instances are run simultaneously, there is a chance of data 
corruption. BE needs to use lock files to serialise access *on* 
*disk*.
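
A minimal sketch of an on-disk lock of the kind argued for here, assuming an exclusive-create lock file is acceptable. `repo_lock` and the file name `lock` are hypothetical; BE has no such function in this thread. Note that a crash while the lock is held leaves a stale file behind that has to be removed by hand.

```python
import contextlib
import os


@contextlib.contextmanager
def repo_lock(be_dir, name="lock"):
    """Hold an exclusive lock file inside be_dir for the duration of the block."""
    path = os.path.join(be_dir, name)
    # O_EXCL makes creation fail if the file already exists, so only one
    # process can win the race to create it.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode("ascii"))
        os.close(fd)
        yield path
    finally:
        os.unlink(path)
```

A second process entering `repo_lock` on the same directory gets `FileExistsError` and can retry or give up; taking and releasing the lock costs one file creation and one deletion.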

> > Right now I watch id-cache on the assumption that if BE writes, it'll
> > sometimes get updated - unfortunately this misses comment updates, so
> > every time BEurtle/BEXML touches a BE repo it has to recursively scan
> > the .be directory and hash the stat() output to detect changes.
> 
> For Linux systems, pyinotify might be a better approach [1].  For
> other systems, some sort of locking system might be the best you can
> do.

No, for inotify you need a constantly running background 
process. Should that process stop, inotify updates are lost, and you 
can't guarantee a process won't be exited. Agreeing on a fsynced, 

W. Trevor King | 27 Aug 2012 15:12

Re: BE command server

On Mon, Aug 27, 2012 at 01:23:28PM +0100, Niall Douglas wrote:
> On 26 Aug 2012 at 19:51, W. Trevor King wrote:
> > On Sun, Aug 26, 2012 at 07:17:55PM +0100, Niall Douglas wrote:
> > > BEurtle/BEXML takes a lot of care to ensure it works correctly if the
> > > user is simultaneously using the BE command on the same repo, mainly
> > > by being ultra-paranoid and way overusing stat(), which is slow.
> > 
> > The new command server is single threaded, so acquiring a connection
> > acts as an effective lock.  This is not ideal, but it's fine for
> > proof-of-concept.  In a hypothetical asynchronous server, changes to
> > the single in-memory Storage instance should be effectively atomic
> > without needing a lock.
> 
> I think you're missing my intended point: right now if two BE 
> instances are run simultaneously, there is a chance of data 
> corruption. BE needs to use lock files to serialise access *on* 
> *disk*.

If you're running all your commands through a single command-serve
process, you won't have simultaneous BE instances accessing the disk
(even simultaneous client BE calls).  No need for locking here.

If users start simultaneous calls to BE on that disk database anyway:

  $ be serve-commands &
  $ be add "this might corrupt the database"

Then they're silly ;).  They should instead use

  $ be serve-commands &

Niall Douglas | 27 Aug 2012 16:20

Re: BE command server

On 27 Aug 2012 at 9:12, W. Trevor King wrote:

> > I think you're missing my intended point: right now if two BE 
> > instances are run simultaneously, there is a chance of data 
> > corruption. BE needs to use lock files to serialise access *on* 
> > *disk*.
>
> We can't protect against everything users might dream up.  If
> protection is expensive, it's better to just warn people and then let
> them do what they want.

I would have said it's a question of interoperability, not caveat 
utilitor. The protection, despite what you appear to think, is not 
even remotely expensive - a lock file is two iops as compared to 
several hundred iops to parse a full BE repo. Less than 1% overhead 
is not expensive.

> You only get corruption if process A changes the on-disk filesystem,
> and process B reads some of the changing files while they are in the
> act of changing.  You can also get synchronization errors if process A
> changes the on-disk filesystem and process B thinks it's in-memory
> versions of those files are still current.  In both cases, you can
> avoid the problem if process B is using inotify and realizes that A
> made a change.  If process B is not running, A can do whatever it
> wants, and B will have a valid filesystem DB to load the next time it
> starts.

inotify isn't really suitable for the purpose you're suggesting. 
You'd have to inotify on every directory in a BE repo which is costly 
on file descriptors and you may run into a select() limit and have to 

W. Trevor King | 27 Aug 2012 17:41

Re: BE command server

On Mon, Aug 27, 2012 at 03:20:04PM +0100, Niall Douglas wrote:
> On 27 Aug 2012 at 9:12, W. Trevor King wrote:
> > > I think you're missing my intended point: right now if two BE 
> > > instances are run simultaneously, there is a chance of data 
> > > corruption. BE needs to use lock files to serialise access *on* 
> > > *disk*.
> >
> > We can't protect against everything users might dream up.  If
> > protection is expensive, it's better to just warn people and then let
> > them do what they want.
> 
> I would have said it's a question of interoperability, not caveat 
> utilitor. The protection, despite what you appear to think, is not 
> even remotely expensive - a lock file is two iops as compared to 
> several hundred iops to parse a full BE repo. Less than 1% overhead 
> is not expensive.

True.  I'm mostly worried about complexity/maintenance costs here.
Also, VCS merges may update the BE database, so a locking solution
will want to hook into each VCS?

> inotify also isn't portable. Linux is the primary development system
> for only about 5% of developers.

If a portable solution works just as well or better, portable is
clearly the way to go.  I put inotify on the table, because you said
you were polling stat(), and that's something that could be replaced
with inotify.

> BE needs to write in atomic transactions with a transaction log so

Niall Douglas | 28 Aug 2012 15:31

Re: BE command server

On 27 Aug 2012 at 11:41, W. Trevor King wrote:

> True.  I'm mostly worried about complexity/maintenance costs here.
> Also, VCS merges may update the BE database, so a locking solution
> will want to hook into each VCS?

Correct, though BE need only wait for the VCS to complete. I don't 
know about the other SCMs, but GIT has its deliberately "stupid" 
design to eliminate the need for any atomicity at all i.e. its design 
allows it to cope with multiple partial failure or indeed massively 
parallel usage. Its only sore point is its index, which as that holds 
the staging area needs to be locked and unlocked during use. If git 
crashes during use it often leaves .git/index.lock around, preventing 
any further git usage until it's deleted.

Other than .git/index.lock, I'm not aware of any other locking which 
GIT does.

> > inotify also isn't portable. Linux is the primary development system
> > for only about 5% of developers.
> 
> If a portable solution works just as well or better, portable is
> clearly the way to go.  I put inotify on the table, because you said
> you were polling stat(), and that's something that could be replaced
> with inotify.

Ah, I see. I'm actually not polling with stat() at all in BEurtle, 
but I translated what I'm doing into POSIX-speak for your benefit :). 
I love how wires get crossed ...


W. Trevor King | 30 Aug 2012 05:43

Re: BE command server

On Tue, Aug 28, 2012 at 02:31:07PM +0100, Niall Douglas wrote:
> On 27 Aug 2012 at 11:41, W. Trevor King wrote:
> > True.  I'm mostly worried about complexity/maintenance costs here.
> > Also, VCS merges may update the BE database, so a locking solution
> > will want to hook into each VCS?
> 
> Correct, though BE need only wait for the VCS to complete.

But if you've got a persistent process that expects the on-disk DB to
be locked before changes, and someone runs

  $ git pull

That's going to change the DB without the appropriate locking unless
you've got something fancy in .git/hooks/.  If you wrap all your VCS
calls in BE (using your vcs command), then the locking would be
easier, but journalling would still be difficult.

> BEXML can also talk to other issue trackers such as Redmine and
> Github via a unified API. BEurtle will then use BEXML to let you
> import and export issues between your local BE tracker and external
> public facing trackers.

So BEXML is basically a collection of new libbe.storage backends.

Currently BE expects:

  Command -(acting on)-> multiple BugDirs -(reading from)-> Storage

I suppose:

W. Trevor King | 30 Aug 2012 06:02

Multiple bugdirs

For anyone who may have been tuning out the command-server thread, I
thought I'd change the subject to highlight another recent feature of
my branch:

On Wed, Aug 29, 2012 at 11:43:17PM -0400, W. Trevor King wrote:
> Most of BE has been designed with multiple bugdirs in a single
> repository in mind.  I just cleaned up the commands to fully support
> this (including adding a <bugdir> entity to XML input/output).  Try
> something like:
> 
>   $ mkdir -p /tmp/joint-repo/.be
>   $ cd /tmp/joint-repo/.be
>   $ ln -s ~/src/be/.be/version
>   $ ln -s ~/src/be/.be/bea86499-824e-4e77-b085-2d581fa9ccab/
>   $ ln -s ~/src/BEurtle/.be/7017b289-f207-4e39-9746-f58323404eba/
>   $ be list
>   bea/e22:um: Think about authentication.
>   bea/12c:uw: Bug aggregation.  Multi-repo meta-BE?
>   701/26b:os: Putting a URL into the path box throws an exception for Path.NormalizePath in BEurtlePlugin.loadIssues
>   701/3c4:os: Shouldn't be possible to add an issue with no summary
>   …
> 
> You'll need to use a version of my repo that contains
> 
>   commit 4db1a045a0606bead191a563abc54dfa8352efe0
>   Author: W. Trevor King <wking <at> tremily.us>
>   Date:   Wed Aug 29 23:26:17 2012 -0400
> 
>     Rewrite commands to use bugdirs instead of a single bugdir.
> 

Niall Douglas | 1 Sep 2012 21:37

Re: BE command server

On 29 Aug 2012 at 23:43, W. Trevor King wrote:

> > Correct, though BE need only wait for the VCS to complete.
> 
> But if you've got a persistent process that expects the on-disk DB to
> be locked before changes, and someone runs
> 
>   $ git pull
> 
> That's going to change the DB without the appropriate locking unless
> you've got something fancy in .git/hooks/.  If you wrap all your VCS
> calls in BE (using your vcs command), then the locking would be
> easier, but journalling would still be difficult.

?

BE needs to lock the VCS (read/write) lock first, then lock its own 
lock. And release in reverse order.

git only locks the index during 'git checkout' which occurs after 
'git fetch' in a 'git pull', or 'git commit'. Push/pulling otherwise 
is lock free.

> > BEXML can also talk to other issue trackers such as Redmine and
> > Github via a unified API. BEurtle will then use BEXML to let you
> > import and export issues between your local BE tracker and external
> > public facing trackers.
> 
> So BEXML is basically a collection of new libbe.storage backends.


W. Trevor King | 2 Sep 2012 13:22

Re: BE command server

On Sat, Sep 01, 2012 at 08:37:26PM +0100, Niall Douglas wrote:
> On 29 Aug 2012 at 23:43, W. Trevor King wrote:
> 
> > > Correct, though BE need only wait for the VCS to complete.
> >
> > But if you've got a persistent process that expects the on-disk DB to
> > be locked before changes, and someone runs
> >
> >   $ git pull
> >
> > That's going to change the DB without the appropriate locking unless
> > you've got something fancy in .git/hooks/.  If you wrap all your VCS
> > calls in BE (using your vcs command), then the locking would be
> > easier, but journalling would still be difficult.
> 
> ?
> 
> BE needs to lock the VCS (read/write) lock first, then lock its own
> lock. And release in reverse order.
> 
> git only locks the index during 'git checkout' which occurs after
> 'git fetch' in a 'git pull', or 'git commit'. Push/pulling otherwise
> is lock free.

Say you're serving from a locked BE repo, which makes any changes look
atomic to your server.  While your server is running, someone runs
`git pull` in the repository, and Git goes through merrily checking
out new versions of the repo files without doing any locking that the
BE server notices.  If you need the BE locks to keep the server sane,
you could be in trouble.

Niall Douglas | 14 Sep 2012 15:34

Re: BE command server

On 2 Sep 2012 at 7:22, W. Trevor King wrote:

> > BE needs to lock the VCS (read/write) lock first, then lock its own
> > lock. And release in reverse order.
> > 
> > git only locks the index during 'git checkout' which occurs after
> > 'git fetch' in a 'git pull', or 'git commit'. Push/pulling otherwise
> > is lock free.
> 
> Say you're serving from a locked BE repo, which makes any changes look
> atomic to your server.  While your server is running, someone runs
> `git pull` in the repository, and Git goes through merrily checking
> out new versions of the repo files without doing any locking that the
> BE server notices.  If you need the BE locks to keep the server sane,
> you could be in trouble.

Sorry about the delay in replying. Was in Canada, just about to leave 
for the UK and Sweden then back to Canada. Such is emigration.

I think where we're at is that if the SCM has enough locking, you 
lock via the SCM lock per batch of read operations and you lock via 
the BE lock with transaction journal for write operations. Since we 
last spoke, it has occurred to me that a single journal file with 
lock is inefficient - instead you probably ought to use 
per-transaction datetime named files containing those relative paths 
which have changed in that write transaction. That plays nicer with 
SCMs than constantly resolving conflicts in merges of a single 
journal file.
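
A minimal sketch of the per-transaction journal idea, assuming one small file per write transaction whose name starts with a UTC timestamp so that names sort chronologically. `write_journal`, `changed_since` and the name format are hypothetical; nothing here is an existing BE on-disk format.

```python
import datetime
import os
import uuid


def write_journal(journal_dir, changed_paths, now=None):
    """Record one write transaction as its own datetime-named file."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # A random suffix keeps two transactions in the same microsecond apart.
    name = "%s-%s" % (now.strftime("%Y%m%dT%H%M%S.%fZ"), uuid.uuid4().hex[:8])
    os.makedirs(journal_dir, exist_ok=True)
    with open(os.path.join(journal_dir, name), "w", encoding="utf-8") as f:
        f.write("".join(p + "\n" for p in sorted(changed_paths)))
    return name


def changed_since(journal_dir, last_seen=""):
    """Return paths touched by transactions whose file name sorts after last_seen."""
    changed = set()
    for name in sorted(os.listdir(journal_dir)):
        if name > last_seen:
            with open(os.path.join(journal_dir, name), encoding="utf-8") as f:
                changed.update(f.read().splitlines())
    return sorted(changed)
```

Since every transaction creates a new file and never edits an existing one, two branches that both wrote to the repository merge without touching the same journal file.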

> I try to be lazy in requesting data from the storage backends already,

W. Trevor King | 14 Sep 2012 17:13

Re: BE command server

On Fri, Sep 14, 2012 at 02:34:37PM +0100, Niall Douglas wrote:
> On 2 Sep 2012 at 7:22, W. Trevor King wrote:
> > I try to be lazy in requesting data from the storage backends already,
> > but I don't do any filtering in the storage backends.  I'm not sure
> > what you mean by muxing; is BEXML asynchronous within a single BE
> > command?
> 
> BEXML doesn't even open a file for reading until it has to, so issues 
> get loaded one by one without comments for example. Yes, it's 
> completely asynchronous, so you can request issue A, B and C in 
> parallel and it won't block until you try reading something not yet 
> retrieved.

Similarly in libbe, the comments and attributes are not loaded until
you try to access them (thanks to the magic of
`libbe.storage.util.properties`).  This sounds like “lazy”, but I'm
not sure it qualifies as “asynchronous”.  I'd define asynchronous as
“when needed, comments for bugs A, B, and C are simultaneously fetched
from storage”, so you could start parsing A's comments while you were
getting I/O blocking on B's.
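
A minimal sketch of the distinction drawn above, assuming a blocking `fetch(bug_id)` callable stands in for the storage backend. The `Bug` class and `fetch_all_comments` are hypothetical and are not how `libbe.storage.util.properties` or BEXML is written.

```python
from concurrent.futures import ThreadPoolExecutor


class Bug:
    """Loads its comments only on first access (lazy, but still blocking)."""

    def __init__(self, bug_id, fetch):
        self.bug_id = bug_id
        self._fetch = fetch
        self._comments = None

    @property
    def comments(self):
        if self._comments is None:
            self._comments = self._fetch(self.bug_id)
        return self._comments


def fetch_all_comments(bug_ids, fetch, max_workers=4):
    """Asynchronous variant: start every fetch at once, then collect results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {bug_id: pool.submit(fetch, bug_id) for bug_id in bug_ids}
        return {bug_id: future.result() for bug_id, future in futures.items()}
```

With the lazy property, reading A's comments blocks until A is loaded; with the pool, the fetches for A, B and C overlap, so waiting on B's I/O does not hold up work on A.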

> The muxing refers to that you can mix a live Redmine repo with a live 
> Github repo and it appears like a single, unified BE repo. When you 
> modify an issue, it fires out the appropriate backend live.

This is the case with my current BE, except that the separate backends
each have their own repo.  I'm not sure what you gain by splitting a
*single repo* among several storage backends.  Can you elaborate?

> The FS backend is no faster than BE, but agreed the XML backend is 

W. Trevor King | 14 Sep 2012 17:43

YAML storage is slow

On Fri, Sep 14, 2012 at 11:13:28AM -0400, W. Trevor King wrote:
> Having storage that is faster to parse would also help with the
> prefactor (but not the scaling), as `be list` on #bea864# currently
> spends 63% of its time in `yaml.load()` (cumulative, and that's
> *after* disk I/O).  I went with YAML as a storage file format because
> the home-grown file format was pretty close to YAML already.  Perhaps
> it's worth rethinking this choice (or using `git notes` to store JSON
> or pickled versions of each YAML file.  Hmmm…)

As a quick proof-of-concept, I translated all the `settings` and
`values` files under `.be/` from YAML to JSON and swapped in
`json.dumps` and `json.loads` for `generate` and `parse` in
`libbe.storage.util.mapfile`.  The resulting BE took less than half
the time to run `be list` on #bea864# (2.4s for YAML, 1.1s for JSON).

However, the original sorted, line-spaced format (preserved in our
YAML) was selected to allow painless merging.  We can also add
whitespace at will to JSON, so perhaps it is time for Bugs Everywhere
Directory v1.5 ;).
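
A rough sketch of what a merge-friendly JSON mapfile could look like,
assuming flat dictionaries of settings.  The function names mirror
`generate` and `parse`, but the bodies are illustrative rather than
the code in my branch: keys are sorted, each entry sits on its own
line, and blank lines between entries give line-based merges some
context to work with.

```python
import json


def generate(mapping):
    """Serialize a dict as JSON with sorted keys, one entry per line,
    and a blank line between entries."""
    lines = json.dumps(mapping, sort_keys=True, indent=4).splitlines()
    # JSON allows arbitrary whitespace between tokens, so the extra
    # blank lines do not change what the parser sees.
    return '\n\n'.join(lines) + '\n'


def parse(text):
    """Inverse of generate(); the extra whitespace is ignored."""
    return json.loads(text)
```

Because newlines inside string values are escaped by `json.dumps`,
splitting the output on line boundaries never cuts a value in two.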

Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
_______________________________________________
Be-devel mailing list
Be-devel <at> bugseverywhere.org

W. Trevor King | 17 Sep 2012 15:02

Re: YAML storage is slow

On Fri, Sep 14, 2012 at 11:43:34AM -0400, W. Trevor King wrote:
> As a quick proof-of-concept, I translated all the `settings` and
> `values` files under `.be/` from YAML to JSON and swapped in
> `json.dumps` and `json.loads` for `generate` and `parse` in
> `libbe.storage.util.mapfile`.  The resulting BE took less than half
> the time to run `be list` on #bea864# (2.4s for YAML, 1.1s for JSON).
> 
> However, the original sorted, line-spaced format (preserved in our
> YAML) was selected to allow painless merging.  We can also add
> whitespace at will to JSON, so perhaps it is time for Bugs Everywhere
> Directory v1.5 ;).

Done in my branch.

Niall Douglas | 7 Oct 2012 00:26

Re: BE command server

On 14 Sep 2012 at 11:13, W. Trevor King wrote:

> Similarly in libbe, the comments and attributes are not loaded until
> you try to access them (thanks to the magic of
> `libbe.storage.util.properties`).  This sounds like "lazy", but I'm
> not sure it qualifies as "asynchronous".  I'd define asynchronous as
> "when needed, comments for bugs A, B, and C are simultaneously fetched
> from storage", so you could start parsing A's comments while you were
> getting I/O blocking on B's.

Sure. BEXML does a certain amount of prefetching to try to hide 
network latency, so it fires off requests for three or four issues 
ahead of where the "current" issue is. Those get fetched in the 
background, and I/O waiting is somewhat reduced.
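
As a rough illustration of that kind of read-ahead (a sketch only, not 
BEXML's actual code; `iter_with_prefetch`, `fetch` and `ahead` are 
made-up names), a thread pool can keep a small window of later issues 
in flight while the current one is being consumed:

```python
from concurrent.futures import ThreadPoolExecutor


def iter_with_prefetch(issue_ids, fetch, ahead=3):
    """Yield fetch(issue_id) for each id, in order, while up to `ahead`
    later fetches run in background threads."""
    issue_ids = list(issue_ids)
    with ThreadPoolExecutor(max_workers=ahead + 1) as pool:
        pending = {}          # index -> Future
        next_to_submit = 0
        for i in range(len(issue_ids)):
            # Top up the window: the current issue plus `ahead` more.
            while (next_to_submit < len(issue_ids)
                   and next_to_submit <= i + ahead):
                pending[next_to_submit] = pool.submit(
                    fetch, issue_ids[next_to_submit])
                next_to_submit += 1
            # Blocks only if the current issue has not arrived yet.
            yield pending.pop(i).result()
```

The caller still sees issues strictly in order; only the waiting is 
overlapped.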

> > The muxing refers to that you can mix a live Redmine repo with a live 
> > GitHub repo and it appears like a single, unified BE repo. When you 
> > modify an issue, it fires out the appropriate backend live.
> 
> This is the case with my current BE, except that the separate backends
> each have their own repo.  I'm not sure what you gain by splitting a
> *single repo* among several storage backends.  Can you elaborate?

Let's say you have a Redmine issue tracker somewhere on the web. What 
you do is mix the issues from Redmine into the local BE repo, so it 
looks as if your local BE repo is much larger than it really is.
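
A toy sketch of the idea, with every name hypothetical and in-memory 
dictionaries standing in for the real local and Redmine backends. 
Issue ids carry a backend prefix, and reads and writes are routed to 
whichever backend owns the issue:

```python
class DictBackend(object):
    """In-memory stand-in for one backend (local BE, Redmine, ...)."""

    def __init__(self, issues):
        self.issues = dict(issues)

    def ids(self):
        return sorted(self.issues)

    def get(self, issue_id):
        return self.issues[issue_id]

    def set(self, issue_id, value):
        self.issues[issue_id] = value


class MuxedStorage(object):
    """Present several backends as one repo with 'name/id' ids."""

    def __init__(self, backends):
        self.backends = backends  # backend name -> backend object

    def _route(self, muxed_id):
        name, _, local_id = muxed_id.partition('/')
        return self.backends[name], local_id

    def ids(self):
        return ['%s/%s' % (name, issue_id)
                for name in sorted(self.backends)
                for issue_id in self.backends[name].ids()]

    def get(self, muxed_id):
        backend, local_id = self._route(muxed_id)
        return backend.get(local_id)

    def set(self, muxed_id, value):
        # The change goes straight to the backend that owns the issue.
        backend, local_id = self._route(muxed_id)
        backend.set(local_id, value)
```

Listing `ids()` on the muxed object shows local and remote issues side 
by side, which is the "larger than it really is" effect.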

At my new employer, each subdivision runs its own issue tracking 
system and indeed its own SCM. Generally each group has its own 
preference for an SCM and issue tracker. This has introduced 

W. Trevor King | 29 Oct 2012 02:50

BE daemon servers

On Fri, Aug 24, 2012 at 10:30:37AM -0400, W. Trevor King wrote:
> On Tue, Feb 28, 2012 at 07:10:42AM -0500, W. Trevor King wrote:
> > The current `be serve` provides HTTP access at the storage level (as a
> > backend for `libbe.storage.http`).  This was easy to implement but
> > means that parsing requirements are unchanged.  If there was a
> > command-level BE server, it could keep the already-parsed `BugDir` in
> > memory/swap, and only flush changes to the disk, which should be much
> > faster.
> 
> I've stubbed out a new command (serve-commands) to do this.

Because I think the command server scales a lot better than the
storage server, I'm working on getting the following working for the
1.1.0 release:

  $ be --server http://cs.bugseverywhere.org/ new "broken frobnitz"
  Created bug with ID bea/abc
  $ be --server http://cs.bugseverywhere.org/ comment bea/abc
  <Describe bug>
  $ be --server http://cs.bugseverywhere.org/ commit "bea/abc: broken frobnitz"
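
To make the in-memory idea concrete, here is a toy WSGI sketch.  It is
not the `serve-commands` implementation: the JSON request shape, the
`CommandApp` class, and the `call` helper are all assumptions made for
illustration.  The point is only that state is parsed once and reused
across requests.

```python
import io
import json


class CommandApp(object):
    """Toy WSGI app whose state lives in memory between requests."""

    def __init__(self):
        self.bugs = {}    # stands in for an already-parsed BugDir
        self.next_id = 0

    def run(self, command, args):
        if command == 'new':
            self.next_id += 1
            bug_id = 'bug-%d' % self.next_id
            self.bugs[bug_id] = {'summary': args[0], 'comments': []}
            return 'Created bug with ID %s' % bug_id
        if command == 'comment':
            self.bugs[args[0]]['comments'].append(args[1])
            return ''
        raise ValueError('unknown command: %s' % command)

    def __call__(self, environ, start_response):
        length = int(environ.get('CONTENT_LENGTH') or 0)
        raw = environ['wsgi.input'].read(length)
        request = json.loads(raw.decode('utf-8'))
        try:
            output = self.run(request['command'], request.get('args', []))
            status = '200 OK'
        except (KeyError, IndexError, ValueError) as error:
            output = str(error)
            status = '400 Bad Request'
        payload = json.dumps({'output': output}).encode('utf-8')
        start_response(status, [('Content-Type', 'application/json'),
                                ('Content-Length', str(len(payload)))])
        return [payload]


def call(app, command, *args):
    """Invoke the app in-process, the way a test client would."""
    data = json.dumps({'command': command, 'args': list(args)})
    data = data.encode('utf-8')
    environ = {'REQUEST_METHOD': 'POST',
               'CONTENT_LENGTH': str(len(data)),
               'wsgi.input': io.BytesIO(data)}
    seen = {}
    chunks = app(environ, lambda status, headers: seen.update(status=status))
    body = json.loads(b''.join(chunks).decode('utf-8'))
    return seen['status'], body['output']
```

Calling `call(app, 'new', 'broken frobnitz')` and then
`call(app, 'comment', ...)` against the same `app` touches only the
in-memory dictionary; flushing changes to disk is left out of the
sketch.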

A recent advance (included in https://gitorious.org/~wking/be/wtk, but
not in the trunk until I get things working a bit more smoothly) is
the addition of --daemon, --pidfile, and --logfile options to all
WSGI-based commands.  This makes it easier to manage long-running BE
server instances.  However, the command server still has some issues
that need to be worked out:

  1. Some commands read from stdin, either as raw bytes or as Unicode
     (decoded by the local encoding).  However, JSON doesn't