John Caruso | 18 Aug 21:31

Data "corruption" with fastpath caching

Consider the following pseudocode snippet:

     # ...generate file $myfile in some way...
     ns_returnfile 200 text/plain $myfile
     ns_unlink $myfile

If this snippet is executed in a tight loop on a Linux system, the chances 
of returning the wrong results are very high due to AOLserver's fastpath 
caching, which requires the following four attributes to be identical to 
consider a new file to be a cache hit (as per the FastReturn function in 
fastpath.c):

1) Same device number
2) Same inode number
3) Same modification time (within one second)
4) Same size

Assuming $myfile is always on the same filesystem, number 1 is taken care 
of, and Linux reuses inode numbers, so deleting $myfile and then creating 
it again will typically result in a file with the same inode.  So in this 
example, a file that is created within the same second as a preceding 
file, and that contains the same amount of data, will be considered 
identical to it and will be erroneously served from cache.
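To make the four-part check concrete, here is a minimal pure-Tcl model of it. It is not the fastpath.c code. The proc names `fpKey` and `fpIsHit` are hypothetical, and each "stat" is a dict using the field names that Tcl's `file stat` fills in (dev, ino, mtime, size). The file name appears nowhere in the comparison, which is the whole problem.

```tcl
# Hypothetical model of the fastpath hit test (not the real fastpath.c code).
# Each stat record is a dict using the field names from Tcl's [file stat].

proc fpKey {st} {
    # On Unix the cache is looked up by device and inode; the path is not used.
    return "[dict get $st dev]:[dict get $st ino]"
}

proc fpIsHit {cached current} {
    # The cached copy is reused only if all four attributes match.
    foreach field {dev ino mtime size} {
        if {[dict get $cached $field] != [dict get $current $field]} {
            return 0
        }
    }
    return 1
}

# Made-up values: /var/tmp/myfile was served and deleted, then
# /var/tmp/myotherfile was written in the same second, reused the inode,
# and holds the same number of bytes.
set myfile      [dict create dev 2049 ino 131077 mtime 1219091491 size 7]
set myotherfile [dict create dev 2049 ino 131077 mtime 1219091491 size 7]

puts [fpKey $myotherfile]             ;# prints 2049:131077, same key as myfile
puts [fpIsHit $myfile $myotherfile]   ;# prints 1: the stale contents are served

# One second later the mtime differs and the entry is correctly rejected.
dict set myotherfile mtime 1219091492
puts [fpIsHit $myfile $myotherfile]   ;# prints 0
```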

This isn't just a hypothetical, BTW; a client of mine ran into this issue 
and spent many weeks trying to figure out what was happening before 
tracing it back to AOLserver's fastpath caching.  And the issue had 
existed for many years without being detected.

I'm mainly bringing this up to shine a light on the issue and see what 

Tom Jackson | 18 Aug 22:33

Re: Data "corruption" with fastpath caching

There is probably someone here that can directly address a better way to
do what you want, with ns_cache or some other scheme, but it looks like
your basic problem is saving rapidly changing data to disk or serving it
from cache. Why do this? If your data is changing faster than once per
second, don't keep a copy of it. It's not a data corruption issue
because you are choosing to overwrite the old data with new data using
the exact same file name. If the data is important, don't overwrite it,
thus no corruption. 

But in general it is not a good idea to do things the way you are, which
is reading and writing the same file at the same time, which has nothing
to do with fastpath. You should use a cond/mutex to serialize access.

tom jackson

On Mon, 2008-08-18 at 12:33 -0700, John Caruso wrote:
> Consider the following pseudocode snippet:
> 
>      <<...generate file $myfile in some way...>>
>      ns_returnfile 200 text/plain $myfile
>      ns_unlink $myfile
> 
> If this snippet is executed in a tight loop on a Linux system, the chances 
> of returning the wrong results are very high due to AOLserver's fastpath 
> caching, which requires the following four attributes to be identical to 
> consider a new file to be a cache hit (as per the FastReturn function in 
> fastpath.c):
> 
> 1) Same device number
> 2) Same inode number

John Caruso | 18 Aug 23:13

Re: Data "corruption" with fastpath caching

On Monday 01:33 PM 8/18/2008, Tom Jackson wrote:
>It's not a data corruption issue
>because you are choosing to overwrite the old data with new data using
>the exact same file name. If the data is important, don't overwrite it,
>thus no corruption.

No, you've misunderstood the scenario.  The file name needn't be the same 
to trigger this issue, and the "corruption" doesn't come from serving data 
out of a file that's changing, but rather because fastpath caching 
mistakenly identifies a new file as being identical to a previously-cached 
file (for the reasons I outlined) and erroneously serves the 
previously-cached data to the user.

This is a design limitation and arguably a bug in the fastpath caching 
implementation, which is potentially quite serious since it silently 
serves the wrong data to the user.  If you want a more straightforward 
(albeit contrived) demonstration of the problem, here you go:

    set file [open "/var/tmp/myfile" "w"]
    puts $file "ABC123"
    close $file
    ns_returnfile 200 text/plain "/var/tmp/myfile"
    ns_unlink -nocomplain "/var/tmp/myfile"

    set file [open "/var/tmp/myotherfile" "w"]
    puts $file "XYZ987"
    close $file
    ns_returnfile 200 text/plain "/var/tmp/myotherfile"
    ns_unlink -nocomplain "/var/tmp/myotherfile"
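Whether the second file really lands on the first file's inode depends on the filesystem and on other activity in the directory, so it is worth checking on the machine in question. A small plain-Tcl probe, assuming a writable /tmp; the proc names and scratch file names are made up, and it does not call ns_returnfile:

```tcl
# Checks whether this filesystem hands a just-freed inode to the next new
# file, which is the precondition for the collision described above.
# Plain tclsh; the scratch file names are arbitrary.

proc inodeOf {path} {
    file stat $path st
    return $st(ino)
}

proc inodeReused {dir} {
    set a [file join $dir fp-demo-a-[pid]]
    set b [file join $dir fp-demo-b-[pid]]

    # First file: write, note its inode, delete it.
    set fd [open $a w]; puts $fd "ABC123"; close $fd
    set first [inodeOf $a]
    file delete $a

    # Second file, different name: write, note its inode, clean up.
    set fd [open $b w]; puts $fd "XYZ987"; close $fd
    set second [inodeOf $b]
    file delete $b

    # 1 if the new file got the old file's inode number.
    return [expr {$first == $second}]
}

puts "inode reused: [inodeReused /tmp]"
```

A result of 1 means the conditions John describes can occur on that filesystem; a 0 on one run does not rule them out.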


Tom Jackson | 19 Aug 00:17

Re: Data "corruption" with fastpath caching

John,

Just to be clear: fastpath is for serving static content. That is not
what you are doing here; you are creating a temporary file to store
dynamic content. For your bug to occur you must delete the old file and
create a new one within the same second, etc. 

Also, your code sequence below will leave temporary files around in the
case of a crash. If you want to safely serve the content from this
temporary storage, you should unlink the file right after you create it
(then no other thread or process will be able to access the content), or
you can unlink it before you write the content, and then even local
users will not be able to see the file. 

Then just send out the contents directly using the fd, not the file
name.  

Maybe something like:

seek $fd 0 ;# rewind to the start after writing
ns_return 200 [ns_guesstype $myfile] [read $fd]

Then you can close the fd, although AOLserver does that automatically at
the end of each request.
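The unlink-then-use-the-descriptor pattern can be tried in plain tclsh. A minimal sketch, assuming a Unix system (Windows refuses to delete an open file); the proc name and scratch path are hypothetical, and the AOLserver call is shown only as a comment:

```tcl
# Open the file, remove its name at once, and work only through the channel.

proc roundTripUnlinked {path data} {
    set fd [open $path w+]
    # Drop the directory entry; the inode lives until the channel closes,
    # so no other process can open the file by name.
    file delete $path
    puts -nonewline $fd $data
    flush $fd
    # Rewind before reading back what was written.
    seek $fd 0
    set content [read $fd]
    close $fd
    return $content
}

set body [roundTripUnlinked /tmp/report-[pid].csv "id,name\n1,alice\n"]
# In an AOLserver request this is where you would call:
#   ns_return 200 text/csv $body
puts -nonewline $body
```

Since the response is sent from a string, fastpath never sees the file. And because the data ends up in a Tcl variable anyway, Tom's next question applies: if no external program needs the file, the variable alone is simpler.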

Now: why are you writing the content to disk at all? Can't you use a
temporary variable?

tom jackson

On Mon, 2008-08-18 at 14:13 -0700, John Caruso wrote:

Jeff Rogers | 19 Aug 00:38

Re: Data "corruption" with fastpath caching

While I'd agree this is a bug in fastpath, the real problem is that 
fastpath is being used at all in this case.  The intent of fastpath is 
to avoid reading a seldom-changed file from disk.  It happens to be used 
in ns_returnfile since that is the normal use case.  On unix the 
fastpath cache is keyed off the dev/inode probably to keep the hash key 
shorter.  Windows doesn't have device and inode numbers so it uses the 
filename as the hashkey, so it wouldn't run into this problem.

From the server side, this could be fixed by:
- adding in the filename to the hash key or checking that it is the same
- making ns_unlink flush the entry from the fastpath cache
- restricting what fastpath will cache - e.g., don't cache anything in 
/var/tmp or /tmp or a configuration-specified directory.
- adding a "-nocache" flag to ns_returnfile

All of these have pros and cons.

I don't think your suggestion of waiting for cache entries to age a 
second or two would work well, it just moves the race condition around 
and adds a whole lot of disk activity when a busy server is warming up - 
static files might be read a few dozen times instead of once.

Fixing it from the application side is much easier.  Just use 
ns_returnfp instead of ns_returnfile, on the open handle if you 
generated the file from tcl code and it's convenient to get the handle, 
otherwise by opening the file right there:

     # ...generate file $myfile in some way...
     set fp [open $myfile]
     ns_returnfp 200 text/plain $fp [file size $myfile]

John Caruso | 19 Aug 01:20

Re: Data "corruption" with fastpath caching

On Monday 03:38 PM 8/18/2008, Jeff Rogers wrote:
>While I'd agree this is a bug in fastpath, the real problem is that 
>fastpath is being used at all in this case.  The intent of fastpath is to 
>avoid reading a seldom-changed file from disk.

I'd agree that that's the intent, but the caching is hidden within 
ns_returnfile and it's not clear at all from the user's perspective that 
this alligator is lurking in the swamp.  Using ns_returnfile in this way 
may not be the best approach in any particular situation, but it's 
nonetheless a completely valid usage and isn't contraindicated in any 
AOLserver docs I've seen.

It's not difficult to come up with examples where it might happen, 
BTW...say, a web service that returns the result of an operating system 
command to a user.

I think Jade makes a good point that this is not only a bug but 
potentially a security issue.

>It happens to be used in ns_returnfile since that is the normal use 
>case.  On unix the fastpath cache is keyed off the dev/inode probably to 
>keep the hash key shorter.  Windows doesn't have device and inode numbers 
>so it uses the filename as the hashkey, so it wouldn't run into this 
>problem.

No, it can still easily run into this problem--it's just that the file 
name needs to be the same in both cases (which actually did apply in my 
client's case, and caused confusion in the early debugging of the problem, 
since the assumption was that using the same file name and/or path name 
was the source of the problem).

Tom Jackson | 19 Aug 01:51

Re: Data "corruption" with fastpath caching

On Mon, 2008-08-18 at 16:20 -0700, John Caruso wrote:
> It's not difficult to come up with examples where it might happen, 
> BTW...say, a web service that returns the result of an operating system 
> command to a user.

The command is named ns_returnfile.

The expectation is that you are returning a "file", not a web service
resource. 

The expectation is that the file will be around for longer than one
second before being deleted and replaced.

The fact that the documentation doesn't say this is unimportant. Inodes
are reused, this is part of how the filesystem works. You could run into
the same problem with an archive program. A file of the same inode,
name, size and age is created replacing the old file. Most archive
programs would not understand that the file contents had changed. Is it
a bug? No. It is called a practical limitation.

Anyway: no bug, just how it works. The only bug is how ns_returnfile is
being used in the example. 

tom jackson

Jade Rubick | 19 Aug 01:56

Re: Data "corruption" with fastpath caching

Consider this use case:
  • You use git or another version control system to store a bunch of static html files you serve with Aolserver.
  • You check out all of your static html files. Because they're all checked out at the same time, many of them have identical timestamps.
Could the user get the wrong version of an html file they're being served?

What about this scenario:
  • You have a web application that allows administrators on various sites hosted on your application to download a list of user names and passwords (this is a slightly contrived example). They can download it to CSV.
  • Admin #1 generates this file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-1234-passwords.csv. You return this file to the admin.
  • Admin #2 generates their file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-5000-passwords.csv. You attempt to return this file to the admin. Because their request was in the same second, however, they get site-1234-passwords.csv?
Do I understand the problem correctly? I think both of these scenarios are pretty common examples of the way people use Aolserver currently, but I'm not sure I understand the bug correctly.

Jade

Jade Rubick
Director of Development
Truist
120 Wall Street, 4th Floor
New York, NY USA
jade <at> volunteersolutions.org
+1 503 285 4963
+1 707 671 1333 fax



The information contained in this email/document is confidential and may be legally privileged. Access to this mail/document by anyone other than the intended recipient(s) is unauthorized. If you are not an intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance to it, is prohibited.


On Mon, Aug 18, 2008 at 4:20 PM, John Caruso <jcaruso <at> arenasolutions.com> wrote:
> On Monday 03:38 PM 8/18/2008, Jeff Rogers wrote:
>> While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case.  The intent of fastpath is to avoid reading a seldom-changed file from disk.
>
> I'd agree that that's the intent, but the caching is hidden within ns_returnfile and it's not clear at all from the user's perspective that this alligator is lurking in the swamp.  Using ns_returnfile in this way may not be the best approach in any particular situation, but it's nonetheless a completely valid usage and isn't contraindicated in any AOLserver docs I've seen.
>
> It's not difficult to come up with examples where it might happen, BTW...say, a web service that returns the result of an operating system command to a user.
>
> I think Jade makes a good point that this is not only a bug but potentially a security issue.
>
>> It happens to be used in ns_returnfile since that is the normal use case.  On unix the fastpath cache is keyed off the dev/inode probably to keep the hash key shorter.  Windows doesn't have device and inode numbers so it uses the filename as the hashkey, so it wouldn't run into this problem.
>
> No, it can still easily run into this problem--it's just that the file name needs to be the same in both cases (which actually did apply in my client's case, and caused confusion in the early debugging of the problem, since the assumption was that using the same file name and/or path name was the source of the problem).
>
>> From the server side, this could be fixed by:
>> - adding in the filename to the hash key or checking that it is the same
>
> No go, as observed above.
>
>> - making ns_unlink flush the entry from the fastpath cache
>
> Nope, since the file can be removed via (e.g.) exec rm.
>
>> - restricting what fastpath will cache - e.g., don't cache anything in /var/tmp or /tmp or a configuration-specified directory.
>> - adding a "-nocache" flag to ns_returnfile
>
> This last is the one I'd considered as well, but the problem is that it puts the onus on the user to know that they should use the flag, and that's unlikely to be clear to them.
>
>> I don't think your suggestion of waiting for cache entries to age a second or two would work well, it just moves the race condition around and adds a whole lot of disk activity when a busy server is warming up - static files might be read a few dozen times instead of once.
>
> Nope, not at all.  The only files that would get read more than once would be those that were served within one second of being generated--which wouldn't apply to any content that fits the definition of "static".
>
> So this is actually a fairly non-intrusive fix.  The main limitation is that it relies on the file timestamps and the server timestamps being synchronized, which may not always be true.  But I can't think of a better solution.  Simply put, fastpath caching is inherently broken because it's not possible to guarantee that the file in question really should be served from cache (again, short of a cache-defeating checksum).
>
>> Fixing it from the application side is much easier.  Just use ns_returnfp instead of ns_returnfile, on the open handle if you generated the file from tcl code and it's convenient to get the handle, otherwise by opening the file right there:
>
> Yep, and that's more or less the workaround I'd suggested to my client.  But my point here wasn't to ask about potential workarounds but to highlight the issue itself, since I haven't seen it mentioned before.
>
> - John


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <listserv <at> listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.

Tom Jackson | 19 Aug 02:33

Re: Data "corruption" with fastpath caching

On Mon, 2008-08-18 at 16:56 -0700, Jade Rubick wrote:
> Consider this use case:
>       * You use git or another version control system to store for a
>         bunch of static html files you serve with Aolserver. 
>       * You check out all of your static html files. Because they're
>         all checked out at the same time, many of them have identical
>         timestamps.
> Could the user get the wrong version of an html file they're being
> served?
> 

No, because each file has a different inode. The "bug" requires that you
create and destroy one file and create another one within one second (so
they have the same timestamp); it also requires that the same inode is
reused and that the file is exactly the same size. 

But beyond that, hopefully your git checkout will maintain the original
timestamp with the file.

> What about this scenario:
>       * You have a web application that allows administrators on
>         various sites hosted on your application to download a list of
>         user names and passwords (this is a slightly contrived
>         example). They can download it to CSV.
>       * Admin #1 generates this file. You create a unique filename for
>         their site_id, because you want a unique filename to return
>         back to the user: site-1234-passwords.csv. You return this
>         file to the admin.
>       * Admin #2 generates their file. You create a unique filename
>         for their site_id, because you want a unique filename to
>         return back to the user: site-5000-passwords.csv. You attempt
>         to return this file to the admin. Because their request was in
>         the same second, however, they get site-1234-passwords.csv?
> Do I understand the problem correctly? I think both of these scenarios
> are pretty common examples of the way people use Aolserver currently,
> but I'm not sure if I'm understanding correctly the bug.
> 

The filename doesn't matter, neither does the source of the information.
Two different requests could create files. The requirement is that the
first is created and destroyed and the second is created within the same
second as the first, reuses the inode, and has exactly the same size. 

This is why you should not use linked files (with path names) as
temporary storage. Instead, open the file then unlink it (delete it from
the filesystem), then use it via the fd.

In short: there is no bug.

tom jackson

Jeff Rogers | 19 Aug 03:06

Re: Data "corruption" with fastpath caching

Tom Jackson wrote:
> No, because each file has a different inode. The "bug" requires that you
> create and destroy one file and create another one within one second (so
> they have the same timestamp) also required that the same inode is used
> and that the file is the same exact size. 
> 
> But beyond that, hopefully your git checkout will maintain the original
> timestamp with the file.

The "bug" conditions are actually slightly looser than this, because 
fastpath checks mtime and not ctime.  So a malicious user (or your 
version control system, if it makes the local files have the same 
timestamps as those in the repo) could overwrite a file at any point in 
the future, utime() it back to the same time and fastpath would still 
consider it the same.  So would any number of unix utilities, like 
rsync, tar, zip, etc.
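Jeff's point can be reproduced in plain tclsh. A minimal sketch, assuming a writable /tmp (the scratch path and proc names are arbitrary): rewrite a file in place with different bytes of the same length, then set the old mtime back with `file mtime`, which plays the role of utime(). The four attributes fastpath compares come out identical even though the contents changed.

```tcl
# Collect the four attributes fastpath compares.
proc fpAttrs {path} {
    file stat $path st
    return [list $st(dev) $st(ino) $st(mtime) $st(size)]
}

proc writeFile {path data} {
    set fd [open $path w]
    puts -nonewline $fd $data
    close $fd
}

set path /tmp/fp-mtime-demo-[pid].txt
writeFile $path "ABC123"
set before [fpAttrs $path]
set saved  [file mtime $path]

after 1100                ;# wait so the rewrite would get a newer mtime
writeFile $path "XYZ987"  ;# "w" truncates the existing file: same inode
file mtime $path $saved   ;# put the old modification time back

set now [fpAttrs $path]
set fd [open $path]
set contents [read $fd]
close $fd
file delete $path

puts [expr {$before eq $now}]   ;# prints 1: dev, ino, mtime, size unchanged
puts $contents                  ;# prints XYZ987
```

The file's ctime, by contrast, is updated by both the rewrite and the mtime reset, which is why the choice of mtime over ctime matters here.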

Going back to my previous solutions, the only one on the server side 
that I still think is reasonable (names break hardlinks, cache flushing 
on unlink wasn't good in the first place, -nocache - why bother?) is to 
add a configuration option to exclude particular paths from fastpath. 
Actually not even a configuration option; that would involve a bit too 
much overhead for a marginal case; maybe a patch to fix this problem for 
users for whom it is a problem.
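The path-exclusion idea boils down to a prefix test on the requested file. A hypothetical sketch in plain Tcl: `fpCacheable` and the excluded list are made-up names, not an existing AOLserver option, and the sketch compares literal paths only (no symlink or ".." resolution).

```tcl
# Return 0 if path lies under any excluded directory, else 1.
proc fpCacheable {path excluded} {
    set parts [file split $path]
    foreach dir $excluded {
        set prefix [file split $dir]
        set n [llength $prefix]
        # Compare whole path components so /tmpdata does not match /tmp.
        if {[lrange $parts 0 [expr {$n - 1}]] eq $prefix} {
            return 0
        }
    }
    return 1
}

set excluded {/tmp /var/tmp}
puts [fpCacheable /var/tmp/site-1234-passwords.csv $excluded]  ;# prints 0
puts [fpCacheable /var/www/index.html $excluded]               ;# prints 1
puts [fpCacheable /tmpdata/logo.png $excluded]                 ;# prints 1
```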

Using an unlinked file as a temporary is the right thing to do most of 
the time, but I imagine it could be difficult to do when you need to 
pass the filename around to uncooperative external programs.

-J

Jeff Rogers | 19 Aug 02:13

Re: Data "corruption" with fastpath caching

John Caruso wrote:

> I'd agree that that's the intent, but the caching is hidden within 
> ns_returnfile and it's not clear at all from the user's perspective that 
> this alligator is lurking in the swamp.  Using ns_returnfile in this way 
> may not be the best approach in any particular situation, but it's 
> nonetheless a completely valid usage and isn't contraindicated in any 
> AOLserver docs I've seen.

This then is the real fix: mention it in the docs.  I put a blurb on the 
appropriate wiki pages; feel free to suggest something better :)
The docs in the distribution should be updated too.

>> It happens to be used in ns_returnfile since that is the normal use 
>> case.  On unix the fastpath cache is keyed off the dev/inode probably 
>> to keep the hash key shorter.  Windows doesn't have device and inode 
>> numbers so it uses the filename as the hashkey, so it wouldn't run 
>> into this problem.
> 
> No, it can still easily run into this problem--it's just that the file 
> name needs to be the same in both cases (which actually did apply in my 
> client's case, and caused confusion in the early debugging of the 
> problem, since the assumption was that using the same file name and/or 
> path name was the source of the problem).

The system needs to be free to do some things to improve performance 
with the understanding that the user needs to be aware of those things 
or risk bad behaviour.  I wouldn't call it an unreasonable assumption 
that a file with the same name (and same modtime etc) is the same file.
You can run into a very similar problem with NFS (i.e., attribute 
caching causing a modified file to appear not so) and people have 
learned to deal with that.

>> - making ns_unlink flush the entry from the fastpath cache
> 
> Nope, since the file can be removed via (e.g.) exec rm.

True, but I'd still put this in the "system needs to be able to ..." 
category above.  The system does some things and the developer should be 
aware of those things.

>> I don't think your suggestion of waiting for cache entries to age a 
>> second or two would work well, it just moves the race condition around 
>> and adds a whole lot of disk activity when a busy server is warming up 
>> - static files might be read a few dozen times instead of once.
> 
> Nope, not at all.  The only files that would get read more than once 
> would be those that were served within one second of being 
> generated--which wouldn't apply to any content that fits the definition 
> of "static".

It would work in your exact case, where the file is always removed 
immediately after being served and generated.  But if not, it would 
still come up with the wrong answer.

13:50:21 - create file
13:50:21 - serve file (gets cached)
13:50:21 - delete file
13:50:21 - create file again (reuses inode)
... time passes ...
13:55:11 - serve file

In this case the file modtime is more than a few seconds old, but the 
cached mtime, inode, etc. are still matching the file on disk, so the 
stale cache entry would get delivered.

There is also at least one clever optimization where "static" content 
does get served within a second of being created: the 404 handler 
generates something like an image from something like a database, 
writes it to a file, and that file is then served by fastpath.

> So this is actually a fairly non-intrusive fix.  The main limitation is 
> that it relies on the file timestamps and the server timestamps being 
> synchronized, which may not always be true.  But I can't think of a 
> better solution.  Simply put, fastpath caching is inherently broken 
> because it's not possible to guarantee that the file in question really 
> should be served from cache (again, short of a cache-defeating checksum).

The same can be said about nearly any caching system: it is unable to 
handle changes in the data that happen outside of the cache's control or 
knowledge.  This is just the bargain you make when you use a cache.

> But my point here wasn't to ask about potential workarounds but to 
> highlight the issue itself, since I haven't seen it mentioned before.

I think you highlighting it is most of the fix.  From there, get the 
caveat inserted into the documentation and the knowledge into the 
community so that the next person who runs into this problem will have 
an easier, or at least less frustrating time solving it.

-J

russell muetzelfeldt | 19 Aug 02:39

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 10:13 AM, Jeff Rogers wrote:
> John Caruso wrote:
>
> The system needs to be free to do some things to improve  
> performance with the understanding that the user needs to be aware  
> of those things or risk bad behaviour.  I wouldn't call it an  
> unreasonable assumption that a file with the same name (and same  
> modtime etc) is the same file.
> You can run into a very similar problem with NFS (i.e., attribute  
> caching causing a modified file to appear not so) and people have  
> learned to deal with that.

the problem is that this can occur even if the filename is changed,  
and I'd argue that pretty convincingly violates the principle of  
least surprise.

yes, of course the system needs to make some assumptions about what  
it can optimise, but if the contents of /tmp/userinfo-71562 might get  
served back when I've asked for /tmp/userinfo-61453 then there's  
something wrong.

Russell

Tom Jackson | 19 Aug 02:56

Re: Data "corruption" with fastpath caching

On Tue, 2008-08-19 at 10:39 +1000, russell muetzelfeldt wrote:
> On 19/08/2008, at 10:13 AM, Jeff Rogers wrote:
> > John Caruso wrote:
> >
> > The system needs to be free to do some things to improve  
> > performance with the understanding that the user needs to be aware  
> > of those things or risk bad behaviour.  I wouldn't call it an  
> > unreasonable assumption that a file with the same name (and same  
> > modtime etc) is the same file.
> > You can run into a very similar problem with NFS (i.e., attribute  
> > caching causing a modified file to appear not so) and people have  
> > learned to deal with that.
> 
> the problem is that this can occur even if the filename is changed,  
> and I'd argue that pretty convincingly violates the principle of  
> least surprise.
> 
> yes, of course the system needs to make some assumptions about what  
> it can optimise, but if the contents of /tmp/userinfo-71562 might get  
> served back when I've asked for /tmp/userinfo-61453 then there's  
> something wrong.

If it were not for the fact that the same system is entirely responsible
for the situation, then I would agree. 

What you are really hoping for here is an idiot proof system. The big
hole in the reasoning here is that the important thing is the file name
with path, and that somehow this name is immutably linked to some
content. This is delusion. You want a transactional database but you are
using a filesystem. Grow up. 

BTW, fastpath has configuration parameters. Maybe bone up on those
first.

tom jackson

russell muetzelfeldt | 19 Aug 03:37

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 10:56 AM, Tom Jackson wrote:

> You want a transactional database but you are using a filesystem.  
> Grow up.

and

> If your application wasn't the responsible party which violated the  
> expectation you state, I would agree (maybe).

please go and re-read this thread, and get your parties straight.

Tom Jackson | 19 Aug 03:59

Re: Data "corruption" with fastpath caching

On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:
> On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
> 
> > You want a transactional database but you are using a filesystem.  
> > Grow up.
> 
> and
> 
> > If your application wasn't the responsible party which violated the  
> > expectation you state, I would agree (maybe).
> 
> 
> 
> please go and re-read this thread, and get your parties straight.

Sorry, I don't follow. 

Until someone explains to me why we need to be able to create and delete
a file (then return it via fastpath), then create another file in the
same second, I'll maintain that there is no bug in fastpath. 

The whole thing is a waste of time and space. We don't need to fix
ns_returnfile so that it is easier to waste time or space. 

tom jackson

russell muetzelfeldt | 19 Aug 04:24

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 11:59 AM, Tom Jackson wrote:
> On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:
>> On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
>>
>>> You want a transactional database but you are using a filesystem.
>>> Grow up.
>>
>> and
>>
>>> If your application wasn't the responsible party which violated the
>>> expectation you state, I would agree (maybe).
>>
>> please go and re-read this thread, and get your parties straight.
>
> Sorry, I don't follow.

ok, I'll spell it out.

it's not my application that's violated the expectation I state. you  
haven't been paying attention to the From: headers, and seem to have  
mistaken me for the original poster of this thread.

all I've been saying is that "ns_returnfile <filename>" returning the  
content of something other than <filename>, contrary to the  
documentation and common sense, is a bug. given that fastpath exists  
for a (good) reason, and that the behaviour which triggers the bug is  
marginal anyway, the correct response is "the bug will not be fixed,  
here's why, and here's how to work around it".

cheers

Russell

Tom Jackson | 19 Aug 07:18

Re: Data "corruption" with fastpath caching

On Tue, 2008-08-19 at 12:24 +1000, russell muetzelfeldt wrote:
> On 19/08/2008, at 11:59 AM, Tom Jackson wrote:
> > On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:
> >> On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
> >>
> >>> You want a transactional database but you are using a filesystem.
> >>> Grow up.
> >>
> >> and
> >>
> >>> If your application wasn't the responsible party which violated the
> >>> expectation you state, I would agree (maybe).
> >>
> >> please go and re-read this thread, and get your parties straight.
> >
> > Sorry, I don't follow.
> 
> ok, I'll spell it out.
> 
> it's not my application that's violated the expectation I state. you  
> haven't been paying attention to the From: headers, and seem to have  
> mistaken me for the original poster of this thread.

Ah, okay. I didn't mean to point to any particular application; by "your" I didn't mean any particular you or your. 

> all I've been saying is that "ns_returnfile <filename>" returning the  
> content of something other than <filename>, contrary to the  
> documentation and common sense, is a bug. given that fastpath exists  
> for a (good) reason, and that the behaviour which triggers the bug is  
> marginal anyway, the correct response is "the bug will not be fixed,  
> here's why, and here's how to work around it".

It is an interesting point. But it isn't a bug. The purpose of the API
is to return a static file, not one which changes in under a second. It
is not a bug to not support code which is guaranteed to be slower than
common alternatives. 

Fastpath is designed to support return of smallish static content. It
isn't some ancient way of speeding up stuff that was slow, it was for
speeding up stuff that was already fast but was easy to make even
faster. 

If you want to keep your dynamic content out of fastpath, just set the
cache limits lower than the size of that content:

#
# Fastpath
#
ns_section "ns/server/${server}/fastpath"
ns_param cache                [set cache 10] ;# max entries ??
ns_param cachemaxsize         [set cachemaxsize [expr {5 * 1024 * 1024}]]
ns_param cachemaxentry        [expr {round(floor($cachemaxsize / $cache))}]

Or, if the dynamic content is very small, or customized, don't write it
to a file in the first place. In general you are probably doing
something wrong if you write small content to a file and immediately
delete it. You are also likely doing something wrong if you are caching
large files.
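Working through the expressions in the fastpath section above shows what limit they actually produce. This plain-Tcl snippet only evaluates that arithmetic; it assumes, as the parameter name and Tom's advice suggest, that cachemaxentry is a per-file byte limit.

```tcl
# Evaluate the fastpath expressions from the config fragment (no AOLserver needed).
set cache         10
set cachemaxsize  [expr {5 * 1024 * 1024}]
set cachemaxentry [expr {round(floor($cachemaxsize / $cache))}]
puts "cachemaxsize  = $cachemaxsize"    ;# prints cachemaxsize  = 5242880
puts "cachemaxentry = $cachemaxentry"   ;# prints cachemaxentry = 524288
```

With these values, entries up to 512 KB fit, so a small generated file such as the 7-byte one in John's example would still be eligible for the cache. Keeping it out by size alone would require a cachemaxentry smaller than the smallest generated file.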

tom jackson

Titi Alailima | 19 Aug 15:18

Re: Data "corruption" with fastpath caching

This would be a wonderful addition to the documentation.  As a matter of fact, I just added it:
http://panoptic.com/wiki/aolserver/Fastpath

For what it's worth, it seems to me that if it has a measurable benefit, it's worth leaving on by default, as
long as developers are properly educated about design issues (flaws, bugs, tradeoffs, whatever) that
they need to deal with.  If it's off by default it may as well be removed entirely.  I say on by default, but
well-documented so that developers are forced to have at least a cursory understanding of it when doing
anything that might relate to it.

Titi Ala'ilima
Lead Architect
MedTouch LLC
1100 Massachusetts Avenue
Cambridge, MA 02138
617.621.8670 x309

> -----Original Message-----
> From: AOLserver Discussion [mailto:AOLSERVER <at> LISTSERV.AOL.COM] On
> Behalf Of Tom Jackson
> Sent: Tuesday, August 19, 2008 1:18 AM
> To: AOLSERVER <at> LISTSERV.AOL.COM
> Subject: Re: [AOLSERVER] Data "corruption" with fastpath caching
>
> On Tue, 2008-08-19 at 12:24 +1000, russell muetzelfeldt wrote:
> > On 19/08/2008, at 11:59 AM, Tom Jackson wrote:
> > > On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:
> > >> On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
> > >>
> > >>> You want a transactional database but you are using a filesystem.
> > >>> Grow up.
> > >>
> > >> and
> > >>
> > >>> If your application wasn't the responsible party which violated
> the
> > >>> expectation you state, I would agree (maybe).
> > >>
> > >> please go and re-read this thread, and get your parties straight.
> > >
> > > Sorry, I don't follow.
> >
> > ok, I'll spell it out.
> >
> > it's not my application that's violated the expectation I state. you
> > haven't been paying attention to the From: headers, and seem to have
> > mistaken me for the original poster of this thread.
>
> Ah, okay. I didn't mean to point to any particular application, by
> "your" I didn't mean any particular you or your.
>
> > all I've been saying is that "ns_returnfile <filename>" returning the
> > content of something other than <filename>, contrary to the
> > documentation and common sense, is a bug. given that fastpath exists
> > for a (good) reason, and that the behaviour which triggers the bug is
> > marginal anyway, the correct response is "the bug will not be fixed,
> > here's why, and here's how to work around it".
>
> It is an interesting point. But it isn't a bug. The purpose of the API
> is to return a static file, not one which changes in under a second. It
> is not a bug to not support code which is guaranteed to be slower than
> common alternatives.
>
> Fastpath is designed to support return of smallish static content. It
> isn't some ancient way of speeding up stuff that was slow, it was for
> speeding up stuff that was already fast but was easy to make even
> faster.
>
> If you want to avoid use of fastpath, just set the configuration lower
> than your dynamic content:
>
> #
> # Fastpath
> #
> ns_section "ns/server/${server}/fastpath"
> ns_param cache                [set cache 10] ;# max entries ??
> ns_param cachemaxsize         [set cachemaxsize [expr 5 * 1024 * 1024]]
> ns_param cachemaxentry        [expr round(floor($cachemaxsize/$cache))]
>
>
> Or, if the dynamic content is very small, or customized, don't write it
> to a file in the first place. In general you are probably doing
> something wrong if you write small content to a file and immediately
> delete it. You are also likely doing something wrong if you are caching
> large files.
>
> tom jackson
>
>
> --
> AOLserver - http://www.aolserver.com/
>
> To Remove yourself from this list, simply send an email to
> <listserv <at> listserv.aol.com> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the
> Subject: field of your email blank.

Juan José del Río | 19 Aug 15:58

Re: Data "corruption" with fastpath caching

I agree with Titi. The vast majority of times, having Fastpath on does
not harm at all.

Having it disabled by default would be like not using computers because
they fail sometimes. That would be too extreme, wouldn't it? ;-)

As long as it's well documented, and there are alternatives to avoid the
problems, I think it's ok to leave Fastpath activated by default.

Regards,

  Juan José

--
Juan José del Río
Chief of Commerce
Simple Option S.L.
Avda. Editor Angel Caffarena 11, B11, 1B
Málaga, 29010, Spain

+34 616 512 340 cell
+34 951 930 122 tel/fax


Jeff Rogers | 19 Aug 19:30

Re: Data "corruption" with fastpath caching

Tom Jackson wrote:
> If you want to avoid use of fastpath, just set the configuration lower
> than your dynamic content:
> 
> #
> # Fastpath
> #
> ns_section "ns/server/${server}/fastpath"
> ns_param cache                [set cache 10] ;# max entries ??
> ns_param cachemaxsize         [set cachemaxsize [expr 5 * 1024 * 1024]]
> ns_param cachemaxentry        [expr round(floor($cachemaxsize/$cache))]

The description of the parameters here is a little confusing.  Browsing 
the source, it appears that "cache" is a flag to enable or disable 
fastpath, "cachemaxsize" is the maximum size of the cache, and 
"cachemaxentry" is the largest size of a file that will get cached. 
There is no setting for the maximum number of entries; the use of $cache in
the settings above (reflecting the server defaults) really yields a minimum
number of cache entries (i.e., the default cache will hold at least 10
entries of the maximum 512k size, but it could also hold 1000 5k files).
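Going by the reading of the source above, in which `cache` is an on/off flag and `cachemaxentry` is the per-file size cutoff, a fastpath section that avoids the problem might look like the fragment below. Both the parameter semantics and the accepted boolean spelling are assumptions taken from this post rather than verified against a particular AOLserver release, and `server1` is a placeholder server name.

```tcl
# Fastpath (sketch; parameter meanings as described above, not verified
# against a specific AOLserver version)
set server "server1"   ;# placeholder server name

ns_section "ns/server/${server}/fastpath"

# Option A: turn the fastpath file cache off entirely.
ns_param cache         false

# Option B (instead of A): keep the cache, but only for small files.
# Files larger than cachemaxentry bypass the cache; smaller ones,
# including small one-shot files, are still cached.
# ns_param cache         true
# ns_param cachemaxsize  [expr {5 * 1024 * 1024}] ;# total cache size, bytes
# ns_param cachemaxentry [expr {16 * 1024}]       ;# largest cached file (example value)
```

Note that under Option B the same-size, same-second collision can still happen for files below the cutoff, so only Option A removes it outright.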

I didn't dig deep enough to see how the cache flushing works, but on 
casual perusal it looks like the cache is pruned by removing the oldest 
entries (not largest, least hit, or least recently hit).
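To make "removing the oldest entries" concrete, here is a toy model in plain Tcl of a byte-bounded cache that evicts strictly in insertion order, ignoring size and hit count. It illustrates the policy as described in this post (which is itself a casual reading); it is not AOLserver's cache code, and the `fifocache` namespace and its procs are hypothetical.

```tcl
# Toy byte-bounded cache with oldest-first (FIFO) eviction.
# Hypothetical illustration only; not AOLserver's implementation.
namespace eval fifocache {
    variable maxsize 0             ;# total byte budget
    variable total   0             ;# bytes currently cached
    variable order   {}            ;# keys in insertion order, oldest first
    variable data    [dict create] ;# key -> cached content
}

proc fifocache::reset {max} {
    variable maxsize $max
    variable total   0
    variable order   {}
    variable data    [dict create]
}

proc fifocache::put {key value} {
    variable maxsize
    variable total
    variable order
    variable data
    set size [string length $value]
    if {$size > $maxsize} {
        return 0                   ;# too big to cache at all
    }
    if {[dict exists $data $key]} {
        # Replacing an entry: forget its old size and position.
        incr total -[string length [dict get $data $key]]
        set idx [lsearch -exact $order $key]
        set order [lreplace $order $idx $idx]
        dict unset data $key
    }
    # Evict the oldest entries until the new one fits; size and hit
    # count play no part in choosing the victim.
    while {$total + $size > $maxsize} {
        set oldest [lindex $order 0]
        set order [lrange $order 1 end]
        incr total -[string length [dict get $data $oldest]]
        dict unset data $oldest
    }
    dict set data $key $value
    lappend order $key
    incr total $size
    return 1
}

proc fifocache::get {key} {
    variable data
    # A hit does not change an entry's age under this policy.
    if {[dict exists $data $key]} {
        return [dict get $data $key]
    }
    return ""
}

proc fifocache::keys {} {
    variable order
    return $order
}

# Usage: a 10-byte cache.
fifocache::reset 10
fifocache::put a 12345
fifocache::put b 12345
fifocache::get a             ;# a hit, but "a" is still the oldest entry
fifocache::put c 123         ;# evicts "a" even though it was just read
puts [fifocache::keys]       ;# prints: b c
```

An LRU policy would instead move a key to the end of `order` on every hit; the FIFO version above never does.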

-J

John Caruso | 19 Aug 03:06

Re: Data "corruption" with fastpath caching

On Monday 05:13 PM 8/18/2008, Jeff Rogers wrote:
>>Simply put, fastpath caching is inherently broken because it's not 
>>possible to guarantee that the file in question really should be served 
>>from cache (again, short of a cache-defeating checksum).
>
>The same can be said about nearly any caching system: it is unable to 
>handle changes in the data that happen outside of the cache's control or 
>knowledge.  This is just the bargain you make when you use a cache.

I'd say "nearly any" is going too far, and in fact I'd say that for most 
caching systems to fail to return the correct data is a serious bug.  The 
NFS example you bring up isn't really analogous since it's only about 
attributes, which are frequently not a concern; were NFS to return 
incorrect *data* for a file, that would be a serious bug.  And in this 
case we're talking about a web server that may silently return data that's 
completely incorrect, which I'd say is very serious misbehavior.

The core problem here is that AOLserver is attempting to use the tuple of 
(dev, inode, mtime, size) as a unique determiner of a file's identity, and 
that's an inherently broken assumption--particularly so since the 
granularity of mtime is one second and inodes are reused on many 
filesystems (e.g. very common ones like ext3 and ufs).
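The tuple described above can be reproduced from plain Tcl with `file stat`. The sketch below builds that key for a file, deletes the file, writes different content of the same length under the same name, and compares the two keys. Whether they actually match depends on the filesystem reusing the inode and on both writes landing in the same second, so the result can vary between runs; `fastpath_key` and `write_file` are hypothetical helpers, not AOLserver commands.

```tcl
# Build a (dev, inode, mtime, size) tuple for a file, mirroring the four
# attributes fastpath compares. Hypothetical helper, not AOLserver code.
proc fastpath_key {path} {
    file stat $path st
    return [list $st(dev) $st(ino) $st(mtime) $st(size)]
}

# Write raw bytes to a file, replacing any existing content.
proc write_file {path data} {
    set fh [open $path w]
    fconfigure $fh -translation binary
    puts -nonewline $fh $data
    close $fh
}

set path fastpath-demo.txt
write_file $path "AAAA"
set k1 [fastpath_key $path]
file delete $path

# Different content, same length, same name, written right away.
write_file $path "BBBB"
set k2 [fastpath_key $path]
file delete $path

puts "first:  $k1"
puts "second: $k2"
# 1 here means the inode was reused within the same second: a cache keyed
# only on these four values cannot tell the two files apart.
puts "keys match: [expr {$k1 eq $k2}]"
```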

>I think you highlighting it is most of the fix.  From there, get the 
>caveat inserted into the documentation and the knowledge into the 
>community so that the next person who runs into this problem will have an 
>easier, or at least less frustrating time solving it.

That'd be an improvement over the current situation, but it's still the 
case that AOLserver as currently shipped has a file cache mechanism built 
into it which 1) may return incorrect data and 2) is enabled by 
default.  Given the risk, I'd say fastpath caching should be disabled by 
default rather than enabled.

- John

russell muetzelfeldt | 19 Aug 03:21

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 11:06 AM, John Caruso wrote:

> That'd be an improvement over the current situation, but it's still  
> the case that AOLserver as currently shipped has a file cache  
> mechanism built into it which 1) may return incorrect data and 2)  
> is enabled by default.  Given the risk, I'd say fastpath caching  
> should be disabled by default rather than enabled.

if someone's application is at risk of triggering this behaviour,  
that'd just delay any problem until their load is high enough that  
they need to turn on fastpath - and surely that's an even worse  
scenario.

cheers

Russell

John Caruso | 19 Aug 04:20

Re: Data "corruption" with fastpath caching

On Monday 06:21 PM 8/18/2008, russell muetzelfeldt wrote:
>On 19/08/2008, at 11:06 AM, John Caruso wrote:
>>That'd be an improvement over the current situation, but it's still
>>the case that AOLserver as currently shipped has a file cache
>>mechanism built into it which 1) may return incorrect data and 2)
>>is enabled by default.  Given the risk, I'd say fastpath caching
>>should be disabled by default rather than enabled.
>
>if someone's application is at risk of triggering this behaviour,
>that'd just delay any problem until their load is high enough that
>they need to turn on fastpath - and surely that's an even worse
>scenario.

I'd say it's still better, because it requires explicit action on the 
user's part to enable the flawed caching mechanism in that case.  And 
actually I don't think fastpath in its default configuration would be of 
much help in performance terms these days, given that the cache is only 
5MB large and file data is typically cached by the OS anyway (and servers 
generally have far more RAM than they did even five years ago).

I do think this should have been considered (and steps taken to address 
it) when the fastpath caching mechanism was initially developed, since 
it's a glaring flaw.  I've designed things that rely on shaky underlying 
assumptions in the past, but only in controlled circumstances where those 
assumptions were guaranteed to obtain.  I can think of situations in which 
a caching mechanism with this type of design limitation wouldn't be an 
issue, but in my opinion it has no place being a default-enabled mechanism 
in an enterprise-grade web server.

- John

Tom Jackson | 19 Aug 07:27

Re: Data "corruption" with fastpath caching

On Mon, 2008-08-18 at 19:20 -0700, John Caruso wrote:

> I'd say it's still better, because it requires explicit action on the 
> user's part to enable the flawed caching mechanism in that case.  And 
> actually I don't think fastpath in its default configuration would be of 
> much help in performance terms these days, given that the cache is only 
> 5MB large and file data is typically cached by the OS anyway (and servers 
> generally have far more RAM than they did even five years ago).
> 

fastpath is for small static content. You don't need to cache large
files, and that is why the cachemaxsize parameter gives you a cutoff on
the largest size to cache. 

AOLserver has great performance on small files, fastpath speeds it up
further, plus the overall scheme handles directory files, internal
redirects, etc. 

> I do think this should have been considered (and steps taken to address 
> it) when the fastpath caching mechanism was initially developed, since 
> it's a glaring flaw.  I've designed things that rely on shaky underlying 
> assumptions in the past, but only in controlled circumstances where those 
> assumptions were guaranteed to obtain.  I can think of situations in which 
> a caching mechanism with this type of design limitation wouldn't be an 
> issue, but in my opinion it has no place being a default-enabled mechanism 
> in an enterprise-grade web server.

Why not just write another API which strips out all the things you don't
like. I think you misjudge fastpath in every way, but whatever.

tom jackson

Andrew Piskorski | 19 Aug 09:00

Re: Data "corruption" with fastpath caching

On Mon, Aug 18, 2008 at 06:06:23PM -0700, John Caruso wrote:

> That'd be an improvement over the current situation, but it's still the 
> case that AOLserver as currently shipped has a file cache mechanism built 
> into it which 1) may return incorrect data and 2) is enabled by 
> default.  Given the risk, I'd say fastpath caching should be disabled by 
> default rather than enabled.

Sounds right to me.  Either robustify Fastpath somehow against this
corner case, or don't have Fastpath turned on by default.

--

-- 
Andrew Piskorski <atp <at> piskorski.com>
http://www.piskorski.com/

Tom Jackson | 19 Aug 18:20

Re: Data "corruption" with fastpath caching

Andrew,

This is not a corner case. The exact same thing could happen without
fastpath. 

What is that thing? That the contents of a file change after a request
is made and before the file is returned. In fact, there is no guarantee
that it won't change mid-return. This is a fact of life with files on
any filesystem. 

In fact, with the HTTP caching mechanisms, you could fail to get
up-to-date contents of a file, since the If-Modified-Since mechanism
will also fail. 

The problem here is that the application is using this static file
handling API to serve dynamic content. Wondering why it doesn't work is
pointless.

Just to summarize again, this case requires that a file is created then
destroyed and another file created within the same second that has the
same size. Also, the original file must get into the cache, and the only
way that can happen is for the application to treat it as a long lived
static file. 

We have other means to cache dynamic data, and large chunks of dynamic
content saved as a file can avoid the fastpath cache by setting the
cachemaxsize parameter. Writing smaller content to disk doesn't make any
sense if your goal is speed...or security. 

It is probably even more important to tamp down these misconceptions
about how AOLserver works. Static and dynamic content are handled by
different APIs. The reason is that it has long been recognized by the
developers of AOLserver that different techniques are required to
maintain high performance based upon how the content is generated, its
expected lifespan, its size, and its potential for reuse.  

tom jackson


Juan José del Río | 19 Aug 19:03

Re: Data "corruption" with fastpath caching

What about using inotify (or equivalent) in Linux, and kqueue in FreeBSD,
to have the kernel notify AOLserver when a file has changed?

That'd be a pretty easy and efficient way to discard fastpath items in
case they have been deleted and/or modified.

Just my two cents ;-) 

-  
Juan José del Río    |  
(+34) 616 512 340    |  juanjose <at> simpleoption.com

Simple Option S.L.
  Tel: (+34) 951 930 122
  Fax: (+34) 951 930 122
  http://www.simpleoption.com


Jim Davidson | 20 Aug 02:31

Re: Data "corruption" with fastpath caching

Hi,

I haven't looked at a "directory change notification" type scheme in a  
long time but that could be very clever.  Aside from addressing issues  
discussed here, the key benefit would be to avoid the repeated "stat"  
syscalls.  Those stat calls always bothered me conceptually but the  
performance of the underlying systems always improved faster than my  
irritation would grow to do something about it.  However, we were  
always careful to run websites against local filesystems - I would be  
more concerned with the overhead if we were using NFS or some other  
shared filesystem thing.

Somewhat related, the "dci module" (a series of AOL extensions we open  
sourced a while back) includes some content fetch/caching features  
called "sob".  That had the model you described -- things stayed in  
the cache until either space was needed or the server received an  
explicit flush message on a publish event.  That approach worked well  
and scaled well but it was neither entirely general nor naive, i.e., it was  
key that we understood how it worked under the covers and to make sure  
the flush message links were reliable to avoid stale content problems.

Anyway, I've been pondering this whole discussion some more and agree  
with Tom -- the fastpath isn't broken.  It just does a certain thing  
-- serves static files with a reasonable balance of performance and  
stability -- and shouldn't be modified except to add notes about how  
it works in the docs.  I'm having trouble thinking through how it  
could be modified to plug all possible race conditions.  I'd suggest  
the code snippets using fastpath for dynamic content should be  
modified, perhaps some new Tcl commands could be added to make it  
convenient, but otherwise it seems a mismatch between capabilities and  
requirements.

-Jim


Tom Jackson | 19 Aug 01:37

Re: Data "corruption" with fastpath caching

On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:
> While I'd agree this is a bug in fastpath, the real problem is that 
> fastpath is being used at all in this case.  

I don't think it is a bug in fastpath. 

Think about the case where multiple logical files are actually the same
physical file. Using the name would result in caching the same object
under different names. This is a much more likely situation than this so
called bug.
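The point about several names for one physical file is easy to see with a hard link: both names stat to the same device and inode, so a cache keyed on those attributes stores one entry where a name-keyed cache would store two. A small plain-Tcl demonstration, assuming a filesystem that supports hard links; the file names are arbitrary.

```tcl
# Write raw bytes to a file, replacing any existing content.
proc write_file {path data} {
    set fh [open $path w]
    fconfigure $fh -translation binary
    puts -nonewline $fh $data
    close $fh
}

set orig  hardlink-demo.txt
set alias hardlink-alias.txt
file delete -force $orig $alias

write_file $orig "hello"
# Give the same file a second name (a hard link, not a copy).
file link -hard $alias $orig

file stat $orig  s1
file stat $alias s2
# Two names, one (dev, inode) pair: prints "same inode: 1".
puts "same inode: [expr {$s1(dev) == $s2(dev) && $s1(ino) == $s2(ino)}]"

file delete $alias $orig
```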

tom jackson

Jeff Rogers | 19 Aug 02:18

Re: Data "corruption" with fastpath caching

Tom Jackson wrote:
> Think about the case where multiple logical files are actually the same
> physical file. Using the name would result in caching the same object
> under different names. This is a much more likely situation than this so
> called bug.

Huh, hard links - I sometimes forget about those.  It's a much more 
believable reason (than my previous suggestion of shortening the key) 
for why the inode was used instead of the filename for the hash key.

-J

russell muetzelfeldt | 19 Aug 02:01

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 9:37 AM, Tom Jackson wrote:
> On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:
>> While I'd agree this is a bug in fastpath, the real problem is that
>> fastpath is being used at all in this case.
>
> I don't think it is a bug in fastpath.

fastpath is making assumptions about what means something is the  
"same file", and those assumptions are not consistent with unix  
filesystem semantics - how is this not a bug?

sure, the original use case that triggered this seems non-optimal,  
and could be done in other ways that don't trigger the bug, but that  
doesn't mean fastpath is behaving correctly...

Russell

Tom Jackson | 19 Aug 02:44

Re: Data "corruption" with fastpath caching

On Tue, 2008-08-19 at 10:01 +1000, russell muetzelfeldt wrote:
> On 19/08/2008, at 9:37 AM, Tom Jackson wrote:
> > On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:
> >> While I'd agree this is a bug in fastpath, the real problem is that
> >> fastpath is being used at all in this case.
> >
> > I don't think it is a bug in fastpath.
> 
> fastpath is making assumptions about what means something is the  
> "same file", and those assumptions are not consistent with unix  
> filesystem semantics - how is this not a bug?
> 

No, fastpath is making the exact same assumptions that any archive
program would make, which is to record certain attributes at the time
something is cached and then compare them with the same attributes at a
later time. Unless you do a checksum or some other comparison, the cache
system doesn't work very well for the intended purpose. 

> sure, the original use case that triggered this seems non-optimal,  
> and could be done in other ways that don't trigger the bug, but that  
> doesn't mean fastpath is behaving correctly...

The "use case" is a bug. You can't violate the essential granularity of
the support system and call it a bug. The granularity is: inode, size,
timestamp. Now, if we could just slow down AOLserver so that this never
happens, that would be a great fix. 

This is like claiming that a checksum collision is a bug. No, it is
expected. We don't use things like checksums, or inode,size,time as a
key as a guarantee of anything. They are a compromise, in other words,
engineering. 

tom jackson

Bas Scheffers | 19 Aug 03:01

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 10:14 AM, Tom Jackson wrote:
> No, fastpath is making the exact same assumptions that any archive
> program would make, which is to record certain attributes at the time
> something is cached and then compare them with the same attributes at a
Could the file name (just the name, not even the full path) not be  
added to the mix? Then using a random string as filename would make  
the problem go away, would it not?
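This suggestion amounts to widening the cache key. A hypothetical version that prepends the file's tail name to the four stat attributes might look like this; it sketches the proposal and is not something AOLserver implements. The trade-off is the hard-link case raised earlier in the thread: two names for one file would then occupy two cache entries.

```tcl
# Hypothetical cache key: tail name plus (dev, inode, mtime, size).
proc fastpath_key_with_name {path} {
    file stat $path st
    return [list [file tail $path] $st(dev) $st(ino) $st(mtime) $st(size)]
}

# Write raw bytes to a file, replacing any existing content.
proc write_file {path data} {
    set fh [open $path w]
    fconfigure $fh -translation binary
    puts -nonewline $fh $data
    close $fh
}

# Stand-ins for the random one-shot file names proposed above.
set first  out-4f1c.txt
set second out-9a2e.txt

write_file $first "AAAA"
set k1 [fastpath_key_with_name $first]
file delete $first

write_file $second "BBBB"
set k2 [fastpath_key_with_name $second]
file delete $second

# Even if inode, mtime and size all coincide, the names differ,
# so this prints "keys match: 0".
puts "keys match: [expr {$k1 eq $k2}]"
```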

Also, would it be possible to tell ns_returnfile to not use fastpath,  
if it is for one time use?

The alternative in this scenario would of course be to simply read the  
file and just ns_return it.
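That last alternative sidesteps fastpath entirely: read the file into memory and send the content with `ns_return`, which takes the data itself rather than a path. A minimal sketch, assuming text content in the server's default encoding; `read_file` is a hypothetical helper name, and only the commented part needs AOLserver.

```tcl
# Read a file's current contents into memory. Plain Tcl; no fastpath
# lookup is involved, so this is whatever is on disk right now.
proc read_file {path} {
    set fh [open $path r]
    try {
        return [read $fh]
    } finally {
        close $fh
    }
}

# Inside AOLserver (not runnable in plain tclsh), the snippet from the
# start of the thread could then become:
#
#   set data [read_file $myfile]
#   ns_unlink $myfile
#   ns_return 200 text/plain $data
```

For binary content a different return command and a binary channel would be needed; the text-only version is kept here to match the original `text/plain` example.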

Bas.

russell muetzelfeldt | 19 Aug 03:04

Re: Data "corruption" with fastpath caching

On 19/08/2008, at 10:44 AM, Tom Jackson wrote:
> On Tue, 2008-08-19 at 10:01 +1000, russell muetzelfeldt wrote:
>>
>> sure, the original use case that triggered this seems non-optimal,
>> and could be done in other ways that don't trigger the bug, but that
>> doesn't mean fastpath is behaving correctly...
>
> The "use case" is a bug. You can't violate the essential granularity of
> the support system and call it a bug. The granularity is: inode, size,
> timestamp. Now, if we could just slow down AOLserver so that this never
> happens, that would be a great fix.

yes, that's exactly what I said - fastpath should be removed.

snark aside, if I say "ns_returnfile /tmp/foo-abcd" but nsd sends the  
contents of the now-deleted /tmp/bar-wxyz to the client then it's not  
doing what I've explicitly asked, and it's a bug.

just because the correct (imo) response is "tag WONTFIX, document as  
a gotcha, document workaround" doesn't mean that the behaviour is  
correct.

cheers

Russell

Tom Jackson | 19 Aug 03:29

Re: Data "corruption" with fastpath caching

On Tue, 2008-08-19 at 11:04 +1000, russell muetzelfeldt wrote:
> snark aside, if I say "ns_returnfile /tmp/foo-abcd" but nsd sends the  
> contents of the now-deleted /tmp/bar-wxyz to the client then it's not  
> doing what I've explicitly asked, and it's a bug.
> 
> just because the correct (imo) response is "tag WONTFIX, document as  
> a gotcha, document workaround" doesn't mean that the behaviour is  
> correct.

If your application wasn't the responsible party which violated the
expectation you state, I would agree (maybe).

The problem is that you think that the contents of a file remains
unchanged as long as the filename itself remains unchanged. 

Actually the problem is that someone is using a file to store volatile
data and then feeding this file through a cache. 

You really need to think about this insanity. Because it is insanity.

1. You waste time writing data to a file. 

2. You use ns_returnfile to send this data (reading from disk).

3. Fastpath puts this information into memory (taking space).

4. ns_returnfile uses the memory copy on later requests (but none are
valid).

5. meanwhile the file is deleted, cache still exists taking up memory.

The above are "ideal" conditions. 

The bug is not in ns_returnfile.

tom jackson

Jeff Rogers | 19 Aug 02:53

Re: Data "corruption" with fastpath caching

russell muetzelfeldt wrote:
> On 19/08/2008, at 9:37 AM, Tom Jackson wrote:
>> On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:
>>> While I'd agree this is a bug in fastpath, the real problem is that
>>> fastpath is being used at all in this case.
>>
>> I don't think it is a bug in fastpath.
> 
> fastpath is making assumptions about what means something is the "same 
> file", and those assumptions are not consistent with unix filesystem 
> semantics - how is this not a bug?

It's not a bug because no one ever said that it *was* strictly following 
unix filesystem semantics, which isn't even a single thing (ufs is 
slightly different than nfs, is slightly different than ext2 -noatime, 
is slightly different than afs, etc.)  It is following a particular 
definition: if the file still exists and has the same 
dev/inode/mtime/size as it did when you last checked, then it is the 
same file.   Think of it as an if-modified-since or if-none-match 
conditional GET.
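The rule described above can be sketched in plain Tcl. This illustrates the stated definition only, not AOLserver's C code; `snapshot_attrs` and `cache_entry_valid` are hypothetical names. The path's name plays no part in the comparison, and `file stat` reports mtime in whole seconds:

```tcl
# Record the attributes the rule compares: device, inode, mtime, size.
proc snapshot_attrs {path} {
    file stat $path st
    return [list $st(dev) $st(ino) $st(mtime) $st(size)]
}

# A cached entry counts as current only if the file still exists and
# all four recorded attributes are unchanged.
proc cache_entry_valid {path cached_attrs} {
    if {![file exists $path]} {
        return 0
    }
    return [expr {[snapshot_attrs $path] eq $cached_attrs}]
}
```

A file deleted and recreated within the same second, with the same size and a reused inode, passes this check even though its contents differ.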

It is a bug in that it's not what you expect.  However in that case, the 
location of the bug is subject to debate.

-J

> sure, the original use case that triggered this seems non-optimal, and 
> could be done in other ways that don't trigger the bug, but that doesn't 
> mean fastpath is behaving correctly...
> 
> 
> Russell
> 
> 
> -- 
> AOLserver - http://www.aolserver.com/
> 
> To Remove yourself from this list, simply send an email to 
> <listserv <at> listserv.aol.com> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the 
> Subject: field of your email blank.

John Caruso | 19 Aug 22:00

Re: Data "corruption" with fastpath caching

On Monday 05:53 PM 8/18/2008, Jeff Rogers wrote:
>russell muetzelfeldt wrote:
>>fastpath is making assumptions about what means something is the "same 
>>file", and those assumptions are not consistent with unix filesystem 
>>semantics - how is this not a bug?
>
>It's not a bug because no one ever said that it *was* strictly following 
>unix filesystem semantics, which isn't even a single thing (ufs is 
>slightly different than nfs, is slightly different than ext2 -noatime, is 
>slightly different than afs, etc.)  It is following a particular 
>definition: if the file still exists and has the same 
>dev/inode/mtime/size as it did when you last checked, then it is the same 
>file.   Think of it as an if-modified-since or if-none-match conditional 
>GET.

Actually that's not analogous, for the same reason that the other analogies 
(NFS attribute caching, rsync or tar not noticing content changes when 
attributes stay the same, etc.) don't apply: this bug can happen 
*even with two files that have completely different names or 
paths*.  Again, in this example...:

    set file [open "/var/tmp/myfile" "w"]
    puts $file "ABC123"
    close $file
    ns_returnfile 200 text/plain "/var/tmp/myfile"
    ns_unlink -nocomplain "/var/tmp/myfile"

    set file [open "/var/tmp/myotherfile" "w"]
    puts $file "XYZ987"
    close $file
    ns_returnfile 200 text/plain "/var/tmp/myotherfile"
    ns_unlink -nocomplain "/var/tmp/myotherfile"

...AOLserver will almost always return the contents of /var/tmp/myfile 
rather than /var/tmp/myotherfile in response to the second ns_returnfile.
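To check whether a given filesystem reuses inodes this way, a plain-tclsh sketch (no AOLserver needed) can mimic the two steps above and return the attributes fastpath would compare. `probe_inode_reuse` and the `fastpath-probe-*` file names are hypothetical; the proc overwrites and deletes files with those names in the chosen directory. Whether the inode is actually reused depends on the filesystem and on other activity:

```tcl
# Hypothetical diagnostic: create, stat and delete two files in turn,
# as in the example above, and return the {dev ino mtime size} list
# recorded for each one.
proc probe_inode_reuse {dir} {
    set results {}
    foreach {name content} {fastpath-probe-1 ABC123 fastpath-probe-2 XYZ987} {
        set path [file join $dir $name]
        set fh [open $path w]
        puts -nonewline $fh $content
        close $fh
        file stat $path st
        lappend results [list $st(dev) $st(ino) $st(mtime) $st(size)]
        # Delete immediately so the inode is free for the next file.
        file delete $path
    }
    return $results
}
```

For example, `set r [probe_inode_reuse /var/tmp]` and then compare `[lindex $r 0]` with `[lindex $r 1]`: identical lists mean the second file would look like the first to a dev/inode/mtime/size check.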

I think the analogies to other systems aren't really germane 
anyway--AOLserver's behavior has to be judged on its own merits.  But 
adopting that standard, I can't think of any other program that would 
confuse /var/tmp/myfile with /var/tmp/myotherfile.

- John

Jeff Rogers | 19 Aug 23:10

Re: Data "corruption" with fastpath caching

John Caruso wrote:

>>  Think of it as an if-modified-since or if-none-match 
>> conditional GET.
> 
> Actually that's not analogous, <...>

I didn't mean to say it was exactly the same, just similar in that given 
a particular system that makes particular assumptions it is possible to 
construct a situation where the results are unexpected or incorrect in a 
particular way.

I think by now everyone reading this understands the problem.  What's 
not clear is what you are expecting to happen now.

Documentation has been updated to reflect awareness of this problem, to 
caution against using ns_returnfile in this situation, and to suggest 
alternate solutions in the client code.

Some code fixes have been proposed, which for various reasons are 
undesirable or simply won't fix the problem.

A default configuration change was suggested, which seems to be generally 
viewed as undesirable.

What more are you looking for?

-J

John Caruso | 20 Aug 00:33

Re: Data "corruption" with fastpath caching

On Tuesday 02:10 PM 8/19/2008, Jeff Rogers wrote:
>A default configuration change was suggested which seems generally viewed 
>as undesirable.

My impression was that support was split about evenly, actually.  I take 
it that means you're against changing the default?  I'm a bit surprised, 
since you started out agreeing that it's a bug.  Personally I can't 
imagine any persuasive argument that a caching mechanism that can easily 
confuse /usr/local/private/var/rootpass and 
/var/tmp/verisign/certs/webcert.txt should be enabled by default in a web 
server.

For anyone thinking, well, you're the only one who's ever seen this bug, 
I'd say no, we're just the first ones to discover this bug.  It's quite 
possible that other people have run into it without knowing it, since 
AOLserver will just silently serve the wrong data.

As for what I want, as I said, I was mainly bringing this up to shine a 
light on the issue and see what other people's thoughts were.  That's been 
helpful in particular because I hadn't considered the security 
implications, which are quite serious; I may raise this issue on security 
forums as well so that people using ns_returnfile are aware of the danger 
of silent data corruption and/or information leaks and can review their 
code accordingly.

- John

Tom Jackson | 20 Aug 01:11

Re: Data "corruption" with fastpath caching

John,

This isn't a democracy. You have to demonstrate some understanding of
how things work. 

The only real security issue is your misuse/abuse of ns_returnfile to
serve dynamic data. 

Nobody is going to guarantee that you can't shoot yourself in the foot
due to your lack of understanding of writing robust code, or how to
configure and maintain a secure internet application, or take advice on
how to do so. 

But please, go tell the security police about our insecure file
commands. 

tom jackson
