Michael Haardt | 9 Jan 2007 15:21

Old topic again: Option to avoid fsync()?

Hello,

I asked about this here a while ago, but got no replies.

If you have a previously overloaded system that built up a large queue,
disabling fsync() is like an overboost switch: Not for regular operation,
but it solves the problem and brings you back to regular operation,
allowing you to deal with the original problem.

Apart from that, I run a spam honeypot where losing mail is no problem.
Avoiding fsync() for regular operation allows running it on a much
cheaper system.

A command line option does not help, because it would not be passed
to queue runners and their children.  Either you compile Exim without
fsync() or introduce a new configuration file option.  Having an extra
executable finally annoys me enough to bring this topic up again.

Philip does not like an option like that, because dumb admins may use
it without being aware of the risks.  That's a valid point.

Me, I think Exim's flexibility already allows so many ways to screw up
that dumb admins have probably done so anyway and have nothing to
lose.  I wouldn't mind if the daemon logged a message about unsafe
operation to mainlog when starting up.  I am trying to reduce the amount
of private Exim patches and getting this in the main distribution helps
me a lot, plus it may help others that know what they are doing.

Remember Ghostbusters: There will be the day when you have to cross
the streams. ;-)
[...]

Matt Bernstein | 9 Jan 2007 15:44

Re: Old topic again: Option to avoid fsync()?

At 15:21 +0100 Michael Haardt wrote:

> If you have a previously overloaded system that built up a large queue,
> disabling fsync() is like an overboost switch: Not for regular operation,
> but it solves the problem and brings you back to regular operation,
> allowing you to deal with the original problem.

I try to avoid r/w contention by using full data journalling with external 
journals, which are on physically different (and hopefully faster) HDDs. I 
believe this is good practice for synchronous I/O like mail and NFS.

For real psychopathic I-don't-care-about-my-data cases you can always use 
tmpfs..

Florian Weimer | 10 Jan 2007 16:26

Re: Old topic again: Option to avoid fsync()?

* Matt Bernstein:

> I try to avoid r/w contention by using full data journalling with external 
> journals, which are on physically different (and hopefully faster) HDDs. I 
> believe this is good practice for synchronous I/O like mail and NFS.

If you want to throw money at the problem, a RAID controller with a
battery-backed cache is a good option as well.

On the other hand, with a lot of drives in their default configuration,
fsync() can't reliably do what it claims to anyway. 8-/

Michael Haardt | 10 Jan 2007 17:38

Re: Old topic again: Option to avoid fsync()?

On Wed, Jan 10, 2007 at 04:26:29PM +0100, Florian Weimer wrote:
> If you want to throw money at the problem, a RAID controller with a
> battery-backed cache is a good option as well.

You completely miss the point, so let me rephrase it:  I am _not_
talking about regular operation.  I am talking about cleaning up a mess,
e.g. after an attack or double/triple fault that managed to kill all
redundancy.  Additionally, exotic applications benefit from disabling
fsync().

It's not economical to run systems at 10% of their maximum performance
just to have enough if shit happens, unless of course you just run a
small site, where the economic disadvantage of doing so can be tolerated.

> On the other hand, with a lot of drives in their default configuration,
> fsync() can't reliably do what it claims to anyway. 8-/

Actually, if you use maildir, there is no fsync() to synchronise the
directory, just one for the tmp file, but a code audit must be harder
than insisting on leaving fsync() in place under all conditions, because
it is the most you can do and still sometimes not enough.
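
For comparison, a delivery that makes both the message file and its
directory entry durable would look roughly like the sketch below. This is
not Exim's code: maildir_deliver(), write_all() and sync_dir() are
hypothetical helper names, and the caller is assumed to have created the
tmp/ and new/ subdirectories and to supply a unique file name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* fsync() a directory so that a rename inside it reaches the disk. */
static int sync_dir(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Hypothetical helper: write msg to <maildir>/tmp/<name>, sync it,
 * rename it into <maildir>/new/ and sync that directory as well. */
int maildir_deliver(const char *maildir, const char *name,
                    const char *msg, size_t len)
{
    char tmp[4096], dst[4096], newdir[4096];

    snprintf(tmp, sizeof tmp, "%s/tmp/%s", maildir, name);
    snprintf(dst, sizeof dst, "%s/new/%s", maildir, name);
    snprintf(newdir, sizeof newdir, "%s/new", maildir);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write_all(fd, msg, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, dst) < 0) {
        unlink(tmp);
        return -1;
    }
    return sync_dir(newdir);   /* the step said to be missing above */
}
```

Dropping the final sync_dir() call gives the behaviour described above:
the file contents are synced, but after a crash the rename into new/ may
not yet be on disk.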

The only valid point so far was:

>> Do you vote for or against having an option to disable fsync()?

> Against; I don't want Exim authors blamed for irresponsible behaviour.

What's irresponsible most of the time, may exceptionally be sane.
LD_PRELOAD is an idea that probably works fine, although I don't like
[...]

B. Johannessen | 10 Jan 2007 23:53

Re: Old topic again: Option to avoid fsync()?

Michael Haardt wrote:
> Any other opinions than "enforce fsync, because it works for me"?

If this can be done without impacting those who don't want to use the 
feature, I don't think there's much of an argument against it. As has 
been pointed out before, Exim already gives you an almost infinite 
number of ways to shoot yourself (or others) in the foot. Given that 
there are legitimate use-cases for this functionality, I'd vote to 
include it.

Personally I may find it useful for my spamtrap MX, which is handling 
close to 200k messages/week.

	Bob

Philip Hazel | 11 Jan 2007 10:49

Re: Old topic again: Option to avoid fsync()?

On Wed, 10 Jan 2007, B. Johannessen wrote:

> Michael Haardt wrote:
> > Any other opinions than "enforce fsync, because it works for me"?
> 
> If this can be done without impacting those who don't want to use the 
> feature, I don't think there's much of an argument against it. 

It is clear that this is a controversial issue. Perhaps the resolution 
is to add the option, but require a compile time configuration to 
include the feature. Then it would certainly have zero impact on anybody 
who chose not to include it in the binary. If you have it in the binary 
but do not turn it on, the impact is a flag test every time Exim might 
do an fsync(). I suspect this is a very small cost compared with 
everything else that's going on.

-- 
Philip Hazel            University of Cambridge Computing Service
Get the Exim 4 book:    http://www.uit.co.uk/exim-book

Philip Hazel | 22 Jan 2007 17:34

Re: Old topic again: Option to avoid fsync()?

On Thu, 11 Jan 2007, Philip Hazel wrote:

> It is clear that this is a controversial issue. Perhaps the resolution 
> is to add the option, but require a compile time configuration to 
> include the feature. Then it would certainly have zero impact on anybody 
> who chose not to include it in the binary. If you have it in the binary 
> but do not turn it on, the impact is a flag test every time Exim might 
> do an fsync(). I suspect this is a very small cost compared with 
> everything else that's going on.

OK, I've tried to make everyone happy. I have added a compile-time
option called ENABLE_DISABLE_FSYNC, and put a lot of warnings about it
in EDITME. In particular, I've said it should never be used when
compiling binaries for distribution.

If ENABLE_DISABLE_FSYNC is set, a runtime option called disable_fsync is
compiled. If the compile time option is not set, an attempt to use the
runtime option gets "unknown option".
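
A minimal sketch of how such a two-level switch can be wired up in C
follows. Only the names ENABLE_DISABLE_FSYNC and disable_fsync come from
the message above; maybe_fsync() and set_bool_option() are hypothetical
and are not taken from the Exim source.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

#ifdef ENABLE_DISABLE_FSYNC
/* Only exists when the feature was compiled in; it would be set from
 * the runtime configuration file. */
static int disable_fsync = 0;
#endif

/* Hypothetical wrapper used wherever the program would call fsync(). */
int maybe_fsync(int fd)
{
#ifdef ENABLE_DISABLE_FSYNC
    if (disable_fsync)
        return 0;              /* the single flag test per call */
#endif
    return fsync(fd);
}

/* Hypothetical option setter: returns 0 on success, -1 for an unknown
 * option. Without ENABLE_DISABLE_FSYNC the name is simply not known. */
int set_bool_option(const char *name, int value)
{
#ifdef ENABLE_DISABLE_FSYNC
    if (strcmp(name, "disable_fsync") == 0) {
        disable_fsync = value;
        return 0;
    }
#endif
    (void)name;
    (void)value;
    return -1;
}
```

Built without -DENABLE_DISABLE_FSYNC, the option name is not recognised
and every call goes straight to fsync(); built with it, the cost while
the option is off is the one integer test.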

This code is committed to CVS and so will be in tonight's snapshot.

Philip

-- 
Philip Hazel, University of Cambridge Computing Service.

Michael Haardt | 24 Jan 2007 10:41

Re: Old topic again: Option to avoid fsync()?

> OK, I've tried to make everyone happy. I have added a compile-time
> option called ENABLE_DISABLE_FSYNC, and put a lot of warnings about it
> in EDITME. In particular, I've said it should never be used when
> compiling binaries for distribution.
>
> If ENABLE_DISABLE_FSYNC is set, a runtime option called disable_fsync is
> compiled. If the compile time option is not set, an attempt to use the
> runtime option gets "unknown option".

Perfect! I just tried it and it works great, reducing I/O from 400
down to about 20 operations per second.  It's back at 400 now, because
that's how things are meant to work, but it is good to know there is an
emergency exit in reach.

Thanks a lot for your work,

Michael

Michael Haardt | 11 Jan 2007 10:59

Re: Old topic again: Option to avoid fsync()?

> It is clear that this is a controversial issue. Perhaps the resolution 
> is to add the option, but require a compile time configuration to 
> include the feature. Then it would certainly have zero impact on anybody 
> who chose not to include it in the binary. If you have it in the binary 
> but do not turn it on, the impact is a flag test every time Exim might 
> do an fsync(). I suspect this is a very small cost compared with 
> everything else that's going on.

Thanks, that sounds like a perfect solution to me. :)

Michael

Daniel Tiefnig | 11 Jan 2007 14:50

Re: Old topic again: Option to avoid fsync()?

Michael Haardt wrote:
>> Perhaps the resolution is to add the option, but require a compile
>>  time configuration to include the feature.

Maybe the no-fsync stuff should be limited to non-daemon mode operation?
I think "exim -qff" would do the trick for Michael, (and for me)
wouldn't it? Michael?
That would at least prevent people from running "exim -bd" or "-q5m"
ordinarily. We could just ignore the no-fsync option or abort during
startup.

> Thanks, that sounds like a perfect solution to me. :)

If the Debian people will activate this switch at least in their -heavy
package, I'd second that. Andreas, do you think you can bear the risk?
At least with the above modification? I'm not sure whether I would.

lg,
daniel

Andreas Metzler | 11 Jan 2007 18:38

Re: Old topic again: Option to avoid fsync()?

On 2007-01-11 Daniel Tiefnig <exim <at> inode.at> wrote:
[...]
> If the Debian people will activate this switch at least in their -heavy
> package, I'd second that. Andreas, do you think you can bear the risk?
> At least with the above modification? I'm not sure whether I would.

I don't think we would enable this (unless it is enabled by default
upstream), since it is a controversial feature with Phil's opinion in
a rather definite direction. (I am giving Phil's opinion big weight
for the simple reason that he knows a lot more about the issue.)

However, the ultimate decision would be Marc's, not mine, since he is
doing almost all of the work for exim packaging nowadays.
cu andreas
-- 
The 'Galactic Cleaning' policy undertaken by Emperor Zhark is a personal
vision of the emperor's, and its inclusion in this work does not constitute
tacit approval by the author or the publisher for any such projects,
howsoever undertaken.                                (c) Jasper Fforde

Michael Haardt | 11 Jan 2007 17:28

Re: Old topic again: Option to avoid fsync()?

> Maybe the no-fsync stuff should be limited to non-daemon mode operation?

I don't think the delivery process knows much about running under a
queue runner spawned by a daemon or by a manually started queue runner
or as part of direct manual delivery.

> I think "exim -qff" would do the trick for Michael, (and for me)
> wouldn't it? Michael?

I don't use Exim queue runners for larger systems, because they do not
scale with a growing queue.

> That would at least prevent people from running "exim -bd" or "-q5m"
> ordinarily. We could just ignore the no-fsync option or abort during
> startup.

Unfortunately, the frequent fsync() calls still impose a large penalty
for queue runners, even if those omit them.  Try running one queue runner
with fsync and the rest without, and you won't see much improvement.

> > Thanks, that sounds like a perfect solution to me. :)
>
> If the Debian people will activate this switch at least in their -heavy
> package, I'd second that. Andreas, do you think you can bear the risk?
> At least with the above modification? I'm not sure whether I would.

Ah, the joy of "distributions".  There ought to be a large banner on that
compile-time switch, saying: You SHOULD (capital letters and reference
to RFC 2119) not enable this option:

[...]

Philip Hazel | 12 Jan 2007 10:29

Re: Old topic again: Option to avoid fsync()?

On Thu, 11 Jan 2007, Michael Haardt wrote:

> Whoever wonders what Exim 5 could contain to justify a new major version:
> A queue storage API like INN has for articles would be my ultimate
> favourite and definitely THE feature to start closing the performance
> gap to some commercial MTAs.  Well, one can dream of having the best of
> both worlds.

There are several things that Exim 5 could usefully contain, but I might 
as well make it clear that it won't be my responsibility as I will be 
retired. :-) At some stage making a list might be useful. 

-- 
Philip Hazel            University of Cambridge Computing Service
Get the Exim 4 book:    http://www.uit.co.uk/exim-book

Daniel Tiefnig | 11 Jan 2007 19:47

Re: Old topic again: Option to avoid fsync()?

Michael Haardt wrote:
>> Maybe the no-fsync stuff should be limited to non-daemon mode 
>> operation?
> 
> I don't think the delivery process knows much about running under a 
> queue runner spawned by a daemon or by a manually started queue 
> runner or as part of direct manual delivery.

Ah, no. I just meant to include a check in exim's option parsing that
will abort on "exim -bd --no-fsync" (or whatever --no-fsync ends up
being called).

>> I think "exim -qff" would do the trick for Michael, (and for me) 
>> wouldn't it? Michael?
> 
> I don't use Exim queue runners for larger systems, because they do 
> not scale with a growing queue.

Hmm, so what are we talking about then? :o)

> Unfortunately, the frequent fsync() calls still impose a large 
> penalty for queue runners, even if those omit them.  Try running one 
> queue runner with fsync and the rest without, and you won't see much 
> improvement.

Well, you can of course disable regular queueruns while messing around.
The listening daemon may cause some problems, but you can (re)start it
with "-odq" at least.

> Ah, the joy of "distributions".

[...]

Michael Haardt | 11 Jan 2007 22:17

Re: Old topic again: Option to avoid fsync()?

> > I don't use Exim queue runners for larger systems, because they do 
> > not scale with a growing queue.
>
> Hmm, so what are we talking about then? :o)

Exim queue runners don't deliver mails on their own, but spawn children
doing that.  Your suggestion is to use a new flag that queue runners
would have to pass to those children, and of course exim would have to
check whether that flag had been passed by a non-admin user.  That works,
but it is more work to be sure you get it all right.

That's why I suggested a configuration file option.  Only admins can
change it and any exim process has the same, consistent view of the
configuration.

I don't use n queue runners that scan the queue in an uncoordinated
manner, thus frequently colliding with each other, but one script that
enumerates the queue once and keeps n parallel deliveries running.
In fact, n plus a few more (one delivery may trigger further deliveries).
The actual delivery process wouldn't know the difference.  You can
do nice things that way.
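
The single-dispatcher approach can be sketched in C as follows. This is
an illustration, not the script referred to above: deliver_queue() is a
hypothetical helper, the list of message IDs is assumed to have been
collected beforehand, and the program name is a parameter so that
something other than exim can be substituted when trying it out. The -M
option is Exim's option for forcing a delivery attempt on given message
IDs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "<prog> -M <id>" once per message ID, keeping
 * at most max_parallel children alive at any time. Returns the number of
 * children that exited with status 0. */
int deliver_queue(const char *prog, char *const ids[], size_t count,
                  int max_parallel)
{
    size_t next = 0;
    int running = 0;
    int succeeded = 0;

    if (max_parallel < 1)
        max_parallel = 1;

    while (next < count || running > 0) {
        /* Top up the pool until it is full or the list is exhausted. */
        while (next < count && running < max_parallel) {
            pid_t pid = fork();
            if (pid < 0) {
                count = next;      /* fork failed: stop starting new work */
                break;
            }
            if (pid == 0) {
                execlp(prog, prog, "-M", ids[next], (char *)NULL);
                _exit(127);        /* only reached if exec failed */
            }
            next++;
            running++;
        }

        if (running == 0)
            break;

        /* Block until any child finishes, which frees one slot. */
        int status;
        if (wait(&status) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        running--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return succeeded;
}
```

Calling deliver_queue("exim", ids, count, 20) would keep up to 20
deliveries in flight; the figure 20 is arbitrary. Because the queue is
enumerated once by a single process, the workers never compete for the
same message.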

> Well, you can of course disable regular queueruns while messing around.
> The listening daemon may cause some problems, but you can (re)start it
> with "-odq" at least.

If there is any way to still accept new messages, I do that, because
otherwise I hurt whoever wants to send them.

> > Ah, the joy of "distributions".
[...]

Daniel Tiefnig | 12 Jan 2007 10:06

Re: Old topic again: Option to avoid fsync()?

Michael Haardt wrote:
> Your suggestion is to use a new flag that queue runners would have to
> pass to those children,

Not necessarily, but that's what I thought would be most useful.

> and of course exim would have to check whether that flag had been
> passed by a non-admin user.  That works, but it is more work to be
> sure you get it all right.

I now get your point.

> I don't use n queue runners that scan the queue in an uncoordinated 
> manner, thus frequently colliding with each other, but one script
> that enumerates the queue once and keeps n parallel deliveries
> running.

Sounds reasonable, maybe I should try that too on our queue server. I
didn't mind so far, as it is running fine as long as the queue stays
below, say, 100k messages.

lg,
daniel

Jonathan Knight | 10 Jan 2007 23:20

Re: Old topic again: Option to avoid fsync()?

Michael Haardt wrote:
> On Wed, Jan 10, 2007 at 04:26:29PM +0100, Florian Weimer wrote:
>   
>> If you want to throw money at the problem, a RAID controller with a
>> battery-backed cache is a good option as well.
>>     
>
> You completely miss the point, so let me rephrase it:  I am _not_
> talking about regular operation.  I am talking about cleaning up a mess,
> e.g. after an attack or double/triple fault that managed to kill all
> redundancy.  Additionally, exotic applications benefit from disabling
> fsync().
>
> It's not economical to run systems at 10% of their maximum performance
> just to have enough if shit happens, unless of course you just run a
> small site, where the economic disadvantage of doing so can be tolerated.
>   

Errrrrr.  I am somewhat concerned about your last statement.  I run the 
mail system for the University here, which isn't really a big site, but 
we see over a million attempts to deliver mail a day which translates 
into about 46,000 real mail messages after greylisting.

We have internal mail servers which accept email from local users and 
handle all internal communications and we have a pair of external mail 
servers which talk to the outside world.  Our mail servers are running 
at a fraction of their capacity just because bad things happen too often.

All it takes is some annoying spammer out on the internet to use one of 
our users as a fake "From" address and we will see hundreds of thousands 
[...]

Michael Haardt | 11 Jan 2007 09:27

Re: Old topic again: Option to avoid fsync()?

> Errrrrr.  I am somewhat concerned about your last statement.  I run the 
> mail system for the University here, which isn't really a big site, but 
> we see over a million attempts to deliver mail a day which translates 
> into about 46,000 real mail messages after greylisting.

Are those the two servers multiplexing your traffic between inside and
out? 46,000 messages/day are around 30 messages/minute total average and
probably 60-90 messages/minute peak, and that's two machines in total? I
see.  I run as low as 400 messages/minute, peak being 1500/minute - on a
single node.  I know I can reach 2000-2200, if needed.  The systems for
internal delivery run at 100-200 messages/minute when operating regularly.

> I have tried to run a mail system in the way that you are trying to and 
> I'm very happy that we have the resources here to run ours with lots of 
> spare capacity because it makes my life simpler.

I sure wouldn't mind a few hundred systems more to make my life simple. ;-)
But as I said: Only small sites can afford that.

Michael

Michael Haardt | 9 Jan 2007 17:16

Re: Old topic again: Option to avoid fsync()?

> I try to avoid r/w contention by using full data journalling with external 
> journals, which are on physically different (and hopefully faster) HDDs. I 
> believe this is good practice for synchronous I/O like mail and NFS.

There are a bunch of things you can do for _regular operation_.  But disks
can fail by getting ridiculously slow, networks can lose or destroy packets,
name servers can fail, then of course there are attacks and whatever
else... and you end up with a queue.

Disable fsync() for a couple minutes and people have their mail.  Enable
it again and everything is fine.

> For real psychopathic I-don't-care-about-my-data cases you can always use 
> tmpfs..

Right, but that may either cause swapping or I have a very limited queue.
Running the system on real disks, but without fsync(), means data will
either be written lazily, or not at all.  It's nicely in between tmpfs
and fsync().

Do you vote for or against having an option to disable fsync()?

Michael

Matt Bernstein | 10 Jan 2007 11:28

Re: Old topic again: Option to avoid fsync()?

On Jan 9 Michael Haardt wrote:

> Do you vote for or against having an option to disable fsync()?

Against; I don't want Exim authors blamed for irresponsible behaviour.

Another option available to you is to LD_PRELOAD a no-op for fsync(), eg 
<http://ftp.die.net/pub/qmail-tools/libnosync.c>.

But please try the external journal trick first, and set a commit interval 
as large as you like--I use a minute or two. Your I/O will scale since 
the main volume is largely only reading and the journal volume will only 
write if you have enough RAM.
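
The LD_PRELOAD approach mentioned above amounts to interposing no-op
versions of the sync calls. The following is a minimal sketch written for
illustration; it is not the contents of the libnosync.c file linked
above, and the file name nosync.c is an assumption. It does not cover
files opened with O_SYNC or calls to sync().

```c
/* nosync.c (hypothetical file name).
 *
 * Build as a shared object, for example:
 *   cc -shared -fPIC -o libnosync.so nosync.c
 * then start the target program with LD_PRELOAD set to the full path of
 * libnosync.so so that these definitions are found before libc's. */

/* Pretend the data reached stable storage without doing any I/O. */
int fsync(int fd)
{
    (void)fd;
    return 0;
}

/* Same for the data-only variant. */
int fdatasync(int fd)
{
    (void)fd;
    return 0;
}
```

Unlike a configuration option, this affects every fsync() call in the
process and in any child that inherits the environment variable. The
dynamic linker also restricts LD_PRELOAD for set-user-ID programs, so it
generally only takes effect when the binary is started by root.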
