Strahinja Kustudić | 11 Aug 2012 00:15
Favicon

Index Bloat Problem

We have PostgreSQL 9.1 running on Centos 5 on two SSDs, one for indices and one for data. The database is extremely active with reads and writes. We have autovacuum enabled, but we didn't tweak it's aggressiveness. The problem is that after some time the database grows even more than 100% on the file system and most of the growth is because the indices are a few times bigger than they should be, and when this happens, the performance of the DB drops.


For example, yesterday when I checked the database size on the production server it was 30GB, and the restored dump of that database was only 17GB. The most interesting thing is that the data wasn't bloated that much, but the indices were. Some of them were a few times bigger than they should be. For example an index on the production db is 440MB, while that same index after dump/restore is 17MB, and there are many indices with that high difference. We could fix the problem if we reindex the DB, but that makes our DB go offline and it's not possible to do in the production enviroment.

Is there a way to make the autovacuum daemon more aggressive, since I'm not exactly sure how to do that in this case? Would that even help? Is there another way to remove this index bloat?

Thanks in advance,
Strahinja
hubert depesz lubaczewski | 11 Aug 2012 11:30
Gravatar

Re: Index Bloat Problem

On Sat, Aug 11, 2012 at 12:15:11AM +0200, Strahinja Kustudić wrote:
> Is there a way to make the autovacuum daemon more aggressive, since I'm not
> exactly sure how to do that in this case? Would that even help? Is there
> another way to remove this index bloat?

http://www.depesz.com/index.php/2011/07/06/bloat-happens/

Best regards,

depesz

--

-- 
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

-
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Mark Kirkwood | 14 Aug 2012 00:52
Picon
Favicon

Re: Index Bloat Problem

On 11/08/12 10:15, Strahinja Kustudić wrote:
> We have PostgreSQL 9.1 running on Centos 5 on two SSDs, one for indices and
> one for data. The database is extremely active with reads and writes. We
> have autovacuum enabled, but we didn't tweak it's aggressiveness. The
> problem is that after some time the database grows even more than 100% on
> the file system and most of the growth is because the indices are a few
> times bigger than they should be, and when this happens, the performance of
> the DB drops.
>
> For example, yesterday when I checked the database size on the production
> server it was 30GB, and the restored dump of that database was only 17GB.
> The most interesting thing is that the data wasn't bloated that much, but
> the indices were. Some of them were a few times bigger than they should be.
> For example an index on the production db is 440MB, while that same index
> after dump/restore is 17MB, and there are many indices with that high
> difference. We could fix the problem if we reindex the DB, but that makes
> our DB go offline and it's not possible to do in the production enviroment.
>
> Is there a way to make the autovacuum daemon more aggressive, since I'm not
> exactly sure how to do that in this case? Would that even help? Is there
> another way to remove this index bloat?
>
>

Some workloads can be difficult to tame. However I would try something 
like this in postgresql.conf:

autovacuum_naptime= 10s
autovacuum_vacuum_scale_factor = 0.1

and maybe set log_autovacuum_min_duration so you see what autovacuum is 
doing.

If the above settings don't help, then you could maybe monitor growth 
and schedule regular REINDEXes on the tables concerned (at some suitably 
quiet time).

Regards

Mark

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Jeff Janes | 14 Aug 2012 06:14
Picon

Re: Index Bloat Problem

On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
<strahinjak <at> nordeus.com> wrote:
>
> For example, yesterday when I checked the database size on the production
> server it was 30GB, and the restored dump of that database was only 17GB.
> The most interesting thing is that the data wasn't bloated that much, but
> the indices were. Some of them were a few times bigger than they should be.
> For example an index on the production db is 440MB, while that same index
> after dump/restore is 17MB, and there are many indices with that high
> difference.

Could your pattern of deletions be leaving sparsely populated, but not
completely empty, index pages; which your insertions will then never
reuse because they never again insert values in that key range?

Cheers,

Jeff

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Strahinja Kustudić | 16 Aug 2012 21:57
Favicon

Re: Index Bloat Problem

Thanks for the help everyone and sorry for not replying sooner, I was on a business trip.


<at> Hubert pg_reorg looks really interesting and from the first read it looks to be a very good solution for maintenance, but for now I would rather try to slow down, or remove this bloat, so I have to do as less maintenance as possible.

<at> Mark So basically I should decrease the autovacuum nap time from 60s to 10s, reduce the scale factor from 0.2 to 0.1. log_autovacuum_min_duration is already set to 0, which means everything is logged.

<at> Jeff I'm not sure if I understand what you mean? I know that we never reuse key ranges. Could you be more clear, or give an example please.

Thanks in advance,
Strahinja



On Tue, Aug 14, 2012 at 6:14 AM, Jeff Janes <jeff.janes <at> gmail.com> wrote:
On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
<strahinjak <at> nordeus.com> wrote:
>
> For example, yesterday when I checked the database size on the production
> server it was 30GB, and the restored dump of that database was only 17GB.
> The most interesting thing is that the data wasn't bloated that much, but
> the indices were. Some of them were a few times bigger than they should be.
> For example an index on the production db is 440MB, while that same index
> after dump/restore is 17MB, and there are many indices with that high
> difference.

Could your pattern of deletions be leaving sparsely populated, but not
completely empty, index pages; which your insertions will then never
reuse because they never again insert values in that key range?

Cheers,

Jeff

Jeff Janes | 18 Aug 2012 04:33
Picon

Re: Index Bloat Problem

On Thu, Aug 16, 2012 at 12:57 PM, Strahinja Kustudić
<strahinjak <at> nordeus.com> wrote:
>
>  <at> Jeff I'm not sure if I understand what you mean? I know that we never reuse
> key ranges. Could you be more clear, or give an example please.

If an index leaf page is completely empty because every entry on it
were deleted, it will get recycled to be used in some other part of
the index.  (Eventually--it can take a while, especially if you have
long-running transactions).

But if the leaf page is only mostly empty, because only most of
entries on it were deleted, than it can never be reused, except for
entries that naturally fall into its existing key range (which will
never happen, if you never reuse key ranges)

So if you have a million records with keys 1..1000000, and do a
"delete from foo where key between 1 and 990000", then 99% of those
old index pages will become completely empty and eligible for reuse.
But if you do "delete from foo where key%100>0", then all of the pages
will become 99% empty, and none will be eligible for reuse (except the
very last one, which can still accept 1000001 and so on)

There has  been talk of allowing logically adjacent, mostly empty
pages to be merged so that one of them becomes empty, but the way
concurrent access to btree indexes was designed this is extremely hard
to do safely.

Cheers,

Jeff

--

-- 
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Greg Williamson | 18 Aug 2012 10:01
Picon
Favicon

Re: Index Bloat Problem

Thanks for this description--we have index bloat problems on a massively active (but small) database.This may help shed light on our problems.

Sorry for top-posting--challenged email reader.

Greg W.

From: Jeff Janes <jeff.janes <at> gmail.com>
To: Strahinja Kustudić <strahinjak <at> nordeus.com>
Cc: pgsql-performance <at> postgresql.org
Sent: Friday, August 17, 2012 7:33 PM
Subject: Re: [PERFORM] Index Bloat Problem

On Thu, Aug 16, 2012 at 12:57 PM, Strahinja Kustudić
<strahinjak <at> nordeus.com> wrote:
>
> <at> Jeff I'm not sure if I understand what you mean? I know that we never reuse
> key ranges. Could you be more clear, or give an example please.

If an index leaf page is completely empty because every entry on it
were deleted, it will get recycled to be used in some other part of
the index.  (Eventually--it can take a while, especially if you have
long-running transactions).

But if the leaf page is only mostly empty, because only most of
entries on it were deleted, than it can never be reused, except for
entries that naturally fall into its existing key range (which will
never happen, if you never reuse key ranges)

So if you have a million records with keys 1..1000000, and do a
"delete from foo where key between 1 and 990000", then 99% of those
old index pages will become completely empty and eligible for reuse.
But if you do "delete from foo where key%100>0", then all of the pages
will become 99% empty, and none will be eligible for reuse (except the
very last one, which can still accept 1000001 and so on)

There has  been talk of allowing logically adjacent, mostly empty
pages to be merged so that one of them becomes empty, but the way
concurrent access to btree indexes was designed this is extremely hard
to do safely.

Cheers,

Jeff


--
Sent via pgsql-performance mailing list (pgsql-performance <at> postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Gmane