Mel Gorman | 9 Aug 2012 15:49

[RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

Changelog since V2
o Capture !MIGRATE_MOVABLE pages where possible
o Document the treatment of MIGRATE_MOVABLE pages while capturing
o Expand changelogs

Changelog since V1
o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
o Expanded changelogs a little

Allocation success rates have been far lower since 3.4 due to commit
[fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
commit was introduced for good reasons and it was known in advance that
the success rates would suffer but it was justified on the grounds that
the high allocation success rates were achieved by aggressive reclaim.
Success rates are expected to suffer even more in 3.6 due to commit
[7db8889a: mm: have order > 0 compaction start off where it left] which
testing has shown to severely reduce allocation success rates under load -
to 0% in one case.  There is a proposed change to that patch in this series
and it would be ideal if Jim Schutt could retest the workload that led to
commit [7db8889a: mm: have order > 0 compaction start off where it left].

This series aims to improve the allocation success rates without regressing
the benefits of commit fe2c2a10. The series is based on 3.5 and includes
commit 7db8889a to illustrate its impact on success rates.

Patch 1 updates a stale comment seeing as I was in the general area.

Patch 2 updates reclaim/compaction to reclaim pages scaled on the number
	of recent failures.


Mel Gorman | 9 Aug 2012 15:49

[PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages

The comment about order applied when the check was
order > PAGE_ALLOC_COSTLY_ORDER which has not been the case since
[c5a73c3d: thp: use compaction for all allocation orders]. Fixing
the comment while I'm in the general area.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
 mm/compaction.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b39ede1..95ca967 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -759,11 +759,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;

-	/*
-	 * Check whether it is worth even starting compaction. The order check is
-	 * made because an assumption is made that the page allocator can satisfy
-	 * the "cheaper" orders without taking special steps
-	 */
+	/* Check if the GFP flags allow compaction */
 	if (!order || !may_enter_fs || !may_perform_io)
 		return rc;


Mel Gorman | 9 Aug 2012 15:49

[PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

While compaction is migrating pages to free up large contiguous blocks for
allocation it races with other allocation requests that may steal these
blocks or break them up. This patch alters direct compaction to capture a
suitable free page as soon as it becomes available to reduce this race. It
uses similar logic to split_free_page() to ensure that watermarks are
still obeyed.
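
As a rough userspace illustration of the capture step (not the kernel code
in this patch), the sketch below refuses to hand a freed block to the
compacting task if doing so would take the zone below its low watermark.
The names capture_zone and capture_free_page_sketch are hypothetical, and
the single comparison is an assumed simplification of the watermark logic
in split_free_page().

```c
#include <stdbool.h>

/* Hypothetical, simplified zone: only the fields this sketch needs. */
struct capture_zone {
	unsigned long free_pages;	/* pages currently free in the zone */
	unsigned long low_wmark;	/* low watermark, in pages */
};

/*
 * Try to capture a just-freed block of 2^order pages for the compacting
 * task. Refuse if removing the block would leave fewer free pages than
 * the low watermark, so capture cannot bypass the usual allocation limits.
 */
bool capture_free_page_sketch(struct capture_zone *zone, unsigned int order)
{
	unsigned long nr_pages = 1UL << order;

	/* Assumed check: free pages must cover the watermark plus the block. */
	if (zone->free_pages < zone->low_wmark + nr_pages)
		return false;	/* leave the block on the free list */

	zone->free_pages -= nr_pages;	/* the block now belongs to the caller */
	return true;
}
```

With free_pages = 2000 and low_wmark = 1024, a first order-9 capture
succeeds and a second is refused, because the 1488 pages left are fewer
than the 1536 (watermark plus block size) the check requires.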

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
 include/linux/compaction.h |    4 +-
 include/linux/mm.h         |    1 +
 mm/compaction.c            |   88 ++++++++++++++++++++++++++++++++++++++------
 mm/internal.h              |    1 +
 mm/page_alloc.c            |   63 +++++++++++++++++++++++--------
 5 files changed, 128 insertions(+), 29 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 51a90b7..5673459 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,7 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask,
-			bool sync);
+			bool sync, struct page **page);
 extern int compact_pgdat(pg_data_t *pgdat, int order);
 extern unsigned long compaction_suitable(struct zone *zone, int order);


Minchan Kim | 10 Aug 2012 01:35

Re: [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

On Thu, Aug 09, 2012 at 02:49:23PM +0100, Mel Gorman wrote:
> While compaction is migrating pages to free up large contiguous blocks for
> allocation it races with other allocation requests that may steal these
> blocks or break them up. This patch alters direct compaction to capture a
> suitable free page as soon as it becomes available to reduce this race. It
> uses similar logic to split_free_page() to ensure that watermarks are
> still obeyed.
> 
> Signed-off-by: Mel Gorman <mgorman <at> suse.de>
> Reviewed-by: Rik van Riel <riel <at> redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

--

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

Mel Gorman | 9 Aug 2012 15:49

[PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

If allocation fails after compaction then compaction may be deferred for
a number of allocation attempts. If there are subsequent failures,
compact_defer_shift is increased to defer for longer periods. This patch
uses that information to scale the number of pages reclaimed with
compact_defer_shift until allocations succeed again. The rationale is
that reclaiming the normal number of pages still allowed compaction to
fail and its success depends on the number of pages. If it's failing,
reclaim more pages until it succeeds again.

Note that this is not implying that VM reclaim is not reclaiming enough
pages or that its logic is broken. try_to_free_pages() always asks for
SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
what it does. Direct reclaim stops normally with this check.

	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
		goto out;

should_continue_reclaim delays when that check is made until a minimum number
of pages for reclaim/compaction are reclaimed. It is possible that this patch
could instead set nr_to_reclaim in try_to_free_pages() and drive it from
there but that behaves differently and not necessarily for the better. If
driven from do_try_to_free_pages(), it is also possible that priorities
will rise. When they reach DEF_PRIORITY-2, it will also start stalling
and setting pages for immediate reclaim which is more disruptive than is
desirable in this case. That is a more wide-reaching change that could
cause another regression related to THP requests causing interactive jitter.
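
To make the scaling concrete, here is a minimal userspace sketch. It
assumes that reclaim/compaction normally targets 2 << order pages and that
the target is shifted left by compact_defer_shift; struct defer_zone and
pages_for_compaction() are illustrative names, not the kernel's.

```c
/* Hypothetical stand-in for the per-zone compaction deferral state. */
struct defer_zone {
	unsigned int compact_defer_shift;	/* grows with repeated compaction failures */
};

/*
 * Number of pages reclaim/compaction should free before compaction is
 * tried again. Assumed baseline is 2 << order pages; each increment of
 * compact_defer_shift doubles the target.
 */
unsigned long pages_for_compaction(const struct defer_zone *zone,
				   unsigned int order)
{
	unsigned long pages = 2UL << order;

	return pages << zone->compact_defer_shift;
}
```

Under these assumptions an order-9 request has a target of 1024 pages at
compact_defer_shift 0, growing to 8192 pages at compact_defer_shift 3.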

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
---

Minchan Kim | 10 Aug 2012 10:49

Re: [PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

On Thu, Aug 09, 2012 at 02:49:22PM +0100, Mel Gorman wrote:
> If allocation fails after compaction then compaction may be deferred for
> a number of allocation attempts. If there are subsequent failures,
> compact_defer_shift is increased to defer for longer periods. This patch
> uses that information to scale the number of pages reclaimed with
> compact_defer_shift until allocations succeed again. The rationale is
> that reclaiming the normal number of pages still allowed compaction to
> fail and its success depends on the number of pages. If it's failing,
> reclaim more pages until it succeeds again.
> 
> Note that this is not implying that VM reclaim is not reclaiming enough
> pages or that its logic is broken. try_to_free_pages() always asks for
> SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
> what it does. Direct reclaim stops normally with this check.
> 
> 	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> 		goto out;
> 
> should_continue_reclaim delays when that check is made until a minimum number
> of pages for reclaim/compaction are reclaimed. It is possible that this patch
> could instead set nr_to_reclaim in try_to_free_pages() and drive it from
> there but that's behaves differently and not necessarily for the better. If
> driven from do_try_to_free_pages(), it is also possible that priorities
> will rise. When they reach DEF_PRIORITY-2, it will also start stalling
> and setting pages for immediate reclaim which is more disruptive than not
> desirable in this case. That is a more wide-reaching change that could
> cause another regression related to THP requests causing interactive jitter.
> 
> Signed-off-by: Mel Gorman <mgorman <at> suse.de>
> Acked-by: Rik van Riel <riel <at> redhat.com>

Mel Gorman | 9 Aug 2012 15:49

[PATCH 4/5] mm: have order > 0 compaction start off where it left

From: Rik van Riel <riel@redhat.com>

This commit is already upstream as [7db8889a: mm: have order > 0 compaction
start off where it left]. It's included in this series to provide context
to the next patch as the series is based on 3.5.

Order > 0 compaction stops when enough free pages of the correct page
order have been coalesced.  When doing subsequent higher order
allocations, it is possible for compaction to be invoked many times.

However, the compaction code always starts out looking for things to
compact at the start of the zone, and for free pages to compact things to
at the end of the zone.

This can cause quadratic behaviour, with isolate_freepages starting at the
end of the zone each time, even though previous invocations of the
compaction code already filled up all free memory on that end of the zone.

This can cause isolate_freepages to take enormous amounts of CPU with
certain workloads on larger memory systems.

The obvious solution is to have isolate_freepages remember where it left
off last time, and continue at that point the next time it gets invoked
for an order > 0 compaction.  This could cause compaction to fail if
cc->free_pfn and cc->migrate_pfn are close together initially, in that
case we restart from the end of the zone and try once more.

Forced full (order == -1) compactions are left alone.
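
The behaviour described above can be sketched in userspace as follows.
This is a simplified illustration under stated assumptions, not the
upstream implementation: struct scan_zone and both helpers are
hypothetical names, pageblock alignment is ignored, and the "try once
more" rule is modelled with a single wrapped flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified zone: pfn range plus the cached scanner position. */
struct scan_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;
	unsigned long cached_free_pfn;	/* where the free scanner last stopped */
};

/* Pick the pfn at which the free scanner starts for this compaction run. */
unsigned long free_scanner_start(const struct scan_zone *zone, int order)
{
	if (order > 0)
		return zone->cached_free_pfn;	/* resume where the last run stopped */

	return zone->end_pfn;	/* forced full compaction (order == -1) */
}

/*
 * Called when the free scanner meets the migrate scanner without success.
 * An order > 0 run that started from the cached position is allowed one
 * restart from the end of the zone. Returns true if the run should go on.
 */
bool wrap_free_scanner(const struct scan_zone *zone, int order,
		       bool *wrapped, unsigned long *free_pfn)
{
	if (order <= 0 || *wrapped)
		return false;	/* full compaction, or the one retry is used up */

	*wrapped = true;
	*free_pfn = zone->end_pfn;
	return true;
}
```

The single retry bounds the extra work: a run scans at most from the
cached position down to the migrate scanner and then once more from the
end of the zone.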

[akpm@linux-foundation.org: checkpatch fixes]

Mel Gorman | 9 Aug 2012 15:49

[PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

Commit [7db8889a: mm: have order > 0 compaction start off where it left]
introduced a caching mechanism to reduce the amount of work the free page
scanner does in compaction. However, it has a problem. Consider two processes
simultaneously scanning free pages:

				    			C
Process A		M     S     			F
		|---------------------------------------|
Process B		M 	FS

C is zone->compact_cached_free_pfn
S is cc->start_free_pfn
M is cc->migrate_pfn
F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.

Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.

Process A moves to "end_of_zone - one_pageblock" and runs this check

                if (cc->order > 0 && (!cc->wrapped ||
                                      zone->compact_cached_free_pfn >
                                      cc->start_free_pfn))
                        pfn = min(pfn, zone->compact_cached_free_pfn);

compact_cached_free_pfn is above where it started so the free scanner skips
almost the entire space it should have scanned. When there are multiple

Jim Schutt | 9 Aug 2012 16:36

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

Hi Mel,

On 08/09/2012 07:49 AM, Mel Gorman wrote:
> Changelog since V2
> o Capture !MIGRATE_MOVABLE pages where possible
> o Document the treatment of MIGRATE_MOVABLE pages while capturing
> o Expand changelogs
>
> Changelog since V1
> o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
> o Expanded changelogs a little
>
> Allocation success rates have been far lower since 3.4 due to commit
> [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
> commit was introduced for good reasons and it was known in advance that
> the success rates would suffer but it was justified on the grounds that
> the high allocation success rates were achieved by aggressive reclaim.
> Success rates are expected to suffer even more in 3.6 due to commit
> [7db8889a: mm: have order>  0 compaction start off where it left] which
> testing has shown to severely reduce allocation success rates under load -
> to 0% in one case.  There is a proposed change to that patch in this series
> and it would be ideal if Jim Schutt could retest the workload that led to
> commit [7db8889a: mm: have order>  0 compaction start off where it left].

I was successful at resolving my Ceph issue on 3.6-rc1, but ran
into some other issue that isn't immediately obvious, and prevents
me from testing your patch with 3.6-rc1.  Today I will apply your
patch series to 3.5 and test that way.

Sorry for the delay.

Mel Gorman | 9 Aug 2012 16:51

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On Thu, Aug 09, 2012 at 08:36:12AM -0600, Jim Schutt wrote:
> Hi Mel,
> 
> On 08/09/2012 07:49 AM, Mel Gorman wrote:
> >Changelog since V2
> >o Capture !MIGRATE_MOVABLE pages where possible
> >o Document the treatment of MIGRATE_MOVABLE pages while capturing
> >o Expand changelogs
> >
> >Changelog since V1
> >o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
> >o Expanded changelogs a little
> >
> >Allocation success rates have been far lower since 3.4 due to commit
> >[fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
> >commit was introduced for good reasons and it was known in advance that
> >the success rates would suffer but it was justified on the grounds that
> >the high allocation success rates were achieved by aggressive reclaim.
> >Success rates are expected to suffer even more in 3.6 due to commit
> >[7db8889a: mm: have order>  0 compaction start off where it left] which
> >testing has shown to severely reduce allocation success rates under load -
> >to 0% in one case.  There is a proposed change to that patch in this series
> >and it would be ideal if Jim Schutt could retest the workload that led to
> >commit [7db8889a: mm: have order>  0 compaction start off where it left].
> 
> I was successful at resolving my Ceph issue on 3.6-rc1, but ran
> into some other issue that isn't immediately obvious, and prevents
> me from testing your patch with 3.6-rc1.  Today I will apply your
> patch series to 3.5 and test that way.
> 

Jim Schutt | 9 Aug 2012 20:16

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On 08/09/2012 07:49 AM, Mel Gorman wrote:
> Changelog since V2
> o Capture !MIGRATE_MOVABLE pages where possible
> o Document the treatment of MIGRATE_MOVABLE pages while capturing
> o Expand changelogs
>
> Changelog since V1
> o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
> o Expanded changelogs a little
>
> Allocation success rates have been far lower since 3.4 due to commit
> [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
> commit was introduced for good reasons and it was known in advance that
> the success rates would suffer but it was justified on the grounds that
> the high allocation success rates were achieved by aggressive reclaim.
> Success rates are expected to suffer even more in 3.6 due to commit
> [7db8889a: mm: have order>  0 compaction start off where it left] which
> testing has shown to severely reduce allocation success rates under load -
> to 0% in one case.  There is a proposed change to that patch in this series
> and it would be ideal if Jim Schutt could retest the workload that led to
> commit [7db8889a: mm: have order>  0 compaction start off where it left].

On my first test of this patch series on top of 3.5, I ran into an
instance of what I think is the sort of thing that patch 4/5 was
fixing.  Here's what vmstat had to say during that period:

----------

2012-08-09 11:58:04.107-06:00
vmstat -w 4 16

Mel Gorman | 9 Aug 2012 22:46

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On Thu, Aug 09, 2012 at 12:16:35PM -0600, Jim Schutt wrote:
> On 08/09/2012 07:49 AM, Mel Gorman wrote:
> >Changelog since V2
> >o Capture !MIGRATE_MOVABLE pages where possible
> >o Document the treatment of MIGRATE_MOVABLE pages while capturing
> >o Expand changelogs
> >
> >Changelog since V1
> >o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
> >o Expanded changelogs a little
> >
> >Allocation success rates have been far lower since 3.4 due to commit
> >[fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
> >commit was introduced for good reasons and it was known in advance that
> >the success rates would suffer but it was justified on the grounds that
> >the high allocation success rates were achieved by aggressive reclaim.
> >Success rates are expected to suffer even more in 3.6 due to commit
> >[7db8889a: mm: have order>  0 compaction start off where it left] which
> >testing has shown to severely reduce allocation success rates under load -
> >to 0% in one case.  There is a proposed change to that patch in this series
> >and it would be ideal if Jim Schutt could retest the workload that led to
> >commit [7db8889a: mm: have order>  0 compaction start off where it left].
> 
> On my first test of this patch series on top of 3.5, I ran into an
> instance of what I think is the sort of thing that patch 4/5 was
> fixing.  Here's what vmstat had to say during that period:
> 
> <SNIP>

My conclusion looking at the vmstat data is that everything is looking ok

Jim Schutt | 10 Aug 2012 00:38

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On 08/09/2012 02:46 PM, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 12:16:35PM -0600, Jim Schutt wrote:
>> On 08/09/2012 07:49 AM, Mel Gorman wrote:
>>> Changelog since V2
>>> o Capture !MIGRATE_MOVABLE pages where possible
>>> o Document the treatment of MIGRATE_MOVABLE pages while capturing
>>> o Expand changelogs
>>>
>>> Changelog since V1
>>> o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
>>> o Expanded changelogs a little
>>>
>>> Allocation success rates have been far lower since 3.4 due to commit
>>> [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
>>> commit was introduced for good reasons and it was known in advance that
>>> the success rates would suffer but it was justified on the grounds that
>>> the high allocation success rates were achieved by aggressive reclaim.
>>> Success rates are expected to suffer even more in 3.6 due to commit
>>> [7db8889a: mm: have order>   0 compaction start off where it left] which
>>> testing has shown to severely reduce allocation success rates under load -
>>> to 0% in one case.  There is a proposed change to that patch in this series
>>> and it would be ideal if Jim Schutt could retest the workload that led to
>>> commit [7db8889a: mm: have order>   0 compaction start off where it left].
>>
>> On my first test of this patch series on top of 3.5, I ran into an
>> instance of what I think is the sort of thing that patch 4/5 was
>> fixing.  Here's what vmstat had to say during that period:
>>
>> <SNIP>
>

Mel Gorman | 10 Aug 2012 13:02

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On Thu, Aug 09, 2012 at 04:38:24PM -0600, Jim Schutt wrote:
> >><SNIP>
> >
> >My conclusion looking at the vmstat data is that everything is looking ok
> >until system CPU usage goes through the roof. I'm assuming that's what we
> >are all still looking at.
> 
> I'm concerned about both the high CPU usage as well as the
> reduction in write-out rate, but I've been assuming the latter
> is caused by the former.
> 

Almost certainly.

> <snip>
> 
> >
> >Ok, this is an untested hack and I expect it would drop allocation success
> >rates again under load (but not as much). Can you test again and see what
> >effect, if any, it has please?
> >
> >---8<---
> >mm: compaction: back out if contended
> >
> >---
> 
> <snip>
> 
> Initial testing with this patch looks very good from
> my perspective; CPU utilization stays reasonable,

Jim Schutt | 10 Aug 2012 19:20

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On 08/10/2012 05:02 AM, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 04:38:24PM -0600, Jim Schutt wrote:

>>>
>>> Ok, this is an untested hack and I expect it would drop allocation success
>>> rates again under load (but not as much). Can you test again and see what
>>> effect, if any, it has please?
>>>
>>> ---8<---
>>> mm: compaction: back out if contended
>>>
>>> ---
>>
>> <snip>
>>
>> Initial testing with this patch looks very good from
>> my perspective; CPU utilization stays reasonable,
>> write-out rate stays high, no signs of stress.
>> Here's an example after ~10 minutes under my test load:
>>

Hmmm, I wonder if I should have tested this patch longer,
in view of the trouble I ran into testing the new patch?
See below.

>
> Excellent, so it is contention that is the problem.
>
>> <SNIP>
>> I'll continue testing tomorrow to be sure nothing

Mel Gorman | 12 Aug 2012 22:22

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On Fri, Aug 10, 2012 at 11:20:07AM -0600, Jim Schutt wrote:
> On 08/10/2012 05:02 AM, Mel Gorman wrote:
> >On Thu, Aug 09, 2012 at 04:38:24PM -0600, Jim Schutt wrote:
> 
> >>>
> >>>Ok, this is an untested hack and I expect it would drop allocation success
> >>>rates again under load (but not as much). Can you test again and see what
> >>>effect, if any, it has please?
> >>>
> >>>---8<---
> >>>mm: compaction: back out if contended
> >>>
> >>>---
> >>
> >><snip>
> >>
> >>Initial testing with this patch looks very good from
> >>my perspective; CPU utilization stays reasonable,
> >>write-out rate stays high, no signs of stress.
> >>Here's an example after ~10 minutes under my test load:
> >>
> 
> Hmmm, I wonder if I should have tested this patch longer,
> in view of the trouble I ran into testing the new patch?
> See below.
> 

The two patches are quite different in what they do. I think it's
unlikely they would share a common bug.


Jim Schutt | 13 Aug 2012 22:35

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

Hi Mel,

On 08/12/2012 02:22 PM, Mel Gorman wrote:

>
> I went through the patch again but only found the following which is a
> weak candidate. Still, can you retest with the following patch on top and
> CONFIG_PROVE_LOCKING set please?
>

I've gotten in several hours of testing on this patch with
no issues at all, and no output from CONFIG_PROVE_LOCKING
(I'm assuming it would show up on a serial console).  So,
it seems to me this patch has done the trick.

CPU utilization is staying under control, and write-out rate
is good.

You can add my Tested-by: as you see fit.  If you work
up any refinements and would like me to test, please
let me know.

Thanks -- Jim


Mel Gorman | 14 Aug 2012 11:23

Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

On Mon, Aug 13, 2012 at 02:35:46PM -0600, Jim Schutt wrote:
> Hi Mel,
> 
> On 08/12/2012 02:22 PM, Mel Gorman wrote:
> 
> >
> >I went through the patch again but only found the following which is a
> >weak candidate. Still, can you retest with the following patch on top and
> >CONFIG_PROVE_LOCKING set please?
> >
> 
> I've gotten in several hours of testing on this patch with
> no issues at all, and no output from CONFIG_PROVE_LOCKING
> (I'm assuming it would show up on a serial console).  So,
> it seems to me this patch has done the trick.
> 

Super.

> CPU utilization is staying under control, and write-out rate
> is good.
> 

Even better.

> You can add my Tested-by: as you see fit.  If you work
> up any refinements and would like me to test, please
> let me know.
> 
