Mel Gorman | 8 Aug 2012 21:08

[RFC PATCH 0/5] Improve hugepage allocation success rates under load V2

Changelog since V1
o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
o Expanded changelogs a little

Allocation success rates have been far lower since 3.4 due to commit
[fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
commit was introduced for good reasons and it was known in advance that
the success rates would suffer, but it was justified on the grounds that
the high allocation success rates were achieved by aggressive reclaim.
Success rates are expected to suffer even more in 3.6 due to commit
[7db8889a: mm: have order > 0 compaction start off where it left], which
testing has shown severely reduces allocation success rates under load,
in one case to 0%.  There is a proposed change to that patch in this series
and it would be ideal if Jim Schutt could retest the workload that led to
commit [7db8889a: mm: have order > 0 compaction start off where it left].

This series aims to improve the allocation success rates without regressing
the benefits of commit fe2c2a10. The series is based on 3.5 and includes
commit 7db8889a to illustrate its impact on success rates.

Patch 1 updates a stale comment seeing as I was in the general area.

Patch 2 updates reclaim/compaction to reclaim pages scaled on the number
	of recent failures.

Patch 3 captures suitable high-order pages freed by compaction to reduce
	races with parallel allocation requests.

Patch 4 is an upstream commit that has compaction restart free page scanning
	from an old position instead of always starting from the end of the

Mel Gorman | 8 Aug 2012 21:08

[PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages

The comment about order applied when the check was
order > PAGE_ALLOC_COSTLY_ORDER which has not been the case since
[c5a73c3d: thp: use compaction for all allocation orders]. Fixing
the comment while I'm in the general area.
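
For illustration, a minimal user-space sketch of the check the new comment
describes. SKETCH_GFP_IO, SKETCH_GFP_FS and gfp_allows_compaction() are
hypothetical stand-ins for the kernel's __GFP_IO/__GFP_FS handling; only the
final condition is taken from the diff below. It shows that, since c5a73c3d,
any order > 0 qualifies, not just orders above PAGE_ALLOC_COSTLY_ORDER.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_IO and __GFP_FS; real values differ. */
#define SKETCH_GFP_IO (1u << 0)
#define SKETCH_GFP_FS (1u << 1)

/* Returns true if an allocation of @order with @gfp_mask may compact. */
static bool gfp_allows_compaction(int order, unsigned int gfp_mask)
{
	bool may_enter_fs = gfp_mask & SKETCH_GFP_FS;
	bool may_perform_io = gfp_mask & SKETCH_GFP_IO;

	/* Order-0 requests never compact; otherwise the GFP flags decide. */
	if (!order || !may_enter_fs || !may_perform_io)
		return false;
	return true;
}
```

An order-1 request with both flags set is allowed, while a GFP_NOFS-style
mask (IO but no FS) is refused regardless of order.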

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
 mm/compaction.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b39ede1..95ca967 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -759,11 +759,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;

-	/*
-	 * Check whether it is worth even starting compaction. The order check is
-	 * made because an assumption is made that the page allocator can satisfy
-	 * the "cheaper" orders without taking special steps
-	 */
+	/* Check if the GFP flags allow compaction */
 	if (!order || !may_enter_fs || !may_perform_io)
 		return rc;

--

-- 

Mel Gorman | 8 Aug 2012 21:08

[PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

If allocation fails after compaction then compaction may be deferred for
a number of allocation attempts. If there are subsequent failures,
compact_defer_shift is increased to defer for longer periods. This patch
uses that information to scale the number of pages reclaimed with
compact_defer_shift until allocations succeed again.
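
A minimal user-space model of the idea, for illustration only. The struct and
helper names are hypothetical and only loosely follow the kernel's deferral
logic; the cap of 6 on the shift is an assumption, and the exact condition the
patch applies before scaling is cut off in the truncated diff below. The base
target of 2UL << order is taken from should_continue_reclaim().

```c
#include <stdbool.h>

/* Assumed cap on the deferral shift so the back-off stays bounded. */
#define SKETCH_MAX_DEFER_SHIFT 6

/* Hypothetical stand-in for the per-zone compaction deferral state. */
struct zone_defer_model {
	unsigned int compact_considered;	/* attempts since last failure */
	unsigned int compact_defer_shift;	/* defer up to 1 << shift attempts */
};

/* Compaction failed: restart the count and defer for twice as long. */
static void defer_compaction_model(struct zone_defer_model *z)
{
	z->compact_considered = 0;
	if (z->compact_defer_shift < SKETCH_MAX_DEFER_SHIFT)
		z->compact_defer_shift++;
}

/* Returns true if this allocation attempt should skip compaction. */
static bool compaction_deferred_model(struct zone_defer_model *z)
{
	unsigned long defer_limit = 1UL << z->compact_defer_shift;

	if (++z->compact_considered > defer_limit)
		z->compact_considered = defer_limit;
	return z->compact_considered < defer_limit;
}

/* Pages to reclaim before compacting: 2 << order, scaled by failures. */
static unsigned long reclaim_target(const struct zone_defer_model *z, int order)
{
	return (2UL << order) << z->compact_defer_shift;
}

/* An allocation succeeded: drop back to the unscaled reclaim target. */
static void compaction_succeeded_model(struct zone_defer_model *z)
{
	z->compact_considered = 0;
	z->compact_defer_shift = 0;
}
```

For an order-9 request the target starts at 1024 pages, doubles with each
consecutive failure up to the cap, and falls back to 1024 once an
allocation succeeds.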

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 66e4310..0cb2593 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1708,6 +1708,7 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
 {
 	unsigned long pages_for_compaction;
 	unsigned long inactive_lru_pages;
+	struct zone *zone;

 	/* If not in reclaim/compaction mode, stop */
 	if (!in_reclaim_compaction(sc))
@@ -1741,6 +1742,15 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = (2UL << sc->order);
+
+	/*

Mel Gorman | 8 Aug 2012 21:08

[PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

While compaction is migrating pages to free up large contiguous blocks for
allocation it races with other allocation requests that may steal these
blocks or break them up. This patch alters direct compaction to capture a
suitable free page as soon as it becomes available to reduce this race. It
uses similar logic to split_free_page() to ensure that watermarks are
still obeyed.
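
A simplified sketch of the capture step, assuming a zone modeled as per-order
free-block counts with 11 orders. The struct and capture_free_block() are
hypothetical; the watermark test only approximates the split_free_page()-style
check by requiring the low watermark to still hold after the block is taken.
A larger block is split, with the unused halves returned to lower orders.

```c
#include <stdbool.h>

#define SKETCH_MAX_ORDER 11	/* assumed number of buddy orders */

/* Hypothetical model of a zone's buddy free lists: block counts per order. */
struct buddy_model {
	unsigned long nr_free[SKETCH_MAX_ORDER];
	unsigned long free_pages;	/* total free base pages in the zone */
	unsigned long low_wmark;	/* must still hold after a capture */
};

/*
 * Take a free block of at least @order as soon as one exists. Returns true
 * and updates the model on success.
 */
static bool capture_free_block(struct buddy_model *b, unsigned int order)
{
	unsigned int o;

	/* Approximate watermark check: refuse if the capture would breach it. */
	if (b->free_pages < b->low_wmark + (1UL << order))
		return false;

	for (o = order; o < SKETCH_MAX_ORDER; o++) {
		if (!b->nr_free[o])
			continue;
		b->nr_free[o]--;
		/* Split a larger block: each halving returns one buddy. */
		while (o > order) {
			o--;
			b->nr_free[o]++;
		}
		b->free_pages -= 1UL << order;
		return true;
	}
	return false;
}
```

With one free order-10 block and a low watermark of 128 pages, an order-9
capture succeeds and leaves one order-9 buddy free; a second order-9 capture
is then refused because it would take free memory below the watermark.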

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
 include/linux/compaction.h |    4 +--
 include/linux/mm.h         |    1 +
 mm/compaction.c            |   71 +++++++++++++++++++++++++++++++++++++-------
 mm/internal.h              |    1 +
 mm/page_alloc.c            |   63 +++++++++++++++++++++++++++++----------
 5 files changed, 111 insertions(+), 29 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 51a90b7..5673459 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,7 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask,
-			bool sync);
+			bool sync, struct page **page);
 extern int compact_pgdat(pg_data_t *pgdat, int order);
 extern unsigned long compaction_suitable(struct zone *zone, int order);


Minchan Kim | 9 Aug 2012 03:33

Re: [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

Hi Mel,

Just one question below.

On Wed, Aug 08, 2012 at 08:08:42PM +0100, Mel Gorman wrote:
> While compaction is migrating pages to free up large contiguous blocks for
> allocation it races with other allocation requests that may steal these
> blocks or break them up. This patch alters direct compaction to capture a
> suitable free page as soon as it becomes available to reduce this race. It
> uses similar logic to split_free_page() to ensure that watermarks are
> still obeyed.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> ---
>  include/linux/compaction.h |    4 +--
>  include/linux/mm.h         |    1 +
>  mm/compaction.c            |   71 +++++++++++++++++++++++++++++++++++++-------
>  mm/internal.h              |    1 +
>  mm/page_alloc.c            |   63 +++++++++++++++++++++++++++++----------
>  5 files changed, 111 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 51a90b7..5673459 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -22,7 +22,7 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
>  extern int fragmentation_index(struct zone *zone, unsigned int order);
>  extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  			int order, gfp_t gfp_mask, nodemask_t *mask,

Mel Gorman | 9 Aug 2012 10:11

Re: [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

On Thu, Aug 09, 2012 at 10:33:58AM +0900, Minchan Kim wrote:
> Hi Mel,
> 
> Just one question below.
> 

Sure! Your questions usually get me thinking about the right part of the
series, this series in particular :)

> > <SNIP>
> > @@ -708,6 +750,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> >  				goto out;
> >  			}
> >  		}
> > +
> > +		/* Capture a page now if it is a suitable size */
> 
> Why do we capture only when we migrate MIGRATE_MOVABLE type?
> If you have a reason, it should have been added as a comment.
> 

Good question and there is an answer. However, I also spotted a problem when
thinking about this more where !MIGRATE_MOVABLE allocations are forced to
do a full compaction. The simple solution would be to only set cc->page for
MIGRATE_MOVABLE but there is a better approach that I've implemented in the
patch below. It includes a comment that should answer your question. Does
this make sense to you?

diff --git a/mm/compaction.c b/mm/compaction.c
index 63af8d2..384164e 100644

Minchan Kim | 9 Aug 2012 10:41

Re: [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

On Thu, Aug 09, 2012 at 09:11:20AM +0100, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 10:33:58AM +0900, Minchan Kim wrote:
> > Hi Mel,
> > 
> > Just one question below.
> > 
> 
> Sure! Your questions usually get me thinking about the right part of the
> series, this series in particular :)
> 
> > > <SNIP>
> > > @@ -708,6 +750,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> > >  				goto out;
> > >  			}
> > >  		}
> > > +
> > > +		/* Capture a page now if it is a suitable size */
> > 
> > Why do we capture only when we migrate MIGRATE_MOVABLE type?
> > If you have a reason, it should have been added as a comment.
> > 
> 
> Good question and there is an answer. However, I also spotted a problem when
> thinking about this more where !MIGRATE_MOVABLE allocations are forced to
> do a full compaction. The simple solution would be to only set cc->page for
> MIGRATE_MOVABLE but there is a better approach that I've implemented in the
> patch below. It includes a comment that should answer your question. Does
> this make sense to you?

It does make sense.

Mel Gorman | 8 Aug 2012 21:08

[PATCH 4/5] mm: have order > 0 compaction start off where it left

From: Rik van Riel <riel@redhat.com>

This commit is already upstream as [7db8889a: mm: have order > 0 compaction
start off where it left]. It's included in this series to provide context
to the next patch as the series is based on 3.5.

Order > 0 compaction stops when enough free pages of the correct page
order have been coalesced.  When doing subsequent higher order
allocations, it is possible for compaction to be invoked many times.

However, the compaction code always starts out looking for things to
compact at the start of the zone, and for free pages to compact things to
at the end of the zone.

This can cause quadratic behaviour, with isolate_freepages starting at the
end of the zone each time, even though previous invocations of the
compaction code already filled up all free memory on that end of the zone.

This can cause isolate_freepages to take enormous amounts of CPU with
certain workloads on larger memory systems.

The obvious solution is to have isolate_freepages remember where it left
off last time, and continue at that point the next time it gets invoked
for an order > 0 compaction.  This could cause compaction to fail if
cc->free_pfn and cc->migrate_pfn are close together initially; in that
case we restart from the end of the zone and try once more.

Forced full (order == -1) compactions are left alone.
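
A minimal model of the restart logic described above, using hypothetical
simplified structs rather than the kernel's zone and compact_control. Runs
with order > 0 resume the free scanner from a cached PFN, restart once from
the end of the zone if the scanners meet, and stop when a wrapped scanner
returns to where the run began; order == -1 runs always scan the whole zone.
The restart point is simplified to end_pfn here.

```c
#include <stdbool.h>

enum compact_result_model { COMPACT_CONTINUE, COMPACT_COMPLETE };

/* Hypothetical, simplified zone: only the fields the sketch needs. */
struct zone_model {
	unsigned long start_pfn, end_pfn;
	unsigned long cached_free_pfn;	/* where the last free scan stopped */
};

/* Hypothetical, simplified compact_control. */
struct cc_model {
	int order;			/* -1 means forced full compaction */
	unsigned long migrate_pfn;	/* scans upward from the zone start */
	unsigned long free_pfn;		/* scans downward toward migrate_pfn */
	unsigned long start_free_pfn;	/* where this run's free scan began */
	bool wrapped;
};

static void compact_start(struct zone_model *z, struct cc_model *cc, int order)
{
	cc->order = order;
	cc->migrate_pfn = z->start_pfn;
	cc->wrapped = false;
	/* order > 0 resumes from the cached position; order == -1 scans all. */
	cc->free_pfn = order > 0 ? z->cached_free_pfn : z->end_pfn;
	cc->start_free_pfn = cc->free_pfn;
}

static enum compact_result_model compact_finished(struct zone_model *z,
						  struct cc_model *cc)
{
	if (cc->free_pfn <= cc->migrate_pfn) {
		/* Started partway through: restart once from the zone end. */
		if (cc->order > 0 && !cc->wrapped) {
			cc->free_pfn = z->end_pfn;
			z->cached_free_pfn = z->end_pfn;
			cc->wrapped = true;
			return COMPACT_CONTINUE;
		}
		return COMPACT_COMPLETE;
	}
	/* Wrapped around and came back to where this run began. */
	if (cc->wrapped && cc->free_pfn <= cc->start_free_pfn)
		return COMPACT_COMPLETE;
	return COMPACT_CONTINUE;
}
```

Each order > 0 run therefore scans at most the zone once, instead of
rescanning the already-full top of the zone on every invocation.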

[akpm@linux-foundation.org: checkpatch fixes]

Mel Gorman | 8 Aug 2012 21:08

[PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

commit [7db8889a: mm: have order > 0 compaction start off where it left]
introduced a caching mechanism to reduce the amount of work the free page
scanner does in compaction. However, it has a problem. Consider two processes
simultaneously scanning free pages:

				    			C
Process A		M     S     			F
		|---------------------------------------|
Process B		M 	FS

C is zone->compact_cached_free_pfn
S is cc->start_free_pfn
M is cc->migrate_pfn
F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.

Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.

Process A moves to "end_of_zone - one_pageblock" and runs this check

                if (cc->order > 0 && (!cc->wrapped ||
                                      zone->compact_cached_free_pfn >
                                      cc->start_free_pfn))
                        pfn = min(pfn, zone->compact_cached_free_pfn);
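
Plugging hypothetical numbers into the check quoted above shows the skip.
Assume a 4096-page zone with 512-page pageblocks: Process A started its free
scan at PFN 1536 (S) and has wrapped, so it now considers PFN 3584, while
Process B has just set compact_cached_free_pfn to 1600. free_scan_start() is a
made-up wrapper around the quoted condition, not a kernel function.

```c
#include <stdbool.h>

/*
 * Hypothetical wrapper around the quoted condition: returns the PFN the
 * free scanner actually starts from.
 */
static unsigned long free_scan_start(int order, bool wrapped,
				     unsigned long pfn,
				     unsigned long cached_free_pfn,
				     unsigned long start_free_pfn)
{
	if (order > 0 && (!wrapped || cached_free_pfn > start_free_pfn))
		pfn = pfn < cached_free_pfn ? pfn : cached_free_pfn;
	return pfn;
}
```

With these numbers, free_scan_start(9, true, 3584, 1600, 1536) returns 1600,
so Process A jumps from 3584 straight down to 1600 and never scans the
pageblocks in between. Had the cache been below 1536, Process A would have
kept scanning from 3584.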

compact_cached_free_pfn is above where it started so the free scanner skips
almost the entire space it should have scanned. When there are multiple

Minchan Kim | 9 Aug 2012 02:12

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

Hi Mel,

On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount of work the free page
> scanner does in compaction. However, it has a problem. Consider two processes
> simultaneously scanning free pages:
> 
> 				    			C
> Process A		M     S     			F
> 		|---------------------------------------|
> Process B		M 	FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
>                 if (cc->order > 0 && (!cc->wrapped ||
>                                       zone->compact_cached_free_pfn >
>                                       cc->start_free_pfn))
>                         pfn = min(pfn, zone->compact_cached_free_pfn);

Mel Gorman | 9 Aug 2012 10:23

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

On Thu, Aug 09, 2012 at 09:12:12AM +0900, Minchan Kim wrote:
> > <SNIP>
> > 
> > Second, it updates compact_cached_free_pfn in a more limited set of
> > circumstances.
> > 
> > If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> > 	of the zone. When a wrapped scanner isolates a page, it updates
> > 	compact_cached_free_pfn to point to the highest pageblock it
> > 	can isolate pages from.
> 
> Okay until here.
> 

Great.

> > 
> > If a scanner has not wrapped when it has finished isolating pages, it
> > 	checks if compact_cached_free_pfn is pointing to the end of the
> > 	zone. If so, the value is updated to point to the highest
> > 	pageblock that pages were isolated from. This value will not
> > 	be updated again until a free page scanner wraps and resets
> > 	compact_cached_free_pfn.
> 
> I tried to understand the intention behind this part but unfortunately failed.
> Could this part allow the problem you mentioned to happen again?
> 

Potentially yes, I did say it still races in the changelog.
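
The update rules for compact_cached_free_pfn quoted above can be modeled as
follows. This is a hypothetical sketch with made-up structs and helpers; it
reads "highest pageblock" as the highest block a scanner isolated from during
the current pass, and leaving the cache alone when a pass isolated nothing is
an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct zone_model {
	unsigned long restart_pfn;	/* where a wrapped free scan restarts */
	unsigned long cached_free_pfn;	/* shared by all compacting processes */
};

struct scan_model {
	bool wrapped;
	unsigned long high_pfn;		/* highest block isolated from, 0 if none */
};

/* A scanner that wraps resets the shared cache to the end of the zone. */
static void scan_wrap(struct zone_model *z, struct scan_model *s)
{
	s->wrapped = true;
	z->cached_free_pfn = z->restart_pfn;
}

/* Called each time free pages are isolated from the block at @block_pfn. */
static void scan_isolated(struct zone_model *z, struct scan_model *s,
			  unsigned long block_pfn)
{
	if (block_pfn > s->high_pfn)
		s->high_pfn = block_pfn;
	/* A wrapped scanner publishes its highest useful block straight away. */
	if (s->wrapped)
		z->cached_free_pfn = s->high_pfn;
}

/* Called when a scanner finishes isolating; only unwrapped ones act here. */
static void scan_done(struct zone_model *z, const struct scan_model *s)
{
	/* Only update a cache that still points at the end of the zone. */
	if (!s->wrapped && s->high_pfn &&
	    z->cached_free_pfn == z->restart_pfn)
		z->cached_free_pfn = s->high_pfn;
}
```

In this model, once one unwrapped scanner has moved the cache off the end of
the zone, other unwrapped scanners leave it alone until some scanner wraps.
That narrows the window for the race but, as noted above, does not close it.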


Minchan Kim | 9 Aug 2012 10:46

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

On Thu, Aug 09, 2012 at 09:23:28AM +0100, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 09:12:12AM +0900, Minchan Kim wrote:
> > > <SNIP>
> > > 
> > > Second, it updates compact_cached_free_pfn in a more limited set of
> > > circumstances.
> > > 
> > > If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> > > 	of the zone. When a wrapped scanner isolates a page, it updates
> > > 	compact_cached_free_pfn to point to the highest pageblock it
> > > 	can isolate pages from.
> > 
> > Okay until here.
> > 
> 
> Great.
> 
> > > 
> > > If a scanner has not wrapped when it has finished isolating pages, it
> > > 	checks if compact_cached_free_pfn is pointing to the end of the
> > > 	zone. If so, the value is updated to point to the highest
> > > 	pageblock that pages were isolated from. This value will not
> > > 	be updated again until a free page scanner wraps and resets
> > > 	compact_cached_free_pfn.
> > 
> > I tried to understand the intention behind this part but unfortunately failed.
> > Could this part allow the problem you mentioned to happen again?
> > 
> 
> Potentially yes, I did say it still races in the changelog.

Minchan Kim | 9 Aug 2012 10:47

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount of work the free page
> scanner does in compaction. However, it has a problem. Consider two processes
> simultaneously scanning free pages:
> 
> 				    			C
> Process A		M     S     			F
> 		|---------------------------------------|
> Process B		M 	FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_free_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
>                 if (cc->order > 0 && (!cc->wrapped ||
>                                       zone->compact_cached_free_pfn >
>                                       cc->start_free_pfn))
>                         pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips

