Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 0/9] Introduce huge zero page

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

During testing I noticed big (up to 2.5 times) memory consumption overhead
on some workloads (e.g. ft.A from NPB) if THP is enabled.

The main reason for that big difference is the lack of a zero page in the
THP case. We have to allocate a real page on a read page fault.

A program to demonstrate the issue:
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

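#define MB (1024*1024)

int main(int argc, char **argv)
{
        char *p;
        int i;

        /* 2M alignment lets THP back the whole region with huge pages */
        if (posix_memalign((void **)&p, 2 * MB, 200 * MB))
                return 1;
        /* read (never write) one byte in every 4k page */
        for (i = 0; i < 200 * MB; i+= 4096)
                assert(p[i] == 0);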
        pause();
        return 0;
}

With thp-never RSS is about 400k, but with thp-always it's 200M.
After the patchset thp-always RSS is 400k too.
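
The RSS figures can also be checked from inside the process rather than
with an external tool. A minimal sketch, assuming Linux with procfs
mounted; rss_kib() is a hypothetical helper for illustration and is not
part of the patchset. It reads the resident page count (second field) from
/proc/self/statm:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Return the resident set size of the calling process in KiB,
 * or -1 if /proc/self/statm cannot be read or parsed.
 */
long rss_kib(void)
{
        long total_pages, resident_pages;
        FILE *f = fopen("/proc/self/statm", "r");

        if (!f)
                return -1;
        /* first field: total program size, second field: resident pages */
        if (fscanf(f, "%ld %ld", &total_pages, &resident_pages) != 2) {
                fclose(f);
                return -1;
        }
        fclose(f);
        return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}
```

Calling rss_kib() right before pause() in the demo program would print a
number comparable to the values quoted above.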

Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle huge zero page

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

On write access to the huge zero page we allocate a new page and clear it.

In the fallback path we create a new table and set the pte around the
fault address to the newly allocated page. All other ptes are set to the
normal zero page.
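
The fallback layout described above can be modelled in userspace. This is
only a toy sketch, not the kernel code: struct toy_pte, toy_wp_fallback()
and the constants are made up for illustration, and 512 entries per table
assumes 2M huge pages over 4k base pages. Marking the new page writable is
also an assumption of the model.

```c
#include <stddef.h>

#define TOY_PTES_PER_PMD 512    /* assumed: 2M huge page / 4k base page */
#define TOY_PAGE_SHIFT   12     /* assumed: 4k base pages */

/* Toy page-table entry: target frame number and a writable flag. */
struct toy_pte {
        unsigned long pfn;
        int writable;
};

/*
 * Fill a table the way the fallback path is described: the entry that
 * covers fault_addr points to the newly allocated page, every other
 * entry points to the (read-only) normal zero page.
 */
void toy_wp_fallback(struct toy_pte table[TOY_PTES_PER_PMD],
                     unsigned long fault_addr,
                     unsigned long new_pfn, unsigned long zero_pfn)
{
        size_t fault_idx = (fault_addr >> TOY_PAGE_SHIFT) % TOY_PTES_PER_PMD;
        size_t i;

        for (i = 0; i < TOY_PTES_PER_PMD; i++) {
                if (i == fault_idx) {
                        table[i].pfn = new_pfn;
                        table[i].writable = 1;
                } else {
                        table[i].pfn = zero_pfn;
                        table[i].writable = 0;
                }
        }
}
```

Only one 4k page of real memory is consumed by the write; the other 511
entries keep sharing the zero page.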

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 include/linux/mm.h |    8 ++++
 mm/huge_memory.c   |  102 ++++++++++++++++++++++++++++++++++++++++++++--------
 mm/memory.c        |    7 ----
 3 files changed, 95 insertions(+), 22 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..c6eef63 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -514,6 +514,14 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 }
 #endif

+#ifndef my_zero_pfn
+static inline unsigned long my_zero_pfn(unsigned long addr)
+{
+	extern unsigned long zero_pfn;
+	return zero_pfn;
+}
+#endif
+
Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 1/9] thp: huge zero page: basic preparation

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

For now let's allocate the page on hugepage_init(). We'll switch to lazy
allocation later.

We are not going to map the huge zero page until we can handle it
properly on all code paths.

is_huge_zero_{pfn,pmd}() functions will be used by following patches to
check whether the pfn/pmd is huge zero page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..88e0a7a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -46,6 +46,7 @@ static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 60000;
 static struct task_struct *khugepaged_thread __read_mostly;
+static unsigned long huge_zero_pfn __read_mostly;
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
 static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
@@ -167,6 +168,28 @@ out:
 	return err;
Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

We don't have a real page to zap in huge zero page case. Let's just
clear pmd and remove it from tlb.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88e0a7a..9dcb9e6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1071,16 +1071,23 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		struct page *page;
 		pgtable_t pgtable;
 		pgtable = get_pmd_huge_pte(tlb->mm);
-		page = pmd_page(*pmd);
-		pmd_clear(pmd);
-		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-		page_remove_rmap(page);
-		VM_BUG_ON(page_mapcount(page) < 0);
-		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-		VM_BUG_ON(!PageHead(page));
-		tlb->mm->nr_ptes--;
-		spin_unlock(&tlb->mm->page_table_lock);
-		tlb_remove_page(tlb, page);
+		if (is_huge_zero_pmd(*pmd)) {
+			pmd_clear(pmd);
Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

We can't split huge zero page itself, but we can split the pmd which
points to it.

On splitting the pmd we create a table with all ptes set to normal zero
page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c8948d6..4df5841 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1599,6 +1599,7 @@ int split_huge_page(struct page *page)
 	struct anon_vma *anon_vma;
 	int ret = 1;

+	BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
 	BUG_ON(!PageAnon(page));
 	anon_vma = page_lock_anon_vma(page);
 	if (!anon_vma)
@@ -2503,6 +2504,36 @@ static int khugepaged(void *none)
 	return 0;
 }

+static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
Andrea Arcangeli | 16 Aug 2012 21:27

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
> +		unsigned long address)
> +{
> +	pgtable_t pgtable;
> +	pmd_t _pmd;
> +	unsigned long haddr = address & HPAGE_PMD_MASK;
> +	struct vm_area_struct *vma;
> +	int i;
> +
> +	vma = find_vma(mm, address);
> +	VM_BUG_ON(vma == NULL);

I think you can use BUG_ON here just in case but see below how I would
change it.

> +	pmdp_clear_flush_notify(vma, haddr, pmd);
> +	/* leave pmd empty until pte is filled */
> +
> +	pgtable = get_pmd_huge_pte(mm);
> +	pmd_populate(mm, &_pmd, pgtable);
> +
> +	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +		pte_t *pte, entry;
> +		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
> +		entry = pte_mkspecial(entry);
> +		pte = pte_offset_map(&_pmd, haddr);
> +		VM_BUG_ON(!pte_none(*pte));
> +		set_pte_at(mm, haddr, pte, entry);
> +		pte_unmap(pte);
Kirill A. Shutemov | 17 Aug 2012 10:12

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

On Thu, Aug 16, 2012 at 09:27:38PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> > +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
> > +		unsigned long address)
> > +{
> > +	pgtable_t pgtable;
> > +	pmd_t _pmd;
> > +	unsigned long haddr = address & HPAGE_PMD_MASK;
> > +	struct vm_area_struct *vma;
> > +	int i;
> > +
> > +	vma = find_vma(mm, address);
> > +	VM_BUG_ON(vma == NULL);
> 
> I think you can use BUG_ON here just in case but see below how I would
> change it.
> 
> > +	pmdp_clear_flush_notify(vma, haddr, pmd);
> > +	/* leave pmd empty until pte is filled */
> > +
> > +	pgtable = get_pmd_huge_pte(mm);
> > +	pmd_populate(mm, &_pmd, pgtable);
> > +
> > +	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> > +		pte_t *pte, entry;
> > +		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
> > +		entry = pte_mkspecial(entry);
> > +		pte = pte_offset_map(&_pmd, haddr);
> > +		VM_BUG_ON(!pte_none(*pte));
> > +		set_pte_at(mm, haddr, pte, entry);
Andrea Arcangeli | 17 Aug 2012 18:33

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

On Fri, Aug 17, 2012 at 11:12:33AM +0300, Kirill A. Shutemov wrote:
> I've used do_huge_pmd_wp_page_fallback() as template for my code.
> What's difference between these two code paths?
> Why is do_huge_pmd_wp_page_fallback() safe?

Good point. do_huge_pmd_wp_page_fallback works only on the current
"mm" so it doesn't need the splitting transition, but thinking twice
the split_huge_zero_page_pmd also works only on the local "mm" because
you're not really splitting the zero page there (you're not affecting
other mm). As long as you keep holding the page_table_lock of the "mm"
that you're altering, your current version is safe.

I was mistaken because I'm very used to thinking of split huge page as
something that cannot rely on the page_table_lock, but this is a
simpler case that isn't splitting the "page" but only the "pmd" of a
single "mm", so you can safely rely on the mm->page_table_lock :).

> Looks reasonable. I'll update it in next revision.

Thanks. Of course the function parameter comments, meant to avoid
unnecessary calls of find_vma, weren't related to the above locking issues.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont <at> kvack.org"> email <at> kvack.org </a>

Kirill A. Shutemov | 31 Aug 2012 16:06

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

On Thu, Aug 16, 2012 at 09:27:38PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote:
> > +	if (is_huge_zero_pmd(*pmd)) {
> > +		__split_huge_zero_page_pmd(mm, pmd, address);
> 
> This will work fine but it's a bit sad having to add "address" at
> every call, just to run a find_vma().

Hm. address is also used to calculate haddr.

It seems we need to pass address anyway. I mean vma + address.
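
For reference, the haddr calculation in the quoted patch is
"address & HPAGE_PMD_MASK", i.e. rounding the fault address down to the
start of its huge page. A userspace sketch of that arithmetic, assuming 2M
huge pages; the TOY_ names are illustrative stand-ins, not the kernel
macros:

```c
#define TOY_HPAGE_PMD_SIZE (2UL * 1024 * 1024)          /* assumed 2M */
#define TOY_HPAGE_PMD_MASK (~(TOY_HPAGE_PMD_SIZE - 1))

/* Round an address down to the start of the huge page that contains it. */
unsigned long toy_haddr(unsigned long address)
{
        return address & TOY_HPAGE_PMD_MASK;
}
```

Any address inside the same 2M region yields the same haddr, which is why
the split code cannot derive it from the pmd pointer alone.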

-- 
 Kirill A. Shutemov

Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

It's easy to copy huge zero page. Just set destination pmd to huge zero
page.

It's safe to copy huge zero page since we have none yet :-p

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9dcb9e6..a534f84 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,6 +725,18 @@ static inline struct page *alloc_hugepage(int defrag)
 }
 #endif

+static void set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
+		struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd)
+{
+	pmd_t entry;
+	entry = pfn_pmd(huge_zero_pfn, vma->vm_page_prot);
+	entry = pmd_wrprotect(entry);
+	entry = pmd_mkhuge(entry);
+	set_pmd_at(mm, haddr, pmd, entry);
+	prepare_pmd_huge_pte(pgtable, mm);
+	mm->nr_ptes++;
Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

We want to get page fault on write attempt to huge zero page, so let's
keep it write-protected.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f5029d4..4001f1a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1248,6 +1248,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		pmd_t entry;
 		entry = pmdp_get_and_clear(mm, addr, pmd);
 		entry = pmd_modify(entry, newprot);
+		if (is_huge_zero_pmd(entry))
+			entry = pmd_wrprotect(entry);
 		set_pmd_at(mm, addr, pmd, entry);
 		spin_unlock(&vma->vm_mm->page_table_lock);
 		ret = 1;
-- 
1.7.7.6

Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

All code paths seem covered. Now we can map the huge zero page on a read
page fault.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4df5841..3a78677 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -750,6 +750,16 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			return VM_FAULT_OOM;
 		if (unlikely(khugepaged_enter(vma)))
 			return VM_FAULT_OOM;
+		if (!(flags & FAULT_FLAG_WRITE)) {
+			pgtable_t pgtable;
+			pgtable = pte_alloc_one(mm, haddr);
+			if (unlikely(!pgtable))
+				goto out;
+			spin_lock(&mm->page_table_lock);
+			set_huge_zero_page(pgtable, mm, vma, haddr, pmd);
+			spin_unlock(&mm->page_table_lock);
+			return 0;
+		}
 		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
 					  vma, haddr, numa_node_id(), 0);
Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

It's required to implement huge zero pmd splitting.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 Documentation/vm/transhuge.txt |    4 ++--
 arch/x86/kernel/vm86_32.c      |    2 +-
 fs/proc/task_mmu.c             |    2 +-
 include/linux/huge_mm.h        |   10 ++++++----
 mm/huge_memory.c               |    5 +++--
 mm/memory.c                    |    4 ++--
 mm/mempolicy.c                 |    2 +-
 mm/mprotect.c                  |    2 +-
 mm/mremap.c                    |    3 ++-
 mm/pagewalk.c                  |    2 +-
 10 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index f734bb2..b1fe2ca 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -276,7 +276,7 @@ unaffected. libhugetlbfs will also work fine as usual.
 == Graceful fallback ==

 Code walking pagetables but unware about huge pmds can simply call
-split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
+split_huge_page_pmd(mm, pmd, addr) where the pmd is the one returned by
 pmd_offset. It's trivial to make the code transparent hugepage aware
 by just grepping for "pmd_offset" and adding split_huge_page_pmd where
Andrea Arcangeli | 16 Aug 2012 21:42

Re: [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()

On Thu, Aug 09, 2012 at 12:08:17PM +0300, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>
> 
> It's required to implement huge zero pmd splitting.
> 

This isn't bisectable with the next one, it'd fail on wfg 0-DAY kernel
build testing backend, however this was clearly done to separate this patch
from the next, to keep the size small, so I don't mind.

Kirill A. Shutemov | 17 Aug 2012 09:49

Re: [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()

On Thu, Aug 16, 2012 at 09:42:01PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 09, 2012 at 12:08:17PM +0300, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>
> > 
> > It's required to implement huge zero pmd splitting.
> > 
> 
> This isn't bisectable with the next one, it'd fail on wfg 0-DAY kernel
> build testing backend, however this was clearly done to separate this patch
> from the next, to keep the size small, so I don't mind.

Hm. I don't see why it's not bisectable. It only adds a new parameter to
the function. The parameter is unused until the next patch.

Actually, I've checked build bisectability with aiaiai[1].

[1] http://git.infradead.org/users/dedekind/aiaiai.git

-- 
 Kirill A. Shutemov

Kirill A. Shutemov | 9 Aug 2012 11:08

[PATCH, RFC 9/9] thp: lazy huge zero page allocation

From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>

Instead of allocating the huge zero page on hugepage_init() we can postpone
it until the first huge zero page mapping. It saves memory if THP is not in
use.

cmpxchg() is used to avoid race on huge_zero_pfn initialization.
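
The lazy-allocation pattern can be sketched in userspace with C11 atomics
in place of the kernel's cmpxchg(). This is an illustration only, not the
patch's code: shared_zero_buf and get_shared_zero_buf() are hypothetical
names, and calloc() stands in for the huge page allocation.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Shared zero-filled buffer, allocated on first use. */
static _Atomic(void *) shared_zero_buf = NULL;

/*
 * Return the shared buffer, allocating it if nobody has done so yet.
 * If two callers race, exactly one compare-exchange succeeds; the loser
 * frees its own allocation and uses the winner's buffer.
 */
void *get_shared_zero_buf(size_t size)
{
        void *cur = atomic_load(&shared_zero_buf);
        void *fresh, *expected = NULL;

        if (cur)
                return cur;

        fresh = calloc(1, size);
        if (!fresh)
                return NULL;

        if (atomic_compare_exchange_strong(&shared_zero_buf, &expected, fresh))
                return fresh;

        /* lost the race: 'expected' now holds the winner's pointer */
        free(fresh);
        return expected;
}
```

The cost of the race is one wasted allocation that is freed immediately;
no lock is needed on the fast path once the buffer exists.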

Signed-off-by: Kirill A. Shutemov <kirill.shutemov <at> linux.intel.com>
---
 mm/huge_memory.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a78677..6861230 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -168,21 +168,23 @@ out:
 	return err;
 }

-static int init_huge_zero_page(void)
+static int init_huge_zero_pfn(void)
 {
 	struct page *hpage;
+	unsigned long pfn;

 	hpage = alloc_pages(GFP_TRANSHUGE | __GFP_ZERO, HPAGE_PMD_ORDER);
 	if (!hpage)
 		return -ENOMEM;
-
Wanpeng Li | 10 Aug 2012 05:49

Re: [PATCH, RFC 0/9] Introduce huge zero page

On Thu, Aug 09, 2012 at 12:08:11PM +0300, Kirill A. Shutemov wrote:
>From: "Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com>
>
>During testing I noticed big (up to 2.5 times) memory consumption overhead
>on some workloads (e.g. ft.A from NPB) if THP is enabled.
>
>The main reason for that big difference is the lack of a zero page in the
>THP case. We have to allocate a real page on a read page fault.
>
>A program to demonstrate the issue:
>#include <assert.h>
>#include <stdlib.h>
>#include <unistd.h>
>
>#define MB 1024*1024
>
>int main(int argc, char **argv)
>{
>        char *p;
>        int i;
>
>        posix_memalign((void **)&p, 2 * MB, 200 * MB);
>        for (i = 0; i < 200 * MB; i+= 4096)
>                assert(p[i] == 0);
>        pause();
>        return 0;
>}
>
>With thp-never RSS is about 400k, but with thp-always it's 200M.
>After the patchset thp-always RSS is 400k too.

Andrew Morton | 16 Aug 2012 21:20

Re: [PATCH, RFC 0/9] Introduce huge zero page

On Thu,  9 Aug 2012 12:08:11 +0300
"Kirill A. Shutemov" <kirill.shutemov <at> linux.intel.com> wrote:

> During testing I noticed big (up to 2.5 times) memory consumption overhead
> on some workloads (e.g. ft.A from NPB) if THP is enabled.
> 
> The main reason for that big difference is the lack of a zero page in the
> THP case. We have to allocate a real page on a read page fault.
> 
> A program to demonstrate the issue:
> #include <assert.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> #define MB 1024*1024
> 
> int main(int argc, char **argv)
> {
>         char *p;
>         int i;
> 
>         posix_memalign((void **)&p, 2 * MB, 200 * MB);
>         for (i = 0; i < 200 * MB; i+= 4096)
>                 assert(p[i] == 0);
>         pause();
>         return 0;
> }
> 
> With thp-never RSS is about 400k, but with thp-always it's 200M.
> After the patchset thp-always RSS is 400k too.

Andrea Arcangeli | 16 Aug 2012 21:40

Re: [PATCH, RFC 0/9] Introduce huge zero page

Hi Andrew,

On Thu, Aug 16, 2012 at 12:20:23PM -0700, Andrew Morton wrote:
> That's a pretty big improvement for a rather fake test case.  I wonder
> how much benefit we'd see with real workloads?

The same discussion happened about the zero page in general and
there's no easy answer. I seem to recall that it was dropped at some
point and then we reintroduced the zero page later.

Most of the time it won't be worth it, it's just a few pathological
compute loads that benefit IIRC. So I'm overall positive about it
(after it's stable).

Because this is done the right way (i.e. to allocate a hugepage at
the first wp fault, and to fall back exclusively if compaction fails)
it will help much less than the 4k zero pages if the zero pages are
scattered over the address space and not contiguous (it only helps if
there are 512 of them in a row). OTOH if they're contiguous, the huge
zero pages will perform better than the 4k zero pages.
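
The "512 in a row" condition follows from 2M / 4k = 512. A small userspace
sketch of that reasoning, assuming 2M huge pages and 4k base pages;
count_huge_zero_candidates() and the is_zero[] map are hypothetical, made
up only to illustrate why scattered zero pages do not benefit:

```c
#include <stddef.h>

#define SMALL_PAGE      4096UL
#define HUGE_PAGE       (2UL * 1024 * 1024)
#define PAGES_PER_HUGE  (HUGE_PAGE / SMALL_PAGE)        /* 512 */

/*
 * is_zero[i] is nonzero if 4k page i has only been read (still zero).
 * Count the huge-page-aligned groups of 512 pages that are entirely
 * zero, i.e. the regions a single huge zero page mapping could cover.
 */
size_t count_huge_zero_candidates(const unsigned char *is_zero, size_t npages)
{
        size_t base, i, count = 0;

        for (base = 0; base + PAGES_PER_HUGE <= npages; base += PAGES_PER_HUGE) {
                for (i = 0; i < PAGES_PER_HUGE; i++)
                        if (!is_zero[base + i])
                                break;
                if (i == PAGES_PER_HUGE)
                        count++;
        }
        return count;
}
```

A single written 4k page inside a 2M region removes that whole region from
the count, while the 4k zero page would still cover the other 511 pages.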

H. Peter Anvin | 17 Aug 2012 01:08

Re: [PATCH, RFC 0/9] Introduce huge zero page

On 08/16/2012 12:40 PM, Andrea Arcangeli wrote:
> Hi Andrew,
> 
> On Thu, Aug 16, 2012 at 12:20:23PM -0700, Andrew Morton wrote:
>> That's a pretty big improvement for a rather fake test case.  I wonder
>> how much benefit we'd see with real workloads?
> 
> The same discussion happened about the zero page in general and
> there's no easy answer. I seem to recall that it was dropped at some
> point and then we reintroduced the zero page later.
> 
> Most of the time it won't be worth it, it's just a few pathological
> compute loads that benefit IIRC. So I'm overall positive about it
> (after it's stable).
> 
> Because this is done the right way (i.e. to allocate a hugepage at
> the first wp fault, and to fall back exclusively if compaction fails)
> it will help much less than the 4k zero pages if the zero pages are
> scattered over the address space and not contiguous (it only helps if
> there are 512 of them in a row). OTOH if they're contiguous, the huge
> zero pages will perform better than the 4k zero pages.
> 

One thing that I asked for is testing a "virtual zero page" where the same
page (or N pages for N-way page coloring) is reused across a page table.
It would have worse TLB performance but likely *much* better cache
behavior.

	-hpa

Andi Kleen | 17 Aug 2012 01:12

Re: [PATCH, RFC 0/9] Introduce huge zero page

> Because this is done the right way (i.e. to allocate a hugepage at
> the first wp fault, and to fall back exclusively if compaction fails)
> it will help much less than the 4k zero pages if the zero pages are

The main benefit is that you have a zero page with THP enabled.
So it lowers the cost of having THP on (for workloads that benefit
from a zero page)

-Andi
-- 
ak <at> linux.intel.com -- Speaking for myself only