Jim Schutt | 27 Jun 2012 23:59

excessive CPU utilization by isolate_freepages?

Hi,

I'm running into trouble with systems going unresponsive,
and perf suggests it's excessive CPU usage by isolate_freepages().
I'm currently testing 3.5-rc4, but I think this problem may have
first shown up in 3.4.  I'm only just learning how to use perf,
so I only currently have results to report for 3.5-rc4.

(FWIW I'm running my distro's version of perf; please let me know
if I need to compile the tools/perf version to match my kernel.)

The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive.  FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.

In the early phase of such a test, when things are running well,
here's what vmstat reports for the state of one of these servers:

2012-06-27 13:56:58.356-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
  6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
30 12          0     230844        576   38461648    0    0    14 2298537 216580 166661  12 45  23 20  0

Mark Nelson | 28 Jun 2012 00:35

Re: excessive CPU utilization by isolate_freepages?

On 06/27/2012 04:59 PM, Jim Schutt wrote:
> Hi,
>
> I'm running into trouble with systems going unresponsive,
> and perf suggests it's excessive CPU usage by isolate_freepages().
> I'm currently testing 3.5-rc4, but I think this problem may have
> first shown up in 3.4. I'm only just learning how to use perf,
> so I only currently have results to report for 3.5-rc4.
>
> (FWIW I'm running my distro's version of perf; please let me know
> if I need to compile the tools/perf version to match my kernel.)
>
> The systems in question have 24 SAS drives spread across 3 HBAs,
> running 24 Ceph OSD instances, one per drive. FWIW these servers
> are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160
> Ceph Linux clients doing dd simultaneously to a Ceph file system
> backed by 12 of these servers.
>
> In the early phase of such a test, when things are running well,
> here's what vmstat reports for the state of one of these servers:
>
> 2012-06-27 13:56:58.356-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>   r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
> 27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
> 28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
>   6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0

Minchan Kim | 28 Jun 2012 01:59

Re: excessive CPU utilization by isolate_freepages?

On 06/28/2012 06:59 AM, Jim Schutt wrote:

> Hi,
> 
> I'm running into trouble with systems going unresponsive,
> and perf suggests it's excessive CPU usage by isolate_freepages().
> I'm currently testing 3.5-rc4, but I think this problem may have
> first shown up in 3.4.  I'm only just learning how to use perf,
> so I only currently have results to report for 3.5-rc4.
> 
> (FWIW I'm running my distro's version of perf; please let me know
> if I need to compile the tools/perf version to match my kernel.)
> 
> The systems in question have 24 SAS drives spread across 3 HBAs,
> running 24 Ceph OSD instances, one per drive.  FWIW these servers
> are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
> Ceph Linux clients doing dd simultaneously to a Ceph file system
> backed by 12 of these servers.
> 
> In the early phase of such a test, when things are running well,
> here's what vmstat reports for the state of one of these servers:
> 
> 2012-06-27 13:56:58.356-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>   r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0

Rik van Riel | 28 Jun 2012 02:28

Re: excessive CPU utilization by isolate_freepages?

On 06/27/2012 07:59 PM, Minchan Kim wrote:

> I suspect compaction keeps trying to migrate pages even though we have no free memory.
> Could you apply this patch and retest?
>
> https://lkml.org/lkml/2012/6/21/30

Another possibility is that compaction is succeeding every time,
but since we always start scanning all the way at the beginning
and end of each zone, we waste a lot of CPU time rescanning the
same pages (that we just filled up with moved pages) to see if
any are free.

In short, due to the way compaction behaves right now,
compaction + isolate_freepages are essentially quadratic.

What we need to do is remember where we left off after a
successful compaction, so we can continue the search there
at the next invocation.
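
The cost argument above can be illustrated with a toy user-space model. This is only a sketch, not kernel code: it models just the free scanner walking down from the top of a zone, and the function name, the page counts and the `resume` flag are all assumptions made for illustration.

```python
def scan_cost(zone_pages, need, invocations, resume):
    """Count pages examined by a toy free scanner.

    Each invocation walks downward from its start point until it has
    taken `need` free pages, which are then treated as filled with
    migrated pages. With resume=False every invocation restarts at the
    top of the zone; with resume=True it continues where the previous
    invocation stopped.
    """
    free = [True] * zone_pages      # toy zone: every page starts free
    examined = 0
    cached = zone_pages - 1         # hypothetical remembered position
    for _ in range(invocations):
        pfn = cached if resume else zone_pages - 1
        taken = 0
        while pfn >= 0 and taken < need:
            examined += 1
            if free[pfn]:
                free[pfn] = False   # page becomes a migration target
                taken += 1
            pfn -= 1
        cached = pfn
    return examined


restart_cost = scan_cost(1024, 32, 32, resume=False)
resume_cost = scan_cost(1024, 32, 32, resume=True)
print(restart_cost, resume_cost)
```

In this model, restarting from the top makes invocation i rescan all the pages filled by the previous i-1 invocations, so the total grows with the square of the number of invocations, while resuming touches each page once.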

-- 
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Rientjes | 28 Jun 2012 02:52

Re: excessive CPU utilization by isolate_freepages?

On Wed, 27 Jun 2012, Rik van Riel wrote:

> > I suspect compaction keeps trying to migrate pages even though we have
> > no free memory.
> > Could you apply this patch and retest?
> > 
> > https://lkml.org/lkml/2012/6/21/30
> 

Not sure if Jim is using memcg; if not, then this won't be helpful.

> Another possibility is that compaction is succeeding every time,
> but since we always start scanning all the way at the beginning
> and end of each zone, we waste a lot of CPU time rescanning the
> same pages (that we just filled up with moved pages) to see if
> any are free.
> 
> In short, due to the way compaction behaves right now,
> compaction + isolate_freepages are essentially quadratic.
> 
> What we need to do is remember where we left off after a
> successful compaction, so we can continue the search there
> at the next invocation.
> 

So when the free and migration scanners meet and compact_finished() == 
COMPACT_CONTINUE, loop around to the start of the zone and continue until 
you reach the pfn that it was started at?  Seems appropriate.


Minchan Kim | 28 Jun 2012 02:58

Re: excessive CPU utilization by isolate_freepages?

On 06/28/2012 09:52 AM, David Rientjes wrote:

> On Wed, 27 Jun 2012, Rik van Riel wrote:
> 
>>> I suspect compaction keeps trying to migrate pages even though we have
>>> no free memory.
>>> Could you apply this patch and retest?
>>>
>>> https://lkml.org/lkml/2012/6/21/30
>>
> Not sure if Jim is using memcg; if not, then this won't be helpful.
> 

It isn't related to memcg.
If compaction_alloc() can't find a suitable migration target, it returns NULL.
Then migrate_pages() should exit.

-- 
Kind regards,
Minchan Kim

David Rientjes | 28 Jun 2012 03:06

Re: excessive CPU utilization by isolate_freepages?

On Thu, 28 Jun 2012, Minchan Kim wrote:

> >>> > > https://lkml.org/lkml/2012/6/21/30
> >> > 
> > Not sure if Jim is using memcg; if not, then this won't be helpful.
> > 
> 
> 
> It isn't related to memcg.
> If compaction_alloc() can't find a suitable migration target, it returns NULL.
> Then migrate_pages() should exit.
> 

If isolate_freepages() is going to fail, then this zone should have been 
skipped when checking for compaction_suitable().  In Jim's perf output, 
compaction_suitable() returns COMPACT_CONTINUE for a transparent hugepage.  
Why is zone_watermark_ok(zone, 0, low_wmark + 1024, 0, 0) succeeding if 
isolate_freepages() is going to fail?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo <at> kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont <at> kvack.org"> email <at> kvack.org </a>

Minchan Kim | 28 Jun 2012 03:18

Re: excessive CPU utilization by isolate_freepages?

On 06/28/2012 10:06 AM, David Rientjes wrote:

> On Thu, 28 Jun 2012, Minchan Kim wrote:
> 
>>>>>>> https://lkml.org/lkml/2012/6/21/30
>>>>>
>>> Not sure if Jim is using memcg; if not, then this won't be helpful.
>>>
>>
>>
>> It isn't related to memcg.
>> If compaction_alloc() can't find a suitable migration target, it returns NULL.
>> Then migrate_pages() should exit.
>>
> 
> If isolate_freepages() is going to fail, then this zone should have been 
> skipped when checking for compaction_suitable().  In Jim's perf output, 
> compaction_suitable() returns COMPACT_CONTINUE for a transparent hugepage.  
> Why is zone_watermark_ok(zone, 0, low_wmark + 1024, 0, 0) succeeding if 
> isolate_freepages() is going to fail?
> 

zone_watermark_ok() doesn't consider migratetype, but
suitable_migrate_target() does.
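
A small model may make the mismatch concrete. It is a deliberately simplified sketch: the real zone_watermark_ok() also checks per-order free counts and reserves, and the real suitable_migrate_target() has more cases than the single "movable only" rule assumed here. The class, the function names, the block sizes and the low watermark of 100 pages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageBlock:
    migratetype: str    # "movable" or "unmovable" in this toy model
    free_pages: int


def watermark_ok(blocks, watermark):
    """Toy watermark check: counts every free page, whatever its migratetype."""
    return sum(b.free_pages for b in blocks) > watermark


def isolatable_free_pages(blocks):
    """Toy free scanner: only takes free pages from movable pageblocks."""
    return sum(b.free_pages for b in blocks if b.migratetype == "movable")


# All free memory sits in unmovable pageblocks.
blocks = [
    PageBlock("unmovable", 600),
    PageBlock("unmovable", 600),
    PageBlock("movable", 0),
]
low_wmark = 100                     # assumed value, for illustration only
watermark = low_wmark + 1024

print(watermark_ok(blocks, watermark))      # zone looks suitable
print(isolatable_free_pages(blocks))        # yet nothing can be isolated
```

In this toy zone the watermark check passes because 1200 free pages exceed the threshold, yet the free scanner finds no page it is willing to use, which is the situation the two checks can disagree on.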


Rik van Riel | 28 Jun 2012 03:13

Re: excessive CPU utilization by isolate_freepages?

On 06/27/2012 08:52 PM, David Rientjes wrote:
> On Wed, 27 Jun 2012, Rik van Riel wrote:

>> Another possibility is that compaction is succeeding every time,
>> but since we always start scanning all the way at the beginning
>> and end of each zone, we waste a lot of CPU time rescanning the
>> same pages (that we just filled up with moved pages) to see if
>> any are free.
>>
>> In short, due to the way compaction behaves right now,
>> compaction + isolate_freepages are essentially quadratic.
>>
>> What we need to do is remember where we left off after a
>> successful compaction, so we can continue the search there
>> at the next invocation.
>>
>
> So when the free and migration scanners meet and compact_finished() ==
> COMPACT_CONTINUE, loop around to the start of the zone and continue until
> you reach the pfn that it was started at?  Seems appropriate.

Exactly.

It would entail changes to struct compact_control, where
we have to remember whether we started at the top of the
zone or not (for a full compaction, ie order==-1 we might).

For a compaction of order >0, we would remember the last
pfn where isolate_freepages isolated a page, and start
isolating below that.
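
One possible reading of this proposal, sketched in user space. Nothing here is the kernel's actual struct compact_control: the class, the `cached_free_pfn` field and the helper are hypothetical, and only the free scanner's visiting order is modelled (the migration scanner is ignored).

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class CompactControl:
    """Hypothetical stand-in for the state kept across compaction runs."""
    zone_start_pfn: int
    zone_end_pfn: int                       # exclusive
    order: int                              # -1 means full compaction
    cached_free_pfn: Optional[int] = None   # last pfn a free page was isolated at


def free_scan_order(cc: CompactControl) -> Iterator[int]:
    """Yield pfns in the order a wrapping free scanner would visit them.

    Full compaction (order == -1), or a run with nothing remembered,
    starts at the top of the zone. Otherwise the scan starts just below
    the remembered pfn, wraps around to the top when it reaches the
    start of the zone, and stops once every pfn has been visited once.
    """
    top = cc.zone_end_pfn - 1
    if cc.order == -1 or cc.cached_free_pfn is None:
        start = top
    else:
        start = cc.cached_free_pfn - 1
        if start < cc.zone_start_pfn:
            start = top
    pfn = start
    for _ in range(cc.zone_end_pfn - cc.zone_start_pfn):
        yield pfn
        pfn = top if pfn == cc.zone_start_pfn else pfn - 1


resumed = list(free_scan_order(CompactControl(0, 8, order=9, cached_free_pfn=5)))
full = list(free_scan_order(CompactControl(0, 8, order=-1, cached_free_pfn=5)))
print(resumed)
print(full)
```

With an 8-page toy zone and a remembered pfn of 5, the resumed scan visits 4 down to 0, then wraps to 7, 6 and finally 5, while the full compaction ignores the remembered position and walks 7 down to 0.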

Minchan Kim | 28 Jun 2012 02:55

Re: excessive CPU utilization by isolate_freepages?

On 06/28/2012 09:28 AM, Rik van Riel wrote:

> On 06/27/2012 07:59 PM, Minchan Kim wrote:
> 
>> I suspect compaction keeps trying to migrate pages even though we have
>> no free memory.
>> Could you apply this patch and retest?
>>
>> https://lkml.org/lkml/2012/6/21/30
> 
> Another possibility is that compaction is succeeding every time,
> but since we always start scanning all the way at the beginning
> and end of each zone, we waste a lot of CPU time rescanning the
> same pages (that we just filled up with moved pages) to see if
> any are free.

It does make sense.

> 
> In short, due to the way compaction behaves right now,
> compaction + isolate_freepages are essentially quadratic.
> 
> What we need to do is remember where we left off after a
> successful compaction, so we can continue the search there
> at the next invocation.
> 

Good idea.
It could enhance parallel compaction, too.
Of course, if we can't meet the goal, we need to loop around from the start/end of the zone.

Mel Gorman | 28 Jun 2012 13:36

Re: excessive CPU utilization by isolate_freepages?

On Wed, Jun 27, 2012 at 03:59:19PM -0600, Jim Schutt wrote:
> Hi,
> 
> I'm running into trouble with systems going unresponsive,
> and perf suggests it's excessive CPU usage by isolate_freepages().
> I'm currently testing 3.5-rc4, but I think this problem may have
> first shown up in 3.4.  I'm only just learning how to use perf,
> so I only currently have results to report for 3.5-rc4.
> 

Out of curiosity, why do you think it showed up in 3.4? It's not
surprising as such if it did show up there but I'm wondering what you
are basing it on.

It's not a surprise because it's also where reclaim/compaction stopped
depending on lumpy reclaim. In the past we would have reclaimed more
pages but now rely on compaction more. It's plausible that with many
parallel compactions there would be higher CPU usage now.

> <SNIP>
> 2012-06-27 14:00:03.219-06:00
> vmstat -w 4 16
> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
> 75  1          0     566988        576   35664800    0    0     2  1355   21    3   1  4  95  0  0
> 433  1          0     964052        576   35069112    0    0     7 456359 102256 20901   2 98   0  0  0
> 547  3          0     820116        576   34893932    0    0    57 560507 114878 28115   3 96   0  0  0
> 806  2          0     606992        576   34848180    0    0   339 309668 101230 21056   2 98   0  0  0
> 708  1          0     529624        576   34708000    0    0   248 370886 101327 20062   2 97   0  0  0
> 231  5          0     504772        576   34663880    0    0   305 334824 95045 20407   2 97   1  1  0
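
For anyone wanting to pick such samples out of a longer log, here is a minimal sketch that parses the rows quoted above and flags the system-bound ones. The helper names and the 90% threshold are arbitrary assumptions; note that the first row vmstat prints is an average since boot, not a live sample.

```python
FIELDS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# Rows copied from the vmstat output quoted above.
VMSTAT_ROWS = """\
75  1          0     566988        576   35664800    0    0     2  1355   21    3   1  4  95  0  0
433  1          0     964052        576   35069112    0    0     7 456359 102256 20901   2 98   0  0  0
547  3          0     820116        576   34893932    0    0    57 560507 114878 28115   3 96   0  0  0
806  2          0     606992        576   34848180    0    0   339 309668 101230 21056   2 98   0  0  0
708  1          0     529624        576   34708000    0    0   248 370886 101327 20062   2 97   0  0  0
231  5          0     504772        576   34663880    0    0   305 334824 95045 20407   2 97   1  1  0
"""


def parse_vmstat(text):
    """Turn vmstat data rows into dicts keyed by column name; skip header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(FIELDS) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(FIELDS, map(int, parts))))
    return rows


def sys_bound(rows, sy_threshold=90):
    """Return samples whose system CPU share is at or above the threshold."""
    return [row for row in rows if row["sy"] >= sy_threshold]


rows = parse_vmstat(VMSTAT_ROWS)
busy = sys_bound(rows)
print(len(busy), max(row["r"] for row in busy))
```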

Jim Schutt | 28 Jun 2012 17:30

Re: excessive CPU utilization by isolate_freepages?

On 06/28/2012 05:36 AM, Mel Gorman wrote:
> On Wed, Jun 27, 2012 at 03:59:19PM -0600, Jim Schutt wrote:
>> Hi,
>>
>> I'm running into trouble with systems going unresponsive,
>> and perf suggests it's excessive CPU usage by isolate_freepages().
>> I'm currently testing 3.5-rc4, but I think this problem may have
>> first shown up in 3.4.  I'm only just learning how to use perf,
>> so I only currently have results to report for 3.5-rc4.
>>
>
> Out of curiosity, why do you think it showed up in 3.4? It's not
> surprising as such if it did show up there but I'm wondering what you
> are basing it on.

If I remember correctly, testing this workload on 3.4 is when I
first saw hundreds of runnable threads being reported by vmstat.
At that time I couldn't reproduce quite as reliably, and I didn't
know how to get perf to give me call chains, so I didn't follow up :(

>
> It's not a surprise because it's also where reclaim/compaction stopped
> depending on lumpy reclaim. In the past we would have reclaimed more
> pages but now rely on compaction more. It's plausible that with many
> parallel compactions there would be higher CPU usage now.
>
>> <SNIP>
>> 2012-06-27 14:00:03.219-06:00
>> vmstat -w 4 16
>> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------

