'Tejun Heo' | 8 Jul 2012 01:46

[PATCH 1/2] Revert "cgroup: superblock can't be released with active dentries"

From 7db5b3ca0ecdb2e8fad52a4770e4e320e61c77a6 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@...>
Date: Sat, 7 Jul 2012 15:55:47 -0700

This reverts commit fa980ca87d15bb8a1317853f257a505990f3ffde.  The
commit was an attempt to fix a race condition where a cgroup hierarchy
may be unmounted with positive dentry reference on root cgroup.  While
the commit made the race condition slightly more difficult to trigger,
the race was still there and could be reliably triggered using a
different test case.

Revert the incorrect fix.  The next commit will describe the race and
fix it correctly.

Signed-off-by: Tejun Heo <tj@...>
LKML-Reference: <4FEEA5CB.8070809@...>
Reported-by: shyju pv <shyju.pv@...>
Cc: Sasha Levin <levinsasha928@...>
Acked-by: Li Zefan <lizefan@...>
---
Sorry about the delay.  Was on vacation.  Will soon push these two
patches to Linus.

(Shyju, didn't realize I was replying to private reply, resending)

Thanks.

 kernel/cgroup.c |   17 +++--------------
 1 files changed, 3 insertions(+), 14 deletions(-)

(Continue reading)

'Tejun Heo' | 8 Jul 2012 01:46

[PATCH 2/2] cgroup: fix cgroup hierarchy umount race

From 5db9a4d99b0157a513944e9a44d29c9cec2e91dc Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@...>
Date: Sat, 7 Jul 2012 16:08:18 -0700

48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code.  As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned and thus preventing premature release of
superblock.

Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().

This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is.  If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().

Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
(Continue reading)

Al Viro | 14 Jul 2012 14:08
Picon

Re: [PATCH 2/2] cgroup: fix cgroup hierarchy umount race

On Sat, Jul 07, 2012 at 04:46:59PM -0700, 'Tejun Heo' wrote:
> Fix it by holding an extra superblock->s_active reference across
> dput() from css release, which is the dput() path added by 48ddbe1946
> and the only one which doesn't hold an extra s_active ref across the
> final cgroup dput().

>  <at>  <at>  -3883,8 +3883,12  <at>  <at>  static void css_dput_fn(struct work_struct *work)
>  {
>  	struct cgroup_subsys_state *css =
>  		container_of(work, struct cgroup_subsys_state, dput_work);
> +	struct dentry *dentry = css->cgroup->dentry;
> +	struct super_block *sb = dentry->d_sb;
>  
> -	dput(css->cgroup->dentry);
> +	atomic_inc(&sb->s_active);
> +	dput(dentry);
> +	deactivate_super(sb);
>  }

While we are at it, what guarantees that css->dput_work will complete before
css->cgroup or the object containing css get freed under us?
'Tejun Heo' | 16 Jul 2012 18:44

Re: [PATCH 2/2] cgroup: fix cgroup hierarchy umount race

Hello, Al.

On Sat, Jul 14, 2012 at 01:08:52PM +0100, Al Viro wrote:
> On Sat, Jul 07, 2012 at 04:46:59PM -0700, 'Tejun Heo' wrote:
> > Fix it by holding an extra superblock->s_active reference across
> > dput() from css release, which is the dput() path added by 48ddbe1946
> > and the only one which doesn't hold an extra s_active ref across the
> > final cgroup dput().
> 
> >  <at>  <at>  -3883,8 +3883,12  <at>  <at>  static void css_dput_fn(struct work_struct *work)
> >  {
> >  	struct cgroup_subsys_state *css =
> >  		container_of(work, struct cgroup_subsys_state, dput_work);
> > +	struct dentry *dentry = css->cgroup->dentry;
> > +	struct super_block *sb = dentry->d_sb;
> >  
> > -	dput(css->cgroup->dentry);
> > +	atomic_inc(&sb->s_active);
> > +	dput(dentry);
> > +	deactivate_super(sb);
> >  }
> 
> While we are at it, what guarantees that css->dput_work will complete before
> css->cgroup or the object containing css get freed under us?

css's are tied to the cgroup.  They are created on cgroup creation and
destroyed together with cgroup, which is controlled by the dentry
refcnt.  css refcnts are relaying reference to cgroup dentry refcnt.

The reason why css needs this finer grained refcnts instead of
(Continue reading)


Gmane