Christian Gudrian | 17 Nov 11:28 2010

WOW64 bug: GetThreadContext() may return stale contents

Hello!

Just came along this bug report in Microsoft Connect:

http://tinyurl.com/2d9cl4x

It references this blog article:

http://tinyurl.com/2awzk65

Does this affect the Windows version of the collector as well?

Christian
Ivan Maidanski | 19 Nov 21:38 2010
Picon

Re: WOW64 bug: GetThreadContext() may return stale contents

Hi, Christian!

Probably yes (as ESP of the suspended threads is used to get the stack bounds).
Is any workaround suggested for this?

Regards.

Wed, 17 Nov 2010 11:28:29 +0100 Christian Gudrian <christian@...>:

> Hello!
> 
> Just came along this bug report in Microsoft Connect:
> 
> http://tinyurl.com/2d9cl4x
> 
> It references this blog article:
> 
> http://tinyurl.com/2awzk65
> 
> Does this affect the Windows version of the collector as well?
> 
> Christian
> _______________________________________________
> Gc mailing list
> Gc@...
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
Henning Makholm | 21 Nov 01:04 2010

Re: WOW64 bug: GetThreadContext() may return stale contents

> > Just came along this bug report in Microsoft Connect:
> > http://tinyurl.com/2d9cl4x
> > Does this affect the Windows version of the collector as well?

> Probably yes (as ESP of the suspended threads is used to get the stack
> bounds).
> Is any workaround suggested for this?

Apparently not, except for instrumenting all mutator threads with code to
save their ESP in a place where the GC can find it, whenever they call an
OS service that may be implemented using a WoW64 thunk. Which it is clearly
not even within the GC's power to do.

Alternatively, scan the entire mapped stack (which can probably be gotten
hold of somehow) rather than just the part above ESP. That would surely
cause really horrible amounts of excessive retention.

If only it were possible to *detect* that the bug had happened, it would
be much more palatable to fall back to whole-stack scanning in that case.
(It appears to be a fairly short window during which the bug can hit, so
it wouldn't be often that it did). But the linked threads do not suggest
any workable way to do that.

--

-- 
Henning Makholm
Octoshape ApS
Johannes Totz | 31 Jan 15:35 2011
Picon

Re: WOW64 bug: GetThreadContext() may return stale contents

On 21/11/2010 00:04, Henning Makholm wrote:
>>> Just came along this bug report in Microsoft Connect:
>>> http://tinyurl.com/2d9cl4x
>>> Does this affect the Windows version of the collector as well?
> 
>> Probably yes (as ESP of the suspended threads is used to get the stack
>> bounds).
>> Is any workaround suggested for this?
> 
> Apparently not, except for instrumenting all mutator threads with code to
> save their ESP in a place where the GC can find it

Just thinking out loud...

1) GC calls SuspendThread()
2) GC calls GetThreadContext() (or whatever it was called)
3) Get IP, mark page as writable
4) Write jump-instruction to GC-code at IP
5) Resume thread
6) Thread executes part of GC-code that stores ESP to known location
7) Thread restores original instruction, write-protect page
8) Thread suspends itself, and signals GC that ESP is now valid

When GC is done with its work threads resume and jump back to original
location, as if nothing ever happened.

--

-- 
https://bitbucket.org/jtotz/bdwgc
Henning Makholm | 31 Jan 20:04 2011

Re: Re: WOW64 bug: GetThreadContext() may return stale contents

> > Apparently not, except for instrumenting all mutator threads with
> > code to save their ESP in a place where the GC can find it

> Just thinking out loud...
> 1) GC calls SuspendThread()
> 2) GC calls GetThreadContext() (or whatever it was called)
> 3) Get IP, mark page as writable
> 4) Write jump-instruction to GC-code at IP
> 5) Resume thread

Um, no. If the result from GetThreadContext is stale, then *neither*
ESP *nor* EIP can be trusted.

And even if we could get a good EIP, the thread might be blocked in
the OS and not about to resume executing from that EIP anytime
soon.

(Not to speak of the fact that EIP might point to kernel code or
DLL code segments that other processes share).

--

-- 
Henning Makholm
Octoshape ApS
Zach Saw | 10 Feb 07:48 2011
Picon

Re: WOW64 bug: GetThreadContext() may return stale contents

Henning Makholm <makholm <at> ...> writes:

> If only it were possible to *detect* that the bug had happened, it would
> be much more palatable to fall back to whole-stack scanning in that case.
> (It appears to be a fairly short window during which the bug can hit, so
> it wouldn't be often that it did). But the linked threads do not suggest
> any workable way to do that.
> 

That is because there isn't one.

Microsoft would need to come up with a mechanism for us to detect that 
(which Alex from MS suggested and said would be the simplest fix of all 
other methods proposed).

Even then, he's not sure if MS would implement that workaround. From what I
understand in my lengthy discussion with him, it's likely that this won't 
even get fixed for the life time of Windows 7.
Zach Saw | 21 Nov 08:24 2010
Picon

Re: WOW64 bug: GetThreadContext() may return stale contents

> Probably yes (as ESP of the suspended threads is used to get the stack bounds).
> Is any workaround suggested for this?

No - there won't be a workaround. Its a design oversight by MSFT in their WOW64
which dates back to XP-64. The design (having translation in user mode)
conflicts with kernel APC which SuspendThread uses to halt the execution of a
thread. To truly fix it, WOW64 would need to be moved to kernel mode. Now that
it's been in the wild for about 10 years, they can't do that in fear that apps
that rely on undocumented features would break (although they have no idea if
such apps even exist).

MSFT may not fix the bug as they feel the use of Boehm GC (and other apps that
rely on this method) is limited.
Johannes Totz | 28 Dec 20:19 2010
Picon

Re: WOW64 bug: GetThreadContext() may return stale contents

Maybe a warning for the developer is appropriate?
# HG changeset patch
# User jtotz <at> ic.ac.uk
# Date 1292903539 0
# Node ID 51f8c8179ecfb377f9f4612523964e075fecf88b
# Parent  eeae26f5449fed2805489da4c08e26ff412121cc
warning message when running under wow64
wow64 has a design flaw that make thread contexts invalid thus breaking the gc

diff -r eeae26f5449f -r 51f8c8179ecf win32_threads.c
--- a/win32_threads.c	Fri Oct 22 14:58:44 2010 +0100
+++ b/win32_threads.c	Tue Dec 21 03:52:19 2010 +0000
 <at>  <at>  -2571,6 +2571,33  <at>  <at> 
 #     endif
       static int entry_count = 0;

+#ifdef GC_DEBUG
+#ifndef _WIN64
+	  if (reason == DLL_PROCESS_ATTACH)
+	  {
+			// wow64: running 32-bit-gc on 64-bit-sys is broken!
+			// code copied from msdn docs example
+			typedef BOOL (WINAPI *LPFN_ISWOW64PROCESS)(HANDLE, PBOOL);
+			LPFN_ISWOW64PROCESS fnIsWow64Process = 0;
+			fnIsWow64Process = (LPFN_ISWOW64PROCESS) GetProcAddress(GetModuleHandleA("kernel32"), "IsWow64Process");
+			if (NULL != fnIsWow64Process)
+			{
+				BOOL bIsWow64 = FALSE;
(Continue reading)

Johannes Totz | 28 Dec 20:22 2010
Picon

Re: WOW64 bug: GetThreadContext() may return stale contents

Maybe a warning for the developer is appropriate?

# HG changeset patch
# User jtotz <at> ic.ac.uk
# Date 1292903539 0
# Node ID 51f8c8179ecfb377f9f4612523964e075fecf88b
# Parent  eeae26f5449fed2805489da4c08e26ff412121cc
warning message when running under wow64
wow64 has a design flaw that make thread contexts invalid thus breaking the gc

diff -r eeae26f5449f -r 51f8c8179ecf win32_threads.c
--- a/win32_threads.c	Fri Oct 22 14:58:44 2010 +0100
+++ b/win32_threads.c	Tue Dec 21 03:52:19 2010 +0000
 <at>  <at>  -2571,6 +2571,33  <at>  <at> 
 #     endif
       static int entry_count = 0;

+#ifdef GC_DEBUG
+#ifndef _WIN64
+	  if (reason == DLL_PROCESS_ATTACH)
+	  {
+			// wow64: running 32-bit-gc on 64-bit-sys is broken!
+			// code copied from msdn docs example
+			typedef BOOL (WINAPI *LPFN_ISWOW64PROCESS)(HANDLE, PBOOL);
+			LPFN_ISWOW64PROCESS fnIsWow64Process = 0;
+			fnIsWow64Process = (LPFN_ISWOW64PROCESS) GetProcAddress(GetModuleHandleA("kernel32"), "IsWow64Process");
+			if (NULL != fnIsWow64Process)
+			{
(Continue reading)

Ivan Maidanski | 28 Dec 22:45 2010
Picon

Re: Re: WOW64 bug: GetThreadContext() may return stale contents

Hi,

Does the WoW bug affects only GC in multi-threaded support (with DllMain-based threads registration). If
not, why not to patch GC_init?

Tue, 28 Dec 2010 19:22:09 +0000 Johannes Totz <jtotz@...>:

> Maybe a warning for the developer is appropriate?
> 
> 
> _______________________________________________
> Gc mailing list
> Gc@...
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
BGB | 28 Dec 21:05 2010
Picon

Re: Re: WOW64 bug: GetThreadContext() may return stale contents


On 12/28/2010 12:22 PM, Johannes Totz wrote:
> Maybe a warning for the developer is appropriate?
>
or switching to full stack scanning instead on WoW64...

reduced performance is probably preferable to having a message popup 
every time an app starts, or a message can be sent to stderr or to a log 
or similar.

and, hell, MS may eventually address this issue somehow (like, say, 
making it return less stale values...).

other remote possibilities include using stack-cleaning and 
stack-bottom-estimation (where the GC can try to make an educated guess 
as to the stack bottom) to reduce retention.

or something...

Gmane