Steven Levine | 3 Feb 18:31

Re: FM2 and traps

In <201202030642.1rThxc6B93Nl34l0@...>, on 02/03/12
   at 06:42 AM, "John Small" <jsmall@...> said:

Hi John,

>1) How big must the trap dump partition be? Exactly the size of the RAM?
>Or is a little more needed?

There's no exact number for this.  The dump file consists of a header and
the physical memory image.  For a 2GiB system, the physical memory will
always be less than 2GiB.  There's PCI address space and sometimes shared
video memory.

>2) I have 2GB of RAM. Isn't 2GB the maximum size for a FAT partition? So
>is it possible to set up a trap partition if you have 2GB (or more) of
>RAM?

These days there's no reason not to make the dump partition 2GiB.  If you
have more than 2GiB of RAM you need to use the dumpfs version of os2dump:

 <http://home.earthlink.net/~steve53/os2diags/dumpfs.txt>

Steven

-- 
----------------------------------------------------------------------
"Steven Levine" <steve53@...>  eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com

Gregg Young | 3 Feb 22:26

Re: FM2 and traps

Hi Steven

I did a much better job of trapping. I hit the close button while the initial drive scan was in
progress. All was fine until the semaphore released, then FM2 trapped in DOSCALL1.DLL and
subsequently trapped the kernel. The initial process dump isn't of much use, to me at least,
as the ring 3 stack and the gate call are missing. Nothing in the kernel trap (actually this was a
system trap in DOSCALL1 also) seems to give any indication as to what fm3dll.dll was
doing when this all occurred. The trap screens are below. I can upload the trap dumps to
netlabs if you would like to look at them. Thanks

Gregg

This is the initial doscall1.dll trap

Trap screen 2 found at address #70:9c84

P1=0000000b  P2=XXXXXXXX  P3=XXXXXXXX  P4=XXXXXXXX   
EAX=00000000  EBX=0000efc0  ECX=00000007  EDX=00000020 
ESI=01460a54  EDI=21920000   
DS=0053  DSACC=f0f3  DSLIM=ffffffff   
ES=0053  ESACC=f0f3  ESLIM=ffffffff   
FS=150b  FSACC=00f2  FSLIM=00000030 
GS=0000  GSACC=****  GSLIM=******** 
CS:EIP=ffef:0000e3e1  CSACC=00df  CSLIM=0000fdf8 
SS:ESP=0a27:0144efb4  SSACC=****  SSLIM=******** 
EBP=01460a44  FLG=0000020

ASCII found at #70:9c84-12 is c0010002 (Exception Code?) 
c0010002:  Unknown error code in 3175 trap info: 


Steven Levine | 4 Feb 00:54

Re: FM2 and traps

In <100.a8c0050018512c4f.007@...>, on 02/03/12
   at 02:26 PM, "Gregg Young" <ygk@...> said:

Hi,

>I did a much better job of trapping. I hit the close button while the
>initial drive scan was in  progress. All was fine until the semaphore
>released then FM2 trapped in DOSCALL1.DLL and  subsequently trapped the
>kernel.

I think the doscall1 trap screen is an artifact.  The cs:eip points to a
return instruction which is typical for a thread that is expected to
return from a kernel API.

The _TKDeclareInversion + 22 trap is more interesting.  I think it implies
a timing race in the kernel.  The kernel is attempting to look at a TCB
that has been erased.  The trapping instruction is

  cmp     [eax+266h], cx

and the kernel expects eax to point to a TCB.

>Nothing in the kernel trap (actually this was a system trap in DOSCALL1
>also) seems to give any indication as to what fm3dll.dll was doing when
>this all occurred. The trap screens are below. I can upload the trap
>dumps to netlabs if you would like to look at them.

Please upload them.  The information is there, but it will take a bit of
poking to extract it.


Gregg Young | 4 Feb 20:54

Re: FM2 and traps

Steven

>
>I think the doscall1 trap screen is an artifact.  The cs:eip points to a
>return instruction which is typical for a thread that is expected to
>return from a kernel API.

There are actually 3 traps before the system trap. Only one appeared in the popup log, but
based on the time stamps the other 2 occurred during the same event, and that better
matches the beep sequence that occurred. I have uploaded them all.
>
>FWIW, the workaround might be to delay the exit until the drive scan
>thread indicates it has finished.  This will require some plumbing and a
>global to tell drive scan to do a quick exit.  I don't recall how much of
>this we already have in place.
>

I think the globals exist and a semaphore locks out a lot of the UI during the scan. This 
caused the delay between my hitting close and the trap sequence. If I remember correctly 
the drive thread count dropped to 0 before the trap occurred. I am guessing I can reproduce 
this if you want me to confirm the thread count. Thanks

Gregg

Steven Levine | 4 Feb 23:27

Re: FM2 and traps

In <100.981007000f8d2d4f.005@...>, on 02/04/12
   at 12:54 PM, "Gregg Young" <ygk@...> said:

Hi Gregg,

>There are actually 3 traps before the system trap. Only one appeared in
>the popup log but based on the time stamps the other 2 occurred during
>the same event and that better matches the beep sequence that occurred. I
>have uploaded them all.

It appears you may have done an ascii upload.  The zip is corrupted and I
see some CrNl sequences in the binary.  Please check.
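
One rough way to check a file for this kind of damage is to count line endings. The C sketch below assumes the transfer turned every LF into CR LF, in which case the damaged file has no bare LF left, while an intact compressed file of any size almost always has some. The names count_line_endings and looks_ascii_converted are made up for illustration, and the test is only a heuristic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts of LF bytes, split by whether a CR precedes them. */
typedef struct {
    size_t crlf;     /* LF immediately preceded by CR */
    size_t bare_lf;  /* LF with any other byte (or nothing) before it */
} LfStats;

/* Scan a buffer and classify every LF byte. */
LfStats count_line_endings(const unsigned char *buf, size_t len)
{
    LfStats s = { 0, 0 };
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        if (i > 0 && buf[i - 1] == '\r')
            s.crlf++;
        else
            s.bare_lf++;
    }
    return s;
}

/* Heuristic only: assumes an LF -> CR LF conversion.  A binary file
 * that contains CR LF pairs but no bare LF at all is suspicious. */
bool looks_ascii_converted(const unsigned char *buf, size_t len)
{
    LfStats s = count_line_endings(buf, len);
    return s.crlf > 0 && s.bare_lf == 0;
}
```

On a small buffer the result means little; the check is more telling on a whole zip of some size.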

Thanks,

Steven

-- 
----------------------------------------------------------------------
"Steven Levine" <steve53@...>  eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
----------------------------------------------------------------------

Gregg Young | 5 Feb 05:24

Re: FM2 and traps

>
>It appears you may have done an ascii upload.  The zip is corrupted and I
>see some CrNl sequences in the binary.  Please check.

It was done with the netdrives FTP plugin so it shouldn't have been ASCII. I just reuploaded 
it with FTPPM set to binary. Thanks

Gregg

Steven Levine | 7 Feb 08:40

Re: FM2 and traps

In <100.981007000f8d2d4f.005@...>, on 02/04/12
   at 12:54 PM, "Gregg Young" <ygk@...> said:

Hi Gregg,

<isaid>
It appears you may have done an ascii upload.  The zip is corrupted and I
see some CrNl sequences in the binary.
</isaid>

The refreshed upload is clean.

I looked at pdump.302 since it appears to be the first in time.  As I
suspected, we are missing some serialization code.  The trap is at
flesh.c:213

    driveflags[*pciParent->pszFileName - 'A'] |= DRIVE_RSCANNED;

and occurs because pszFileName has been set to NULL, probably when
pciParent was released.  pszFileName was not NULL before ProcessDirectory
was called.
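
A guard along the following lines would avoid this particular NULL dereference. It is only a sketch: CNRITEM_SKETCH, mark_drive_rscanned and the DRIVE_RSCANNED value are stand-ins, not the real FM/2 definitions. A NULL check alone also does not close the race, since the record can still be released between the check and the use.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define DRIVE_RSCANNED 0x0001u   /* placeholder value, not FM/2's */

/* Stand-in for the container record; only the field used here. */
typedef struct {
    char *pszFileName;
} CNRITEM_SKETCH;

static unsigned long driveflags[26];   /* one entry per drive letter */

/* Set DRIVE_RSCANNED for the drive named by pci->pszFileName.
 * Returns false, and touches nothing, if the record or its name
 * is gone or the first character is not a drive letter. */
bool mark_drive_rscanned(const CNRITEM_SKETCH *pci)
{
    if (pci == NULL || pci->pszFileName == NULL)
        return false;

    int ch = toupper((unsigned char)*pci->pszFileName);
    if (ch < 'A' || ch > 'Z')
        return false;

    driveflags[ch - 'A'] |= DRIVE_RSCANNED;
    return true;
}
```

The caller can treat a false return as a sign that the parent record went away and stop processing that subtree.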

As we have discussed before, what needs to happen is that the shutdown
logic needs to tell Flesh to quit fast and it needs to wait for the threads
to go away.  Unless we already have something equivalent, we need a
fAmShuttingDown variable.  It's probably enough for Flesh() to check
fAmShuttingDown and for the WM_CLOSE logic to check the running thread
count.  While there are running threads, WM_CLOSE needs to repost the
WM_CLOSE until the thread count drops or there's some sort of timeout.
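
The handshake described above might look roughly like this in plain C. This is a sketch under assumptions, not FM/2 code: cRunningThreads, MAX_CLOSE_RETRIES, scan_items and on_close are hypothetical names, the PM calls are left out, and a real version would update the counter with proper serialization instead of relying on volatile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical globals; FM/2 may already have equivalents. */
static volatile bool fAmShuttingDown = false;
static volatile int  cRunningThreads = 0;

#define MAX_CLOSE_RETRIES 50   /* assumed timeout, counted in reposts */

typedef enum { CLOSE_REPOST, CLOSE_PROCEED } CloseAction;

/* Each worker thread brackets its work with these two calls. */
void worker_started(void)  { cRunningThreads++; }
void worker_finished(void) { assert(cRunningThreads > 0); cRunningThreads--; }

/* Stand-in for the scan loop in Flesh(): checks the flag before each
 * item and quits fast.  Returns the number of items processed. */
int scan_items(int cItems)
{
    int done = 0;
    for (int i = 0; i < cItems; i++) {
        if (fAmShuttingDown)
            break;             /* stop before touching shared records */
        done++;
    }
    return done;
}

/* Stand-in for the WM_CLOSE logic: raise the flag, then either ask
 * the caller to repost WM_CLOSE or let the close go ahead. */
CloseAction on_close(int *pcRetries)
{
    fAmShuttingDown = true;
    if (cRunningThreads > 0 && *pcRetries < MAX_CLOSE_RETRIES) {
        (*pcRetries)++;
        return CLOSE_REPOST;   /* threads still running, try again */
    }
    return CLOSE_PROCEED;      /* threads gone, or timed out */
}
```

In the window procedure, a CLOSE_REPOST result would map to posting WM_CLOSE back to the same window (for example with WinPostMsg) and returning without closing.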
