Jeremy Erie Phillips | 5 May 22:20 2008

Missing File Descriptors

Hi List,
  On a 64-bit RHAS 4.5 box, I am randomly seeing files and directories with missing file descriptors.  I rebooted the box and ran fsck, which found no errors.

The files look like:
?---------        ?    ?    ?        ? NODE.dat

What could be the cause of this?  Still trying to get access to /var/log/messages, which would provide some additional troubleshooting help; but for now... Any ideas?

Thanks,


--
Jeremy Erie Phillips
Senior Technical Support Engineer
Sensage, Inc.


Neu, Timothy | 5 May 22:29 2008

RE: Missing File Descriptors

I've seen funky directory permissions cause this...  Check and make sure you have read and execute permissions to the directory.
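
A minimal sketch of that check, assuming Python 3 is available (it is not part of the thread, and the helper name directory_access is hypothetical). A directory that grants read but not execute (search) permission lets ls list the names inside it, but every per-file stat fails, which is one way to get rows of question marks. Note that root bypasses these bits, so the check only matters for unprivileged users.

```python
import stat


def directory_access(mode):
    """Report, per permission class, what a directory's mode bits allow.

    Read permission allows listing the names in the directory.
    Execute (search) permission allows stat()ing the entries, which is
    what ls -l needs in order to print type, owner, size and date.
    """
    classes = {
        "user": (stat.S_IRUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IXGRP),
        "other": (stat.S_IROTH, stat.S_IXOTH),
    }
    return {
        who: {
            "can_list_names": bool(mode & read_bit),
            "can_stat_entries": bool(mode & exec_bit),
        }
        for who, (read_bit, exec_bit) in classes.items()
    }
```

To try it on a real directory, pass os.stat(path).st_mode, for example for the data directory in question. Any class that can list names but cannot stat entries would see the names with the metadata columns unreadable.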

Collins, Kevin [Beeline] | 5 May 22:40 2008

RE: Missing File Descriptors

Just to clarify, what you are describing is the metadata for a file, not "file descriptors". File descriptors are handles to files that a process has opened for access...
 
Kevin
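
To make the distinction concrete, here is a small sketch (assuming Python 3; the function name is made up for illustration). os.stat() reads a file's metadata without opening it, while os.open() returns a file descriptor, which is just a small per-process integer that exists only while the file is open.

```python
import os
import stat


def metadata_and_descriptor(path):
    """Show file metadata (from stat) next to a file descriptor (from open)."""
    info = os.stat(path)              # metadata: mode, owner, size, times
    fd = os.open(path, os.O_RDONLY)   # descriptor: an integer handle
    try:
        via_fd = os.fstat(fd)         # the same metadata, reached through the fd
        return {
            "mode": stat.filemode(info.st_mode),   # e.g. the ls -l style string
            "size": info.st_size,
            "fd_is_int": isinstance(fd, int),
            "same_inode": info.st_ino == via_fd.st_ino,
        }
    finally:
        os.close(fd)                  # the descriptor goes away; metadata stays
```

The "?" columns in the ls -l output correspond to the stat side of this picture: the name was found in the directory, but the metadata lookup failed.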

Tom Sightler | 5 May 23:36 2008

Re: Missing File Descriptors

On Mon, 2008-05-05 at 13:20 -0700, Jeremy Erie Phillips wrote:
> HI List,
>   On a 64bit- RHAS 4.5 box.  Randomly seeing files and directories
> with missing file descriptors.  Rebooted the box and ran fsck and
> found no errors.
> 
> The files look like:
> ?---------        ?    ?    ?        ? NODE.dat

When you say "the files look like:" do you mean they are all NODE.dat,
or are they more random?  If more random, could you post some more
examples?

> What could be the cause of this?  Still trying to get access
> to /var/log/messages, which would provide some additional
> troubleshooting help; but for now... Any ideas?

We saw this a few years ago on some Dell hardware (I believe a 6850, but
maybe a 6650 or 2850) that was running RHEL4 64bit with Dell OpenManage
and a DRAC card.  Some strange problem with the virtual media on the
DRAC would cause the system to hang randomly and, since the
DRAC/OpenManage was configured to do an ASR, the system would reboot
automatically.  Typically, after a few of those cycles, a search of the
directories would turn up slightly corrupted files/directories here or
there, many with very strange directory entries.

Unfortunately I can't remember the eventual resolution (and it might not
matter since you might not even have a Dell).  It seems like it was
adding some command to the kernel boot line to tell it to ignore the
virtual floppy and CD in the DRAC, but I just can't remember for sure.

To fix the corruption we simply tried to identify all of the files using
standard tools (like rpm -V and comparison to older backups).  We
removed any corrupted files (in many cases having to modify attribute
flags to allow a file to be removed) and then did forced reinstalls of
RPMs that contained damaged/corrupted files and restored corrupted user
data from backups.  The damage was pretty minor in our case, and mostly
in just a few directories; more serious damage would probably have meant
a full backup/reinstall/restore.

Later,
Tom
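
The "comparison to older backups" step could be sketched roughly as below. This is an illustration only, not the procedure described above: it assumes Python 3, assumes the backup is available as a plain directory tree, and the function names are hypothetical. It does not replace rpm -V for packaged files.

```python
import hashlib
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_backup(live_root, backup_root):
    """Compare every file under backup_root with its counterpart in live_root.

    Returns relative paths grouped as missing, different, or unreadable.
    "unreadable" covers any OSError other than "not found", including
    failures while reading the backup copy itself.
    """
    report = {"missing": [], "different": [], "unreadable": []}
    for parent, _dirs, files in os.walk(backup_root):
        for name in files:
            backup_path = os.path.join(parent, name)
            relative = os.path.relpath(backup_path, backup_root)
            live_path = os.path.join(live_root, relative)
            try:
                os.lstat(live_path)   # fails if metadata cannot be read
                same = sha256_of(live_path) == sha256_of(backup_path)
            except FileNotFoundError:
                report["missing"].append(relative)
                continue
            except OSError:
                report["unreadable"].append(relative)
                continue
            if not same:
                report["different"].append(relative)
    return report
```

Files that have legitimately changed since the backup will also show up as "different", so the report is a list of candidates to inspect rather than a list of confirmed corruption.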

Jeremy Erie Phillips | 6 May 00:55 2008

Re: Missing File Descriptors

Thanks Tom.

Unfortunately for me, these are HP DL380s.  I've managed to "fix" the
symptom of the problem by restoring from backups.  Well, to be precise, it
is a clustered database and there is a copy of the data slice on another node.
Moving forward, though, I need to find out why this is happening so I can
avoid doing surgery on the database again.

The files/directories with fubar metadata are pretty random, although when
they pop up they are usually within the same parent directory.  What is
random is when and where they start to pop up.  It is not
exclusive to NODE.dat; that was just an example.

For directory "/opt/application/data/Monday" there may be a collection of
directories in addition to a NODE.dat file.  You will start to see the
corrupted metadata for all files/directories in the
"/opt/application/data/Monday" directory.

I'll have a better grasp on this once I get to see /var/log/messages.  It
may be a function of our application failing when it tries to write to a
particular directory, which actually corresponds to a database table.  So,
when writing to a table, that particular node's data slice is corrupted.  I
just didn't think I would see corruption in the form of metadata issues.

I'm just theorizing here.  For all I know the corrupted metadata is system
wide and not contained to the application's data directories.

Thanks again!
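
One way to test whether the damage is confined to a few parent directories or is system wide is to walk a tree and group the entries whose metadata cannot be read by their parent. The sketch below assumes Python 3 (which would have to be installed separately on a RHEL 4 era box); the function name and the stat_fn parameter are hypothetical, the latter existing only so the failure path can be exercised without a damaged filesystem.

```python
import os
from collections import defaultdict


def unstattable_by_parent(root, stat_fn=os.lstat):
    """Walk root and group entries whose lstat() fails by parent directory.

    Returns {parent_directory: [(entry_name, error_text), ...]}.
    Directories that os.walk cannot enter are skipped silently, which is
    os.walk's default behaviour when no onerror handler is given.
    """
    failures = defaultdict(list)
    for parent, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                stat_fn(os.path.join(parent, name))
            except OSError as exc:
                failures[parent].append((name, exc.strerror or str(exc)))
    return dict(failures)
```

Running it once against the application's data directory and once against / would show whether only directories such as "/opt/application/data/Monday" are affected. The error text (permission denied, stale handle, I/O error) may also hint at the cause.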

-- 
Jeremy Erie Phillips
Senior Technical Support Engineer
Sensage, Inc.
office: 415.808.5934

Collins, Kevin [Beeline] | 6 May 01:10 2008

RE: Missing File Descriptors

Are your disks on a SAN? I would presume so since you mention
clustering... In any case, I saw a similar thing happen once during a
Disaster Recovery exercise where the D/R hosting staff had given us SAN
storage and accidentally assigned another customer one of the same
disks! Unfortunately for us, we were mostly done rebuilding the OS and
it was one of our root volume disks. As soon as the other folks
pvcreate'd the disk, we started seeing some very strange behavior on our
running system, including wacky ls -l output.

Granted, this was with HP-UX not Linux and I'm pretty certain an fsck
would have found something, but it is something to consider. SAN
configuration can get pretty hairy, and it's easy to screw up if you are
using a point-n-click, drag-n-drop kind of tool!

Kevin


Göran Uddeborg | 15 May 14:22 2008

Missing File Descriptors

Jeremy Erie Phillips writes:
> The files look like:
> ?---------        ?    ?    ?        ? NODE.dat

Are these files/directories on an NFS partition?  There is some kind
of race or something in the NFS implementation in RHEL4 that gives
this behaviour.  I don't remember the details right now, but search
bugzilla for ?--------- and you will find a few things to read.
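
A quick way to answer the NFS question is to look up which mount a path lives on. The sketch below assumes Python 3 and parses text in the /proc/mounts format (device, mount point, filesystem type, options); the function name is hypothetical, and mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains path.

    mounts_text is the content of /proc/mounts. The longest matching
    mount point wins; on a tie, the later line wins, since later mounts
    shadow earlier ones at the same mount point.
    """
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type
```

On the affected machine it could be called as filesystem_type("/opt/application/data/Monday", open("/proc/mounts").read()). A result of "nfs" would point toward the NFS behaviour described above rather than local disk corruption.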

