RE: Missing File Descriptors
Collins, Kevin [Beeline] <KCollins <at> chevron.com>
2008-05-05 23:10:34 GMT
Are your disks on a SAN? I would presume so since you mention
clustering... In any case, I saw a similar thing happen once during a
Disaster Recovery exercise where the D/R hosting staff had given us SAN
storage and accidentally assigned another customer one of the same
disks! Unfortunately for us, we were mostly done rebuilding the OS and
it was one of our root volume disks. As soon as the other folks
pvcreate'd the disk, we started seeing some very strange behavior on our
running system, including wacky ls -l output.
Granted, this was with HP-UX not Linux and I'm pretty certain an fsck
would have found something, but it is something to consider. SAN
configuration can get pretty hairy, and it's easy to screw up if you are
using a point-n-click, drag-n-drop kind of tool!
From: nahant-list-bounces <at> redhat.com
[mailto:nahant-list-bounces <at> redhat.com] On Behalf Of Jeremy Erie
Sent: Monday, May 05, 2008 3:56 PM
To: Red Hat Enterprise Linux 4 (Nahant) Discussion List
Subject: Re: Missing File Descriptors
Unfortunately for me, these are HP DL380s. I've managed to "fix" the
symptom of the problem by restoring from backups. Well, to be precise,
it is a clustered database and there is a copy of the data slice on another
node. Moving forward though, I need to find out why this is happening to
prevent doing surgery on the database again.
The files/directories with fubar metadata are pretty random, although
when they pop up they are usually within the same parent directory. What
is random is ultimately when and where they start to pop up. It is not
exclusive to NODE.dat; that was just an example.
For directory "/opt/application/data/Monday" there may be a collection of
directories in addition to a NODE.dat file. You will start to see the
corrupted metadata for all files/directories in the
I'll have a better grasp on this once I get to see /var/log/messages. It
may be a function of our application that is failing when trying to write to a
particular directory, which actually corresponds to a database table. So
when writing to a table, that particular node's data slice is corrupted. I
just didn't think I would see corruption in the form of metadata issues.
I'm just theorizing here. For all I know the corrupted metadata is
system-wide and not contained to the application's data directories.
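
One way to check whether the damage really is confined to the application's
data directories would be a scan along the following lines. This is only a
minimal sketch under a few assumptions: that the "?--------- ? ? ? ?" entries
are names the directory listing still returns but that fail lstat(), and that
a Python 3 interpreter is available (RHEL 4 does not ship one by default).
The helper name find_unstattable is made up for illustration and is not part
of any existing tool.

```python
import os
import stat


def find_unstattable(root, lstat=os.lstat):
    """Return (path, error) pairs for entries that are listed but cannot be stat'ed.

    `lstat` is injectable only so the function can be exercised without a
    genuinely damaged filesystem (hypothetical convenience, not a real API).
    """
    broken = []
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            names = os.listdir(directory)
        except OSError as exc:
            # The directory itself could not be read; record it and move on.
            broken.append((directory, exc))
            continue
        for name in sorted(names):
            path = os.path.join(directory, name)
            try:
                info = lstat(path)
            except OSError as exc:
                # Listed in the directory but lstat failed: the case that
                # ls -l is assumed to display with '?' in every column.
                broken.append((path, exc))
                continue
            # Descend into real directories only; symlinks are not followed.
            if stat.S_ISDIR(info.st_mode):
                pending.append(path)
    return broken
```

Calling find_unstattable("/opt/application/data") and then find_unstattable("/")
and comparing the two result lists would show whether anything outside the
data directories is affected. It only reports; it does not repair anything.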
On 5/5/08 2:36 PM, "Tom Sightler" <ttsig <at> tuxyturvy.com> wrote:
> On Mon, 2008-05-05 at 13:20 -0700, Jeremy Erie Phillips wrote:
>> Hi List,
>> On a 64-bit RHAS 4.5 box. Randomly seeing files and directories
>> with missing file descriptors. Rebooted the box and ran fsck and
>> found no errors.
>> The files look like:
>> ?--------- ? ? ? ? NODE.dat
> When you say "the files look like:" do you mean they are all NODE.dat,
> or are they more random? If more random, could you post some more
>> What could be the cause of this? Still trying to get access
>> to /var/log/messages, which would provide some additional
>> troubleshooting help; but for now... Any ideas?
> We saw this a few years ago on some Dell hardware (I believe a 6850,
> maybe a 6650 or 2850) that was running RHEL4 64-bit with Dell OpenManage
> and a DRAC card. Some strange problem with the virtual media on the
> DRAC would cause the system to hang randomly and, since the
> DRAC/OpenManage was configured to do an ASR, the system would reboot
> automatically. Typically, after a few of those cycles, a search of the
> directories would turn up slightly corrupted files/directories here or
> there, many with very strange directory entries.
> Unfortunately I can't remember the eventual resolution (and it might not
> matter since you might not even have a Dell). It seems like it was
> adding some command to the kernel boot line to tell it to ignore the
> virtual floppy and CD in the DRAC, but I just can't remember for sure.
> To fix the corruption we simply tried to identify all of the files with
> standard tools (like rpm -V and comparison to older backups). We
> removed any corrupted files (in many cases having to modify attribute
> flags to allow a file to be removed) and then did forced reinstalls of
> RPMs that contained damaged/corrupted files and restored corrupted
> data from backups. The damage was pretty minor in our case, and
> in just a few directories; more serious damage would have probably required
> a full backup/reinstall/restore.
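
The rpm -V step described above can be turned into a list of files to
reinstall or restore with something like the sketch below. It assumes
Python 3 and assumes the usual verify output layout: a flag string (or the
word "missing"), an optional one-letter file-type marker such as "c" for
config files, then the path. The flag string is eight characters on older
rpm releases and, as far as I know, nine on newer ones, so the pattern
accepts either. The names parse_rpm_verify and content_changed are
hypothetical helpers, not part of rpm.

```python
import re

# Flag string (or "missing"), optional file-type marker, then an absolute path.
VERIFY_LINE = re.compile(
    r"^(?P<flags>missing|[A-Za-z0-9.?]{8,})\s+"
    r"(?:(?P<attr>[cdglr])\s+)?"
    r"(?P<path>/.*)$"
)


def parse_rpm_verify(output):
    """Parse text captured from `rpm -V` / `rpm -Va` into (flags, attr, path) tuples."""
    findings = []
    for line in output.splitlines():
        match = VERIFY_LINE.match(line.strip())
        if match is None:
            # Skip anything that is not a per-file result line.
            continue
        findings.append(
            (match.group("flags"), match.group("attr"), match.group("path"))
        )
    return findings


def content_changed(findings):
    """Return paths that are missing or whose digest check failed ('5' flag)."""
    return [
        path
        for flags, _attr, path in findings
        if flags == "missing" or "5" in flags
    ]
```

Feeding it the saved output of rpm -Va gives the candidates for a forced
reinstall. Config files (marker "c") will often show up simply because they
were edited on purpose, so those still need a manual look or a comparison
against backups.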
> nahant-list mailing list
> nahant-list <at> redhat.com
Jeremy Erie Phillips
Senior Technical Support Engineer