Re: Solaris vs FreeBSD question
Chris Forgeron <cforgeron <at> acsi.ca>
2011-05-20 12:37:04 GMT
From: Frank Van Damme
Sent: Friday, May 20, 2011 6:25 AM
>Op 20-05-11 01:17, Chris Forgeron schreef:
>> I ended up switching back to FreeBSD after using Solaris for some time because I was getting tired of weird
>> pool corruptions and the like.
>Did you ever manage to recover the data you blogged about on Sunday, February 6, 2011?
Oh yes, I didn't follow up on that. I'll have to do that now. Here's the recap.
Yes, I did get most of it back, thanks to a lot of effort from George Wilson (great guy, and I'm very indebted to
him). However, any data that was in play at the time of the fault was irreversibly damaged and couldn't be
restored. Any data that wasn't active at the time of the crash was perfectly fine; it just needed to be
copied out of the pool into a new pool. George had to mount my pool for me, as mounting it was beyond
the skills of anyone who isn't a ZFS programmer. Unfortunately, Solaris would dump after about 24 hours, requiring a
second mounting by George. It was also slower than cold molasses to copy anything in its faulted state. If
I was getting 1 MB/sec, I was lucky. You can imagine the issue that creates when you're trying to evacuate a
few TB of data through a slow pipe like that.
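To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes decimal units and a sustained 1 MB/sec; the function names are made up for illustration, and "a few TB" is not an exact figure, so the loop just tries 1 to 3 TB.

```python
def days_to_copy(terabytes, mb_per_sec=1.0):
    """Days needed to copy `terabytes` TB at a sustained `mb_per_sec` MB/s."""
    seconds = (terabytes * 1_000_000) / mb_per_sec  # 1 TB = 1,000,000 MB (decimal)
    return seconds / 86_400  # seconds per day


def gb_per_window(hours=24.0, mb_per_sec=1.0):
    """GB that can be copied in one uptime window before the next dump."""
    return hours * 3600 * mb_per_sec / 1000  # 1 GB = 1,000 MB (decimal)


if __name__ == "__main__":
    for tb in (1, 2, 3):  # hypothetical sizes standing in for "a few TB"
        print(f"{tb} TB at 1 MB/s: about {days_to_copy(tb):.1f} days")
    print(f"One 24-hour window at 1 MB/s: about {gb_per_window():.1f} GB")
```

At that rate a single terabyte takes more than eleven days, while each roughly 24-hour window between dumps moves well under 100 GB.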
After it dumped again, I didn't bother George for a third remounting (or rather, I tried only half-heartedly;
the guy had already put a lot of time into this, and we all have our day jobs), and abandoned the data that was
still stranded on the faulted pool. I copied my most-wanted data first, so what I abandoned was a personal
collection of movies that I could always re-rip.
I was still experimenting with ZFS at the time, so I wasn't using snapshots for backup, just conventional
image backups of the VMs that were running. Snapshots would have had a good chance of protecting my data
from the fault that I ran into.
I was originally blaming my Areca 1880 card, as I was working with Areca tech support on a more stable driver
for Solaris, and was on the 3rd revision of a driver with them. However, in the end it wasn't the Areca, as I
was very familiar with its tricks: the Areca would hang (about once every day or two), but it wouldn't take
out the pool. After removing the Areca and going with just LSI 2008-based controllers, I had one final fault
about 3 weeks later that corrupted another pool (luckily it was just a backup pool). At that point, the
swearing in the server room reached a peak, I booted back into FreeBSD, and haven't looked back.
Originally when I used the Areca controller with FreeBSD, I didn't have any problems for about 2 months.
I've had only small FreeBSD issues since then, and nothing else has changed on my hardware. So the only claim I
can make is that in my environment, on my hardware, I've had better stability with FreeBSD.
One of the slow-downs with FreeBSD in my comparison tests was the O_SYNC method that ESX uses to
mount an NFS store. I edited the FreeBSD NFS source to always do an async write, regardless of the O_SYNC from
the client, and that perked FreeBSD up a lot for speed, making it fairly close to what I was getting on
Solaris. FreeBSD is now using a 4.1 NFS server by default as of the last month, and I'm just starting my
stability tests using a new FreeBSD-9 build to see if I can run newer code. I'll do speed tests again,
and will probably make the same hack to the 4.1 NFS code to force async writes. I'll post to my blog and the
FreeBSD lists when that occurs, as it's out of scope for this list.
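For anyone wondering why O_SYNC costs so much, here is a minimal local-file sketch in Python. It is not the NFS server change described above, only an illustration of the general effect on a POSIX system: with O_SYNC each write must reach stable storage before the call returns, while a buffered write returns once the data is in the OS cache (which is also why a crash can lose buffered data). The function name, block size and write count are arbitrary choices for the example.

```python
import os
import tempfile
import time


def timed_writes(path, sync, count=200, block_size=4096):
    """Write `count` blocks to `path` and return the elapsed seconds.

    With sync=True the file is opened with O_SYNC, so every write waits
    for stable storage; otherwise writes go through the OS cache.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC  # POSIX flag; not available on Windows
    block = b"\0" * block_size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buffered = timed_writes(os.path.join(tmp, "buffered.bin"), sync=False)
        synced = timed_writes(os.path.join(tmp, "synced.bin"), sync=True)
        print(f"buffered: {buffered:.4f} s, O_SYNC: {synced:.4f} s")
```

The gap depends heavily on the underlying storage and filesystem, so treat any numbers it prints as indicative only.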
I do like Solaris. After some initial discomfort about the different way things were being done, I do see the
overall design and idea, and I now have a wish list of features I'd like to see ported to FreeBSD. I think I'll
have a Solaris-based box set up again for testing. We'll see what time allows.