Tracy Reed | 7 Aug 11:59 2009

Retransmit issues

I currently have two AoE SANs deployed and they both have the same
problem, so I must be missing something somewhere. I originally wrote
about this last November:

http://www.mail-archive.com/aoetools-discuss-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f <at> public.gmane.org/msg00136.html

I never got the problem solved, and I did not get around to trying
the patch Ed suggested. I have a feeling there has got to be
something I am doing wrong in the setup here. Performance didn't
matter much on that deployment at the time, although it is
becoming more important, and I have just set up a second SAN with the
same issue where I really do need it to perform. I have set up AoE
SANs a few times before and got great performance, so I'm not sure what
could be different this time.

I am running vblade-19 on the target side and the AoE v72 kernel module
on the initiator, with MTU 9000 on all of the interfaces involved. The
switch is an HP ProCurve 2810 with a dedicated VLAN for the AoE SAN, and
it is set up for 9000 MTU also. The initiator says:

aoe: e0.0: setting 8704 byte data frames
aoe: e1.0: setting 8704 byte data frames

so I know it is getting the MTU right on that side. The initiator has
a vlan interface for the SAN which then goes over the bonded link.
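
For anyone wondering where 8704 comes from, here is a minimal sketch of the arithmetic. It is not the driver's code. It assumes, from my reading of the AoE spec, that each frame carries a 10-byte AoE header and a 12-byte ATA header after the Ethernet header, and that data is sent in whole 512-byte sectors. The constant and function names are made up for illustration.

```python
# Sketch: largest AoE data payload that fits in a given interface MTU.
# Header sizes are assumptions taken from the AoE spec, not from the driver.
AOE_HEADER_BYTES = 10   # AoE common header (follows the Ethernet header)
ATA_HEADER_BYTES = 12   # AoE ATA command header
SECTOR_BYTES = 512      # data is carried in whole sectors


def aoe_payload_bytes(mtu: int) -> int:
    """Return the data bytes per frame for an interface MTU (hypothetical helper)."""
    usable = mtu - AOE_HEADER_BYTES - ATA_HEADER_BYTES
    sectors = max(usable // SECTOR_BYTES, 0)
    return sectors * SECTOR_BYTES


if __name__ == "__main__":
    for mtu in (1500, 4200, 9000):
        print(f"mtu {mtu}: {aoe_payload_bytes(mtu)} byte data frames")
```

Under those assumptions an MTU of 9000 leaves room for 17 sectors, which is the 8704 bytes the initiator reports, and an MTU of 4200 leaves room for 8 sectors, or 4096 bytes.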

cat /dev/etherd/err on the initiator produces lots of:

 unexpected rsp e2.0    tag=7e426f7f@102e56f91 s=0024e860c18a d=00219b916485

Ed Cashin | 7 Aug 14:13 2009

Re: Retransmit issues

A couple of things come to mind.  One is that there was a period where
drivers could not use jumbo frames.  That was followed by a long
period where aoe drivers reported the maximum usable payload size in
`aoe-stat`, but they would never use more than 4096-byte payloads.
The current period began with aoe6-49: the payload size reported by
aoe-stat can now be used in full.

So if you used to get good performance and now can't, it may be that
your old driver only sent jumbos of a size your network equipment could
handle well, and the aoe driver you upgraded to sends bigger ones. You
can test for that by using a 4200 MTU on your initiator network
interfaces.

Second, the latest drivers handle network congestion dynamically, in
response to actual network conditions, and it is normal for
retransmissions to occur as the "ideal" rate is momentarily exceeded
and then backed away from.  If you see short bursts of retransmits
every few seconds, it probably isn't anything to worry about.  A
steady, rapid stream of retransmits could indicate a problem, and
retransmits without related "unexpected responses" could also indicate
a problem, specifically network packet loss.
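
As a rough way to check for that last case, the sketch below tallies lines captured from /dev/etherd/err per device and lists devices that show retransmits but no unexpected responses. The "unexpected rsp" line format is the one quoted earlier in this thread. That retransmit lines start with the word "retransmit" is an assumption, and the function names and the sample retransmit lines are hypothetical.

```python
import re
from collections import Counter

# Matches an AoE device name such as e2.0 (shelf.slot).
DEVICE_RE = re.compile(r"\b(e\d+\.\d+)\b")


def tally(lines):
    """Count retransmit and unexpected-response lines per device."""
    retransmits = Counter()
    unexpected = Counter()
    for line in lines:
        text = line.strip()
        match = DEVICE_RE.search(text)
        if not match:
            continue
        device = match.group(1)
        if text.startswith("unexpected rsp"):
            unexpected[device] += 1
        elif text.startswith("retransmit"):  # assumed prefix
            retransmits[device] += 1
    return retransmits, unexpected


def loss_suspects(retransmits, unexpected):
    """Devices with retransmits but no unexpected responses."""
    return sorted(d for d, n in retransmits.items()
                  if n > 0 and unexpected[d] == 0)


if __name__ == "__main__":
    sample = [
        " unexpected rsp e2.0    tag=7e426f7f@102e56f91 s=0024e860c18a d=00219b916485",
        "retransmit e2.0 (hypothetical sample line)",
        "retransmit e1.0 (hypothetical sample line)",
        "retransmit e1.0 (hypothetical sample line)",
    ]
    retransmits, unexpected = tally(sample)
    print("possible packet loss on:", loss_suspects(retransmits, unexpected))
```

This only counts lines. It does not look at timing, so it cannot tell short bursts from a steady stream.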

-- 
  Ed


Matthew Ingersoll | 8 Aug 01:52 2009

Re: Retransmit issues

> vblade-19 on the target side AoE v72 kernel module on the
> initiator. Using mtu 9000 on all of the interfaces involved. HP
> ProCurve 2810 switch with a dedicated VLAN for the AoE SAN. The switch
> is set up for 9000 MTU also. The initiator says:

I had issues with a different HP switch set to 9000 MTU.  The switch
worked fine at 1500 MTU, with very few retransmits and no noticeable
throughput degradation, but switching to 9000 seemed to confuse it:
throughput went from around 90MB/s to around 1MB/s after a few seconds
of continuous writing.  Replacing the switch with a different brand
(still running at 9000 MTU) cleared this up, and it now has normal
retransmissions without decreased throughput.  This was with a newer
kernel (2.6.26.8), the newest vblade, and the aoe initiator that comes
with the vanilla kernel.

A direct connection worked fine when I tested it and, like Ed said, the
problem may have to do with your versions.

--
Matth Ingersoll
