Some information on the physical layout:
3 subnets in question - a general server VLAN, a '3rd party developer'
server VLAN that is heavily firewalled off from everything else, and a
dedicated unmanaged switch LAN that is intended to segregate iSCSI
traffic from the rest of the network. The first two run on the same
switch fabric.
4 initiators in question - All RHEL4.4, all current packages and errata
applied, all have confirmed connectivity (ping and telnet to port 3260).
The NetApp is an IBM N5200 (rebadged NetApp 3020) running DOT 7.1.
Interface layout looks like this: e0a is connected to the general server
VLAN, e0b is connected to the iSCSI LAN, e0c and e0d are used as
crossover cable connections (which I am trying to eliminate by
implementing the iSCSI LAN). The e0a interface has iSCSI targeting
disabled; it is enabled (the default setting) on e0b, e0c, and e0d.
HostA - on the 3rd party VLAN (eth0) and the iSCSI LAN (eth1)
HostB - on the general server VLAN (eth0) and the iSCSI LAN (eth1)
HostC - on the general server VLAN (eth0) and connected to the NetApp
via a crossover (eth1)
NetApp - on the general server VLAN, the iSCSI LAN, and using 2 ports
for crossover (interface layout given above)
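For anyone wanting to repeat the port 3260 connectivity check from a specific interface on a dual-homed initiator, here is a minimal sketch in Python 3 (not what I actually ran; I used ping and telnet). The function name port_open and the bind_addr parameter are made up for illustration. Binding to the initiator's iSCSI LAN address is meant to make the connection originate from that address rather than whatever the system picks by default.

```python
import socket

ISCSI_PORT = 3260  # iSCSI target port, same one used in the telnet check


def port_open(host, port=ISCSI_PORT, timeout=3.0, bind_addr=None):
    """Return True if a TCP connection to host:port succeeds within timeout.

    bind_addr (hypothetical parameter) is an optional local IP to connect
    from, e.g. the initiator's address on the iSCSI LAN.
    """
    # Port 0 lets the OS choose the local source port.
    source = (bind_addr, 0) if bind_addr else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Usage would be something like port_open(netapp_iscsi_ip, bind_addr=host_eth1_ip), with both addresses filled in for the host being tested. A True result only shows that the TCP handshake completes; it says nothing about whether the iSCSI session itself stays up.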
The story/problem (as I see it):
Was trying to hook HostA up to the NetApp to get some extra disk space
for it. Software installed clean, no special settings - just set the
DiscoveryAddress to the NetApp iSCSI LAN address in iscsi.conf, set the
initiator name, mapped the initiator name/group to a LUN, and figured
that would be it (that config worked great with the crossover
connections). Fired up the daemon, saw the connection on both ends,
iscsi-ls shows the LUN on the initiator. But, if I try to do anything
that actually looks at it from a device level (fdisk -l, cat
/proc/scsi/scsi, etc), I get a mess of 'ping timeout, failed command,
session dropped, session established' loops before it finally gives up a
few minutes later (usually gives up, sometimes it locks so hard I may as
well reboot the box). After messing with additional parameters that
looked promising (particularly the PingTimeout option) in iscsi.conf
without success, I came up with the hypothesis that maybe there was some
strange interaction because the VLAN that this machine's primary network
interface is on is heavily firewalled (no outbound, period; inbound only
on ports 80 and 22).
To test this hypothesis, I set up HostB, identical packages/network
layout as HostA, and it worked great. Figured that kinda added weight
to my hypothesis above. Would deal with that later, I was happy.
The problem came in last night when I attempted to change HostC from the
crossover connection to the iSCSI LAN. Stopped services, unmounted
drives, changed network and iSCSI parameters, started things back up
again. Lo and behold, the same thing that happened to HostA was
happening to HostC. Fortunately, was able to roll this back very
easily, so nobody but myself was irritated.
Been searching around today to see if I can find information about
similar problems on the web, found a couple things that didn't help (but
were interesting reading, and I did manage to get some performance
boosts out of them), and I thought it was time that I should ask someone
who might have experience in this kind of thing before I pull out too
much hair.
I really hope that this is something simple that I'm overlooking. Log
files/packet traces/general details available if needed (don't want to
spam too many people with info that may not be applicable)...
--
Dave DeMaagd, DOL QA Engineering
dave.demaagd(a)dig.com | (818) 623-3755 | (818) 262-7958