Some information on the physical layout:
3 subnets in question: a general server VLAN; a '3rd party developer' server VLAN that is heavily firewalled off from everything else; and an iSCSI LAN on a dedicated unmanaged switch, intended to segregate iSCSI traffic from the rest of the network. The two VLANs run on the same switch fabric.
4 initiators in question: all RHEL 4.4 with all current packages and errata applied, and all with confirmed connectivity (ping, and telnet to port 3260).
The NetApp is an IBM N5200 (rebadged NetApp 3020) running DOT 7.1. Interface layout: e0a is connected to the general server VLAN, e0b to the iSCSI LAN, and e0c and e0d are used for crossover cable connections (which I am trying to eliminate by implementing the iSCSI LAN). iSCSI targeting is disabled on e0a and enabled (the default setting) on e0b, e0c, and e0d.
HostA - on the 3rd party VLAN (eth0) and the iSCSI LAN (eth1)
HostB - on the general server VLAN (eth0) and the iSCSI LAN (eth1)
HostC - on the general server VLAN (eth0) and connected to the NetApp via a crossover (eth1)
NetApp - on the general server VLAN and the iSCSI LAN, plus 2 ports used for crossover (interface layout given above)
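For anyone wanting to repeat the "telnet to port 3260" check in a scriptable way, here is a minimal sketch. It assumes a Python 3 interpreter on the initiator, and the function name port_open is hypothetical. Like telnet, it only proves the target completes a TCP handshake on the iSCSI port; it says nothing about whether an iSCSI login or SCSI commands will succeed.

```python
import socket


def port_open(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Equivalent to the 'telnet to port 3260' test: success means the
    target accepted a TCP handshake, not that an iSCSI session will work.
    """
    try:
        # create_connection resolves the host and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Usage would be something like port_open("<NetApp iSCSI LAN address>") from each initiator; the address is a placeholder, not taken from this setup.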
The story/problem (as I see it):
Was trying to hook HostA up to the NetApp to get some extra disk space for it. Software installed clean, no special settings: I just set the DiscoveryAddress in iscsi.conf to the NetApp's iSCSI LAN address, set the initiator name, mapped the initiator name/group to a LUN, and figured that would be it (that config worked great with the crossover connections). Fired up the daemon, saw the connection on both ends, and iscsi-ls shows the LUN on the initiator. But if I try to do anything that actually looks at it at the device level (fdisk -l, cat /proc/scsi/scsi, etc.), I get a mess of 'ping timeout, failed command, session dropped, session established' loops before it finally gives up a few minutes later (it usually gives up; sometimes it locks so hard I may as well reboot the box). After messing with additional iscsi.conf parameters that looked promising (particularly the PingTimeout option) without success, I came up with the hypothesis that there might be some strange interaction because this machine's primary network interface is heavily firewalled (no outbound, period; inbound only on ports 80 and 22).
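One way to probe the firewall hypothesis is to confirm which local address the initiator actually uses to reach the NetApp's iSCSI LAN address. If it comes back as the eth0 (firewalled) address rather than the eth1 address, the iSCSI traffic is taking the wrong path. The sketch below assumes Python 3 and uses a hypothetical helper name, source_address_for. Calling connect() on a UDP socket sends no packets; it only makes the kernel choose a route, and getsockname() then reports that route's source address.

```python
import socket


def source_address_for(dest, port=3260):
    """Return the local IP the kernel's routing table picks to reach dest.

    No traffic is sent: a UDP connect() only selects a route, and
    getsockname() reports the source address chosen for it.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest, port))
        return s.getsockname()[0]
    finally:
        s.close()
```

The result would be compared by hand against the addresses assigned to eth0 and eth1 on the initiator.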
To test this hypothesis, I set up HostB with the same packages and network layout as HostA, except that its eth0 is on the general server VLAN instead of the firewalled 3rd party VLAN, and it worked great. Figured that added weight to my hypothesis. Would deal with that later; I was happy.
The problem came in last night when I attempted to move HostC from the crossover connection to the iSCSI LAN. Stopped services, unmounted drives, changed the network and iSCSI parameters, and started things back up again. Lo and behold, the same thing that happened to HostA was happening to HostC, even though HostC's eth0 is on the unfirewalled general server VLAN, which undercuts the firewall hypothesis. Fortunately, I was able to roll this back very easily, so nobody but myself was irritated.
Been searching around today for information about similar problems on the web and found a couple of things that didn't help (but were interesting reading, and I did manage to get some performance boosts out of them), so I thought it was time to ask someone who might have experience with this kind of thing before I pull out too much hair.
I really hope that this is something simple that I'm overlooking. Log files/packet traces/general details available if needed (don't want to spam too many people with info that may not be applicable)...
-- Dave DeMaagd, DOL QA Engineering dave.demaagd@dig.com | (818) 623-3755 | (818) 262-7958