Toasters,
The following is a description of a problem that occurred over three nights last week. Although vendors are involved in attempting to determine the cause and suggest 'recommended practice' changes, there may be someone out there who has experienced a similar issue.

What has been observed is:
- Oracle is unable to write to a redo log file and its mirror on drive E: or F:, and logs the error: "O/S-Error: (OS 64) The specified network name is no longer available"
- The Oracle Log Writer terminates the Oracle instance.
- The service "OracleService<SID>" terminates.

This has occurred in one instance on seven servers at the same time, and on other nights with a smaller number of servers. Backups across the network have been taking place at this time and we are therefore led to believe that this is load related.

There are seven Windows-based servers running SAP/Oracle. Five are Win2K/SP3, one is Win2K/SP4, and one is NT/SP6a.
Oracle on NT is 7.2.4
Oracle on Win2K is 8.1.7
SAP used is 1 x 3.1i (NT), 3 x 4.6c, 3 x 3.0b

These servers use data drives mapped off an F820 with 3TB on 3 x DS-14 shelves. Eight volumes are used, with all CIFS shares in qtrees. Logs and executables are kept on mapped drives, not local disk. The systems are attached to a Foundry (Bigiron400) via 1GB fibre. The F820 connects to the Foundry via trunked dual 1GB fibre (dual single-port cards). The Filer appears to check out OK, with just uptime messages in the logs at this time and no link loss or any other indication of a problem. Netdiag -v reports only a small average packet size for a couple of hosts and a small number of retransmissions. Ifstat looks good. The Foundry appears to check out OK, with no indication of packet loss or similar.

There are a number of error messages in the Oracle logs. I have listed a couple of these below. What is interesting is that at no time have any of these machines reported redirector problems in the event logs at the time of the event; the only event logged is the termination of the Oracle service.

Some points:
- iSCSI may be looked at in the future. It is not an option at the moment.
- The automatic hourly snapshots on two volumes are being removed. All future snapshots will be scheduled.

At first glance this looks like some sort of redirector issue where Oracle is sensitive to mapped drive loss. In one log, Oracle places a log entry in a log on a drive that it is reporting the loss of. If this is the case, then why does Win2K not report drive loss? Would Oracle not be responding to an SMB indication of drive loss?

If the redirector is being hit and is not itself at fault, then what infrastructure-related issue might cause this?
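
One way to get a view of drive loss that is independent of Oracle would be a small probe that writes to the mapped drive at intervals and records the OS error code on failure. The following is only a minimal sketch under assumptions: the function names, the probe file name and the interval are all hypothetical, and the probe uses whatever session the account running it has to the Filer, which may not be the same session the Oracle service uses. On Windows, error 64 is ERROR_NETNAME_DELETED, which Python exposes as `OSError.winerror`.

```python
import os
import time
from datetime import datetime


def probe_once(directory, name="drive_probe.tmp"):
    """Write and remove a small file; return (timestamp, ok, error_code)."""
    target = os.path.join(directory, name)  # probe file name is hypothetical
    stamp = datetime.now()
    try:
        with open(target, "wb") as handle:
            handle.write(b"probe")
            handle.flush()
            os.fsync(handle.fileno())  # force the write out to the share
        os.remove(target)
        return stamp, True, None
    except OSError as exc:
        # winerror only exists on Windows; fall back to errno elsewhere.
        code = getattr(exc, "winerror", None) or exc.errno
        return stamp, False, code


def watch(directory, interval=5.0, rounds=3, log=print):
    """Probe repeatedly and return a list of (timestamp, error_code) failures."""
    failures = []
    for _ in range(rounds):
        stamp, ok, code = probe_once(directory)
        if not ok:
            failures.append((stamp, code))
            log(f"{stamp:%Y-%m-%d %H:%M:%S} write failed, OS error {code}")
        time.sleep(interval)
    return failures
```

Run against E: or F: during the backup window, any failures it records could be lined up against the timestamps in the Oracle alert logs and the Filer messages.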

Any and all comments appreciated.

Thanks,
Neil Stichbury
Technical Support
Gen-i Limited
New Zealand

Fri Feb 13 20:07:19 2004
KCF: write/open error block=0x404c online=1
     file=8 E:\ORACLE\BWP\SAPDATA2\ROLL_1\ROLL.DATA1
     error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 64) The specified network name is no longer available.'
Fri Feb 13 20:07:19 2004
Errors in file E:\oracle\BWP/saptrace/background\bwpLGWR.TRC:
ORA-00345: Message 345 not found; product=RDBMS; facility=ORA
; arguments: [199] [55]
ORA-00312: Message 312 not found; product=RDBMS; facility=ORA
; arguments: [2] [1] [E:\ORACLE\BWP\ORIGLOGB\LOG_G12M1.DBF]
ORA-27070: Message 27070 not found; product=RDBMS; facility=ORA
OSD-04016: Error queuing an asynchronous I/O request.

Thu Feb 12 20:05:52 2004
LGWR: terminating instance due to error 340
Thu Feb 12 20:05:52 2004
KCF: write/open error block=0xdda6 online=1
     file=2 E:\ORACLE\D46\SAPDATA1\ROLL_1\ROLL.DATA1
     error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 64) The specified network name is no longer available.'
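
To compare the timing of these errors across the seven servers and against the backup window, the alert-log entries can be reduced to (time, OS error, file) tuples. The sketch below assumes only the entry layout visible in the excerpts above (a timestamp line followed by the message text); the function and pattern names are hypothetical, and whitespace is collapsed first so that wrapped lines do not matter.

```python
import re
from datetime import datetime

# Alert-log timestamps in the excerpts look like "Fri Feb 13 20:07:19 2004".
STAMP = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
)
OS_ERROR = re.compile(r"\(OS (\d+)\)")   # e.g. "O/S-Error: (OS 64)"
DATAFILE = re.compile(r"file=\d+ (\S+)")  # e.g. "file=8 E:\ORACLE\..."


def os_error_events(text):
    """Return [(datetime, os_error_code, file_path_or_None), ...]."""
    flat = " ".join(text.split())  # undo line wrapping and padding
    stamps = list(STAMP.finditer(flat))
    events = []
    for index, match in enumerate(stamps):
        # An entry runs from its timestamp to the next timestamp.
        end = stamps[index + 1].start() if index + 1 < len(stamps) else len(flat)
        body = flat[match.end():end]
        error = OS_ERROR.search(body)
        if error is None:
            continue  # entry carries no O/S-Error line
        when = datetime.strptime(match.group(0), "%a %b %d %H:%M:%S %Y")
        path = DATAFILE.search(body)
        events.append((when, int(error.group(1)), path.group(1) if path else None))
    return events
```

Feeding each server's alert log through `os_error_events` and sorting the combined results by time would show whether the OS 64 errors cluster within the same few seconds on all hosts.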