Toasters,
The following is a description of a problem that occurred over three nights
last week. Although vendors are involved in attempting to determine the
cause and suggest 'recommended practice' changes, there may be someone out
there who has experienced a similar issue.
What has been observed is:
- Oracle unable to write to a redo log file and its mirror on drive
E: or F:, logged error: "O/S-Error: (OS 64) The specified
network name is no longer available"
- The Oracle Log Writer terminates the Oracle instance.
- The service "OracleService<SID>" terminates.
On one occasion this occurred on seven Servers at the same time, and on
other nights on a smaller number of Servers. Backups across the Network
have been taking place at this time and we are therefore led to believe
that this is load related.
There are seven Windows-based Servers running SAP/Oracle. Five are
Win2K/SP3, one is Win2K/SP4, and one is NT/SP6a.
Oracle on NT is 7.2.4
Oracle on Win2K is 8.1.7
SAP used is 1 x 3.1i (NT), 3 x 4.6c, 3 x 3.0b
These servers use data drives mapped off an F820 with 3TB on 3 x DS-14
shelves. Eight volumes are used with all CIFS shares in qtrees. Logs and
executables are kept on mapped drives not local disk. The systems are
attached to a Foundry (Bigiron400) via 1GB fibre. The F820 connects to the
Foundry via trunked dual 1GB fibre (dual single port cards). The Filer
appears to check out OK with just uptime messages in the logs at this time
with no link-loss or any other indication. Netdiag -v reports only on small
average packet size for a couple of hosts and a small number of
retransmissions. Ifstat looks good. The Foundry appears to check out ok
with no indication of packet loss or similar.
There are a number of error messages in the Oracle logs. I have listed a
couple of these below. What is interesting is that at no time has any of
these machines indicated redirector problems in the event logs at the time
of the event with the only event being the termination of the Oracle
service.
Some points.
iSCSI may be looked at in the future. It is not an option at the moment.
The automatic hourly snapshots on two volumes are being removed. All
future snapshots will be scheduled.
At first glance this looks like some sort of redirector issue where Oracle
is sensitive to mapped drive loss. In one log Oracle places a log entry in
a log on a drive that it is reporting the loss of. If this is the case then
why does Win2K not report drive loss? Would Oracle not be responding to an
SMB indication of drive loss?
If the redirector is being hit and not itself at fault then what
infrastructure-related issue might cause this?
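One way to check whether the mapped drive itself drops out under load,
independently of Oracle, would be a small write probe run on an affected
server during the backup window. The following is only a sketch under
assumptions: the function names, the probe file name and the interval are
made up for illustration, and a synchronous write from Python will not
exercise the same asynchronous I/O path that Oracle uses. On Windows,
Python exposes the Win32 error code as the winerror attribute of OSError,
so an "OS 64" failure should show up there if the probe hits it.

```python
import os
import time


def probe_write(directory, name="probe.tmp", payload=b"x" * 4096):
    """Write, flush and fsync a small file, then delete it.

    Returns (ok, error_text). 'name' and 'payload' are hypothetical
    defaults chosen for this sketch.
    """
    path = os.path.join(directory, name)
    try:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        os.remove(path)
        return True, ""
    except OSError as exc:
        # winerror only exists on Windows; fall back to errno elsewhere.
        code = getattr(exc, "winerror", None) or exc.errno
        return False, f"{code}: {exc.strerror}"


def run_probe(directory, interval_s=1.0, count=5, log=print):
    """Probe 'directory' 'count' times and return the number of failures."""
    failures = 0
    for _ in range(count):
        ok, err = probe_write(directory)
        if not ok:
            failures += 1
            # Same timestamp layout as the Oracle alert log, for easy matching.
            stamp = time.strftime("%a %b %d %H:%M:%S %Y")
            log(f"{stamp} FAIL {directory} {err}")
        time.sleep(interval_s)
    return failures
```

Pointing run_probe at a directory on E: or F: with a large count would give
a timestamped list of failures to compare against the Oracle errors and the
backup schedule.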
Any and all comment appreciated.
Thanks,
Neil Stichbury
Technical Support
Gen-i Limited
New Zealand
Fri Feb 13 20:07:19 2004
KCF: write/open error block=0x404c online=1
file=8 E:\ORACLE\BWP\SAPDATA2\ROLL_1\ROLL.DATA1
error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 64) The specified network name is no longer available.'
Fri Feb 13 20:07:19 2004
Errors in file E:\oracle\BWP/saptrace/background\bwpLGWR.TRC:
ORA-00345: Message 345 not found; product=RDBMS; facility=ORA
; arguments: [199] [55]
ORA-00312: Message 312 not found; product=RDBMS; facility=ORA
; arguments: [2] [1] [E:\ORACLE\BWP\ORIGLOGB\LOG_G12M1.DBF]
ORA-27070: Message 27070 not found; product=RDBMS; facility=ORA
OSD-04016: Error queuing an asynchronous I/O request.
Thu Feb 12 20:05:52 2004
LGWR: terminating instance due to error 340
Thu Feb 12 20:05:52 2004
KCF: write/open error block=0xdda6 online=1
file=2 E:\ORACLE\D46\SAPDATA1\ROLL_1\ROLL.DATA1
error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 64) The specified network name is no longer available.'
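
To line these events up across the seven servers and against the backup
schedule, the alert logs could be scanned for the "OS 64" error and the
timestamp and file that precede it. The following is a rough sketch only:
it assumes the log layout seen in the excerpts above (a timestamp line,
then a "file=" line, then the O/S-Error line), and the function name and
patterns are my own, not anything supplied by Oracle.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts above, e.g. "Fri Feb 13 20:07:19 2004".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
TS_RE = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})")
FILE_RE = re.compile(r"^file=\d+\s+(\S+)")
OS64_RE = re.compile(r"O/S-Error: \(OS 64\)")

SAMPLE = r"""
Fri Feb 13 20:07:19 2004
KCF: write/open error block=0x404c online=1
file=8 E:\ORACLE\BWP\SAPDATA2\ROLL_1\ROLL.DATA1
error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 64) The specified network name is no longer available.'
Thu Feb 12 20:05:52 2004
KCF: write/open error block=0xdda6 online=1
file=2 E:\ORACLE\D46\SAPDATA1\ROLL_1\ROLL.DATA1
error=27070 txt: 'OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 64) The specified network name is no longer available.'
"""


def find_os64_events(lines):
    """Return [(timestamp, file_or_None)] for each OS 64 error line."""
    events = []
    current_ts = None
    current_file = None
    for raw in lines:
        line = raw.strip()
        ts_match = TS_RE.match(line)
        if ts_match:
            # A new timestamp starts a new log entry; forget the old file.
            current_ts = datetime.strptime(ts_match.group(1), TS_FORMAT)
            current_file = None
            continue
        file_match = FILE_RE.match(line)
        if file_match:
            current_file = file_match.group(1)
            continue
        if OS64_RE.search(line) and current_ts is not None:
            events.append((current_ts, current_file))
    return events


if __name__ == "__main__":
    for when, path in find_os64_events(SAMPLE.splitlines()):
        print(when.isoformat(), path)
```

Running the same function over each server's alert log and sorting the
combined results by timestamp should show whether the failures cluster
within the same few seconds on every host.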