Snapdrive failing to create a flexclone but only on Sundays. Works fine every other day of the week. - toasters

27 Feb 2012


      Long story short we have a clone of our production database that we refresh and remount on a linux server every night.  This is a script that is being called from cron.  We recently converted the server that runs this script from a physical server to a vmware vm.  Since we have done that this refresh process has failed the past 3 sunday nights.  It runs every single night and it runs without errors every other night.
There's not much to the script, here's the command that creates the flex clone:
snapdrive snap connect -lun log/disk disk_$LUN -snapname filer:/vol/database_volume/log/disk:$SNAPNAME
them we mount the flex clone that is create via iSCSI.
The error that is being reported by the script is:
385  connecting filer:/vol/database_volume/log/disk:
    386     creating unrestricted volume clone filer:/vol/clone_volume ... failure
    387 0002-098 Admin error: Failed to create volume clone clone_volume on filer filer : The volume is not currently available
The error from the sd-trace.log:
173338 00:30:54 02/26/12 [f7f61b90]i,2,18,FilerZephyr::volume_clone_create: zapi filer root "" volume-clone-create parent-snapsho        t daily-2012-02-26 parent-volume database_volume space-reserve none volume planningdb_rootdb_log
 173339 00:30:54 02/26/12 [f7f61b90]i,2,18,FilerZephyr::volume_clone_create in = <volume-clone-create>
 173340         <parent-volume>database_volume</parent-volume>
 173341         <parent-snapshot>daily-2012-02-26</parent-snapshot>
 173342         <volume>clone_volume</volume>
 173343         <space-reserve>none</space-reserve>
 173344 </volume-clone-create>
 173345 
 173346 00:30:54 02/26/12 [f7f61b90]i,2,18,FilerZephyr::output in = <results reason="The volume is not currently available" statu        s="failed" errno="13156"/>
 173347 
 173348 00:30:54 02/26/12 [f7f61b90]E,2,18,FilerZephyr:: (filer) volume-clone-create failed, error: 13156: The volume is not currently available
And then what's odd is if you look at the messages file on the filer, it's reporting as if the clone was successfully created:
Sun Feb 26 00:29:28 MST [filer: wafl.snap.delete:info]: Snapshot copy 2012-02-24 07_00_27 daily_filer_database_volume.-.log on volume database_volume NetApp was deleted by the Data ONTAP function zapi_snapshot_delete. The unique ID for this Snapshot copy is (111, 4139408).
Sun Feb 26 00:30:56 MST [filer: wafl.volume.clone.created:info]: Volume clone clone_volume of volume database_volume was created successfully.
Sun Feb 26 00:30:56 MST [filer: lun.newLocation.offline:warning]: LUN /vol/clone_volume/log/disk has been taken offline to prevent map conflicts after a copy or move operation.
Sun Feb 26 00:30:56 MST [filer: lun.map:info]: LUN /vol/clone_volume/disk was mapped to initiator group linux-server.domain.com_iscsi_SdIg=0
The only thing I'm wondering is if that zapi_snapshot_delete that is happening 1 second before could be causing an issue however I looked at the logs from the past sunday failure and that zapi_snapshot_delete did not happen at the same time so I'm not sure that's the issue.
Any help would be really appreciated.
Thanks,
Dan