You might be able to write a bash script to convert the names back, but I'm not very familiar with the different text encodings. It's possible that the damage is one-way and that the information needed to translate the names to Unicode is simply gone. Another option would be to reshare the data on the NAS over CIFS and redo the migration using another method: either mount the share with a CIFS mount on a Unix host and try a Unix copy, or mount the new Unix target on a Windows machine and use robocopy. One of those might work.
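Before deciding between a conversion script and a re-copy, it may help to measure how many names on the destination are affected. Here is a minimal sketch, in Python rather than bash, that walks a tree and lists entries whose names are not valid UTF-8. The function names and the path /srv/migrated are placeholders I made up, not anything from this thread, and names that are plain ASCII (like the samples quoted below) will pass this check even if they are not the original names.

```python
import os

def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def find_non_utf8_names(root: bytes) -> list:
    """Collect full paths under `root` whose final component is not UTF-8."""
    suspects = []
    # Passing a bytes path makes os.walk yield raw bytes names,
    # so no decoding happens before we inspect them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                suspects.append(os.path.join(dirpath, name))
    return suspects

if __name__ == "__main__":
    # Placeholder path (an assumption); substitute the real mount point.
    for path in find_non_utf8_names(b"/srv/migrated"):
        print(path)
```

An empty result only means every name is well-formed UTF-8; it does not prove the names match what CIFS clients originally saw.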

On Fri, Dec 12, 2014 at 7:01 AM, <scl@virginia.edu> wrote:

Hi folks,

We recently copied some Netapp volumes to a
non-Netapp file server via NFS and rsync.
Unfortunately, in doing so we altered a lot
of file names that were created via CIFS
on the Netapp using a variety of character
sets, such as Arabic, Cyrillic, Persian, etc.

As I understand it, a Netapp keeps two file
names for each file/folder. The CIFS filename
is 16-bit Unicode while the NFS filename is
"literal" 8-bit characters (i.e., not
UTF-8).

I presume that when a file is created on the
Netapp via CIFS, the Netapp creates both
the CIFS filename and an NFS filename. When we
copied the volumes via NFS and rsync, the
NFS names were copied over, so we "lost" the
CIFS names.

Is there any algorithm or procedure that we
can use to convert these 8-bit "NFS" names
to their equivalent 16-bit Unicode names?
Ultimately we need to convert to UTF-8 which
is what the destination file server uses
for file names (both NFS and CIFS use the
same UTF-8 filename).
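If an 8-bit NFS name really does hold the original characters in some legacy code page, a conversion to UTF-8 could be sketched as below. That premise is an assumption; nothing in this message confirms it. The codec list is also a guess based on the languages mentioned (cp1256 for Arabic and Persian, cp1251, iso8859_5 and koi8_r for Cyrillic), and the function names are hypothetical. Several codecs will usually accept the same bytes, so a person still has to pick the decoding that reads correctly.

```python
# Candidate legacy code pages. This list is a guess, not something
# confirmed for these volumes.
CANDIDATE_CODECS = ("cp1256", "cp1251", "iso8859_5", "koi8_r")

def candidate_decodings(raw: bytes, codecs=CANDIDATE_CODECS) -> dict:
    """Map codec name -> decoded text for every codec that accepts `raw`."""
    results = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            # This codec has no mapping for some byte; skip it.
            continue
    return results

def to_utf8_name(raw: bytes, codec: str) -> bytes:
    """Re-encode an 8-bit name as UTF-8 once a codec has been chosen."""
    return raw.decode(codec).encode("utf-8")
```

This cannot help with names that the filer generated rather than transcoded, because those bytes no longer carry the original characters.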

We have turned off CIFS access to the Netapps,
and I would like to avoid turning it back on,
but I may have no other way to retrieve the
original 16-bit Unicode names.

The Netapp volumes are Unix security style and
were accessed both by NFS and CIFS. Some
files were originally created via NFS, some
by CIFS. And unfortunately, some CIFS filenames
used non-English character sets.

Here are some sample NFS filenames. I have no idea
what the corresponding CIFS filenames are or what
character set they use.

:31K4J00 :44O8K00 :5J0TA00 :8QLDT00 :BQLDT00
:EA0IL00 :FA0IL00

These look like the result of a hash, so there
may be no way to convert back to CIFS 16-bit Unicode.
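To get a list of the names that need attention, one could match on the shape of the samples above. This sketch assumes the pattern is a colon followed by seven uppercase letters or digits, which is inferred from those seven samples only and is not a documented Netapp format. The function names are hypothetical.

```python
import re

# Inferred from the seven sample names only: a colon followed by
# seven uppercase letters or digits. Not a documented format.
GENERATED_NAME = re.compile(r":[0-9A-Z]{7}")

def looks_generated(name: str) -> bool:
    """Return True if the whole name matches the inferred pattern."""
    return GENERATED_NAME.fullmatch(name) is not None

def split_names(names):
    """Partition names into (generated_looking, others), keeping order."""
    generated, others = [], []
    for name in names:
        (generated if looks_generated(name) else others).append(name)
    return generated, others
```

Names flagged this way carry no visible trace of the original characters, so their CIFS names would have to be looked up on the source rather than computed.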

Steve Losen scl@virginia.edu

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters