Also note that by rejecting invalid UTF-16 surrogate pairs, you harden your system:

http://en.wikipedia.org/wiki/UTF-16/UCS-2
Because the most commonly used characters are all in the Basic Multilingual Plane, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software (e.g. CVE-2008-2938, CVE-2012-2135).
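
To make the check concrete, here is a minimal Python sketch of what "validating surrogate pairs" amounts to. The function names (find_invalid_surrogates, utf16_units) are my own, and this only illustrates the UTF-16 rule that a high surrogate (0xD800-0xDBFF) must be immediately followed by a low surrogate (0xDC00-0xDFFF). I do not know how Data ONTAP implements it internally.

```python
def find_invalid_surrogates(units):
    """Return the indexes of unpaired surrogates in a sequence of 16-bit code units."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair: skip both units
                continue
            bad.append(i)  # high surrogate not followed by a low surrogate
        elif 0xDC00 <= u <= 0xDFFF:
            bad.append(i)  # low surrogate with no high surrogate before it
        i += 1
    return bad


def utf16_units(text):
    """Encode a str into a list of UTF-16 code units, passing surrogates through."""
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[j:j + 2], "little") for j in range(0, len(raw), 2)]
```

A name made only of BMP characters and well-formed pairs yields an empty list; any index returned points at a code unit that a validating system would treat as an invalid file name.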

AFAIK, the only way invalid characters would have 'entered' your system(s) is by people trying to hack it, similar to what's described here:

http://en.wikipedia.org/wiki/UTF-8

Invalid byte sequences

Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be prepared for: [...]

Many earlier decoders would happily try to decode these. Carefully crafted invalid UTF-8 could make them either skip or create ASCII characters such as NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security validations in high profile products including Microsoft's IIS web server and Apache's Tomcat servlet container.
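
As a small illustration of the "create ASCII characters such as slash" trick, here is a sketch using Python's built-in strict UTF-8 decoder. The helper name is_valid_utf8 is mine. The bytes C0 AF are an overlong two-byte encoding of "/" (0x2F), which a lenient decoder might turn into a slash after path checks have already run; a strict decoder rejects it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (strict decoding)."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "plain slash": b"/",
    "overlong slash": b"\xc0\xaf",          # overlong encoding of U+002F
    "encoded surrogate": b"\xed\xa0\x80",   # U+D800 encoded as if it were a character
    "truncated sequence": b"\xe2\x82",      # first two bytes of the euro sign
}

for label, data in samples.items():
    print(label, is_valid_utf8(data))
```

Only the first sample is accepted; the other three are exactly the kinds of ill-formed input a decoder is expected to refuse.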

RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence."
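
On the original question of gauging exposure: one rough approach would be to walk the shares from a client and flag any names that contain supplementary characters or unpaired surrogates. The sketch below is only that, a sketch. The names classify_name and scan_tree are made up, and how a client presents file names depends on the OS and protocol (on POSIX, for example, Python maps undecodable bytes to lone surrogates, so a hit there does not necessarily mean a bad UTF-16 pair on the filer). Treat the output as a list of names to look at, not a verdict.

```python
import os


def classify_name(name: str) -> str:
    """Classify a file name by the kind of code points it contains."""
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in name):
        return "unpaired-surrogate"  # lone surrogate code point in the name
    if any(ord(ch) > 0xFFFF for ch in name):
        return "supplementary"       # needs a surrogate pair in UTF-16
    return "bmp-only"


def scan_tree(root: str):
    """Yield (path, kind) for every entry under root that is not BMP-only."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            kind = classify_name(name)
            if kind != "bmp-only":
                yield os.path.join(dirpath, name), kind
```

Something like `for path, kind in scan_tree("/mnt/some_share"): print(kind, ascii(path))` (the mount point is a placeholder) would list the candidates. Names reported as "supplementary" are well-formed and should still be accepted after the upgrade; it is the unpaired ones that would be rejected on create.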


Hope that helps

Sebastian



On 4/25/2014 8:40 PM, Philbert Rupkins wrote:
Excellent! Thanks for the feedback! I doubt we will see any issues, but it surely is an interesting change from NetApp. It's difficult for me to say what we are and are not using in terms of file names on our systems. That makes it difficult to reassure my superiors that we won't run into issues.

It looks like the UTF-16 supplementary characters are primarily used in Chinese and Japanese personal names, so any risk of issues for us is very small. Then again, I have no idea what kind of language sets applications are using, so I just want to make sure I'm taking this new restriction around UTF-16 supplementary characters into consideration.
 
Thanks again for the feedback!


On Fri, Apr 25, 2014 at 1:30 PM, Michael Garrison <mcgarr@umich.edu> wrote:
We're running two 7-mode FAS6240 pairs on 8.1.4P1 with numerous CIFS
shares (300+ volumes, ~400TB of space) and I have not heard a single
case of this causing problems since we upgraded a while back.

--
Mike Garrison

On Fri, Apr 25, 2014 at 2:07 PM, Philbert Rupkins
<philbertrupkins@gmail.com> wrote:
> Hello Toasters,
>
> Has anybody upgraded to ONTAP 8.1.4 and made considerations for the new way
> ONTAP handles UTF-16 Supplementary characters?   If so, how did you go about
> evaluating your exposure to issues with the new way the UTF-16 Supplementary
> characters are handled?
>
> The 8.1.4 release notes state the following:
>
> -------------------
>
> Change in how Data ONTAP handles file names containing UTF-16
> supplementary characters
>
>
> Starting with Data ONTAP 8.1.4, there is a change in how Data ONTAP handles
> file names containing UTF-16 supplementary characters that you must be aware
> of if your environment uses such file names. Unicode character data is
> typically represented in Windows applications using the 16-bit Unicode
> Transformation Format (UTF-16). Characters in the basic multilingual plane
> (BMP) of UTF-16 are represented as single 16-bit code units. Characters in
> the additional 16 supplementary planes are represented as pairs of 16-bit
> code units that are referred to as surrogate pairs. When you create file
> names on the storage system that contain supplementary characters, Data
> ONTAP checks the surrogate pairs. If they are valid, Data ONTAP accepts the
> file name. If they are invalid, Data ONTAP now rejects the file name and
> returns an invalid file name error.
>
> ---------------------------------
>
> Any thoughts or guidance would be appreciated. I also have a ticket with
> NetApp support. The first gentleman I spoke with said he didn't know much
> about this issue despite his having worked with several customers running
> 8.1.4. He doesn't have any good recommendations for assessing our
> environment's exposure to UTF-16 supplementary characters and thinks it is
> generally a low-risk concern.
>
> Thanks!
> -Phil
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>


