Hello Toasters,
Has anybody upgraded to ONTAP 8.1.4 and taken into account the new way ONTAP handles UTF-16 supplementary characters? If so, how did you go about evaluating your exposure to issues with the new handling?
The 8.1.4 release notes state the following:
-------------------
*Change in how Data ONTAP handles file names containing UTF-16 supplementary characters*
Starting with Data ONTAP 8.1.4, there is a change in how Data ONTAP handles file names containing UTF-16 supplementary characters that you must be aware of if your environment uses such file names. Unicode character data is typically represented in Windows applications using the 16-bit Unicode Transformation Format (UTF-16). Characters in the basic multilingual plane (BMP) of UTF-16 are represented as single 16-bit code units. Characters in the additional 16 supplementary planes are represented as pairs of 16-bit code units that are referred to as surrogate pairs. When you create file names on the storage system that contain supplementary characters, Data ONTAP checks the surrogate pairs. If they are valid, Data ONTAP accepts the file name. If they are invalid, Data ONTAP now rejects the file name and returns an invalid file name error.
---------------------------------
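For anyone who wants to see what "valid" and "invalid" surrogate pairs mean in practice, here is a minimal Python sketch of the well-formedness rule described above: a high surrogate (0xD800-0xDBFF) must be followed immediately by a low surrogate (0xDC00-0xDFFF), and a low surrogate must never appear on its own. The function names are made up for illustration, and this is only the general UTF-16 rule, not a statement about how Data ONTAP implements its check.

```python
def to_code_units(text):
    """Encode a Python string as a list of 16-bit UTF-16 code units.

    'surrogatepass' lets lone surrogates through so they can be inspected.
    """
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def surrogate_pairs_valid(units):
    """Return True if every surrogate code unit is part of a proper pair."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High (lead) surrogate: the next unit must be a low surrogate.
            if i + 1 >= n or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low (trail) surrogate with no preceding high surrogate.
            return False
        else:
            # Ordinary BMP code unit.
            i += 1
    return True


if __name__ == "__main__":
    # U+20BB7 is a supplementary-plane CJK character.
    print([hex(u) for u in to_code_units("\U00020BB7")])
    print(surrogate_pairs_valid([0xD842, 0xDFB7]))  # properly paired
    print(surrogate_pairs_valid([0xD842, 0x0041]))  # high surrogate, then 'A'
```

A file name whose code units pass this check is the kind the release notes say is accepted; one that fails is the kind that would now be rejected.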
Any thoughts or guidance would be appreciated. I also have a ticket open with NetApp support. The first gentleman I spoke with said he didn't know much about this issue, despite having worked with several customers running 8.1.4. He doesn't have any good recommendations for assessing our environment's exposure to UTF-16 supplementary characters and thinks it is generally a low-risk concern.
Thanks! -Phil
We're running two 7-mode FAS6240 pairs on 8.1.4P1 with numerous CIFS shares (300+ volumes, ~400TB of space) and I have not heard a single case of this causing problems since we upgraded a while back.
-- Mike Garrison
On Fri, Apr 25, 2014 at 2:07 PM, Philbert Rupkins philbertrupkins@gmail.com wrote:
> [...]
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
Excellent! Thanks for the feedback! I doubt we will see any issues, but it surely is an interesting change from NetApp. It's difficult for me to say what we are and are not using in terms of file names on our systems. That makes it difficult to reassure my superiors that we won't run into issues.
It looks like the UTF-16 supplementary characters are primarily used in Chinese and Japanese personal names, so any risk of issues for us is very small. Then again, I have no idea what kind of language sets our applications are using, so I just want to make sure I'm taking this new restriction around UTF-16 supplementary characters into consideration.
Thanks again for the feedback!
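One way to get a rough picture of exposure would be to walk a share from a client and flag any names that contain supplementary characters or unpaired surrogates. The Python sketch below assumes the share is mounted or mapped somewhere on the client; the root path is a placeholder and the function names are hypothetical. It relies on the fact that Python strings hold code points, so a valid supplementary character shows up as a single code point above U+FFFF, while any code point in the range U+D800-U+DFFF is an unpaired surrogate (on POSIX clients these also appear when a name contains bytes that cannot be decoded). How a given client presents names may differ from how the filer stores them, so treat the result as an indication only.

```python
import os


def classify_name(name):
    """Classify a file or directory name by the characters it contains."""
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in name):
        # A surrogate code point inside a Python str is always unpaired.
        return "lone-surrogate"
    if any(ord(ch) > 0xFFFF for ch in name):
        # Valid character outside the basic multilingual plane.
        return "supplementary"
    return "bmp-only"


def scan_tree(root):
    """Yield (path, kind) for every name under root that is not BMP-only."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            kind = classify_name(name)
            if kind != "bmp-only":
                yield os.path.join(dirpath, name), kind


if __name__ == "__main__":
    # Placeholder path: point this at a mounted share on your client.
    for path, kind in scan_tree("/mnt/share"):
        # repr() keeps unpaired surrogates printable.
        print(kind, repr(path))
```

Names reported as "supplementary" are valid and should still be accepted; names reported as "lone-surrogate" are the ones worth a closer look before upgrading.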
On Fri, Apr 25, 2014 at 1:30 PM, Michael Garrison mcgarr@umich.edu wrote:
> [...]
Also note that by rejecting invalid UTF-16 surrogate pairs, you harden your system:
http://en.wikipedia.org/wiki/UTF-16/UCS-2
"Because the most commonly used characters are all in the Basic Multilingual Plane, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software (e.g. CVE-2008-2938, CVE-2012-2135)."
The only way I can see that invalid characters could have entered your system(s), AFAIK, is through people trying to hack your system, similar to what's described here:
http://en.wikipedia.org/wiki/UTF-8
Invalid byte sequences
Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be prepared for:
* the red invalid bytes in the above table
* an unexpected continuation byte
* a start byte not followed by enough continuation bytes
* an Overlong Encoding as described above
* A 4-byte sequence (starting with 0xF4) that decodes to a value greater than U+10FFFF
Many earlier decoders would happily try to decode these. Carefully crafted invalid UTF-8 could make them either skip or create ASCII characters such as NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security validations in high profile products including Microsoft's IIS web server and Apache's Tomcat servlet container.
RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence."
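To make the cases listed above concrete, here is a small Python sketch that feeds a few ill-formed byte sequences to a strict UTF-8 decoder. It uses Python's built-in codec with its default strict error handling; the helper name and the sample byte strings are my own picks for illustration and are not taken from any of the CVEs mentioned.

```python
def is_valid_utf8(data):
    """Return True if data decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical samples, one per category of ill-formed input.
SAMPLES = {
    "plain slash": b"/",
    "overlong encoding of slash": b"\xc0\xaf",
    "unexpected continuation byte": b"\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
    "value above U+10FFFF": b"\xf4\x90\x80\x80",
    "UTF-8 encoded surrogate": b"\xed\xa0\x80",
}

if __name__ == "__main__":
    for label, raw in SAMPLES.items():
        print(label, raw, is_valid_utf8(raw))
```

A strict decoder refuses every sample except the plain slash, which is the behaviour RFC 3629 asks for. The overlong slash is the classic example of input that a lenient decoder could turn into a path separator after a security check has already run.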
Hope that helps
Sebastian
On 4/25/2014 8:40 PM, Philbert Rupkins wrote:
> [...]