Hi,
I am working with a group maintaining a multi-platform command-line utility and the question has come up: are there any Mac users working in a locale that doesn't use Unicode (e.g. one of the JIS encodings)?
On one hand, this appears possible given macOS's BSD origins: like FreeBSD, NetBSD, etc., it doesn't define the __STDC_ISO_10646__ C macro, which implies that a wide character (wchar_t) can't be assumed to hold the Unicode code point as its integer value. On the other hand, given the Mac setup/language selection process and the Unicode encoding of filenames in the native filesystems (UTF-8 in particular on APFS), it would seem impractical to use the system in a non-Unicode locale.
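For reference, here is a rough sketch of how we could probe both things at run time. The helper names (wchar_is_unicode, codeset_is_utf8, current_locale_is_utf8) are made up for this mail and are not from any existing code; the only library calls are POSIX nl_langinfo(3) and strcasecmp(3).

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Returns 1 if the implementation promises that wchar_t values are
 * ISO 10646 (Unicode) code points, 0 if it makes no such promise. */
static int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if a codeset name denotes UTF-8. Accepting both spellings
 * is an assumption: platforms report "UTF-8", some tools use "utf8". */
static int codeset_is_utf8(const char *name)
{
    if (name == NULL)
        return 0;
    return strcasecmp(name, "UTF-8") == 0 || strcasecmp(name, "UTF8") == 0;
}

/* Returns 1 if the active LC_CTYPE locale uses UTF-8. The caller is
 * expected to have called setlocale(LC_CTYPE, "") beforehand. */
static int current_locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program would call setlocale(LC_CTYPE, "") once at startup and then current_locale_is_utf8(). A user running under a JIS-family locale would show up here as a codeset name such as "eucJP" or "SJIS" (the exact spelling varies by platform).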
One practical implication is that we would assume the filename argument to open(2)/fopen(3)/etc. has to be UTF-8 encoded, and similarly that readdir(3)/etc. will always return UTF-8 encoded filenames. Depending on the answer, we could either assume the locale is always set correctly and let the default code do a null conversion, or force a conversion to and from Unicode even if the locale is not Unicode (since APFS requires valid Unicode regardless).
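To make the second option concrete, below is a minimal sketch of the kind of check we could run on a filename before handing it to open(2), on the premise stated above that the filesystem wants well-formed UTF-8. The function name is_valid_utf8 is hypothetical, and this only validates; it does not convert from another encoding.

```c
#include <stddef.h>

/* Returns 1 if the NUL-terminated string s is well-formed UTF-8,
 * 0 otherwise. Rejects stray continuation bytes, truncated sequences,
 * overlong encodings, UTF-16 surrogates and values above U+10FFFF. */
static int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        unsigned long cp;
        int extra;

        if (*p < 0x80) {                 /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {
            cp = *p & 0x1F; extra = 1;   /* 2-byte sequence */
        } else if ((*p & 0xF0) == 0xE0) {
            cp = *p & 0x0F; extra = 2;   /* 3-byte sequence */
        } else if ((*p & 0xF8) == 0xF0) {
            cp = *p & 0x07; extra = 3;   /* 4-byte sequence */
        } else {
            return 0;                    /* invalid lead byte */
        }
        p++;

        for (int i = 0; i < extra; i++, p++) {
            /* A NUL here fails the test too, so truncation is caught. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
        }

        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;                    /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                    /* surrogate code point */
        if (cp > 0x10FFFF)
            return 0;                    /* beyond Unicode range */
    }
    return 1;
}
```

With this, a Shift_JIS-encoded name would typically be rejected up front (its lead bytes usually look like stray UTF-8 continuation bytes), and we could then decide whether to convert it or report an error.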
Also, as a side note: do you always apply Unicode NFD normalization to filenames that are generated externally (e.g. by the user, but not through a GUI file dialog box)?