bug-bash

Re: [PATCH] normalization tweaks for macOS


From: Chet Ramey
Subject: Re: [PATCH] normalization tweaks for macOS
Date: Tue, 18 Jul 2023 09:54:57 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 7/17/23 6:12 PM, Grisha Levit wrote:
On Mon, Jul 17, 2023 at 3:29 PM Chet Ramey <chet.ramey@case.edu> wrote:

On 7/7/23 5:05 PM, Grisha Levit wrote:
A few small tweaks for the macOS-specific normalization handling to
handle the issues below:

The issue is that the behavior has to be different between cases where
the shell is reading input from the terminal and gets NFC characters
that need to be converted to NFD (which is how HFS+ and APFS store them)
and when the shell is reading input from a file and doesn't need to (and
should not) do anything with NFD characters.
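A minimal Python sketch of the mismatch, using the standard `unicodedata` module rather than anything from bash: the same visible name is a different code-point sequence in NFC and NFD, so a plain comparison fails until one side is converted. The variable names are illustrative only.

```python
import unicodedata

terminal_input = "caf\u00e9"   # NFC: precomposed U+00E9, as typed at the terminal
on_disk_name = "cafe\u0301"    # NFD: "e" + U+0301 COMBINING ACUTE ACCENT, as HFS+ stores it

print(terminal_input.encode("utf-8"))  # b'caf\xc3\xa9'
print(on_disk_name.encode("utf-8"))    # b'cafe\xcc\x81'
print(terminal_input == on_disk_name)  # False: same text on screen, different code points

# Converting the terminal text to NFD makes the comparison succeed.
print(unicodedata.normalize("NFD", terminal_input) == on_disk_name)  # True
```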

Unicode normalization on macOS has always been a pain in the ass.

This is the basic assumption that drives all the decisions: character input
you get from the terminal is in NFC, and files from the file system (names
and usually contents) are in NFD. That dates from the original links I
posted in my previous message.


NB: while HFS+ stores NFD names, APFS preserves normalization, so we
can get either NFC or NFD text back from readdir.

Well, that doesn't help. But I haven't seen any NFC text coming back from
readdir on any of my macs.

Currently, Bash never actually converts to NFD.  The fnx_tofs()
function is there but it is never used.  Instead, Bash converts
filenames to NFC with fnx_fromfs() before comparing with either the
glob pattern or the completion hint text (which is never converted).
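The one-way scheme described here can be approximated in Python as below. `fs_to_nfc()` is a hypothetical stand-in for `fnx_fromfs()` built on `unicodedata.normalize()`, and `fnmatch.fnmatchcase()` stands in for bash's own glob matcher; neither is bash's C code. Only the name returned by readdir() is normalized, so a pattern containing NFD text has nothing it can match.

```python
import fnmatch
import unicodedata

def fs_to_nfc(name):
    # Hypothetical stand-in for fnx_fromfs(): bring a filesystem name to NFC.
    return unicodedata.normalize("NFC", name)

def one_way_match(name_from_readdir, pattern):
    # Hypothetical helper: only the filesystem side is normalized;
    # the pattern is compared exactly as it was entered.
    return fnmatch.fnmatchcase(fs_to_nfc(name_from_readdir), pattern)

nfd_name = "cafe\u0301.txt"                    # NFD name as readdir() might return it
print(one_way_match(nfd_name, "caf\u00e9*"))   # True:  NFC pattern (terminal input)
print(one_way_match(nfd_name, "cafe\u0301*"))  # False: an NFD pattern never matches
```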

Correct. It's a one-way conversion, since you only have to convert one
of the two different forms, and the current implementation works on text
entered interactively (which is in NFC). When you're reading a script, you
don't have to perform any conversion at all; your NFD examples all work
fine when run from a script.

Since access is normalization-insensitive, we just need to normalize to
_some_ form, so going to NFC is fine, but if we're going to do that
we should normalize both the filesystem name and the text being
compared.
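Normalizing both sides, as proposed here, might look like the following Python sketch. `two_way_match()` is a hypothetical helper built on `unicodedata` and `fnmatch`, not bash's implementation; NFC is used only because, for a normalization-insensitive comparison, either form would do.

```python
import fnmatch
import unicodedata

def to_nfc(s):
    return unicodedata.normalize("NFC", s)

def two_way_match(name_from_readdir, pattern):
    # Hypothetical helper: normalize the filesystem name AND the pattern
    # to the same form (NFC here) before matching.
    return fnmatch.fnmatchcase(to_nfc(name_from_readdir), to_nfc(pattern))

# APFS preserves normalization, so readdir() may hand back either form.
names = ["cafe\u0301.txt", "caf\u00e9.md"]

for pattern in ("caf\u00e9*", "cafe\u0301*"):   # NFC pattern, then NFD pattern
    print([n for n in names if two_way_match(n, pattern)])  # both names each time
```

One caveat with normalizing the pattern text itself: it may change the meaning of a bracket expression. A set written as `e` followed by U+0301 (two members) composes under NFC into a set containing only U+00E9.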

The idea is that since the text entered interactively at the terminal is
already in NFC, the current implementation converts only what it knows is
coming from the keyboard.

If there's a match, globs expand to the filenames (NFC or NFD) as
returned by readdir(), and Readline completes with NFC-normalized
versions of the names.  I think this makes sense.
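A small Python illustration (using `unicodedata` and `fnmatch`, not bash's code) of the behavior described here: matching is done on NFC forms, glob results keep each name exactly as readdir() returned it, and completions are offered in NFC. The directory listing is made up for the example.

```python
import fnmatch
import unicodedata

def nfc(s):
    return unicodedata.normalize("NFC", s)

# Hypothetical listing with mixed forms, as APFS may return them from readdir().
readdir_names = ["cafe\u0301.txt", "caf\u00e9.md", "other.txt"]
typed = "caf\u00e9"  # NFC text typed at the terminal

# Glob: compare in NFC, but expand to the original readdir() spelling.
glob_result = [n for n in readdir_names if fnmatch.fnmatchcase(nfc(n), typed + "*")]

# Completion: offer NFC-normalized versions of the matching names.
completions = [nfc(n) for n in readdir_names if nfc(n).startswith(typed)]

print(glob_result == ["cafe\u0301.txt", "caf\u00e9.md"])  # True
print(completions == ["caf\u00e9.txt", "caf\u00e9.md"])   # True
```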

Because NFC is what you get from terminal input.

What doesn't work quite right currently, though, is that glob patterns
with NFD text never match anything, and completion prefixes with NFD
text never expand to anything.

When entered from the terminal. It goes back to the basic assumption: NFC
is what you get from the terminal, so you have to convert from the file
system normalization form when you're sure what you want to compare is
coming from the terminal.
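The rule of converting the filesystem name only when the text being compared came from the terminal can be sketched in Python as follows. `matches()` and its `from_terminal` flag are hypothetical stand-ins for however the shell knows it is reading interactive input; for script input, the text and the filenames are assumed to already share the NFD form, so nothing is converted.

```python
import fnmatch
import unicodedata

def matches(name_from_readdir, pattern, from_terminal):
    # Hypothetical flag: True when the pattern was typed interactively.
    if from_terminal:
        # Terminal input is assumed to be NFC, so bring the filesystem name to NFC.
        name_from_readdir = unicodedata.normalize("NFC", name_from_readdir)
    # Script input is compared as-is, assuming it matches the filesystem's form.
    return fnmatch.fnmatchcase(name_from_readdir, pattern)

nfd_name = "cafe\u0301.txt"
print(matches(nfd_name, "caf\u00e9*", from_terminal=True))    # True:  NFC typed at the terminal
print(matches(nfd_name, "cafe\u0301*", from_terminal=False))  # True:  NFD text in a script
print(matches(nfd_name, "cafe\u0301*", from_terminal=True))   # False: NFD typed at the terminal
```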

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/



