Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
From: Hermann Peifer
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Fri, 29 Jan 2016 09:19:37 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
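Below is what I get on Mac OS X 10.10.5, using gawk/master: an error
message rather than a crash.
This seems to be related to the UTF-8 locale: no byte in the given range
(0x80..0xFF) is a valid character on its own in UTF-8. Each one is either
a continuation byte, the lead byte of a multi-byte sequence, or not
allowed at all.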
Hermann
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
$
$ echo 'hät' | gawk '{ gsub(/[\x80-\xFF]/, ""); print }'
gawk: cmd. line:1: error: Invalid collation character: /[�-�]/
$
$ echo 'hät' | LC_ALL=C gawk '{ gsub(/[^\x80-\xFF]/, ""); print }'
ä
On 2016-01-29 1:29, Michael Klement wrote:
> The following, which should return 'ht', crashes:
>
> $ echo 'hät' | gawk '{ gsub(/[\x80-\xFF]/, ""); print }'
> gawk: cmd. line:1: fatal error: internal error
> Abort trap: 6
>
> Its inverse, which should return 'ä', does not:
>
> $ echo 'hät' | gawk '{ gsub(/[^\x80-\xFF]/, ""); print }'
> ä
>
>
> Regards,
>
> Michael