[Lynx-dev] Japanese line wrap patch for UTF-8 terminal
From: KIHARA Hideto
Subject: [Lynx-dev] Japanese line wrap patch for UTF-8 terminal
Date: Sat, 19 Mar 2022 15:33:08 +0900
User-agent: Mutt/1.9.4 (2018-02-28)
The attached patch improves Japanese line wrapping
for the UTF-8 display character set.
* Issue:
When displaying Japanese text with the UTF-8 display character set,
Lynx sometimes breaks a line near its beginning.
These line breaks hinder smooth reading.
* Example:
<p>
Rails 7........(Japanese_text_with_no_spaces)...........
......
</p>
is displayed as:
Rails
7........(Japanese_text_with_no_spaces).................
The expected result is:
Rails 7........(Japanese_text_with_no_spaces)...........
......
Screen captures:
http://www1.interq.or.jp/~deton/lynx-jawrapline/
* Cause:
Lynx breaks lines at spaces, but Japanese text usually has no spaces.
In Japanese, a line break can usually occur
before or after almost any Japanese character, not just at spaces.
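
To illustrate the difference, here is a minimal, self-contained sketch (not Lynx code) that counts break opportunities in a UTF-8 string under both rules. The helper names decode_utf8, is_cj_char and count_break_opportunities are hypothetical, the decoder assumes well-formed input, and the code point ranges are the ones used in the patch in this message.

```c
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at *p and
 * advance *p past it.  Assumes well-formed input; no validation. */
static uint32_t decode_utf8(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t u;
    int extra;

    if (s[0] < 0x80) {
        u = s[0];
        extra = 0;
    } else if ((s[0] & 0xe0) == 0xc0) {
        u = s[0] & 0x1f;
        extra = 1;
    } else if ((s[0] & 0xf0) == 0xe0) {
        u = s[0] & 0x0f;
        extra = 2;
    } else {
        u = s[0] & 0x07;
        extra = 3;
    }
    s++;
    /* Stop early at NUL so a truncated string is never overrun. */
    while (extra-- > 0 && *s != '\0')
        u = (u << 6) | (uint32_t) (*s++ & 0x3f);
    *p = s;
    return u;
}

/* Same code point ranges as the patch in this message. */
static int is_cj_char(uint32_t u)
{
    return (u >= 0x4e00 && u <= 0x9fff)      /* CJK Unified Ideographs */
        || (u >= 0x3000 && u <= 0x30ff)      /* CJK Symbols, Hiragana, Katakana */
        || (u >= 0xff00 && u <= 0xffef)      /* Halfwidth and Fullwidth Forms */
        || (u >= 0x3400 && u <= 0x4dbf)      /* CJK Extension A */
        || (u >= 0xf900 && u <= 0xfaff)      /* CJK Compatibility Ideographs */
        || (u >= 0x20000 && u <= 0x3ffff);   /* Supplementary/Tertiary planes */
}

/* Count the characters after which a line break would be permitted:
 * always after a space, and after a CJ character if allow_after_cj. */
static int count_break_opportunities(const char *text, int allow_after_cj)
{
    const unsigned char *p = (const unsigned char *) text;
    int count = 0;

    while (*p != '\0') {
        uint32_t u = decode_utf8(&p);

        if (u == ' ' || (allow_after_cj && is_cj_char(u)))
            count++;
    }
    return count;
}
```

For a string such as "Rails 7" followed by four Japanese characters, the space-only rule finds a single opportunity (after "Rails"), so the unspaced run has to move to the next line as one unit. With breaks also allowed after CJ characters, the same string has five opportunities.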
* Patch:
This patch permits a line break after any Japanese character.
(It is enabled by the --enable-wcwidth-support configuration option,
and the check is only called on the last byte of a multibyte UTF-8 sequence.)
Note that Lynx already has similar code for the EUC-JP display character set.
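
As a rough illustration of the "last byte" condition, the following standalone sketch (not the Lynx implementation) checks whether the final byte of a buffer completes a multibyte UTF-8 sequence. The names utf8_extra_len and completed_sequence_start are hypothetical; the size comparison mirrors the one in the patch below.

```c
#include <stddef.h>

/* Number of continuation bytes expected after a UTF-8 lead byte
 * (0 for ASCII and for bytes that are not multibyte lead bytes). */
static int utf8_extra_len(unsigned char lead)
{
    if ((lead & 0xe0) == 0xc0)
        return 1;
    if ((lead & 0xf0) == 0xe0)
        return 2;
    if ((lead & 0xf8) == 0xf0)
        return 3;
    return 0;
}

/* If the last byte of buf[0..size) completes a multibyte UTF-8
 * sequence, return the offset of that sequence's lead byte;
 * otherwise return -1. */
static long completed_sequence_start(const unsigned char *buf, size_t size)
{
    size_t start;
    int extra;

    /* The last byte must be a continuation byte (10xxxxxx). */
    if (size == 0 || (buf[size - 1] & 0xc0) != 0x80)
        return -1;

    /* Walk back over continuation bytes to the candidate lead byte. */
    start = size - 1;
    while (start > 0 && (buf[start] & 0xc0) == 0x80)
        start--;

    /* Complete only if the lead byte announces exactly this many bytes. */
    extra = utf8_extra_len(buf[start]);
    if (extra > 0 && size - start == (size_t) extra + 1)
        return (long) start;
    return -1;
}
```

Feeding the bytes of "A" plus one three-byte character one at a time, the function returns -1 until the third byte of the character has arrived, and only then reports the lead byte's offset. That is the point at which a whole character is available to decode and test.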
--- src/GridText.c.orig 2021-12-29 15:28:45.256049180 +0900
+++ src/GridText.c 2022-02-19 16:56:57.749568192 +0900
@@ -605,6 +605,7 @@ static int utfxtra_on_this_line = 0; /*
#ifdef EXP_WCWIDTH_SUPPORT
static int utfxtracells_on_this_line = 0; /* num of UTF-8 extra cells in line */
static int utfextracells(const char *s);
+static void permit_split_after_CJchar(HText *text, const char *s, unsigned short pos);
#endif
#ifdef WIDEC_CURSES
# ifdef EXP_WCWIDTH_SUPPORT /* TODO: support for !WIDEC_CURSES */
@@ -4165,8 +4166,10 @@ void HText_appendCharacter(HText *text,
utff--;
utf_xlen = UTF_XLEN(line->data[utff]);
- if (line->size - utff == utf_xlen + 1) /* have last byte */
+ if (line->size - utff == utf_xlen + 1) { /* have last byte */
utfxtracells_on_this_line +=
utfextracells(&(line->data[utff]));
+ permit_split_after_CJchar(text, &(line->data[utff]), line->size);
+ }
}
#endif
return;
@@ -14965,4 +14968,19 @@ static int utfextracells(const char *s)
}
return result;
}
+
+static void permit_split_after_CJchar(HText *text, const char *s, unsigned short pos)
+{
+ /* Can split after almost any CJ char (Korean uses space) */
+ /* TODO: UAX#14 Unicode Line Breaking Algorithm (use ICU4C?) */
+ UCode_t u = UCGetUniFromUtf8String(&s);
+ if (u >= 0x4e00 && u <= 0x9fff || /* CJK Unified Ideographs */
+ u >= 0x3000 && u <= 0x30ff || /* CJK Symbols and Punctuation, Hiragana, Katakana */
+ u >= 0xff00 && u <= 0xffef || /* Halfwidth and Fullwidth Forms. Fullwidth ?! are often used */
+ /* rare characters */
+ u >= 0x3400 && u <= 0x4dbf || /* CJK Unified Ideographs Extension A */
+ u >= 0xf900 && u <= 0xfaff || /* CJK Compatibility Ideographs */
+ u >= 0x20000 && u <= 0x3ffff) /* {Supplementary,Tertiary} Ideographic Plane */
+ text->permissible_split = pos;
+}
#endif