groff
From: G. Branden Robinson
Subject: Re: UTF-8 in grout and a performance regression (was: synchronous and asynchronous grout)
Date: Thu, 19 Dec 2024 20:45:56 -0600

At 2024-12-19T22:48:41+0100, onf wrote:
> On Thu Dec 19, 2024 at 8:43 PM CET, G. Branden Robinson wrote:
> > At 2024-12-19T20:23:56+0100, onf wrote:
> > > Although looking up Unicode codepoint numbers is arguably better
> > > than seeing gibberish, neither is a particularly good form to work
> > > with. Your reasoning sounds like "making it perfect for most
> > > people would make it horrible for a small minority, so let's
> > > rather keep it bad for everyone".
> >
> > I think you're exhibiting a form of the base rate fallacy here.
> >
> > The number of people who read GNU troff output ("grout"), whether
> > with their eyeballs or with a program they've written, is *tiny*.
> 
> If that's the case, I wonder why you're concerned about a tiny
> fraction of a tiny fraction of people not being able to display those
> characters..?

Because I think a lot of the occurrences of someone staring at grout are
going to come from people attempting to troubleshoot problems.  The very
first time they see the output format may be when they are in a
frustrating situation.  Under such circumstances, a representation
format that will work even over a serial line to a bad DEC VT100
emulator is a good thing to have.  Rendering a blue 🎈 and a brown 💩 is
not a high priority.

Now, it is true that sometimes, the nature of the problem people will be
troubleshooting will in fact have something to do with correct glyph
selection.  Not all the time.  Maybe not even most of the time.  If
you're staring at grout, I suspect output positioning problems or, as
we've seen recently with Peter Schaffter's novel use of `char`, tricky
sequencing issues involving the asynchrony of the command stream are
more likely.

But, some of the time, sure, yes.  In those cases, depicting 🤡s and
🎈s [gratuitous 8-bit microcomputer game reference for the aged] in a
self-representing manner would be nice.  Thus my openness to it being a
dynamically configurable choice.

> > I think a lot of people, even when troubleshooting, fail to consider
> > grout as an inspection site in the first place.  They try to reason
> > from groff input to whatever is emitted by the output driver.  And
> > that works, often enough.
> 
> That's true, but it can come in handy. I did examine the intermediate
> output when trying to understand .ne's behavior, for example.

Yes.  I think we could do more to encourage people to understand the
availability of the grout format as a resource on more occasions, in a
similar way that Dave Kemper finally got the value of "groff -a" output
through my thick skull.

One important step to take here is to get the groff_out(5) man page and
corresponding Texinfo nodes revised to eliminate lingering
idiosyncrasies.  I made another push on the battle front this week.

https://lists.gnu.org/archive/html/groff-commit/2024-12/msg00104.html

An improved example for the Texinfo manual, and a modestly more readable
use of white space in GNU troff's output, would also help.

> > I therefore consider your claim overstated.
> 
> Well, likewise I guess.

Fair enough.

> Yeah, I realized you responded to that only after I sent the message.

Fair enough ×2.

> It wasn't my intention to make you appear argumentative, I simply
> failed to understand part of what you were trying to say. I feel like
> you are ascribing an unreasonable amount of animosity to me.

Would you settle for a _reasonable_ amount?  ;-)

When someone is simultaneously articulate, seems knowledgeable, _and_
doesn't seem to be making sense, I admit I tend to get suspicious.  But
I'm also quick to revise my opinion given further evidence.  And also
quick to forgive errors.  I'd better be--I make mistakes _all_ _the_
_time_, and strive to own up to them, in emails, in commit
messages, and in ChangeLogs.  Humans are fallible creatures.  That's why
we should make our machines robust.

> I just felt like the argument that emitting \[u...] is preferable due
> to some tiny number of people who don't have Unicode-compliant
> terminals was fairly weak, and it would be cool if the output was more
> readable than this:
>   tv
>   Cu0065_030C
>   h5444
>   tt
>   Cvs
>   h4313
>   C'i
> 
> (that's "větší")

I agree.  For situations like that, UTF-8-empowered grout would be an
advantage.
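To make the "větší" case concrete, here is a minimal sketch of how a reader-side tool might turn those grout commands back into legible text. It is written in Python and is not anything groff ships. It assumes only the `t` (typeset text) and `C` (glyph by name) commands matter, discards `h` motions and every other command, decodes groff's `u`-prefixed Unicode glyph names, and uses a hypothetical two-entry table for named glyphs such as `vs` and `'i`. A real tool would need the full groff_char(7) repertoire.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny subset of groff's named special
# characters (see groff_char(7)); a real tool would need the full table.
NAMED_GLYPHS = {
    "vs": "\u0161",  # s with caron
    "'i": "\u00ed",  # i with acute accent
}

# groff Unicode glyph names: 'u' + 4-6 uppercase hex digits, with any
# combining characters appended after underscores, e.g. u0065_030C.
UNICODE_NAME = re.compile(r"u[0-9A-F]{4,6}(?:_[0-9A-F]{4,6})*")


def decode_glyph_name(name: str) -> str:
    """Map a glyph name from a grout 'C' command to Unicode text."""
    if UNICODE_NAME.fullmatch(name):
        try:
            chars = "".join(chr(int(cp, 16)) for cp in name[1:].split("_"))
        except ValueError:  # code point beyond U+10FFFF
            pass
        else:
            # Compose base + combining marks where a precomposed form exists.
            return unicodedata.normalize("NFC", chars)
    # Unknown names stay in groff's escape form rather than being dropped.
    return NAMED_GLYPHS.get(name, f"\\[{name}]")


def glyphs_from_grout(lines) -> str:
    """Concatenate the glyphs set by 't' and 'C' commands, in order."""
    out = []
    for line in lines:
        line = line.strip()
        if line.startswith("t"):
            out.append(line[1:])  # 't' sets the rest of the line as text
        elif line.startswith("C"):
            out.append(decode_glyph_name(line[1:].strip()))
        # 'h' (horizontal motion) and all other commands are ignored here.
    return "".join(out)


# The grout fragment quoted above.
snippet = ["tv", "Cu0065_030C", "h5444", "tt", "Cvs", "h4313", "C'i"]
print(glyphs_from_grout(snippet))  # prints "větší"
```

Leaving unrecognized names in `\[...]` form means nothing is silently lost, which matters for the troubleshooting scenario that motivates reading grout in the first place.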

But *roff is not Markdown.  If the main thing the formatter was doing
was streaming an array of character codes from one place to another, we
wouldn't need most of the features we have.

My assessment, based on some experience troubleshooting and debugging
groff documents, is that accurate glyph selection in formatted text just
is not something we screw up very often.

Accurate glyph selection in _document metadata_, on the other hand, has
been a significant challenge (and a notorious source of inscrutable
diagnostic messages) in groff for a long time, I guess since Deri first
brought us gropdf in 2011 or so and started to demand something the
system didn't originally anticipate.  I _think_ we finally cracked it
this year, and I hope he will get back to me regarding the scenario he
has where performance goes to hell, because I'd prefer not to release
with that deficiency.

> By the way, when you said that some terminals' fonts might be missing
> the relevant glyphs... I have the opposite problem, my groff fonts
> support only a fraction of what my terminal can display.

They're both problems.  Not everybody selects a terminal emulator font
with generous coverage, and I think there are still severe limitations
on the Linux console device.

Not having good coverage font-wise is, largely, the install-font.sh
problem.

https://savannah.gnu.org/bugs/?58831

I've been jammed up on wanting to rewrite (or supplement) that tool in a
form that will be practical for distributions to use, but since I never
get around to it, I now think the better thing to do is ship it, albeit
not in /usr/bin.  But in a place where it will get packaged.  Then
distributions' users will find it and try it, and start asking their
vendors why it isn't well-integrated with the system, and at that point,
maybe, productive conversations with distribution vendors can begin.
I'm a sympathetic counterparty--packaging used to be my job.

Also I know Colin Watson is standing right over there.

Regards,
Branden
