bug-bash


From: Martin D Kealey
Subject: Re: Bash' code to store string of any characters containing pair of "" and '' at once
Date: Sun, 22 Dec 2024 13:45:19 +1000

On Sun, 22 Dec 2024, 03:49 Budi, <budikusasi@gmail.com> wrote:

> How is Bash' code to store string of any characters should be containing
> pair of "" and '' at the same time explicitly, being exact verbatim so,
> ie. cannot be modified, escaped or etc, as expected from ordinary/naive
> human writing), into a variable
>

Any expectation that literal text can be written exactly verbatim (only
adding fixed delimiters before and after) is unrealistic. We can't even do
that in human languages, so doing that in a language intended for computers
to read would make it very hard for humans to write.

If you can abandon that assumption then Greg's answer is quite
comprehensive.

But a nagging question is why does the text contain both ' and "
characters? Is it written in a human language or a computer language?

If it's a human language, consider using “typographic” ‘quotes’ instead of
the much-overloaded ASCII " and ' characters.

If it contains shell script fragments, then consider not storing code in
variables in the first place, but using something more appropriate such as
functions.

If it absolutely has to store shell code in variables, consider arranging
the code in question so that it only uses one kind of quote mark, and use
the other kind to delimit that.

Failing that, by far the "simplest" approach to "literal" text is not to
have it inside the script, but in a separate file that the script can read.
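
A minimal sketch of the separate-file approach, assuming bash. The temporary file and the variable names are hypothetical; the here-document only stands in for a file that would normally be written with an editor.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a hand-written text file next to the script.
textfile=$(mktemp)

# The quoted 'EOF' delimiter stops the shell from expanding the body.
cat > "$textfile" <<'EOF'
let's put "this" text worth $1 in a variable
EOF

# $(< file) reads the whole file without running cat.
# Command substitution strips any trailing newlines.
phrase=$(< "$textfile")
rm -f -- "$textfile"

printf '%s\n' "$phrase"
```

The script never has to encode the text at all, because the shell does not parse the file's content as code.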

If the text must stay inside the script, the mechanically simplest encoding
is to insert \ in front of EVERY character, and NOT use any surrounding
quotes at all. Of course, this looks REALLY ugly and is hard to read, but it's
very easy to think about when you're WRITING.

Let's say you want this:

 let's put "this" text worth $1 in a variable

This becomes:

 phrase=\l\e\t\'\s\ \p\u\t\ \"\t\h\i\s\"\ \t\e\x\t\ \w\o\r\t\h\ \$\1\ \i\n\ \a\ \v\a\r\i\a\b\l\e

If you don't mind a little bit of thinking, you can skip the \ before
alphanumeric characters, leaving:

 phrase=let\'s\ put\ \"this\"\ text\ worth\ \$1\ in\ a\ variable
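
As a quick check, assuming bash, the two encodings can be assigned to two variables and compared. The names `every` and `fewer` are made up for this sketch.

```shell
#!/usr/bin/env bash
# A backslash in front of every character.
every=\l\e\t\'\s\ \p\u\t\ \"\t\h\i\s\"\ \t\e\x\t\ \w\o\r\t\h\ \$\1\ \i\n\ \a\ \v\a\r\i\a\b\l\e

# A backslash only in front of non-alphanumeric characters.
fewer=let\'s\ put\ \"this\"\ text\ worth\ \$1\ in\ a\ variable

# A quoted expansion passed to %s prints the stored value unchanged.
printf '%s\n' "$every" "$fewer"

if [ "$every" = "$fewer" ]; then
    echo "both encodings decode to the same text"
fi
```

Each encoding has to stay on one physical line: a \ at the very end of a line is a line continuation, not an escaped space.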

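If you don't mind even more thinking then you can instead memorize the list
of characters that need \ and skip all the others; they are space, tab, and
"#$'&;<>*()?`\[]| (a newline also needs protecting, but \ followed by a
newline is a line continuation that removes both, so a newline has to be
quoted another way, such as $'\n').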

Importantly \ itself is in that list. Each round of encoding will (at
least) double the number of \ present; $foo -> \$foo -> \\\$foo ->
\\\\\\\$foo etc.
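
The growth can be watched with bash's `printf %q`, used here as a stand-in for encoding by hand. This is a sketch, assuming bash; `%q` picks its own encoding, which for this input should be plain backslash escaping. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
text='$foo'

# Each round re-encodes the output of the previous round.
printf -v round1 '%q' "$text"
printf -v round2 '%q' "$round1"
printf -v round3 '%q' "$round2"

# Print the plain text followed by the three encoded forms.
printf '%s\n' "$text" "$round1" "$round2" "$round3"
```

Every existing \ gains a \ of its own in each round, and the $ gains one more, so the count goes 1, 3, 7.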

 >7 tries to no avail.. so help out solve
>

Using $( echo … ) partially undoes some of the encoding, so it's actively
counterproductive to this task.
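
A small sketch of two ways the round trip through echo can lose information, assuming bash. It is only a guess at what kind of attempt was made; the variable names are made up.

```shell
#!/usr/bin/env bash
# Text with a double space and two trailing newlines.
original=$'two  spaces, then blank lines\n\n'

# Unquoted: the value is split into words and echo rejoins them with
# single spaces; command substitution then strips the trailing newlines.
unquoted=$(echo $original)

# Quoted: the inner spacing survives, but trailing newlines are still stripped.
quoted=$(echo "$original")

printf '[%s]\n' "$original" "$unquoted" "$quoted"
```

A direct assignment such as `copy=$original` has neither problem, which is why the extra echo only gets in the way.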

"Guessing" when there are infinite possibilities is pointless; knowledge of
other programming languages will lead you *away* from what you need to
know, not towards it.

You should *not* expect the code that embodies your text to look *exactly*
like the text unless it's very simple; it will almost always need
modifications. Stop thinking in terms of "quoting", and instead think in
terms of "encoding".

If you want to assign something to a variable, start by looking up
"assignment" in the manual. It says that an assignment looks like

 NAME=WORD

The fact that NAME and WORD are upper-case is important: it means they have
meanings that are defined elsewhere in the manual.

NAME is easy: any sequence of alphanumeric characters plus underscore, as
long as it doesn't start with a digit.

WORD is more complicated, but the important point is that it *ends* just
before the first *unquoted* whitespace or shell metacharacter. It does *NOT*
say that it ends at a "closing quote".

Let me say that again, because it's critically important. The WORD that is
assigned to the variable does *NOT* end just because there's a closing
quote. It continues until just before whitespace or one of ();<>&|

Let that sink in for a bit.
Quotes can be opened and closed multiple times within a single WORD.
Different parts of a single WORD can be unquoted, or quoted in one of 4
different ways ('…', "…", $'…', or a \ before each character).

All of this is just one assignment:

 this_is_a_NAME='everything after the = is a valid WORD, and this part of it is in single quotes 'this-part-is-unquoted" this part is in double quotes"$' this part is in special quotes 'and\ the\ spaces\ in\ this\ part\ are\ escaped\ with\ \ backslashes', which counts as a form of quoting'.

-Martin
