We can use this function within a loop to insert commas between elements
of an array to create a properly formatted CSV record for an associative
array, or for an indexed array like the fields of a line, as illustrated
in the functions rec_to_csv and arr_to_csv:
# rec_to_csv - convert a record to csv
function rec_to_csv(   s, i) {
    for (i = 1; i < NF; i++)
        s = s to_csv($i) ","
    s = s to_csv($NF)
    return s
}

# arr_to_csv - convert an indexed array to csv
function arr_to_csv(arr,   s, i, n) {
    n = length(arr)
    for (i = 1; i <= n; i++)
        s = s to_csv(arr[i]) ","
    return substr(s, 1, length(s)-1) # remove trailing comma
}
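
The excerpt above calls to_csv without showing it. As a minimal sketch,
assuming to_csv simply doubles any embedded double quotes and always wraps
the field in double quotes (that definition is an assumption here, not part
of the quoted text), rec_to_csv could be exercised like this. The csv_demo
wrapper and the tab-separated sample input are hypothetical, chosen only so
the example runs in any POSIX awk without --csv:

```shell
# csv_demo - hypothetical wrapper: read tab-separated records, print them as CSV
csv_demo() {
    awk -F'\t' '
    # to_csv - ASSUMED definition: quote s as a CSV field
    function to_csv(s) {
        gsub(/"/, "\"\"", s)      # escape embedded double quotes by doubling them
        return "\"" s "\""        # always wrap the field in double quotes
    }
    # rec_to_csv - convert the current record to csv (as quoted above)
    function rec_to_csv(   s, i) {
        for (i = 1; i < NF; i++)
            s = s to_csv($i) ","
        s = s to_csv($NF)
        return s
    }
    { print rec_to_csv() }
    '
}

# One record with three tab-separated fields; the first contains a comma and quotes
printf 'foo,"bar"\t2\t3\n' | csv_demo
```

Under that assumption every field comes out quoted, including plain
numbers, so the line printed is "foo,""bar""","2","3". That is valid CSV,
but it is not byte-for-byte what the input quoting would have been.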
-----
Cheers,
Ben.
On Sun, 19 Nov 2023 at 21:37, <arnold@skeeve.com> wrote:
Hi.
I understand what you're saying. I don't have an answer at this point.
I think it would be helpful for you to open an issue on the Github repo
for Brian Kernighan's awk, as CSV handling was his idea. Maybe he can
come up with something.
In any case, opening an issue there will allow for wider discussion
amongst AWK implementors.
Thanks,
Arnold
Ed Morton <mortoneccc@comcast.net> wrote:
> Someone posted a question on stackoverflow about how to print just the
> first 2 fields from a CSV so given this input:
>
> "foo,""bar""",2,3
> 1,"foo,bar",3
> 1,"foo,
> bar",3
>
> the expected output would be:
>
> "foo,""bar""",2
> 1,"foo,bar"
> 1,"foo,
> bar"
>
> I thought I'd answer with "--csv" but when I tried it I got this output:
>
> $ awk --csv -v OFS=',' '{print $1, $2}' file.csv
> foo,"bar",2
> 1,foo,bar
> 1,foo,
> bar
>
> The quotes around the fields that need to be quoted (and were quoted in
> the input) are missing and the escaped double quotes (`""`) around the
> first `bar` have become individual (`"`) so the output is no longer
> valid CSV.
>
> I could get it back to valid CSV and produce the expected output by
> writing this or similar:
>
> $ awk --csv -v OFS=',' '{for (i=1; i<=NF; i++) {
>     gsub(/"/,"\"\"",$i); if ($i ~ /[,\n"]/) { $i="\"" $i "\""} };
>     print $1, $2}' file.csv
> "foo,""bar""",2
> 1,"foo,bar"
> 1,"foo,
> bar"
>
> but that's counter-intuitive and frustrating to have to write and I
> think many users wouldn't know how to, or understand why they need to,
> write that code to get valid CSV output.
>
> I understand there is a benefit to stripping double quotes for working
> on field contents and I appreciate that you need to make this work with
> existing functionality (`OFS` values, etc.) so I understand why `--csv`
> can't simply always output valid CSV and I also understand the "don't
> provide constructs to do things that are easy to do with existing
> constructs" awk mantra to avoid code bloat, but there has to be a way to
> make it easier for people to just print a couple of fields from valid
> CSV input and have the output still be valid CSV.
>
> If there was a way to have `--csv` optionally NOT strip double quotes
> when reading the fields then that'd solve the problem, e.g. `--csv=q` or
> `--csvq` or similar to indicate quotes in and around fields should be
> retained. If we had that then I could write something like:
>
> awk --csv=q -v OFS=',' '{print $1, $2}' file.csv
>
> or, less desirably as it's longer and can't be set on the command line
> but would be better than nothing:
>
> awk --csv -v OFS=',' 'BEGIN{PROCINFO["CSV"]="q"} {print $1, $2}' file.csv
>
> to get the desired output above and there are almost certainly other use
> cases for people wanting to retain the quotes and there is no simple
> alternative today (not using --csv but instead setting FPAT and counting
> double quotes to know if a newline is inside or outside of a field, and
> adding lines to $0 until you have a complete record).
>
> I don't think that would be hard for users to understand or result in
> language bloat or introduce any additional complexity working with
> existing constructs - you simply wouldn't strip quotes when reading the
> input and so they'd still be there when producing output.
>
> Regards,
>
> Ed.