Re: manual section 4.7.1
From: cph1968
Subject: Re: manual section 4.7.1
Date: Tue, 04 Apr 2023 15:04:04 +0000
Thanks Arnold,
I was not aware the --csv option was not officially released yet, but it
works well for me, still.
/Jimmy
On Tue, Apr 4, 2023 at 16:28, <[1]arnold@skeeve.com> wrote:
Thank you for the note.
As the documentation notes, FPAT is only a partial solution for dealing
with CSV data.
The --csv option is not yet released, although of course folks can build
from git and use the result if they wish to.
That section of the manual will be rewritten before gawk 5.3.0 is released.
Thanks,
Arnold
cph1968@proton.me wrote:
> the regex fp[2] in section 4.7.1 (below) doesn't quite cut it if the
> CSV file records end in both CR and NL [0x0D 0x0A]. I believe this
> is a common feature of Windows files.
> A simple fix is however to use the gawk --csv option.
>
> ❯ head -n 2 TSCAINV_022023.csv| gawk -f print-fields.awk
> >ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
> >F = 1 <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
> >1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
> >F = 1 <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
>
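> note here that the closing '>' shows up as the first character of each
> line: the trailing CR returns the cursor to the start of the line, so
> the '>' overwrites whatever was printed there.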
>
> output using the --csv option:
> ❯ head -n 2 TSCAINV_022023.csv| gawk --csv -f print-fields.awk
> <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY>
> NF = 10 <ID><CASRN><casregno><UID><EXP><ChemName><DEF><UVCB><FLAG><ACTIVITY>
> <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE>
> NF = 10 <1><50-00-0><50000><><><Formaldehyde><><><><ACTIVE>
>
> much better :-)
>
> ❯ cat print-fields.awk
> {
>     print "<" $0 ">"
>     printf("NF = %s ", NF)
>     for (i = 1; i <= NF; i++) {
>         printf("<%s>", $i)
>     }
>     print ""
> }
>
>
> from section 4.7.1:
> BEGIN {
>     fp[0] = "([^,]+)|(\"[^\"]+\")"
>     fp[1] = "([^,]*)|(\"[^\"]+\")"
>     fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
>     FPAT = fp[fpat+0]
> }
>
>
>
> kind regards,
>
> cph1968
>
> Sent with Proton Mail secure email.
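
For readers whose gawk does not have --csv, one possible workaround for the
CR LF problem described above is to strip the trailing carriage return from
each record before the fields are used. The following is only a minimal
sketch under a few assumptions: the function name print_fields and the sample
records are made up for illustration, the only stray CR is the one at the end
of each record, and plain comma splitting (-F,) stands in for the manual's
FPAT patterns, so quoted fields are not handled here.

```shell
#!/bin/sh
# print_fields (hypothetical helper): same output format as print-fields.awk,
# but with the trailing CR removed from each record first.
print_fields() {
    awk -F, '
    {
        sub(/\r$/, "")              # drop the trailing CR; changing $0 re-splits the fields
        print "<" $0 ">"
        printf("NF = %s ", NF)
        for (i = 1; i <= NF; i++)
            printf("<%s>", $i)
        print ""
    }'
}

# Illustrative records with Windows (CR LF) line endings.
printf 'ID,CASRN,ChemName\r\n1,50-00-0,Formaldehyde\r\n' | print_fields
```

In gawk, setting RS = "\r?\n" in a BEGIN block should have a similar effect,
since gawk treats a multi-character RS as a regular expression; the sub()
call above is used instead because it does not depend on that behaviour.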
References
1. mailto:arnold@skeeve.com