Re: manual section 4.7.1
From: arnold
Subject: Re: manual section 4.7.1
Date: Tue, 04 Apr 2023 08:28:38 -0600
User-agent: Heirloom mailx 12.5 7/5/10
Thank you for the note.
As the documentation notes, FPAT is only a partial solution for dealing
with CSV data.
The --csv option is not yet released, although of course folks can build from
git and use the result if they wish to.
That section of the manual will be rewritten before gawk 5.3.0 is released.
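
Until then, one workaround for CR LF line endings is to strip the trailing
carriage return from each record before looking at the fields. What follows
is only a minimal sketch, not text from the manual. The function name
strip_cr_fields is hypothetical, and the plain comma separator (-F,) is an
assumption that keeps the example runnable in any POSIX awk; it does not
handle quoted fields. With gawk, the same sub() call could be placed at the
top of a program that sets FPAT, since changing $0 causes the record to be
split again.

```shell
# strip_cr_fields: hypothetical helper name, not part of gawk.
# Removes one trailing CR from each record, then prints the record and its
# fields in the same style as print-fields.awk.
strip_cr_fields() {
    awk -F, '
    {
        sub(/\r$/, "")              # changing $0 re-splits the fields
        print "<" $0 ">"
        printf("NF = %s ", NF)
        for (i = 1; i <= NF; i++)
            printf("<%s>", $i)
        print ""
    }'
}

# Usage: feed it one record that ends in CR LF.
printf '1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE\r\n' | strip_cr_fields
```

The sub() call only touches a CR at the very end of the record, so a
carriage return in the middle of a line is left alone.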
Thanks,
Arnold
cph1968@proton.me wrote:
> The regex fp[2] in section 4.7.1 (below) doesn't quite cut it if the CSV file
> records end in both CR and NL [0x0D 0x0A]. I believe this is a common feature
> of Windows files.
> A simple fix, however, is to use the gawk --csv option.
>
> ❯ head -n 2 TSCAINV_022023.csv| gawk -f print-fields.awk
> >ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
> >F = 1 <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
> >1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
> >F = 1 <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
>
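> Note that the closing '>' shows up as the first character of each line: the
> trailing CR sends the cursor back to column one, so '>' overwrites the first
> character.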
>
> output using the --csv option:
> ❯ head -n 2 TSCAINV_022023.csv| gawk --csv -f print-fields.awk
> <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY>
> NF = 10 <ID><CASRN><casregno><UID><EXP><ChemName><DEF><UVCB><FLAG><ACTIVITY>
> <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE>
> NF = 10 <1><50-00-0><50000><><><Formaldehyde><><><><ACTIVE>
>
> much better :-)
>
> ❯ cat print-fields.awk
> {
>     print "<" $0 ">"
>     printf("NF = %s ", NF)
>     for (i = 1; i <= NF; i++) {
>         printf("<%s>", $i)
>     }
>     print ""
> }
>
>
> from section 4.7.1:
> BEGIN {
>     fp[0] = "([^,]+)|(\"[^\"]+\")"
>     fp[1] = "([^,]*)|(\"[^\"]+\")"
>     fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
>     FPAT = fp[fpat+0]
> }
>
>
>
> kind regards,
>
> cph1968