manual section 4.7.1
From: cph1968
Subject: manual section 4.7.1
Date: Tue, 04 Apr 2023 10:12:03 +0000
The regex fp[2] in section 4.7.1 (below) doesn't quite cut it if the records
in the CSV file end in both CR and LF (0x0D 0x0A). I believe this is common
for files created on Windows.
A simple fix, however, is to use the gawk --csv option.
❯ head -n 2 TSCAINV_022023.csv| gawk -f print-fields.awk
>ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
>F = 1 <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
>1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
>F = 1 <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
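Note that the closing '>' shows up as the first character of each line: the CR
in front of it sends the cursor back to column one, so it overwrites the '<'
(or the 'N' of 'NF').
Output using the --csv option: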
❯ head -n 2 TSCAINV_022023.csv| gawk --csv -f print-fields.awk
<ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY>
NF = 10 <ID><CASRN><casregno><UID><EXP><ChemName><DEF><UVCB><FLAG><ACTIVITY>
<1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE>
NF = 10 <1><50-00-0><50000><><><Formaldehyde><><><><ACTIVE>
much better :-)
❯ cat print-fields.awk
{
    print "<" $0 ">"
    printf("NF = %s ", NF)
    for (i = 1; i <= NF; i++) {
        printf("<%s>", $i)
    }
    print ""
}
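Where --csv is not an option, one possible workaround is to strip a trailing CR
from each record before the fields are examined. The following is only a sketch:
strip_cr is a made-up helper name, the two sample records are invented, and the
plain -F, split is there for illustration only (it does not handle quoted fields).

```shell
# strip_cr is a hypothetical helper name, not something from the manual.
# It removes one trailing CR from every input line and prints the result.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }'
}

# Two CRLF-terminated sample records (made up for this sketch).
# The naive -F, split is for illustration and ignores CSV quoting.
printf 'ID,ACTIVITY\r\n1,ACTIVE\r\n' |
    strip_cr |
    awk -F, '{ printf "NF = %d <%s>\n", NF, $NF }'
```

The same sub() call could presumably be placed as the first rule of a script
that sets FPAT, since assigning to $0 makes awk split the record again.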
from section 4.7.1:
BEGIN {
    fp[0] = "([^,]+)|(\"[^\"]+\")"
    fp[1] = "([^,]*)|(\"[^\"]+\")"
    fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
    FPAT = fp[fpat+0]
}
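To make the effect on fp[2] visible, here is a small sketch that prints the
length of the last field, once with the default record separator and once with
RS set to a regex that swallows an optional CR. It assumes gawk (FPAT and a
regex RS are gawk features); last_field_len is a made-up helper name and the
sample record is invented.

```shell
# last_field_len is a hypothetical helper: it prints length($NF) for each
# record, splitting fields with the fp[2] pattern from section 4.7.1.
# $1 is the value assigned to RS; gawk treats a multi-character RS as a regex.
last_field_len() {
    gawk -v RS="$1" '
    BEGIN { FPAT = "([^,]*)|(\"([^\"]|\"\")+\")" }
    { print length($NF) }'
}

# Default RS: the CR stays attached to the last field.
printf '1,50-00-0,ACTIVE\r\n' | last_field_len '\n'

# RS matches an optional CR followed by a newline: the CR is consumed
# as part of the record terminator.
printf '1,50-00-0,ACTIVE\r\n' | last_field_len '\r?\n'
```

If this behaves as I expect, the first call reports one character more than
the second, which is the stray CR.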
kind regards,
cph1968