Re: GNU Parallel seems to drop
From: Dirk Eddelbuettel
Subject: Re: GNU Parallel seems to drop
Date: Tue, 25 Sep 2012 11:50:44 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Dirk Eddelbuettel <edd <at> debian.org> writes:
> Ole Tange <ole <at> tange.dk> writes:
> > If 2 awk scripts both open A, B and C then the last one wins and all
> > data written by the first one is lost.
>
> Plonk. I think that may indeed be the case. I had not thought that through.
> I have to find a tool that does this in append mode.
Well, a little "apt-get install gawk-doc" and two seconds of searching led me to
the '>>' operator, which appends to a file instead of truncating it the way '>'
does ... and tada, it now works.
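A minimal sketch of the difference, assuming a POSIX awk and a throwaway directory from mktemp (the file names are hypothetical). Every job started by `parallel --pipe` is a separate awk process; here two awk processes run one after the other stand in for two such jobs. With `>` each process truncates the file the first time it opens it, so the first writer's lines are lost; with `>>` the file is opened for appending and both writers' lines survive. (Run concurrently, the `>` case is messier still, since both processes write from offset 0 over each other.)

```shell
#!/bin/sh
# Scratch directory (hypothetical); removed automatically when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# '>' truncates the file the first time a given awk process opens it,
# so the second process wipes out what the first one wrote.
printf 'A 1\nA 2\n' | awk -v f="$dir/trunc.txt" '{ print $0 > f }'
printf 'A 3\n' | awk -v f="$dir/trunc.txt" '{ print $0 > f }'

# '>>' opens the file in append mode, so both processes' lines survive.
printf 'A 1\nA 2\n' | awk -v f="$dir/append.txt" '{ print $0 >> f }'
printf 'A 3\n' | awk -v f="$dir/append.txt" '{ print $0 >> f }'

trunc_lines=$(awk 'END { print NR }' "$dir/trunc.txt")
append_lines=$(awk 'END { print NR }' "$dir/append.txt")
echo "with >:  $trunc_lines line(s)"   # with >:  1 line(s)
echo "with >>: $append_lines line(s)"  # with >>: 3 line(s)
```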
edd@max:/tmp/parallel$ rm dataSerial/* dataParallel/*
edd@max:/tmp/parallel$
edd@max:/tmp/parallel$ cat data.txt | \
awk -v path=dataSerial '{print $0 > (path "/" $1 ".txt")}'
edd@max:/tmp/parallel$ cat data.txt | \
parallel --pipe -- awk -v path=dataParallel -f script.awk
edd@max:/tmp/parallel$ wc -l dataSerial/*
199762 dataSerial/A.txt
200031 dataSerial/B.txt
200283 dataSerial/C.txt
199845 dataSerial/D.txt
200079 dataSerial/E.txt
1000000 total
edd@max:/tmp/parallel$ wc -l dataParallel/*
199762 dataParallel/A.txt
200031 dataParallel/B.txt
200283 dataParallel/C.txt
199845 dataParallel/D.txt
200079 dataParallel/E.txt
1000000 total
edd@max:/tmp/parallel$
with
edd@max:/tmp/parallel$ cat script.awk
{
print $0 >> (path "/" $1 ".txt")
}
edd@max:/tmp/parallel$
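`wc -l` only counts newline characters, so identical counts show that no lines went missing but not that every line arrived intact. A stricter check is sketched below as a POSIX shell function; the name `same_contents` and the script name `compare.sh` are hypothetical. It sorts each per-key file before comparing, because the concurrent awk jobs are not guaranteed to append their blocks in input order, and it only looks at files present in the reference directory. Since `>>` appends, the output directories also have to be emptied before every rerun (as the `rm` at the top of the session does), otherwise the counts simply double.

```shell
#!/bin/sh
# same_contents REF_DIR TEST_DIR (hypothetical helper)
# For every *.txt file in REF_DIR, check that TEST_DIR holds a file of the
# same name with exactly the same lines, ignoring line order.
same_contents() {
    ref_dir=$1
    test_dir=$2
    status=0
    for ref in "$ref_dir"/*.txt; do
        [ -e "$ref" ] || continue            # glob matched nothing
        name=$(basename "$ref")
        other="$test_dir/$name"
        if [ ! -f "$other" ]; then
            echo "missing: $other"
            status=1
            continue
        fi
        a=$(mktemp)
        b=$(mktemp)
        # Byte-wise sort so the comparison does not depend on the locale.
        LC_ALL=C sort "$ref" > "$a"
        LC_ALL=C sort "$other" > "$b"
        if cmp -s "$a" "$b"; then
            echo "same: $name"
        else
            echo "DIFFERENT: $name"
            status=1
        fi
        rm -f "$a" "$b"
    done
    return $status
}

# Compare the two output directories when run next to them.
if [ -d dataSerial ] && [ -d dataParallel ]; then
    same_contents dataSerial dataParallel
fi
```

Run from /tmp/parallel as `sh compare.sh`, it prints one `same:` or `DIFFERENT:` line per key file and exits non-zero on any mismatch or missing file.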
For reference and completeness, the data generator was the R script below:
edd@max:/tmp/parallel$ cat createData.r
#!/usr/bin/Rscript
N <- 1e6
set.seed(42)
df <- data.frame(key=sample(LETTERS[1:5], N, replace=TRUE),
                 value=rnorm(N))
write.table(df, file="/tmp/parallel/data.txt",
            row.names=FALSE, col.names=FALSE, quote=FALSE)
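For anyone without R at hand, a rough stand-in can be sketched in plain awk. This is an assumption-laden substitute, not the script used above: the function `make_data` and the file name `makeData.sh` are hypothetical, normal variates come from the Box-Muller transform, and awk's random number generator differs from R's, so the values will not match `set.seed(42)`; only the "KEY value" layout (space-separated, no header, no quotes, as write.table produced above) and the distribution carry over.

```shell
#!/bin/sh
# make_data N OUTFILE (hypothetical stand-in for createData.r)
# Writes N lines of "KEY value": KEY uniform over A..E, value standard normal
# via the Box-Muller transform. awk's RNG differs from R's, so the values
# will not match set.seed(42) in R; only the layout and distribution do.
make_data() {
    awk -v n="$1" 'BEGIN {
        srand(42)                       # fixed seed: same awk gives same file
        pi = atan2(0, -1)
        for (i = 0; i < n; i++) {
            key = substr("ABCDE", int(rand() * 5) + 1, 1)
            u1 = 1 - rand()             # in (0, 1], so log(u1) is finite
            u2 = rand()
            value = sqrt(-2 * log(u1)) * cos(2 * pi * u2)
            printf "%s %.15g\n", key, value
        }
    }' > "$2"
}

# Usage: sh makeData.sh /tmp/parallel/data.txt [N]   (N defaults to 1e6)
if [ $# -ge 1 ]; then
    make_data "${2:-1000000}" "$1"
fi
```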
Thanks, Dirk