Re: GNU Parallel seems to drop
From: Ole Tange
Subject: Re: GNU Parallel seems to drop
Date: Tue, 25 Sep 2012 15:11:05 +0200
On Tue, Sep 25, 2012 at 1:50 PM, Dirk Eddelbuettel <edd@debian.org> wrote:
> Well a little "apt-get install gawk-doc" and two seconds of searching led to
> the '>>' operator to append to files ... and tada, it now works.
Depending on how awk appends, that may not work. Do you know for sure that it
flushes after every record? Otherwise you may get half-records.
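For context: within a single awk process, `>` truncates a file only the first time it is opened, which is why the serial command works. Every awk started by parallel --pipe truncates the same files again, so `>>` is needed there. The script.awk from the message above is not shown; what follows is a hypothetical version, a minimal sketch assuming gawk (or another awk whose fflush accepts a file name). It appends and flushes after every record, so each short record should reach the file in one write instead of whenever awk's output buffer happens to fill.

```shell
#!/bin/sh
# Hypothetical script.awk (not the one from the thread): append each record
# to a file named after its first field, then flush that file immediately.
cat > script.awk <<'EOF'
{
    out = path "/" $1 ".txt"   # e.g. dataParallel/A.txt
    print $0 >> out            # append; never truncate files other jobs write to
    fflush(out)                # write this record now, not when the buffer fills
}
EOF
```

It would be run as in the transcript, e.g. `cat data.txt | parallel --pipe -- awk -v path=dataParallel -f script.awk`, after emptying the output directory, because `>>` keeps whatever an earlier run left there. Whether concurrent appending writes stay intact is still up to the filesystem: local filesystems generally behave, while the Linux open(2) man page warns that O_APPEND can corrupt files on NFS when several processes append at once. So checking the result is still worthwhile.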
> edd@max:/tmp/parallel$ rm dataSerial/* dataParallel/*
> edd@max:/tmp/parallel$
> edd@max:/tmp/parallel$ cat data.txt | \
> awk -v path=dataSerial '{print $0 > (path "/" $1 ".txt")}'
> edd@max:/tmp/parallel$ cat data.txt | \
> parallel --pipe -- awk -v path=dataParallel -f script.awk
> edd@max:/tmp/parallel$ wc -l dataSerial/*
> 199762 dataSerial/A.txt
> 200031 dataSerial/B.txt
> 200283 dataSerial/C.txt
> 199845 dataSerial/D.txt
> 200079 dataSerial/E.txt
> 1000000 total
> edd@max:/tmp/parallel$ wc -l dataParallel/*
> 199762 dataParallel/A.txt
> 200031 dataParallel/B.txt
> 200283 dataParallel/C.txt
> 199845 dataParallel/D.txt
> 200079 dataParallel/E.txt
> 1000000 total
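Matching line counts, as in the wc -l output above, do not rule out half-records. In this hypothetical illustration (the record values are made up), one writer's record is split by another writer's complete record. The newline count stays the same even though the contents are wrong.

```shell
#!/bin/sh
# Intact file: the two records "A 12" and "A 5".
printf 'A 12\nA 5\n' > intact.txt
# Spliced file: a writer flushed "A 1" (half of "A 12"), another writer then
# appended the whole record "A 5", and the first writer finished with "2".
printf 'A 1A 5\n2\n' > spliced.txt
wc -l < intact.txt    # 2 lines
wc -l < spliced.txt   # also 2 lines, so wc -l cannot tell the files apart
```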
If these two commands print the same checksum for each file, then you are
golden. If not, you may have half-records in the parallel data.
parallel -k --tag 'sort {} | md5sum' ::: dataSerial/*
parallel -k --tag 'sort {} | md5sum' ::: dataParallel/*
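The two listings can also be compared mechanically instead of by eye. This is a minimal sketch, assuming bash and GNU coreutils; the function name same_records is hypothetical. It applies the same sort | md5sum check to each pair of files and reports files that are missing or that differ. Sorting first makes the check independent of record order, which will normally differ between the serial and the parallel run.

```shell
#!/bin/bash
# Hypothetical helper: same_records DIR_A DIR_B
# Returns 0 if every file in DIR_A has a counterpart in DIR_B whose sorted
# contents have the same md5sum; otherwise prints the offenders, returns 1.
same_records() {
    local a=$1 b=$2 f name status=0
    for f in "$a"/*; do
        name=$(basename "$f")
        if [ ! -f "$b/$name" ]; then
            echo "missing: $b/$name"
            status=1
            continue
        fi
        # Compare checksums of the sorted contents, as in the commands above.
        if [ "$(sort "$f" | md5sum)" != "$(sort "$b/$name" | md5sum)" ]; then
            echo "differs: $name"
            status=1
        fi
    done
    return $status
}
```

Usage: `same_records dataSerial dataParallel && echo golden`. Calling it again with the arguments swapped also catches files that exist only in dataParallel.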
/Ole