Re: Using -C with --pipe is this possible?
From: Matt Oates (Home)
Subject: Re: Using -C with --pipe is this possible?
Date: Tue, 1 May 2012 08:42:23 +0100
On 30 April 2012 21:51, Ole Tange <tange@gnu.org> wrote:
> On Thu, Apr 26, 2012 at 12:20 AM, Matt Oates (Home) <mattoates@gmail.com>
> wrote:
> Good to see protein people using GNU Parallel.
I think there are quite a few of us :)
>> I then want to run something of the form:
>>
>> parallel -C '\t' -N 1 --pipe "myprogram /dev/stdin | cat <(echo {1})
>> -" < file.tab | output-processing-program > results.tab
>
> Will this work?
>
> cat file.tab | parallel -C '\t' 'echo {1}; echo {2} | myprogram
> /dev/stdin' | output-processing-program > results.tab
>
> Or maybe --tag is even better for your purpose?
>
> cat file.tab | parallel --tag -C '\t' 'echo {2} | myprogram
> /dev/stdin' | output-processing-program > results.tab
Neither of these will work, since they change the input going into
"myprogram" rather than the output. I need to tag the output before it
goes into the output-processing program (which I've written) but after
the input from --pipe has already passed through. I can't change
anything about "myprogram", since it's a precompiled protein disorder
predictor.

Is there a reason --pipe and -C can't be made to work at the same
time? If they could, my command line would work, and more elaborate
commands would become possible in general. I'm a heavy user of --pipe,
but it's very limited if I can't also use some of the input to
parallel as parameters to the commands being run. I need something
like a --tag-output or --tag-after flag in GNU parallel, so that the
job output associated with a chunk of split input is tagged with the
prefix. At the moment I'm faced with having to use something other
than GNU parallel, which is a scary concept considering how much GNU
parallel does for me already :)
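
To make the tagging requirement concrete, here is a minimal sketch in
plain shell of what "tag the output with a field from the input" means.
It runs one record at a time and does not use GNU parallel at all, so it
only illustrates the desired output, not a parallel solution. The
function names tag_output and the myprogram stand-in are assumptions for
illustration; the real myprogram is a precompiled binary.

```shell
# Stand-in for the precompiled predictor (assumption): it ignores its
# file argument and upper-cases whatever arrives on standard input.
myprogram() {
    tr '[:lower:]' '[:upper:]'
}

# Read "id<TAB>payload" records, feed only the payload to myprogram,
# and prefix every line the job prints with the id and a tab.
tag_output() {
    tab=$(printf '\t')
    while IFS="$tab" read -r id payload; do
        printf '%s\n' "$payload" | myprogram /dev/stdin |
            while IFS= read -r line; do
                printf '%s\t%s\n' "$id" "$line"
            done
    done
}
```

Used as "tag_output < file.tab | output-processing-program", each
output line carries the id of the record that produced it. Note that
read treats runs of tabs as one separator, so empty fields are not
handled by this sketch.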

The best feature I can imagine for what I want to do is a flag of the
form:

    parallel --pipe --cut=1,3 -C ',' -N 1 'program -x {2} {1} /dev/stdin'

So if the input was:

    1,Hello world,11
    2,Yay,3

Then {1} and {2} would be:

    {1}=1 {2}=11
    {1}=2 {2}=3

The input streams reaching 'program' would look like:

    Hello world
    EOF

    Yay
    EOF

And the command lines per job would be:

    program -x 11 1 /dev/stdin
    program -x 3 2 /dev/stdin

--cut=1,3 tells parallel to cut these fields out of the input stream
before it is sent anywhere or chunked up, and that the cut fields
populate the {1} and {2} parameter placeholders. If -C is specified
without --cut, then nothing should be cut out of the stream, but all
fields should still populate the parameter placeholders.
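
The intended semantics can be emulated sequentially in plain shell. This
is only a sketch of the proposed behaviour for the three-field example
above, not an implementation of the flag (which does not exist in
parallel); emulate_cut and the stand-in program are hypothetical names.

```shell
# Stand-in for 'program' (assumption): report the arguments it was
# given and the data it received on standard input.
program() {
    printf 'params: %s %s %s %s | stdin: %s\n' "$1" "$2" "$3" "$4" "$(cat)"
}

# Emulate --cut=1,3 -C ',' -N 1 for three-field input, one job per line:
# fields 1 and 3 become {1} and {2}, field 2 is what the job reads.
emulate_cut() {
    while IFS=, read -r f1 f2 f3; do
        printf '%s\n' "$f2" | program -x "$f3" "$f1" /dev/stdin
    done
}
```

Feeding the two example lines through emulate_cut runs the stand-in
once per line, with 11 and 1 (then 3 and 2) as arguments and only the
middle field on standard input.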

The problem comes when a single line is not used and a larger chunk of
data is being piped. In that case you could simply skip populating the
parameters when -N is greater than 1 or --recstart/--recend is
specified. Being able to cut the piped input without using the cut
program would still be a useful feature for other people, and as a
default behaviour this should minimise surprise about what it does.

If you think this is a worthwhile addition to parallel I can work on a
patch for you, as I need this functionality myself ASAP.
Best,
Matt.
---
http://blog.mattoates.co.uk
http://bccs.bris.ac.uk