Re: Combining --pipe and --shebang options
From: Michel Samia
Subject: Re: Combining --pipe and --shebang options
Date: Tue, 20 Nov 2012 11:21:22 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2
On 19.11.2012 20:26, Ole Tange wrote:
It actually is possible to combine --shebang and --pipe - but not the
way you want.
--shebang considers the rest of the file as data as if it was received
on -a or through stdin.
#!/usr/local/bin/parallel --shebang --pipe -k -j24 cat
a
b
c
d
I also tried it without --shebang. The problem is that the exec
syscall on Linux doesn't tokenize the rest of the shebang line; it passes
everything after the interpreter as one long argument containing spaces.
This is fixed up in the read_options subroutine when --shebang is at the
beginning of that argument (I don't know much Perl, but I understood this :) ).
# This must be done first as this may exec myself
if(defined $ARGV[0] and ($ARGV[0]=~/^--shebang / or
                         $ARGV[0]=~/^--hashbang /)) {
    # Program is called from #! line in script
    $ARGV[0]=~s/^--shebang *//; # remove --shebang if it is set
    $ARGV[0]=~s/^--hashbang *//; # remove --hashbang if it is set
    my $argfile = pop @ARGV;
    # exec myself to split $ARGV[0] into separate fields
    exec "$0 --skip-first-line -a $argfile @ARGV";
}
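To make the argument handling concrete, here is a rough Python model of it. This is only a sketch: `kernel_argv` and `resplit_shebang` are hypothetical names, the first function is a simplified picture of how Linux builds the argument list for a `#!` script, and the second merely mirrors the Perl excerpt quoted above.

```python
def kernel_argv(shebang_line, script_path, user_args=()):
    """Simplified model of the argv Linux builds for a '#!' script."""
    body = shebang_line[2:].strip()          # drop the leading '#!'
    interpreter, _, rest = body.partition(" ")
    argv = [interpreter]
    if rest:
        # Everything after the interpreter stays ONE argument, spaces included.
        argv.append(rest.strip())
    argv.append(script_path)
    argv.extend(user_args)
    return argv


def resplit_shebang(argv):
    """Mirror the quoted Perl: strip the flag, pop the last argument as the
    argument file, and build a command string for a shell to re-split."""
    prog, args = argv[0], list(argv[1:])
    for flag in ("--shebang", "--hashbang"):
        if args and args[0].startswith(flag + " "):
            args[0] = args[0][len(flag):].lstrip(" ")
            argfile = args.pop()
            return f"{prog} --skip-first-line -a {argfile} {' '.join(args)}"
    return None  # not called from a #! line
```

The point of the second step is that the combined string is handed back to a shell, which then splits it into separate options.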
What you want is to pass the rest of the file to python and let
parallel chunk up stdin to the python script.
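As an illustration of what "chunking up stdin" means, here is a minimal Python sketch that cuts a stream into blocks made of whole lines. It is not how GNU Parallel implements --pipe; `chunk_lines` and the `block_size` value are assumptions made purely for the example.

```python
import io


def chunk_lines(stream, block_size=1_000_000):
    """Yield blocks of whole lines, each at least block_size characters
    long (except possibly the last one)."""
    block, size = [], 0
    for line in stream:
        block.append(line)
        size += len(line)
        if size >= block_size:
            yield "".join(block)
            block, size = [], 0
    if block:
        yield "".join(block)


# Example: every block would become the stdin of one job.
blocks = list(chunk_lines(io.StringIO("a\nb\nc\nd\n"), block_size=4))
```

Each block never splits a line in the middle, so a line-oriented script can process any block on its own.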
I really like your idea, but it is clearly not a bug that it does not
work currently.
Your idea would be useful for any script (Shell, Perl, Python) which
can either process only one file or process stuff from stdin. Using
GNU Parallel your script can suddenly process many files/blocks of
data and in parallel.
I do not see we can change the behaviour of --shebang, but we can
invent a new option.
I agree. The semantics of --shebang are a little different from what
I need, so we had better add a new option.
So it could be something like:
#!/usr/bin/parallel --shebang-program --pipe -k -j24 /usr/bin/python
(Please come up with a better name than --shebang-program)
--shebang-program is maybe a bit long, but it sounds good. A shorter
variant that is still quite descriptive would be, for example, --script.
Both are good, just choose whichever you prefer :)
This should accept data on stdin which should be chunked and passed to
the python program. So the program will be called like:
cat foo bar | my_program
or:
my_program foo bar
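For reference, a script that supports both call styles could look roughly like the Python sketch below. `read_lines` and `main` are hypothetical names, and upper-casing each line is only a placeholder for the real work.

```python
import sys


def read_lines(paths, stdin=sys.stdin):
    """Yield lines from the named files, or from stdin if no files are given."""
    if not paths:
        yield from stdin
        return
    for path in paths:
        with open(path) as handle:
            yield from handle


def main(argv):
    for line in read_lines(argv[1:]):
        sys.stdout.write(line.upper())  # placeholder for the real processing


# In a real script the last line would be: main(sys.argv)
```

With this shape the script does not care whether it was given file names or a chunk of data on stdin.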
Without the --pipe:
#!/usr/bin/parallel --shebang-program -k -j24 /usr/bin/python
should work like:
parallel -k -j24 /usr/bin/python my_program {}
So:
my_program foo bar
(echo foo; echo bar) | my_program
should do the same as:
parallel -k -j24 /usr/bin/python my_program {} ::: foo bar
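The intended behaviour of that command line can be sketched in Python: one job per argument, at most 24 jobs at a time, and output returned in input order as with -k. `run_keep_order` is a hypothetical helper, and it substitutes `{}` only when it is a whole argument, which is simpler than what GNU Parallel really does.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_keep_order(command, inputs, jobs=24):
    """Run `command` once per input, at most `jobs` at a time, and return
    each job's stdout in the same order as the inputs."""
    def run_one(value):
        argv = [value if part == "{}" else part for part in command]
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in input order, whichever job finishes first.
        return list(pool.map(run_one, inputs))
```

A call such as `run_keep_order(["/usr/bin/python", "my_program", "{}"], ["foo", "bar"])` would then correspond to the example above.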
We should allow for putting options on the command interpreter. E.g:
#!/usr/bin/parallel --shebang-program -k -j24 /usr/bin/perl -p
Also without the --pipe the {} should work as expected:
#!/usr/bin/parallel --shebang-program -k -j24 /usr/bin/perl -p {} {.}.out > {.}.log
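To spell out what the two placeholders expand to, here is a small Python sketch. `expand` is a hypothetical helper; it approximates {.} with `os.path.splitext` and skips the shell quoting that GNU Parallel applies, so edge cases such as dotfiles may differ.

```python
import os


def expand(template, value):
    """Substitute {.} (input without its last extension) and {} (input as-is)."""
    stem, _ext = os.path.splitext(value)
    # Replace {.} first so that the plain {} pass cannot touch it.
    return template.replace("{.}", stem).replace("{}", value)


command = expand("/usr/bin/perl -p {} {.}.out > {.}.log", "data.txt")
```

For the input `data.txt`, {} stays `data.txt` while {.} becomes `data`.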
Are there things I am not covering? Are there use cases that this will
not cover?
Thank you for all the use cases. I don't see any missing use cases :) If
someone finds any later, (s)he can open a bug report or send an e-mail
to the mailing list ;)
If I knew Perl better, I could try to write a patch for this myself, but I
think it would be better if it were written by a Perl programmer.
/Ole
Michel