On Wed, Apr 29, 2015 at 2:07 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
On Wed, Apr 29 2015, Ole Tange <ole@tange.dk> wrote:
This still has the risk of killing an innocent PID and its children.
Killing (in the sense of sending any signal whatsoever) an
innocent/unrelated PID is completely unacceptable, IMO. On a reasonably
busy system, PID reuse within 10 seconds is far from unlikely.
On my system this gives PID reuse after 3.1 seconds, but that is a very
extreme case, and I will accept it if GNU Parallel handles that case incorrectly:
perl -e 'while(1) { $a=(fork()|| exit); if(not $a %1000) {print "$a\n";} } '
Mapping the tree even before signalling the immediate children is not enough;
some of the grand^nchildren may vanish in the meantime and their PIDs
reused before one can use the gathered information.
I doubt that is true in practice. Mapping takes less than 100 ms, so I
would find it very unlikely that the PID will be reused that fast. I
understand that this could in theory happen, but I would like to see
this demonstrated before I consider this a real problem.
Since GNU Parallel will be sleeping (and not doing anything else) we
could simply probe all the (grand*)children with kill 0 every second and
recompute the family tree of the ones that are still alive. If a child
has died, it is removed from the list to be killed later.
# Snapshot the jobs and all their descendants before signalling.
@children = family_tree(@job_pids);
for $signal (@the_signals) {
    kill $signal, @job_pids;
    $sleep_time = shift @sleep_times;
    $time_slept = 0;
    while($time_slept < $sleep_time and @children) {
        # kill 0 sends no signal; it only tests whether the PID still exists.
        @children = family_tree(grep { kill(0, $_) } @children);
        sleep $a_while;
        $time_slept += $a_while;
    }
}
kill 'KILL', @children;
Rasmus: Can you find a situation in which the above will fail?
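The loop above relies on a family_tree helper that is not shown. A minimal sketch of what it might look like follows. It assumes a POSIX-style ps that accepts `-e -o pid= -o ppid=`, and it assumes the returned list contains the given PIDs themselves plus all their descendants; neither assumption comes from GNU Parallel's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the given PIDs plus all their descendants,
# based on a single snapshot of the process table taken with ps.
sub family_tree {
    my @roots = @_;
    my %children_of;
    # Assumption: ps prints "PID PPID" for every process, without a header.
    open(my $ps, '-|', 'ps', '-e', '-o', 'pid=', '-o', 'ppid=')
        or die "cannot run ps: $!";
    while (my $line = <$ps>) {
        my ($pid, $ppid) = $line =~ /^\s*(\d+)\s+(\d+)/ or next;
        push @{ $children_of{$ppid} }, $pid;
    }
    close $ps;
    # Breadth-first walk from the roots through the parent -> children map.
    my (%seen, @family);
    my @queue = @roots;
    while (@queue) {
        my $pid = shift @queue;
        next if $seen{$pid}++;
        push @family, $pid;
        push @queue, @{ $children_of{$pid} || [] };
    }
    return @family;
}
```

The snapshot is only as fresh as the ps call, so the window for PID reuse discussed above still exists between taking the snapshot and sending a signal.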
I think the only way to do this right is for GNU Parallel to make each
immediate child a process group leader (setpgrp 0,0 immediately after
fork).
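A minimal sketch of the process group approach, using plain fork and exec rather than open3. The names spawn_in_own_group and signal_group are hypothetical and only serve as illustration. A negative PID passed to kill addresses the whole process group.

```perl
use strict;
use warnings;

# Hypothetical helper: start @cmd in a child that leads its own process group.
sub spawn_in_own_group {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: become process group leader before running the command.
        setpgrp(0, 0);
        exec { $cmd[0] } @cmd;
        die "exec failed: $!";
    }
    # Parent: set the group as well, so it exists no matter which side
    # runs first. This may fail harmlessly if the child has already exec'd.
    setpgrp($pid, $pid);
    return $pid;
}

# Hypothetical helper: send $signal to every process in group $pgid.
sub signal_group {
    my ($signal, $pgid) = @_;
    return kill $signal, -$pgid;
}
```

With this, signal_group('TERM', $pid) reaches the child and every descendant that stayed in its group, without having to look up individual PIDs first.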
GNU Parallel uses open3 to spawn children. According to strace -ff
that does not do a setpgrp.
Do note that one can never clean up all descendants that may have been
spawned: A dance consisting of double fork() and some setpgid/setsid
yoga will create a process which cannot be tied to GNU Parallel or any
of its immediate children. So one has to rely on the children not doing
such things.
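To make the escape concrete, here is a sketch of such a dance in Perl: fork, setsid in the intermediate process, fork again, and let the intermediate process exit. The name spawn_detached is hypothetical, and the pipe exists only so the caller can learn the PID of the detached process.

```perl
use strict;
use warnings;
use POSIX qw(setsid _exit);

# Hypothetical helper: run $code in a process detached from the caller.
# Returns the PID of the detached process.
sub spawn_detached {
    my ($code) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        # Intermediate process: start a new session and process group.
        defined(setsid()) or _exit(1);
        my $grandchild = fork();
        _exit(1) unless defined $grandchild;
        if ($grandchild) {
            # Report the grandchild's PID, then exit so that the
            # grandchild is reparented away from our process tree.
            syswrite($writer, "$grandchild\n");
            _exit(0);
        }
        close $writer;
        $code->();
        _exit(0);
    }
    close $writer;
    my $detached = <$reader>;
    close $reader;
    waitpid($pid, 0);
    die "could not detach" unless defined $detached;
    chomp $detached;
    return $detached;
}
```

The resulting process has a different process group and session than the caller, and its parent is no longer any process the caller spawned, so neither a parent/child walk nor a process group signal will find it.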
Yes. GNU Parallel should do the right thing in most cases and not
cause a problem in the rest.