From: | Martin d'Anjou |
Subject: | Re: New behavior proposal --halt -% with job killing |
Date: | Sat, 25 Apr 2015 08:55:24 -0400 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 |
On 15-04-25 03:25 AM, Ole Tange wrote:
On Thu, Apr 23, 2015 at 5:10 PM, Martin d'Anjou <martin.danjou14@gmail.com> wrote:

Good summary:

You could come up with many options here:

When to halt:
--halt [condition for halt]
--timeout [condition for halt is an amount of time]
--memfree [condition for halt is an amount of memory]
kill -TERM [condition for halt is the signal]

I think our solution should make it possible to extend this list. E.g. maybe it will be possible to detect whether the remote job failed or the network connection to the remote server failed. --memfree is special, however. It retries indefinitely, if the job gets killed due to low memory.

How to handle jobs after a halt:
--halt-job-handling [killpending[,killrunning]]
--timeout-job-handling [killpending[,killrunning]]
and so on. Users could use both kills if they wanted both.

And --kill-TERM-job-handling killrunning will always imply killpending, but the opposite is not the case, right?
For me yes, killrunning would always imply killpending. So it could be [killpending,killall].
--retries should be thrown into the mix, too.
What does retry mean?
I can easily think of real life situations where the handling of a death due to --memfree is different from a --timeout, and I think the current behaviour (retrying indefinitely) is correct. But do we have a real life situation where we want --halt-job-handling to be different from --timeout-job-handling given that we have --retries? I am reluctant to put in 3 options that are extremely rarely used (there are plenty of options as it is and testing becomes harder the more combinations need to be tested).
Agreed. For me, one option suffices for all possible ways of halt/timeout/kill -TERM: it is either to kill pending (stop spawning new jobs), or kill all (which implies kill pending). I do not use --memfree, so throw it in the mix where it makes sense.
Or, you could use an explicit "plus" sign to mean halt and kill all running and pending jobs: --halt +1-99%

POLA would say that --halt +1-99% == --halt 1-99%
There is a precedent. The find program uses +n, -n and n (man find). "find /path/to/files* -mtime +5" is not the same as "find /path/to/files* -mtime -5".
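To make the find precedent concrete, here is a small sketch showing that +5 and -5 select different files. It assumes GNU touch (for -d with a relative date); the scratch directory and the file names old.txt and new.txt are made up for illustration.

```shell
# Scratch directory with two files of different ages (hypothetical names).
dir=$(mktemp -d)
touch -d '10 days ago' "$dir/old.txt"   # assumes GNU touch for relative dates
touch -d '1 day ago' "$dir/new.txt"

# -mtime +5: last modified more than 5 days ago
older=$(find "$dir" -type f -mtime +5 -exec basename {} \;)
# -mtime -5: last modified less than 5 days ago
newer=$(find "$dir" -type f -mtime -5 -exec basename {} \;)

echo "older: $older"
echo "newer: $newer"

rm -r "$dir"
```

The sign changes which files match, so a leading "+" carrying a distinct meaning is something users of find have already seen.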
Martin