From: Alex Muir
Subject: Would like your thoughts on remote java processes not timing out but being replaced with new processes
Date: Thu, 20 Sep 2012 10:52:58 -0400
Hi,
I'm running some java processes on a remote 16 core server with a
timeout of 1500 seconds and with -j+1.
I'm seeing 21 java processes, 17 of which are not past the timeout of
25 minutes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28211 ec2-user 20 0 4355m 170m 9948 S 145.4 0.3 0:07.36 java
28075 ec2-user 20 0 4355m 652m 9.8m S 138.5 1.1 0:27.80 java
27610 ec2-user 20 0 4503m 1.1g 9.9m S 130.3 1.9 1:03.86 java
27547 ec2-user 20 0 4503m 1.1g 9.9m S 127.4 1.8 1:09.28 java
27490 ec2-user 20 0 4503m 1.1g 9.9m S 100.5 1.8 1:09.56 java
27124 ec2-user 20 0 4503m 1.3g 9m S 75.5 2.2 2:14.63 java
27779 ec2-user 20 0 4431m 1.2g 9.9m S 68.6 2.0 0:55.03 java
27051 ec2-user 20 0 4503m 1.3g 9m S 65.0 2.2 2:20.66 java
27767 ec2-user 20 0 4431m 1.2g 9.9m S 64.3 1.9 0:55.12 java
27922 ec2-user 20 0 4431m 1.1g 9.9m S 61.1 1.9 0:42.67 java
27849 ec2-user 20 0 4431m 1.1g 9.9m S 60.7 1.9 0:49.17 java
27958 ec2-user 20 0 4431m 1.1g 9.9m S 56.8 1.8 0:42.79 java
28016 ec2-user 20 0 4431m 1.0g 9.9m S 55.8 1.7 0:42.86 java
27280 ec2-user 20 0 4503m 1.2g 9m S 53.2 2.0 1:57.32 java
27343 ec2-user 20 0 4503m 1.3g 9m S 52.9 2.1 1:59.01 java
27683 ec2-user 20 0 4503m 1.1g 9.9m S 50.6 1.8 0:59.54 java
6841 ec2-user 20 0 4355m 2.3g 9.8m S 58.1 3.8 22:29.07 java
and 4 which are past the timeout:
4106 ec2-user 20 0 4355m 371m 9.8m S 56.1 0.6 60:59.26 java
8143 ec2-user 20 0 4355m 1.5g 9.9m S 59.4 2.5 287:53.77 java
8035 ec2-user 20 0 4355m 2.2g 9.8m S 57.1 3.7 288:21.62 java
21306 ec2-user 20 0 4355m 435m 9.8m S 51.2 0.7 188:07.36 java
So I assume that parallel tried to kill these processes, was not able
to, and also started some more to compensate.
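One caveat about reading the listings above: the TIME+ column in top is accumulated CPU time, not wall-clock time, and a multi-threaded JVM can accumulate more than one CPU second per elapsed second. A minimal sketch for checking elapsed time directly on the remote host follows. The function name `over_timeout` is hypothetical, and the usage line assumes a procps `ps` that supports the `etimes` (elapsed seconds) field, which older versions may lack.

```shell
# Hypothetical helper: read "PID ELAPSED_SECONDS COMMAND" lines on stdin and
# print the PIDs of java processes whose elapsed time exceeds the limit ($1).
over_timeout() {
    awk -v limit="$1" '$3 == "java" && $2 > limit { print $1 }'
}

# Usage on the remote host (assumes ps supports the etimes field):
# ps -eo pid=,etimes=,comm= | over_timeout 1500
```

Any PID printed this way has really been running longer than the limit, regardless of how much CPU it has used.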
I have tested that the timeout works locally, but this is the first
time I have seen processes on the remote server not timing out.
I'm launching the process as follows:
ls $sourceDir*.zip | parallel -j+$NumberExtraJobsPerServer --eta \
  --progress --sshlogin $servers --timeout $timeout --transfer \
  --joblog $jobLog \
  "sh /mnt/xslt_volume/i4EnrichV7/src/enrich/10k/scripts/runCalabash.sh \
  /mnt/xslt_volume/i4EnrichV7/src/enrich/10k/xpl/i4Enrich.xpl \
  $documentSpecficLogs{/.}Log.txt {}" \
  $outputDir $svnRepositoryRoot $svnRevision $logging $debug $saveHTML
with parameters:
servers="xx.xx.xxx.xxx"
saveHTML="true"
debug="false"
timeout="1500"
logging="false"
NumberExtraJobsPerServer="1"
sourceDir="/mnt/xslt_volume/i4ContentSource/SEC/10k-GHU/2009/"
outputDir="/mnt/xslt_volume/i4ContentOutput/SEC/10k-GHU/2009/"
logDir="${outputDir}logs/"
documentSpecficLogs="${logDir}documentSpecfic/"
jobLog="${logDir}parallelJobLog.txt"
metricsLog="${logDir}metrics.txt"
svnRepositoryRoot=$(svn info | grep 'Repository Root' | sed 's/Repository Root: //g')
svnRevision=$(svn info |grep Revision | sed 's/Revision: //g')
Given the nature of the process, it's highly likely that a regular
expression is hanging on some permutation of text in a few of the
10000 input files. I'll have to debug that.
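To narrow down which input files are involved, the file written by --joblog can be filtered for jobs that did not finish cleanly. This is only a sketch: `failed_jobs` is a hypothetical helper name, and it assumes the job log is tab-separated with a header row containing the column names Exitval, Signal and Command, which may differ between parallel versions.

```shell
# Hypothetical helper: print the command of every job in a parallel job log
# whose exit value or signal is non-zero. Columns are located by header name,
# assuming a tab-separated log with Exitval, Signal and Command columns.
failed_jobs() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $col["Exitval"] != 0 || $col["Signal"] != 0 { print $col["Command"] }
    ' "$1"
}

# Usage: failed_jobs "$jobLog"
```

The printed commands include the input file name, which gives a short list of documents to re-run by hand.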
Is there anything I could do to ensure that parallel is able to kill
processes remotely?
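One possible workaround, sketched under assumptions rather than offered as a confirmed fix: enforce the limit on the remote host itself with the GNU coreutils `timeout` command, so that the kill does not depend on a signal making it across the ssh connection. The helper name `run_with_limit` and the 10 second grace period are illustrative choices, and this assumes coreutils `timeout` is installed on the remote server.

```shell
# Hypothetical helper: run a command with a hard wall-clock limit.
# $1 is the limit in seconds; the remaining arguments are the command.
run_with_limit() {
    limit=$1
    shift
    # Send TERM once the limit expires, then KILL 10 seconds later
    # if the command is still alive.
    timeout -k 10 "$limit" "$@"
}

# Usage: run_with_limit 1500 sh runCalabash.sh ...
```

`timeout` exits with status 124 when the limit was hit, so a job killed this way would show up as a non-zero exit value in the job log. Whether the JVM itself is reached may also depend on how runCalabash.sh starts java, which is not shown here.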
Regards
--
Alex G. Muir
Software Engineering Consultant
Linkedin Profile : http://ca.linkedin.com/pub/alex-muir/36/ab7/125