|
From: | Cook, Malcolm |
Subject: | RE: Parallelising grep |
Date: | Fri, 9 Aug 2013 16:26:50 +0000 |
Assuming your shell is bash, with this exported function:

    function slice {
        # PURPOSE: After an optional -h lines of header (which are echoed
        # unless suppressed with -sh), echo every <-n>th line (default:
        # every line) starting with the <-m>th (counting from 1, starting
        # with the first line after the header; default: the <n>th line).
        # AUTHOR: malcolm_cook@stowers.org
        # EXAMPLE: slice -h=1 -sh -n=5 foo.tab > foo_every_fifth_line_after_the_one_line_header.tab
        #
        set -e
        perl -snwe 'BEGIN{our $n||=1; our $m=($n) unless defined($m); $m-=1; our $h||=0; die "required: m < n" unless $m < $n; our $sh} print $_ if (($. > $h ) ? (($. -1 - $h) % $n == $m) : ! $sh)' -- "$@"
    }
    export -f slice

...you can create parallel jobs where each job greps a slice of in.bam.

You would pass parallel's {#} as the value for -m, and the same value you pass as -j to parallel as the value for -n.

You'll probably need to use parallel's -q and have each job call bash. The ::: {1..10} supplies ten inputs, so {#} runs from 1 to 10. The following is untested:

    parallel -j 10 -q bash -c 'samtools view in.bam | slice -n=10 -m={#} | fgrep -w -f read.ids' ::: {1..10} > alignments.txt

The output will have the slices interwoven.

From:
parallel-bounces+mec=stowers.org@gnu.org [mailto:parallel-bounces+mec=stowers.org@gnu.org]
On Behalf Of Nathan S. Watson-Haigh

I have a SAM/BAM file and I’d like to grep for alignments of certain read IDs. I have the read ID strings in another file. I’m currently doing this with:

    $ samtools view in.bam | fgrep -w -f read.ids > alignments.txt

Is it possible to parallelise the grep by having each grep process a different subset of read IDs from the read.ids file? Or is there an alternative way to parallelise this which I have overlooked?

Cheers,
Nathan

--
Nathan S. Watson-Haigh, PhD
Research Fellow in Bioinformatics
Australian Centre for Plant Functional Genomics (ACPFG)
School of Agriculture, Food and Wine
University of Adelaide
Waite Campus
Plant Genomics Centre
Hartley Grove, Urrbrae SA 5064
Phone: +61 8 8313 2046
Mobile: +61 438 711 615
Skype: nathanhaigh
Email: nathan.haigh@acpfg.com.au
Web: http://www.acpfg.com.au/bioinformatics
LinkedIn: http://www.linkedin.com/profile/view?id=114191748
Github: https://github.com/nathanhaigh/ https://gist.github.com/nathanhaigh/
Twitter: @watsonhaigh
RID: B-9833-2008
ResearchGate: Nathan_Watson-Haigh
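
For anyone wanting to try the round-robin slicing idea from the reply at the top of this message without samtools or parallel installed, here is a minimal sketch. It is not the slice function from the thread: slice_lines and records are hypothetical names made up for this illustration, awk stands in for the perl one-liner, and header handling is left out. It only shows that n jobs, each taking every n-th line from a different offset, see disjoint subsets that together cover the whole stream.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): print every n-th line of
# stdin, starting with line m (1-based, 1 <= m <= n).
slice_lines() {
    local n=$1 m=$2
    awk -v n="$n" -v m="$m" '(NR - 1) % n == m - 1'
}

# Stand-in for "samtools view in.bam": seven numbered records.
records() { printf 'read%d\n' 1 2 3 4 5 6 7; }

# Three "jobs", each seeing a disjoint third of the stream.
for m in 1 2 3; do
    echo "job $m: $(records | slice_lines 3 "$m" | paste -sd' ' -)"
done
# job 1: read1 read4 read7
# job 2: read2 read5
# job 3: read3 read6
```

In the pipeline discussed above, each job would sit between samtools view and fgrep in the same way. Note that every job still reads and decodes the whole BAM file, so any speed-up is likely limited to the grep stage.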
|