Re: Parallelising grep
From: Ole Tange
Subject: Re: Parallelising grep
Date: Fri, 9 Aug 2013 23:35:26 +0200
On Fri, Aug 9, 2013 at 7:53 AM, Nathan S. Watson-Haigh
<nathan.haigh@acpfg.com.au> wrote:
>
> I have a SAM/BAM file and I’d like to grep for alignments of certain reads
> IDs. I have the read ID strings in another file. I’m currently doing this
> with:
>
> $ samtools view in.bam | fgrep -w -f read.ids > alignments.txt
It would help to know how large the BAM file and the ID list are, so
please post the output of:
$ samtools view in.bam | wc
$ wc read.ids
$ samtools view in.bam | fgrep -w -f read.ids | wc
Without that information, I would split the IDs into one chunk per CPU:
$ cat read.ids | parallel --round-robin --pipe --block 1k cat ">"id.{#}
Then run one fgrep job per chunk (parallel defaults to one job per CPU):
$ parallel "samtools view in.bam | fgrep -w -f {}" ::: id.* > alignments.txt
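To check that chunking the IDs gives the same alignments as a single fgrep, the idea can be tried on toy data first. This is only a sketch under assumptions: the file alignments.sam and its contents are made-up stand-ins for the output of `samtools view in.bam`, GNU split's `-n r/2` replaces the `parallel --pipe --round-robin` step, and plain background jobs replace the second parallel call, so it runs without GNU parallel or samtools.

```shell
# Work in a scratch directory so the id.* glob only sees our chunks.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical toy data: six tab-separated "alignments" and three read IDs.
printf 'read%d\t0\tchr1\t%d\n' 1 100 2 200 3 300 4 400 5 500 6 600 > alignments.sam
printf 'read%d\n' 2 4 5 > read.ids

# Serial baseline: one grep over the whole ID list (grep -F is fgrep).
grep -F -w -f read.ids alignments.sam | sort > serial.txt

# Split the IDs round-robin into 2 chunks, id.aa and id.ab (GNU split only;
# the chunk count 2 is an arbitrary choice for this example).
split -n r/2 read.ids id.

# One grep per chunk, run in the background, then wait for all of them.
for chunk in id.*; do
  grep -F -w -f "$chunk" alignments.sam > "out.$chunk" &
done
wait

# The chunked run emits lines in a different order, so compare after sorting.
sort out.id.* > chunked.txt
cmp serial.txt chunked.txt && echo "same alignments"
```

Note that with chunked IDs the lines in alignments.txt will generally not be in the same order as in the single fgrep run, so sort both outputs before comparing them.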
/Ole