Re: What do you use GNU Parallel for?
From: Matt Oates (Home)
Subject: Re: What do you use GNU Parallel for?
Date: Wed, 22 Aug 2012 09:19:42 +0100
Hi Ole,
On 22 August 2012 07:08, Ole Tange <tange@gnu.org> wrote:
> So please write a few lines about the tasks you use it for -
> especially if you have reason to believe you are one of the few doing
> that kind of thing. If you want to be anonymous you can write me
> directly, but otherwise use the mailing list.
Good luck with the talk!
I use parallel to parallelise the outer loop of most bioinformatics
software, especially HMMER3. Many pieces of software have no built-in
parallelisation, so if I give them a long list of inputs they work
through it serially. I work with quite large datasets: 1,765 genomes,
each having 1,000 to 10,000 protein sequences. With five 24-core
desktops I can really cut back how long something takes. We even have
an internal script that bridges parallel with the EC2 compute cloud,
so if I need to do something extra big I just go wider and hand the
list of EC2 machine names to parallel.
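To make the idea of parallelising the loop over inputs concrete, here is a minimal sketch in Python of what a job runner does on a single machine: the tool itself stays serial, and several copies of it run at once, one per input file. The names `run_one`, `run_all` and `jobs` are made up for illustration; this is not how GNU Parallel is implemented, it is not the internal EC2 script, and it does not cover spreading work over several hosts.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, input_path):
    """Run one serial tool invocation on one input file."""
    # The input path is appended as the last argument of the command.
    completed = subprocess.run(
        list(command) + [input_path],
        capture_output=True,
        text=True,
    )
    return input_path, completed.returncode, completed.stdout


def run_all(command, input_paths, jobs=4):
    """Run `command <input>` for every input, at most `jobs` at a time."""
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(lambda path: run_one(command, path), input_paths))


if __name__ == "__main__":
    # Stand-in command that just echoes its argument; a real run would use
    # the actual tool and its options instead.
    demo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    for path, code, out in run_all(demo, ["genome1.fa", "genome2.fa"], jobs=2):
        print(path, code, out.strip())
```

In practice `command` would be the serial program plus its fixed options, and `input_paths` the long list of per-genome files.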
More day to day, I frequently use parallel to convert large files
(hundreds of gigabytes per file) between text-based file formats, so
parallel running perl or sed. I also use the --pipe feature a lot to
split files, so something like the FASTA format is splittable with
parallel and I can pipe the data straight into another program.
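The reason FASTA splits cleanly is that every record starts with a `>` header line, so a splitter only has to cut at those lines. With GNU Parallel this is usually expressed by combining --pipe with a record-start marker such as --recstart '>'. The Python sketch below shows the same record-aware chunking explicitly; `fasta_records` and `fasta_chunks` are hypothetical helper names, and chunking by record count (rather than by byte size) is an assumption made to keep the example short.

```python
def fasta_records(lines):
    """Yield each FASTA record (header line plus sequence lines) as one string."""
    record = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def fasta_chunks(lines, records_per_chunk):
    """Group whole records into chunks so no sequence is cut in half."""
    chunk = []
    for record in fasta_records(lines):
        chunk.append(record)
        if len(chunk) == records_per_chunk:
            yield "".join(chunk)
            chunk = []
    # Emit the final, possibly smaller, chunk.
    if chunk:
        yield "".join(chunk)
```

Each chunk is itself a valid FASTA file, so it can be written to the standard input of another program unchanged. Passing an open file object as `lines` keeps memory use proportional to one chunk rather than the whole file.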
I think you would do well to publish a short paper somewhere in the
bioinformatics field about the speed-ups you can get using parallel
with older non-parallel software.
Best,
Matt.
---
http://www.mattoates.co.uk
http://bccs.bris.ac.uk