I’ve used GNU Parallel in the past, and it’s an exceptional tool for processing a large number of tasks. I highly recommend it.

Today, I started a batch that generates 5,000 things. Each job runs for about 10-15 minutes, generates a bunch of data, modifies it, and writes it elsewhere.

Last week, I ran this by launching parallel commands separately on five different machines, which was kind of a pain to manage. So today, when kicking off another batch of 5,000, I came up with this command:

parallel --eta --progress --joblog jobs.log --tagstring {} --results output -j 10 --delay 10 --retries 4 -S ubuntu@10.0.0.213,ubuntu@10.0.0.66,ubuntu@10.0.0.58,ubuntu@10.0.0.71,ubuntu@10.0.0.14 "IDENTIFIER={} REALM=2 /home/ubuntu/bin/run" ::: {1..4999}

Which produces this lovely output.

Computers / CPU cores / Max jobs to run
1:ubuntu@10.0.0.71 / 16 / 10
2:ubuntu@10.0.0.14 / 16 / 10
3:ubuntu@10.0.0.213 / 16 / 10
4:ubuntu@10.0.0.58 / 16 / 10
5:ubuntu@10.0.0.66 / 16 / 10

Computer:jobs running/jobs completed/%of started jobs
ETA: 0s Left: 4999 AVG: 0.00s  1:10/0/20%/0.0s  2:10/0/

Now I’ve got 50 jobs running across five machines, and as they complete, more will start until we hit all 4,999 tasks (I already created job 0 earlier, which brings the total to 5,000).
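For a rough sense of how long the whole batch might take, here is a back-of-the-envelope sketch. It assumes every slot stays busy the whole time and ignores --delay, retries, and stragglers, so treat the result as a ballpark rather than a prediction. The variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Rough runtime estimate for the batch (an approximation, not a measurement).
hosts=5
jobs_per_host=10      # matches -j 10
total_jobs=4999
min_minutes=10        # fastest expected job
max_minutes=15        # slowest expected job

# Total number of jobs that can run at once across all machines.
slots=$(( hosts * jobs_per_host ))

# Ceiling division: how many "waves" of jobs are needed if all slots stay full.
waves=$(( (total_jobs + slots - 1) / slots ))

best_minutes=$(( waves * min_minutes ))
worst_minutes=$(( waves * max_minutes ))

echo "slots=$slots waves=$waves"
echo "estimated wall-clock time: ${best_minutes}-${worst_minutes} minutes"
```

With these numbers that works out to somewhere between roughly 17 and 25 hours.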

Quick breakdown.

  • --eta outputs the expected completion time based on how long completed tasks have taken. (It currently shows 0 because no jobs have finished yet.)
  • --progress shows how far along we are
  • --joblog jobs.log maintains a log file of completed jobs. It can be used to resume an interrupted run (for example with --resume) if something goes wrong.
  • --tagstring {} prefixes any stdout that goes to the screen with the job's input value, here a number from 1 to 4999.
  • --results output stores each job's stdout and stderr in files under the output directory.
  • -j 10 runs up to 10 jobs on each machine
  • --delay 10 waits 10 seconds between starting jobs.
  • --retries 4 tries a failing job up to 4 times before giving up on it.
  • -S ubuntu@... is a comma-separated list of hosts to run jobs on. You can include localhost too. It’s also possible to set a specific number of jobs per host.
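Since the joblog is a plain tab-separated file, it is easy to poke at with standard tools. The sketch below writes a tiny, made-up joblog (the rows are invented for illustration, not taken from my run) and uses awk to list which job numbers failed. The column layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command) is what parallel writes; everything else here is an assumption.

```shell
#!/usr/bin/env bash
# Create a small sample joblog so the example is self-contained.
# These rows are fabricated purely for demonstration.
joblog=$(mktemp)
{
  printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\tubuntu@10.0.0.71\t1700000000.000\t612.340\t0\t0\t0\t0\tIDENTIFIER=1 REALM=2 /home/ubuntu/bin/run\n'
  printf '2\tubuntu@10.0.0.14\t1700000010.000\t35.120\t0\t0\t1\t0\tIDENTIFIER=2 REALM=2 /home/ubuntu/bin/run\n'
  printf '3\tubuntu@10.0.0.213\t1700000020.000\t701.900\t0\t0\t0\t0\tIDENTIFIER=3 REALM=2 /home/ubuntu/bin/run\n'
  printf '4\tubuntu@10.0.0.58\t1700000030.000\t12.005\t0\t0\t2\t0\tIDENTIFIER=4 REALM=2 /home/ubuntu/bin/run\n'
} > "$joblog"

# Column 7 is the exit value and column 8 the signal; skip the header row.
failed=$(awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { printf "%s%s", sep, $1; sep = " " }' "$joblog")
ok_count=$(awk -F'\t' 'NR > 1 && $7 == 0 && $8 == 0 { n++ } END { print n + 0 }' "$joblog")

echo "failed job numbers: $failed"
echo "successful jobs: $ok_count"

rm -f "$joblog"
```

To run this against a real log, point awk at jobs.log instead of the temporary file. parallel can also read the joblog itself through its --resume and --resume-failed options, which is usually the easier route for re-running things.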

Finally you have the command, followed by ::: {1..4999}. Everything after ::: is the list of inputs. The shell's brace expansion turns {1..4999} into every integer from 1 to 4999, and parallel runs the command once per value, substituting it wherever {} appears.
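If the one-liner gets unwieldy, one option is to assemble it from bash arrays. This is only a sketch of how I might lay it out: the array names are made up, and it prints a summary instead of launching anything, so it is safe to run as-is. It also shows that the brace expansion is done by bash before parallel ever sees the arguments.

```shell
#!/usr/bin/env bash
# Hosts from the command above, kept in an array so they are easy to edit.
hosts=(
  ubuntu@10.0.0.213
  ubuntu@10.0.0.66
  ubuntu@10.0.0.58
  ubuntu@10.0.0.71
  ubuntu@10.0.0.14
)

# Join the array with commas, which is the format -S expects.
sshlogins=$(IFS=,; echo "${hosts[*]}")

# Brace expansion happens here, in the shell: this yields 1, 2, ..., 4999.
inputs=( {1..4999} )

# The full argument vector, one option group per line.
cmd=(
  parallel --eta --progress
  --joblog jobs.log
  --tagstring '{}'
  --results output
  -j 10 --delay 10 --retries 4
  -S "$sshlogins"
  'IDENTIFIER={} REALM=2 /home/ubuntu/bin/run'
  ::: "${inputs[@]}"
)

# Print a summary rather than running anything.
echo "hosts: $sshlogins"
echo "inputs: ${#inputs[@]} values, from ${inputs[0]} to ${inputs[${#inputs[@]}-1]}"
echo "program: ${cmd[0]}"
```

To actually start the batch, replace the echo lines with "${cmd[@]}".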

This was fun and I hope you find an excuse to use parallel in the future.
