Tuesday, February 3, 2015

Parallel computing using multi-threads



Abstract: Multi-threaded programming can speed up analytic work.


A thread is the smallest sequence of programmed instructions that the operating system can schedule independently. Multi-threading means running several threads at the same time within one program, without needing multiple copies of the program. A thread is usually a component of a process, so multiple threads in Perl exist within the same process and share its resources, such as memory and CPU time. All threads run the same Perl code, but the input values passed to each thread can differ. Note that Perl's threads module gives each thread its own copy of the program's variables unless they are explicitly shared. On a single core the CPU switches among the threads, while on multi-CPU or multi-core systems the threads can run concurrently. There are some issues to consider when implementing multi-threading in Perl.
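As a minimal sketch of these points, assuming a perl built with thread support, the script below runs the same subroutine in three threads with different inputs. The names square_and_count and $counter are made up for this illustration. It also shows that an ordinary variable is copied into each thread, so changes made inside a thread do not reach the parent.

```perl
use strict;
use warnings;
use threads;

# Not marked as shared, so every thread works on its own private copy.
my $counter = 0;

# The same subroutine runs in every thread; only the input value differs.
# (square_and_count is a hypothetical name used for this sketch.)
sub square_and_count {
    my ($n) = @_;
    $counter += $n;    # changes the thread's private copy only
    return $n * $n;
}

# Start one thread per input value, then wait for each one and collect its result.
my @workers = map { threads->create(\&square_and_count, $_) } 1 .. 3;
my @squares = map { $_->join() } @workers;

print "squares: @squares\n";    # squares: 1 4 9
print "counter: $counter\n";    # counter: 0 (the parent's copy was never touched)
```

Returning results through join() keeps the threads free of shared state, which is the style used in the rest of this post.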

The first issue is how many threads to use. The number of CPUs is only one of the factors that matter in practice. For example, a server with 4 CPUs and 4 cores per CPU can run at most 16 (4x4) threads truly in parallel. Memory is another limit: memory usage should stay below 90% during a multi-threaded computation. The I/O throughput of the hard drive can also be a bottleneck. Speedup, which is the execution time with one thread divided by the execution time with multiple threads, can be used to choose the best number of threads. A higher speedup is better, and the speedup is called linear when it equals the number of threads. As the graph shows, the computation time levels off toward saturation as the number of threads grows. Adding more threads therefore does not keep speeding up the work in proportion.
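The speedup ratio can be computed with a few lines of Perl. This is only a sketch: the subroutine names speedup and efficiency are my own, and the timings in %seconds_by_threads are invented for illustration, not measurements from the server described above.

```perl
use strict;
use warnings;

# Speedup = execution time with one thread / execution time with N threads.
sub speedup {
    my ($seconds_one_thread, $seconds_n_threads) = @_;
    die "time must be positive\n" unless $seconds_n_threads > 0;
    return $seconds_one_thread / $seconds_n_threads;
}

# Efficiency = speedup / number of threads; 1.0 means perfectly linear speedup.
sub efficiency {
    my ($seconds_one_thread, $seconds_n_threads, $threads_num) = @_;
    return speedup($seconds_one_thread, $seconds_n_threads) / $threads_num;
}

# Hypothetical wall-clock times in seconds, keyed by the number of threads.
my %seconds_by_threads = (1 => 120, 2 => 63, 4 => 35, 8 => 24, 16 => 22);

for my $n (sort { $a <=> $b } keys %seconds_by_threads) {
    printf "%2d threads: speedup %.2f, efficiency %.2f\n",
        $n,
        speedup($seconds_by_threads{1}, $seconds_by_threads{$n}),
        efficiency($seconds_by_threads{1}, $seconds_by_threads{$n}, $n);
}
```

A reasonable choice is the largest number of threads for which the efficiency is still acceptable, since beyond that point extra threads mostly add memory and I/O pressure.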


 
The second issue is whether the work can be processed in parallel, which is known as task parallelism. Perl's support for multi-threading is less complete than that of languages designed for parallel programming. Sharing memory and synchronizing threads take extra, explicit work in Perl (for example, through the threads::shared module), so the whole task is best divided into many independent sub-tasks. Each sub-task is executed by one thread, and the threads do not communicate with each other.
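The idea of independent sub-tasks can be sketched as follows, assuming a perl built with thread support. The helper names split_into_chunks and sum_of_lengths and the toy sequences are hypothetical. Each thread receives its own chunk, the threads never talk to each other, and the parent combines the partial results after join().

```perl
use strict;
use warnings;
use threads;
use List::Util qw(sum);

# Deal the items round-robin into at most $n_chunks groups (hypothetical helper).
sub split_into_chunks {
    my ($n_chunks, @items) = @_;
    my @chunks = map { [] } 1 .. $n_chunks;
    my $i = 0;
    push @{ $chunks[ $i++ % $n_chunks ] }, $_ for @items;
    return grep { @$_ } @chunks;    # drop empty chunks
}

# One sub-task: total length of the sequences in a single chunk.
sub sum_of_lengths {
    my @sequences = @_;
    return sum(0, map { length } @sequences);
}

# Toy data, for illustration only.
my @sequences = ('ACGT', 'AC', 'ACGTAC', 'A', 'ACG');

# One thread per chunk; no data is exchanged while the threads run.
my @workers = map { threads->create(\&sum_of_lengths, @$_) }
    split_into_chunks(2, @sequences);

# Combine the partial results in the parent once every thread has finished.
my $total = sum(0, map { $_->join() } @workers);
print "total length: $total\n";    # total length: 16
```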

Here is an example of multi-threaded programming in Perl.
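#!/usr/bin/perl
use strict;
use warnings;
use threads;
use File::Basename qw(basename dirname);

# Maximum number of threads running at the same time.
# The value 4 is an assumed setting; adjust it to the machine.
my %variables = (threads_num => 4);

# Copy every record whose sequence is shorter than 50 characters to short_<name>.
# Assumes each FASTA record is exactly two lines: a header and a sequence.
sub read_fasta {
    my ($in_file) = @_;
    # Prefix the file name, not the whole path, so the output lands next to the input.
    my $out_file = dirname($in_file) . '/short_' . basename($in_file);
    open(my $IN,  '<', $in_file)  or die "Cannot read $in_file: $!";
    open(my $OUT, '>', $out_file) or die "Cannot write $out_file: $!";
    while (defined(my $L1 = <$IN>)) {
        my $L2 = <$IN>;
        last unless defined $L2;    # incomplete record at the end of the file
        chomp($L1, $L2);
        print $OUT "$L1\n", "$L2\n" if length($L2) < 50;
    }
    close($IN);
    close($OUT);
    return;
}

# main program
# The same test file is listed three times only to give each thread some work.
my @sample_files = ('/home/yuan/data_2/test/test1.fa', '/home/yuan/data_2/test/test1.fa', '/home/yuan/data_2/test/test1.fa');
while (1) {
    # start new threads while free slots and unprocessed files remain
    while (threads->list() < $variables{threads_num} and @sample_files > 0) {
        my $file = shift @sample_files;
        threads->create(\&read_fasta, $file);
    }
    # recover all finished threads
    foreach my $sub_thread (threads->list()) {
        $sub_thread->join() if $sub_thread->is_joinable();
    }
    last if threads->list() == 0 and @sample_files == 0;
    sleep 10;
}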



Writing date: 2013.09.12

