Saturday, October 4, 2014

Parallel::ForkManager - Parallel Processing of a Perl script

I recently used the Parallel::ForkManager Perl module in one of my projects, and I was quite impressed by it. With this module it is very easy to add parallel processing wherever it is needed.

My task : Process the information of thousands of client profiles and store it in a database.

Approach : Form a Perl data structure for each profile and pass it to a subroutine that takes care of the database insertion.

So every time a profile data structure has been formed, the script has to wait until that profile has been inserted before it can move on to the next one.

Stub :

use Data::Dumper;  # provides Dumper(), used below

my @responses;
my $total_profiles = scalar @profiles;

for ( 1 .. $total_profiles ) {
    my $profile = shift @profiles;
    # form appropriate structure of $profile
    my $response = storeProfile($profile);  # blocks until the insert is done
    push(@responses, $response);
}

print Dumper \@responses;

# The storeProfile subroutine contains the DB operations.

 
How Parallel::ForkManager helps to do it in parallel

use Parallel::ForkManager;
my $max_process = 5;  # choose this number carefully

Choose the maximum number of processes carefully. Parallel::ForkManager will run up to that many child processes at once, and each of them competes for the CPU cores. If the system is already busy running other applications, a large number will slow those applications down and may even make the system hang.

Used sensibly, this is also the advantage of the module: your script runs in parallel and can make use of all the cores, so the entire operation is faster.

In my task, the storeProfile subroutine runs in parallel in 5 child processes, so in the ideal case the whole job finishes about 5 times faster.
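One possible way to pick the number is sketched below. It is only an illustration: pick_max_process is a hypothetical helper, the "leave one core free, never more than 5" policy is an assumption, and reading the core count from the nproc command assumes a Linux-like system.

```perl
use strict;
use warnings;

# Hypothetical helper: keep $reserved cores free for other applications,
# never go below 1 and never above $cap.
sub pick_max_process {
    my ($cores, $reserved, $cap) = @_;
    my $n = $cores - $reserved;
    $n = 1    if $n < 1;
    $n = $cap if $n > $cap;
    return $n;
}

# Assumption: `nproc` exists and prints the number of available cores.
# Fall back to 1 core if the command is missing or prints nothing usable.
my $cores = `nproc 2>/dev/null` || '';
chomp $cores;
$cores = 1 unless $cores =~ /^\d+$/ && $cores > 0;

my $max_process = pick_max_process($cores, 1, 5);
print "Using $max_process child process(es) on $cores core(s)\n";
```

The cap matters as much as the core count when the children mostly wait on the database, because the database server has its own limits on how many clients it can serve at once.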

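Assume that storing one profile takes 0.5s.
Then, in the ideal case:
without Parallel::ForkManager, 1000 profiles will take 500s
with Parallel::ForkManager (max_process 5), 1000 profiles will take about 100s
(if max_process is 10, then 1000 profiles will take about 50s)
These figures ignore the cost of forking and assume that the database and the CPU cores can keep up with the extra processes.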
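The arithmetic above can be checked with a few lines of Perl. This is an idealised model only: estimated_seconds is a hypothetical helper that assumes the profiles are handled in waves of max_process at a time, with no fork overhead and no contention.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Idealised estimate: ceil(profiles / max_process) waves,
# each wave taking as long as one profile.
sub estimated_seconds {
    my ($profiles, $seconds_each, $max_process) = @_;
    return ceil($profiles / $max_process) * $seconds_each;
}

for my $max_process (1, 5, 10) {
    printf "max_process %2d: %ds\n",
        $max_process, estimated_seconds(1000, 0.5, $max_process);
}
```

Real timings will be somewhat worse than these estimates, so it is worth measuring with a small batch before settling on a value for max_process.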



my $pm = Parallel::ForkManager->new($max_process);
my @responses;
$pm->run_on_finish(    # called in the parent each time a child process completes
    sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $response) = @_;
        # retrieve the response sent back by the child
        if (defined($response)) {
            push(@responses, $response);
        }
    }
);

my $total_profiles = scalar @profiles;

foreach ( 1 .. $total_profiles ) {
    my $profile = shift @profiles;
    # form appropriate structure of $profile
    # FORK HERE: start() returns the child PID in the parent and 0 in the child
    $pm->start and next;
    # the following code is run by the child process
    my $response = storeProfile($profile);
    # $response must be a reference (hash, array or scalar ref) to be passed back
    $pm->finish(0, $response);
}

$pm->wait_all_children;  # wait until all child processes have completed
print Dumper \@responses;  # needs "use Data::Dumper;"
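To try the pattern without a database, here is a small self-contained sketch. The store_profile_stub subroutine and the sample profiles are made up for illustration; they stand in for storeProfile and the real data, and the limit of 3 children is arbitrary.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical stand-in for storeProfile(): no database involved,
# it only returns a hash reference describing what it "stored".
sub store_profile_stub {
    my ($profile) = @_;
    return { id => $profile->{id}, status => 'stored', worker_pid => $$ };
}

my @profiles = map { +{ id => $_ } } 1 .. 8;   # made-up sample data

my $pm = Parallel::ForkManager->new(3);        # at most 3 children at once
my %status_for;                                # filled in by the parent

$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $response) = @_;
    # Key the result by profile id, because children finish in any order.
    $status_for{ $response->{id} } = $response->{status} if defined $response;
});

for my $profile (@profiles) {
    $pm->start and next;                           # parent moves to the next profile
    my $response = store_profile_stub($profile);   # runs in the child only
    $pm->finish(0, $response);                     # child exits here
}
$pm->wait_all_children;

printf "stored %d of %d profiles\n", scalar(keys %status_for), scalar @profiles;
```

Storing the results in a hash keyed by id, instead of pushing onto an array, is a deliberate choice here: the callback runs whenever a child happens to finish, so the order of the responses does not have to match the order of the profiles.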


Note :
       * Before you start using this module, think carefully about the number of processes you select.

       * It is not a good idea to use this module in a script that runs as a server handling requests.

For example, if the script creates 5 processes, it will create 5 processes for every request. If 10 requests arrive in parallel, 5 * 10 = 50 processes will be created. It is a kind of fork bomb, and the server may hang.

I would suggest using Parallel::ForkManager in a standalone script.


Reference : 
http://search.cpan.org/~szabgab/Parallel-ForkManager-1.06/lib/Parallel/ForkManager.pm