In R, we often run into the limited processing power of a single core. Parallel processing is an easy way around this. Several R libraries enable parallel processing; here I will use only the parallel library.

Example: Here I will walk through a simple scenario using the parallel package. Suppose I have a data frame with a thousand rows and two columns. I want to compute the column sums over each block of 100 consecutive rows, i.e. rows 1&ndash;100, 101&ndash;200, ..., 901&ndash;1000, with the blocks processed in parallel rather than one after another.

[code language="r"]
library(parallel)

# Create a dummy data frame with 1000 rows and two columns
set.seed(100688)
df = data.frame(x = rnorm(1000), y = rnorm(1000))

# Computes the column sums of the 100 rows starting at index ind.
# Note the parentheses: ind:(ind + 99), not ind:ind + 99, because
# ':' binds tighter than '+' in R.
sumfunc = function(ind) {
  colSums(df[ind:(ind + 99), ])
}

no_cores = detectCores() - 1 # no. of cores, leaving one free
cl = makeCluster(no_cores)   # make cluster

# Generate the starting index of each 100-row block: 1, 101, ..., 901
indices = seq(from = 1, to = 1000, by = 100)

clusterExport(cl, 'df') # copy df to each worker

start_time = Sys.time() # start time of parallel computation
parallel_result = parLapply(cl, indices, sumfunc)
total_time = Sys.time() - start_time # total time taken
cat('Total parallel time taken:', total_time, '\n')

stopCluster(cl)

# More than one variable can be exported by passing a character
# vector of names, e.g.
# clusterExport(cl, c('xx', 'yy', 'zz'))
[/code]
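As a sanity check, the same block sums can be computed serially with lapply and verified against the full column sums. This is a small sketch that redefines df, sumfunc, and indices as in the snippet above so it runs on its own:

```r
# Serial version of the same computation, for comparison.
set.seed(100688)
df = data.frame(x = rnorm(1000), y = rnorm(1000))
sumfunc = function(ind) colSums(df[ind:(ind + 99), ])
indices = seq(from = 1, to = 1000, by = 100)

serial_result = lapply(indices, sumfunc)

# Each element is a named vector with the x and y sums of one 100-row block.
# Since the blocks cover every row exactly once, adding them up must
# reproduce the column sums of the whole data frame:
total = Reduce(`+`, serial_result)
stopifnot(all.equal(total, colSums(df)))
```

On a problem this small the serial version will usually win, because starting a cluster and shipping df to the workers has a fixed cost; the parallel approach pays off when each block involves substantially more work.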

Other Related Blogs:

  1. How-to go parallel in R – basics + tips
  2. A brief foray into parallel processing with R
  3. Parallel computing in R on Windows and Linux using doSNOW and foreach