Piping Operator



Hi all,

I’m working on real-time analytics, where I want to keep execution time as low as possible. The data comes from a Hadoop cluster, and execution currently takes about 7 minutes. I have used nested function calls throughout. Given that I cannot improve the cluster's performance any further, would the execution time come down if I replaced the nested calls with %>% (the pipe operator offered by the dplyr and magrittr packages)?


It depends on how efficiently the library code is written. It should be easy enough to compare on your own: write the nested version, then write the same thing using the pipe operator. Run each alternative 1000 times and compare the timings.
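
A minimal sketch of that comparison, assuming the magrittr package is installed. The functions nested() and piped() and the vector x are hypothetical stand-ins, since the actual Hadoop-backed code is not shown in the question; swap in your own calls.

```r
library(magrittr)  # provides %>%

# Hypothetical workload standing in for the real analytics code.
x <- runif(1e4)

# The same computation written both ways.
nested <- function(v) sum(sqrt(abs(v)))
piped  <- function(v) v %>% abs() %>% sqrt() %>% sum()

# Check that both forms agree before timing them.
stopifnot(isTRUE(all.equal(nested(x), piped(x))))

# Run each alternative 1000 times and record elapsed wall-clock seconds.
n <- 1000
t_nested <- system.time(for (i in seq_len(n)) nested(x))[["elapsed"]]
t_piped  <- system.time(for (i in seq_len(n)) piped(x))[["elapsed"]]

cat("nested:", t_nested, "s\n")
cat("piped: ", t_piped, "s\n")
```

Timings on a toy vector like this only measure the overhead of the call syntax itself, so it is worth repeating the test with a realistic slice of your data before drawing conclusions about the 7-minute job.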