Converting loop to apply function

r
apply
function
loop
speed

#1

Hi All,

I have written code that uses a for loop to count, for each group, the values in a column that are below certain quantiles of that column. How can I rewrite the same loop with an apply function so that it runs faster? The code is below.

Code
___________________________________________________
library(sqldf)  # provides sqldf(), used in the loop below
set.seed(1729)
temp <- data.frame(groups=c(1,2),value1=rnorm(12),value2=rnorm(12))

# Number of rows and columns
ngroup<-length(unique(temp[,1]))
iteration=ncol(temp)-1

#Default Table
Table1<- data.frame(matrix(0, nrow=ngroup, ncol=(3*iteration)+1))
p<-colnames(temp)[2:ncol(temp)]
q<-c("0.25","0.5","0.75")
colnames(Table1)=c("Groups",as.vector(t(outer(p, q, paste, sep="-"))))

# Editing Table with counts
for(i in seq(from=1, to=ngroup, by=1)){
Table1[i,"Groups"]<-i
}

for(j in seq(from=1, to=ngroup, by=1)){

  for(i in seq(from=2, to=(3*iteration)+1, by=1)){

    # Source column in temp that output column i refers to
    namecol<-colnames(temp)[ceiling((i-1)/3)+1]

    # Pick the 0.25, 0.5 or 0.75 quantile of the whole column
    if ((i%%3)==2){
      quant<-quantile(temp[,ceiling((i-1)/3)+1],probs = as.numeric(q[1]))
    } else if ((i%%3)==0){
      quant<-quantile(temp[,ceiling((i-1)/3)+1],probs = as.numeric(q[2]))
    } else {
      quant<-quantile(temp[,ceiling((i-1)/3)+1],probs = as.numeric(q[3]))
    }

    # Count the values of group j that are below that quantile
    query<-sprintf("select count(%s) from temp where groups=%s and %s< %s",namecol,j,namecol,quant)
    Table1[j,i]<-sqldf(query)

  }

}


print(Table1)
________________________________________________________________________

Regards,
Surya


#2

Hi @Surya1987

As far as I understand, for computations on columns you can use the sapply function. Nested for loops like yours take a lot of time to execute; an sapply-based version is usually faster.
Let's say I want to flag the observations in each column that are below that column's 50% quantile. It can be done as:

set.seed(1729)
temp <- data.frame(groups=c(1,2),value1=rnorm(12),value2=rnorm(12))

#logical output: TRUE where a value is below the 50% quantile of its column
cols <- sapply(temp[-1], function(x) x < quantile(x, probs = 0.50))

#subset: rows where value1 is below its 50% quantile
newdata <- temp[cols[, "value1"], ]
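
To get the same layout as Table1 in the first post, one possible base-R sketch is below. It assumes the quantiles are taken over the whole column (as in the original loop) and the counts are then split by group. The names probs, value_cols, count_below and Table2 are made up for illustration, and sqldf is not needed here.

```r
set.seed(1729)
temp <- data.frame(groups = c(1, 2), value1 = rnorm(12), value2 = rnorm(12))

probs <- c(0.25, 0.5, 0.75)                   # quantile levels from the original post
value_cols <- setdiff(names(temp), "groups")  # every column except the group id

# For one column, return a (groups x probs) matrix of counts
# of values strictly below each whole-column quantile.
count_below <- function(col) {
  x <- temp[[col]]
  cuts <- quantile(x, probs = probs)
  below <- outer(x, cuts, "<")              # logical matrix: one row per value, one column per cut
  counts <- rowsum(below + 0, temp$groups)  # sum the TRUEs within each group
  colnames(counts) <- paste(col, probs, sep = "-")
  counts
}

# lapply replaces the inner loop over columns; cbind glues the pieces together
counts <- do.call(cbind, lapply(value_cols, count_below))

Table2 <- data.frame(Groups = sort(unique(temp$groups)), counts,
                     check.names = FALSE, row.names = NULL)
print(Table2)
```

The loop over groups disappears because rowsum aggregates all groups in one call, and the three if/else branches disappear because quantile accepts all three probabilities at once.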

For high-dimensional data sets, you can use parallel functions such as parSapply or foreach.
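
As a rough sketch of the parallel route, the snippet below uses parSapply from the parallel package (which ships with R) to compute the three cut points per column. The cluster size of 2 is an arbitrary choice for illustration, and on a data set this small the cost of starting the workers will outweigh any gain.

```r
library(parallel)

set.seed(1729)
temp <- data.frame(groups = c(1, 2), value1 = rnorm(12), value2 = rnorm(12))

cl <- makeCluster(2)  # two local workers; arbitrary, for illustration only

# Each column is sent to a worker, which returns its three quantiles
cuts <- parSapply(cl, temp[-1], quantile, probs = c(0.25, 0.5, 0.75))

stopCluster(cl)       # always release the workers when done
print(cuts)
```

The result has one column per value column and one row per quantile, the same shape sapply would return for the same call.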


#3

Hi Manish,

Yes, you are right that the execution time will be higher, which is why I am trying to run it using sapply. However, I need to understand how to convert the code above.

I need the output in a specific layout, which you can see by running the code.

In the meantime, I am also trying.

Thanks for your quick reply.