When should I use the ":=" operator in R?




data.table objects can also use the “:=” operator. Please help me understand when we should use this operator and what its benefits are over the standard assignment operator.

Is it faster, and when should we not use it?



from http://stackoverflow.com/questions/7029944/when-should-i-use-the-operator-in-data-table


Here is an example showing 10 minutes reduced to 1 second (from NEWS on the homepage). It is like subassigning to a data.frame, but it does not copy the entire table each time.

library(data.table)

m = matrix(1,nrow=100000,ncol=100)
DF = as.data.frame(m)
DT = as.data.table(m)

system.time(for (i in 1:1000) DF[i,1] <- i)
user system elapsed
287.062 302.627 591.984

system.time(for (i in 1:1000) DT[i,V1:=i])
user system elapsed
1.148 0.000 1.158   # 511 times faster

Putting the := in j like that allows more idioms:

DT["a",done:=TRUE] # binary search for group 'a' (DT must be keyed) and set a flag
DT[,newcol:=42] # add a new column by reference (no copy of existing data)
DT[,col:=NULL] # remove a column by reference
and:

DT[,newcol:=sum(v),by=group] # like a fast transform() by group
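The idioms above can be tried on a small table. The following is only a sketch: the table and its column names (group, v, done, total) are made up for illustration and are not from the original answer.

```r
library(data.table)

# Hypothetical toy table; column names are illustrative only.
DT <- data.table(group = c("a", "a", "b", "b", "c"),
                 v     = c(1, 2, 3, 4, 5))
setkey(DT, group)                   # a key is needed for the DT["a", ...] lookup

DT["a", done := TRUE]               # flag rows in group "a"; other rows get NA
DT[, newcol := 42]                  # add a constant column by reference
DT[, newcol := NULL]                # remove it again by reference
DT[, total := sum(v), by = group]   # group sum repeated on every row of its group

print(DT)
```

Note that none of these calls needs `DT <- ...` on the left: the table is updated in place.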


:= Assignment by reference
Fast add, remove and modify subsets of columns, by reference.
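One consequence of "by reference" that bears on the question of when not to use := is that a plain assignment such as DT2 <- DT does not create an independent table. A minimal sketch, assuming a made-up one-column table, of how copy() keeps the original intact:

```r
library(data.table)

DT  <- data.table(x = 1:3)   # hypothetical toy table
DT2 <- DT                    # plain assignment: both names refer to the same table
DT3 <- copy(DT)              # explicit copy, independent of DT

DT2[, x := x * 10L]          # modifies the shared table in place

print(DT$x)                  # DT changed too, since DT2 is not a separate copy
print(DT3$x)                 # the copy() result still holds the original values
```

So if the original values must be preserved, either take a copy() first or use ordinary assignment on a data.frame.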

DT[i, LHS:=RHS, by=...]

DT[i, c("LHS1","LHS2") := list(RHS1, RHS2), by=...]

DT[i, `:=`(LHS1=RHS1,
           ...), by=...]

set(x, i=NULL, j, value)
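The usage forms above can be sketched on a toy table as follows. The table and column names (a to f) are invented for illustration; the loop with set() is just one way it might be used when many small updates are needed.

```r
library(data.table)

DT <- data.table(a = 1:5, b = 0)   # hypothetical toy table

# Character-vector LHS form: assign several columns at once
DT[, c("c", "d") := list(a * 2L, a + 0.5)]

# Functional form: backticks let := be called like a function
DT[, `:=`(e = a + c, f = "x")]

# set() updates by reference without going through [.data.table,
# which can help inside a loop
for (i in seq_len(nrow(DT))) {
  set(DT, i = i, j = "b", value = i * 10)
}

print(DT)
```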