How to impute or rbind lists in a for loop?

r

#1

Hi Friends,

I have run into a problem that is probably easy for an experienced person, but I am unable to solve it.
When I run the code outside a for loop, I get the desired output.

s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][1]),company[,1], method = "jaccard")
d <- company[apply(s,1, which.min),]

s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][2]),company[,1], method = "jaccard")
e <- company[apply(s,1, which.min),]

s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][3]),company[,1], method = "jaccard")
f <- company[apply(s,1, which.min),]

s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][4]),company[,1], method = "jaccard")
g <- company[apply(s,1, which.min),]

s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][8]),company[,1], method = "jaccard")
h <- company[apply(s,1, which.min),]

full <- rbind(d,e,f,g,h)
full

Output:
      title               location_name_list
42  Job one    Louisville, KY, United States
14      job San Francisco, CA, United States
421 Job one    Louisville, KY, United States
141     job San Francisco, CA, United States
422 Job one    Louisville, KY, United States

But now that I use a for loop for this, I can't figure out how to get the same output.
I tried rbind on lists, filling a matrix row by row, etc.,
but didn't get the desired output shown above.

For loop code:

d <- NULL
for (i in 1:length(talent[,1])) {
  s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][i]),company[,1], method = "jaccard")
  d[i] <- company[apply(s,1, which.min),]
  full <- rbind(d[i-1], d[i])
} 

Can anyone please help me with how to fill in the data in a data frame or a matrix?

Thanks in advance


#2

Hi
I found the reason why I am getting the error: when the loop reaches a NULL value, it throws an error.

Now can someone please tell me how I can skip NULL values when I use the stringdistmatrix function?

That is the main issue.


#3

Hi @premsheth

In your for loop, you call full <- rbind(d[i-1], d[i]) on every iteration, which binds only the current and previous values, so full never accumulates the earlier rows.
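One common way around this is to collect each iteration's result in a list and bind everything once after the loop. The following is only a sketch with made-up data: `company` here is a toy data frame and `best` is a hypothetical vector of best-match row indices standing in for the `which.min` results from the question.

```r
# Toy stand-in for the real `company` data frame (made-up values)
company <- data.frame(
  title = c("job", "Job one", "Job two"),
  location_name_list = c("San Francisco, CA", "Louisville, KY", "Austin, TX"),
  stringsAsFactors = FALSE
)

# Hypothetical best-match row indices, one per talent row
best <- c(2, 1, 2)

# Collect one result per iteration in a pre-allocated list ...
rows <- vector("list", length(best))
for (i in seq_along(best)) {
  rows[[i]] <- company[best[i], ]
}

# ... then bind all rows once, after the loop has finished
full <- do.call(rbind, rows)
print(full)
```

Binding once at the end also avoids copying the growing data frame on every iteration.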


#4

Thanks for the reply,

But full is already inside the loop.
I modified the code and it works until a missing string comes up:

`for (i in 1:length(talent[,1])) {`
  `s <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][i]),company[,1], method = "jaccard")`
  `d[i,] <- (company[apply(s,1, which.min),])`
`}`

But when a NULL or missing value comes up, it gives me the following error:
x[[jj]][iseq] <- vjj : replacement has length zero

Can you please tell me how I can skip NULL values?


#5

Hi @premsheth

You can use an if-else check with your for loop, where you specify that if the value is null, you move on to the next i (do not run the loop body for that value).
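In R the index of a `for` loop cannot be advanced by hand, so the usual idiom for this is `next`. A minimal sketch, assuming the "null" entries arrive as `NA` or blank strings (the `words` vector below is made up for illustration):

```r
# Made-up input containing a missing value and a blank string
words <- c("engineer", NA, "", "analyst")

kept <- character(0)
for (i in seq_along(words)) {
  w <- words[i]
  # Skip missing or whitespace-only entries instead of processing them
  if (is.na(w) || !nzchar(trimws(w))) next
  kept <- c(kept, w)
}
print(kept)
```

In the real loop, the distance computation would go where `kept <- c(kept, w)` is.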


#6

@AishwaryaSingh
Thank you for the good suggestion.

Still the same error. Actually there is no data, meaning the field is white space; even if I import the .csv with na.strings = "NA" it still does not work.

I think I need to remove all missing values before I can apply the stringdistmatrix() function. Otherwise the function computes a distance of NA, and when I extract the minimum distance it gives me an error.
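A rough sketch of that clean-up step, using made-up inline CSV text instead of the real file. The column names `id` and `title` are assumptions for illustration. Passing `na.strings = c("", "NA")` together with `strip.white = TRUE` is intended to turn blank fields into `NA` on import, and the explicit filter afterwards catches blanks regardless of how they were read:

```r
# Made-up CSV text standing in for the real file; rows 2, 3 and 5 have no usable title
csv_text <- paste(
  c("id,title", "1,data engineer", "2,", "3,NA", "4,sales analyst", "5,   "),
  collapse = "\n"
)

# Ask read.csv to treat empty fields and the literal string NA as missing
talent <- read.csv(text = csv_text, header = TRUE,
                   na.strings = c("", "NA"), strip.white = TRUE,
                   stringsAsFactors = FALSE)

# Keep only rows whose title is present and not just white space
keep <- !is.na(talent$title) & nzchar(trimws(talent$title))
talent_clean <- talent[keep, ]
print(talent_clean)
```

The distance matrix can then be built from `talent_clean`, so no row is missing its input string.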


#7

Hi,

I solved half of the problem: I created a distance matrix using stringdistmatrix.

Now I want to pick out the minimum distance from the distance matrix and fetch the original data (wherever the distance is smallest, I need the original row from the data frame).

I am trying to do it as follows:
library(stringdist)
library(stringi)

talent <- read.csv("/Users/imac086/Desktop/Premal_scripts/Test_talent.csv", header=TRUE,na.strings = "NA")
talent <- talent[,c(5,8,28)]

company <- read.csv("/Users/imac086/Desktop/Premal_scripts/Test_company1.csv",header=TRUE)
company <- company[,c(2,9)]

s <- matrix(0,nrow = length(talent[,1]), ncol = length(company[,1]))

for (i in 1:length(talent[,1])) {
  s[i,] <- stringdist::stringdistmatrix(stri_extract_first_words(talent[,2][i]),company[,1], method = "jaccard")
}

d <- data.frame(title = as.character(), location = as.character() ) 
for (i in 1:length(company[,1])) {
  d[i,] <- company[lapply(s[,i],which.min),]
}

The s matrix dimension is 14388 x 511
The talent data frame dimension is 14388 x 3
The company data frame dimension is 511 x 3

I want to fetch the original data from the company data frame.

Error:
If I use lapply, it gives me the following error:
Error in `[.default`(xj, i) : invalid subscript type 'list'
or,
if I use the apply function, it gives me the following error:
Error in apply(s[i, ], 1, which.min) : dim(X) must have a positive length
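Both messages can be reproduced on a tiny matrix, which may help show where they come from. This is only an illustration with made-up distances: `lapply()` always returns a list, which cannot be used as a row subscript, and indexing a single row of a matrix drops it to a plain vector with no `dim`, which `apply()` cannot work with.

```r
# Made-up 2 x 3 distance matrix
s <- matrix(c(0.9, 0.2, 0.5,
              0.1, 0.8, 0.7), nrow = 2, byrow = TRUE)

# Indexing one row drops the matrix to a plain vector, so it has no dim
row1 <- s[1, ]
print(dim(row1))

# lapply() returns a list with one element per input value
idx_list <- lapply(s[, 1], which.min)
print(class(idx_list))

# which.min() can be called directly on the row vector instead
best <- which.min(row1)
print(best)
```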


#8

Hi Friends,

The problem is resolved; the solution is below:

# Initialize an empty result
df <- NULL

# For each talent row, take the company row with the minimum distance and append it
for (i in seq_len(nrow(talent))) {
  df <- rbind(df, company[which.min(s[i, ]), ])
}

Thank you
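For larger inputs, the same lookup can possibly be done without a loop by computing all row-wise minima first and indexing `company` once. This is a sketch on made-up data, not the code from the thread. It assumes every row of `s` has at least one non-NA distance, since `which.min` returns a zero-length result for an all-NA row.

```r
# Toy stand-ins for the real objects (made-up values)
company <- data.frame(
  title = c("job", "Job one", "Job two"),
  location_name_list = c("San Francisco, CA", "Louisville, KY", "Austin, TX"),
  stringsAsFactors = FALSE
)
s <- matrix(c(0.9, 0.2, 0.5,
              0.1, 0.8, 0.7,
              0.6, 0.3, 0.4), nrow = 3, byrow = TRUE)

# Column index of the smallest distance in each row of s
best <- apply(s, 1, which.min)

# Index company once with all best-match rows
matched <- company[best, ]

# The loop version from the solution, for comparison
looped <- NULL
for (i in seq_len(nrow(s))) {
  looped <- rbind(looped, company[which.min(s[i, ]), ])
}

# Both approaches pick the same companies (row names may differ)
print(identical(matched$title, looped$title))
```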