R text naive Bayes: same values for class posterior probability


#1

Hi,

We are attempting text classification using naive Bayes in R. For our data set it returns the same values for the class posterior probability for every input. It appears to be calculating only the class prior probability.

Below is code and training data

#Code

library('log4r')
#logReset()
#basicConfig()
#addHandler(writeToFile, logger="RML", file="D:/Rnlp.log", level='DEBUG')
#with(getLogger(), names(handlers))
#loginfo('test %d', 1)
#RML <- create.logger(logfile = 'C:/software/absa/textmining/nlp/logs/RML.log', level = "DEBUG")
RML <- create.logger(logfile = 'D:/absa/textmining/nlp/logs/RML.log', level = "DEBUG")

computeNavieByes=function(trainingDataPath,testData,isTrainingMode) {
debug(RML,'start compute naviebyes')
out <- tryCatch(
{
library(tm)
library(e1071)

testDataTokens <- unlist(strsplit(testData, "[,]"))
dataText<-read.csv(trainingDataPath,header= TRUE)
trainvector <- as.vector(dataText$Text)
trainsource <- VectorSource(trainvector)
traincorpus <- Corpus(trainsource)

#REMOVE STOPWORDS
traincorpus <- tm_map(traincorpus,stripWhitespace)
traincorpus <- tm_map(traincorpus,tolower)
traincorpus <- tm_map(traincorpus, removeWords,stopwords("english"))
traincorpus<- tm_map(traincorpus,removePunctuation)
traincorpus <- tm_map(traincorpus, PlainTextDocument)

# CREATE TERM DOCUMENT MATRIX

trainmatrix <- t(TermDocumentMatrix(traincorpus))
model <- naiveBayes(as.matrix(trainmatrix),as.factor(dataText$Category))
col1 <- c()
index <- 1
resultsColl <- vector()
for (valueToken in testDataTokens)
{
col1[1] <- valueToken
dataTest <- data.frame("col1"=col1)
testvector <- as.vector(dataTest)
testsource <- VectorSource(testvector)
testcorpus <- Corpus(testsource)
testcorpus <- tm_map(testcorpus,stripWhitespace)
testcorpus <- tm_map(testcorpus,tolower)
testcorpus <- tm_map(testcorpus, removeWords,stopwords("english"))
testcorpus<- tm_map(testcorpus,removePunctuation)
testcorpus <- tm_map(testcorpus, PlainTextDocument)

testmatrix <- t(TermDocumentMatrix(testcorpus))
print(valueToken)
results<-predict(model, as.matrix(testmatrix),type="raw")
print(class(results))
print(typeof(results))
print(results)

#resultsColl[index] <- "hello world"
resultsColl[index] <- toString(results)
index <- index +1

debug(RML,'valueToken')
debug(RML,valueToken)

#print(valueToken)
#debug(as.character(results))

}
return (resultsColl)

},
error=function(cond)
{
error(RML,cond)
},
warning=function(cond)
{
warn(RML,cond)
return(cond)
},
finally={
}
)
debug(RML,'end compute naviebyes')
return(out)
}

# testing

result <- computeNavieByes("D:/axa/TrainNavieByes.csv","suspend suspend,smuggler smuggler","N")
print(result)

Training data
Text,Category
laundering laundering laundering,Money laundering
tax evasion tax evasion,Money laundering
bank fraud,Money laundering
terrorist terrorist terrorist terrorist,Terrorist Financing
arms arms,Terrorist Financing
weapon weapon,Terrorist Financing
bribe bribe bribe bribe bribe,Bribery and Corruption
corrupt corrupt corrupt,Bribery and Corruption
kickback kickback,Bribery and Corruption
fraud fraud fraud fraud,Fraud and Regulatory Breaches
convict convict,Fraud and Regulatory Breaches
breach breach,Fraud and Regulatory Breaches

Thanks


Naive Bayes text classification gives different result from hand computed
#2

Hi

Did you try type = "raw" in the predict() call on the naiveBayes() model? This should solve your issue.
Alain


#3

Hi, type = "raw" gives the same value for every class: the class prior probability, 1/4 = 0.25. Do we need multinomial naive Bayes?


#4

For unseen data it always returns the class prior probability. For seen data (terms present in the training set), will it take word frequency into account? Could I have an example using an R multinomial belief network?
thanks
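
As a point of comparison for the question above, here is a minimal sketch of multinomial naive Bayes in base R, run on the training data from post #1. It is not what e1071's naiveBayes() does internally (that function models numeric predictors with a Gaussian); the function names train_mnb and predict_mnb, the add-one smoothing parameter alpha, and the choice to skip terms that never occur in training are all assumptions made for illustration.

```r
# Illustrative multinomial naive Bayes in base R (hypothetical helper names).
train_mnb <- function(texts, labels, alpha = 1) {
  labels <- factor(labels)
  tokens <- strsplit(trimws(tolower(texts)), "[[:space:]]+")
  vocab  <- sort(unique(unlist(tokens)))
  # counts[class, term] = occurrences of the term in that class's documents
  counts <- matrix(0, nrow = nlevels(labels), ncol = length(vocab),
                   dimnames = list(levels(labels), vocab))
  for (i in seq_along(tokens)) {
    tab <- table(tokens[[i]])
    cls <- as.character(labels[i])
    counts[cls, names(tab)] <- counts[cls, names(tab)] + as.vector(tab)
  }
  log_prior <- log(as.vector(table(labels)) / length(labels))
  names(log_prior) <- levels(labels)
  # Add-alpha (Laplace) smoothing; each row is divided by its own class total
  log_lik <- log((counts + alpha) / (rowSums(counts) + alpha * length(vocab)))
  list(log_prior = log_prior, log_lik = log_lik)
}

predict_mnb <- function(model, text) {
  toks <- strsplit(trimws(tolower(text)), "[[:space:]]+")[[1]]
  # Assumption: terms never seen in training carry no evidence and are skipped
  toks <- toks[toks %in% colnames(model$log_lik)]
  score <- model$log_prior + rowSums(model$log_lik[, toks, drop = FALSE])
  post <- exp(score - max(score))  # subtract the max for numerical stability
  post / sum(post)
}

train <- data.frame(
  Text = c("laundering laundering laundering", "tax evasion tax evasion", "bank fraud",
           "terrorist terrorist terrorist terrorist", "arms arms", "weapon weapon",
           "bribe bribe bribe bribe bribe", "corrupt corrupt corrupt", "kickback kickback",
           "fraud fraud fraud fraud", "convict convict", "breach breach"),
  Category = rep(c("Money laundering", "Terrorist Financing",
                   "Bribery and Corruption", "Fraud and Regulatory Breaches"), each = 3),
  stringsAsFactors = FALSE
)

model  <- train_mnb(train$Text, train$Category)
unseen <- predict_mnb(model, "suspend suspend")        # no term seen in training
seen   <- predict_mnb(model, "laundering laundering")  # term seen in training
print(round(unseen, 3))
print(round(seen, 3))
```

Under these assumptions an input made only of unseen terms falls back to the class priors (0.25 each here), while a seen term shifts the posterior according to its smoothed frequency in each class.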


R text classification bayesian network
#5

Naive Bayes text classification seems to be incorrect: the classification differs from the hand-computed result. Please use the code above. Attachment: blogpostnb1.zip (1.2 KB)

[Training Data]
Text,Category
laundering laundering laundering,Money laundering
bankfraud bankfraud,Money laundering
terrorist terrorist terrorist terrorist,Terrorist Financing
weapon weapon,Terrorist Financing
bribe bribe bribe bribe bribe,Bribery and Corruption
corrupt corrupt corrupt,Bribery and Corruption

[Test data]
laundering terrorist terrorist

Attached are the results from R and from the hand computation of naive Bayes.
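
For reference, one way to reproduce a hand computation on the data above is sketched below in base R. It assumes a multinomial model with add-one (Laplace) smoothing and priors taken from the class frequencies; the attached hand computation may use different assumptions. A gap between this and R's output would not be surprising, since e1071's naiveBayes() treats numeric term counts as Gaussian predictors rather than multinomial counts.

```r
# Term counts per class, read off the training data in this post.
classes <- c("Money laundering", "Terrorist Financing", "Bribery and Corruption")
terms   <- c("laundering", "bankfraud", "terrorist", "weapon", "bribe", "corrupt")
counts  <- matrix(c(3, 2, 0, 0, 0, 0,
                    0, 0, 4, 2, 0, 0,
                    0, 0, 0, 0, 5, 3),
                  nrow = 3, byrow = TRUE, dimnames = list(classes, terms))

# Two training documents per class, so every prior is 1/3.
prior <- c(2, 2, 2) / 6

# Assumed add-one smoothing: (count + 1) / (class total + vocabulary size).
lik <- (counts + 1) / (rowSums(counts) + ncol(counts))

# Test document: "laundering terrorist terrorist"
test_tokens <- c("laundering", "terrorist", "terrorist")
joint       <- prior * apply(lik[, test_tokens], 1, prod)
posterior   <- joint / sum(joint)
print(round(posterior, 4))
```

With these assumptions the unnormalised scores are proportional to 4/1331, 25/1728 and 1/2744, so Terrorist Financing comes out on top with a posterior of roughly 0.81, followed by Money laundering at roughly 0.17.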