GLM predicted probabilities in R

Hi,
I trained a GLM on advertising bid data to predict clicks per impression (click/imp) or conversions. After fitting the model, predict() returns the following probabilities for the first six predictions (labelled 1 2 3 4 5 6):

1.910270e-05 2.063908e-05 3.281303e-08 1.648556e-05 3.979453e-07 4.979762e-06

My GLM family is binomial(link = 'logit').
There are many features/variables, but the predicted probabilities seem far too small. What is the reason for this, and how can I improve it?
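One property of logistic regression that bears on this question: when the model has an intercept, the maximum-likelihood fit makes the average fitted probability on the training rows equal to the observed share of positives (this follows from the score equation for the intercept). So if clicks or conversions are very rare in the training data, very small predicted probabilities are to be expected. A minimal sketch on simulated data; `x`, `y` and the roughly 0.1% event rate are made up for illustration and are not the poster's data:

```r
# Simulated rare-event data (hypothetical): roughly 0.1% positives
set.seed(1)
n <- 100000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-7 + 0.5 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))

# With an intercept, the fitted probabilities average to the observed rate
observed_rate  <- mean(y)
mean_predicted <- mean(fitted(fit))
c(observed = observed_rate, predicted = mean_predicted)
```

Comparing `mean(fitted(gglm))` with the share of positive rows in the training data is one quick way to check whether small predictions simply reflect a low base rate.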

Thanks!
R.

Hi

Please attach your complete code so we can guide you.

Regards,
Tony

Here it is:

library(readr)
conv.20131022 <- read_csv("conv.20131022.csv")
conv.20131023 <- read_csv("conv.20131023.csv")
clk.20131022 <- read_csv("clk.20131022.csv")
clk.20131023 <- read_csv("clk.20131023.csv")
imp.20131022 <- read_csv("imp.20131022.csv")
imp.20131023 <- read_csv("imp.20131023.csv")
bid.20131022 <- read_csv("bid.20131022.csv")
bid.20131023 <- read_csv("bid.20131023.csv")
strName <- "Bid_ID,Time,Log_Type,unique_ID,User-Agent,IP,Region_ID,City_ID,Ad_Exchange,URL,Landing_Page_URL,Ad_Slot_Format,Ad_Slot_Width,Ad_Slot_Height,Ad_Slot_Visibility,Ad_Slot_Floor_Price,Ad_Slot_ID,Creative_ID,Bidding_Price,Advertiser_ID"
Names <- strsplit(strName, ",")[[1]]
names(conv.20131022) <- Names
names(conv.20131023) <- Names
names(clk.20131022) <- Names
names(clk.20131023) <- Names
names(imp.20131022) <- Names
names(imp.20131023) <- Names
names(bid.20131022) <- Names
names(bid.20131023) <- Names
conv.20131022$date <- as.Date('10/22/2013', '%m/%d/%Y')
conv.20131023$date <- as.Date('10/23/2013', '%m/%d/%Y')
clk.20131022$date <- as.Date('10/22/2013', '%m/%d/%Y')
clk.20131023$date <- as.Date('10/23/2013', '%m/%d/%Y')
imp.20131022$date <- as.Date('10/22/2013', '%m/%d/%Y')
imp.20131023$date <- as.Date('10/23/2013', '%m/%d/%Y')
bid.20131022$date <- as.Date('10/22/2013', '%m/%d/%Y')
bid.20131023$date <- as.Date('10/23/2013', '%m/%d/%Y')

bid<-rbind(bid.20131022,bid.20131023)
imp<-rbind(imp.20131022,imp.20131023)
clk<-rbind(clk.20131022,clk.20131023)
conv<-rbind(conv.20131022,conv.20131023)

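# Label each bid with its deepest funnel stage: 1 = impression, 2 = click, 3 = conversion.
# A clicked or converted bid normally also appears in the impression log, so assign in
# increasing order; the original order (3, 2, 1) let the impression label overwrite them.
bid$Log_Type[bid$Bid_ID %in% imp$Bid_ID]  <- 1
bid$Log_Type[bid$Bid_ID %in% clk$Bid_ID]  <- 2
bid$Log_Type[bid$Bid_ID %in% conv$Bid_ID] <- 3
bid$Ad_Slot_Visibility[bid$Ad_Slot_Visibility %in% 'Na'] <- 'OtherView'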
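Why the assignment order in the labelling step matters: when the same Bid_ID appears in several logs, whichever label is written last wins. A toy sketch with made-up IDs (`bids`, `imp_ids`, `clk_ids` and `conv_ids` are hypothetical stand-ins for the real logs), comparing the order 3, 2, 1 with the order 1, 2, 3:

```r
# Toy data: "a" got an impression only, "b" a click, "c" a conversion, "d" nothing.
# A click or conversion is also an impression, so IDs repeat across the logs.
bids     <- data.frame(Bid_ID = c("a", "b", "c", "d"), Log_Type = 0)
imp_ids  <- c("a", "b", "c")
clk_ids  <- c("b", "c")
conv_ids <- c("c")

# Order 3, 2, 1: the impression label overwrites the deeper stages
wrong <- bids
wrong$Log_Type[wrong$Bid_ID %in% conv_ids] <- 3
wrong$Log_Type[wrong$Bid_ID %in% clk_ids]  <- 2
wrong$Log_Type[wrong$Bid_ID %in% imp_ids]  <- 1

# Order 1, 2, 3: each bid keeps its deepest stage
right <- bids
right$Log_Type[right$Bid_ID %in% imp_ids]  <- 1
right$Log_Type[right$Bid_ID %in% clk_ids]  <- 2
right$Log_Type[right$Bid_ID %in% conv_ids] <- 3

wrong$Log_Type  # 1 1 1 0
right$Log_Type  # 1 2 3 0
```

With the first order no row ends up with Log_Type 2 or 3, so any later filter on `Log_Type == 2` finds nothing.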

set.seed(123)
w <- nrow(bid)
w  # number of bids available for sampling (length(w) is always 1)
q<- sample(w,1500000)
bidd<-bid[q,]
n<-nrow(bidd)
t<-sample(n,n/2)
train<-bidd[t,]
test<-bidd[-t,]
table(train$Region_ID)
# Cities and regions with at least one click in the training data
lrow <- unique(train$City_ID[train$Log_Type == 2])
train$hotzone <- as.integer(train$City_ID %in% lrow)
test$hotzone  <- as.integer(test$City_ID %in% lrow)
nraw <- unique(train$Region_ID[train$Log_Type == 2])
train$hotregion <- as.integer(train$Region_ID %in% nraw)
test$hotregion  <- as.integer(test$Region_ID %in% nraw)
trainx <- train[train$hotregion == 1, ]
testx  <- test[test$hotregion == 1, ]  # filter test on its own column, not on train's
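# Note: for a binomial glm, the first factor level is "failure" and all other levels are "success".
trainx$Log_Type <- as.factor(trainx$Log_Type)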
trainx[trainx$Ad_Slot_Floor_Price %in% 'Na', ]$Ad_Slot_Floor_Price <- "Pop"
testx[testx$Ad_Slot_Floor_Price %in% 'Na', ]$Ad_Slot_Floor_Price <- "Pop"
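# hotregion is 1 in every row of trainx, so its coefficient will be NA (aliased with the intercept).
gglm <- glm(Log_Type ~ Time + Region_ID + City_ID + hotregion + Creative_ID + Bidding_Price +
              Ad_Slot_Floor_Price + Ad_Slot_Width + Ad_Slot_Height + Advertiser_ID,
            data = trainx, family = binomial(link = 'logit'),
            control = glm.control(maxit = 100))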
summary(gglm)
gglit <- predict(gglm, newdata = data.frame(testx), type = 'response')
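On the `as.factor(trainx$Log_Type)` step: according to the `glm` documentation, a factor response for the binomial family is read as failure for its first level and success for every other level. With labels 1 = impression, 2 = click, 3 = conversion, clicks and conversions are therefore pooled into one "success" class. A small check on simulated data (`stage` and `x` are hypothetical) that the factor fit matches an explicit 0/1 response:

```r
set.seed(2)
n <- 500
x <- rnorm(n)
# Hypothetical three-stage label: 1 = impression, 2 = click, 3 = conversion
stage <- sample(1:3, n, replace = TRUE, prob = c(0.8, 0.15, 0.05))

fit_factor <- glm(factor(stage) ~ x, family = binomial)
# Explicit binary response: 1 for any level other than the first
fit_binary <- glm(as.integer(stage != 1) ~ x, family = binomial)

all.equal(coef(fit_factor), coef(fit_binary))  # TRUE
```

If the target is meant to be clicks only, one option is to build the 0/1 column explicitly, so the definition does not depend silently on factor level order.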

When I print gglit:
gglit
1 2 3 4 5 6
5.826215e-11 5.826215e-11 5.826215e-11 5.826215e-11 5.826215e-11 5.826215e-11
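`type = 'response'` returns the inverse logit of the linear predictor, so a probability of 5.826215e-11 corresponds to a linear predictor (log-odds) of about qlogis(5.826215e-11), roughly -23.6, for each of these rows. A sketch on simulated data (`d`, `new_rows` are hypothetical, not the poster's data) showing how the two `predict` types relate:

```r
set.seed(3)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(-2 + d$x))
fit <- glm(y ~ x, data = d, family = binomial(link = "logit"))

new_rows <- data.frame(x = c(-1, 0, 1))
eta <- predict(fit, newdata = new_rows, type = "link")      # log-odds scale
p   <- predict(fit, newdata = new_rows, type = "response")  # probability scale

all.equal(p, plogis(eta))  # TRUE: response is the inverse logit of link

# Log-odds for the probability reported above
qlogis(5.826215e-11)  # about -23.57
```

Looking at `predict(gglm, newdata = testx, type = 'link')` alongside `summary(gglm)` can show which terms drive such a large negative value.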

I tried predicting clicks among bids and conversions among all bids, and the results are similar. But when I use logistic regression, it gives me better (larger) probabilities than glm in R. I also tried a single predictor, with the same result.
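For comparison: `glm(..., family = binomial(link = 'logit'))` is itself a logistic regression fitted by maximum likelihood, so an unpenalized logistic regression on the same rows and the same 0/1 response should give essentially the same coefficients. A large gap is more likely to come from differences in the rows, the response coding, or regularization that some implementations apply by default. A minimal sketch that fits the same model by minimizing the negative log-likelihood with `optim` (simulated data, hypothetical names):

```r
set.seed(4)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
X <- cbind(1, x)  # design matrix with an intercept column

# Negative Bernoulli log-likelihood of a logistic model with coefficients beta
negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}
# Its gradient: -X' (y - p)
negloglik_grad <- function(beta) {
  eta <- drop(X %*% beta)
  -drop(crossprod(X, y - plogis(eta)))
}

ml  <- optim(c(0, 0), negloglik, negloglik_grad, method = "BFGS",
             control = list(reltol = 1e-12))
fit <- glm(y ~ x, family = binomial(link = "logit"))

rbind(optim = ml$par, glm = unname(coef(fit)))
```

The two rows should agree to several decimal places, which is why comparing the exact data and response passed to each tool is a reasonable first step.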

Thanks!

Is anyone around on this site?

© Copyright 2013-2019 Analytics Vidhya