Bangalore Hackathon - 14th June 2015

#1

Welcome Bengaluru to our second hackathon!

Like the previous hackathon, we have a very interesting problem for you to work on. I am sure most of you have used Uber or Ola recently. Let us take them to Portugal today.

Here is the complete problem statement:

You need to predict the destination of each trip that starts. The beauty of the problem is that it offers many avenues for creating awesome, mind-blowing visualizations to understand it.

So, get going hackers! Explore…


#5

Hi,

I am participating in this problem remotely because I am not in Bengaluru today. When I started working on it, my initial hypothesis was that trip destinations would vary with the time of day, so I visualized them. You can see the results below.

[Plots: Early Morning, Morning, Noon, Late Afternoon]

Next, I will perform a detailed analysis of only those time windows that are present in the test dataset.

[Plot: Test dataset timing]
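One way to make this restriction concrete is sketched below in R. It is only an illustration, not the code used for the plots: the helper names `minute_of_day` and `in_test_window`, the 30-minute window, and the `Europe/Lisbon` time zone are all assumptions, as is the presence of a `TIMESTAMP` column in the test file.

```r
# Minute of day (0-1439) from Unix timestamps.
# The default time zone is an assumption; adjust it to match the data.
minute_of_day <- function(ts, tz = "Europe/Lisbon") {
  ct <- as.POSIXct(as.numeric(ts), origin = "1970-01-01", tz = "UTC")
  lt <- as.POSIXlt(ct, tz = tz)
  lt$hour * 60 + lt$min
}

# TRUE for every training minute that lies within `window` minutes of at
# least one test minute. Distances wrap around midnight (1440 minutes).
in_test_window <- function(train_min, test_min, window = 30) {
  # Lookup table: one flag per minute of the day
  allowed <- vapply(0:1439, function(m) {
    d <- abs(m - test_min)
    any(pmin(d, 1440 - d) <= window)
  }, logical(1))
  allowed[train_min + 1]
}

# Toy timestamps (seconds since 1970-01-01), evaluated in UTC for clarity
toy_train <- c(0, 3600, 43200)   # 00:00, 01:00, 12:00
toy_test  <- 1800                # 00:30
keep <- in_test_window(minute_of_day(toy_train, tz = "UTC"),
                       minute_of_day(toy_test,  tz = "UTC"))
print(keep)   # TRUE TRUE FALSE
```

With the real data, the same two calls could be applied to `train$TIMESTAMP` and `test$TIMESTAMP`, and the resulting logical vector used to subset the training rows. The lookup table keeps the check cheap even for a large training set, because only 1440 minute values are ever compared against the test times.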

Please share any findings you have.

Regards,
Imran


#6

Below is the code for the above findings:

install.packages('rjson')
install.packages('data.table')
install.packages('Rcpp')


library(rjson)
library(data.table)

### Control the number of trips read for training (all=-1)
### Control the number of closest trips used to calculate trip duration
# N_read <- 100000
# N_trips <- 1000
N_read <- -1
N_trips <- 10000

### Get starting & ending longitude and latitude
get_coordinate <- function(row){
  lonlat <- fromJSON(row)
  snapshots <- length(lonlat)  
  start <- lonlat[[1]]
  end <- lonlat[[snapshots]]
  return(list(start[1], start[2], end[1], end[2], snapshots))
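}

### Get Haversine distance in km (inputs in decimal degrees)
get_dist <- function(lon1, lat1, lon2, lat2) {
  # Half the coordinate differences, converted from degrees to radians
  lon_diff <- abs(lon1-lon2)*pi/360
  lat_diff <- abs(lat1-lat2)*pi/360
  # cos() expects radians, so the latitudes are converted as well
  a <- sin(lat_diff)^2 + cos(lat1*pi/180) * cos(lat2*pi/180) * sin(lon_diff)^2
  d <- 2*6371*atan2(sqrt(a), sqrt(1-a))
  return(d)
}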


setwd('/Users/Yashwanth/Desktop/Imran/Taxi')
train <- fread('/Users/Yashwanth/Desktop/Imran/Taxi/train.csv', stringsAsFactors=F, nrows=N_read)
test <- fread('/Users/Yashwanth/Desktop/Imran/Taxi/test.csv',  stringsAsFactors=F)


train <- train[POLYLINE!='[]']
train <- train[DAY_TYPE =='A']


#Getting the start and end points of each trip
train[, r:=-seq(.N, 1, -1)]
train[, c('lon', 'lat', 'lon_end', 'lat_end', 'snapshots'):=get_coordinate(POLYLINE), by=r]


trainxy <-  as.data.frame(train)
# Convert the Unix TIMESTAMP to a date-time (shown in the R session's time zone)
trainxy$time <- as.POSIXct(as.numeric(trainxy$TIMESTAMP), origin="1970-01-01")

# Calendar date and weekday name of each trip
trainxy$date <- strptime(trainxy$time, "%Y-%m-%d")
trainxy$day <- weekdays(trainxy$date)
# Replace time with minutes since midnight (0-1439)
trainxy$time <- strptime(trainxy$time, "%Y-%m-%d %H:%M:%S")
trainxy$time <- (as.numeric(format(trainxy$time, "%H"))*60) +as.numeric(format(trainxy$time, "%M"))


install.packages("MASS")
library(MASS)
install.packages("ggplot2")
library(ggplot2)  # needed for ggplot() and stat_density2d() below


### Plotting trip end points for a few time-of-day windows
### (time is in minutes since midnight; at most 10000 trips are sampled per window)

# Minutes 530-590 (08:50-09:50)
trainemorn <- trainxy[trainxy$time > 530 & trainxy$time < 590,]
trainemorn <- trainemorn[sample(nrow(trainemorn), min(10000, nrow(trainemorn)), replace=FALSE),]
m <- ggplot(trainemorn, aes(x = lat_end, y = lon_end))  + geom_point() + xlim(41.10821, 41.25) + ylim(-8.692272, -8.534187)
m + stat_density2d(aes(fill = ..level..), geom="polygon")

# Minutes 800-860 (13:20-14:20)
trainmorn <- trainxy[trainxy$time > 800 & trainxy$time < 860,]
trainmorn <- trainmorn[sample(nrow(trainmorn), min(10000, nrow(trainmorn)), replace=FALSE),]
m <- ggplot(trainmorn, aes(x = lat_end, y = lon_end))  + geom_point() + xlim(41.10821, 41.25) + ylim(-8.692272, -8.534187)
m + stat_density2d(aes(fill = ..level..), geom="polygon")

# Minutes 1170-1230 (19:30-20:30)
trainaft <- trainxy[trainxy$time > 1170 & trainxy$time < 1230,]
trainaft <- trainaft[sample(nrow(trainaft), min(10000, nrow(trainaft)), replace=FALSE),]
m <- ggplot(trainaft, aes(x = lat_end, y = lon_end))  + geom_point() + xlim(41.10821, 41.25) + ylim(-8.692272, -8.534187)
m + stat_density2d(aes(fill = ..level..), geom="polygon")

# After minute 1270 (21:10 onwards)
traineve <- trainxy[trainxy$time > 1270 ,]
traineve <- traineve[sample(nrow(traineve), min(10000, nrow(traineve)), replace=FALSE),]
m <- ggplot(traineve, aes(x = lat_end, y = lon_end))  + geom_point() + xlim(41.10821, 41.25) + ylim(-8.692272, -8.534187)
m + stat_density2d(aes(fill = ..level..), geom="polygon")

# Minutes 1270-1330 (21:10-22:10), unsampled; this plot object is built but not drawn here
m <- ggplot(trainxy[trainxy$time > 1270 & trainxy$time < 1330,], aes(x = lat_end, y = lon_end)) + geom_point() + xlim(41.10821, 41.25) + ylim(-8.692272, -8.534187)
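The get_dist helper above is defined but never called in the plotting code, so here is a small standalone sketch of the Haversine formula with a sanity check, in case anyone wants to use trip distances as a feature. The name `haversine_km` and the example coordinates are hypothetical; the point to note is that all inputs are in decimal degrees and must be converted to radians before any trigonometric call.

```r
# Haversine great-circle distance in kilometres (illustrative sketch;
# haversine_km is a hypothetical name, not part of the script above).
# All inputs are decimal degrees and may be vectors of equal length.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * atan2(sqrt(a), sqrt(1 - a))
}

# Sanity check: one degree of latitude along a meridian is
# 2 * pi * 6371 / 360, i.e. about 111.19 km.
one_degree <- haversine_km(0, 0, 0, 1)
print(round(one_degree, 2))

# Vectorised use on made-up coordinate pairs inside the plot limits;
# the second pair has identical start and end points, so its distance is 0.
d <- haversine_km(c(-8.69, -8.60), c(41.11, 41.15),
                  c(-8.53, -8.60), c(41.25, 41.15))
print(d)
```

Checking a formula against a known distance like this is a quick way to catch a degrees-versus-radians mix-up before the distances feed into any model.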

Hope this helps.


#7

Excellent! It would have been even better to post the code to a GitHub repository and share the link in this discussion with a brief description.
Better still, the Organisers could have created a GitHub repository with a folder for each participating team. The Organisers could then rank the teams based on some criteria.
I do hope this good initiative evolves into something big.


#8

My suggestion is that the problem release and team building should happen online 10 days before the event. A GitHub repository could be created for the event.
On the day of the Hackathon, there should be just code walkthroughs and presentations by individual teams.
It would also be great if the Organizers brought in actual end users who have a stake in the problem.
It is frustrating that we are not able to present deliverables during the event.


#9

Thanks, Ramesh. Some good thoughts there.

We will incorporate them into the format in the coming months.

Regards,
Kunal