K-Nearest Neighbors (KNN) Classification Problem

knn
r
machine_learning
predictive_model

#1

Dear Experts,
I am using the KNN algorithm to predict machine failure. I have data from several sensors and need to predict machine faults. I have uploaded the dataset to Dropbox and the code to a Gist.

My problem is that the accuracy is very low, around 55-60%, and I need to improve it substantially.

I’ll be highly grateful if someone could give me some hints.

Thanks in advance :slight_smile:

Dropbox link to the dataset: https://www.dropbox.com/s/tgaqfgm2gkl7i3r/maintenance_data_updated.csv

Source Code:


#2

Hi,
Can you please elaborate on what runtime signifies here? I am a bit confused about whether there is a timestamp associated with the readings.
After looking at the data, I think you first need to segregate it by machine or plant.
That will take a lot of data transformation; without a proper data format, no algorithm can give you the desired results.
I hope this helps. Feel free to ask more.
Thanks


#3

Hey, you cannot use KNN for this specific problem because the positive cases are far fewer than the negative cases, and that results in high bias.

The solution to your problem is to use anomaly detection algorithms rather than classification algorithms.
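
One simple form of that idea, as a minimal Python sketch rather than a recommendation of a specific method: learn the mean and spread of each sensor from rows known to be non-defective, then flag a new row when any reading is too many standard deviations away. The thread names no particular algorithm, so the z-score rule, the threshold of 3, and the toy readings below are all assumptions for illustration.

```python
import statistics

def fit_normal_profile(normal_rows):
    """Return per-column means and sample standard deviations of the normal rows."""
    columns = list(zip(*normal_rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return means, stdevs

def anomaly_score(row, means, stdevs):
    """Largest absolute z-score across columns (assumes no column is constant)."""
    return max(abs(v - m) / s for v, m, s in zip(row, means, stdevs))

# Toy readings (made up), two hypothetical sensors per row, all non-defective.
normal_rows = [(50.0, 1.0), (52.0, 1.2), (48.0, 0.8), (51.0, 1.1), (49.0, 0.9)]
means, stdevs = fit_normal_profile(normal_rows)

THRESHOLD = 3.0  # assumed cut-off; would need tuning on real data
new_rows = [(50.5, 1.0), (70.0, 1.0)]
flags = [anomaly_score(row, means, stdevs) > THRESHOLD for row in new_rows]
print(flags)
```

Here the first new row sits close to the normal profile and the second has a first-sensor reading far outside it, so only the second is flagged.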

Please “like” this answer if you think it solves your problem.


#4

Hi @SilverStone,
Thanks for your response. Here runtime means how long the machine has been running. Its unit is weeks. But there is no timestamp.
I used correlation and found only 3 variables related to Defect, so I used only these 3 in the algorithm.
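
For readers following along, the kind of correlation screening described here can be sketched as below. This is a Python illustration, not the original R code (which is not shown in the thread); the column names `temperature` and `humidity` and all values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sensor columns and a 0/1 defect label (toy values).
features = {
    "temperature": [61, 75, 64, 80, 59, 78],
    "humidity": [30, 31, 29, 30, 31, 29],
}
defect = [0, 1, 0, 1, 0, 1]

scores = {name: pearson(values, defect) for name, values in features.items()}
# Rank features by the strength of their linear relationship with the label.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(ranked)
```

One caveat worth keeping in mind: this only measures a linear, one-variable-at-a-time relationship, so a sensor with a low score could still carry signal in combination with others.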
What would you suggest for optimization? I’m open to using any other algorithm too.
Thanks in advance.


#5

Hi @iammangod96
Thanks for the response. Here are my total cases:
Defected NonDefected
397 603
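
A quick sanity check on these counts, sketched in Python purely for illustration: the accuracy of always predicting the larger class gives a baseline against which the 55-60% figure from the first post can be read.

```python
# Class counts quoted in the post above.
counts = {"Defected": 397, "NonDefected": 603}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a model that always predicts the majority class.
baseline_accuracy = counts[majority_class] / total

print(f"{majority_class}: {baseline_accuracy:.1%}")
```

With these counts the baseline is 60.3%, so an accuracy of 55-60% is no better than always answering "NonDefected".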
There are several anomaly detection algorithms. Which one would you suggest?
Thanks in advance.


#6

Hi @kafikhan, what I was trying to say is that you need to subset your dataset, because the data points are mixed with those of other machines and other plants. Just try to capture the anomaly in a particular machine or a particular plant.
Then the algorithm doesn't matter; it all depends on your data preparation.
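
A minimal sketch of that subsetting step, in Python for illustration. The column names `plant`, `machine`, `sensor_a`, and `defect` are hypothetical stand-ins (the real CSV's headers are not shown in this thread), and the rows are made up.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in place of the real CSV file.
sample = """plant,machine,sensor_a,defect
P1,M1,0.4,0
P1,M1,0.9,1
P1,M2,0.5,0
P2,M1,0.7,1
"""

# Collect rows per (plant, machine) pair so each subset can be analysed alone.
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample)):
    groups[(row["plant"], row["machine"])].append(row)

for key, rows in sorted(groups.items()):
    defects = sum(int(r["defect"]) for r in rows)
    print(key, len(rows), "rows,", defects, "defects")
```

Each subset can then be modelled or inspected separately, which also shows quickly whether some machines have too few rows to learn from.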

image

image

What I mean is this: segregate your data.
I hope this works. Feel free to ask any questions.
Thanks!