I created a web scraper that collects car data from a particular site. I collected a little over 1,000 cars (different Audi models) along with some information about each of them.
I wanted to use the dataset to predict car prices. I know that different models can be used, but to begin with I wanted to do simple linear regression, since that is a good way to actually learn and understand what one is doing. Plugging everything into some NN or random forest might yield good results, but I would gain no understanding from it, as it would just be a black box that spits out the solution.
So, suggestions about different methods are welcome, but I would really like to solve this in the simplest way possible, or understand why it is not possible to do it that way.
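For concreteness, this is roughly the kind of baseline I have in mind. It is only a minimal sketch on synthetic data, not my actual code or dataset: the feature names (`age_years`, `mileage_km`, `power_kw`), the numbers used to generate the prices, and the helper functions `fit_linear_regression` and `predict` are all made up for illustration.

```python
import numpy as np

# Synthetic stand-in data (NOT the scraped dataset); all columns are hypothetical.
rng = np.random.default_rng(0)
n = 1000
age_years = rng.uniform(0, 15, n)
mileage_km = np.clip(age_years * 15000 + rng.normal(0, 10000, n), 0, None)
power_kw = rng.uniform(70, 250, n)
price = (40000 - 2000 * age_years - 0.05 * mileage_km
         + 100 * power_kw + rng.normal(0, 1500, n))

X = np.column_stack([age_years, mileage_km, power_kw])
y = price


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept term."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the feature weights


def predict(coef, X):
    """Apply the fitted coefficients to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Hold out the last 20% of rows so the error is measured on unseen cars.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

coef = fit_linear_regression(X_train, y_train)
y_pred = predict(coef, X_test)

# Two simple ways to quantify how far off the predictions are.
mae = np.mean(np.abs(y_test - y_pred))
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

print("coefficients:", coef)
print("MAE on held-out data:", mae)
print("R^2 on held-out data:", r2)
```

Measuring the error on held-out rows, instead of on the rows used for fitting, is what makes it possible to say how far off the predictions actually are.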
I am providing the dataset and the complete code, with explanations of what is being done at each step and why.
This might also be great material for other people who are just beginning, as everything is explained in detail.
The problem is that the predictions I get are really off, and I am having a hard time understanding what I am doing wrong or whether I am missing something. I am really looking forward to everyone’s feedback.
One of the following links should work. There you can see the code with explanations, as well as the dataset used.