Analysis of Wine Quality dataset



I have a dataset that describes the quality of wines based on factors such as acid content, density, and pH. I am attaching a link to the Wine Quality dataset. As I understand it, this is a multi-class classification problem, so the model should be trained and evaluated on separate training and test sets. Please correct me if I am wrong.

Wine_Quality.csv Dataset

I have also applied Principal Component Analysis (PCA) to this dataset. Below is the code I have used:

# -*- coding: utf-8 -*-
"""
Created on Sun Aug 26 14:14:44 2018

@author: 1022316
"""

# Wine Quality testing
#Multiclass classification - PCA

#importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#importing the Dataset
# Raw string so the backslashes in the Windows path are not read as escape sequences
dataset = pd.read_csv(r'C:\Machine learning\winequality-red_1.csv')
X = dataset.iloc[:, 0:11].values
y = dataset.iloc[:, 11].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#Applying the PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = 2 )
X_train = pca.fit_transform(X_train)
# Only transform the test set; the PCA must be fitted on the training set alone
X_test = pca.transform(X_test)
explained_variance = pca.explained_variance_ratio_

# Alternative classifier: Decision Tree (commented out)
#from sklearn.tree import DecisionTreeClassifier
#classifier = DecisionTreeClassifier(max_depth = 2).fit(X_train, y_train)
#y_pred = classifier.predict(X_test)

#Fitting the Logistic Regression model to the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

#Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Please let me know whether I am using the correct algorithm for this dataset. Also, as far as I can see, the dataset will be divided into 9 classes. Please also let me know how I can visualize and plot the data by class.
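
To make the plotting question concrete, here is a rough sketch of the kind of per-class plot I have in mind: a scatter of the two principal components with one colour per class. It is only an illustration under assumptions: the data is random synthetic data standing in for the CSV (with nine arbitrary labels 0 to 8), and `plot_pca_by_class`, `X_demo` and `y_demo` are names I made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def plot_pca_by_class(X_2d, y):
    """Scatter the two principal components, one colour per class label."""
    fig, ax = plt.subplots()
    for label in np.unique(y):
        mask = (y == label)  # rows belonging to this class
        ax.scatter(X_2d[mask, 0], X_2d[mask, 1], s=12, label=f"quality {label}")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(title="Class")
    return ax


# Synthetic stand-in for the wine data (assumption): 200 rows, 11 features,
# and nine arbitrary class labels 0..8. Replace with the real X and y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 11))
y_demo = rng.integers(0, 9, size=200)

# Scale first, then project onto the first two principal components
X_scaled = StandardScaler().fit_transform(X_demo)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Count how many rows fall into each class actually present in the labels
classes, counts = np.unique(y_demo, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))

ax = plot_pca_by_class(X_2d, y_demo)
# plt.show()  # uncomment to display the figure
```

With the real data, I would presumably pass the PCA-transformed `X_train` and `y_train` from my code to the same function, and the class count printed by `np.unique` would tell me how many classes are really present.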

I look forward to your reply.

Vishabh Goel

winequality-red.csv (82.2 KB)