I have a dataset that describes the quality of wines based on factors such as acid content, density, pH, etc. I am attaching the link to the Wine Quality dataset. As I understand it, we need to use a multi-class classification algorithm to analyze this dataset using training and test data. Please correct me if I am wrong.
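One quick way to confirm that this is a multi-class problem, and to see how many classes actually occur, is to count the distinct values in the target column. This is a minimal sketch that assumes the target column is named `quality`, as in the UCI Wine Quality files. `describe_target` is a hypothetical helper name. The small DataFrame is made-up toy data that only demonstrates the call; it is not taken from the real dataset.

```python
import pandas as pd


def describe_target(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Return the number of rows per class of the target column, sorted by label."""
    counts = df[target].value_counts().sort_index()
    # More than two distinct labels means a multi-class problem.
    print(f"{counts.size} distinct classes in '{target}'")
    return counts


# Hypothetical toy data standing in for the real CSV (not actual wine measurements).
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.8, 9.5],
    "quality": [5, 5, 6, 6, 7, 5],
})
counts = describe_target(toy)
```

On the real file, calling `describe_target(dataset)` right after the `pd.read_csv` line shows how many quality levels are actually present and how unevenly the samples are spread across them.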
I have also applied Principal Component Analysis (PCA) to this dataset. Below is the code I have used:
```
# -*- coding: utf-8 -*-
"""
Created on Sun Aug 26 14:14:44 2018

@author: 1022316
"""
# Wine Quality testing
# Multiclass classification - PCA

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (raw string so the Windows backslashes are not treated as escapes)
dataset = pd.read_csv(r'C:\Machine learning\winequality-red_1.csv')
X = dataset.iloc[:, 0:11].values  # 11 physico-chemical features
y = dataset.iloc[:, 11].values    # quality score, used as the class label

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling: fit on the training set only, then apply the same scaling to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Applying PCA: fit on the training set only. The test set must go through transform(),
# not fit_transform(), so both sets are projected onto the same components.
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
explained_variance = pca.explained_variance_ratio_

# Alternative classifier tried earlier (Decision Tree), left commented out
#from sklearn.tree import DecisionTreeClassifier
#classifier = DecisionTreeClassifier(max_depth = 2).fit(X_train, y_train)
#y_pred = classifier.predict(X_test)

# Fitting the Logistic Regression model to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
```
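PCA must be fitted on the training set only, just like the scaler. The test set should go through `transform`, never `fit_transform`. Otherwise the test points are projected onto different components from the ones the classifier was trained on.

Wrapping scaling, PCA and the classifier in a scikit-learn pipeline (`make_pipeline`) enforces this rule automatically. This is a sketch under some stated assumptions:

* It uses `make_classification` to generate synthetic stand-in data with 11 features, so it runs without the CSV. The data and its 3 classes are not the wine data.
* It also compares the logistic regression with the decision tree that is commented out in the code above.

To try it on the real dataset, replace the `make_classification` call with `X` and `y` taken from the CSV.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data (assumption): 11 numeric features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each pipeline fits StandardScaler and PCA on the training data only;
# predict() on new data only calls transform() on those steps.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), PCA(n_components=2), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": make_pipeline(
        StandardScaler(), PCA(n_components=2),
        DecisionTreeClassifier(max_depth=2, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training set: a less noisy comparison than one split.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(f"{name}: cv={cv_scores.mean():.3f}, test={results[name]['test_accuracy']:.3f}")

# Fraction of the variance kept by the two principal components (fitted on X_train only).
explained = models["logistic_regression"].named_steps["pca"].explained_variance_ratio_
print("explained variance ratio:", explained)
```

A low total in `explained` is a hint that two components discard much of the information, and that a model trained on more components (or on all 11 scaled features) may be worth comparing.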
Please let me know whether I am using the correct algorithm for this dataset. Also, as far as I can see, the dataset is divided into 9 classes. How can I visualize and plot the data so that the different classes are shown separately?
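Because the features have been reduced to two principal components, each sample can be drawn as a point in the PC1/PC2 plane, with one colour per class. This is a minimal matplotlib sketch. `plot_pca_classes` is a hypothetical helper name, and the data is again synthetic (`make_classification`, 4 classes) so that the snippet runs on its own. With real wine data, the classes may overlap heavily in only two components.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def plot_pca_classes(X_2d, labels, ax=None):
    """Scatter the two PCA components, one colour and legend entry per class label."""
    labels = np.asarray(labels)
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 5))
    for label in np.unique(labels):
        mask = labels == label
        ax.scatter(X_2d[mask, 0], X_2d[mask, 1], s=15, alpha=0.7, label=str(label))
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(title="quality")
    return ax


# Synthetic stand-in data (assumption): 11 features, 4 classes.
X, y = make_classification(n_samples=400, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

ax = plot_pca_classes(X_2d, y)
ax.set_title("Samples projected onto the first two principal components")
ax.figure.savefig("pca_classes.png", dpi=100)
```

With the variables from the code above, `plot_pca_classes(X_train, y_train)` after the PCA step plots the training set. To view the plot in a window instead of saving a PNG, drop the `matplotlib.use("Agg")` line and call `plt.show()`.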
I look forward to your reply.
winequality-red.csv (82.2 KB)