Steps to follow when analyzing a dataset



Hi, I am new to the field of data science and I am learning data analytics on my own. I have gained some good foundational knowledge of R programming and of data analytics concepts such as descriptive statistics and inferential statistics. I decided to analyze a health-care-related dataset on my own, and these are the steps I have followed so far:

  1. Collected a raw dataset, the Haberman Survival Data, from a website.
  2. Cleaned the dataset. Many columns were merged into one another, so I separated them. I also converted each column to the appropriate data type (numeric, character, or factor), based on what the dataset description says each variable represents.
  3. Did some exploratory data analysis: histograms to see the distribution of each variable, boxplots to find outliers, and scatterplots to examine the relationships between variables.
  4. Computed the measures of central tendency (mean, median, mode) for each variable.

Now I am stuck. How should I proceed from here?

  - Is the way I approached the dataset correct?
  - What process would a data scientist follow when analyzing data?

Although I know the tool and the statistical concepts, I don't know the correct procedure for analyzing a dataset, so please help me, and correct me if I went wrong in any of these steps.


Hi @shivanesh_kumar

First, what are you trying to solve? It is not clear from your question. If you have not defined your objective, then any analysis will look good. So step 1 is to define and frame your objective: at a high level, is it a regression problem or a classification problem? Next, look at the distribution of the outcome, or at the balance between minority and majority classes in the case of classification. Then you can check whether your covariates are continuous or categorical, and whether they will be similarly distributed in the test set. After those few steps you can do the preliminary EDA and then pick the best model given the objective and the variables.
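
As a rough illustration of those first checks, here is a minimal sketch in Python using only the standard library; the same checks can be done in R with functions such as `table()` and `summary()`. The rows are made-up values laid out like the Haberman data, and the column names, the `class_balance` and `covariate_kind` helpers, and the `max_levels` threshold are all assumptions for illustration, not part of the dataset or of any library.

```python
from collections import Counter
from statistics import median

# Hypothetical rows laid out like the Haberman survival data. The values
# are invented for illustration and are NOT taken from the real dataset.
# Coding assumed for survival_status: 1 = survived 5 years or longer,
# 2 = died within 5 years.
COLUMNS = ("age", "operation_year", "positive_nodes", "survival_status")
TARGET = "survival_status"
rows = [
    (30, 64, 1, 1),
    (34, 60, 0, 1),
    (38, 69, 21, 2),
    (42, 58, 0, 1),
    (47, 63, 23, 2),
    (52, 66, 4, 1),
    (61, 59, 0, 1),
    (65, 62, 22, 2),
]


def class_balance(rows, target_index):
    """Return the share of rows that fall into each outcome class."""
    counts = Counter(row[target_index] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def covariate_kind(values, max_levels=5):
    """Crude heuristic: few distinct values suggests a categorical variable."""
    return "categorical" if len(set(values)) <= max_levels else "continuous"


target_index = COLUMNS.index(TARGET)

# Step 1: how balanced is the outcome?
balance = class_balance(rows, target_index)

# Step 2: are the covariates continuous or categorical?
kinds = {
    name: covariate_kind([row[i] for row in rows])
    for i, name in enumerate(COLUMNS)
    if name != TARGET
}

# Step 3: a first look at one covariate split by outcome class.
nodes_index = COLUMNS.index("positive_nodes")
median_nodes_by_class = {
    label: median(row[nodes_index] for row in rows if row[target_index] == label)
    for label in balance
}

print("Outcome balance:", balance)
print("Covariate kinds:", kinds)
print("Median positive_nodes by class:", median_nodes_by_class)
```

If the outcome turns out to be binary and unevenly split, as in this toy sample, that points toward framing the task as classification and toward keeping the class imbalance in mind when you later choose a model and an evaluation metric.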
Hope this helps.
Best regards