Guys, can anybody please tell me what basic questions related to multiple regression, logistic regression, factor analysis and cluster analysis can be asked in an analytics-related interview? Please help me out. I just need a list of questions. Waiting for a reply. The post is related to an analyst position.
This question is very open-ended. Can you define its scope? Here are a few things you should go prepared with:
- Basic Assumptions of a Linear Regression Model.
- How do we test these assumptions?
- What is multi-collinearity? How does it affect a regression model? How to spot multicollinearity in a regression model?
- How do we solve for multicollinearity?
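As a follow-up to the multicollinearity questions above, interviewers often expect you to know the variance inflation factor (VIF). Here is a small sketch of computing VIFs with NumPy only; the simulated data, the seed, and the "VIF above 10 is a problem" rule of thumb are illustrative assumptions, not part of the original post:

```python
import numpy as np

def vif(X):
    """VIF for each column of X: VIF_j = 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on the remaining columns (plus an
    intercept). Large VIFs flag near-collinear predictors."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent predictor
X = np.column_stack([x1, x2, x3])
print(vif(X))  # x1 and x2 get very large VIFs; x3 stays near 1
```

A common rule of thumb is that a VIF above 5–10 indicates problematic collinearity; typical fixes are dropping one of the correlated predictors, combining them, or using regularization.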
Clustering is rarely asked in analytics interviews, but in case you still want to tighten your preparation in this area, you should know how to find the appropriate number of clusters in a cluster analysis, and how to judge whether you have good clusters or not.
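One standard answer to "how do we choose the number of clusters" is to compare the silhouette score (or look for an elbow in the inertia) across candidate values of k. A minimal sketch using scikit-learn, assuming synthetic blob data with three true clusters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 3 well-separated clusters (illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=42)

# Fit k-means for several k and score each clustering by silhouette
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # the k with the highest silhouette score
```

The silhouette score also answers the "how do you judge whether you have good clusters" question: values close to 1 mean points sit well inside their own cluster and far from the others.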
Hope this helps. Try to be more specific in your questions to get more accurate answers.
Thanks, Tavish sir, for replying so fast. The scope of the question is any basic questions related to multiple regression, logistic regression and factor analysis. You have many years of experience, so just tell me: if you were in the place of the interviewer, what kind of questions would you ask a person who just wants to get into the analytics industry?
Waiting for your reply
If santu_rcc014 doesn’t mind my asking a somewhat related question: why are questions on clustering rarely asked? I thought it was a fairly important topic.
It really depends on the job requirement. Clustering is a very important concept but is only useful if you have thorough business domain knowledge. A statistically built cluster with no business inference is of no use. Given that clustering is coupled strongly with domain knowledge (which is not common with less than 3-4 years of experience), these are generally not the deal-maker/breaker questions in interviews (at least in the context of Indian analytics jobs). However, this in no way indicates that clustering is not used in industry; in my opinion, regression models / time series models are simply a more common subject in interviews.
The questions I mentioned in my last comment should be a good starting point for preparing for the basic/frequently asked questions in an interview.
Very helpful, thanks.
Just now I came across a new thing named heteroscedasticity, related to the multiple regression model. Can Tavish sir or anyone give me a suitable link to understand it from a layman's point of view? Can the interviewer ask questions related to factor analysis?
Waiting for an answer.
Does anybody have an answer to that? Tavish or Kunal sir, waiting for your valuable input.
This is a very important term & assumption of OLS (ordinary least squares) used in linear regression.
The assumption is that the error terms have the same variance for every data point (this is called homoskedasticity), have zero mean, and are independent of each other.
So if you plot a scatter plot of the residuals vs the predicted values, you should not see any pattern; a funnel-shaped plot, for example, would mean the variance of the error terms is increasing.
Imagine we have data on family income and spending on luxury items. Using bivariate regression, we use family income to predict luxury spending (as expected, there is a strong, positive association between income and spending). Upon examining the residuals we detect a problem – the residuals are very small for low values of family income (families with low incomes don’t spend much on luxury items) while there is great variation in the size of the residuals for wealthier families (some families spend a great deal on luxury items while some are more moderate in their luxury spending). This situation represents heteroscedasticity because the size of the error varies across values of the independent variable.
This can sometimes be overcome by transforming the dependent variable in some way (e.g. a log transformation), but this condition of homoskedasticity must not be violated while performing linear regression.
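The income/spending story above can be reproduced numerically. This is a sketch with simulated data (the numbers, seed, and the crude |residual|-vs-fitted correlation check are my own illustrative assumptions, in the spirit of a Breusch-Pagan-style test):

```python
import numpy as np

rng = np.random.default_rng(1)
income = rng.uniform(20, 200, size=500)                # family income
noise_sd = 0.2 * income                                # error sd grows with income
spending = 5 + 0.3 * income + rng.normal(0, noise_sd)  # heteroscedastic errors

# Fit OLS: spending ~ income
Xd = np.column_stack([np.ones_like(income), income])
beta, *_ = np.linalg.lstsq(Xd, spending, rcond=None)
fitted = Xd @ beta
resid = spending - fitted

# Crude check for the "funnel": |residuals| should grow with fitted values.
corr = np.corrcoef(fitted, np.abs(resid))[0, 1]
print(corr > 0.2)  # clearly positive -> error variance rises with income
```

In a real analysis you would plot `resid` against `fitted` and look for the funnel shape, or use a formal test such as Breusch-Pagan (available in statsmodels).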
I hope I could be of some help in understanding this complex topic!
What are the problems of Auto-correlation & Heteroskedasticity in regression analysis?
Thanks, Shuvayan, for your awesome explanation; it really helped me understand that. Now another question comes to mind: are residual, error and noise all the same thing from a regression analysis point of view, or are they different? Please explain in detail from a layman's point of view.
Thanks once again for helping me out.
Glad I could be of help. I am not quite sure about the residual & error thing, but according to my understanding: when we use linear regression we try to fit an estimated line (E) through the data which is as close as possible to the true line (T), i.e. the best-fit line for the data. We do not know anything about the true line (that is why we try to estimate it), nor about the variations of the data from it. These unobservable variations are called errors.
But we do get the fitted line after running the model, and the variation of each data point from E is called a residual.
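The error/residual distinction can be made concrete in a simulation where we control the true line ourselves. A sketch with made-up data (seed and coefficients are illustrative assumptions); note that residuals from an OLS fit with an intercept sum to zero exactly, while the errors in a finite sample generally do not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
true_y = 2 + 3 * x                 # the (normally unknown) true line T
errors = rng.normal(0, 1, size=x.size)
y = true_y + errors                # observed data = true line + errors

# Fit the estimated line E by least squares
Xd = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
residuals = y - Xd @ beta          # observable deviations from E

print(beta)              # close to [2, 3] but not exactly equal
print(residuals.sum())   # ~0: residuals sum to zero when an intercept is fit
print(errors.sum())      # generally NOT zero in a finite sample
```

This also bears on the later question in this thread: even relative to the true line, data points do not lie exactly on it; they deviate by the random errors, whose sample sum is only *approximately* zero.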
Hey Shuvayan, after what you said I referred to some other material and came to know that people define:
Error = variation of the data from the true line
Residual = variation of the data from the estimated line
My dilemma is this: if we knew the true line (which ideally we don't), then all the data points should lie exactly on the true line, which means the sum of the variations of the data from it should be zero, right? But in practice it is not. Why?
@kunal sir, @tavish sir, or any other member of this group, please explain.
Please see this image
As you can see, though linear regression assumes a linear relationship, in reality the data does not fall on a straight line but is scattered around it. So the variations are never zero, but the algorithm finds a line such that the variations are minimized.
That is exactly my doubt: how do you construct the true straight line? If you know beforehand that the true line may not be linear, then why construct it as linear? Let the true line be quadratic or cubic etc. and the estimated line be linear. Hope you understand my doubt.
Shuvayan, things are still not clear, because as we all know the definition of error is the variation of the data from the true line, right? So how do we say that the error has mean 0 and variance sigma squared? If we don't know its value, how do we estimate it?
Is there anybody in this community who can clear my doubt? Please, waiting for your help.
Can you please briefly answer all the specific questions you formed above, or is there any link on AV where the same are already explained?
One question that I have been asked multiple times:
When would you use the Normal Equation method vs the Gradient Descent method, and why?
Hope this helps!
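For anyone preparing this answer: the usual talking point is that the normal equation is exact but costs roughly O(k³) in the number of features k, while gradient descent is iterative, scales to large k, but needs a learning rate and enough iterations. A sketch comparing the two on simulated data (the data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + rng.normal(0, 0.1, size=200)

# Normal equation: solve (X^T X) beta = X^T y directly.
# Exact in one shot, but ~O(k^3) -- impractical for very many features.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: cheap per step (~O(n*k)), scales to large k,
# but requires tuning the learning rate and running enough iterations.
beta_gd = np.zeros(3)
lr = 0.1
for _ in range(2000):
    grad = X.T @ (X @ beta_gd - y) / len(y)
    beta_gd -= lr * grad

print(np.round(beta_ne, 2))  # both estimates land near [1, 2, -3]
print(np.round(beta_gd, 2))
```

In an interview, the expected answer is roughly: use the normal equation when the feature count is modest and X^T X is well-conditioned; switch to gradient descent (or a variant) when features number in the tens of thousands or the data doesn't fit in memory.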