Validating custom variables to get a better estimate of a parameter I am trying to compare




I am currently facing the challenge of comparing a metric (y) across different times (x). However, there is a parameter ‘z’ that affects the calculation of ‘y’. How can I bring ‘z’ to a common reference level across the different values of ‘x’ so that ‘y’ can be compared across ‘x’?

I have two ideas:

  1. Create a custom metric out of ‘y’ and ‘z’ and compare that instead. If so, how do I validate this new metric?
  2. Find a threshold for ‘x’ so that only the ‘y’ values above that ‘x’ value are considered and compared, ignoring the outliers.

Which one should I go with?



The first option, that is, creating one more variable which is a mix of, say, x and z.
When there is a strong correlation between x and z, z cannot be ignored.


Thanks Malathi!

I would like to read some theory on feature-creation criteria before creating a new feature out of the two existing parameters. Are there any resources you would suggest?


This article may help you understand feature engineering.


Hi @akshay.kotha
It seems you are facing a conditional-probability problem. You have to model Y on X given Z, which means building multiple models, one per condition on Z. As an alternative, if you use a linear type of model, you could check the interaction term X*Z. This is not the same as conditioning, but it will show whether the interaction is significant and, if it is, what its coefficient is. If Z is categorical, then everything is simpler and you can use an ANOVA-type analysis.
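
To make the interaction check concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data are synthetic, the helper names `invert` and `fit_interaction` are made up, and the model is an ordinary least-squares fit of y = b0 + b1*x + b2*z + b3*x*z. A large t-statistic on b3 would suggest that the effect of x on y depends on z.

```python
import random

def invert(a):
    """Invert a small square matrix with Gauss-Jordan elimination (hypothetical helper)."""
    n = len(a)
    # Augment the matrix with the identity: [A | I]
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_interaction(x, z, y):
    """OLS fit of y = b0 + b1*x + b2*z + b3*x*z.

    Returns (coefficients, t_statistics), both ordered as [b0, b1, b2, b3].
    """
    rows = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]
    n, k = len(y), 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual variance and standard errors from the diagonal of (X'X)^-1
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [b / (sigma2 * inv[i][i]) ** 0.5 for i, b in enumerate(beta)]
    return beta, t_stats

# Synthetic data (an assumption): the true interaction coefficient is 0.8
random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
z = [random.uniform(0, 5) for _ in range(200)]
y = [2 + 1.5 * xi + 0.5 * zi + 0.8 * xi * zi + random.gauss(0, 1)
     for xi, zi in zip(x, z)]

beta, t_stats = fit_interaction(x, z, y)
print("interaction coefficient:", beta[3])
print("interaction t-statistic:", t_stats[3])
```

In practice a statistics package would report the same coefficient together with a p-value; the hand-rolled fit is only there to keep the sketch self-contained.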

Hope this helps a bit.