Assign weights and parameterize input variables using R


#1

Hi Friends,

I have a few questions, listed below.

  1. How do I assign weights to input variables while building a model? Say I have four input variables (X1, X2, X3, X4) and one output variable (Y), and I want to assign the weights X1 = 50%, X2 = 20%, X3 = 10%, X4 = 20%. Can someone explain how to do this in R?
  2. Is it possible to parameterize the weights of the input variables in a deployed R model? Say the user enters the variable weights through a console; the code should not have to be modified (no manual intervention) every time the user changes the weights. Another example: the user might specify the number of clusters (say K = 5), and that value should be passed to the R model dynamically so that the new clusters are displayed.
  3. Say my bank has 1000 customers, and I have formed 4 clusters of 250 customers each. When a new customer is added, should I re-run the clustering on all 1001 customers (1000 + 1), or are there other ways of doing it?

Thanks,
Rajaram


#2

Hi @Rajaram1986,

  1. There isn’t really a straightforward way of weighting features, at least none that I know of; it depends on the kind of learning algorithm you use. One way to add weighting is to choose an algorithm that relies on distance measures, like KNN or SVM: normalize all features to a common scale first, then multiply each one by its weight, so that features with larger weights contribute more to the distances. There are also implementations with built-in weighting options, like XGBoost.

  2. As mentioned above, the weighting is just a matter of passing in a vector of weights, so there is no need to hard-code it. The same goes for the number of clusters; see the following example: http://syskall.com/kmeans.js/

  3. It depends on what you want to do, but usually it doesn’t make sense to recompute the clusters for a single new entry. I’d just compare the new point to the previously computed centroids and assign it to the cluster whose centroid is nearest. Then, based on some criterion (say, every 10% increase in customers, or every week), you can re-run the clustering to update it.
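A minimal sketch of the normalize-then-weight idea from point 1, using a k-nearest-neighbour classifier. It is written in Python with NumPy purely for illustration (in R the same steps correspond to `scale()` followed by multiplying the columns by the weights), and the function names and toy data are hypothetical. One design choice is worth flagging: Euclidean distance squares each coordinate difference, so the sketch multiplies each standardized feature by the square root of its weight. That way each feature's squared difference is multiplied by exactly its weight (e.g. X1 = 50%). Multiplying by the weight itself would effectively square the weights.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale each feature to mean 0, standard deviation 1 using training-set statistics."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X_train - mu) / sd, (X_test - mu) / sd


def weighted_knn_predict(X_train, y_train, X_test, weights, k=3):
    """Majority vote among the k nearest training points under a feature-weighted distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # accepts percentages such as [50, 20, 10, 20]
    Z_train, Z_test = standardize(X_train, X_test)
    # Scaling by sqrt(w) makes the squared distance sum_j w_j * (z_j - z'_j)^2
    Z_train = Z_train * np.sqrt(w)
    Z_test = Z_test * np.sqrt(w)
    preds = []
    for z in Z_test:
        d2 = ((Z_train - z) ** 2).sum(axis=1)  # squared weighted distances to every training point
        nearest = np.argsort(d2)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the label that sorts first
    return np.array(preds)
```

For example, `weighted_knn_predict(X_train, y_train, X_test, [50, 20, 10, 20])` applies the weights from the question; the percentages are rescaled to sum to 1 before use.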
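For point 2, a sketch of how K and the weights can become run-time parameters instead of constants in the code. Again this is Python with NumPy for illustration only: `weighted_kmeans` is a plain implementation of Lloyd's k-means written for this example, and the `--k` and `--weights` flags and the random placeholder data are hypothetical. In R, `commandArgs(trailingOnly = TRUE)` and `kmeans(x, centers = k)` would play the same roles.

```python
import argparse

import numpy as np


def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """Lloyd's k-means on standardized features scaled by sqrt(weight)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd * np.sqrt(w)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each point to its nearest centroid
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])  # keep the old centroid if a cluster ends up empty
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def main(argv=None):
    parser = argparse.ArgumentParser(description="Cluster with user-supplied K and weights")
    parser.add_argument("--k", type=int, default=4)
    parser.add_argument("--weights", type=float, nargs="+", default=[50, 20, 10, 20])
    args = parser.parse_args(argv)
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, len(args.weights)))  # placeholder data; replace with real customer features
    labels, _ = weighted_kmeans(X, args.k, args.weights)
    print(np.bincount(labels, minlength=args.k))  # cluster sizes


if __name__ == "__main__":
    main()
```

Running it as, say, `python cluster.py --k 5 --weights 50 20 10 20` re-clusters with the new settings without touching the code (`cluster.py` is a hypothetical file name).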
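For point 3, a sketch of assigning a new customer to the nearest existing centroid and deciding when to re-run the clustering. It assumes the centroids come from an earlier fit (in R, the `centers` component of a `kmeans()` result) and that the new customer has already been transformed the same way as the training data (same standardization and weights); otherwise the distances are not comparable. The function names and the 10% growth threshold are illustrative.

```python
import numpy as np


def assign_to_nearest(x_new, centroids):
    """Return the index of the centroid closest to x_new (squared Euclidean distance).

    x_new must already be in the same standardized, weighted space as the centroids.
    """
    d2 = ((np.asarray(centroids, float) - np.asarray(x_new, float)) ** 2).sum(axis=1)
    return int(np.argmin(d2))


def needs_recluster(n_at_last_fit, n_now, growth=0.10):
    """True once the customer count has grown by `growth` (10% by default) since the last fit."""
    return n_now >= n_at_last_fit * (1 + growth)
```

With 1000 customers at the last fit, customer 1001 is simply assigned to a cluster, and `needs_recluster(1000, n)` only becomes true once n reaches 1100.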