How to predict multiple targets using linear regression?

Hi, can anyone suggest the best and easiest model for implementing multivariate regression? I’m not looking for multiple regression, which takes multiple independent variables/features and predicts one output. What I’m looking for is a model with MULTIPLE OUTPUT/TARGET variables and multiple input/independent variables.

Everything I can find online is about multiple regression; I couldn’t find any good implementation examples for multi-target regression. I’ve heard that TensorFlow/PyTorch are used for this kind of problem, but I don’t have practical knowledge of these deep learning techniques yet. I would appreciate any examples, pointers, videos, etc.
Thanks in advance!

Hi @kshethrajna,

You’d have to build multiple linear regression models in this case.


Multiple target regression is the term used when there are multiple dependent variables. If the target variables are categorical, then it is called multi-label or multi-target classification, and if the target variables are numeric, then multi-target (or multi-output) regression is the name commonly used.
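
A minimal sketch of multi-output linear regression, assuming scikit-learn is available. Its `LinearRegression` accepts a 2-D target array, which amounts to fitting one linear model per target in a single call. The data and the names `true_coef` and `single` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): 6 inputs x1..x6 and 3 numeric targets y1..y3.
rng = np.random.RandomState(0)
X = rng.rand(100, 6)
true_coef = rng.rand(6, 3)  # hypothetical weights used to generate the targets
Y = X @ true_coef + 0.01 * rng.randn(100, 3)

# Passing a 2-D target fits all targets in one call.
model = LinearRegression().fit(X, Y)
print(model.coef_.shape)           # (n_targets, n_features)
print(model.predict(X[:2]).shape)  # one row per sample, one column per target

# Each row of coef_ matches a separate single-target fit on that column.
single = LinearRegression().fit(X, Y[:, 0])
print(np.allclose(single.coef_, model.coef_[0]))
```

Because ordinary least squares treats each target column independently, this gives the same coefficients as building the separate models by hand, only with less bookkeeping.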

Multi-target regression (MTR) using clustering and decision trees.

Using trees for clustering, aka Predictive Clustering Trees (PCT)
We first need to look at Predictive Clustering Trees (PCT), the foundation on which decision trees for MTR are built.
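
As a rough illustration of the idea of one tree predicting several targets at once, here is a sketch using scikit-learn's `DecisionTreeRegressor` with a 2-D target. This is not a PCT implementation (scikit-learn does not provide one under that name); it is a related technique, swapped in because it is easy to run. The data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative only): 4 inputs and 2 related numeric targets.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]])

# A single tree is grown for both targets; each leaf stores a vector of values.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, Y)
pred = tree.predict(X[:3])
print(pred.shape)  # one column per target

# Samples that land in the same leaf share one prediction vector,
# so the leaves behave like clusters of similar samples.
leaf_ids = tree.apply(X)
print(len(np.unique(leaf_ids)), "leaves")
```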

Please look at the following example.

Hope this is clear

Hi, thanks for your reply. But multiple linear regression covers the case where you have multiple independent variables and only one dependent variable. What I’m looking for is the case with multiple independent and multiple dependent (numerical) variables.
In other words, I have independent variables x1, x2, x3, x4, x5, x6, etc., and my target variables are y1, y2, y3, not just y. I hope that’s clear. I’ve heard that TensorFlow/PyTorch have this algorithm but couldn’t find any details. If you have any insights into this kind of model, please share.

Hi, thanks a lot for the additional info. I haven’t done any MTR so far and hence have no idea about PCT, but I’ll dig into it and see how far I can get. On the other hand, I had a quick glance at the GitHub link you gave below, and it looks a little complicated at first sight. Do you have any code or a link using an algorithm that is a little easier to understand and implement?
Thanks again!!

Hi Kshethrajna,

Please describe your real-life problem: what are you predicting or classifying?


Hi Tony,
First of all, thanks a lot for your time and interest in trying to help me out. The real-life problem relates to a pharma product: using ML to find optimal product formulations. Basically, using the “ingredients”, “batch” and “experiment data”, we need to predict “quantitative” and “qualitative” product attributes such as:
Nutrient concentrations,
Shelf life (4 quarterly time-based predictions: Nutrients, Sensory),
Physical stability and
Sensory attributes (color, flavor, separation)

In one line: “Predict the product composition given an ingredient list of ~200 attributes”.
For example, given the ingredients below, predict each target attribute, i.e. what percentage or amount each has in the final product.
Data inputs (x1-x200): Particle size, Volume based, Surface area based, carbohydrates, protein, oils, mineral salts, emulsifier, etc. (~200 attributes like these)
Predicted outputs (y1-y13): Total solids %, protein %, fat %, Na mg/100g, Ca mg/100g, Cl mg/100g, Zn mg/100g, etc. (about 13 attributes like these)

Given the number of inputs and the number of targets to be predicted, I’m thinking that one of two options, MLPRegressor or MultiOutputRegressor, might work for this problem. However, I have never done multi-target prediction before, I’m not sure how to proceed, and I have no neural network knowledge. I read that MultiOutputRegressor works well, but it builds a separate model for each target, and I’m not sure whether it combines them into one model. If it does, I’m not sure how it does so internally or what the trade-offs or issues could be.
Ultimately, I want one final model that handles this case well and easily.

I would really appreciate your thoughts on the best and easiest solution to this problem, possibly with some code.

Thanks a lot again!
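
On the question of what MultiOutputRegressor does internally, a small sketch may help, assuming scikit-learn. The wrapper fits one copy of the base estimator per target column and keeps them in `estimators_`, while still exposing a single `fit`/`predict` object. The data below is random and only mimics the 200-input, 13-target shape described above; the names `wrapped` and `native` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Random stand-in for the real data: 200 inputs, 13 numeric targets.
rng = np.random.RandomState(0)
X = rng.rand(120, 200)
Y = rng.rand(120, 13)

# Option A: the wrapper clones the base estimator and fits one copy per target column.
wrapped = MultiOutputRegressor(RandomForestRegressor(n_estimators=20, random_state=0))
wrapped.fit(X, Y)
print(len(wrapped.estimators_))      # one fitted forest per target
print(wrapped.predict(X[:5]).shape)  # still one object with a single predict call

# Option B: RandomForestRegressor also accepts a 2-D target directly,
# in which case every tree predicts all targets jointly.
native = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, Y)
print(native.predict(X[:5]).shape)
```

The main trade-off of the wrapper is that targets are modelled independently, so any relationship between targets is ignored and training cost grows with the number of targets. The native multi-output forest is a single set of trees shared by all targets.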


Check this:

Jump in


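library(neuralnet)
setwd("C:/R Packages/Kaggle")

#Get the data from
# (loading step missing from the post; `concrete` is assumed to be the loaded data frame)
normalize = function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
concrete_norm = as.data.frame(lapply(concrete, normalize))  # assumed reconstruction of the garbled line

# `concrete_train` is assumed to be a training subset of concrete_norm; two targets on the left-hand side
concrete_model = neuralnet(strength + Cement ~ Slag + Ash + Water + Superplasticizer + Coarse + Fineagg + age, data = concrete_train, hidden = c(5, 3))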
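
A rough Python counterpart of the R call above, as a sketch assuming scikit-learn. `MLPRegressor` handles a 2-D target natively, and `hidden_layer_sizes=(5, 3)` mirrors `hidden = c(5,3)`. The data is synthetic (7 inputs, 2 targets, matching the shape of the R formula), and unlike the R snippet only the inputs are min-max scaled here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 7 inputs and 2 numeric targets.
rng = np.random.RandomState(0)
X = rng.rand(300, 7)
Y = np.column_stack([X[:, :3].sum(axis=1), X[:, 3:].mean(axis=1)])

# MinMaxScaler plays the role of the normalize() helper for the inputs;
# the network has two hidden layers with 5 and 3 units.
mlp = make_pipeline(
    MinMaxScaler(),
    MLPRegressor(hidden_layer_sizes=(5, 3), max_iter=2000, random_state=0),
)
mlp.fit(X, Y)
print(mlp.predict(X[:4]).shape)  # one column per target
```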








Please add your valuable inputs.

Hi Tony,
Thanks a lot for your reply. The code you’ve provided looks good, but it’s in R, whereas I was looking for Python with scikit-learn: MLPRegressor or MultiOutputRegressor with RandomForest, or any other such algorithm that is easy to implement.

I would appreciate it if you could share any sample code in Python, with any algorithm that can be used for my case. Thanks again!!


Check the code below:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Create a random dataset: one input, two noisy targets (sin and cos)
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(600, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y += (0.5 - rng.rand(*y.shape))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=400, test_size=200, random_state=4)

# Wrapper: one random forest is fitted per target
max_depth = 30
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100,
                                                          random_state=0))
regr_multirf.fit(X_train, y_train)

# Native: a single random forest predicts both targets
regr_rf = RandomForestRegressor(n_estimators=100, max_depth=max_depth,
                                random_state=2)
regr_rf.fit(X_train, y_train)

# Predict on new data
y_multirf = regr_multirf.predict(X_test)
y_rf = regr_rf.predict(X_test)

# Plot the results
plt.figure()
s = 50
a = 0.4
plt.scatter(y_test[:, 0], y_test[:, 1], edgecolor='k',
            c="navy", s=s, marker="s", alpha=a, label="Data")
plt.scatter(y_multirf[:, 0], y_multirf[:, 1], edgecolor='k',
            c="cornflowerblue", s=s, alpha=a,
            label="Multi RF score=%.2f" % regr_multirf.score(X_test, y_test))
plt.scatter(y_rf[:, 0], y_rf[:, 1], edgecolor='k',
            c="c", s=s, marker="^", alpha=a,
            label="RF score=%.2f" % regr_rf.score(X_test, y_test))
plt.xlim([-6, 6])
plt.ylim([-6, 6])
plt.xlabel("target 1")
plt.ylabel("target 2")
plt.title("Comparing random forests and the multi-output meta estimator")
plt.legend()
plt.show()


Thanks a lot for the code, Tony. When I tried to execute it, I ran into some errors in the plotting part and couldn’t compare the two algorithms. I’ll try to debug it later and see how it works. However, I get the idea now. By the way, sorry I was away and couldn’t check your post earlier. Thanks again!
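
If the plotting step is the only obstacle, the two models can also be compared numerically. The following is a sketch, assuming scikit-learn, that reuses the synthetic data from the example above and prints one R² value per target via `r2_score(..., multioutput="raw_values")`. The `models` and `scores` dictionaries are illustrative names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Same synthetic data as in the example above.
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(600, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y += 0.5 - rng.rand(*y.shape)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=400, test_size=200, random_state=4)

models = {
    "Multi RF": MultiOutputRegressor(
        RandomForestRegressor(n_estimators=100, random_state=0)),
    "RF": RandomForestRegressor(n_estimators=100, max_depth=30, random_state=2),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # multioutput="raw_values" returns one R^2 per target instead of an average.
    scores[name] = r2_score(y_test, model.predict(X_test), multioutput="raw_values")
    print(name, np.round(scores[name], 3))
```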