Machine Unlearning



We all put a lot of effort into making models learn classes or numbers. But how do we make a machine learning model “unlearn” some instances? Here are my ideas, and I would welcome more :slightly_smiling:

  • The simplest way would be to re-train the model without the “to be forgotten” instances. But that would be costly in time for larger training sets.

  • An extension of the above approach (re-train model) would be to aggregate the data points in some way, so that an individual sample directly affects only the bucket it falls into rather than the whole model. Forgetting a sample would then mean updating just that bucket.

  • Add instance importance as a feature perhaps? i.e. use weights to make some instances more important than others. So if we need to forget instances, we can mark them as low importance.

  • Re-insert the “to be forgotten” samples with corrected labels

  • Make the algorithm resistant to noise and incorrect classifications OR try to identify incorrectly classified instances
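To make the "aggregate into buckets" and "re-insert with corrected labels" ideas concrete, here is a minimal sketch using a count-based naive Bayes. The class name `CountingNaiveBayes`, its methods and the toy Titanic-like rows are all made up for illustration, and the smoothing assumes every feature takes exactly two values. Because the model keeps only counts, forgetting a sample is just subtracting it from its buckets.

```python
from collections import defaultdict
import math


class CountingNaiveBayes:
    """Categorical naive Bayes that stores only counts (hypothetical sketch)."""

    def __init__(self):
        self.class_counts = defaultdict(int)    # label -> number of samples
        self.feature_counts = defaultdict(int)  # (label, index, value) -> count

    def _update(self, features, label, delta):
        self.class_counts[label] += delta
        for i, value in enumerate(features):
            self.feature_counts[(label, i, value)] += delta

    def learn(self, features, label):
        self._update(features, label, +1)

    def unlearn(self, features, label):
        # Only valid for a sample that was learned earlier with this label.
        self._update(features, label, -1)

    def relabel(self, features, old_label, new_label):
        # Re-insert a sample with a corrected label.
        self.unlearn(features, old_label)
        self.learn(features, new_label)

    def snapshot(self):
        # Drop zero entries so two models can be compared by content.
        return (
            {k: v for k, v in self.class_counts.items() if v},
            {k: v for k, v in self.feature_counts.items() if v},
        )

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            if n <= 0:
                continue
            score = math.log(n / total)
            for i, value in enumerate(features):
                count = self.feature_counts.get((label, i, value), 0)
                # Add-one smoothing, assuming two possible values per feature.
                score += math.log((count + 1) / (n + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


rows = [
    (("male", "third"), "died"),
    (("female", "first"), "survived"),
    (("male", "first"), "survived"),
    (("male", "third"), "survived"),  # the instance we want to forget
]

full = CountingNaiveBayes()
for x, y in rows:
    full.learn(x, y)
full.unlearn(*rows[-1])

retrained = CountingNaiveBayes()
for x, y in rows[:-1]:
    retrained.learn(x, y)

same_model = full.snapshot() == retrained.snapshot()
```

Here `same_model` compares the unlearned model with one re-trained from scratch without the forgotten row. This only works so cheaply because the model is a sum of per-sample contributions; most models do not decompose this way.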


So there are two things you might mean when you want the machine to unlearn some instances:

  1. Remove the entry altogether
    This means that you do not want that entry to be considered in the model at all.

  2. Reclassify the output for that entry
    This means that you want the model to re-route the output for this particular sequence of inputs.

Part 2 would be easier, but part 1 would be difficult, I guess.


@anantguptadbl - thanks for giving some thought to this!

Regarding Part 1, we need to be able to remove the instance from the model, but how can that be achieved? It’s not like the model is like a database table and I can delete rows :frowning:

Regarding Part 2, you mean re-training on that entry, right? If it is an incremental model, then maybe we can re-insert the corrected entry, but the model has already learnt the incorrect class for that entry, so I am not sure about this.

Also, if I need to unlearn a batch of instances, then perhaps I can build a separate model for each batch and ensemble them by weights? By batch I mean a group of instances with a common factor. For example:

  • For the Titanic problem, I want to unlearn the passengers who got a free passage, i.e. Fare = 0, without retraining the model. If we had 2 models, one for paid passengers and another for free passengers, combined in an ensemble, I could set the weight of the 2nd model to zero.

Will this approach work? Any other ideas, please anyone?
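A rough sketch of the two-model ensemble idea, just to make it concrete. `RateModel`, `WeightedEnsemble`, the labels and the weights are all invented for illustration; the toy "model" simply predicts the survival rate of its own group and stands in for a real classifier.

```python
class RateModel:
    """Toy model: predicts the survival rate seen in its own training group."""

    def __init__(self, labels):
        self.rate = sum(labels) / len(labels)

    def predict_proba(self, passenger):
        # Ignores the passenger; a real classifier would use its features.
        return self.rate


class WeightedEnsemble:
    def __init__(self, models, weights):
        self.models = models          # group name -> model
        self.weights = dict(weights)  # group name -> ensemble weight

    def forget_group(self, name):
        # "Unlearn" a whole group by silencing its model; nothing is retrained.
        self.weights[name] = 0.0

    def predict_proba(self, passenger):
        total = sum(self.weights.values())
        if total == 0:
            raise ValueError("every model in the ensemble has been silenced")
        weighted = sum(
            self.weights[name] * model.predict_proba(passenger)
            for name, model in self.models.items()
        )
        return weighted / total


ensemble = WeightedEnsemble(
    models={
        "paid": RateModel([1, 0, 1, 1]),  # hypothetical labels, Fare > 0
        "free": RateModel([0, 0, 1, 0]),  # hypothetical labels, Fare = 0
    },
    weights={"paid": 3.0, "free": 1.0},
)

before = ensemble.predict_proba({"Fare": 10})
ensemble.forget_group("free")
after = ensemble.predict_proba({"Fare": 10})
```

After `forget_group("free")`, the prediction depends only on the paid-passenger model. This counts as forgetting only if the free-passage rows never touched the other model or the ensemble weights during training.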


Hi @Bolaka,

I absolutely like your idea. That’s innovative!

I’d like to add that you could do machine unlearning in neural networks. There are two concepts you should know for that: fine tuning and transfer learning.

I’ll be somewhat abstract.
A neural network can be considered like a block of legos, with the objective of finding the correct “structure” (architecture). So if you want to build a good structure, what you can do is take a pre-built structure (pretrained model) and change it according to your need. You could make only tiny changes in the structure (fine tuning) or you could take the previous structure, remove some of the blocks and try a new structure (transfer learning).

The most obvious problem with this approach is that, unlike with decision trees (where you know what each node does), you have to do a lot of trial and error to get a good model.

PS: Here’s the link to the reference article.



In a typical modeling project, there are several steps prior to model creation, such as data exploration, handling missing values, handling outliers etc. During this preprocessing, you exclude the data that does not fit in, so that when you create the model, you will have only the relevant data.

In such a modeling project, why would anyone make the model unlearn some instances?


Hi @r_achar ,

I would like to comment on your point.

I agree that you give only the relevant data to your model. But even after that, you may see that the model does not perform as well as you want it to (99% is not enough, you want 100% :wink: ). So if you want the model to be as good as a human (or even better), you can “correct” the model whenever it goes wrong. If you could just make the model see the mistakes it is making and make it resistant to them, that would be excellent.

Did I make sense?


Hi @jalFaizy,

I see your point.

You can get 100% accuracy with artificially created data which contains no random noise, whereas with real life data, you are highly unlikely to get 100% accuracy.

After the model goes into production, you validate it periodically. If the performance starts degrading, you’d better update the model using current, latest data. This way, the model learns new information and its accuracy will likely improve. I would not call this unlearning, would rather refer to it as updating the model.


Hey @jalFaizy!
Thanks for the link! I had not known about transfer learning but it sure looks interesting! You have explained it beautifully too :slight_smile: I will look into it.

I am glad you like my idea. So by nullifying weights on the ensembled model, we can unlearn whole clusters of (wrong) data points. But let’s say we need to go deeper and pick out instances from a cluster to unlearn. In that case we may need to re-train the model only on that cluster minus the wrong data points.
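Here is a small sketch of that last step: keep one model per cluster and, when a few rows turn out to be wrong, re-train only the cluster they belong to. `ShardedModel`, `train_majority` and the row ids are hypothetical, and the "learner" is just a majority vote standing in for a real model.

```python
from collections import Counter


def train_majority(rows):
    """Toy learner: returns the most common label among (row_id, label) rows."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


class ShardedModel:
    def __init__(self, shards):
        # shards: cluster name -> list of (row_id, label)
        self.shards = {name: list(rows) for name, rows in shards.items()}
        self.models = {
            name: train_majority(rows) for name, rows in self.shards.items()
        }
        self.retrained = []  # log of clusters that were re-trained

    def unlearn(self, name, bad_ids):
        """Drop the given rows from one cluster and re-train that cluster only."""
        bad_ids = set(bad_ids)
        kept = [row for row in self.shards[name] if row[0] not in bad_ids]
        if not kept:
            raise ValueError("cannot re-train a cluster with no rows left")
        self.shards[name] = kept
        self.models[name] = train_majority(kept)
        self.retrained.append(name)

    def predict(self, name):
        return self.models[name]


model = ShardedModel({
    "free": [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1)],
    "paid": [(6, 1), (7, 1), (8, 0)],
})

before = model.predict("free")
model.unlearn("free", bad_ids=[2, 3])  # rows 2 and 3 are the wrong data points
after = model.predict("free")
```

Only the "free" cluster is re-trained; the "paid" model is left untouched, so the cost of unlearning scales with the size of one cluster rather than the whole training set.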



Yes, you are right. But consider the scenario where some classes are relabeled because of a change in business rules. That is a valid case where you might need to unlearn data points.



I think at this point it is just semantics. We both agree that when the model is no longer valid, due to the change in business rules, we need to replace it with a new model, OR update it using current data.


I found these resources which might be helpful

  • Technical Paper : Cao, Yinzhi, and Junfeng Yang. “Towards Making Systems Forget with Machine Unlearning.” 2015 IEEE Symposium on Security and Privacy. IEEE, 2015.
  • Corresponding Video