Reinforcement learning: multi-armed bandit


I just came across this blog post, which explains how a multi-armed bandit (MAB) can be applied to online advertising to optimise the click-through rate (CTR).

My question is about the UCB1 example at the end.
In the advert tab, what does a "row" represent?
Do the N rows correspond to different users' clicks at different times?

My use case has a very similar setup. I have 5 ads that I would first like to show to users at random, in order to collect initial data to learn from.
When an ad is shown to a user it generates an "impression" event, and when a user clicks on the ad it generates a "click" event.
I am monitoring these events live, so I can, for example, calculate the CTR hour by hour.
The goal is to use the historical data (clicks, impressions, etc.) to find which ad yields the best CTR over time, and then, little by little, show that ad more and more.
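
To make the setup concrete, here is a rough sketch of how I imagine UCB1 could pick the next ad from running impression and click counts. It is only my reading of the algorithm, not the blog post's code: the function names `ucb1_select` and `simulate` are my own, and the click rates in the usage line are made up purely for illustration.

```python
import math
import random


def ucb1_select(impressions, clicks):
    """Return the index of the ad to show next, using the UCB1 rule."""
    # Show every ad once before trusting any CTR estimate.
    for ad, shown in enumerate(impressions):
        if shown == 0:
            return ad
    total = sum(impressions)
    scores = []
    for ad, shown in enumerate(impressions):
        observed_ctr = clicks[ad] / shown
        # Exploration bonus: larger for ads that were shown less often.
        bonus = math.sqrt(2 * math.log(total) / shown)
        scores.append(observed_ctr + bonus)
    return scores.index(max(scores))


def simulate(true_ctrs, rounds, seed=0):
    """Simulate impressions and clicks; true_ctrs are hypothetical click rates."""
    rng = random.Random(seed)
    impressions = [0] * len(true_ctrs)
    clicks = [0] * len(true_ctrs)
    for _ in range(rounds):
        ad = ucb1_select(impressions, clicks)
        impressions[ad] += 1              # "impression" event
        if rng.random() < true_ctrs[ad]:
            clicks[ad] += 1               # "click" event
    return impressions, clicks


if __name__ == "__main__":
    # Made-up CTRs for 5 ads, only to exercise the sketch.
    shown, clicked = simulate([0.02, 0.03, 0.05, 0.04, 0.10], rounds=5000)
    print(shown, clicked)
```

In my case the simulated click would be replaced by the live "impression" and "click" events, so the counts would simply be updated as the events arrive.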

Is MAB useful in this situation? From reading the article, it looks like all MAB does is find which ad has the best CTR, and I already have that information.

© Copyright 2013-2019 Analytics Vidhya