In this article, in iteration 4, UCB1 for S1 is calculated as follows:

10+2*sqrt(ln(3)/2)

Shouldn't it be the following?

20+2*sqrt(ln(3)/2)

The UCB1 formula is given as:

UCB1(Si) = Vi + 2*sqrt(ln(N)/ni)

where Vi is the average reward/value of all nodes beneath this node, N is the number of visits to the parent node, and ni is the number of visits to node Si. Does that mean Vi at S1 drops from 20 in iteration 3 to 10 in iteration 4, because S1 has 2 more children in iteration 4? If so, I don't understand exactly why. Can someone please explain?
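To make the difference between a node's total value and its average value Vi concrete, here is a minimal Python sketch of the UCB1 score. It assumes (this is not taken from the article) that each node stores the running *sum* of rollout rewards plus a visit count, so Vi is that sum divided by the visit count; the function name `ucb1` and its parameters are hypothetical, and c = 2 matches the constant in the calculations above.

```python
import math

def ucb1(total_value: float, visits: int, parent_visits: int, c: float = 2.0) -> float:
    """Hypothetical UCB1 helper.

    Assumes the node stores the SUM of its rollout rewards (total_value) and
    its visit count, so the average Vi is total_value / visits.
    """
    if visits == 0:
        # Common convention (an assumption here): unvisited nodes get an
        # infinite score so they are always tried first.
        return math.inf
    average = total_value / visits  # Vi is the mean reward, not the running sum
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return average + exploration

# Hypothetical node whose rewards sum to 20 over 2 visits, with a parent
# visited 3 times: the exploitation term is 20 / 2 = 10, not 20.
print(ucb1(20, 2, 3))
print(10 + 2 * math.sqrt(math.log(3) / 2))  # prints the same value
```

Under this assumption, a rollout that returns 0 leaves the total unchanged but increases the visit count, which lowers the average Vi.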