Question on General Probability

#1

Please help me solve this basic data science question, asked in a screening interview at Integrate.AI (at least one part):

Suppose we have a data set with two variables: type of injury (categorical) and description (string). We want to predict the type of injury for new data given only the description. We get a new description that contains the word “swelling”. Our model, built from a very large training sample, tells us that the only two types of injuries that can produce the word “swelling” are “Burn”, which occurs in 1 out of 10 observations, and “Bruise”, which occurs in 1 out of 100 observations. A “Bruise” observation has a 30% chance of generating the word “swelling”, while “Burn” has only a 5% chance of generating the word “swelling”.

Q1. Without any other information, is the new observation with the word “swelling” more likely to be a burn or a bruise? What is the probability of each?

Q2. What is the probability of at least 2 bruises, given that 6 observations have descriptions that contain the word “swelling”?

I solved it as follows -

GIVEN-
Burn - 1 out of 10 observations
Bruise - 1 out of 100 observations

Bruise - 30% chance of generating word swelling
Burn - 5% chance of generating word swelling

CALCULATION 1 -

So,
0.01 * 0.3 = 0.003 = P(bruise and swelling)
0.1 * 0.05 = 0.005 = P(burn and swelling)

Now, 6 observations have the word “swelling”.

CALCULATION 2 -

P(at least 2 bruises) = P(2 bruises, 4 burns) + P(3 bruises, 3 burns) + P(4 bruises, 2 burns) + P(5 bruises, 1 burn) + P(6 bruises, 0 burns) = (9 * 625 + 27 * 125 + 81 * 25 + 243 * 5 + 729) / (1000 * 1000)

Is it correct? I believe nCr has to be used.

Thanks & Best

#2

Hi @mrinmayk ,

Bernoulli trials take the position of each event into account as well. For example, the case of 2 bruises and 4 burns can occur as:

bruise bruise burn burn burn burn
burn bruise bruise burn burn burn
burn burn bruise bruise burn burn
bruise burn burn bruise burn burn … and so on

The second question should be solved with Bernoulli trials, i.e. the binomial distribution, which is where the nCr factor you mentioned comes in.
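
Here is a minimal sketch of how both parts could be checked numerically. It rests on two assumptions that the question does not state outright: that burn and bruise are the only possible sources of “swelling” (so the two joint probabilities can be normalized into posteriors), and that the 6 observations are independent. The helper names `posterior` and `prob_at_least` are just illustrative.

```python
from math import comb

# Figures taken from the question
prior = {"burn": 0.10, "bruise": 0.01}        # P(injury)
likelihood = {"burn": 0.05, "bruise": 0.30}   # P("swelling" | injury)

def posterior(prior, likelihood):
    """Bayes' rule: normalize prior * likelihood over all injury types."""
    joint = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(joint.values())               # P("swelling")
    return {k: v / total for k, v in joint.items()}

def prob_at_least(k, n, p):
    """Binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Q1: posterior for each injury given "swelling"
post = posterior(prior, likelihood)

# Q2: at least 2 bruises among 6 "swelling" observations
# (assumes the 6 observations are independent)
p_q2 = prob_at_least(2, 6, post["bruise"])

print({k: round(v, 3) for k, v in post.items()})
print(round(p_q2, 4))
```

Under those assumptions, Q1 comes out as 0.003 / 0.008 = 0.375 for bruise and 0.005 / 0.008 = 0.625 for burn, so burn is more likely. Q2 is then 1 - P(0 bruises) - P(1 bruise) with p = 0.375 and n = 6, which is roughly 0.7258.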