Data Projection to National Level



I have a project where I am collecting data from a sample of pharmacies drawn from a universe. I use the data to build an application in QlikView that gives information on the pharmaceutical market in the country. How can one best project the sampled data to the national level? Say one has a sample of ‘n’ pharmacies from a universe of ‘N’ pharmacies. What is the best way to project the collected sample data to the universe, i.e. the national level?

Thanking you.


Hello @chitemerere_stratdig,

I believe by data you mean ‘sales’ data. I think it would depend on your sample. How large is n compared to N? What is the geographic distribution? Do you think it’s a representative sample, i.e. is it big enough to estimate the population?

If you’re confident in your sample, you can extrapolate directly by ratio, i.e. scale the sample total by N/n. But this generally doesn’t work out that well. Especially if you’re in a consulting role, it will almost never work.
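A minimal sketch of the direct ratio extrapolation mentioned above, assuming every pharmacy in the universe behaves like the average sampled pharmacy. The function name and the numbers (50 sampled out of 500) are made up for illustration.

```python
def project_by_ratio(sample_total: float, n_sample: int, n_universe: int) -> float:
    """Scale a sample total up to the universe using the factor N / n.

    Assumes the sample is representative, so the average sampled pharmacy
    matches the average pharmacy in the universe.
    """
    if n_sample <= 0 or n_universe < n_sample:
        raise ValueError("need 0 < n_sample <= n_universe")
    return sample_total * n_universe / n_sample


# Hypothetical figures: 50 sampled pharmacies out of 500 sold 120,000 in total.
national_estimate = project_by_ratio(120_000.0, n_sample=50, n_universe=500)
print(national_estimate)  # 1200000.0
```

The whole estimate rests on the N/n factor, which is why it breaks down as soon as the sample over-represents one city or one type of outlet.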

Another idea could be to think about parameters. You can divide the nation into different geographies and try to find the parameters which you think will affect the ‘data’ (I’m not sure what it is). E.g., in the case of sales you might want to look at incidence, healthcare expenditure, #pharmacies, #physicians of a particular specialty which is expected to prescribe the product more, #hospitals if it is a specialty-care drug, etc. There can be multiple factors.

Determine these factors at a granular level (state or district). I know data availability might be a constraint but you have to do some hard secondary research or delegate this part. You can define a metric using all the factors and then extrapolate taking this metric as the weight.
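One way to read "define a metric using all the factors and extrapolate with it as the weight" is sketched below. It is only an illustration under assumptions: each factor is turned into a region's share of the national total, the shares are blended with hand-picked weights, and the sampled regions' sales are scaled up by the share of the index they cover. The region names, factor values and weights are hypothetical.

```python
def composite_index(regions: dict, weights: dict) -> dict:
    """Return each region's share of a weighted blend of factor shares."""
    totals = {f: sum(r[f] for r in regions.values()) for f in weights}
    weight_sum = sum(weights.values())
    return {
        name: sum(weights[f] * factors[f] / totals[f] for f in weights) / weight_sum
        for name, factors in regions.items()
    }


def project_national(sample_sales: dict, index: dict) -> float:
    """Scale sales seen in the sampled regions by the index share they cover."""
    covered_share = sum(index[name] for name in sample_sales)
    return sum(sample_sales.values()) / covered_share


# Hypothetical regions and factor values (not real data).
regions = {
    "A": {"pharmacies": 50, "physicians": 100},
    "B": {"pharmacies": 30, "physicians": 60},
    "C": {"pharmacies": 20, "physicians": 40},
}
weights = {"pharmacies": 1.0, "physicians": 1.0}  # assumed equal importance

index = composite_index(regions, weights)
estimate = project_national({"A": 1000.0}, index)  # sales observed only in region A
print(index, estimate)
```

The index sums to 1 across regions, so a region's index value can also be used to split a national figure back down to region level.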

Further details about the problem will help in defining the specifics. Right now, that’s all I can say.

Hope this gives you some clue.



Dear Aarshay

Thank you very much for your prompt response, most appreciated. Your contribution is in the right direction with respect to what I am looking at.

The data relates to the following measures:

  1. Dollar value of prescriptions sold
  2. Number of prescriptions sold
  3. Quantity of medicines sold

The data is collected on a monthly basis at the end of each month.

Right now my universe consists of 407 pharmacies nationally and my current sample has 41 pharmacies located across the country. Of the 407 pharmacies in the universe, 49% are located in the major city Harare, 10% in the second largest city Bulawayo and 8% in the next largest town Chitungwiza. The rest are scattered all over the country, with 1 to 13 outlets per town.

I should be able to get the following measures for each location:

  1. Number of pharmacies
  2. Number of doctors
  3. Overall disease incidence
  4. Health insurance coverage
  5. Population

Any further guidance on coming up with the right metric(s) would be most appreciated.




Hi Chris,

First of all, let me clarify that I’m not an expert in this domain. Whatever I said was based on my intuition and some experience in pharma Sales Force Effectiveness. So I might not be the best person to answer, but still I’ll share my opinion.

Let me first state my understanding of the problem: you have sales for 41 pharmacies and you want to use them to estimate sales for each of the remaining 366 pharmacies. I guess this data will be used for pharmacy targeting. My answer is based on this understanding. If it’s anything else, this might be much more than required. But it might be good to know.

First, you should note that this is less of a statistical problem and more of a business problem. Statistics alone won’t work here because you have a very small sample (10%) and it’ll be difficult to get a good estimate purely mathematically. I suppose most of those 41 pharmacies in your panel will be in Harare (at least that’s what I would do: go for the low-hanging fruit first).

I think a single metric might not suffice here. Some tips which you can use:

  1. I’m sure you have Tier information. So Harare would be Tier 1, Bulawayo & Chitungwiza might be Tier 2 and the others might belong to Tier 2/3. (Just speculating)

  2. The key should be to get Harare right. If you look at a map of Zimbabwe, you’ll see Harare is a very small area in terms of geographic spread, but there is high market concentration because it’s the capital and most populous city. Most probably it’ll be the business hub. This is similar to something like Manila for the Philippines or Jakarta for Indonesia. Let’s focus on Harare first.

  3. Because of the high population density, the market will be very dynamic. So you can consider something as sophisticated as a “geo-spatial analysis”. You can plot all the pharmacies and hospitals/clinics (in Harare) on a map and then see which ones you cover. What is driving the business there? It might be a big hospital or a group of clinics (I’m not sure how the healthcare system works in Zimbabwe). The #doctors (in each hospital/clinic) matter, and more importantly the #doctors of the specialties which are important for the particular product. Try to see which hospitals are driving sales and assign high weightage to pharmacies around those.

  4. Geographic distance can be another metric. Once you know the key hospitals, you can see which pharmacies are geographically closer and assign an index accordingly.
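The distance-based index in tip 4 could be sketched as below. This is only an illustration: the great-circle (haversine) distance and the 1 / (1 + distance in km) decay are assumptions picked for simplicity, not something the tips prescribe, and the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_index(pharmacy: tuple, hospitals: list) -> float:
    """Sum an assumed 1 / (1 + distance_km) decay over the key hospitals."""
    return sum(1.0 / (1.0 + haversine_km(*pharmacy, *h)) for h in hospitals)


# Hypothetical coordinates, for illustration only.
key_hospitals = [(-17.82, 31.05), (-17.86, 31.03)]
near_pharmacy = (-17.83, 31.05)
far_pharmacy = (-17.95, 31.20)

print(proximity_index(near_pharmacy, key_hospitals))
print(proximity_index(far_pharmacy, key_hospitals))
```

A pharmacy close to several key hospitals gets a higher index than one far from all of them; the decay function is the part most worth tuning against the pharmacies you actually have sales for.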

Let me take an example. Suppose it’s a diabetes drug (say Lantus) and the key specialties are DIA and END (give them a weightage of 10) and the less important ones are GP and INT (give them a weightage of 1).

Hospital 1 (H1) has 1 DIA, 1 END, 20 GP and 20 INT. So the value in terms of potential of docs is 2×10 + 40×1 = 60
Hospital 2 (H2) has 5 DIA, 5 END, 5 GP and 5 INT. So the value is 10×10 + 10×1 = 110

Note that even though H1 has 42 docs, about twice as many as H2 with 20, its value is only about half of H2’s (60 vs 110). So taking specialties into account is really important. Now you can assign a higher weight to pharmacies near H2 than to those near H1. I know this is a simple example and in reality many hospitals and pharmacies will be close together. But I’m just giving you a direction to think in. You have to figure out the actual details.
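The H1/H2 arithmetic above can be reproduced with a few lines. This is a minimal sketch using the specialty weights from the example (10 for DIA and END, 1 for GP and INT); the function and variable names are made up for illustration.

```python
# Specialty weights from the example: key specialties count 10x.
SPECIALTY_WEIGHTS = {"DIA": 10, "END": 10, "GP": 1, "INT": 1}


def hospital_value(doctor_counts: dict, weights: dict = SPECIALTY_WEIGHTS) -> int:
    """Weighted doctor count; specialties without a weight contribute 0."""
    return sum(weights.get(specialty, 0) * count
               for specialty, count in doctor_counts.items())


h1 = {"DIA": 1, "END": 1, "GP": 20, "INT": 20}  # 42 doctors
h2 = {"DIA": 5, "END": 5, "GP": 5, "INT": 5}    # 20 doctors

print(hospital_value(h1))  # 60
print(hospital_value(h2))  # 110
```

With different products the only thing that changes is the weights table, so the same scoring can be rerun per therapy area.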

For cities other than Harare, I don’t think you’ll have much coverage, so you can define a similar metric at city level and divide sales accordingly. I believe this exercise is being used to target pharmacies, and you’ll end up focusing on Harare and 1-2 other cities anyway because their potential will be way higher than the rest.

I know it’s a lot to digest. But this is how I would do it. Please read it carefully. Feel free to discuss further.



Dear Aarshay

Once again thank you very much for your valuable input.

I have done some searches on the subject. Is it possible for me to have your e-mail address so that I can share some confidential information on the subject matter? Indeed, a geo-spatial projection algorithm is used commercially by one of the big pharmaceutical information providers.




Dear Chris,

Please drop me a personal note and we can discuss further on the matter.