Dear Statalisters,
For better understanding and learning STATA, I would like to ask you the following question. I have a municipality-level panel dataset containing some information about the kindergarten services offered by each municipalities, like the following:
Code:
In particular, I'm interested in the ratio between nr_seats and pop0_2, that is the ratio between available kindergarten seats and population aged 0-2 (the kindergarten coverage rate), by municipalities-year.
However, because many municipalities are too small to offer any kindergarten service and their demand for childcare service is basically served by the closest large municipality, I'm interested in aggregating those data at a higher geographical level, that in my case is the province level (code_province).
For this purpose, I've tried 2 different codes in STATA, each one giving me a different result:
First code:
Second code:
as I understand it, however, the two codes should not give such a different results; indeed the first one is giving the average coverage rate by province, weighting all the municipalities belonging to the province the same, regardless of the population in each. The second one is calculating the average nr. of seats and the average nr. of kids aged 0-2 by province, again without any weighting, and then is calculating the ratio between the two.
Could some one please give me some insights on why the two codes give such a different results?
Thank you so much,
Chiara
For better understanding and learning STATA, I would like to ask you the following question. I have a municipality-level panel dataset containing some information about the kindergarten services offered by each municipalities, like the following:
Code:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int code_province double code_municipality float year double(children nr_seats) float(pop0_2 pop_tot) 1 1008 2004 71 75 437 16752 1 1008 2005 73 75 416 16769 1 1008 2006 73 75 392 16726 1 1008 2007 73 75 391 16760 1 1008 2008 74 90 411 17029 1 1008 2009 87 90 415 16998 end
However, because many municipalities are too small to offer any kindergarten service and their demand for childcare service is basically served by the closest large municipality, I'm interested in aggregating those data at a higher geographical level, that in my case is the province level (code_province).
For this purpose, I've tried 2 different codes in STATA, each one giving me a different result:
First code:
Code:
. preserve . gen coverage=nr_seats/pop0_2 (480 missing values generated) . recode coverage(.=0) if nr_seats==0|pop0_2==0 (coverage: 480 changes made) . collapse coverage, by(code_province year) . sum coverage Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- coverage | 1,320 .0502159 .0455661 0 .2749822 . restore
Code:
. preserve . collapse nr_seats pop0_2, by(code_province year) . gen coverage=nr_seats/pop0_2 . sum coverage Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- coverage | 1,320 .0844444 .0592051 0 .2932529 . restore
as I understand it, however, the two codes should not give such a different results; indeed the first one is giving the average coverage rate by province, weighting all the municipalities belonging to the province the same, regardless of the population in each. The second one is calculating the average nr. of seats and the average nr. of kids aged 0-2 by province, again without any weighting, and then is calculating the ratio between the two.
Could some one please give me some insights on why the two codes give such a different results?
Thank you so much,
Chiara
Comment