collapse municipality-level data and calculating a ratio

Chiara Puccioni

Join Date: Jun 2019

Posts: 2
#1

collapse municipality-level data and calculating a ratio

15 Nov 2019, 01:46

Dear Statalisters,

For better understanding and learning STATA, I would like to ask you the following question. I have a municipality-level panel dataset containing some information about the kindergarten services offered by each municipalities, like the following:

Code:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int code_province double code_municipality float year double(children nr_seats) float(pop0_2 pop_tot) 1 1008 2004 71 75 437 16752 1 1008 2005 73 75 416 16769 1 1008 2006 73 75 392 16726 1 1008 2007 73 75 391 16760 1 1008 2008 74 90 411 17029 1 1008 2009 87 90 415 16998 end

In particular, I'm interested in the ratio between nr_seats and pop0_2, that is the ratio between available kindergarten seats and population aged 0-2 (the kindergarten coverage rate), by municipalities-year.
However, because many municipalities are too small to offer any kindergarten service and their demand for childcare service is basically served by the closest large municipality, I'm interested in aggregating those data at a higher geographical level, that in my case is the province level (code_province).

For this purpose, I've tried 2 different codes in STATA, each one giving me a different result:

First code:

Code:

. preserve . gen coverage=nr_seats/pop0_2 (480 missing values generated) . recode coverage(.=0) if nr_seats==0|pop0_2==0 (coverage: 480 changes made) . collapse coverage, by(code_province year) . sum coverage Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- coverage | 1,320 .0502159 .0455661 0 .2749822 . restore

Second code:

Code:

. preserve . collapse nr_seats pop0_2, by(code_province year) . gen coverage=nr_seats/pop0_2 . sum coverage Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- coverage | 1,320 .0844444 .0592051 0 .2932529 . restore

as I understand it, however, the two codes should not give such a different results; indeed the first one is giving the average coverage rate by province, weighting all the municipalities belonging to the province the same, regardless of the population in each. The second one is calculating the average nr. of seats and the average nr. of kids aged 0-2 by province, again without any weighting, and then is calculating the ratio between the two.

Could some one please give me some insights on why the two codes give such a different results?

Thank you so much,

Chiara

Last edited by Chiara Puccioni; 15 Nov 2019, 01:48.
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

15 Nov 2019, 02:27

Chiara:
welcome to this forum.
I would propose a different take, since you have a panel dataset:

Code:

xtset code_province year xtpoisson nr_seats children i.year, exp( pop0_2 ) irr

Kind regards,
Carlo
(Stata 19.0)
Comment
Chiara Puccioni

Join Date: Jun 2019

Posts: 2
#3

15 Nov 2019, 10:32

Thank you Carlo,

but I am not sure this will solve the problem. I apologise in advance if I didn't understand correctly the idea of xtpoisson but I cannot set
as panel var the provinces unless I first collapse the dataset; indeed if I try I obtain the following error:

Code:

repeated time values within panel

Secondly, maybe I didn't express well the problem: my final goal is to evaluate if a certain policy did have an impact on the kindergarten coverage rate by province, and in particular in my analysis I would use a Difference in Difference model. As such, I need to first construct my outcome variable at province level, that is the kindergarten coverage rate by province-year.

Best,

Chiara
Comment

Announcement

collapse municipality-level data and calculating a ratio

Comment

Comment