How can I proceed with this estimate?

Juan Gutierrez

Join Date: Oct 2024

Posts: 2
#1


23 Oct 2024, 05:57

How can I reduce the computational cost of this estimation?

I am analyzing infant mortality due to pollution. For this purpose I use birth microdata in which the outcome equals 1 for mortality (death within 24 hours of birth, or late fetal death).

My instrumental variable is the accumulated or average level of a particular pollutant (PM2.5 or PM10). My regression also includes meteorological controls and characteristics of the mother.

I am conducting this analysis at the municipal district level (my sample contains about 8,000 municipalities), each with its own measure of pollution accumulated over the weeks of gestation of the fetus or newborn, up to the date of death.

My empirical strategy is defined by this equation:
set maxvar 30000
set min_memory 8g
set niceness 0
set matsize 10000
use "C:\ ...", clear
set dp comma
describe

xi: reg birthclassification accpm10 sexofbirth multiplebirth studylevelsmother yearsmother yearsmother2 foreignnationality monthlyrainfall monthlyt2 monthlyraint2 monthlyrainfall2 monthlyt22 i.monthbirth*i.year i.monthbirth*i.capm

birthclassification corresponds to mortality classification
accpm10 corresponds to accumulated PM10 pollutant
sexofbirth : control for the sex of the newborn, dummy equal to 1 if the child is female
multiplebirth : dummy 1 if the birth is multiple
studylevelsmother : dummy equal to 1 if the mother has higher education
yearsmother : age of the mother
yearsmother2 : age of the mother at birth, squared
foreignnationality : dummy equal to 1 if mother is native to the corresponding country
monthlyrainfall : monthly average of accumulated rainfall
monthlyt2 : monthly average of accumulated temperatures
monthlyraint2 : interaction between average rainfall and temperature
monthlyrainfall2 : square of average rainfall
monthlyt22 : square of average temperatures

Finally, I have included two interactions in my regression. One is monthbirth x year, expressed as "i.monthbirth*i.year", with which I hope to capture seasonal patterns and unobserved heterogeneity. The other interacts municipalities with months, "i.monthbirth*i.capm", to capture seasonal patterns that differ across municipalities on a monthly basis.

My problem arises when running this regression. There are 2,623,692 observations in total (the study period is 2009-2016), and estimation is enormously costly, taking days. Is there an obvious error in my procedure, or could the equation I have specified be defective? I assume that the interaction of municipalities with months is what generates the enormous computational cost. If so, is there any way to reduce it? I look forward to your comments so that I can overcome this obstacle.

Thank you

JC

Last edited by Juan Gutierrez; 23 Oct 2024, 06:05.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17601
#2

24 Oct 2024, 03:09

Juan:
welcome to this forum.
Some comments on your query:
1) since you have an N>T panel dataset, why not use a panel data command, such as -xtreg-, instead of -regress- to analyze your dataset?
2) the -xi:- prefix is redundant if you use -fvvarlist- notation;
3) you mention instrumental variable in your post: do you mean that you detected endogeneity in your regression? Or do you mean instrumental variable=independent variable?
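
A minimal sketch of points 1) and 2), using the variable names from post #1. Treating capm as the panel identifier is an assumption, and this version absorbs only municipality effects, not the municipality-by-month interaction of the original command:

```stata
* Sketch only; assumes capm is a numeric municipality identifier (an assumption).
* Declare the panel structure (no time variable is needed for -xtreg, fe-).
xtset capm

* Factor-variable notation: i.monthbirth##i.year gives main effects plus the
* interaction, as i.monthbirth*i.year does under the xi: prefix.
xtreg birthclassification accpm10 sexofbirth multiplebirth studylevelsmother ///
    yearsmother yearsmother2 foreignnationality monthlyrainfall monthlyt2    ///
    monthlyraint2 monthlyrainfall2 monthlyt22 i.monthbirth##i.year, fe
```

With -fe-, the municipality effects are removed by the within transformation instead of being estimated as separate indicator variables.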

Kind regards,
Carlo
(StataNow 18.5)
Juan Gutierrez

Join Date: Oct 2024

Posts: 2
#3

25 Oct 2024, 16:38

Good evening Carlo, thank you very much for the welcome to the forum and for your feedback.

Yes, I will run the estimation with xtreg. I think that using xi is what is lengthening the estimation time.
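
For reference, one possible way to keep the municipality-specific seasonality from post #1 while using -xtreg- is to treat each municipality-by-month cell as the panel unit. This is a sketch under assumptions: capm_month is a hypothetical variable name, and capm and monthbirth are assumed to be coded as in the original command.

```stata
* Hypothetical identifier: one group per municipality-by-month cell.
egen long capm_month = group(capm monthbirth)
xtset capm_month

* The cell fixed effects span i.capm, i.monthbirth and their interaction,
* so those terms are absorbed rather than estimated as indicator variables.
* Stata will omit the monthbirth main effects as collinear with the cells.
xtreg birthclassification accpm10 sexofbirth multiplebirth studylevelsmother ///
    yearsmother yearsmother2 foreignnationality monthlyrainfall monthlyt2    ///
    monthlyraint2 monthlyrainfall2 monthlyt22 i.monthbirth##i.year, fe
```

With about 8,000 municipalities and 12 months, this replaces roughly 96,000 indicator variables with a within-cell transformation.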

You are right: in my rush to write the message I wrote "instrumental", but it is actually an independent variable.

Kind regards

Juan Carlos