Hi everyone (and happy holidays to those of you who read this soon after I post it),
I've run into a problem whose origin I cannot figure out. Put simply, I've run the same model with two different syntaxes in Stata. Here are the two pieces of code:
* Syntax A
quietly forvalues v=1/40 {
reg ser i.pser##i.cohort3 ib5.edulvl i.sex i.year [pweight = cs_weight] if sampleA==1 & country==`v', r
estimates store reg_results
margins cohort3, dydx(i.pser)
estimates store margins_results
outreg2 margins_results using DESOcohSL, excel dec(2) ci keep(i.pser#i.cohort3) ct(`v')
}
* Syntax B
bys cohort: reg ser i.pser ib5.edulvl i.sex i.year [pweight = cs_weight] if sampleA==1 & country==2, r
bys cohort: reg ser i.pser ib5.edulvl i.sex i.year [pweight = cs_weight] if sampleA==1 & country==3, r
bys cohort: reg ser i.pser ib5.edulvl i.sex i.year [pweight = cs_weight] if sampleA==1 & country==4, r
bys cohort: reg ser i.pser ib5.edulvl i.sex i.year [pweight = cs_weight] if sampleA==1 & country==5, r
etc...
ser and pser are dummies; edulvl is categorical. All the other variables are controls. The variables are correctly specified (i.e. I've encoded and cleaned everything that needed it).
I cannot understand why the two syntaxes lead to different estimates for i.pser#i.cohort3. The results from syntax A are analytically impossible (outside the expected 0-1 range for linear probability models), while the results from syntax B are in line with previous work and the literature. While troubleshooting, I came to the conclusion that the problem (probably) lies in the margins command. I've also noticed that syntax A is much slower than B.
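In case it helps anyone reproduce the comparison, here is a minimal sketch on Stata's bundled auto dataset rather than my own data. Everything in it is a stand-in and an assumption on my part: foreign plays the role of cohort3, a made-up dummy called heavy plays the role of pser, price is an arbitrary outcome, and mpg is the only control. It only illustrates two things I would check: whether a pooled model with shared control coefficients matches split-sample regressions, and what estimates store actually saves after margins when the post option is not used.

```stata
* Stand-in data: Stata's bundled auto dataset, NOT the data from my post
sysuse auto, clear
generate byte heavy = weight > 2500     // hypothetical stand-in for pser

* Syntax-A style: one pooled model; the control (mpg) gets a single
* slope shared by both groups of foreign
regress price i.heavy##i.foreign mpg, vce(robust)
margins foreign, dydx(heavy)            // effect of heavy within each group

* Without the post option, margins returns its results in r() and leaves
* e() untouched, so this stores the regression, not the margins table
estimates store after_margins
estimates replay after_margins

* Syntax-B style: separate regressions, so every coefficient
* (including the slope on mpg) is estimated per group
bysort foreign: regress price i.heavy mpg, vce(robust)

* Fully interacted pooled model: the point estimates of the heavy
* effect per group should match the split-sample regressions above
regress price i.heavy##i.foreign c.mpg##i.foreign, vce(robust)
margins foreign, dydx(heavy) post       // post moves the margins into e()
estimates store margins_posted
```

If the first margins table differs from the split-sample coefficients while the last one matches them, the gap would come from the controls being constrained across groups in the pooled model rather than from margins itself.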
Again, syntax B fixed the problem and I now have the estimates I need, but I'm curious to know why syntax A produced those estimates. Does anyone have an idea?
Thanks to you all in advance, and happy holidays!