Dear all,
I'm using the oaxaca decomposition command in Stata 18 with the svy subpopulation option.
Some observations are being excluded from the models for both Group 1 and Group 2. In addition, the decomposition reports the number of observations for the total sample rather than for the intended subpopulation.
My subpopulation has 14,367 observations: 2,192 in Group 1 (lower education) and 12,175 in Group 2.
When I run the decomposition without the svy option, I get the correct subpopulation sample size in all the models.
I would greatly appreciate your guidance and advice regarding this issue.
Below are the numbers of observations reported by the regression for each group, followed by the numbers of observations reported by the oaxaca decomposition.
Thanks in advance for your help.
Total sample
Code:
svy, subpop(if subpop==1): logistic self age sex income badl visit eat prot_d2 dent
Code:
Survey: Logistic regression

Number of strata =   574          Number of obs   =       90,846
Number of PSUs   = 8,027          Population size =  168,426,190
                                  Subpop. no. obs =       14,367
                                  Subpop. size    = 21,722,187.6
                                  Design df       =        7,453
                                  F(8, 7446)      =        75.68
                                  Prob > F        =       0.0000
Group 1
Code:
svy, subpop(if subpop==1 & ses==0): logistic self age sex income badl visit eat prot_d2 dent
Code:
Number of strata =   457          Number of obs   =       80,899
Number of PSUs   = 7,085          Population size =  135,901,805
                                  Subpop. no. obs =        2,192
                                  Subpop. size    = 2,505,225.62
                                  Design df       =        6,628
                                  F(8, 6621)      =        22.36
                                  Prob > F        =       0.0000
Group 2
Code:
svy, subpop(if subpop==1 & ses==1): logistic self age sex income badl visit eat prot_d2 dent
Code:
Number of strata =   573          Number of obs   =       90,789
Number of PSUs   = 8,022          Population size =  168,374,254
                                  Subpop. no. obs =       12,175
                                  Subpop. size    =   19,216,962
                                  Design df       =        7,449
                                  F(8, 7442)      =        55.80
                                  Prob > F        =       0.0000
Here is the code I used for the decomposition
Code:
oaxaca self age sex income badl visit eat prot_d2 dent, ///
    by(ses) logit weight(0) svy(, subpop(subpop)) noisily cformat(%4.3f)
Here is the output for the number of observations generated by the decomposition
Code:
Model for group 1 (running logit on estimation sample)

Survey: Logistic regression

Number of strata =   456          Number of obs   =       80,842
Number of PSUs   = 7,080          Population size =  135,849,869
                                  Subpop. no. obs =        2,183
                                  Subpop. size    = 2,497,605.89
                                  Design df       =        6,624
                                  F(8, 6617)      =        22.28
                                  Prob > F        =       0.0000

Note: 117 strata omitted because they contain no subpopulation members.

Model for group 2 (running logit on estimation sample)

Survey: Logistic regression

Number of strata =   456          Number of obs   =       80,842
Number of PSUs   = 7,080          Population size =  135,849,869
                                  Subpop. no. obs =       10,365
                                  Subpop. size    =   14,703,327
                                  Design df       =        6,624
                                  F(8, 6617)      =        49.14
                                  Prob > F        =       0.0000

Blinder-Oaxaca decomposition

Number of strata =   456          Number of obs   =       80,842
Number of PSUs   = 7,080          Population size =  135,849,869
                                  Design df       =        6,624
                                  Model           =        logit
Group 1: ses = 0                  N of obs 1      =        7,353
Group 2: ses = 1                  N of obs 2      =       73,489

    explained: (X1 - X2) * b2
  unexplained: X1 * (b1 - b2)
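To make the discrepancy easier to see, the counts can be tallied directly. The first part of this sketch uses only numbers copied from the outputs in this post. The second part is a diagnostic that assumes the dropped observations might be explained by missing values in ses or in the model variables; that is a guess on my part, not something the output confirms, and nmiss is a hypothetical helper variable that is not in my original code.

```stata
* Part 1: arithmetic on the reported counts (numbers copied from the outputs)
* Subpopulation obs in the separate svy models: 2,192 + 12,175
display 2192 + 12175
* Subpopulation obs in the models run inside oaxaca: 2,183 + 10,365
display 2183 + 10365
* Observations missing from the oaxaca estimation sample: 90,846 - 80,842
display 90846 - 80842
* N of obs 1 + N of obs 2 as reported by oaxaca: 7,353 + 73,489
display 7353 + 73489

* Part 2: diagnostic sketch (assumption: the gap may be due to missing values)
* nmiss is a hypothetical helper counting missing values across the model variables
egen nmiss = rowmiss(self age sex income badl visit eat prot_d2 dent)

* Complete cases in the subpopulation by group, showing missing ses as its own row
tabulate ses if subpop == 1 & nmiss == 0, missing

* Observations in the full sample with a missing grouping variable
count if missing(ses)
```

If the group counts from the tabulate line match 2,183 and 10,365, missing values would account for the gap; if they do not, something else in the way the estimation sample is built is responsible.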