Generate a dataset with the marginal effect of gender per country in the dataset

Luis Ortiz

Join Date: Dec 2014

Posts: 95
#1


26 Jul 2024, 03:01

Dear members of the list,

I have a dataset with individual-level data nested in years and countries. I am interested in plotting the marginal effect of gender on overeducation (my dependent variable) against the country-level scores of a key independent variable (gender egalitarianism). I'm having problems running the loop that should generate and store those marginal effects (see below). That's the reason I'm asking for help here.

First, I run a simple logistic regression with years as a control and an interaction of gender and country. S020 is the variable that captures years. As I said before, my intention is to get the marginal effect of gender for each country:

HTML Code:

logit overed4 i.female##i.country S020

Before the loop, I generate unique country codes with 'levelsof'

HTML Code:

levelsof cntry, local(countries)

It works. So far, so good.

Then, I create variables to store the results:

HTML Code:

gen country_effect = .
gen country_id = .

Now comes the loop, which uses the variables above. This is where Stata reads the code but does nothing. It does not give any error message either.

HTML Code:

foreach c of local countries {
    margins, dydx(female) at(cntry=`c')
    matrix b = r(b)
    replace country_effect = b[1,1] if country == `c'
    replace country_id = `c' if country == `c'
}

I wonder why the loop does not work.

Once the marginal effect of gender for the different countries is generated, I would plot these marginal effects against gndr_egalitarianism per country. For this, I would merge the small dataset with the marginal effect of gender by country with another dataset with the country values of gender egalitarianism.
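A minimal sketch of that plotting step, assuming the loop has already filled country_effect and that the country-level score is already present in the individual-level data as mean_gndr_eg (as in the dataex sample in this post), in which case no merge is needed. The axis titles and marker labels are illustrative choices, not part of the original plan.

```stata
* Assumes country_effect was filled by the loop and that
* mean_gndr_eg is constant within each country
preserve
collapse (mean) country_effect mean_gndr_eg, by(cntry)   // one row per country
twoway scatter country_effect mean_gndr_eg, mlabel(cntry) ///
    ytitle("Marginal effect of female") xtitle("Gender egalitarianism")
restore
```

The preserve/restore pair keeps the individual-level data intact after the graph is drawn.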

But that's another story. First I have to solve the problem with the loop described above.

As indicated in the norms of the list, I'm pasting a sample of my data with dataex.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float overed4 byte female int(cntry S020) float mean_gndr_eg
0 1 276 1997 .638171
0 1 276 1997 .638171
1 2 276 1997 .638171
0 2 276 1997 .638171
0 2 276 1997 .638171
0 2 276 1997 .638171
0 2 276 1997 .638171
0 2 276 1997 .638171
0 1 276 1997 .638171
0 1 276 1997 .638171
0 1 724 1997 .624891
0 1 724 1997 .624891
0 1 724 1997 .624891
0 2 724 1997 .624891
0 2 724 1997 .624891
0 1 724 1997 .624891
0 2 724 1997 .624891
0 1 724 1997 .624891
0 1 724 1997 .624891
0 2 724 1997 .624891
end
label values female X001
label def X001 1 "Male", modify
label def X001 2 "Female", modify
label values cntry S003
label def S003 276 "276. Germany", modify
label values S020 S020
label def S020 1997 " 1997", modify
label var female "Sex"
label var cntry "Country (ISO 3166-1 Numeric code)"
label var S020 "Year survey"

Thanks for your attention

And kind regards

Luis Ortiz
Clyde Schechter

Join Date: Apr 2014

Posts: 29147
#2

26 Jul 2024, 09:25

There are a couple of problems I see.

First, you are using S020 as a continuous variable to specify the year effect. Unless you expect a linear relationship between year and log odds overeducation, this is wrong. It is more usual, in the absence of a linear trend, to represent years as discrete variables, i.e. entering it into the regression as i.S020.

But that's a side issue. More to the point, your code refers to a variable, country, that does not exist in your data. You mean cntry, not country. If you make that change, the code will at least run without error messages.

Then you have another problem. If you -matrix list b- you will see that the coefficient you are looking for is found in b[1,2], not b[1,1], so you need to change your -replace- statement accordingly.
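One way to check which column holds the effect, sketched here with the variable names from #1 (corrected to cntry) and the country code 276 taken from the example data:

```stata
logit overed4 i.female##i.cntry i.S020
margins, dydx(female) at(cntry=276)
matrix b = r(b)
matrix list b    // shows the column names of r(b): one column per level of female
```

With female coded 1 = Male (the base level) and 2 = Female, the first column belongs to the base level and the second to Female, which is why b[1,2] is the element to store.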
Luis Ortiz

Join Date: Dec 2014

Posts: 95
#3

26 Jul 2024, 10:16

Many thanks for your attention, Clyde

Very silly mistake, naming the variable 'country' instead of 'cntry' in the logit regression. And you're totally right about the treatment of years (S020).

I made both changes, as well as the change inside the loop (b[1,2] instead of b[1,1]). Still, Stata runs the loop but produces nothing. I cannot see where these marginal effects of gender are stored, if they are generated at all. When I say...

HTML Code:

matrix list b

...Stata tells me that "matrix b was not found"

There is no error message, but I do not see any output either. Am I missing something?

Thanks again

Luis

Last edited by Luis Ortiz; 26 Jul 2024, 10:20.

Clyde Schechter

Join Date: Apr 2014
Posts: 29147
#4

26 Jul 2024, 10:26

I cannot reproduce your difficulty, so I can't say what is wrong. Here's what I get with your example data:

Code:

. * Example generated by -dataex-. For more info, type help dataex
. clear

. input float overed4 byte female int(cntry S020) float mean_gndr_eg

       overed4    female     cntry      S020  mean_gn~g
  1. 0 1 276 1997 .638171
  2. 0 1 276 1997 .638171
  3. 1 2 276 1997 .638171
  4. 0 2 276 1997 .638171
  5. 0 2 276 1997 .638171
  6. 0 2 276 1997 .638171
  7. 0 2 276 1997 .638171
  8. 0 2 276 1997 .638171
  9. 0 1 276 1997 .638171
 10. 0 1 276 1997 .638171
 11. 0 1 724 1997 .624891
 12. 0 1 724 1997 .624891
 13. 0 1 724 1997 .624891
 14. 0 2 724 1997 .624891
 15. 0 2 724 1997 .624891
 16. 0 1 724 1997 .624891
 17. 0 2 724 1997 .624891
 18. 0 1 724 1997 .624891
 19. 0 1 724 1997 .624891
 20. 0 2 724 1997 .624891
 21. end

. label values female X001

. label def X001 1 "Male", modify

. label def X001 2 "Female", modify

. label values cntry S003

. label def S003 276 "276. Germany", modify

. label values S020 S020

. label def S020 1997 "       1997", modify

. label var female "Sex"

. label var cntry "Country (ISO 3166-1 Numeric code)"

. label var S020 "Year survey"

.
. logit overed4 i.female##i.cntry i.S020

note: 1.female != 0 predicts failure perfectly;
      1.female omitted and 10 obs not used.

note: 276.cntry != 1 predicts failure perfectly;
      276.cntry omitted and 4 obs not used.

note: 2.female omitted because of collinearity.
note: 724.cntry omitted because of collinearity.
note: 2.female#724.cntry omitted because of collinearity.
note: 1997.S020 omitted because of collinearity.
Iteration 0:  Log likelihood = -2.7033673  
Iteration 1:  Log likelihood = -2.7033673  

Logistic regression                                     Number of obs =      6
                                                        LR chi2(0)    =   0.00
                                                        Prob > chi2   =      .
Log likelihood = -2.7033673                             Pseudo R2     = 0.0000

------------------------------------------------------------------------------------
           overed4 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------+----------------------------------------------------------------
            female |
             Male  |          0  (empty)
           Female  |          0  (omitted)
                   |
             cntry |
              724  |          0  (empty)
                   |
      female#cntry |
Male#276. Germany  |          0  (empty)
         Male#724  |          0  (empty)
       Female#724  |          0  (empty)
                   |
              S020 |
             1997  |          0  (omitted)
                   |
             _cons |  -1.609438   1.095445    -1.47   0.142    -3.756471    .5375951
------------------------------------------------------------------------------------

.
. levelsof cntry, local(countries)
276 724

.
. gen cntry_effect = .
(20 missing values generated)

. gen cntry_id = .
(20 missing values generated)

.
. foreach c of local countries {
  2.         margins, dydx(female) at(cntry=`c')
  3.         matrix b = r(b)
  4.         replace cntry_effect = b[1,2] if cntry == `c'
  5.         replace cntry_id = `c' if cntry == `c'
  6. }
warning: prediction constant over observations.

Average marginal effects                                     Number of obs = 6
Model VCE: OIM

Expression: Pr(overed4), predict()
dy/dx wrt:  2.female
At: cntry = 276

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |
       Male  |          0  (empty)
     Female  |          .  (not estimable)
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
(10 real changes made)
(10 real changes made)
warning: prediction constant over observations.

Average marginal effects                                     Number of obs = 6
Model VCE: OIM

Expression: Pr(overed4), predict()
dy/dx wrt:  2.female
At: cntry = 724

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |
       Male  |          0  (empty)
     Female  |          .  (not estimable)
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
(10 real changes made)
(10 real changes made)

I did not list the data set after the code runs because it is fairly long. But the correct coefficient values (0 with this data), are entered in the cntry_effect variable. Compare the code you are running with mine and see what you are doing differently. (In the future, when you need help troubleshooting code, you need to show the exact code you are running. You did that in #1, but you didn't in #3, where it is equally important to do so.)

Now, the results I got from the above are not very nice. But that's due to the ultra-lopsided distribution of the outcome variable in the example data. Presumably in your real data set you will not encounter that difficulty. (In fact, when I change the values of the outcome variable to something more reasonable, the code runs and produces sensible results.)

I do have one theory about why you are perhaps not getting any results. This code uses a local macro, countries. Therefore, it must be run without interruption in one fell swoop. If you are running the code one line, or a few lines, at a time, local macro countries will go out of scope and when the -foreach- loop is reached, Stata will interpret that as an empty list, and the loop will be skipped.
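A minimal sketch of how to make that failure visible, assuming cntry_effect and cntry_id have already been created. The guard is an illustrative addition, not part of the code above: if only the lines from the guard onward are selected and run, the local macro countries is empty and the guard stops with an error instead of letting the loop be skipped silently.

```stata
* Select and run all of these lines together, not one at a time
levelsof cntry, local(countries)

* Guard (illustrative): fail loudly if the local macro is empty
if `"`countries'"' == "" {
    display as error "local macro countries is empty"
    exit 198
}

foreach c of local countries {
    margins, dydx(female) at(cntry=`c')
    matrix b = r(b)
    replace cntry_effect = b[1,2] if cntry == `c'
    replace cntry_id = `c' if cntry == `c'
}
```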

Last edited by Clyde Schechter; 26 Jul 2024, 10:30.


Luis Ortiz

Join Date: Dec 2014

Posts: 95
#5

26 Jul 2024, 10:41

Many thanks, Clyde:

Sharing your code has been very useful. I believe it has got me closer to getting what I want. At least, now there's an error message out of running my code.

This is the loop as it was written in my do-file:

HTML Code:

foreach c in `countries' {
    margins, dydx(female) at(cntry = `c')
    matrix b = r(b)
    replace country_effect = b[1,2] if cntry == `c'
    replace country_id = `c' if cntry == `c'
}

And that's the loop as I copied from your message to my do-file. Noticeably, the lines in your code appeared numbered. That gave me hope.

HTML Code:

foreach c of local countries {
  2.         margins, dydx(female) at(cntry=`c')
  3.         matrix b = r(b)
  4.         replace cntry_effect = b[1,2] if cntry == `c'
  5.         replace cntry_id = `c' if cntry == `c'
  6. }

Then I ran your loop, pasted into my do-file, and got the following error message:

HTML Code:

program error: matching close brace not found

The thing is that I see the close brace there. I have marked it in red. I do not understand why Stata is saying that I'm omitting it.

Thanks again

Luis

Last edited by Luis Ortiz; 26 Jul 2024, 10:43.
Clyde Schechter

Join Date: Apr 2014

Posts: 29147
#6

26 Jul 2024, 11:13

That is strange. The most likely situation I can think of is that somewhere further up in the code, not shown in the thread, there is some other open brace { that has no matching close brace } and Stata is still expecting a match for that. Then, when it finally reaches the end without it, you get that error message. But if that is true, the loop should have been executed and you should see results in the variable cntry_effect, because Stata won't hit that obstacle until after it finishes the loop we are working on in this thread.

Here's another possibility: sometimes when code is copy/pasted from other sources (especially Word, but also, not infrequently, this Forum) there are non-printing characters that copied along with the code. Those characters cannot be seen on the screen (they are, after all, "non-printing"). But Stata sees them when it tries to parse code and can sometimes get confused as a result. So I would also try deleting the entire loop and then re-typing its code by hand. Just be careful not to make any typos.

If it's not one of those, then I would make sure that your Stata is fully updated (run -update all, force-) and try again. If that doesn't help, try restarting your computer and running it again. If that still doesn't work, try a clean re-install of your Stata.

If none of these things works, I think you will have to contact Stata tech support about the issue of the missing closed brace.
Luis Ortiz

Join Date: Dec 2014

Posts: 95
#7

26 Jul 2024, 11:23

Got it!

"This code uses a local macro, countries. Therefore, it must be run without interruption in one fell swoop"

Yes, that was the secret to why it didn't work. I launched the task "in one fell swoop", from the logistic regression onwards, and Stata is now producing the results that I expected.

Many thanks for your assistance and your patience, Clyde!

And best wishes

Luis Ortiz