
  • Marginal effect for specific range

    I am running a logit model with a cubic term for age (age x age x age). Age ranges from 0 to 17. The cubic term is also interacted with a binary group variable (0, 1). The model looks like:

    Code:
     logit y c.age##c.age##c.age##i.group
    Is there a way to look at the marginal effect of age in each group, but only within a specific age range? For example, can I use margins to look at the marginal effect of age separately for those aged 0-6, 7-12, and 13-17? I tried a blunt approach of:

    Code:
     margins if age<=6, dydx(age) at(group=(0 1)) post
    That worked, but I don't know if it is an appropriate approach. Further, if I wanted to compare the marginal effects between different age ranges (e.g., 0-6 vs. 7-12), I am not sure if this would work...

    I would appreciate any advice. Thanks!

  • #2
    It depends! Your command tells -margins- to ignore all observations with age > 6, and then compute the average marginal effect of age in groups 0 and 1.

    Now, I think particularly in the age ranges you are dealing with, it is reasonable to assume that any other variables that are relevant to your outcome variable, y, are very likely to have different distributions in 0-6, 7-12, and 13-17 year olds. Just because almost everything differs across those age groups. By excluding the observations in the 7-17 year olds from your analysis, you are estimating an average marginal effect conditional on all those other variables being distributed as they are in 0-6 year olds. This may very well be exactly what you want. It also may be the only thing you can get: your -logit- command doesn't include any other variables in the first place. And if this is not a simplification of your real -logit- command, then proceeding with the -margins- command you showed is your way forward.

    But it is also possible, as you don't say either way, that what you really want is an estimate of the pure, isolated effect of age, excluding the impact of the concomitant differences in the distributions of other variables. If that's what you want, you can't get it with that approach. Instead, you would first need to create a new age group variable and add it to the interaction in your regression--so you would actually have a five-way interaction term.
    Code:
    gen byte age_group = 1 if age <= 6
    replace age_group = 2 if inrange(age, 7, 12)
    replace age_group = 3 if inrange(age, 13, 17)
    logit y c.age##c.age##c.age##i.group##i.age_group other_covariates
    margins age_group#group, dydx(age)
    margins age_group#group, dydx(age) pwcompare
    This will give you the true average marginal effect of age, fully accounting for age-related differences in the distributions of the other covariates, as well as comparisons between those effects.

    Turning now to the question of comparing the age-group-specific average marginal effects: if you prefer to exclude the other covariates from consideration, as your current approach does, it will be difficult to do that with separate -margins- commands. So here, too, I would add an age-group variable into the interaction. But the -margins- commands would have to be different:
    Code:
    margins group, dydx(age) over(age_group)
    margins group, dydx(age) over(age_group) pwcompare



    • #3
      Thanks as always, Clyde. Your intuition is correct; I left out other covariates for simplicity! This might be too complicated to discuss here, but by including the age_group variable in the interaction, doesn't that change the cubic age term into something like age to the fourth power (age x age x age x age_group)?

      Taking a step back - here is what I am dealing with. I modeled linear, quadratic and cubic age terms. I compared them for model fit and the cubic fit the data best. When I plotted the age effects, I was interested in the fact that the curve was much steeper in certain places than others, and I thought it would be good to characterize the marginal effects in the different parts of the age range. Now, if I were to go back to the model and add "age_group" to the interaction, would the resulting estimates be different from the model that does not include "age_group"?



      • #4
        You raise a very good question.

        No, adding in age group does not convert the cubic into a quartic. However, it does create a different model, as you now have three cubics, one for each age group, and the cubics in those age groups can differ from each other. Now, if the cubic polynomial model of age is really a good fit to the data over the full range 0-17, then the linear, and probably the quadratic, terms of those three age-group-specific cubics will be approximately the same as before, but the cubic term (and possibly the quadratic one) is likely to be attenuated towards 0, perhaps extremely so. That's because when you narrow the range of the independent variable, a cubic curve starts to look more quadratic, and ultimately more linear as the range gets small enough. (Think about a Taylor series expansion: f(x+h) = f(x) + c1*h + c2*h^2 + c3*h^3 + ... If h is small enough, f(x) + c1*h is already a good approximation to f over the range from x to x+h. If h is a bit larger, you may need the quadratic term to keep a reasonable approximation to f, but you can still ignore the cubic. Only when h gets larger still does the cubic term become non-negligible.)

        Nevertheless, even if you end up with negligible cubic terms in all three age groups, so that you drop the cubic terms out of the model and run with three age-group-specific quadratics (or possibly even just three linear functions), that should still give you a very nice fit to the data--probably as good as or better than the original overall cubic. (In fact, the fit should be better because three quadratics give you 6 df, whereas a single cubic gives you only 3. You know what they say about fitting an elephant?)

        This reasoning, however, elides the question of how this might affect the estimation of the average marginal effects of age in each of the three regions. The marginal effect is, after all, just the first derivative. And it is well understood that two functions f and g can be very close in their values (i.e. |f(x)-g(x)| arbitrarily small for all x) and yet be wildly different in their first derivatives at any point. But the average marginal effect is not just a first derivative evaluated at a point. It is an average over all points, and it seems to me that the mean value theorem implies that although f'(x) and g'(x) can be very different at many points, it must still be the case that the average value of f'(x) and the average value of g'(x) are close. I don't know how to prove that quickly off the top of my head, and I don't have the time to work that out today, but my intuition that it's true is very strong. Of course, my intuition might be wrong. Take it for whatever you think it's worth. My conclusion is that for purposes of estimating average marginal effects in the three groups, it will be OK. Imperfect, but OK.
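        [Editor's aside: for the continuous average, at least, the claim above can be sketched quickly via the fundamental theorem of calculus; the sample AME is an average over observed data points, a discrete analogue, so treat this as a heuristic rather than a proof.]
        Code:
        % Average of a derivative over [a,b] depends only on endpoint values:
        \frac{1}{b-a}\int_a^b f'(x)\,dx \;=\; \frac{f(b)-f(a)}{b-a}
        % Hence, if |f(x) - g(x)| <= epsilon for all x in [a,b], the two
        % average derivatives can differ by at most
        \left|\overline{f'} - \overline{g'}\right| \;\le\; \frac{2\varepsilon}{b-a}
        even though f'(x) and g'(x) may differ greatly at individual points.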

        Added: Here's something else that occurred to me just now. The problem we are facing is how to keep the differing covariate distributions from affecting the age-group specific average marginal effects, while preserving the original overall 0-17 cubic function. So one possibility would be to stipulate specific values for all of the covariates to be anchored to for the calculations. So something like this:

        Code:
        gen byte age_group = 1 if age <= 6
        replace age_group = 2 if inrange(age, 7, 12)
        replace age_group = 3 if inrange(age, 13, 17)
        logit y c.age##c.age##c.age##i.group other_covariates
        margins group, dydx(age) at((omean) other_covariates) over(age_group)
        You might prefer omedian to omean in the -at()- specification, or some other summary statistic, especially for the discrete covariates. N.B., the o's in front of omean and omedian are not typos. They specifically tell Stata to use the overall mean (resp. median) values of the covariates in the calculation, not the age-group specific ones.
        Last edited by Clyde Schechter; 23 Sep 2024, 12:12.



        • #5
          Another strategy to consider relies on the undocumented -generate- option in -margins- which yields the observation-specific margin(s) of interest.
          Code:
          help margins_generate
          Once you've generated the observation-specific estimates of the margins of interest you can then summarize them for subsamples of interest, e.g.:
          Code:
          sysuse auto
          
          qui logit foreign c.mpg##c.mpg
          margins, dydx(*) gen(marg)
          
          sum marg
          sum marg if mpg>20
          which yields
          Code:
          . margins, dydx(*) gen(marg)
          
          Average marginal effects                                    Number of obs = 74
          Model VCE: OIM
          
          Expression: Pr(foreign), predict()
          dy/dx wrt:  mpg
          
          ------------------------------------------------------------------------------
                       |            Delta-method
                       |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   mpg |   .0285022   .0082392     3.46   0.001     .0123537    .0446507
          ------------------------------------------------------------------------------
          
          .
          . sum marg
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 marg1 |         74    .0285022    .0079669   .0119041   .0389524
          
          . sum marg if mpg>20
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                 marg1 |         36    .0348588    .0048751   .0144163   .0389524



          • #6
            Clyde - thanks for the very detailed response, that is interesting. I need to sit with it for a bit.

            John - thanks, I never knew about generate in margins. I am not sure I follow, though. When you obtain the mean after 'sum marg' - is that just the average marginal effect for all variables in the model? Then, I see you restrict it to mpg>20. But in my situation, it would be interesting to compare, for instance, the AMEs in different ranges of MPGs (i.e., age ranges).



            • #7
              Also, Clyde: "quartic"! Never knew that was the word for the fourth power.



              • #8
                re: #6:
                When you obtain the mean after 'sum marg' - is that just the average marginal effect for all variables in the model?
                Indeed, this is exactly how the standard margins result is computed in a simple unweighted example like this.

                it would be interesting to compare, for instance, the AMEs in different ranges of MPGs (i.e., age ranges).
                This is straightforward. I could have summarized the estimated margin over three sub-ranges of mpg, e.g.
                Code:
                sysuse auto
                
                qui logit foreign c.mpg##c.mpg
                margins, dydx(*) gen(marg)
                
                sum marg
                qui sum mpg
                gen tmpg=autocode(mpg,3,`r(min)',`r(max)')
                bysort tmpg: sum marg
                P.S. Unsolicited comment: In numerous posts over the years I've urged Stata to include -generate- in the standard -margins- postestimation documentation. In my experience you are far from alone in being unaware of it and my conjecture is that many people who use margins would find -generate- useful if they knew about it. Stata evidently feels differently.



                • #9
                  Thanks John. This is interesting; I never thought of doing it this way. If you were to generate those three AMEs, how would you compare them? I know the margins and lincom approach, but here you have these estimates as means, so ... a t-test...?



                  • #10
                    I, too, was unaware of the -generate()- option in -margins-. I think John Mullahy's approach does exactly what O.P. needs. Thanks for chiming in, John.



                    • #11
                      John - what would be your approach to testing differences between the marginal effects that you generate?



                      • #12
                        Robbie: I'd be speculating as to what's the best approach so will leave it to others to provide answers.
