  • Margins Command After Logit with Aggregated Data

    Edit: Using Stata 14.2.

    Long story short, I'm running a logit on aggregated data, and when I run margins to set up predicted-effects plots, the predicted values are ludicrously large and outside [0,1]. This happens only with the grouped logit, so I'm not sure whether it's something specific to grouped logit.

    The longer story is that I have a dataset of candidates, the total number of ads each has run, and the number of those ads with a certain characteristic. I'm investigating the proportion of a candidate's ads that include the characteristic (e.g., mentioning ideology) as a function of district competitiveness. Here is an example of the dataset, restricted to the variables most relevant to my question:

    Code:
    +-------------------------+-------------+-------------+
    | cand_id                 | total_ideol | total_aired |
    +-------------------------+-------------+-------------+
    | "BELK_JUDY_MCCAIN_2002" |      0      |     1010    |
    +-------------------------+-------------+-------------+
    | "BONNER_2006"           |      0      |     477     |
    +-------------------------+-------------+-------------+
    | "BELK_JUDY_MCCAIN_2004" |      0      |     245     |
    +-------------------------+-------------+-------------+
    | "BECKERLE_2006"         |      0      |     169     |
    +-------------------------+-------------+-------------+
    | "BONNER_JO_2004"        |      0      |     1126    |
    +-------------------------+-------------+-------------+
    | "BONNER_JO_2002"        |      0      |     414     |
    +-------------------------+-------------+-------------+
    | "BYRNE_BRADLEY_2014"    |     223     |     223     |
    +-------------------------+-------------+-------------+
    | "ROGERS_MIKE_2002"      |     1757    |     3508    |
    +-------------------------+-------------+-------------+
    | "SEGALL_JOSH_2008"      |      .      |     1379    |
    +-------------------------+-------------+-------------+
    | "ROGERS_MIKE_2008"      |      .      |     1638    |
    +-------------------------+-------------+-------------+
    | "ROGERS_MIKE_2004"      |      0      |     389     |
    +-------------------------+-------------+-------------+
    | "TURNHAM_JOE_2002"      |      0      |     2207    |
    +-------------------------+-------------+-------------+
    | "ADERHOLT_2000"         |     320     |     1449    |
    +-------------------------+-------------+-------------+
    | "FOLSOM_2000"           |      0      |     1053    |
    +-------------------------+-------------+-------------+
    | "ADERHOLT_ROBERT_2012"  |      0      |      1      |
    +-------------------------+-------------+-------------+
    | "BACHUS_2000"           |      0      |     136     |
    +-------------------------+-------------+-------------+
    | "LESTER_MARK_2014"      |      0      |     183     |
    +-------------------------+-------------+-------------+
    | "PALMER_GARY_2014"      |      0      |     242     |
    +-------------------------+-------------+-------------+
    | "BAILEY_PENNY_2012"     |      0      |      35     |
    +-------------------------+-------------+-------------+
    | "RENZI_RICK_2002"       |      0      |     2034    |
    +-------------------------+-------------+-------------+
    While in the broader paper I'm just using OLS for other reasons, I wanted to include an appendix showing that the results are robust to using logit, since the DV is a proportion bounded by [0,1]. As I understand it, this is basically logit with aggregated data, where the candidate is the "group", the number of trials is the total number of ads aired, and the number of successes is the number of ads with the characteristic (e.g., mentioning ideology). So I ran my grouped logit in Stata following the advice here: https://www.stata.com/support/faqs/s...-grouped-data/ . This results in code that looks like this:

    Code:
    * Grouped (aggregated) logit: total_ideol successes out of total_air trials
    glm total_ideol ip_margin mrp_mean party incumbent open female ///
    dime_score log_spend vote_share opponent_neg, vce(cluster statdist_cen) ///
    link(logit) family(binomial total_air)
    It produces the following output (just the coefficients; I can add the rest later if needed):

    Code:
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | total_ideol  | Coef.     | Robust    | z     | P>|z| | [95% Conf. Interval] |
    |              |           | Std. Err. |       |       |                      |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | ip_margin    | 1.214657  | .6524683  | 1.86  | 0.063 | -.0641568 | 2.493472 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | mrp_mean     | 1.147326  | .4847957  | 2.37  | 0.018 | .1971439  | 2.097508 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | party        | .4515572  | .3877767  | 1.16  | 0.244 | -.3084712 | 1.211586 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | incumbent    | -.0343667 | .1807825  | -0.19 | 0.849 | -.388694  | .3199605 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | open         | .4163194  | .2065463  | 2.02  | 0.044 | .0114962  | .8211426 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | female       | -.4038649 | .2619799  | -1.54 | 0.123 | -.9173362 | .1096063 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | dime_score   | .123921   | .1910771  | 0.65  | 0.517 | -.2505833 | .4984253 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | log_spend    | -.3341911 | .1872252  | -1.78 | 0.074 | -.7011457 | .0327635 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | vote_share   | -.7543874 | 1.313943  | -0.57 | 0.566 | -3.329668 | 1.820893 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | opponent_neg | .0146609  | .2361849  | 0.06  | 0.951 | -.448253  | .4775748 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
    | _cons        | -.8380754 | 1.192996  | -0.70 | 0.482 | -3.176305 | 1.500154 |
    +--------------+-----------+-----------+-------+-------+-----------+----------+
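    For reference, the grouped logit that this -glm- call fits can be written as below. The notation is introduced here only for illustration: i indexes candidates, y_i is total_ideol, n_i is total_air, p_i is the probability that one of candidate i's ads mentions ideology, and x_i holds the regressors. The coefficients in the table are the elements of beta, on the log-odds scale.

```latex
\begin{aligned}
y_i &\sim \operatorname{Binomial}(n_i,\, p_i), \\
\operatorname{logit}(p_i) &= \ln\frac{p_i}{1 - p_i} = x_i'\beta, \\
\ln L(\beta) &= \sum_i \Bigl[\, y_i \ln p_i + (n_i - y_i)\ln(1 - p_i) \Bigr] + \text{const.}
\end{aligned}
```

    Apart from the constant (the log binomial coefficients, which do not involve beta), this is the same log likelihood as an ad-level logit in which candidate i contributes y_i ads coded 1 and n_i - y_i ads coded 0, so the point estimates are the ones an ad-level logit would give.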
    Logit coefficients are hard to interpret, so I wanted to make a predicted-effects plot using margins, as I usually do. I set up the margins command as follows:

    Code:
    * Predictions at ip_margin = -.3, -.2, ..., .3; other covariates at their means or fixed values
    margins, at(ip_margin=(-.3(.1).3) ///
    (mean) log_spend vote_share opponent_neg dime_score mrp_mean ///
    female=0 incumbent=0 open=0 party=1)
    This is where the problem occurs:

    Code:
    +-----+-----------+--------------+-------+-------+-----------+----------+
    |     |           | Delta-method |       |       |                      |
    | _at | Margin    | Std. Err.    | z     | P>|z| | [95% Conf. Interval] |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 1   | 42.84476  | 12.86015     | 3.33  | 0.001 | 17.63933  | 68.05018 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 2   | 48.11459  | 12.41373     | 3.88  | 0.000 | 23.78413  | 72.44505 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 3   | 53.99641  | 12.20703     | 4.42  | 0.000 | 30.07108  | 77.92174 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 4   | 60.55199  | 12.62734     | 4.80  | 0.000 | 35.80286  | 85.30112 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 5   | 67.84699  | 14.13677     | 4.80  | 0.000 | 40.13943  | 95.55455 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 6   | 75.95055  | 17.07385     | 4.45  | 0.000 | 42.48643  | 109.4147 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    | 7   | 84.93473  | 21.55822     | 3.94  | 0.000 | 42.68139  | 127.1881 |
    +-----+-----------+--------------+-------+-------+-----------+----------+
    As you can see, those estimates are clearly outside the [0,1] range. I've run the model using OLS and just a normal logit (I have the proportions calculated as well, but that's obviously not appropriate), and margins runs perfectly fine and gives results within [0,1]. Granted, OLS does produce some results below 0 at the extreme values of in-party vote, but that's OLS for you. It's only with the grouped logit that the estimates are extreme. I even calculated the predictions by hand, multiplying the coefficients by the assigned values of my IVs, and they did not equal the predicted margins.

    My only thought, since this happens only with this model, is that it is somehow converting the predicted margins back into the "total number of successes". I really can't think of any other reason I would get such large predicted values.
    Last edited by Brandon Marshall; 20 Sep 2019, 11:36.

  • #2
    -margins- is doing exactly what you have asked it to do; your expectation that the results will fall in the [0,1] interval is incorrect.

    Following -glm-, the default statistic for -margins- to calculate is mu, the expected value of the -glm- outcome variable. In your grouped logistic regression model, the outcome variable is total_ideol, the number of ads (out of total_air) that mention ideology; it is not a probability. Your -glm- command specified a binomial distribution with total_air trials, so the outcomes range not from 0 to 1 but from 0 to total_air.

    If what you want from -margins- is the probability that an ad is counted in total_ideol, then you need to specify that in the -expression()- option as -expression(predict(mu)/total_air)-.
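    A minimal, self-contained sketch of the same behavior on simulated data follows. Everything in it is hypothetical: the variable names successes, trials, and x, the seed, and the true coefficients (-2.5 and 1) were chosen only for illustration. It fits a grouped logit with -glm- and then runs -margins- three ways: the default (the expected number of successes), the -expression(predict(mu)/trials)- fix, and -expression(invlogit(predict(xb)))-, which should agree with the fix because mu = trials * invlogit(xb) for each observation.

```stata
* Hypothetical simulated grouped data: 200 candidates, each airing
* between 1 and 1000 ads, of which "successes" have the characteristic
clear
set seed 20190920
set obs 200
generate x = rnormal()
generate trials = 1 + floor(runiform()*1000)
generate successes = rbinomial(trials, invlogit(-2.5 + x))

* Grouped logit: successes out of trials
glm successes x, family(binomial trials) link(logit)

* Default statistic is mu, the expected NUMBER of successes,
* so this margin is not bounded by [0,1]
margins, at(x = 0)

* Divide mu by the number of trials to get a probability
margins, at(x = 0) expression(predict(mu)/trials)

* Same probability via the linear predictor (log-odds scale)
margins, at(x = 0) expression(invlogit(predict(xb)))

* By-hand check: with x fixed at 0 the linear predictor is _b[_cons]
display invlogit(_b[_cons] + _b[x]*0)
```

    With x fixed at 0, the second and third -margins- results should both equal the displayed invlogit(_b[_cons]), while the first should equal the mean of trials in the estimation sample times that probability.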



    • #3
      Alright, so it was basically converting it into the predicted number of successes. I was thinking that might be the case but wasn't sure; this is my first time using logit with aggregated data. I ran the code again with your suggestion and got the following predicted margins:

      Code:
      +-----+-----------+--------------+-------+-------+-----------+----------+
      |     |           | Delta-method |       |       |                      |
      | _at | Margin    | Std. Err.    | z     | P>|z| | [95% Conf. Interval] |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 1   | .0440943  | .0131722     | 3.35  | 0.001 | .0182773  | .0699114 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 2   | .0488858  | .0125276     | 3.90  | 0.000 | .0243322  | .0734395 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 3   | .0541677  | .0121316     | 4.47  | 0.000 | .0303903  | .0779452 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 4   | .0599834  | .0123637     | 4.85  | 0.000 | .0357511  | .0842158 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 5   | .0663786  | .0136649     | 4.86  | 0.000 | .039596   | .0931613 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 6   | .0734011  | .0163349     | 4.49  | 0.000 | .0413854  | .1054168 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      | 7   | .0811004  | .0204442     | 3.97  | 0.000 | .0410306  | .1211703 |
      +-----+-----------+--------------+-------+-------+-----------+----------+
      Now, that looks a lot better. What I do want to know, though, is whether this can be interpreted as an estimated proportion of a candidate's ads, because that is essentially what I want to estimate.



      • #4
        Yes, this would be the estimated proportion of a candidate's ads that are counted in total_ideol (that is, that mention ideology), assuming that total_air represents the total number of that candidate's ads, adjusted to the particular values of the predictors that you specified in the -at()- option.
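        To see why this is a proportion, here is a sketch of what the two -margins- calls average. The notation is introduced only for illustration: y_i is total_ideol, n_i is total_air, N is the number of candidates in the estimation sample, and x* is the vector of covariate values set in -at()-. Because that -at()- specification fixes every regressor in the model, each candidate gets the same linear predictor x*'b, so for a given fitted model and estimation sample:

```latex
\begin{aligned}
\hat p^{*} &= \operatorname{invlogit}\bigl(x^{*\prime}\hat\beta\bigr)
  = \frac{1}{1 + \exp\bigl(-x^{*\prime}\hat\beta\bigr)},
  \qquad \hat\mu_i = n_i\, \hat p^{*}, \\
\text{default } \texttt{predict(mu)}: \quad
  &\frac{1}{N}\sum_{i=1}^{N} \hat\mu_i = \bar n\, \hat p^{*}, \\
\texttt{expression(predict(mu)/total\_air)}: \quad
  &\frac{1}{N}\sum_{i=1}^{N} \frac{\hat\mu_i}{n_i} = \hat p^{*}.
\end{aligned}
```

        The default margin is the predicted proportion scaled by the average number of ads aired, which is why it lands far outside [0,1]. Multiplying the coefficients by the chosen covariate values, as in a by-hand check, gives x*'b on the log-odds scale; it matches the second margin only after applying invlogit().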



        • #5
          Yes, total_air is the total number of advertisements that the candidate ran in that election. Thank you for helping with this: it's been bugging me all week.

