  • Issues with Logistic Regression with Binary Independent Variables

    I'm fitting a logistic regression of a sports outcome as a function of which players were on the field for that point. The dependent variable is whether or not the team scored the point, and the independent variables are (mostly) binary variables indicating whether a given player was playing at that time.

    I have a couple of issues/questions.
    1. The first is about the intercept. When all independent variables are zero, this indicates that no players at all are playing. That will obviously not lead to a 50/50 chance of scoring a point. For this, I used:
    logit Score ... , nocons off(SmallNumber)

    where SmallNumber is a negative number such that the model still converges (I've been using -10). I do this to indicate to the model that the likelihood of scoring a point should approach zero if no players are playing.

    Is this ok? What consequences will this have?

    2. I'm also unsure about the interpretation. If I get an odds ratio of 3.0, does that mean my odds of scoring a point are 3x higher when that player plays compared to when she does not, i.e. when we play one player short?

    Also, with odds ratios for all players, is there a way to calculate the odds of scoring a point when player X plays relative to if player Y had played instead?
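For the X-versus-Y comparison, here is a minimal sketch of the arithmetic, written in Python rather than Stata. It assumes a model with no interaction terms and the rest of the lineup held fixed, so swapping Y for X changes the log-odds by the difference of the two coefficients. The odds ratios 3.0 and 1.5 are hypothetical illustration values, not estimates from any data.

```python
import math

def odds_ratio_x_vs_y(beta_x, beta_y):
    """Odds of scoring with player X in a slot relative to player Y in that slot.

    beta_x and beta_y are logit coefficients (log odds ratios). This is only
    valid when the model has no interactions involving either player.
    """
    return math.exp(beta_x - beta_y)

# Hypothetical odds ratios for two players (illustration only)
or_x, or_y = 3.0, 1.5

# Coefficients are the logs of the odds ratios
ratio = odds_ratio_x_vs_y(math.log(or_x), math.log(or_y))
print(round(ratio, 6))
```

Put differently, the ratio is just the odds ratio for X divided by the odds ratio for Y.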

    Thanks a lot!

    Eliot Alexander

  • #2
    After some more dabbling:

    Regarding question 2, the answer is yes: the odds ratio compares the odds of scoring a point with that player against the odds when we play with one fewer player than the opponent. I've set the likelihood of scoring with zero players close to zero. The likelihood of scoring with an example lineup decreases from 72% to 62% to 35% to 23% to 1% to essentially 0% as the team plays with one fewer player each time. This is somewhat realistic.
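To make the "close to zero" part concrete, here is a rough sketch in Python of how a fixed offset of -10 with no constant maps to probabilities. The per-player coefficients are hypothetical values picked for illustration; they are not the estimates behind the percentages quoted above.

```python
import math

def inv_logit(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

OFFSET = -10.0  # the offset value mentioned in this thread

# Hypothetical per-player coefficients in log-odds units (illustration only)
betas = [1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3]

# With no constant and no players on the field, the linear predictor is just the offset
p_empty = inv_logit(OFFSET)

# Probability of scoring as players are removed one at a time, from 7 down to 0
probs = [inv_logit(OFFSET + sum(betas[:k])) for k in range(len(betas), -1, -1)]

print(f"no players: {p_empty:.6f}")
for n_players, p in zip(range(len(betas), -1, -1), probs):
    print(n_players, round(p, 4))
```

Since each coefficient here is positive, every player removed lowers the linear predictor, and the probability slides toward the value implied by the offset alone.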


    • #3
      You'll increase your chances of a useful answer by following the FAQ on asking questions. I don't know what sport you are looking at, but the first lines of your post seem to suggest you're looking at specific individuals, not the number of players on the field. Is there really a sport that doesn't require a minimum number of players on the field? If you're looking at specific individuals and the number of individuals on the field varies, then you probably need a control for number of individuals on the field. If you have a model based on individuals, and don't include all the individuals, then you will have omitted variables bias. For example, if Bill tends to play when John plays, and you omit John, Bill may appear productive when it is John who is productive.

      While the intercept is, strictly speaking, the predicted value when all the x's are zero, it is often not interpreted. For example, if the x's are never all zero at once, it may not make much sense to talk about what happens when they are. If the number of players varies, it is also likely that the effect of a player varies with the number of players. In hockey, an individual may have a different effect when two players are off on penalties than when none are.

      You need to look at the margins command. It will let you generate odds ratios or predicted values at specific values of the x's, so you can compare the predicted probabilities or odds ratios for any combination of x values. I personally find predicted values easier to interpret, but this is a matter of taste.


      • #4
        I did read the FAQs; what should I have changed in my post?

        The sport is ultimate frisbee - where the number of players is always seven.
