  • Controls and non-variation in logits

    Hi Statalist readers,

    I am trying to design controls for a logistic regression to extend Qian and Fuller's analysis of data from Canada's Labour Force Survey during COVID-19 (2020; doi: 10.3138/cpp.2020-077). My plain logistic regression measures gender employment gaps ("lfs", or labour force status) by survey month for single parents of younger and older school-age children. It is as follows:

    Code:
    logit lfs sex##survmnth##loneyg [pweight=finalwt], or
    So far, so good. Now trying to implement some controls:

    Code:
    logit lfs sex##survmnth##loneyg age_12 edu labour naics_21 noc_40 cowmain age25 age30 age35 age40 age45 age50 widowed separated divorced snmarried imm10 imm11 nonimm prov tenure [pweight=finalwt], or
    I (somewhat predictably) get the error: "outcome does not vary; remember: 0 = negative outcome, all other nonmissing values = positive outcome". The one control variable that causes this error is "tenure" (a continuous variable measuring months of employment with the present employer). Of course, everyone with "tenure" > 0 has "lfs" = 1, invariably. So the error makes sense, per my understanding of Stata.
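
    A quick way to confirm this is sketched here, assuming "tenure" is missing for anyone not currently employed. The helper variable has_tenure is hypothetical and not part of the LFS file. Because logit drops observations with a missing value on any covariate, only rows where has_tenure is 1 would enter the estimation sample:

    ```stata
    * Sketch only: has_tenure is a hypothetical helper variable.
    * Flag respondents with a nonmissing value of tenure.
    generate byte has_tenure = !missing(tenure)

    * Cross-tabulate the outcome against the flag, showing missing values too.
    tabulate lfs has_tenure, missing

    * Distribution of the outcome in the sample logit would actually use.
    tabulate lfs if has_tenure == 1
    ```

    If the second table shows a single value of "lfs", that would account for the error.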

    Only, when Qian and Fuller measure the same dependent variable with a logistic regression, they state: "We also include a continuous variable to measure job tenure ["tenure"] with employer (in months). Note that for employed respondents we measure the attributes of their current job, whereas for respondents who were not currently employed, we measure the attributes of their last job" (2020: 590). The latter variable refers to "prevten" or tenure with previous job (in months), which also causes the same error when included.

    I'm trying to reconcile Stata with the controls, as described by Qian and Fuller. I think the main issue may be that I don't know how to implement Qian and Fuller's bolded "note" (above), on measuring tenure for the currently employed and measuring "prevten" for those not currently employed. They seem to imply this is done respectively. What kind of code can I use to also treat "tenure" as a control variable in a logistic regression for employment (as Qian and Fuller do), without running into the issue of "outcome does not vary"?


  • #2
    Well, if you keep tenure and prevten as separate variables, then, no, you will not be able to do this. You've already explained why, so I won't repeat what you said, just to reinforce that you have it right.

    But I interpret the passage you quote from their article differently. I think they have a single tenure variable whose value is tenure in the current job if the person is currently employed and tenure in the previous job if they are not employed. If you combine your two variables, I think you will overcome your problem. Whether it makes sense to define a single tenure variable this way, I don't know. It seems a bit odd to me, but I have no expertise in this area. If you are unsure, I suggest you discuss it with a labor economist. I realize it passed peer review in the journal they published in, but lots of crap finds its way into even the best journals, so I wouldn't weigh that heavily.

    If you think I am misinterpreting what they said, I suggest contacting them directly and asking them.

    Completely as an aside, from your use of the ## operators, I infer that you are familiar with factor-variable notation. So I'm surprised that you are throwing around a large number of indicator variables for age groups, marital status, and immigration status. You can make the code simpler to write and easier to read if you instead create polytomous age, marital status, and immigration variables and enter them into the regression with the i. prefix, letting Stata create "virtual" variables for you. A side benefit is that the output will be better organized and easier to read.
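
    As a rough sketch of what I mean, using marital status only: marstat is a hypothetical name, and I am assuming the four indicators are mutually exclusive 0/1 variables with everyone else falling into the reference category.

    ```stata
    * Sketch only: marstat and its coding are assumptions, not from the LFS file.
    * Start everyone in the assumed reference category, then overwrite.
    generate byte marstat = 0
    replace marstat = 1 if widowed == 1
    replace marstat = 2 if separated == 1
    replace marstat = 3 if divorced == 1
    replace marstat = 4 if snmarried == 1

    * Do not assign a category when any source indicator is missing.
    replace marstat = . if missing(widowed, separated, divorced, snmarried)

    label define marstat 0 "reference" 1 "widowed" 2 "separated" 3 "divorced" 4 "single, never married"
    label values marstat marstat

    * Enter the variable with the i. prefix (other controls omitted for brevity).
    logit lfs sex##survmnth##loneyg i.marstat [pweight=finalwt], or
    ```

    The same pattern would apply to the age and immigration indicators.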


    • #3
      Thank you, Clyde Schechter! Your interpretation of combining the variables seems to be correct. I did so with the code:

      Code:
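      * Use current-job tenure where it is recorded
      gen tenlfs = tenure if !missing(tenure)
      * Otherwise fall back on tenure in the previous job
      replace tenlfs = prevten if missing(tenure) & !missing(prevten)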
      This now works among my other controls, which are also much more neatly organized thanks to your most helpful aside which reminded me of the use of the i. and c. prefixes.
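
      For anyone following along, a minimal sketch of how the combined variable might be checked and then entered as a continuous control. The other controls are omitted here for brevity, so this is not the full specification:

      ```stata
      * Confirm that both outcomes now occur among respondents with a tenure value.
      tabulate lfs if !missing(tenlfs)

      * Sketch of the model with the combined tenure measure entered via c.
      logit lfs sex##survmnth##loneyg c.tenlfs [pweight=finalwt], or
      ```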
