  • Splitting sample by gender.

    Hi,

    I am currently running a logit regression estimating the effect of trade unions on the probability of being classed as 'low-paid' (binary dependent variable). I have run my initial regression with a bunch of controls. I am trying to split my sample so I can see how the probability of being low-paid differs for male union workers compared to female union workers. I have implemented the following code. Does the code achieve what I am trying to get to?

    * Model for male union workers.
    logit poverty union i.SEX i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA i.MPNR02 i.INDC07M i.SC20MMJ i.FTPT i.JOBTYP if MALE==1, r

    * Model for female union workers.
    logit poverty union i.SEX i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA i.MPNR02 i.INDC07M i.SC20MMJ i.FTPT i.JOBTYP if MALE==0, r

  • #2
    Yes, it does. Note that including the i.SEX term in the model is pointless when you then restrict each -logit- estimation to a single value of MALE, so leave out i.SEX. (Why do you even have both a SEX and a MALE variable? They are clearly redundant, and you should use just one or the other.)

    However, running the two models separately will not let you do a formal statistical comparison between the male and female union effects, assuming you want to do that. You can extend the code to accomplish that as follows:
    Code:
    * Model for male union workers.
    logit poverty union i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA i.MPNR02 i.INDC07M i.SC20MMJ i.FTPT i.JOBTYP if MALE==1, r
    estimates store male

    * Model for female union workers.
    logit poverty union i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA i.MPNR02 i.INDC07M i.SC20MMJ i.FTPT i.JOBTYP if MALE==0, r
    estimates store female

    * Combine the two sets of estimates and test the difference in the union coefficients.
    suest male female
    lincom _b[male_poverty:union] - _b[female_poverty:union]



    • #3
      Thanks for the help. Sorry, I forgot to remove the i.SEX term as this equation was just the baseline.

      After running your code, Stata is giving me some trouble with the 'suest' command. First, it says 'male was estimated with a nonstandard vce (robust)', so I took out the robust standard error option. After I had done that, it said 'INDC07M: factor variable base category conflict'.

      Is there an alternative way to achieve the results I want?

      Thank you.



      • #4
        Also, I forgot to include:

        margins, dydx(*) atmeans

        I want to calculate the marginal effects so that I can interpret the coefficients. In an LPM an interaction term would be applicable, but interaction terms are difficult to interpret with marginal effects from a logit.



        • #5
          OK. I forgot about the robust variance estimates in your models. You can't use those with -suest-. What you have to do instead is run the logistic regressions with ordinary variance estimates, and then specify robust variance estimates in the -suest- command itself.
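
          A minimal sketch of that change, reusing the variable names and the stored-estimate names (male, female) from earlier in this thread. It has not been run on your data, and it addresses only the variance-estimate message, not the base category conflict:

          ```stata
          * Fit each model with the default variance estimates (no r/robust option).
          logit poverty union i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA ///
              i.MPNR02 i.INDC07M i.SC20MMJ i.FTPT i.JOBTYP if MALE==1
          estimates store male

          logit poverty union i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA ///
              i.MPNR02 i.INDC07M i.SC20MMJ i.FTPT i.JOBTYP if MALE==0
          estimates store female

          * Request the robust variance estimates in -suest- itself.
          suest male female, vce(robust)

          * Test the male-female difference in the union coefficient.
          * union is entered as a plain (non-factor) variable, so it is referenced without a 1. prefix.
          lincom [male_poverty]union - [female_poverty]union
          ```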

          The base conflict error means that the male and female distributions of the variable INDC07M are such that the lowest value of this variable (which is the default base value) in one of the sexes simply doesn't occur in the other sex. So Stata has to use a different base value for INDC07M in the two models, and this makes the results incompatible for use with -suest-. The way around this is to identify some value of INDC07M that definitely occurs in both sexes, and force Stata to use that as the base category in both models. So, suppose the value 4 is definitely known to occur in both sexes (in observations that have non-missing values on all regression variables, so that they are part of the estimation samples); then use ib4.INDC07M instead of i.INDC07M in the logistic regressions.
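
          One way to look for such a value, sketched under the assumption that MALE is coded 0/1. The value 4 is only the placeholder from the paragraph above, and the local macro name controls is made up for illustration. The cross-tabulation is restricted to observations with no missing values on the other model variables:

          ```stata
          * Cross-tabulate INDC07M by sex among complete cases.
          * Any row with a nonzero count in both columns is a candidate base value.
          tabulate INDC07M MALE if !missing(poverty, union, GORWKR, HIQUL22D, ///
              AGEEUL, ETHUKEUL, MARSTA, MPNR02, SC20MMJ, FTPT, JOBTYP)

          * Placeholder: suppose the table shows that value 4 occurs in both sexes.
          local controls i.GORWKR i.HIQUL22D i.AGEEUL i.ETHUKEUL i.MARSTA ///
              i.MPNR02 ib4.INDC07M i.SC20MMJ i.FTPT i.JOBTYP

          * Refit both models with the same forced base category for INDC07M.
          logit poverty union `controls' if MALE==1
          estimates store male

          logit poverty union `controls' if MALE==0
          estimates store female
          ```

          If -suest- then reports a base category conflict for another factor variable, the same check and an ib#. prefix can be applied to that variable as well.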

