  • The Effect of Parenthood on Earnings and Hours Worked

    Hi All,

    I'm trying to estimate the effect of parenthood on wages earned and hours worked by the parents.
    I'm using the BHPS, and I've created two subsets looking at married males and married females who are aged between 21 and 45 and who have a degree.
    I've created a variable, newchild, which indicates whether the respondent has a new child this period compared with last period.

    I've created a dummy variable equal to 1 when the person lives in London, but it gets omitted through collinearity. Should I just use the original variable, region, instead? But then that wouldn't tell me whether they live in London, right?

    For my gross pay variable, I'm using gross pay last period, and I've linked it back to the variable that tells me how long the period lasted in weeks. Should I take the log of this before running my regression?

    The equation I've used to work out the wages earned is:
    Code:
    regress weeklypay region age newchild highesteducat london
    Is this code correct?

    highesteducat omits itself because I've limited my sample to people who have a degree, so I don't need to include it in my regression?

    Thank you for any help you can offer.

    Rebecca
    Stata IC 15


  • #2
    So is this longitudinal data, with multiple observations of the same adults over time? Or do you just have a single cross-section, with each adult represented once in the data set?

    It is by no means apparent from your description why your london indicator should end up omitted due to collinearity. Collinearity with what? Are you running a panel regression with fixed effects? If so, it might be collinear with the fixed effects, assuming that people don't move in and out of London during the study's observation period. If that is the case, any effect London residence might exert is already adjusted for by the fixed effects, and you should just forget about it. But if this is a single cross-section, you need to look at why london is collinear with some other variable. Is London coextensive with one of the regions, perhaps? If so, again, any effect it might exert is already adjusted for by the region variable, and you should just forget about london. But before you draw any conclusions, you should really verify that one of these things is actually going on.
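    In case it is useful, here is a minimal sketch of those two checks. It assumes your person and wave identifiers are called pid and wave (substitute whatever they are actually called in your BHPS extract):
    Code:
    * Does london ever change within a person? If not, it is collinear
    * with the person fixed effects.
    by pid (wave), sort: generate byte london_changes = london != london[1]
    tabulate london_changes
    * Is london coextensive with one of the regions? If so, the cross-tab
    * will show london == 1 lining up exactly with a single region code.
    tabulate region london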

    The code you show is one possible model. If you have panel data, it is probably an inadequate model because it will not account for the non-independence of repeated observations within participant, and you would be better off using one of the -xt- commands for this. Again, if london and region are collinear, you can't have them both in the model, and it is best to just omit london.
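    For example, a fixed-effects version along those lines might look something like the sketch below (again assuming identifiers called pid and wave; this is only an illustration, not a definitive specification):
    Code:
    xtset pid wave
    * Time-invariant regressors (e.g. highesteducat in your restricted
    * sample, and london if nobody moves) will be omitted automatically.
    xtreg weeklypay i.newchild c.age i.region, fe vce(cluster pid)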

    If you have looked only at people who have a degree, then, yes, highesteducat will be a constant, not a variable, and cannot be in the model. If you don't take it out yourself, Stata will do it for you.

    As for whether to log-transform your data, that really depends on what you expect the relationship between weeklypay and these variables to be like. If you expect weeklypay to vary exponentially with (some combination of) these variables, then, yes, you should log-transform weeklypay to capture that relationship in a linear regression. In that kind of model you would be saying that having a new child is associated with a certain constant multiplicative effect on weeklypay, rather than adding or subtracting a certain amount from it.

    But if there is no reason to think that it works that way, then you shouldn't. Sometimes this is also done to deal with variables that have highly skewed distributions, and I can easily imagine that pay could be one of those. On the other hand, there are likely to be observations where weeklypay is zero, and that would rule out using the log transformation altogether. If you have a variable with zeroes and a highly skewed distribution, then you need to look at other transformations that reduce the skew, such as a square root or cube root. In any case, you should try the straight linear regression first and see how well it fits the data: even with skewed variables the linear model is often correct.
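    If it helps, here is a sketch of how you might inspect the distribution and, if warranted, try a transformation (the new variable names are just illustrative):
    Code:
    summarize weeklypay, detail
    histogram weeklypay
    * Zeroes would rule out the log transformation.
    count if weeklypay == 0
    * Log transformation (ln(0) becomes missing):
    generate ln_weeklypay = ln(weeklypay)
    * Cube root as an alternative that tolerates zeroes:
    generate cbrt_weeklypay = sign(weeklypay)*abs(weeklypay)^(1/3)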

    Finally, I'm surprised that you do not include gender, and a gender#newchild interaction, in your model. I'm no labor economist by any stretch of the imagination, but just having been around the block a few times in life, my expectation would be that the effects on earnings of having a child could differ greatly for men and for women. Indeed, it would surprise me greatly if they did not.



    • #3
      This is a longitudinal data set.

      Yes, I'm running a panel regression with fixed effects as well as OLS to check for endogeneity. I'll have another look at the variable and see which of those is happening before deciding what to do with it.

      I'm using an -xt- command as well for one of my regressions, but with the same variables.

      Thank you for the advice on logging the variable; I'll have a look at its skew.

      I've separated them out to look at them separately, so one subset focuses only on males and the other only on females. Since each subset contains only one gender, I thought I wouldn't need to include gender in my code, as it would also be a constant?

      Thanks,

      Rebecca




      • #4
        It is true that if you do separate analyses for males and females you do not need to include a gender variable in those analyses.

        But bear in mind that if you do separate analyses for males and females, you may be unable, in the end, to do any comparisons/contrasts between males and females. Of course, if that is not part of your research goal, then this is not a problem. But if you ultimately want to do that, you may find yourself stymied if you start with separate analyses. If all of your analyses are done with simple -regress-, then -suest- will enable you to combine those results and do comparisons. But given that this is longitudinal data, -regress- will probably be incorrect for much if not all of what you want to do. The -xtreg- commands, which are usually the appropriate way to analyze longitudinal data, cannot be used with -suest-. Separate -xt- analyses for males and females will not support cross-gender comparisons.
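        Purely to illustrate the -suest- mechanics for the simple -regress- case (it does not deal with the panel structure, and the variable names and gender coding here are assumptions), it would look something like this:
        Code:
        * Assumes gender is coded 1 = male, 2 = female, and pid identifies persons.
        regress weeklypay i.region c.age i.newchild if gender == 1
        estimates store men
        regress weeklypay i.region c.age i.newchild if gender == 2
        estimates store women
        * Combine the two fits and test equality of the newchild coefficients.
        suest men women, vce(cluster pid)
        test [men_mean]1.newchild = [women_mean]1.newchild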



        • #5
          Thank you. I think I want to include that in my analysis: I want to compare males with males, and females with females, but I also think a comparison between males and females would be good to include.

          So if I want to do that comparison, should I include both males and females in my data set and then in my -xtreg- command?
          Do I need to create a dummy variable, or would the inclusion of gender in my code be enough?

          Thank you for all your advice, I really appreciate it

          Rebecca



          • #6
            Your model would need to include both gender and its interactions with all of the other variables. So, looking at your proposed regression in #1, the modification of that would be:

            Code:
            regress weeklypay i.gender##(i.region c.age i.newchild i.highesteducat i.london)
            Now, we have had some discussion above about that model, and perhaps it has changed as a result. But the approach is the same: i.gender##(list of regressors). Each regressor in the parentheses should be prefixed with i. if it is categorical and c. if it is continuous. The same approach works with -xtreg-, and, for that matter, with all official Stata estimation commands (and many user-written ones as well). This is called factor-variable notation, and you can read more about it at -help fvvarlist-. Assuming you already have a gender variable in your data set, you do not need to create any new variables at all.
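            For instance, a sketch of the -xtreg- analogue might look like this (assuming person and wave identifiers called pid and wave; note that under -fe- the gender main effect, being time-invariant, is absorbed by the fixed effects, while its interactions with time-varying regressors remain estimable):
            Code:
            xtset pid wave
            xtreg weeklypay i.gender##(i.region c.age i.newchild i.london), fe vce(cluster pid)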

            After that you could get the expected values of weekly pay for men and women, adjusted for the other variables, with
            Code:
            margins gender
            If you are interested in contrasting the effects of a new child on men and on women (again, adjusted for everything else), that would come from:
            Code:
            margins gender, dydx(newchild) pwcompare(effects)
            If you are not familiar with the -margins- command, I think the best introduction to it is Richard Williams' Stata Journal article at http://www.stata-journal.com/sjpdf.h...iclenum=st0260. It covers the basics of the command, including everything you need to understand what the code suggested above is doing. After that, the manual chapter covers the more advanced features well. (The manual chapter covers the basics, too, and has plenty of nice worked examples, but I think Richard's article is easier to understand.)



