  • Event study/ DID/ Individual FE/ Time&County FE

    Hi,

    I am new to Stata and need advice on how to translate my model specification into code. I have an individual-level panel data set with 11 million observations from 21 counties, covering a period of 19 years. For the purpose of my project, I have two time dimensions: calendar year (t) and event year (e). I want to use a difference-in-differences model with individual fixed effects and county-time fixed effects. The goal is to estimate the effect of being able to take sick leave before the first childbirth (event year -1) on the long-term propensity to be on sick leave afterwards. There is exogenous regional variation that affects the likelihood that the mother gets sick leave. The individual propensity to be on sick leave is measured by the history of sick leave withdrawals in the years before pregnancy (event years -14 to -2).
    This is my specification:

    $$SL_{ict} = \alpha_i + \sum_{\phi=-1}^{17} \alpha_\phi \, 1[e=\phi] + \alpha_{ct} + \sum_{\phi=-1}^{17} \alpha_\phi \, 1[e=\phi] \times HL + \beta_i X_{it} + \beta_c X_{ct} + \varepsilon_{ict}$$

    Where:

    - $SL_{ict}$ is the number of sick leave days per year for individual i, living in county c, in calendar year t

    - $\alpha_i$ is the individual effect, which captures the individual's propensity to be on sick leave during event years -14 to -2 and is to be used as the reference point

    - $\alpha_\phi$ captures how much the individual's propensity to be on sick leave has changed from its reference point during event years -1 to 17. The indicator $1[e=\phi]$ equals 1 when the event year e equals $\phi$ ($\phi$ = -1 to 17)

    - $\alpha_\phi \times HL$ is an interaction between the parameter capturing the individual propensity to be on sick leave in event years -1 to 17 and the county's leniency in that year. HL is a dummy variable indicating whether the county is lenient or strict in granting sick leave.

    - $\alpha_{ct}$ are county-time fixed effects for a given calendar year, because time trends are very regional

    - $X_{it}$ is a vector of time-varying individual covariates, which includes indicator variables for additional children, length of education, work sector, mother's income, father's income, and household disposable income.

    - $X_{ct}$ are county-level characteristics that change over time, such as the county-level unemployment rate in the calendar year.


    My supervisor advised me to use xi:areg with absorb(id) and cluster(id). I don't know how to code it in a way that I can get all of the parameters for each event year. I'm confused about the whole code to be honest. It is way more complex than anything I have ever done. Therefore any help is more than welcome!



  • #2
    Welcome to Statalist. You didn't get a quick response. You'll increase your chances of a helpful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Being able to replicate your problem often makes it easier to help you. A shorter, more focused question is also more likely to get an answer. If you try to write the model in Stata, we know exactly what you're trying to say, whereas interpreting different individuals' efforts to write out equations is often quite challenging.

    It certainly sounds like you have a panel data analysis with more than one panel variable. reghdfe is generally recommended for that. You need to look at factor variable notation since you have a great many interaction terms. I would also recommend programming parts of your model and building up; it may be less daunting that way. Given the extremely large number of variables you are trying to incorporate in the model, you may find it useful to set up each kind of variable in a local macro rather than having an immense estimation statement, although smart use of factor variable notation may solve this to a large extent.
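
    A minimal sketch of how these suggestions might fit together, assuming reghdfe (and the ftools package it depends on) is installed from SSC. All variable names here (sl_days, id, county, year, evyear, hl, and the covariates) are hypothetical placeholders, not names from the original poster's data. Because factor variables cannot take negative values, the event year is recoded so that 0 is the pooled reference period (event years -14 to -2) and 1 to 19 stand for event years -1 to 17.

    ```stata
    * Sketch only: every variable name below is a hypothetical placeholder.
    * ssc install ftools         // community-contributed, install once
    * ssc install reghdfe        // community-contributed, install once

    * Factor variables must be nonnegative integers, so recode event year:
    *   0 = reference period (event years -14 to -2), 1..19 = event years -1..17
    generate byte ev = cond(evyear < -1, 0, evyear + 2) if !missing(evyear)

    * Keep covariate lists in local macros so the estimation command stays readable
    local x_ind i.more_kids c.educ_years i.sector c.inc_mother c.inc_father c.inc_hh
    local x_cty c.unemp_rate

    * Event-year dummies and their interactions with the leniency dummy hl.
    * Individual and county-by-year fixed effects are absorbed; SEs clustered on id.
    * Anything that varies only at the county-year level (the hl main effect and
    * the county covariates) is collinear with county#year and is reported as omitted.
    reghdfe sl_days ib0.ev##i.hl `x_ind' `x_cty', absorb(id county#year) vce(cluster id)
    ```

    Building up gradually could mean starting with absorb(id) alone and adding county#year only once the simpler model runs on a subsample.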



    • #3
      Hi Phil,

      I appreciate that you took the time to write to me. It's good to have some feedback and put things in perspective.
      My analysis involves multiple fixed effects. I was recommended yesterday to include industry fixed effects together with the county-time and individual fixed effects. From your personal experience, does that work better with the xi: areg command or with reghdfe? The purpose of including the fixed effects in my analysis is to see how robust my results are once I account for them; I am not particularly interested in the fixed-effects estimates per se.



      • #4
        Marija:
        if you actually have panel data, as Phil surmises, the community-contributed command -reghdfe- is the way to go.
        I'm not sure I follow you about the robustness check you have in mind. In my opinion, the issue is to consider whether or not all those fixed effects make sense when contrasted against the data generating process you're investigating.
        As an aside, whenever we consider robustness, I think we should ask ourselves: with respect to what?
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi Carlo,

          Yes, it is panel data. I am trying to estimate the effects of sick leave benefits that are available to mothers during pregnancy on long-term sick leave outcomes later on.
          My hypothesis is that mothers who have taken sick leave during their pregnancy have a lower propensity to be on sick leave after the child is born. However, how strictly the insurance agency implements sick leave policies has varied across time and regions. Therefore I am including the county-time fixed effects. On the other hand, different regions have different industry structures, which means that I would have to include industry fixed effects as well. The differences in unobservables between individuals are expected to be captured by the individual fixed effects.
          I am planning to estimate a baseline model with only the individual fixed effects and then add all the other fixed effects.



          • #6
            Marija:
            thanks for clarifying.
            However, from your clear description, it may be that an interaction between regions and industries can replace the related fixed effects.
            I would recommend discussing with your supervisor whether interactions are worth investigating in your regression.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hi Carlo,

              Thank you for your input.
              You are right. The way the interaction term is expressed in my specification, it could substitute for the fixed effects.
              I think I should change the leniency indicator to whether the mother was living in a lenient region in event year -1 (instead of in each event year from -1 to 17) and let it interact with the yearly estimates, to see how much the propensity to be on sick leave has changed for those mothers. That should make more sense, if I'm not wrong again.
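
              One possible way to code this revised indicator, as a sketch rather than a tested solution. The variable names (id, evyear, hl, sl_days, county, year) are hypothetical placeholders, and reghdfe is assumed to be installed. The new variable hl_m1 copies the leniency status of the county of residence in event year -1 to all of a mother's observations.

              ```stata
              * Sketch only: variable names are hypothetical placeholders.
              * Leniency of the county of residence in event year -1, held fixed within
              * mother; stays missing for mothers not observed in event year -1.
              bysort id: egen byte hl_m1 = max(cond(evyear == -1, hl, .))

              * Nonnegative recode of event year: 0 = event years -14 to -2, 1..19 = -1..17
              generate byte ev = cond(evyear < -1, 0, evyear + 2) if !missing(evyear)

              * hl_m1 is constant within id, so its main effect is collinear with the
              * individual fixed effect (reported as omitted); the ev#hl_m1 interaction
              * terms are the coefficients of interest.
              reghdfe sl_days ib0.ev##i.hl_m1, absorb(id county#year) vce(cluster id)
              ```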



              • #8
                Marija:
                the partial revision of your research strategy looks interesting.
                As an aside, using -fvvarlist- notation for interactions links your regression to the wonderful -margins- and -marginsplot- commands.
                Kind regards,
                Carlo
                (Stata 19.0)

