  • Predict variable from fixed effects regression

    Dear community,

    in my dataset, the last year (2020) is missing. Therefore, I want to forecast the value of my variable "human capital" for this year. I want to do this with a fixed effects regression with a country-specific time trend and use the estimation results to predict the value of the dependent variable (human capital) for each country for 2020. The data used for the regression run through 2019.

    My panel id is country_enc and the time variable is year.

    I wanted to do:

    Code:
    xtreg hc year##c.country_enc, fe
    However, this produces the error message "the panel variable country_enc may not be included as an independent variable". I am unsure how to change the estimation command appropriately, and I'm also not sure about the correct command to predict the values for 2020.

    Does anyone have any advice?

    Best wishes
    Noemi

  • #2
    You don't give example data to work with, but here's how I would do this using the online Grunfeld data set. It should be straightforward to adapt the code to your data set.

    Code:
    webuse grunfeld, clear
    
    xtset company year
    
    xtreg mvalue c.year#i.company if year < 1954, fe
    predict predicted_mvalue if year == 1954
    list if year == 1954

    • #3
      Dear Clyde Schechter

      thank you for your response! My problem is, I don't have the year I want to predict in the data. So the 1954 in your example is missing in my data set. Also, my id variable, which is company here, is in my case the countryname, which I cannot include as a factor variable for the interaction.

      Best
      Noemi

      • #4
        I don't have the year I want to predict in the data. So the 1954 in your example is missing in my data set.
        Well, then you need to create observations for that year. Like this:
        Code:
        webuse grunfeld, clear
        
        xtset company year
        summ year
        
        xtreg mvalue c.year#i.company, fe
        
        // CREATE OBSERVATIONS WITH YEAR == 1955
        expand 2 if year == 1954
        
        // NOW PREDICT VALUES
        by company year, sort: replace year = 1955 if year == 1954 & _n == _N
        predict predicted_mvalue if year == 1955
        list if year == 1955
        Also, my id variable, which is company here, is in my case the countryname, which I cannot include as a factor variable for the interaction.
        I think you are misreading that error message, which, by the way, is really not very clear. You tried to include the country name as a continuous variable, and that is not legal. You put a c. in front of country and nothing in front of year. That's exactly backwards because it designates year as discrete and country as continuous. Treating country as a continuous variable makes no sense in any context. And to get a time trend, you want year to be continuous, not discrete. Change it to c.year##i.country_enc and it will run.
        Last edited by Clyde Schechter; 17 Sep 2024, 15:42.
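The difference between the two ways of writing the interaction can be seen by expanding them with Stata's -fvexpand- command. This is a minimal sketch on the Grunfeld data from #2, with company standing in for country_enc; the estimation window (years before 1954) is only an illustrative choice. With Noemi's variables, the last command would correspond to xtreg hc c.year#i.country_enc if year <= 2019, fe.

```stata
webuse grunfeld, clear
xtset company year

// c.year#i.company: one year slope per company,
// i.e. a company-specific linear time trend
fvexpand c.year#i.company
display "`r(varlist)'"

// c.year##i.company also adds the main effects c.year and i.company;
// under -fe- the i.company terms are absorbed by the company effects
fvexpand c.year##i.company
display "`r(varlist)'"

// Company-specific trends estimated on the years before 1954
xtreg mvalue c.year#i.company if year < 1954, fe
```

Here year enters as a continuous variable (c.) and company as a factor (i.), the opposite of the original year##c.country_enc.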

        • #5
          Dear Clyde Schechter

          thank you very much, especially for the clarification regarding the error message!

          As I'm really unsure about my results and don't want to work with wrong forecasts, I would appreciate more advice, especially on how to account for the fixed effects in my forecast.

          For replicability, I used the Penn World Table 10 data set, available at https://www.rug.nl/ggdc/productivity/pwt/?lang=en (I'm not sure how to share data here in the correct way).

          Here is what I did (note: carryforward is a user-written command installed from SSC):

          Code:
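          use pwt100.dta, clear
          // Numeric panel identifier from the string country name
          encode country, gen(country_enc)
          xtset country_enc year
          keep iso3_d country country_enc year hc

          // Country-specific linear trends; country effects are absorbed by -fe-
          xtreg hc c.year#i.country_enc, fe
          // Duplicate each 2019 row and relabel the copy as 2020
          expand 2 if year == 2019
          by country_enc year, sort: replace year = 2020 if year == 2019 & _n == _N
          // Linear prediction for 2020, excluding the country effect
          predict predicted_hc if year == 2020, xb
          // Estimated country effect u_i; carryforward fills missing values from the preceding row
          predict fe, u
          carryforward fe, replace
          gen yhat2 = predicted_hc + fe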
          This gives me 2020 forecasts (yhat2) that seem plausible. However, some of them are lower than the values for 2017, 2018 and 2019, which does not seem reasonable to me for a 2020 forecast.

          Is there anything I did wrong?

          I think it is important to account for the country fixed effects in the prediction. However, the xbu option of predict after xtreg does not work for out-of-sample predictions and yields many missing values. That's why I predicted without the FE (predicted_hc), then predicted the FE separately (fe), and added the two (yhat2).

          Best wishes
          Noemi
          Last edited by Noemi Seng; 18 Sep 2024, 03:13.
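A minimal sketch of the approach described in this post (linear prediction plus the estimated country effect), using the Grunfeld data with 1955 as the forecast year and company in place of country_enc. Two choices here are assumptions, not part of the original code: the outcome is set to missing in the added row, and each company's u_i is spread with the built-in -egen- under -by company-, so one company's effect is never copied into another company's rows. The last lines flag companies whose forecast falls below their last observed value; names such as u_company and below_last are illustrative.

```stata
webuse grunfeld, clear
xtset company year

// Add a 1955 row per company by duplicating its 1954 row
expand 2 if year == 1954
by company year, sort: replace year = 1955 if year == 1954 & _n == _N
// The outcome is unknown in the forecast year
replace mvalue = . if year == 1955
xtset company year

// Company-specific linear trends; company effects absorbed by -fe-
xtreg mvalue c.year#i.company if year <= 1954, fe

// Linear prediction (_cons + company trend), available for every row
predict double xb_hat, xb
// Estimated company effect u_i, computed for the estimation sample only
predict double u_hat if e(sample), u
// Copy each company's u_i to all of its rows, including 1955
bysort company: egen double u_company = mean(u_hat)
generate double mvalue_hat = xb_hat + u_company

// Flag companies whose 1955 forecast is below their 1954 value
bysort company (year): generate byte below_last = mvalue_hat < mvalue[_n-1] if year == 1955
list company year mvalue mvalue_hat below_last if year >= 1953, sepby(company) noobs
```

With a single linear trend per country, a forecast below the most recent values is not by itself a sign of a coding error: it occurs when the country's estimated trend is negative or when the last observed values lie above the fitted line.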

          • #6
            I think it is important to account for the country fixed effects in the prediction. However, the xbu option of predict after xtreg does not work for out-of-sample predictions and yields many missing values. That's why I predicted without the FE (predicted_hc), then predicted the FE separately (fe), and added the two (yhat2).
            The problem is that -predict, xbu- has no fixed-effect estimate for the 2020 observations, so it won't do the out-of-sample predictions.

            Now, your situation is a bit special in that you are doing a linear regression and there are no covariates in the model. So I think you can, instead, do this:

            Code:
            webuse grunfeld, clear
            
            regress mvalue c.year#i.company i.company
            
            // CREATE OBSERVATIONS WITH YEAR == 1955
            expand 2 if year == 1954
            by company year, sort: replace year = 1955 if year == 1954 & _n == _N
            
            // NOW PREDICT VALUES
            predict mvalue_hat
            
            list if year == 1955
            which emulates a fixed-effects model using regress and company (country in your data) indicators. However, bear in mind that this approach implicitly assumes that the "fixed effect" for company (country) estimated from the data through 1954 (2019 in your data) would still be the same if the 1955 (2020) data had been available. That assumption is unlikely to be exactly right, although if you have enough years in your data, it may be close to true. I don't see any way around this limitation. This is just an instance of the general maxim in statistics that out-of-sample predictions are hazardous.
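The claim that -regress- with company indicators emulates the fixed-effects model can be checked directly. In a linear model the dummy-variable regression and -xtreg, fe- give the same slopes, so the -regress- prediction should equal xb + u_i from -xtreg-. This sketch holds out the observed year 1954 instead of creating a 1955 row; the hold-out year and the 1e-6 tolerance in -assert- are arbitrary choices.

```stata
webuse grunfeld, clear
xtset company year

// (1) Dummy-variable fit as in #6: company intercepts plus company trends
regress mvalue c.year#i.company i.company if year <= 1953
predict double lsdv_hat, xb

// (2) Fixed-effects fit: linear prediction plus the estimated company effect
xtreg mvalue c.year#i.company if year <= 1953, fe
predict double xb_hat, xb
predict double u_hat if e(sample), u
bysort company: egen double u_company = mean(u_hat)
generate double fe_hat = xb_hat + u_company

// Compare the two predictions for the held-out year 1954
sort company year
list company year mvalue lsdv_hat fe_hat if year == 1954, noobs
assert reldif(lsdv_hat, fe_hat) < 1e-6 if year == 1954
```

If the -assert- passes, both routes give the same forecasts, and the caveat about the fixed effect remaining unchanged in the forecast year applies equally to both.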
