Predict variable from fixed effects regression

Noemi Seng

Join Date: Jan 2024

Posts: 85
#1

17 Sep 2024, 08:29

Dear community,

In my dataset, the last year (2020) is missing, so I want to forecast the value of my variable "human capital" for that year. I want to do this with a fixed-effects regression that includes a country-specific time trend, and then use the estimation results to predict the dependent variable (human capital) for each country in 2020. The data used for the regression run through 2019.

My panel id is country_enc and the time variable is year.

I wanted to do:

Code:

xtreg hc year##c.country_enc, fe

However, this produces the error "the panel variable country_enc may not be included as an independent variable". I am unsure how to change the estimation command appropriately, and I'm also not sure about the correct command to predict the values for 2020.

Does anyone have any advice?

Best wishes
Noemi
Clyde Schechter

Join Date: Apr 2014

Posts: 29453
#2

17 Sep 2024, 10:29

You don't give example data to work with, but here's how I would do this using the online Grunfeld data set. It should be straightforward to adapt the code to your data set.

Code:

webuse grunfeld, clear
xtset company year
xtreg mvalue c.year#i.company if year < 1954, fe
predict predicted_mvalue if year == 1954
list if year == 1954

Noemi Seng

Join Date: Jan 2024

Posts: 85
#3

17 Sep 2024, 10:41

Dear Clyde Schechter,

Thank you for your response! My problem is, I don't have the year I want to predict in the data. So the 1954 in your example is missing in my data set. Also, my id variable, which is company here, is in my case the countryname, which I cannot include as a factor variable for the interaction.

Best
Noemi
Clyde Schechter

Join Date: Apr 2014

Posts: 29453
#4

17 Sep 2024, 15:38

I don't have the year I want to predict in the data. So the 1954 in your example is missing in my data set.

Well, then you need to create observations for that year. Like this:

Code:

webuse grunfeld, clear
xtset company year
summ year
xtreg mvalue c.year#i.company, fe
// CREATE OBSERVATIONS WITH YEAR == 1955
expand 2 if year == 1954
// NOW PREDICT VALUES
by company year, sort: replace year = 1955 if year == 1954 & _n == _N
predict predicted_mvalue if year == 1955
list if year == 1955

Also, my id variable, which is company here, is in my case the countryname, which I cannot include as a factor variable for the interaction.

I think you are misreading that error message, which, by the way, is really not very clear. You tried to include the country name as a continuous variable, and that is not legal. You put a c. in front of country and nothing in front of year. That's exactly backwards because it designates year as discrete and country as continuous. Treating country as a continuous variable makes no sense in any context. And to get a time trend, you want year to be continuous, not discrete. Change it to c.year##i.country_enc and it will run.
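
As a minimal sketch of what the c. and i. prefixes do, here is the same idea on the Grunfeld data used above, with plain regress so that the panel-variable restriction does not come into play. The variable names are the ones shipped with that example data set; substitute your own.

```stata
webuse grunfeld, clear

// i.year: year treated as categorical, one indicator per year
regress mvalue i.year

// c.year: year treated as continuous, a single linear trend
regress mvalue c.year

// c.year#i.company: year continuous, company categorical,
// so there is one linear trend slope per company
regress mvalue c.year#i.company i.company
```

Comparing the three coefficient tables should make it clear why the prefix belongs on the id variable as i. and on year as c. when the goal is a unit-specific time trend.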

Last edited by Clyde Schechter; 17 Sep 2024, 15:42.
Noemi Seng

Join Date: Jan 2024

Posts: 85
#5

18 Sep 2024, 02:31

Dear Clyde Schechter,

Thank you, and thank you very much for the clarification regarding the error message!

As I'm really unsure about my results and I don't want to work with wrong data (forecasts), I need more advice especially on how to account for the fixed effects in my forecast.

For better replicability, I used the Penn World Table 10 data set, available under https://www.rug.nl/ggdc/productivity/pwt/?lang=en (I'm not sure how to make data available here, in the correct way).

What I then did is the following (note: the carryforward command is installed from SSC):

Code:

use pwt100.dta, clear
encode country, gen(country_enc)
xtset country_enc year
keep iso3_d country country_enc year hc
xtreg hc c.year#i.country_enc, fe
expand 2 if year == 2019
by country_enc year, sort: replace year = 2020 if year == 2019 & _n == _N
predict predicted_hc if year == 2020, xb
predict fe, u
carryforward fe, replace
gen yhat2 = predicted_hc + fe

This gives me forecast values for 2020 (yhat2) that seem plausible. However, some of them are lower than the values in 2017, 2018, and 2019, which does not seem reasonable to me for a 2020 forecast.

Is there anything I did wrong?

I think it is important to account for the country fixed effects in the prediction. However, the xbu option (xtreg postestimation) does not work with out-of-sample predictions and yields many missing values. That's why I predicted without the FE (predicted_hc), then predicted the FE separately (fe), and added the two up (yhat2).

Best wishes
Noemi

Last edited by Noemi Seng; 18 Sep 2024, 03:13.
Clyde Schechter

Join Date: Apr 2014

Posts: 29453
#6

18 Sep 2024, 12:53

I think it is important to account for the country fixed effects in the prediction. However, the xbu option (xtreg postestimation) does not work with out-of-sample predictions and yields many missing values. That's why I predicted without the FE (predicted_hc), then predicted the FE separately (fe), and added the two up (yhat2).

The problem is that -predict xbu- has no fixed-effect estimate for the year 2020 observations, because they were not part of the estimation sample. So it won't do the out-of-sample predictions.

Now, your situation is a bit special in that you are doing a linear regression and there are no covariates in the model. So I think you can, instead, do this:

Code:

webuse grunfeld, clear
regress mvalue c.year#i.company i.company
// CREATE OBSERVATIONS WITH YEAR == 1955
expand 2 if year == 1954
by company year, sort: replace year = 1955 if year == 1954 & _n == _N
// NOW PREDICT VALUES
predict mvalue_hat
list if year == 1955

which emulates a fixed-effects model using regress and company (country in your data) indicators. However, bear in mind that this approach implicitly assumes that the "fixed effect" for company (country) estimated from the data through 1954 (2019 in your data) would still be the same if the 1955 (2020) data had been available. That assumption is unlikely to be exactly right, although if you have enough years in your data, it may be close to true. I don't see any way around this limitation. This is just an instance of the general maxim in statistics that out-of-sample predictions are hazardous.
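
One way to check that the regress version reproduces the fixed-effects fit is to compare the in-sample fitted values from both commands. The following is only a sketch on the Grunfeld data; fe_fit, ols_fit, and diff are names made up for this illustration.

```stata
webuse grunfeld, clear
xtset company year

// Fixed-effects fit; xbu = xb + u_i, computed for the estimation sample
xtreg mvalue c.year#i.company, fe
predict double fe_fit, xbu

// Same model estimated by OLS with explicit company indicators
regress mvalue c.year#i.company i.company
predict double ols_fit

// Absolute difference between the two sets of fitted values
gen double diff = abs(fe_fit - ols_fit)
summarize diff
```

If the maximum of diff reported by summarize is zero up to rounding error, the two approaches agree in sample. The regress version then has the practical advantage that predict also returns values for the added out-of-sample year.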