getting estimated y value for regressions on subsamples of a panel dataset

krishantha Ainsworth

Join Date: Dec 2020

Posts: 35
#1

01 Jan 2021, 02:17

Dear Statalist,

I need the estimated (fitted) values of the dependent variable in a panel dataset,
for each country, year and industry, obtained by running a separate regression on each cross-sectional subsample.
My dataset looks like this:

year country industry y_value x values

2000 country1
2000 country2
...
2001
...
...
2010

I want to know whether

Code:

bysort country year industry: regress y xvariables
predict yhat, xb

works for this?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

01 Jan 2021, 06:34

No, it will not do what you want. The predict command will be run only once, using the results from the final regression.

The community-contributed runby command, written by Robert Picard and Clyde Schechter and available from SSC, will make it easy to do what you need. See the output of

Code:

ssc describe runby

and after installing it, read the comprehensive documentation in

Code:

help runby

for guidance on how to make use of it.
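For comparison, the same per-group fit can be sketched with a plain loop and no added commands. This is only an illustration on simulated data: the regressor x1, the seed, the group sizes, and the helper names grp and fit_g are assumptions made up here, standing in for the real y and xvariables. The key point is that predict is called inside the loop, right after each regression, and restricted to that group.

```stata
* Simulated stand-in data: 3 countries x 2 years x 2 industries, 50 obs each
clear
set seed 12345
set obs 600
gen byte country  = ceil(_n/200)
gen int  year     = 2000 + (mod(_n - 1, 200) >= 100)
gen byte industry = 1 + (mod(_n - 1, 100) >= 50)
gen double x1 = rnormal()
gen double y  = 1 + 2*x1 + rnormal()

* One regression per country-year-industry cell, predicting within that cell only
egen long grp = group(country year industry)
gen double yhat = .
quietly levelsof grp, local(groups)
foreach g of local groups {
    * capture keeps the loop going if a cell cannot be fitted
    capture regress y x1 if grp == `g'
    if _rc == 0 {
        quietly predict double fit_g if grp == `g', xb
        quietly replace yhat = fit_g if grp == `g'
        drop fit_g
    }
}
```

A loop like this can be slow with many groups, which is one reason to prefer runby on larger datasets.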
krishantha Ainsworth

Join Date: Dec 2020

Posts: 35
#3

01 Jan 2021, 18:54

Thank you. It seems the program should look like this:

Code:

program define my_regress
    regress y xvariables
    predict yhat, xb
end

runby my_regress, by(year) verbose

Do you know how I can also group by industry and country in this code?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

01 Jan 2021, 19:28

The output of

Code:

help runby

tells us that the by() option accepts a varlist, not just a single variable name. (See the output of

Code:

help varlist

for more details on variable lists.)

So use

Code:

runby my_regress, by(country year industry) verbose
krishantha Ainsworth

Join Date: Dec 2020

Posts: 35
#5

01 Jan 2021, 22:22

I notice there is a problem with this command: if it fails to run the regression for a particular year/country/industry combination,
it completely deletes those observations, so I have no way of retrieving them. For example, I had 200+ groups and 30-something had errors.
When I looked back, all the observations in the groups with errors were gone. Any solutions?
I guess this is due to missing values in some combinations.

Last edited by krishantha Ainsworth; 01 Jan 2021, 23:17.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

02 Jan 2021, 09:21

The output of help runby tells us that when a program terminates with an error, runby discards the data in memory and stores nothing for that by group. You should understand that runby creates a new dataset from the dataset you apply it to: it allows you to delete or create observations and variables.

The first thing you should do is be certain you understand exactly why 30 groups are failing the regression. Perhaps the implication is that they should be dropped from your dataset. If so, the problem goes away.

If not, here is one way of avoiding the problem, by using capture to prevent the failure of regress from causing your my_regress program to terminate with an error.

Code:

program define my_regress
    capture noisily regress y xvariables
    if _rc == 0 {
        predict yhat, xb
    }
end

runby my_regress, by(country year industry) verbose

This will leave your observations unaffected, and yhat will have missing values for the 30+ groups with errors.

But I would be inclined to take a different approach. In what follows, I assume that the variable id, together with country, year, and industry, is sufficient to identify each distinct observation in your data. This version of my_regress only returns to runby the necessary identification variables and the newly-created yhat.

Code:

program define my_regress
    regress y xvariables
    predict yhat, xb
    // return only the identifiers and the fitted values to runby
    keep country year industry id yhat
end

use mydata, clear
runby my_regress, by(country year industry) verbose
tempfile yhats
save `yhats'

// merge the fitted values back; groups whose regression failed keep a missing yhat
use mydata, clear
merge 1:1 country year industry id using `yhats', keep(master match)
save mydatafit
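To follow up on why some groups fail, a check along these lines can list the groups that have no complete cases at all, which is one way a regression can error out (regress reports "no observations" for such a group). This is a sketch on simulated data: the regressor x1, the seed, and the helper variables n_complete and first are assumptions for illustration, and the real check would use your own y and xvariables.

```stata
* Simulated stand-in data: 3 countries x 2 years, 20 obs per cell
clear
set seed 2021
set obs 120
gen byte country = ceil(_n/40)
gen int  year    = 2000 + (mod(_n - 1, 40) >= 20)
gen double x1 = rnormal()
gen double y  = 1 + 2*x1 + rnormal()

* Make one cell unusable by setting its regressor to missing
replace x1 = . if country == 3 & year == 2001

* Count observations per cell with no missing value in y or x1
egen int n_complete = total(!missing(y, x1)), by(country year)

* List each cell once if it has no usable observations
egen byte first = tag(country year)
list country year n_complete if first & n_complete == 0, noobs
```

Groups with very few complete cases, or with no variation in a regressor, may deserve the same scrutiny before deciding whether to drop them.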