Sum of squared residuals

Rick Mould

Join Date: Jan 2016

Posts: 7
#1

Sum of squared residuals

14 Jan 2016, 15:21

Hi stata experts!

I'm new to econometrics and stata so I would need some support. I want to to generate a variable, that is the sum of the squared resiudals for every firm's standardized net income trend for a considered period. My dataset is compustat North America. I don't even know where to start, may someone help me?

Thank you!
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

14 Jan 2016, 23:48

Rick:
welcome to the list.
A good place to start is, as usual, the Stata .pdf manual.
You may want to take a look at -regression postestimation- entry and then -residuals-.

Kind regards,
Carlo
(Stata 19.0)
Comment
Rick Mould

Join Date: Jan 2016

Posts: 7
#3

15 Jan 2016, 04:23

Thank you for the hint Carlo. I now read a lot in the manuals. Following these instructions, am I right that I have to do a regression of net income on time and afterwards to predict my new variable with the following code?

Code:

predict new_variable_name, residuals

But what I don't understand: How do I regress net income on time, what are my independent time variables?

Sorry, I am quite a newbie in these issues so all comments and suggestions are invited! Thanks!
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

15 Jan 2016, 04:31

Rick:
your code looks fine.
However (and that is not clear in your first post), if you're dealing with panel data, you should better switch to -xtreg- and -xtreg postestimation-.

Kind regards,
Carlo
(Stata 19.0)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#5

15 Jan 2016, 04:33

It sounds as if you are following some instructions specifying "standardized net income trend for a considered period". We can't see those instructions. Conversely, if you are expected to choose your own method of defining a trend, there really isn't, so far as I know, a single way of doing it, quite apart from "standardized" being a word with several meanings.
Comment
Rick Mould

Join Date: Jan 2016

Posts: 7
#6

15 Jan 2016, 08:16

Carlo: Thanks, I have usual compustat data with several firms and firm year observations.

Nick: Right, I try to recreate a specific variable from prior literature. The exact definition is the reciprocal of the sum of the squared residuals for the firm's standardized net income trend for the last 5 years. Owing to the help of Carlo it's clear to me now that I first need some kind of regression for the squared residuals but I don't understand how to do it.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

15 Jan 2016, 08:23

Rick:
I would first select the regression model I would like to run with my data, also relying on the literature on my research field.
As far as your last statement is concerned, residuals interpretation is quite unanmbiguous for linear regression, but trickier for panel data analysis.

Kind regards,
Carlo
(Stata 19.0)
Comment

Robson Glasscock

Join Date: Apr 2014
Posts: 25

15 Jan 2016, 12:26

Rick,
This example starts from pulling data from COMPUSTAT and ends up calculating what you need. The only exception is that net income isn't standardized because I'm not clear about exactly what is needed here, but this should get you very close to what you are looking for.

I logged on to WRDS this morning and used the online web query form from COMPUSTAT Monthly Updates- Fundamentals Annual to pull data based on fiscal year (Date Variable field "Fiscal Year" chosen) and gvkey for "What format are your company codes?" I selected "Search the entire database", the start date is 2010-01 and the end date is 2016-01. All screening variables remained as defaults. The variable selected is "NI- Net Income" and I downloaded a Stata .dta file. I saved this file as "list.dta" on my computer.

The big picture of what the attached code does is:
1) Check the data for duplicates that prevent using xtset
2) Remove duplicates. Duplicates in the data come from a) gvkey's with missing fyears and b) the web query default pulling two different industry formats and c) a little bit of weirdness
3) Data is marked as panel data via xtset and a trend term for the regression of net income is generated within each gvkey.
4) The rolling command is used to save the sum of squared residuals from 5-year rolling regressions of net income on the trend term into a separate file called rol.dta for merging back into the data downloaded from COMPUSTAT.
5) A 1:1 merge based on gvkey and fyear, where fyear in the data saved from rolling is the last fyear of the estimation window. This brings in the residual sum of squares for each firm and five-year window back into the COMPUSTAT data.

Code:

clear

use list.dta, clear

desc

/* check for missing fyears */
capture noisily assert fyear !=.

/* drop missing fyears */
drop if fyear==.

/*Destring gvkey to enable -xtset- */
destring gvkey, replace

/* Identify data as panel */
capture noisily xtset gvkey fyear
/*There are duplicates based on gvkey and fyear in the raw data we pulled */

sort gvkey fyear indfmt curcd

/* Mark duplicates based on gvkey and fyear */
duplicates tag gvkey fyear, gen(dup)
tab dup
 
/* Eyeball data to see if there are any patterns for duplicates */
preserve
keep if dup==1
list in 1/10
restore

/* Based on the above, it looks like we pulled two different industry formats
for each firm. If you go back to the default settings in the online web query,
both INDL and FS are checked for "Industry Format." It also looks like ni
is missing for all FS indsutry formats. Check this below.  */

count if ni !=. & indfmt=="FS"

/* In the data we pulled, net income is missing for all FS industry formats.
Next, we get rid of observations where indfmt== "FS" */

drop dup
drop if indfmt=="FS"

capture noisily xtset gvkey fyear

/* Check data again for duplicates based on gvkey and fyear */
duplicates tag gvkey fyear, gen(dup)
tab dup
list if dup==1

/* Keep the only observation with non-missing ni */
drop if dup==1 & ni ==.
drop dup

/*Identify data as panel */
xtset gvkey fyear

/* generate trend variable within each firm */
bys gvkey: gen trend= [_n]

list in 1/20

xtset gvkey fyear

/* Use the rolling command to estimate the residual sum of squares
from a regression of net income on a trend term for rolling 5 year periods */
rolling rss= e(rss), window(5) saving(rol, replace): reg ni trend

/* Save most current version of data for merging in the residual sum of
squares from the rol.dta file. */

save list_merge.dta, replace

/* Accuracy check */
reg ni trend if gvkey==1094 & fyear < 2015 & fyear > 2009
reg ni trend if gvkey==1094 & fyear < 2016 & fyear > 2010

/* The residual sum of squares from these two regressions is manually compared
to the results in the rol.dta file that was saved from the rolling command.
No exceptions noted. */

/* Format the rol.dta file for merging back into the list_merge.dta file. */
clear
use rol.dta

keep gvkey end rss
/* Rename ending period to facilitate merging */
rename end fyear

/* Next, a 1:1 merge based on gvkey and ending fyear of the rolling
regression */

merge 1:1 gvkey fyear using list_merge.dta

/* Replace the sum of squares residual to the reciprocal of the sum of squares
residual */
gen rss_recip= 1/rss

list rss rss_recip in 1/20

Announcement

Sum of squared residuals

Comment

Comment

Comment

Comment

Comment

Comment

Comment