Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sum of squared residuals

    Hi stata experts!

    I'm new to econometrics and stata so I would need some support. I want to to generate a variable, that is the sum of the squared resiudals for every firm's standardized net income trend for a considered period. My dataset is compustat North America. I don't even know where to start, may someone help me?

    Thank you!

  • #2
    Rick:
    welcome to the list.
    A good place to start is, as usual, the Stata .pdf manual.
    You may want to take a look at -regression postestimation- entry and then -residuals-.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Thank you for the hint Carlo. I now read a lot in the manuals. Following these instructions, am I right that I have to do a regression of net income on time and afterwards to predict my new variable with the following code?

      Code:
      predict new_variable_name, residuals
      But what I don't understand: How do I regress net income on time, what are my independent time variables?

      Sorry, I am quite a newbie in these issues so all comments and suggestions are invited! Thanks!

      Comment


      • #4
        Rick:
        your code looks fine.
        However (and that is not clear in your first post), if you're dealing with panel data, you should better switch to -xtreg- and -xtreg postestimation-.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          It sounds as if you are following some instructions specifying "standardized net income trend for a considered period". We can't see those instructions. Conversely, if you are expected to choose your own method of defining a trend, there really isn't, so far as I know, a single way of doing it, quite apart from "standardized" being a word with several meanings.

          Comment


          • #6
            Carlo: Thanks, I have usual compustat data with several firms and firm year observations.

            Nick: Right, I try to recreate a specific variable from prior literature. The exact definition is the reciprocal of the sum of the squared residuals for the firm's standardized net income trend for the last 5 years. Owing to the help of Carlo it's clear to me now that I first need some kind of regression for the squared residuals but I don't understand how to do it.

            Comment


            • #7
              Rick:
              I would first select the regression model I would like to run with my data, also relying on the literature on my research field.
              As far as your last statement is concerned, residuals interpretation is quite unanmbiguous for linear regression, but trickier for panel data analysis.
              Kind regards,
              Carlo
              (Stata 19.0)

              Comment


              • #8
                Rick,
                This example starts from pulling data from COMPUSTAT and ends up calculating what you need. The only exception is that net income isn't standardized because I'm not clear about exactly what is needed here, but this should get you very close to what you are looking for.

                I logged on to WRDS this morning and used the online web query form from COMPUSTAT Monthly Updates- Fundamentals Annual to pull data based on fiscal year (Date Variable field "Fiscal Year" chosen) and gvkey for "What format are your company codes?" I selected "Search the entire database", the start date is 2010-01 and the end date is 2016-01. All screening variables remained as defaults. The variable selected is "NI- Net Income" and I downloaded a Stata .dta file. I saved this file as "list.dta" on my computer.

                The big picture of what the attached code does is:
                1) Check the data for duplicates that prevent using xtset
                2) Remove duplicates. Duplicates in the data come from a) gvkey's with missing fyears and b) the web query default pulling two different industry formats and c) a little bit of weirdness
                3) Data is marked as panel data via xtset and a trend term for the regression of net income is generated within each gvkey.
                4) The rolling command is used to save the sum of squared residuals from 5-year rolling regressions of net income on the trend term into a separate file called rol.dta for merging back into the data downloaded from COMPUSTAT.
                5) A 1:1 merge based on gvkey and fyear, where fyear in the data saved from rolling is the last fyear of the estimation window. This brings in the residual sum of squares for each firm and five-year window back into the COMPUSTAT data.

                Code:
                clear
                
                use list.dta, clear
                
                desc
                
                /* check for missing fyears */
                capture noisily assert fyear !=.
                
                /* drop missing fyears */
                drop if fyear==.
                
                /*Destring gvkey to enable -xtset- */
                destring gvkey, replace
                
                /* Identify data as panel */
                capture noisily xtset gvkey fyear
                /*There are duplicates based on gvkey and fyear in the raw data we pulled */
                
                sort gvkey fyear indfmt curcd
                
                /* Mark duplicates based on gvkey and fyear */
                duplicates tag gvkey fyear, gen(dup)
                tab dup
                 
                /* Eyeball data to see if there are any patterns for duplicates */
                preserve
                keep if dup==1
                list in 1/10
                restore
                
                /* Based on the above, it looks like we pulled two different industry formats
                for each firm. If you go back to the default settings in the online web query,
                both INDL and FS are checked for "Industry Format." It also looks like ni
                is missing for all FS indsutry formats. Check this below.  */
                
                count if ni !=. & indfmt=="FS"
                
                /* In the data we pulled, net income is missing for all FS industry formats.
                Next, we get rid of observations where indfmt== "FS" */
                
                drop dup
                drop if indfmt=="FS"
                
                capture noisily xtset gvkey fyear
                
                /* Check data again for duplicates based on gvkey and fyear */
                duplicates tag gvkey fyear, gen(dup)
                tab dup
                list if dup==1
                
                /* Keep the only observation with non-missing ni */
                drop if dup==1 & ni ==.
                drop dup
                
                /*Identify data as panel */
                xtset gvkey fyear
                
                /* generate trend variable within each firm */
                bys gvkey: gen trend= [_n]
                
                list in 1/20
                
                xtset gvkey fyear
                
                /* Use the rolling command to estimate the residual sum of squares
                from a regression of net income on a trend term for rolling 5 year periods */
                rolling rss= e(rss), window(5) saving(rol, replace): reg ni trend
                
                /* Save most current version of data for merging in the residual sum of
                squares from the rol.dta file. */
                
                save list_merge.dta, replace
                
                /* Accuracy check */
                reg ni trend if gvkey==1094 & fyear < 2015 & fyear > 2009
                reg ni trend if gvkey==1094 & fyear < 2016 & fyear > 2010
                
                /* The residual sum of squares from these two regressions is manually compared
                to the results in the rol.dta file that was saved from the rolling command.
                No exceptions noted. */
                
                /* Format the rol.dta file for merging back into the list_merge.dta file. */
                clear
                use rol.dta
                
                keep gvkey end rss
                /* Rename ending period to facilitate merging */
                rename end fyear
                
                /* Next, a 1:1 merge based on gvkey and ending fyear of the rolling
                regression */
                
                merge 1:1 gvkey fyear using list_merge.dta
                
                /* Replace the sum of squares residual to the reciprocal of the sum of squares
                residual */
                gen rss_recip= 1/rss
                
                list rss rss_recip in 1/20

                Comment

                Working...
                X