  • Getting estimated y values for regressions on subsamples of a panel dataset

    Dear Statalist,

    I need the estimated (fitted) value of the dependent variable in a panel data set
    for each particular country, year and industry, obtained by regressing on each cross-sectional subsample.
    My dataset looks as follows:

    year country industry y_value x values

    2000 country1
    2000 country2
    ...
    2001
    ...
    ...
    2010

    I want to know whether
    Code:
    bysort country year industry: regress y xvariables
    predict yhat,xb
    works for this?


  • #2
    No, it will not do what you want. The predict command will be run only once, using the results from the final regression.
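
    As a minimal illustration of that point (a sketch on simulated data; the variable names grp, x, and y are made up for this example and are not from the original post), the prediction below ends up using only the coefficients of the last by-group:

```stata
* Simulated data: two groups with clearly different slopes (hypothetical names)
clear
set seed 12345
set obs 20
generate grp = cond(_n <= 10, 1, 2)
generate x   = _n
generate y   = cond(grp == 1, 2*x, -3*x) + rnormal()

* regress runs once per group, but only the last group's results remain in e()
bysort grp: regress y x

* predict runs once, for all observations, from whatever is in e()
predict yhat, xb

* group 1 has a positive slope in y, yet its yhat follows the group 2 fit
list grp x y yhat in 1/3
```

    If the by: prefix carried over to predict, yhat in group 1 would rise with x; here it should instead follow the negative slope estimated for group 2.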

    The community-contributed runby command, written by Robert Picard and Clyde Schechter and available from SSC, will make it easy to do what you need. See the output of
    Code:
    ssc describe runby
    and after installing it, read the comprehensive documentation in
    Code:
    help runby
    for guidance on how to make use of it.



    • #3
      Thank you. It seems the program should look like this:

      Code:
      program define my_regress
          regress y xvariables
          predict yhat, xb
      end

      runby my_regress, by(year) verbose
      Do you know how I can also put industry and country into this code?



      • #4
        The output of
        Code:
        help runby
        tells us that the by() option accepts a varlist, not just a single variable name. (See the output of
        Code:
        help varlist
        for more details on variable lists.)

        So use
        Code:
        runby my_regress, by(country year industry) verbose



        • #5
          I notice a problem with this command: if it fails to run the regression for a particular year/country/industry combination,
          it completely deletes those observations, so I have no way of retrieving them. For example, I had 200+ groups and 30-something had errors.
          When I look back, all the observations in the groups with errors are gone. Any solutions?
          I guess this is due to missing values in some combinations.
          Last edited by krishantha Ainsworth; 01 Jan 2021, 23:17.



          • #6
            The output of help runby tells us that when a program terminates with an error, runby discards the data in memory and stores nothing for that by group. You should understand that runby creates a new dataset from the dataset you apply it to: it allows you to delete or create observations and variables.

            The first thing you should do is be certain you understand exactly why 30 groups are failing the regression. Perhaps the implication is that they should be dropped from your dataset. If so, the problem goes away.
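
            One possible way to investigate is to record the return code of regress in each by-group, so the failing groups and their error codes can be tabulated afterwards. This is only a sketch on simulated data: it assumes runby is already installed from SSC, and the names grp, x, y, reg_rc, and my_regress_rc are hypothetical stand-ins (grp plays the role of your country/year/industry combination).

```stata
* Requires the community-contributed runby command: ssc install runby
* Simulated data: three groups; group 3 has no usable x, so regress fails there
clear
set seed 2021
set obs 60
generate grp = ceil(_n/20)
generate x   = rnormal()
generate y   = 1 + 2*x + rnormal()
replace  x   = . if grp == 3

capture program drop my_regress_rc
program define my_regress_rc
    * capture keeps a failed regress from terminating the program
    capture regress y x
    * store the return code (0 means success) as a variable for this group
    generate int reg_rc = _rc
    if reg_rc == 0 {
        predict yhat, xb
    }
end

runby my_regress_rc, by(grp)

* one row per group, showing which return code each group produced
tabulate grp reg_rc
```

            Nonzero values of reg_rc can then be looked up with the search command (for example, search rc 2000) to see what went wrong in those groups.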

            If not, here is one way of avoiding the problem: use capture to prevent the failure of regress from causing your my_regress program to terminate with an error.
            Code:
            program define my_regress
            capture noisily regress y xvariables
            if _rc==0 {
                predict yhat,xb  
            }
            end
            
            runby my_regress, by(country year industry) verbose
            This will leave your observations unaffected, and yhat will have missing values for the 30+ groups with errors.

            But I would be inclined to take a different approach. In what follows, I assume that the variable id, together with country, year, and industry, is sufficient to identify each distinct observation in your data. This version of my_regress only returns to runby the necessary identification variables and the newly-created yhat.
            Code:
            program define my_regress
            regress y xvariables
            predict yhat,xb  
            keep country year industry id yhat
            end
            
            use mydata, clear
            runby my_regress, by(country year industry) verbose
            tempfile yhats
            save `yhats'
            use mydata, clear
            merge 1:1 country year industry id using `yhats', keep(master match)
            save mydatafit
