  • Regression command that uses a parameter to only include those independent variables that have the minimum number of non-missing observations?

    I am running several rolling-window regressions to collect time series of the regression outputs. I am using a combination of program define and rangerun: the program contains only a constrained regression, and rangerun runs it over the rolling windows (rangestat was not used because the regressions have constraints). I have over 10 independent variables, but not all of them are filled in for the full time period being considered. In any given window, I want the regression to include only those independent variables that meet a minimum number of non-missing observations. Some windows could include more than 10 independent variables and others fewer than 10. Since I am collecting only the output, and I think I have an interpretation for it, the changing specification is not a concern. (The dependent variable is fully filled in, so it is not a concern either.) How do I achieve this without writing a more complicated program?

  • #2
    Well, if you literally don't want to add any complication to your program, then it cannot be done. I'll assume you are actually willing to tolerate a little bit of extra work to get this.

    Inside the program you have written for -rangerun- you will need to make some changes. Let me assume for purposes of illustration that your program is called my_regress and that the variables that are candidates for inclusion are called var1-var25. (Modify the code accordingly to reflect the actual names.) You also don't say what the minimum acceptable number of non-missing values is, so, for illustrative purposes, I'll assume it's 30.

    The modification is to loop over the candidate variables and build up a list of included regressors by checking whether each candidate has enough non-missing values. Because -rangerun- runs the program on the observations of the current window, that check is specific to each window. The built-up list of regressors is then used to run the regression and collect the outputs.

    Code:
    capture program drop my_regress
    program define my_regress
        unab candidates: var1-var25 // MODIFY TO USE ACTUAL VARIABLE NAMES
        local regressors
        local must_have 30    // MODIFY TO ACTUAL MINIMUM ACCEPTABLE NON-MISSING VALUES
        foreach c of local candidates {
            // COUNT NON-MISSING VALUES OF THIS CANDIDATE; -quietly- SUPPRESSES THE PRINTED COUNT
            quietly count if !missing(`c')
            if `r(N)' >= `must_have' {
                local regressors `regressors' `c'
            }
        }
        regress dep_var `regressors'    // MODIFY USING ACTUAL NAME OF DEP. VAR.
        //  ALSO, PUT IN ANY OPTIONS, CONSTRAINTS, OR OTHER THINGS NEEDED FROM ORIGINAL VERSION
        
        foreach c of local regressors {
            gen b_`c' = _b[`c']
            gen se_`c' = _se[`c']
        }
        exit
    end
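
    To see the selection step in isolation, here is a minimal sketch that applies the same counting loop outside of -rangerun-, using the auto dataset that ships with Stata. The choice of price as the outcome, the three candidate variables, and the threshold of 70 are assumptions made purely for illustration; they are not from the original question. In that dataset rep78 has 69 non-missing values out of 74 observations, so it falls below the threshold and is left out.

    Code:
    sysuse auto, clear
    local candidates rep78 weight length    // ILLUSTRATIVE CANDIDATES ONLY
    local must_have 70                      // ILLUSTRATIVE THRESHOLD ONLY
    local regressors
    foreach c of local candidates {
        quietly count if !missing(`c')
        if `r(N)' >= `must_have' {
            local regressors `regressors' `c'
        }
    }
    display "Regressors kept: `regressors'"    // rep78 IS DROPPED, weight AND length REMAIN
    regress price `regressors'

    If no candidate meets the threshold, the local regressors is empty and -regress- fits a constant-only model, so you may want to decide how such windows should be handled.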


    • #3
      Thank you Clyde Schechter !
