
  • #16
    Read The Fine Manual, Stephen Ch. Scroll down to Estimation commands in the documentation to find a list of supported commands.
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)



    • #17
      One comment to the solutions presented in #4 and #6:

      You are assuming that there are no missing values of x1, x2, x3, x4, and x5 (the same applies to y1 and y2 if you want to compare the models for y1 and y2 based on the same sample). However, if there are missing values, the five (or ten) regression models may use different subsets of your data. To avoid this issue, I would suggest first generating a marker variable indicating cases with no missing values on all x variables (and, if you want, on all y variables):
      Code:
      mark valid
      markout valid x1 x2 x3 x4 x5
      foreach y in y1 y2 {
         local indvars
         foreach var in x1 x2 x3 x4 x5 {
            local indvars `indvars' `var'
            display "Model: `y' `indvars'"
            reg `y' `indvars' if valid
         }
      }
      or
      Code:
      mark valid
      markout valid y1 y2 x1 x2 x3 x4 x5
      foreach y in y1 y2 {
         local indvars
         foreach var in x1 x2 x3 x4 x5 {
            local indvars `indvars' `var'
            display "Model: `y' `indvars'"
            reg `y' `indvars' if valid
         }
      }
      nestreg does not have this problem (although you should take care of missing values in y1 and y2).
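
      As a minimal sketch of the nestreg alternative, assuming the same variable names as above: each parenthesized block adds one predictor, and nestreg fits every nested model on the observations that are complete for the full model. Since y1 and y2 are fitted in separate calls, their samples can still differ unless both are included in markout.
      Code:
      * one block per predictor; the models are x1, then x1 x2, and so on
      nestreg: regress y1 (x1) (x2) (x3) (x4) (x5)
      nestreg: regress y2 (x1) (x2) (x3) (x4) (x5)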
      Last edited by Dirk Enzmann; 25 Feb 2024, 09:03. Reason: corrected to "nestreg" (instead of "nextreg")



      • #18
        Hi Dirk,

        This is very helpful.

        In fact, I ran into the following error, which terminates the loop:

        -insufficient observations-

        Is there a way to skip just that particular regression specification when this happens, so that the loop can continue with the remaining specifications?



        • #19
          There should be a substantive (or theoretical) reason why you want to include a predictor in, or exclude it from, your regression model. Your question seems to indicate that you are on an exploratory fishing tour. There may be serious issues involved, see (among others): https://www.stata.com/support/faqs/s...sion-problems/. I am sure that you can solve the technical issue of excluding variables from your variable list.



          • #20
            It's prudent to

            Code:
            count if valid
            before you try any regressions.

            The implication of the error message seems to be that you have too few observations to do all of these regressions -- in which case you have perhaps too few observations for any of them to be worthwhile.



            • #21
              Nick Cox and Dirk Enzmann : all valid points

              I am running regressions in foreach loops with an if condition that restricts the sample by hospital type: private practice, university level, medium size clinic, etc.

              What is happening: the sample consists of 75% medium size consortium clinics, 15% small private practices, and 10% university practices.

              For the smaller groups, it might be the case that some regressors and regressands have too few observations. I grant this and agree with you all.

              However, I still want to run through all loop iterations, because subsequent specifications have sufficient observations and a strong theoretical foundation for running the regressions.

              In this case, how can I prevent the loop from terminating due to insufficient observation errors?

              Thanks.



              • #22
                A question about -mark valid-:

                Is this essentially the same as, for example,

                -reg y x if x!=. & y!=.-

                thanks



                • #23
                  a question about -mark valid-

                  is this essentially same as for example

                  -reg y x if x!=. & y!=.-

                  No; they are not at all the same, even "essentially" or beyond the fact that one is a regression command and the other isn't.

                  The spirit is quite different as well as the letter.

                  Excluding missing values on the variables mentioned in a regress command does no harm but is redundant as any observations with missing values will be ignored anyway. Also, what is going on in other variables is irrelevant unless they are mentioned in an if qualifier.

                  The point of the machinery of mark and related commands is to mark in only those observations with no missing values on all the variables mentioned and thus to mark out observations with any missing values on the variables mentioned. This has point for any exercise comparing models where usually you should want to ensure that models are fitted to the same subset of observations (the entire set if there are no problems with missing values). Dirk Enzmann explained this clearly, concisely and correctly.
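
                  To make the distinction concrete, here is a minimal sketch, assuming y, x1 and x2 are numeric variables. The marker is defined once over every variable that appears in any of the models to be compared, and for numeric variables it matches a single missing() test over the same list. The name valid2 is only illustrative.
                  Code:
                  mark valid
                  markout valid y x1 x2
                  * equivalent marker for numeric variables
                  generate byte valid2 = !missing(y, x1, x2)
                  assert valid == valid2
                  * both models now use the same observations, even though
                  * the first one does not mention x2
                  regress y x1 if valid
                  regress y x1 x2 if valid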
                  Last edited by Nick Cox; 25 Feb 2024, 12:22.



                  • #24
                    As Nick Cox already said: If you have too few observations to include all possible predictors in your model, you have perhaps too few observations for any model to be worthwhile.

                    Because it is likely that the number of missing values is different for each of your predictor variables, I don't see a useful way to automate the decision which predictors to include in your regression model. Therefore, I would precede the foreach loop for your regression models with
                    Code:
                    local predvars x1 x2 x3 x4 x5 x6 x7 x8 x9  // experiment with the list of predictor variables here
                    local n_preds : word count `predvars'
                    
                    mark valid
                    markout valid `predvars'
                    count if valid
                    if r(N) < `n_preds' + 1 {
                       di as err "N = `r(N)' are too few cases for `n_preds' predictors: `predvars'"
                       drop valid
                       exit
                    }
                    and experiment with the list of predictor variables by excluding or including specific variables to see the number of valid cases (listwise). Note that the absolute minimum number of cases necessary for a regression model with all predictors is the number of predictors + 1.

                    I want to point out that I would not want to interpret the results with so few cases (especially if you don't have a very good theoretical reason for the list of predictors to be included or excluded).
                    Last edited by Dirk Enzmann; 25 Feb 2024, 12:23.



                    • #25
                      Thanks to both Nick Cox and Dirk Enzmann !

                      Dirk, it turns out that when I run your diagnostic code, my number of observations halves.

                      For an apples-to-apples comparison, this is a basic and good practice to have in my pocket.

                      Thank you once again!




                      • #26
                        Code:
                        help capture
                        Still a bad idea!
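
                        For reference, a minimal sketch of what capture allows, assuming a hypothetical numeric variable hosptype coding the hospital type and a marker variable valid created beforehand with mark and markout. Return codes 2000 (no observations) and 2001 (insufficient observations) are skipped with a note; any other error still stops the loop. The cautions in this thread about comparing models fitted to different samples apply unchanged.
                        Code:
                        levelsof hosptype, local(types)
                        foreach t of local types {
                           * noisily shows the output; capture keeps an error from ending the loop
                           capture noisily regress y1 x1 x2 x3 if valid & hosptype == `t'
                           if inlist(_rc, 2000, 2001) {
                              display as text "hosptype `t' skipped: too few observations"
                              continue
                           }
                           else if _rc != 0 {
                              * reissue any other error
                              error _rc
                           }
                        }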



                        • #27
                          how do you mean, Nick?

                          Originally posted by Nick Cox View Post
                          Code:
                          help capture
                          Still a bad idea!



                          • #28
                            I sign up to two simple ideas:

                            (1) comparing regressions with different variables only makes full sense if you are using the same observations.

                            (2) regressions should have many more observations than predictors.

                            It seems that you want to acknowledge those principles and then ignore them.



                            • #29
                              I am following Dirk's code to run the loop! This resolves the two simple ideas you mentioned!

                              Thanks.

                              Originally posted by Nick Cox View Post
                              I sign up to two simple ideas:

                              (1) comparing regressions with different variables only makes full sense if you are using the same observations.

                              (2) regressions should have many more observations than predictors.

                              It seems that you want to acknowledge those principles and then ignore them.



                              • #30
                                #28 was an answer to this question in #21:

                                "how can I prevent the loop from terminating due to insufficient observation errors?"
