looping regressions

Bruce Weaver

Join Date: May 2014

Posts: 1109
#16

23 Feb 2024, 14:14

Read The Fine Manual, Stephen Ch. Scroll down to Estimation commands in the documentation to find a list of supported commands.

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)
Dirk Enzmann

Join Date: Apr 2014

Posts: 516
#17

25 Feb 2024, 08:57

One comment on the solutions presented in #4 and #6:

You are assuming that there are no missing values of x1, x2, x3, x4, and x5 (the same applies to y1 and y2 if you want to compare the models for y1 and y2 based on the same sample data). However, if there are missing values, the five (or ten) regression models may use different subsets of your data. To avoid this issue I would suggest first generating a marker variable indicating cases with no missing values for all x variables (and, if you want, for all y variables):

Code:

mark valid
markout valid x1 x2 x3 x4 x5
foreach y in y1 y2 {
    local indvars
    foreach var in x1 x2 x3 x4 x5 {
        local indvars `indvars' `var'
        display "Model: `y' `indvars'"
        reg `y' `indvars' if valid
    }
}

or

Code:

mark valid
markout valid y1 y2 x1 x2 x3 x4 x5
foreach y in y1 y2 {
    local indvars
    foreach var in x1 x2 x3 x4 x5 {
        local indvars `indvars' `var'
        display "Model: `y' `indvars'"
        reg `y' `indvars' if valid
    }
}

nestreg does not have this problem (although you should take care of missing values in y1 and y2).
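For comparison, here is a minimal sketch of the nestreg route. It uses the auto dataset shipped with Stata as a stand-in: price plays the role of y1, and mpg, weight, and rep78 play the roles of the x variables. That pairing is an assumption made purely for illustration and is not taken from the original question.

```stata
sysuse auto, clear

* rep78 is missing for 5 of the 74 cars.
* nestreg adds one parenthesised block of predictors at a time
* and reports a test for each added block.
nestreg: regress price (mpg) (weight) (rep78)
```

The output for each block reports its number of observations, which makes it easy to confirm that all blocks were fitted to the same cases.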

Last edited by Dirk Enzmann; 25 Feb 2024, 09:03. Reason: corrected to "nestreg" (instead of "nextreg")
Stephen Ch

Join Date: Apr 2022

Posts: 67
#18

25 Feb 2024, 09:09

Hi Dirk,

This is very helpful.

In fact, I ran into the following error, which terminates the loop:

-insufficient observations-

If this happens, is there a way to skip just that particular regression specification, so that the loop continues with the remaining specifications?

Dirk Enzmann

Join Date: Apr 2014

Posts: 516
#19

25 Feb 2024, 09:19

There should be a substantive (or theoretical) reason why you want to include a predictor in, or exclude it from, your regression model. Your question seems to indicate that you are on an exploratory fishing expedition. There may be serious issues involved; see (among others): https://www.stata.com/support/faqs/s...sion-problems/. I am sure that you can solve the technical issue of excluding variables from your variable list.
Nick Cox

Join Date: Mar 2014

Posts: 35212
#20

25 Feb 2024, 09:22

It's prudent to

Code:

count if valid

before you try any regressions.

The implication of the error message seems to be that you have too few observations to do all of these regressions -- in which case you have perhaps too few observations for any of them to be worthwhile.
Stephen Ch

Join Date: Apr 2022

Posts: 67
#21

25 Feb 2024, 10:59

Nick Cox and Dirk Enzmann: all valid points.

I am running regressions in foreach loops with an if condition that restricts the sample by hospital type: private practice, university level, medium-size clinic, etc.

What is happening: the sample consists of 75% medium-size consortium clinics, 15% small private practices, and 10% university practices.

For the smaller groups, it may be that some regressors and regressands have too few observations. I grant this and agree with you all.

However, I still want to run through all loop iterations, because subsequent specifications have sufficient observations and a strong theoretical foundation.

In this case, how can I prevent the loop from terminating due to insufficient-observations errors?

Thanks.
Stephen Ch

Join Date: Apr 2022

Posts: 67
#22

25 Feb 2024, 11:07

A question about -mark valid-:

Is this essentially the same as, for example,

-reg y x if x != . & y != .-

Thanks.

Nick Cox

Join Date: Mar 2014

Posts: 35212
#23

25 Feb 2024, 11:49

A question about -mark valid-:

Is this essentially the same as, for example,

-reg y x if x != . & y != .-

No; they are not at all the same, not even "essentially", quite apart from the fact that one is a regression command and the other isn't.

The spirit is quite different as well as the letter.

Excluding missing values on the variables mentioned in a regress command does no harm but is redundant as any observations with missing values will be ignored anyway. Also, what is going on in other variables is irrelevant unless they are mentioned in an if qualifier.

The point of the machinery of mark and related commands is to mark in only those observations with no missing values on all the variables mentioned, and thus to mark out observations with any missing values on those variables. This matters for any exercise comparing models, where usually you should want to ensure that models are fitted to the same subset of observations (the entire set if there are no problems with missing values). Dirk Enzmann explained this clearly, concisely and correctly.
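The difference can be seen in a small sketch, again using the auto dataset shipped with Stata as a stand-in (price, mpg, and rep78 are illustrative choices, not the variables from the original question). rep78 is missing for 5 of the 74 cars, so the two models use different samples unless a marker variable is applied.

```stata
sysuse auto, clear

* Without a marker, each model silently uses its own complete cases
quietly regress price mpg
display "N, model without rep78: " e(N)    // 74
quietly regress price mpg rep78
display "N, model with rep78:    " e(N)    // 69

* mark creates a 0/1 variable; markout sets it to 0 wherever
* any of the listed variables is missing
mark valid
markout valid price mpg rep78
count if valid

* With the marker, both models are fitted to the same 69 observations
quietly regress price mpg if valid
display "N, model 1 if valid: " e(N)
quietly regress price mpg rep78 if valid
display "N, model 2 if valid: " e(N)
```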

Last edited by Nick Cox; 25 Feb 2024, 12:22.
Dirk Enzmann

Join Date: Apr 2014

Posts: 516
#24

25 Feb 2024, 12:17

As Nick Cox already said: If you have too few observations to include all possible predictors in your model, you have perhaps too few observations for any model to be worthwhile.

Because it is likely that the number of missing values is different for each of your predictor variables, I don't see a useful way to automate the decision about which predictors to include in your regression model. Therefore, I would precede the foreach loop for your regression models with

Code:

local predvars x1 x2 x3 x4 x5 x6 x7 x8 x9   // experiment with the list of predictor variables here
local n_preds : word count `predvars'
mark valid
markout valid `predvars'
count if valid
if r(N) < `n_preds' + 1 {
    di as err "N = `r(N)' are too few cases for `n_preds' predictors: `predvars'"
    drop valid
    exit
}

and experiment with the list of predictor variables by excluding or including specific variables to see the number of valid cases (listwise). Note that the absolute minimum number of cases necessary for a regression model with all predictors is the number of predictors + 1.

I want to point out that I would not want to interpret the results with so few cases (especially if you don't have a very good theoretical reason for the list of predictors to be included or excluded).
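The same check can be adapted to a loop over subgroups, as in the hospital-type setting described in #21. The following is only a sketch under stated assumptions: it uses the auto dataset shipped with Stata, with foreign standing in for hospital type and price, mpg, weight, and rep78 standing in for the outcome and predictors. It uses continue rather than exit, so only the offending group is skipped. The threshold of predictors + 1 is the absolute minimum mentioned above, not a recommended sample size.

```stata
sysuse auto, clear

local predvars mpg weight rep78
local n_preds : word count `predvars'

* One marker for the complete cases on the outcome and all predictors
mark valid
markout valid price `predvars'

levelsof foreign, local(groups)
foreach g of local groups {
    quietly count if valid & foreign == `g'
    local n = r(N)
    if `n' < `n_preds' + 1 {
        display as error "foreign == `g': N = `n' is too few for `n_preds' predictors, skipped"
        continue
    }
    display "foreign == `g': N = `n'"
    regress price `predvars' if valid & foreign == `g'
}
```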

Last edited by Dirk Enzmann; 25 Feb 2024, 12:23.
Stephen Ch

Join Date: Apr 2022

Posts: 67
#25

25 Feb 2024, 13:15

Thanks to both Nick Cox and Dirk Enzmann !

Dirk, it turns out that when I run your diagnostic code, my number of observations halves.

For an apples-to-apples comparison, this is a good basic practice to keep in my pocket.

Thank you once again!

Nick Cox

Join Date: Mar 2014

Posts: 35212
#26

25 Feb 2024, 13:40

Code:

help capture

Still a bad idea!
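For readers who look up the help file, this is roughly the pattern it points to, shown as a sketch only and subject to the caveat above: capture suppresses the error and lets the loop continue, but it does nothing about the underlying lack of data. The example assumes the auto dataset shipped with Stata; the variables and the loop over rep78 values are illustrative. rep78 takes only the values 1 to 5, so the condition rep78 == 6 selects no observations and makes regress fail.

```stata
sysuse auto, clear

local n_failed 0
foreach y in price mpg {
    forvalues g = 1/6 {
        * capture traps any error from regress and stores the code in _rc
        capture regress `y' weight length if rep78 == `g'
        local rc = _rc
        if `rc' {
            display as error "skipped: `y', rep78 == `g' (return code `rc')"
            local ++n_failed
            continue
        }
        display "fitted:  `y', rep78 == `g', N = " e(N)
    }
}
display "`n_failed' specification(s) skipped"
```

Using capture noisily instead of capture would also show the error message and any output while still letting the loop run on.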
Stephen Ch

Join Date: Apr 2022

Posts: 67
#27

25 Feb 2024, 15:08

How do you mean, Nick?

Nick Cox

Join Date: Mar 2014

Posts: 35212
#28

25 Feb 2024, 15:12

I sign up to two simple ideas:

(1) comparing regressions with different variables only makes full sense if you are using the same observations.

(2) regressions should have many more observations than predictors.

It seems that you want to acknowledge those principles and then ignore them.
Stephen Ch

Join Date: Apr 2022

Posts: 67
#29

25 Feb 2024, 15:27

I am following Dirk's code to run the loop! This addresses the two simple ideas you mentioned!

Thanks.

Nick Cox

Join Date: Mar 2014

Posts: 35212
#30

25 Feb 2024, 15:43

#28 was an answer to the question in #21:

"how can I prevent the loop from terminating due to insufficient-observations errors?"