Broken loops

Jakub Cirkowski

Join Date: Jul 2018

Posts: 16
#1


19 Jul 2018, 15:46

Hi,

I am a new Stata user and I have a question about my loops, which unfortunately don't work. I use Stata 15.

I am working with panel data and want to run regressions for small and large companies in each year.

In the first step I generate labels for small, medium and large companies, and that works:

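gen small = .
gen large = .
gen size = .

forvalues i = 1962(1)2017 {
    summ mvaluemill if pyear==`i', detail
    di `i'
    * 25th and 75th percentiles of market value in this year
    replace small = r(p25) if pyear==`i'
    replace large = r(p75) if pyear==`i'
    * 0 = small, 1 = medium, 2 = large; values equal to a cut-off go to the upper class,
    * and missing mvaluemill stays missing (Stata treats missing as larger than any number)
    replace size = 0 if mvaluemill < small & pyear==`i'
    replace size = 1 if mvaluemill >= small & mvaluemill < large & pyear==`i'
    replace size = 2 if mvaluemill >= large & !missing(mvaluemill) & pyear==`i'
}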

However, in the second step, when I try to run regressions for small companies in each year using the following loop:

forvalues i = 1962(1)2017 {
    reg prc atpr capxpr ceqpr chpr cogspr dvcpr intanpr ibpr oancfpr revtpr spipr xadpr xrdpr xsgapr revgrowpr ocipr if pyear==`i' & size==0
}

Stata responds:

no observations
r(2000);

Any help or comments would be much appreciated. Sorry if I missed any important information.

Thanks!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

19 Jul 2018, 16:04

Welcome to Statalist.

It is possible that in some year, every observation with size==0 has a missing value for one or more of the other variables - not necessarily the same variable(s) in each observation. Perhaps something like

Code:

forvalues i = 1962(1)2017 {
    display "`i'"
    count if pyear==`i' & size==0 & !missing(prc,atpr,capxpr,ceqpr,chpr,cogspr,dvcpr,intanpr,ibpr,oancfpr,revtpr,spipr,xadpr,xrdpr,xsgapr,revgrowpr,ocipr)
}

will help you locate the problem.
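A related trick, not part of William's suggestion: wrapping -regress- in -capture noisily- lets the loop report the empty years and carry on instead of stopping at r(2000). Below is a minimal sketch on made-up toy data. The predictor x and the deliberately empty year 1963 are hypothetical stand-ins for the real variables.

```stata
* Hypothetical toy data: three years, one predictor x, and x entirely missing in 1963
clear
set seed 12345
set obs 300
generate pyear = 1962 + mod(_n, 3)
generate size  = runiformint(0, 2)
generate x     = rnormal()
generate prc   = 1 + 2*x + rnormal()
replace x = . if pyear == 1963

forvalues i = 1962/1964 {
    display as text _n "year `i'"
    * capture noisily shows the usual output but traps an error instead of stopping
    capture noisily regress prc x if pyear == `i' & size == 0
    if inlist(_rc, 2000, 2001) {
        * 2000 = no observations, 2001 = insufficient observations
        display as error "too few complete observations in `i'; skipped"
    }
    else if _rc {
        * any other error is unexpected, so stop as usual
        exit _rc
    }
    else {
        display as text "estimation sample: " e(N)
    }
}
```

Treating r(2001) the same way is a judgment call: a year with only a handful of complete small companies can trigger it instead of r(2000).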
Nick Cox

Join Date: Mar 2014

Posts: 35405
#3

19 Jul 2018, 16:25

William is right. Note in passing that your first block of code could be rewritten

Code:

egen p25 = pctile(mvaluemill), by(pyear) p(25)
egen p75 = pctile(mvaluemill), by(pyear) p(75)
gen size = cond(mvaluemill < p25, 0, cond(mvaluemill < p75, 1, 2)) if !missing(mvaluemill)  // without the if, missing mvaluemill would land in class 2

I'd use rangestat (SSC) for the regression part.
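Here is a sketch of what the rangestat route could look like, using toy data with two hypothetical predictors, x1 and x2. rangestat is community-contributed (install it with ssc install rangestat). The result variable names used here (reg_nobs, b_x1, b_x2, b_cons) are as I recall them from its help file, so check -help rangestat- before relying on them. As I understand it, a cell with too few complete observations gets missing results rather than stopping the run.

```stata
* requires the community-contributed rangestat: ssc install rangestat
* Hypothetical toy data: three years, two predictors
clear
set seed 2018
set obs 600
generate pyear = 1962 + mod(_n, 3)
generate size  = runiformint(0, 2)
generate x1    = rnormal()
generate x2    = rnormal()
generate prc   = 1 + 2*x1 - x2 + rnormal()

* interval(pyear 0 0): the window for each observation is its own year;
* by(size): separate regressions for each size class within that year
rangestat (reg) prc x1 x2, interval(pyear 0 0) by(size)

* results are repeated on every observation in a cell, so list one row per cell
egen tag = tag(pyear size)
list pyear size reg_nobs b_x1 b_x2 b_cons if tag, noobs
```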
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#4

19 Jul 2018, 17:08

Another possible source of the error message is if one or more of the variables listed in the -reg- command is stored as a string. -des prc atpr capxpr ceqpr chpr cogspr dvcpr intanpr ibpr oancfpr revtpr spipr xadpr xrdpr xsgapr revgrowpr ocipr- will reveal whether you have this problem or not.

Added: If this is the problem you have, then it will show up on the very first iteration of the -reg- loop. So if you have gotten regression output for some rounds of the loop, but then at some point you get the "no observations" message, then the problem must be the one pointed out by William.
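This is a minimal sketch of Clyde's check, followed by a conversion. It runs on a tiny made-up dataset in which revtpr has been read in as text. -ds, has(type string)- picks out the string variables, and -destring, force- turns anything non-numeric into missing. On real data, look at the offending values before using force.

```stata
* Hypothetical toy data: revtpr was imported as text
clear
input float prc str8 revtpr
1.5 "10.2"
2.0 "11.7"
2.5 "n/a"
end

* which of the regression variables are stored as strings?
ds prc revtpr, has(type string)
local strvars `r(varlist)'
display "string variables: `strvars'"

* convert them to numeric; force turns non-numeric text such as "n/a" into missing
if "`strvars'" != "" {
    destring `strvars', replace force
}
describe prc revtpr
```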

Last edited by Clyde Schechter; 19 Jul 2018, 17:10.
Jakub Cirkowski

Join Date: Jul 2018

Posts: 16
#5

20 Jul 2018, 03:12

Thank you guys for the help!

It looks like William was right and missing values were the problem.

However, I still need to run this regression.

Is there a way to run this regression despite missing values as I can't replace them in some cases?

Thanks!
Nick Cox

Join Date: Mar 2014

Posts: 35405
#6

20 Jul 2018, 04:02

regress will ignore observations with missing values. No way round that except some quite different procedure.

I count 16 predictors here. I wouldn't want to fit such a model with fewer than say 10 times that many observations. (I wouldn't want to fit a model with 16 predictors, but that's more personal.) How many observations do you have for each year at best?

Your quartile binning makes the problem worse, naturally.

https://bmcmedresmethodol.biomedcent...471-2288-12-21 is possible reading here.
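One way to answer Nick's question directly is to count, for each year and size class, the observations that -regress- can actually use. The sketch below uses toy data with two hypothetical predictors. With the real model (prc plus 16 predictors), the same code would give the 10 x 16 = 160 threshold discussed above.

```stata
* Hypothetical toy data: two predictors with scattered missing values
clear
set seed 160
set obs 900
generate pyear = 1962 + mod(_n, 3)
generate size  = runiformint(0, 2)
generate prc   = rnormal()
generate atpr  = cond(runiform() < 0.2, ., rnormal())
generate chpr  = cond(runiform() < 0.3, ., rnormal())

* regress uses an observation only if the outcome and every predictor are non-missing
local model prc atpr chpr
egen nmiss = rowmiss(`model')
generate byte complete = (nmiss == 0)

* complete observations per year and size class
tabulate pyear size if complete

* rule-of-thumb minimum: 10 complete observations per predictor (depvar not counted)
local npred : word count `model'
local npred = `npred' - 1
local minobs = 10 * `npred'
bysort pyear size: egen ncomplete = total(complete)
egen tag = tag(pyear size)
list pyear size ncomplete if tag & ncomplete < `minobs', noobs
```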
Jakub Cirkowski

Join Date: Jul 2018

Posts: 16
#7

20 Jul 2018, 04:19

Number of my observations varies significantly over the years.

113 companies in 1962 (lowest)

6500 companies in 1997 (highest)

In most years I have around 3500 companies

Would it be wise to drop years that have fewer than 160 observations per quartile, given that I have 16 predictors?
Nick Cox

Join Date: Mar 2014

Posts: 35405
#8

20 Jul 2018, 05:31

I can't easily advise on what's best for your project. If it were mine I would incline to omit predictors with a high number of missings unless they were substantively crucial for explanations.

10x is just a rule of thumb. You may find different rules of thumb.
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#9

20 Jul 2018, 10:19

Would it be wise to drop years that have fewer than 160 observations per quartile, given that I have 16 predictors?

That depends on why some years have so much more missing data than others. If the missingness is due to a purely random process, then it would be fine to omit years with fewer than 160 (or some other threshold; I find the 10X rule insufficiently conservative as a matter of personal preference) complete observations. But in the real world, missingness is often the result of processes that are connected to the variables of interest in the study. In that case, dropping those years with scantier data may result in a biased data sample.

My instincts are quite similar to Nick's. I would prefer to drop the predictors where the data is the spottiest, unless they are central to your research questions, and preserve as much of the data sample as possible.
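Nick and Clyde both suggest first identifying the spottiest predictors. -misstable summarize- counts the missing values for each variable. The share missing per year then shows whether a predictor is patchy throughout or only in some periods. The sketch below uses toy data, and its missingness patterns are invented for illustration.

```stata
* Hypothetical toy data: three predictors with very different amounts of missingness
clear
set seed 42
set obs 500
generate pyear = 1962 + mod(_n, 5)
generate atpr  = cond(runiform() < 0.05, ., rnormal())
generate xrdpr = cond(runiform() < 0.60, ., rnormal())
generate ocipr = cond(pyear < 1964, ., rnormal())

* number of missing values for each candidate predictor
misstable summarize atpr xrdpr ocipr

* share of missing values for each predictor, year by year
foreach v of varlist atpr xrdpr ocipr {
    generate byte miss_`v' = missing(`v')
}
tabstat miss_*, by(pyear) statistics(mean) nototal
```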