How to use collapse in my panel data set?

Yao Zhao

Join Date: Feb 2017

Posts: 226
#1

14 Mar 2020, 21:55

I have a panel data set in which every id has many observations. I also created a dummy variable, non, that is supposed to take the same value within each id. In other words, all observations for a given id share a single value of non (0 or 1).

Now I want to know how many ids have non equal to 0 and how many have non equal to 1. I tried the following, but it did not work:

Code:

collapse (min) non, by(id)
distinct(id) if non == 1

The collapse did not give the result I expected: afterwards, every id has non equal to 0.
Why?
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#2

14 Mar 2020, 22:12

Well, first, are you sure that the variable non is correctly constructed so that it is truly always 0 or always 1 within an id? You can check that with

Code:

by id (non), sort: assert non[1] == non[_N]

If that runs without error, then non is truly invariant within id. Otherwise, the problem lies with non, and you will need to revisit how you created it.
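If the assert fails, it stops with an error rather than showing which ids are affected. A minimal sketch of one way to list them, using the same sorting logic; the toy data and the variable name varies are hypothetical, introduced only for illustration:

```stata
* Hypothetical toy panel: id 2 is deliberately inconsistent
clear
input id non
1 0
1 0
2 0
2 1
end

* Within each id, sorted by non, flag groups whose first and last values differ
by id (non), sort: gen byte varies = non[1] != non[_N]

* Show every observation belonging to an id where non is not constant
list id non if varies, sepby(id)
```

With these toy data, only the two observations for id 2 are listed.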

If non is correctly created, then the results you are getting imply that non is in fact zero in every id. Why are you certain that isn't correct? If it really can't be correct, then, again, the problem is with how non was created--there is nothing wrong with the code you show in #1.

As an aside, you don't need to invoke -distinct- in that second command. After -collapse- you will have exactly one observation per id, so -count if non == 1- will be sufficient for the purpose at hand.
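A minimal sketch of that simpler approach, assuming non really is constant within id. The toy data are hypothetical, and tabulate is added here only as an alternative that shows both counts at once:

```stata
* Hypothetical toy panel: ids 1 and 3 have non == 0, id 2 has non == 1
clear
input id non
1 0
1 0
2 1
2 1
3 0
end

* Reduce to one observation per id; min is harmless when non is constant within id
collapse (min) non, by(id)

* Number of ids with non == 1, then with non == 0
count if non == 1
count if non == 0

* Or both counts in a single table
tabulate non
```

With these toy data the two counts are 1 and 2. Note that collapse replaces the data in memory, so wrap it in preserve and restore if the full panel is still needed afterwards.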
Yao Zhao

Join Date: Feb 2017

Posts: 226
#3

14 Mar 2020, 22:26

You're right: non was created incorrectly. I'm interested in your assert code.

I think assert non[1] == non[_N] asserts that the first obs is the same as the last obs within each group, not that every obs is the same within each group?
I also changed it to assert non[1] == non[_n]. Does that give the same outcome?
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#4

14 Mar 2020, 23:59

Yes, assert non[1] == non[_N] asserts that the first obs is the same as the last within each group. BUT, the command also sorts the data by id, and then by non within id. So since the data are sorted by non (within id) the first and last being the same implies that they are all the same.

Yes, assert non[1] == non[_n], or even more simply, assert non == non[1], will produce the same result. But it will take much longer if your data set is large because it requires many more comparisons.
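For illustration, the three forms discussed in this thread can be run side by side on a small example. This is only a sketch on hypothetical toy data in which non is constant within each id, so all three asserts should pass silently:

```stata
* Hypothetical toy data: non is constant within each id
clear
input id non
1 0
1 0
1 0
2 1
2 1
end

* Form from #2: after sorting by non within id, first == last implies all equal
by id (non), sort: assert non[1] == non[_N]

* Form from #3: each observation is compared with the first in its group
by id (non), sort: assert non[1] == non[_n]

* Simplest form: non on its own refers to the current observation
by id (non), sort: assert non == non[1]
```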
Nick Cox

Join Date: Mar 2014

Posts: 35405
#5

15 Mar 2020, 04:12

See also https://www.stata.com/support/faqs/d...ions-in-group/ for discussion.