
  • Xtset and string variables

    Hi Everyone,

I am trying to set my data as a panel using the xtset command. I have uasid as the panel identifier and year as the time variable. Because uasid is a string variable, xtset fails when I use it.
I used encode to convert uasid into a numeric variable, id. When I did that, I ran into the problem shown in the attachment: after encoding, uasid and id do not match for each observation. How can I solve this problem?

  • #2
The encode command is designed for assigning numerical codes to non-numeric strings like "France", "Germany", "United States". The output of help encode instructs us:

    Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
    You should use something like
    Code:
    destring uasid, generate(id)
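If destring succeeds, declaring the panel should then work. A minimal follow-up, assuming the time variable is named year as described in #1:
Code:
xtset id year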



    • #3
      William Lisowski's recommendation to use -destring- is correct.

Unfortunately, the advice in the quote from -help encode-, to use generate newvar = real(varname), would lead to incorrect results in this case. That is because -generate-, by default, creates variables with a float storage type, and a float is not large enough to hold 9 decimal digits of precision, so the results would be wrong (including mapping some distinct uasid values to the same id value). If you were to use -generate ... real()- for this, you would have to explicitly make the new variable a long or double to get correct results.
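To illustrate, a minimal sketch (variable names other than uasid are just for illustration); a float significand holds only about 7 decimal digits, so distinct 9-digit identifiers can round to the same value:
Code:
* default storage type is float, which cannot hold 9 digits exactly,
* so distinct uasid values can collapse into the same number:
generate id_bad = real(uasid)

* declaring the storage type explicitly preserves all 9 digits:
generate long id = real(uasid)
* or: generate double id = real(uasid)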



      • #4
        Originally posted by Yaseminn Akcann View Post
I used encode to convert uasid into a numeric variable, id. . . . After encoding, uasid and id do not match for each observation.
        How did you manage to do that?

I agree with William that destring is the best option in your case, and with Clyde that there's a bug in the documentation that StataCorp needs to fix.

        But I can't see how you got those mismatches with encode.

Code:
. version 17.0

. clear *

. input str9 uasid

         uasid
  1. 140100007
  2. 140100007
  3. 140100007
  4. 140100010
  5. 140100041
  6. 140100011
  7. 140100011
  8. 140100047
  9. 140100035
 10. 140100035
 11. 140100072
 12. 140100041
 13. 140100038
 14. 140100079
 15. 140100047
 16. 140100041
 17. 140100081
 18. 140100048
 19. 140100047
 20. 140100090
 21. 140100072
 22. 140100048
 23. 140100108
 24. 140100079
 25. end

. encode uasid, generate(id) label(UASIDs)

. preserve

. sort uasid

. list uasid id, noobs sepby(uasid)

  +-----------------------+
  |     uasid          id |
  |-----------------------|
  | 140100007   140100007 |
  | 140100007   140100007 |
  | 140100007   140100007 |
  |-----------------------|
  | 140100010   140100010 |
  |-----------------------|
  | 140100011   140100011 |
  | 140100011   140100011 |
  |-----------------------|
  | 140100035   140100035 |
  | 140100035   140100035 |
  |-----------------------|
  | 140100038   140100038 |
  |-----------------------|
  | 140100041   140100041 |
  | 140100041   140100041 |
  | 140100041   140100041 |
  |-----------------------|
  | 140100047   140100047 |
  | 140100047   140100047 |
  | 140100047   140100047 |
  |-----------------------|
  | 140100048   140100048 |
  | 140100048   140100048 |
  |-----------------------|
  | 140100072   140100072 |
  | 140100072   140100072 |
  |-----------------------|
  | 140100079   140100079 |
  | 140100079   140100079 |
  |-----------------------|
  | 140100081   140100081 |
  |-----------------------|
  | 140100090   140100090 |
  |-----------------------|
  | 140100108   140100108 |
  +-----------------------+

. restore

. decode id, generate(backid)

. assert uasid == backid

. exit

end of do-file


        For my own edification, could you show what you typed that got you there?



        • #5
          Thank you all for the answers.

When I use the destring command as William recommended, I still cannot set my data as a panel with xtset: Stata gives me the "repeated time values within panel" error. When I check for duplicates and drop them after that message, half of my observations get deleted.


Joseph, I used the exact same code, encode uasid, generate(id), to convert the values, but it ended up producing these results. I don't understand why neither way works.



          • #6
Stata gives me the "repeated time values within panel" error. When I check for duplicates and drop them after that message, half of my observations get deleted.
Well, is that a problem or isn't it? If your data set contained duplicates of every observation (so that half the data disappeared when you dropped duplicates), you have not actually lost any information. The real problem is why those duplicates were there in the first place. That usually reflects an error in the data management that created the data set. Seldom does anybody intentionally create a data set with any duplicate observations, let alone large numbers of them. Removing those duplicates probably leaves you with the data set you initially intended to create.
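Before dropping anything, it helps to see what kind of duplicates you have. A minimal sketch, assuming id and year are your panel and time variables as elsewhere in this thread:
Code:
* how many observations are complete-record duplicates?
duplicates report

* how many observations merely repeat the panel identifiers?
duplicates report id year

* drop only observations that are identical on every variable
duplicates drop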

All of that holds unless you did something like -duplicates drop id timevar, force-, in which case you may have had observations that were not pure duplicates but merely agreed on id and the time variable while disagreeing on other variables. In that case, you need to review the original data set carefully and get a clear understanding of what is going on. There are two possibilities:

1. The observations that agree on id and timevar but otherwise disagree on some variables are all supposed to be there. They are all correct data. In this case, it is inappropriate to -xtset id timevar- because you do not have panel data. It may be that id in combination with some other variable(s) will uniquely identify observations in conjunction with the time variable--in which case creating a variable that combines those variables will serve instead of id in -xtset-. (Look at -egen, group- to create such a variable; a sketch follows this list.) If that is not the case, you can still -xtset id- without mentioning a time variable. You will still be able to use the -xt- commands for analyses. All you will lose is the ability to use time series operators (leads, lags, etc.) or do analyses with autoregressive structure.

            2. The observations that agree on id and timevar but otherwise disagree on some variables are not all supposed to be there. So you have a bad data set and you need to eliminate the surplus observations that are incorrect. Or perhaps the "correct" observations are combinations of the surplus ones. But even if you can simply handle that, the fact that you ended up with a bad data set suggests that there was something wrong with the data management that created it. So you should go back and review that from beginning to end and find out where things went wrong. In the course of doing that, there is a reasonable chance you will also uncover other errors. Best to get it all fixed now before it bites you later.
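For possibility 1, a minimal sketch of the -egen, group()- approach; member here is a hypothetical stand-in for whatever additional variable(s) distinguish your observations:
Code:
* combine id with another variable into a single panel identifier
* (member is a hypothetical example variable)
egen panel_id = group(id member)
xtset panel_id year

* or, if no such combination exists, declare the panel without a time variable:
xtset id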



            • #7
              Originally posted by Clyde Schechter View Post

              Well, is that a problem or isn't it?
It was a problem, since I was sure there were no duplicates in my data overall, and I had this problem after encoding. After reading the comments and seeing that the encoding had not worked out properly, I went back through every data set that I had appended or merged and figured out the problems. Everything seems fine now. Thanks for all the help!
