Split string variable

Tharshini Thangavelu

Join Date: Oct 2015

Posts: 78
#1


07 Apr 2022, 07:39

Hi Stata users,

I have the following dataset and would like to split the variable Col1 into 8 variables. I have tried without success. I am not sure how to handle the fact that the variable names are on the second row. Could anyone assist me with this?

Code:

input strL Col1 str12 Col2
`"LOPNR" "PERIOD" "SKAT" "AVORS" "INSKRIVNINGSPERIOD" "FUNKHINDER" "FUNKHINDER2" "FUNKHINDER3"' ""
"168388 2015-06 39 8 42 61 " ""
"88124 2015-06 70 7 42 " ""
"195489 2015-06 11 6 " ""
"159940 2015-06 71 10 " ""
end

Thanks !
Best,
Tharshini
Nick Cox

Join Date: Mar 2014

Posts: 35224
#2

07 Apr 2022, 08:14

You have metadata in observation 1, which isn't illegal but not usually what you really want. But in your data example that is the only observation with a non-missing value for word 8 of Col1, as shown by looking at the results of

Code:

split Col1
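
A minimal sketch of where this could lead, assuming the data example from #1 is in memory, the separators are ordinary spaces, and values fill the columns from the left (so a row with six values belongs to the first six names). The stub `part` is only an illustrative choice and does not come from the thread.

```stata
* split on spaces into part1 ... part8
* (split trims leading and trailing spaces by default)
split Col1, generate(part)

* observation 1 holds the quoted column names:
* strip the double quotes, char(34), and use the result as the variable name
forvalues j = 1/8 {
    local name = subinstr(part`j'[1], char(34), "", .)
    rename part`j' `name'
}

* the header row is no longer needed as data
drop in 1

* PERIOD (e.g. 2015-06) is left as a string; the other columns become numeric
destring LOPNR SKAT AVORS INSKRIVNINGSPERIOD FUNKHINDER*, replace
```

If missing values can occur in the middle of a row rather than only at the end, positional splitting like this would put values under the wrong names, so that assumption is worth checking against the real data first.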
Daniel Feenberg

Join Date: Oct 2014

Posts: 321
#3

07 Apr 2022, 08:15

This is confusing, but it appears you have a spreadsheet with 4 to 7 values in each row, but 8 column (variable) labels. Also, the column labels row has wrapped. Also, the 4-7 valid values are wrapped in a single pair of quotes, making them a single long string. So this isn't something Stata will read without some prior editing. You could do a global delete on the quote marks, put dots on the missing value cells, and undo the wrap of the column labels. Then the Stata -insheet- command could read it.
Tharshini Thangavelu

Join Date: Oct 2015

Posts: 78
#4

07 Apr 2022, 09:34

Originally posted by Daniel Feenberg

This is confusing, but it appears you have a spreadsheet with 4 to 7 values in each row, but 8 column (variable) labels. Also, the column labels row has wrapped. Also, the 4-7 valid values are wrapped in a single pair of quotes, making them a single long string. So this isn't something Stata will read without some prior editing. You could do a global delete on the quote marks, put dots on the missing value cells, and undo the wrap of the column labels. Then the Stata -insheet- command could read it.

Does that mean I should export the Stata file to Excel, make the suggested changes, and then use Stata's insheet command?
Tharshini Thangavelu

Join Date: Oct 2015

Posts: 78
#5

07 Apr 2022, 09:46

Originally posted by Nick Cox

You have metadata in observation 1, which isn't illegal but not usually what you really want. But in your data example that is the only observation with a non-missing value for word 8 of Col1, as shown by looking at the results of

Code:

split Col1

The suggested code just created a copy of Col1; all the other problems remain. So the code didn't solve the problem.
Nick Cox

Join Date: Mar 2014

Posts: 35224
#6

07 Apr 2022, 10:27

Replying to #5.

I copied the data example presented as code in #1 before testing split.

If the information in #1 doesn't represent your real data then we need to know what does represent your real data.

One arcane possibility is that the spaces are really some high ASCII character such as uchar(160).

You can test your data for exotic characters by using chartab from SSC. Or try split with that character as separator.
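
A sketch of both checks, assuming the suspect character really is the non-breaking space uchar(160). That is only one possibility, so the code point should be adjusted to whatever chartab reports. The stub `part` is an illustrative name, not something from the thread.

```stata
* tabulate the characters occurring in Col1
* (chartab is user-written; install it once from SSC)
ssc install chartab
chartab Col1

* count observations that contain a non-breaking space (Unicode code point 160)
count if ustrpos(Col1, uchar(160)) > 0

* replace every non-breaking space with an ordinary space, then split as usual
replace Col1 = usubinstr(Col1, uchar(160), " ", .)
split Col1, generate(part)

* alternative: leave Col1 untouched and split on the character itself
* local nbsp = uchar(160)
* split Col1, parse("`nbsp'") generate(part)
```

If the count is zero, the separators are something else, and the chartab listing should show which code points are present.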