
  • Split string variable

    Hi Stata users,

    I have the following dataset and would like to split the variable Col1 into 8 variables. I have tried without success. I am not sure how to handle this when the variable names are stored in a row of the data rather than as actual variable names. Could anyone help me with this?




    Code:
    input strL Col1 str12 Col2
    `"LOPNR" "PERIOD" "SKAT" "AVORS" "INSKRIVNINGSPERIOD" "FUNKHINDER" "FUNKHINDER2" "FUNKHINDER3"' ""
    "168388 2015-06 39 8 42 61 " ""
    "88124 2015-06 70 7 42 " ""
    "195489 2015-06 11 6 " ""
    "159940 2015-06 71 10 " ""
    end

    Thanks !
    Best,
    Tharshini

  • #2
    You have metadata in observation 1, which isn't illegal but is not usually what you want. In your data example, that is also the only observation with a non-missing value for word 8 of Col1, as shown by looking at the results of

    Code:
    split Col1 
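
As a rough sketch of how the metadata in observation 1 could then be turned into variable names, assuming the data look exactly like the example in #1 (words separated by ordinary spaces, labels wrapped in double quotes). The stub `v` and the local macro `newname` are illustrative names, not part of the original data, and the loop assumes no other variable in the dataset starts with `v`.

```stata
* Sketch only: assumes Col1 is space-separated as in the data example in #1
split Col1, generate(v)              // creates v1, v2, ... one per word

foreach var of varlist v* {
    * observation 1 holds the label, e.g. "LOPNR" including the double quotes
    local newname = strtoname(subinstr(`var'[1], `"""', "", .))
    quietly replace `var' = "" in 1  // blank out the label before converting
    destring `var', replace          // stays string if values are nonnumeric
    rename `var' `newname'
}

drop in 1                            // remove the metadata row itself
```

Variables such as PERIOD, whose values look like 2015-06, would be left as strings by `destring`. Rows with fewer than 8 words end up with missing values in the later variables.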



    • #3
      This is confusing, but it appears you have a spreadsheet with 4 to 7 values in each row, but 8 column (variable) labels. Also, the column labels row has wrapped. Also, the 4-7 valid values are wrapped in a single pair of quotes, making them a single long string. So this isn't something Stata will read without some prior editing. You could do a global delete on the quote marks, put dots on the missing value cells, and undo the wrap of the column labels. Then the Stata -insheet- command could read it.



      • #4
        Originally posted in #3
        This is confusing, but it appears you have a spreadsheet with 4 to 7 values in each row, but 8 column (variable) labels. Also, the column labels row has wrapped. Also, the 4-7 valid values are wrapped in a single pair of quotes, making them a single long string. So this isn't something Stata will read without some prior editing. You could do a global delete on the quote marks, put dots on the missing value cells, and undo the wrap of the column labels. Then the Stata -insheet- command could read it.
        Does that mean I should export the Stata file to Excel, make the suggested changes there, and then use Stata's insheet command?



        • #5
          Originally posted by Nick Cox
          You have metadata in observation 1, which isn't illegal but not usually what you really want. But in your data example that is the only observation with a non-missing value for word 8 of Col1, as shown by looking at the results of

          Code:
          split Col1 
          The suggested code just created a copy of Col1 and left everything else unchanged, so it didn't solve the problem.



          • #6
            Replying to #5.

            I copied the data example presented as code in #1 before testing split.

            If the information in #1 doesn't represent your real data then we need to know what does represent your real data.

            One arcane possibility is that the spaces are not ordinary spaces but some character beyond plain ASCII, such as uchar(160) (the no-break space).

            You can test your data for exotic characters by using chartab from SSC. Or try split with that character as separator.
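
A minimal sketch of those checks, assuming Stata 14 or later (needed for `uchar()`) and that the suspect character really is uchar(160); if chartab reports a different character, substitute its code. The stub `v` is an illustrative name.

```stata
* Install and run the user-written chartab to tabulate the characters in Col1
ssc install chartab
chartab Col1

* Count observations that contain a no-break space, uchar(160)
count if strpos(Col1, uchar(160)) > 0

* Convert no-break spaces to ordinary spaces, then split on spaces as in #2
replace Col1 = subinstr(Col1, uchar(160), " ", .)
split Col1, generate(v)

* Alternative to the two lines above: split on that character directly
* split Col1, parse(`=uchar(160)') generate(v)
```

If the count is zero and chartab shows only ordinary spaces (decimal code 32) between the values, then the separator is not the issue and the problem lies elsewhere in the real data.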

