  • Concatenating multiple variables into one variable

    Dear Statalist,

    I am struggling with the concat() function of -egen-. I want to join a list of 4 variables into one new variable.

    Here is the example:
    Code:
    clear
    input id var1 var2 var3 var4
    1  . 0.15 . .
    2 1.25 . . .
    3 . . 2.15 .
    4 . . . 0.25
    end
    And the code I used:
    Code:
    egen var5 = concat(var1 var2 var3 var4)
    The result was:
    id   var1   var2   var3   var4   var5
    1    .      .15    .      .      ..15..
    2    1.25   .      .      .      1.25...
    3    .      .      2.15   .      ..2.15.
    4    .      .      .      .25    ....25
    Is there any way I can deal with the missing values so that they don't appear in var5?

    Thank you.

  • #2
    The result variable from the -concat- function of -egen- is a string, so you can do this after creating var5:
    Code:
    replace var5 = subinstr(var5, ".", "", .)


    • #3
      Thank you, Mike, for your suggestion. But will it change the decimal numbers into integers as well?
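
      A quick check, reusing the example data from post #1, suggests the answer is yes in the sense that matters: subinstr() with "." as the target removes every period, decimal points included, so the digits run together (the result is still a string, not a numeric integer). The variable name stripped below is only for illustration and is not part of the thread.

      ```stata
      clear
      input id var1 var2 var3 var4
      1  . 0.15 . .
      2 1.25 . . .
      3 . . 2.15 .
      4 . . . 0.25
      end

      * concat() turns each numeric value into text; missing values become "."
      egen var5 = concat(var1 var2 var3 var4)

      * Remove ALL periods: the "." placeholders and the decimal points alike
      gen stripped = subinstr(var5, ".", "", .)

      * Compare the two strings side by side, e.g. "1.25..." becomes "125"
      list id var5 stripped
      ```

      So this one-liner is only safe when the values themselves contain no decimal point.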


      • #4
        I think this will work:
        Code:
        clear
        input id var1 var2 var3 var4
        1  . 0.15 . .
        2 1.25 . . .
        3 . . 2.15 .
        4 . . . 0.25
        end
        
        forvalues i = 1/4 {
            gen temp`i' = string(var`i')
            replace temp`i' = "" if temp`i' == "."
        }
        egen wanted = concat(temp*), punct(" ")
        drop temp*
        But I also have a question for you: in your example data, in each observation, only one of the four variables has a non-missing value, so you aren't really concatenating anything. If the pattern of only one variable non-missing prevails throughout your data, then what you really want is a new variable that just picks out the non-missing value. If that's the case:
        Code:
        egen nmcount = rownonmiss(var1-var4)
        assert nmcount <= 1
        egen wanted2 = rowmax(var1-var4)
        This has the advantage that you end up with a numeric variable, rather than a string that looks like numbers. It's also shorter and simpler. (The first two lines just check that you really have at most one non-missing value in each observation.)


        • #5
          Thank you, Clyde. That's exactly what I'm looking for: only one variable has a non-missing value in each observation throughout my data. However, what should I do if my variables were in string format?


          • #6
            Well, if they are actually numbers that are being stored as strings, I would -destring- them first. There is rarely any good reason to store numbers as strings in Stata.
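
            As a minimal sketch of that first case, assume the four variables hold the same values as in post #1 but are stored as strings, with "." marking a missing value (this string version of the data is hypothetical, made up for illustration). The sketch uses the real() function to build numeric copies, as a function-level stand-in for -destring-, and then applies the rowmax() approach from #4. The names num1-num4 are illustrative only.

            ```stata
            clear
            input id str4 var1 str4 var2 str4 var3 str4 var4
            1 "."    "0.15" "."    "."
            2 "1.25" "."    "."    "."
            3 "."    "."    "2.15" "."
            4 "."    "."    "."    "0.25"
            end

            * real() returns the number a string represents, or missing if there is none
            forvalues i = 1/4 {
                gen double num`i' = real(var`i')
            }

            * Same check as in #4: at most one non-missing value per observation
            egen nmcount = rownonmiss(num1-num4)
            assert nmcount <= 1

            * Pick out the single non-missing value as a numeric variable
            egen wanted = rowmax(num1-num4)
            list id wanted
            ```

            The string originals are left untouched here, so the numeric copies can be compared against them before anything is dropped.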

            If they are actually string variables that contain non-numeric information, then a different approach is needed. But in that case, where are the "."s coming from? Your example data had numeric variables. If the data are actually strings, I think you should post a new example that matches your actual situation. Posting data examples that do not actually represent your data just wastes your time and that of those who respond to your question.


            • #7
              Sorry, it was mainly a question out of curiosity; the example above does represent my real data. Thank you for your suggestion. It's really helpful.
