Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining two variables into one?

    Hello all :-) Now I have a question about combining the data from two variables (in the same data set) into one variable. I want to combine the variables in such a way that the data from one variable can replace the missing values from the other. I have created an example of what I am looking for below (see attached file). I have tried the command “generate = pre_Q2_2 + pre_Q33_2 but that doesn’t seem to work at all (just creates missing values).

    Best Regards
    Søren
    Now I have a question about combining the data from two variables (in the same data set) into one variable. I want to combine the variables in such a way that the data from one variable can replace the missing values from the other. I have created an example of what I am looking for below (see attached file). I have tried the command “generate = pre_Q2_2 + pre_Q33_2 but that doesn’t seem to work at all (just creates missing values).

    Best Regards
    Søren
    Attached Files

  • #2
    it should be pre_Q1_2 by the way :-)

    Comment


    • #3
      Soren:
      please note that the best way to share codes/results/examples is via CODE delimiters and -dataex-; attachments are usually left unopened (as I did with the one you posted).
      With a bit of guess-work, you might be interested in something along the following lines:
      Code:
      set obs 10
      g A=runiform()
      g B=runiform() in 1/8
      egen C=rowtotal(A B)
      list
      
           +--------------------------------+
           |        A          B          C |
           |--------------------------------|
        1. | .3488717   .2047095   .5535812 |
        2. | .2668857   .8927587   1.159644 |
        3. | .1366463   .5844658   .7211121 |
        4. | .0285569   .3697791    .398336 |
        5. | .8689333   .8506309   1.719564 |
           |--------------------------------|
        6. | .3508549   .3913819   .7422367 |
        7. | .0711051   .1196613   .1907664 |
        8. |  .323368   .7542434   1.077611 |
        9. | .5551032          .   .5551032 |
       10. |  .875991          .    .875991 |
           +--------------------------------+
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        I won't (actually can't too at this point) open .docx attachments. FAQ Advice #12 spells out our policy on Word documents among others and explains. But in the spirit of Carlo's example, consider

        Code:
        gen C = cond(missing(A), B, cond(missing(B), A, .))
        which produces missing values if both are present or

        Code:
        gen D = max(A, 0) + max(B, 0)
        which produces a sum ignoring missings (but will give wrong answers whenever A or B is negative).

        If these answers don't help, please give an explicit data example.

        Comment


        • #5
          Ok sorry about that but i was thinking about something like the table below :-)
          Var 1 Var 2 New Var
          1 . 1
          . . .
          2 . .
          . 1 1

          Comment


          • #6
            Soren:
            a trivial approach might be:
            Code:
            g New_Var=Var_1 if Var_1==1
            replace New_Var=Var_2 if Var_2==1
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Welcome to Statalist, Søren.

              In your example, you cover three of four possibilities.
              1) var1 present, var2 missing => result is var1
              2) var1 missing, var2 present => result is var2
              3) var1 missing, var2 missing => result is missing

              What you don't show us is
              4) var1 present, var2 present

              So if we assume the fourth option was omitted because you believe it cannot happen in your data, the following code might start you on a useful direction. The first line confirms your assumption that at least one of the two variables is missing.
              Code:
              assert var1==. | var2==.
              generate NewVar = var1
              replace NewVar = var2 if var1==.
              Another approach would be
              Code:
              assert var1==. | var2==.
              generate NewVar = cond(var1!=.,var1,var2)
              Now some advice about Statalist. Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

              The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

              Comment


              • #8
                error

                Comment


                • #9
                  Hi everyone, I'm so excited about joining the forum and have this great opportunity to find solutions. I have a follow up question on this topic and I appreciate your insights.

                  So, I generated new variables using any of the three provided codes.
                  Code:
                  egen scope = rowtotal(scope1 scope2),missing
                  gen scope = cond(missing(scope1), scope2, cond(missing(scope2), scope1, .))
                  gen scope = cond(scope1!=.,scope1,scope2)

                  Here is the result for two pairs of variables as an example.
                  Code:
                  . list record_id2 scope1 scope2 scope pain_score1 pain_score2 pain_score in 1/10
                  
                       +---------------------------------------------------------------------------------------------------+
                       | record~2                       scope1             scope2   scope   pain_s~1   pain_s~2   pain_s~e |
                       |---------------------------------------------------------------------------------------------------|
                    1. |        1   Emergency Medicine-General                  .      14          .          .          . |
                    2. |        2              Family Medicine                  .      16          .          .          . |
                    3. |        3              Family Medicine                  .      16          .          .          . |
                    4. |        4              General Surgery                  .      19          .          .          . |
                    5. |        5                            .       Neurosurgery      38          .          5          5 |
                       |---------------------------------------------------------------------------------------------------|
                    6. |        6            Internal Medicine                  .      28          .          .          . |
                    7. |        7                            .   Physical Therapy      57          .        4.5        4.5 |
                    8. |        8              Family Medicine                  .      16          1          .          1 |
                    9. |        9            Internal Medicine                  .      28          .          .          . |
                   10. |       10                            .       Neurosurgery      38          .          .          . |
                       +---------------------------------------------------------------------------------------------------+
                  Note: "scope1" and "scope2" are shown in the value label. The generated "scope" variable is shown in the actual value—it is not labeled!
                  Note: At least one of the two variables is missing in each pair of variables.

                  However,
                  1. I have 112 pairs of variables that I wish to combine as such. Is there a loop command I can use? Or any other way around it.
                  2. How can I assign the label of the "scope1 or scope2" variables to the new variable "scope"?

                  Thank you so much for your time.
                  Last edited by tina antelope; 12 Jul 2024, 10:23.

                  Comment


                  • #10
                    Originally posted by tina antelope View Post
                    However, I have 112 pairs of variables that I wish to combine as such. Is there a loop command I can use?
                    It depends whether these variables share the same name and are suffixed by the numbers 1 and 2. If that is the case, collect the prefixes in a local

                    Code:
                    local list scope somevar someothervar ...
                    foreach var of local list{
                        egen `var' = rowtotal(`var'1 `var'2),missing
                        label values `var' `:val lab `var'1'
                    }
                    The above assumes that each set of prefixed variables share the same value labels. So we just assign the value labels of the variable suffixed with 1. Do you need to automatically extract the prefixes? Post back if the assumptions that I have made do not hold or if you need to extract the prefixes.

                    Comment


                    • #11
                      Thank you, Andrew, for your prompt help.
                      Yes, these variables share the same name and are suffixed by the numbers 1 and 2.

                      The label command works perfectly in general. I'm yet to run the loop command. I'm getting a "type mismatch" error. My variables are float, byte, int. Would you mind helping me with that, too?
                      Last edited by tina antelope; 12 Jul 2024, 11:43.

                      Comment


                      • #12
                        Originally posted by tina antelope View Post
                        Thank you, Andrew, for your prompt help.
                        Yes, these variables share the same name and are suffixed by the numbers 1 and 2.

                        The label command works perfectly in general. I'm yet to run the loop command. I'm getting a "type mismatch" error. My variables are float, byte, int. Would you mind helping me with that, too?

                        My bad! I had a string variable among my 100-something variables! I removed it, and the whole loop command worked perfectly! Thank you!
                        Last edited by tina antelope; 12 Jul 2024, 13:08.

                        Comment


                        • #13
                          A variant of your second code in #9 does not depend on the variable type. Then we can add a capture command in front of the label command so that it absorbs any error messages that result from labeling strings. There are other approaches, however.

                          Code:
                          help capture
                          Code:
                          local list scope somevar someothervar ...
                          foreach var of local list{
                              gen `var' = cond(missing(`var'1), `var'2,`var'1)
                              cap label values `var' `:val lab `var'1'
                           }

                          Comment

                          Working...
                          X