  • Adding dummy variables together

    I have a data set which has variables based on yes/no questions. I have recoded all 9 variables so that a yes answer is 1 and a no answer is 0. Now I want to make a new variable that equals 1 for every individual whose sum of the 9 dummy variables is >= 5.

    gen F_ACTIVIST_W127=1 if total (NEW_SM10_a_W127, NEW_SM10_b_W127, NEW_SM10_c_W127, NEW_SM10_d_W127, NEW_SM10_e_W127, NEW_SMBLM_a_W127, NEW_SMBLM_b_W127, NEW_SMBLM_c_W127, NEW_SMBLM_d_W127)>=5

    This is the code that I tried, which didn't work.

  • #2
    * Step 1: Calculate the total sum of the 9 dummy variables
    egen total_sum = rowtotal(NEW_SM10_a_W127 NEW_SM10_b_W127 NEW_SM10_c_W127 ///
    NEW_SM10_d_W127 NEW_SM10_e_W127 NEW_SMBLM_a_W127 NEW_SMBLM_b_W127 ///
    NEW_SMBLM_c_W127 NEW_SMBLM_d_W127)

    * Step 2: Create the new variable based on the condition
    gen F_ACTIVIST_W127 = 1 if total_sum >= 5
    replace F_ACTIVIST_W127 = 0 if total_sum < 5

    I think it works. Written by ChatGPT.


    • #3
      It's likely that there's a pattern that catches the indicators (a.k.a. dummy variables) in question, and only those, allowing a crisper statement such as

      Code:
      egen total_sum = rowtotal(NEW_SM10_*_W127 NEW_SMBLM_*_W127)
      Even if that isn't so, only one more line is needed:

      Code:
      gen F_ACTIVIST_W127 = total_sum >= 5
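      The problem with #1 is that total() is an egen function, so it cannot be called inside generate (and it is the wrong egen function to boot: it totals over observations, not across variables).

      Commas between the arguments would be needed only if it were a general function; egen functions such as rowtotal() take a variable list without commas.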
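
      To illustrate the distinction between egen functions and general functions, here is a minimal sketch on a hypothetical toy data set (d1, d2 and d3 are invented names, not the variables from #1):

      ```stata
      * Toy data: three hypothetical 0/1 indicators
      clear
      input d1 d2 d3
      1 1 0
      1 0 0
      0 0 0
      end

      * total() is an egen function: it sums one expression over observations
      egen col_total = total(d1)

      * rowtotal() is an egen function: it sums a varlist within each observation
      egen row_total = rowtotal(d1 d2 d3)

      * max() is a general function: it is used with generate and takes commas
      gen row_max = max(d1, d2, d3)

      list
      ```

      Here col_total is the same in every row, whereas row_total differs by observation, which is what the question in #1 needs.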

      You could naturally write out

      Code:
      gen F_ACTIVIST_W127 = (NEW_SM10_a_W127 + NEW_SM10_b_W127 + NEW_SM10_c_W127 + ///
      NEW_SM10_d_W127 + NEW_SM10_e_W127 + NEW_SMBLM_a_W127 + NEW_SMBLM_b_W127 + ///
      NEW_SMBLM_c_W127 + NEW_SMBLM_d_W127) >= 5


      • #4

        You could naturally write out a single statement.

        Code:
        gen F_ACTIVIST_W127 = (NEW_SM10_a_W127 + NEW_SM10_b_W127 + NEW_SM10_c_W127 + ///
        NEW_SM10_d_W127 + NEW_SM10_e_W127 + NEW_SMBLM_a_W127 + NEW_SMBLM_b_W127 + ///
        NEW_SMBLM_c_W127 + NEW_SMBLM_d_W127) >= 5
        Not wanting to do that would be a style preference. The syntax is legal.

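        Note, however, that a missing value on any of those variables would make the sum missing, and Stata treats missing as larger than any number, so the comparison >= 5 would be true and the new variable would be 1. That is one possible reason to prefer rowtotal(), which by default treats missing values as 0.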
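
        A minimal sketch of that difference, assuming a hypothetical toy data set with three indicators d1, d2 and d3 (invented names) and a cutoff of 2 rather than 5:

        ```stata
        * Toy data: three hypothetical indicators, with a missing value in row 2
        clear
        input d1 d2 d3
        1 1 0
        1 . 0
        0 0 0
        end

        * Plain sum: row 2 sums to missing, and missing >= 2 evaluates to true
        gen flag_sum = (d1 + d2 + d3) >= 2

        * rowtotal() treats missing as 0 by default, so row 2 sums to 1
        egen total_sum = rowtotal(d1 d2 d3)
        gen flag_rowtotal = total_sum >= 2

        * Guarded sum: leave the flag missing when any indicator is missing
        gen flag_guarded = (d1 + d2 + d3) >= 2 if !missing(d1, d2, d3)

        list
        ```

        In row 2 the plain sum gives a flag of 1, rowtotal() gives 0, and the guarded version gives missing. Which of these is appropriate depends on how incomplete responses should be counted.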
