
  • Coding a new variable: where a category selects 'high' responses for one variable only if there are low responses for another variable

    Hi Statalist.

    I want to code 'impat' with three categories (1 = both low, 2 = one high, 3 = both high). These three levels mirror the levels of the categorical variables in my dataset. Note that each variable has a value for the respondent (e.g. relimp, relat) and for their partner (e.g. p_relimp, p_relat). In words, I want "impat == 2" only if exactly one of relimp or relat == 3; the case where both == 3 is captured by "impat == 3".
    Code:
    gen impat = 1 if (relimp2 == p_relimp2 & relimp2 == 1) & (relat2 == p_relat2 & relat2 == 1)     // both low
    replace impat = 2                                                                     // one high (import/attend)
    replace impat = 3 if (relimp2 == p_relimp2 & relimp2 == 3) & (relat2 == p_relat2 & relat2 == 3) // both high
    I believe I have coded 'impat == 1' and 'impat == 3' correctly, so I would appreciate help coding 'impat == 2'.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id p_id) byte wave float(relimp2 p_relimp2 relat2 p_relat2)
    110 163 1 1 1 1 1
    110 163 2 2 1 1 1
    114  115 1 3 1 1 1
    114  115 2 3 1 1 1
    114  115 3 3 1 1 1
    116  279 1 1 1 1 1
    116  279 2 1 1 1 1
    116  279 3 1 1 1 1
    118  119 1 2 1 1 1
    118  119 2 3 1 1 1
    118  119 3 2 1 1 1
    118  119 4 3 1 1 1
    123  124 1 3 3 3 3
    123  124 2 3 3 3 3
    123  124 3 3 3 2 2
    123  124 4 3 3 3 3
    123  124 5 3 3 2 2
    125 132 1 3 1 3 3
    126 135 2 3 3 3 3
    126 135 3 3 3 3 3
    126 135 4 3 3 3 3
    129  130 1 1 2 1 1
    129  130 2 1 2 1 1
    129  130 3 2 1 2 1
    138  139 1 3 2 1 1
    138  139 2 3 2 1 1
    138  139 3 3 3 2 2
    138  139 4 2 1 1 1
    end
    Note: I have included multiple responses over different waves for a number of couples, as I also want to ask how to deal with changes in responses over time in my code. Should I take the average or the last level recorded?
    Last edited by Chris Boulis; 29 Aug 2020, 21:49.

  • #2
    I do not think you have explained the rule, or at least I do not understand the rule you are trying to implement.

    You discuss what you want in terms of two variables, relimp, relat, but there are 4 variables in your data, the two mentioned, and the two for the partner. Then the code you show is in terms of 4 variables.



    • #3
      Hi Joro Kolev. Sorry for not being clearer. I am trying to pair up the two variables for a couple, that is, the respondent and their partner. That means I need to find those couples that share the same level score in their responses. The new variable "impat" combines both the respondent's responses (for relimp & relat) and those of their partner (for p_relimp & p_relat).

      So the code for "impat = 2" should ensure that both in a couple are EITHER: "high (3) in relimp and low (1) or med (2) in relat" OR "high (3) in relat and low (1) or med (2) in relimp". I hope this clarifies the rule better. Kind regards, Chris



      • #4
        Check whether this does what you want:

        Code:
        . gen impat = cond( relimp2==relat2==1, 1, cond(relimp2==relat2==3,3,2) )
        
        . replace impat = 0 if relimp2!=p_relimp2 | relat2!=p_relat2
        (15 real changes made)



        • #5
          Thank you Joro Kolev. No it didn't. Do you mind explaining whether the first line of code will provide me with these three categories?

          1) both husband and wife have a low level for both variables - where relimp (and p_relimp) and relat (and p_relat) are low (==1).

          2) husband and wife share different levels across the two variables - where both have a high score (==3) for one variable (relimp and p_relimp, or relat and p_relat) and a low (==1) or med (==2) score for the other. E.g. relimp & p_relimp == 3 (high) and relat & p_relat == 1 (low) or == 2 (med), OR relat & p_relat == 3 and relimp & p_relimp == 1 or == 2.

          3) both husband and wife have a high level for both variables - where relimp (and p_relimp) and relat (and p_relat) are high (==3).
          Last edited by Chris Boulis; 30 Aug 2020, 00:04.



          • #6
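            You are right, Chris, it does not give the desired result. The problem comes from the chained conditions relimp2==relat2==1 and relimp2==relat2==3, which do not evaluate the way I expected. Stata evaluates them left to right, as (relimp2==relat2)==1 and (relimp2==relat2)==3, so the result of the first comparison (0 or 1) is what gets compared with 1 or 3.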
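
            A minimal sketch of that evaluation order, on hypothetical toy data (a and b stand in for relimp2 and relat2; the names chained and intended are illustrative only):

            ```stata
            * Hypothetical toy data: a and b stand in for relimp2 and relat2
            clear
            input byte(a b)
            1 1
            3 3
            2 3
            end

            * Chained form: Stata reads this as (a==b)==1
            gen byte chained = a==b==1

            * Intended form: both variables equal 1
            gen byte intended = a==b & a==1

            list, sep(0)
            ```

            In the second row (3 and 3) chained is 1 while intended is 0, because a==b is true and true is stored as 1. By the same logic a==b==3 can never be true, since a==b is only ever 0 or 1.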

            The last line is clear: it just sets to 0 the observations where the scores for the two people disagree. Let us instead drop them so that there are fewer observations to look at:

            Code:
            . drop if relimp2!=p_relimp2 | relat2!=p_relat2
            (15 observations deleted)
            Then after I have changed the conditions, the modified first line is

            Code:
            . gen impat = cond( relimp2==relat2 & relimp2==1, 1, cond(relimp2==relat2 & relimp2==3,3,2) )
            
            . list, sep(0)
            
                 +--------------------------------------------------------------------+
                 |  id   p_id   wave   relimp2   p_reli~2   relat2   p_relat2   impat |
                 |--------------------------------------------------------------------|
              1. | 110    163      1         1          1        1          1       1 |
              2. | 116    279      1         1          1        1          1       1 |
              3. | 116    279      2         1          1        1          1       1 |
              4. | 116    279      3         1          1        1          1       1 |
              5. | 123    124      1         3          3        3          3       3 |
              6. | 123    124      2         3          3        3          3       3 |
              7. | 123    124      3         3          3        2          2       2 |
              8. | 123    124      4         3          3        3          3       3 |
              9. | 123    124      5         3          3        2          2       2 |
             10. | 126    135      2         3          3        3          3       3 |
             11. | 126    135      3         3          3        3          3       3 |
             12. | 126    135      4         3          3        3          3       3 |
             13. | 138    139      3         3          3        2          2       2 |
                 +--------------------------------------------------------------------+
            I think this now does it, no?
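
            A possible stricter variant, sketched under two assumptions: that category 2 should require exactly one variable at 3 (the reading in #5), and that couples who disagree should be left missing rather than dropped. The indicator agree and the toy rows are illustrative and not taken from the original data.

            ```stata
            * Hypothetical toy rows covering the main cases
            clear
            input float(relimp2 p_relimp2 relat2 p_relat2)
            1 1 1 1
            3 3 2 2
            3 3 3 3
            2 2 2 2
            3 1 3 3
            end

            * 1 if both partners give the same answer on both variables
            gen byte agree = relimp2 == p_relimp2 & relat2 == p_relat2

            * impat stays missing unless one of the three rules applies
            gen byte impat = 1 if agree & relimp2 == 1 & relat2 == 1
            replace impat = 2 if agree & ((relimp2 == 3 & relat2 < 3) | (relat2 == 3 & relimp2 < 3))
            replace impat = 3 if agree & relimp2 == 3 & relat2 == 3

            list, sep(0)
            ```

            Unlike the nested cond() above, this leaves a couple with 2 on both variables uncoded instead of placing it in category 2. Whether that is the desired behaviour depends on how the rule is meant to treat such couples.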




            • #7
              Thank you Joro Kolev. I appreciate your efforts. Yes, this is much closer to what I want. One issue arises from the changing responses of the pair "123 & 124" for relat/p_relat, which move between 2 and 3 across waves. That prevents the couple from being captured by "impat == 1" or "impat == 3", because it no longer fits the rule that both in a couple are == 1, or both == 3, over time. I therefore think such cases should be captured by "impat == 2".

              To further define the rule for "impat == 2": the key factor is that both in the couple must score "3" for one of the two variables and something less than "3" for the other. Couple "123/124", for example, consistently scored "3" for relimp/p_relimp. As their score for relat/p_relat changed over time, they fit the rule for "impat == 2": both in a couple score "3" for one of the two variables and "1 or 2", or not consistently "3" (if that is clearer), for the other. The second category (impat == 2) would also include the case where both partners scored "3" for one variable while, for the other variable, one partner scored "3" and the other scored "2", as it is about equal scores for each variable for both in the couple.

              Kind regards,
              Chris
              Last edited by Chris Boulis; 31 Aug 2020, 21:10.



              • #8
                Hi Joro Kolev. After re-reading #6, and to remove any confusion, I have created two variables: "import", which combines equal levels of relimp2 & p_relimp2 (e.g. relimp2 == 1 & p_relimp2 == 1 gives import == 1), and "attend", which combines equal levels of relat2 & p_relat2.

                I want a three category variable "impat" to be made up of different levels of import and attend. My draft code:
                Code:
                gen impat = 1 if import == 1 & attend == 1  // both low
                replace impat = 2 if max(import attend), min(import attend) // draft: only one == 3 (high); the other must be 1 or 2 (low or med), not 3 (high)
                replace impat = 3 if import == 3 & attend == 3  // both high
                The rules for each category are:
                (1) impat = 1 if both import & attend == 1 // both low
                (2) impat = 2 if (import == 3 & attend != 3) OR (attend == 3 & import != 3) // only one of the two equals 3 (high), e.g. if 'import' == 3, 'attend' must equal 1 or 2, OR if 'attend' == 3, 'import' must equal 1 or 2.
                (3) impat = 3 if both import & attend == 3 // both high
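
                One way rules (1) to (3) might be written per observation, as a sketch. It assumes import and attend are already coded 1/2/3 and leaves every other combination (for example 2 and 2) missing. The toy rows are illustrative only.

                ```stata
                * Hypothetical toy rows: one per combination of interest
                clear
                input byte(import attend)
                1 1
                3 2
                1 3
                3 3
                2 2
                end

                * (1) both low
                gen byte impat = 1 if import == 1 & attend == 1

                * (2) exactly one high: the larger value is 3, the smaller is below 3
                replace impat = 2 if max(import, attend) == 3 & min(import, attend) < 3

                * (3) both high
                replace impat = 3 if import == 3 & attend == 3

                list, sep(0)
                ```

                Using < 3 rather than != 3 matters when a value is missing: Stata treats missing as larger than any number, so != 3 would be true for a missing value while < 3 is not.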
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long(id p_id) byte wave float(import attend)
                1 2 4 1 1
                1 2 7 1 1
                1 2 10 1 1
                1 2 14 1 1
                1 2 18 1 1
                3 4 4 3 3
                3 4 7 3 3
                3 4 14 3 3
                3 4 18 3 3
                5 6 4 3 2
                5 6 7 3 2
                5 6 10 3 2
                5 6 18 3 3
                7 8 4 1 2
                7 8 7 1 1
                7 8 10 1 2
                7 8 14 1 1
                7 8 18 1 1
                10 11 4 1 1
                10 11 7 1 1
                10 11 10 1 1
                10 11 14 1 1
                10 11 18 1 1
                12 14 4 3 1
                12 14 7 3 2
                12 14 10 3 2
                12 14 14 3 1
                12 14 18 3 2
                15 16 4 1 1
                15 16 7 1 1
                15 16 10 1 2
                15 16 14 1 2
                15 16 18 1 1
                21 22 4 1 3
                21 22 7 1 3
                21 22 10 1 3
                21 22 14 1 3
                21 22 18 1 3
                23 24 4 3 1
                23 24 7 3 1
                23 24 10 3 2
                23 24 14 3 2
                23 24 18 3 1
                end
                While I'm sure the code in lines (1) & (3) is correct, I'm not sure about the code in (2), but thought -max- and -min- were key. Also, I haven't factored in how values change over time. What I would like is that couples whose values change over time do not meet the rules for "impat == 1" or "impat == 3". I appreciate your help.
                Last edited by Chris Boulis; 03 Sep 2020, 00:48.
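
                For the over-time part of the question, a couple-level version could be sketched with -egen-. This assumes a couple counts as low or high only if it never leaves that level in any wave, and that category 2 means one variable is always 3 while the other is not always 3 (the reading given in #7). The helper names imp_min, imp_max, att_min and att_max are hypothetical; the rows are a subset of the example data above.

                ```stata
                * Subset of the example data: four couples, two waves each
                clear
                input long(id p_id) byte wave float(import attend)
                3 4 4 3 3
                3 4 7 3 3
                5 6 4 3 2
                5 6 18 3 3
                7 8 4 1 2
                7 8 7 1 1
                10 11 4 1 1
                10 11 7 1 1
                end

                * Lowest and highest level each couple records across its waves
                bysort id: egen imp_min = min(import)
                bysort id: egen imp_max = max(import)
                bysort id: egen att_min = min(attend)
                bysort id: egen att_max = max(attend)

                gen byte impat = .
                * (1) low on both variables in every wave
                replace impat = 1 if imp_max == 1 & att_max == 1
                * (3) high on both variables in every wave
                replace impat = 3 if imp_min == 3 & att_min == 3
                * (2) always high on one variable, not always high on the other
                replace impat = 2 if (imp_min == 3 & att_min < 3) | (att_min == 3 & imp_min < 3)

                list id wave import attend impat, sepby(id)
                ```

                Under these assumptions couple 5/6 falls in category 2, while couple 7/8 (always 1 on import, but moving between 1 and 2 on attend) fits none of the three rules and stays missing. Whether such couples should instead be assigned somewhere is a substantive choice the rule would need to spell out.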

