  • #16
    Thank you Nick Cox for your explanation. I used your code and included 'wanted' in my Cox regression using factor variable notation (i.wanted), but Stata returned:
    wanted: factor variables may not contain negative values r(452)
    I then included it as a continuous variable (c.wanted) and received the following output. I'd appreciate help from anyone on how to interpret the hazard ratio for this variable 'wanted', given that it takes both positive values (the number of years that the male in the couple is older than the female) and negative values (the number of years that the female in the couple is older than the male):
    Code:
              _t     Haz. Ratio    Std. Err.    z     P>|z|   [95% Conf. Interval]
            wanted     1.016876    .0155319   1.10    0.273   .9868852   1.047778



    • #17
      Please repeat the exact definition of wanted you used for anyone else to comment.

      The Statalist habit of using the name wanted for the variable that someone wants to generate is a meme I think I started. I suspect it started by accident when someone said they wanted to calculate something but didn't give a name they wished to use. Or perhaps someone mentioned a name that made sense to them but looked too long or too cryptic for me to want to type it.

      That's a very small deal, although I note that some other people answering questions have picked up on using wanted as generic too, which also is a very small deal, but still interesting to me.

      It does serve as a way to flag what is, well, wanted, as a result of calculations with what is available. But good practice for the person who wants the code is to use some other name that describes the content or meaning of the variable.
      Last edited by Nick Cox; 12 Aug 2020, 03:09.



      • #18
        Hi Nick Cox. Yes, of course. The "wanted" variable in this case is 'Age Difference by Gender', the age difference between the male and the female in a couple, used to test whether gender has an effect with respect to age differences in relationships. As noted in #16, this variable takes on both negative and positive values, as explained in the code in #8:
        Code:
        gen wanted = cond(hgsex == 1 , hgage - p_hgage, cond(hgsex == 2, p_hgage - hgage, .))
        which calculates age differences in couples. When the male is older than the female, the value is positive; when the female is older than the male, the value is negative.

        Help in understanding the hazard ratio in the output for this variable above in #16 is kindly appreciated.
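
        On reading the hazard ratio: assuming the usual Cox setup in which wanted enters linearly in the log hazard, the reported ratio applies to each additional year by which the male is older. So 1.016876 corresponds to roughly a 1.7% higher hazard per year, and the ratio for a k-year difference is 1.016876^k, with negative k covering couples where the female is older. The confidence interval in #16 includes 1, so the data are also compatible with no association. A small sketch of the arithmetic, using only the numbers from #16; the 5-year contrast and the scalar names are chosen purely for illustration:

        ```stata
        * Hazard ratio and 95% CI per 1-year increase in wanted, copied from #16
        scalar hr_wanted = 1.016876
        scalar lb_wanted = .9868852
        scalar ub_wanted = 1.047778

        * Male 5 years older (wanted = 5) versus a same-age couple (wanted = 0)
        display "HR, male 5 years older:   " %6.4f hr_wanted^5

        * Female 5 years older (wanted = -5) versus a same-age couple
        display "HR, female 5 years older: " %6.4f hr_wanted^(-5)

        * The CI limits scale the same way for the 5-year contrast
        display "95% CI, 5-year contrast:  " %6.4f lb_wanted^5 " to " %6.4f ub_wanted^5
        ```

        With the model still in memory, lincom 5*wanted, hr after stcox should give the same 5-year contrast directly. Whether a single linear term is adequate for a variable that changes sign is a separate modelling question.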

        I also wanted to thank you for the link to the book (Gelman, A., Hill, J., Vehtari, A. 2021. Regression and Other Stories. Cambridge: Cambridge University Press). (I am still working through the cond() tutorial - thank you). Kind regards, Chris.
        Last edited by Chris Boulis; 12 Aug 2020, 06:12.



        • #19
          That should be helpful to anyone who knows about Cox survival stuff (as flagged already at #15 I don't).



          • #20
            Hi Nick Cox. I wanted to ask a question about the two age difference variables I've generated based on your code in #3 (age difference) & #12 (age difference by gender). I first coded the first of the two as per your code:
            Code:
            gen agediff = max(hgage, p_hgage) - min(hgage, p_hgage)
            But given your code in #12, I was wondering if it is more accurate to code as
            Code:
            gen agediff = max(hgage, p_hgage, .) - min(hgage, p_hgage, .)
            Finally, the code for agediffsex is as per yours in #12
            Code:
            gen agediffsex = cond(hgsex == 1 , hgage - p_hgage, cond(hgsex == 2, p_hgage - hgage, .))
            That said, I'm not sure why the total # of observations differ between these variables. When I "tab 'agediff'", it shows there are 276,145 observations, but "tab 'agediffsex'" only has 88,282, though when I "tab 'agediffsex', miss", it has the same number as the 'agediff' variable, indicating 187,863 are missing. The difference in observations occurs in the zero "0" value. That is, "0" in -tab- 'agediff' is 198,402, but only 10,539 in 'agediffsex'. Is there an explanation as to why these zero values in 'agediff' show up as 'missings' in 'agediffsex'? Kind regards, Chris



            • #21
              I was wondering if it is more accurate to code as

              gen agediff = max(hgage, p_hgage, .) - min(hgage, p_hgage, .)
              I don't understand what you are thinking here. Nothing is made more accurate by adding a comparison with missing. Although system missing is treated as arbitrarily large positive, max() ignores it to the extent possible, so max(hgage, p_hgage, .) will yield missing if and only if max(hgage, p_hgage) does. It's also true that the minimum function will yield missing if and only if both of the age variables are missing. Adding missing as a third argument is harmless but pointless.
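
              A quick way to check this behaviour interactively, as a minimal sketch with made-up numbers standing in for the two ages:

              ```stata
              * max() and min() ignore missing arguments unless every argument is missing
              display max(30, 25)       // 30
              display max(30, 25, .)    // 30: the extra missing argument changes nothing
              display max(30, .)        // 30: the nonmissing value is returned
              display min(30, .)        // 30: same for min()
              display max(., .)         // . : missing only when all arguments are missing
              ```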

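              I can't easily follow your other questions. But the first age difference will be zero if the ages are the same, positive if both ages are known and different, and missing only if both ages are missing; because max() and min() ignore a missing argument, it will also be zero if exactly one age is missing. That is quite different from the second variable, which will be zero if the ages are the same, will have a sign if both ages are known and different, and will be missing if either age is missing or if hgsex is neither 1 nor 2.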
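
              One possible explanation for the differing counts of zeros reported in #20, sketched on a toy dataset: because max() and min() ignore a missing argument, agediff is 0 whenever exactly one age is missing, whereas agediffsex is missing in that case. The five observations below are invented, and the idea that p_hgage is missing for people without a partner in the data is an assumption, not something confirmed in the thread.

              ```stata
              clear
              * Invented example data: two observations have no partner age
              input hgsex hgage p_hgage
              1 40 35
              2 33 36
              1 50 50
              1 28 .
              2 61 .
              end

              * The two definitions used in #20
              gen agediff = max(hgage, p_hgage) - min(hgage, p_hgage)
              gen agediffsex = cond(hgsex == 1, hgage - p_hgage, cond(hgsex == 2, p_hgage - hgage, .))
              list, clean

              * Zeros in agediff that arise from a missing partner age, not from equal ages
              count if agediff == 0 & missing(p_hgage)

              * An absolute difference that is missing whenever either age is missing
              gen agediff2 = abs(hgage - p_hgage)
              tab agediff2, missing
              ```

              Running the same count on the real data would show how many of the zeros in agediff come from a missing partner age.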



              • #22
                Thank you Nick Cox for clarifying. So if the age of the male and female in a couple is the same, then the number of observations for zero should be the same for both variables. However, agediff 0 = 198,402; agediffsex 0 = 10,539. [FYI, there are no 'missing' for agediff, but 'missing' for agediffsex is 187,863. Tabulating "agediff" and "agediffsex, miss", total observations are equal at 276,145]. I do not understand why the difference in age of a couple would equal '0' in agediff, but be 'missing' in agediffsex?



                • #23
                  Sorry, but I don't think I can help further. Necessarily you have your dataset and should understand it. I've looked at the code I've suggested and think it's what you are asking for, but I can't go beyond that.

