  • Understanding cond command

    Hi there, I would like to understand the cond() command better.

    I have read Prof. Cox's article in the Stata Journal on cond(), and I have also used the Stata help and YouTube.
    However, I still have some questions.

    Code suggested by one of the users (with thanks, for a separate problem):
    bys IDno: egen last_surgical=max(cond( surgicalindex==1,admidate,.))
    Question A:

    As far as I understand from the sources above, the Stata help presents the command as:

    cond(x,a,b[,c])

    where one defines x, and the result is a if x is true and nonmissing, b if x is false and nonmissing, and c if x is missing, as in Nick Cox's paper.

    I understand the above. However, when one reads the code quoted above:

    1. I understand that a new variable is created (egen last_surgical).

    2. One tells Stata to take the maximum admidate if surgicalindex == 1.

    However (and this is where I need help), if this is the case, it seems that Stata's explanation of cond() (see Question A) no longer holds. Is that so?
    Any help explaining this would be appreciated.

    Note: surgicalindex is a binary variable coded 1 or 0.
    admidate: date of admission to hospital.

  • #2
    The implied reference is https://www.stata-journal.com/articl...article=pr0016 and (credit where credit is due) David Kantor is first author.

    cond() is a function, not a command. Feel free to regard the distinction as pedantic, but it's Stata's distinction, not mine. In Stata, command and function are not synonymous.

    I don't understand the question, however. Perhaps the puzzlement is that Stata ignores missings when running egen, max() to the extent possible.

    The code says calculate the maximum for each patient over a series of dates, but ignore observations where the index is not 1.

    Section 9 of stata-journal.com/article.html?article=dm0055 goes over the same idea from a different but consistent perspective.

    Code:
    * Example generated by -dataex-.
    clear
    input float(id index date)
    1 1 21944
    1 1 22054
    1 1 22188
    1 1 22272
    1 1 22804
    1 0 21987
    1 0 22246
    1 0 22483
    1 0 22812
    end
    format %td date
    
    bysort id : egen wanted = max(cond(index == 1, date, .))  
    
    format %td wanted
    
    list
    
         +------------------------------------+
         | id   index        date      wanted |
         |------------------------------------|
      1. |  1       1   30jan2020   08jun2022 |
      2. |  1       1   19may2020   08jun2022 |
      3. |  1       1   30sep2020   08jun2022 |
      4. |  1       1   23dec2020   08jun2022 |
      5. |  1       1   08jun2022   08jun2022 |
         |------------------------------------|
      6. |  1       0   13mar2020   08jun2022 |
      7. |  1       0   27nov2020   08jun2022 |
      8. |  1       0   22jul2021   08jun2022 |
      9. |  1       0   16jun2022   08jun2022 |
         +------------------------------------+
    Last edited by Nick Cox; 05 Oct 2022, 07:31.

    • #3
      The code says calculate the maximum for each patient over a series of dates, but ignore observations where the index is not 1

      Precisely, but shouldn't the cond function only do the following:

      where one defines x, the result is a if x is true and nonmissing, b if x is false and nonmissing, and c if x is missing?

      Or can the cond function be used together with other commands such as max and min? And if so, does the syntax change?
      Last edited by Denise Vella; 05 Oct 2022, 12:53.


      • #4
        Just true and false. Whether any argument is non-missing or missing is irrelevant.

        Also, the extended syntax is not being used in the example, so it is just a distraction here.
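A minimal sketch of why only true and false matter here, using hypothetical toy data in which observation 3 has a missing index. A comparison such as index == 1 always evaluates to 1 or 0, even when index itself is missing, so cond() only ever takes its second or third argument. The optional fourth argument comes into play only when the first argument can itself be missing, for example when the raw variable index is passed directly; the value -1 below is an arbitrary sentinel chosen for illustration.

```stata
* Hypothetical toy data: observation 3 has a missing index
clear
input float(id index date)
1 1 21944
1 0 21987
1 . 22054
end
format %td date

* index == 1 is a comparison: it is 1 or 0, never missing (. == 1 is 0),
* so cond() returns date when true and missing (.) when false
generate x3 = cond(index == 1, date, .)

* Passing index itself: 1 is true, 0 is false, and . is missing,
* so the (hypothetical) sentinel -1 is returned only for observation 3
generate x4 = cond(index, date, ., -1)

format %td x3
list
```

In x3, observations 2 and 3 are both missing, because each takes the "false" branch; the fourth argument would never have been used.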


        • #5
          Apologies if I am failing to make myself clear.

          If the cond function only deals with true or false, then why, in the code below,

          bys IDno: egen last_surgical=max(cond( surgicalindex==1,admidate,.))

          do we know that the code tells Stata: if surgicalindex == 1, select the highest admidate?

          If the cond function just handles true and false, as per your article and the Stata help, why does max override this in the code above?

          (In no way does admidate represent anything "false" or anything other than expected, as max tells Stata to select the highest admidate rather than represent anything false.)


          I'm really trying to make myself clear. I just want to understand the cond function fully so I can use it in the future.
          Last edited by Denise Vella; 05 Oct 2022, 15:35.


          • #6
            I think you are misunderstanding the order of application of max() and cond(). Because the entire cond() expression falls inside max(), cond() is evaluated first. So, first Stata goes one by one through the observations of an IDno and, behind the scenes, calculates a new variable, let's call it X, which is set to the value of admidate in those observations where surgicalindex == 1, and to missing value otherwise. That is cond()'s "job", and once it has done that, it is out of the picture. Next, max() comes in, picks the largest value of X for the given ID, and stores that in the new variable last_surgical.

            So cond() works exactly as described. In particular, cond() is not picking the largest value. max() is doing that after cond() has finished its work.
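A minimal sketch of the two steps described above, written out explicitly on a subset of the toy data from #2. The names X and wanted_two_step are illustrative only. The row with index 0 has the latest date (22812), so a plain max(date) would return it; cond() has already turned that row into missing before max() runs, and egen's max() ignores missing values.

```stata
* Subset of the toy data from #2
clear
input float(id index date)
1 1 21944
1 1 22804
1 0 21987
1 0 22812
end
format %td date

* Step 1: cond() runs first, observation by observation.
* X holds date where index == 1 and missing elsewhere.
generate X = cond(index == 1, date, .)

* Step 2: egen's max() takes the largest nonmissing X within each id
bysort id: egen wanted_two_step = max(X)

* The one-liner from #2, for comparison
bysort id: egen wanted = max(cond(index == 1, date, .))

format %td X wanted_two_step wanted
list
```

Both new variables hold 08jun2022 (22804) in every row, not the later index-0 date.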


            • #7
              Thanks, Clyde Schechter.


              • #8
                Great conversation, which points me to something I've been trying to write as a one-liner.
                Say you have a panel dataset and you wish to create a reference group of one or more cross-sectional units.
                Here, I want the mean across all countries that are not the USA.

                t = time variable.
                dus = dummy for USA.
                X = var of interest.

                Code:
                bys t: egen X_peers = mean(cond(~dus,X,.))
                This populates every country and date with the non-US mean for that date.
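A minimal sketch of the one-liner above on a hypothetical toy panel (two periods, the USA plus two other countries; the values are made up). cond() sets X to missing in the USA rows, egen's mean() ignores those missings, and the by-group result is copied to every row of the same t, including the USA row.

```stata
* Hypothetical toy panel: t = time, dus = USA dummy, X = variable of interest
clear
input str3 country float(t dus X)
"USA" 1 1 10
"FRA" 1 0 2
"DEU" 1 0 4
"USA" 2 1 20
"FRA" 2 0 6
"DEU" 2 0 8
end

* Non-US mean of X for each t, stored in every row of that t
bysort t: egen X_peers = mean(cond(~dus, X, .))

list, sepby(t)
```

Here X_peers is 3 in every row with t == 1 (the mean of 2 and 4) and 7 in every row with t == 2 (the mean of 6 and 8).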
