  • Combining egen with if command

    Hi All,

    I have data which resembles the following:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(num_obs y)
    1 10
    0 11
    end

    The data contain a variable num_obs and another variable y. I wish to calculate conditional minimums of y by num_obs, making use of the egen command. There seem to be two different interpretations, and the distinction between them is not obvious from the syntax.

    Consider

    Code:
    egen min1=min(y) if num_obs==1
    egen min2=min(y) if num_obs==0
    egen min3=min(y)
    From the results, it seems that egen evaluates the -if- condition first (line by line), and then computes the minimum from the surviving lines. The lines that do not survive the -if- condition are, reasonably enough, assigned missing.
    Another interpretation of the egen command could be that it always calculates the global minimum (the value which coincides with min3), and then only assigns values where the -if- condition is met. In other words, the -if- condition is not used in computation, but only in assignment. At present, it seems that it is used in both computation and assignment. Have I completely lost the plot here? Is there a way to reconcile this seeming distinction? Or is this somehow apparent from the syntax, and I have missed it?

    Many thanks,
    CS








  • #2
    I guess what you're looking for is this:

    Code:
    egen min1 = min(cond(num_obs==1,y,.))
    egen min0 = min(cond(num_obs==0,y,.))
    egen min = min(y)
    list
         +----------------------------------+
         | num_obs    y   min1   min0   min |
         |----------------------------------|
      1. |       1   10     10     11    10 |
      2. |       0   11     10     11    10 |
         +----------------------------------+



    • #3
      Ali Atia Thanks, but I was not looking for a solution, just a clarification: the syntax of egen does not make clear whether the -if- condition governs the computation, the assignment, or both.



      • #4
        I can't see any ambiguity in the code in #1 compared with standard Stata logic. Note that we are talking about the if qualifier and certainly not about the if command which is different.

        egen given an if qualifier does calculations on the subset identified by the if condition, produces a result for those, and (generally) returns missing for any observations in the complementary subset.

        It's not standard for calculations to be performed on the whole dataset and then assigned only to the subset identified by if.
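
        A minimal sketch of that behaviour, reusing the example data from #1. The variable names min_if1 and min_if0 are made up for illustration. If the if condition were used only for assignment, min_if0 would hold the global minimum 10 rather than 11.

        Code:
        clear
        input double(num_obs y)
        1 10
        0 11
        end
        * minimum computed over, and assigned to, the subset num_obs == 1
        egen min_if1 = min(y) if num_obs == 1
        * minimum computed over, and assigned to, the subset num_obs == 0
        egen min_if0 = min(y) if num_obs == 0
        * results are defined only inside each subset
        assert min_if1 == 10 if num_obs == 1
        assert missing(min_if1) if num_obs != 1
        assert min_if0 == 11 if num_obs == 0
        assert missing(min_if0) if num_obs != 0
        list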

        There's scope for a twist, however: the if qualifier is passed to the function code, say that for min() in this case, and it's up to that function code to handle it.

        The twist known to me is tag() which only ever produces 1 or 0 as result and never missing, even if the if condition is never satisfied. In a strong sense producing a (0, 1) indicator is its entire purpose. There was some user flak about that an age ago, but the function was written quite deliberately that way.
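
        A short sketch of the tag() behaviour described above, again on the data from #1. The variable names t_never and t_some are hypothetical, and the condition y > 100 is chosen only because no observation satisfies it.

        Code:
        clear
        input double(num_obs y)
        1 10
        0 11
        end
        * the if condition is never satisfied, yet the result is 0, not missing
        egen t_never = tag(num_obs) if y > 100
        assert t_never == 0
        * the if condition holds for one observation: that one is tagged 1, the other is 0
        egen t_some = tag(num_obs) if num_obs == 1
        assert t_some == (num_obs == 1)
        list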



        • #5
          Note that -egen- is byable and

          Code:
          egen min = min(y), by(num_obs)
          will give you the minimum by group without using the IF qualifier.

          Otherwise -egen- does exactly what you describe in your first interpretation: "it evaluates the if condition first (line by line), and then computes the minimum from the surviving lines. For the lines that do not survive the -if- condition, a reasonable value of missing is assigned."

          I have never heard of the "other interpretation". Take a real-life example: you have a sample of women and men, and you want the minimum salary if the person is a woman. Under the "other interpretation" we would find the global minimum salary in the sample and assign it to the women, which is nonsense. There is no sense in which that operation corresponds to finding the minimum salary if a person is a woman.



          • #6
            Admittedly there is no reason why a missing value should be assigned to the observations that do not pass the condition.

            I do not see anything logically wrong with the operation

            "Generate the minimum salary if the person is a woman"

            to result in calculating the minimum salary for the subset of women, and assigning this number to everybody in the sample, including the men.

            This is not what Stata does by default, and if this behaviour is desired, it can be done in at least 3 ways. Ali showed one of these three ways in his #2.
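
            The post does not spell out the three ways, so the following is only a sketch of three possibilities, not necessarily the ones meant. The data and the variable names female, salary, tmp, minw1, minw2 and minw3 are invented for illustration.

            Code:
            clear
            input byte female double salary
            1 30
            1 25
            0 20
            0 40
            end
            * (a) as in #2: blank out the men inside the expression, no if qualifier
            egen minw1 = min(cond(female == 1, salary, .))
            * (b) compute on the women only, then spread the constant to everybody
            egen tmp = min(salary) if female == 1
            egen minw2 = max(tmp)
            * (c) take the minimum from the stored results of summarize
            summarize salary if female == 1, meanonly
            generate double minw3 = r(min)
            * all three hold the women's minimum, 25, in every observation
            assert minw1 == 25 & minw2 == 25 & minw3 == 25
            list

            Each variant gives the men the same value as the women, which is the behaviour asked for here and is different from what the if qualifier on egen does.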



            • #7
              Indeed; there is often a need to assign values based on values in other observations, in which case you need something other than #1. You might want to compare female salaries with the minimum, median or mean male salary. You might want the ages or education levels of children compared with those of their parents.

              https://www.stata-journal.com/articl...article=dm0055 surveyed some basic devices in that territory.
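
              As one hedged illustration of the salary comparison, using the cond() device from #2 on invented data (the variable names female, salary, mean_male and gap are hypothetical):

              Code:
              clear
              input byte female double salary
              1 30
              1 25
              0 20
              0 40
              end
              * mean male salary, placed in every observation
              egen mean_male = mean(cond(female == 0, salary, .))
              * each woman's salary relative to the mean male salary
              generate double gap = salary - mean_male if female == 1
              assert mean_male == 30
              assert gap == salary - 30 if female == 1
              assert missing(gap) if female == 0
              list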



              • #8
                Thanks, Joro Kolev and Nick Cox. Very illuminating.
