  • Generating new variable by counting observations

    Hello Members,

    I have a data set (shown below). I want to generate a new variable that counts the number of times v1, v2, and v3 are greater than or equal to time. This, however, does not include cases where v1, v2, or v3 are the last observation in the row. In this case, the last observation cannot equal time.


    Code:
    clear
    input float(obs time v1 v2 v3)
    1 1 1 2 3 
    2 2 1 . . 
    3 3 3 4 . 
    4 4 4 . . 
    5 . 1 2 .
    6 . 1 . .
    7 . 3 4 .
    8 . 2 4 .
    end
    tempfile dataset1
    save `dataset1'
    Here is an example of what I would like the end result to be:

    Code:
    clear
    input float(obs time v1 v2 v3 Count)
    1 1 1 2 3 6
    2 2 1 . . 5
    3 3 3 4 . 4
    4 4 4 . . 0
    5 . 1 2 . . 
    6 . 1 . . .
    7 . 3 4 . .
    8 . 2 4 . .
    end
    tempfile dataset1
    save `dataset1'
    I would appreciate any assistance with this!

    Thanks,
    A


  • #2
    I don't understand what you want.

    In obs 1, v1, v2, and v3 are all >= time, so the count would be 3. You say you want 6. So maybe you want the sum of those values of v1, v2, and v3 that are >= time. But in row 2, time == 2, v1 == 1, and v2 and v3 are missing, so none of them are >= time, which suggests count should be 0. Even if you follow the Stata convention that a missing value is greater than any real number, the answer should be 2 (v2 and v3). But you say you want 5.

    In short, I have no idea how you are getting your proposed values for count. Please explain in greater detail.


    • #3
      Clyde Schechter Sorry about that, I will try to explain more clearly.

      I want to count the number of observations that remain after each time period. For instance, there are six observations that remain in row 1 (obs 1, 3, 4, 5, 7, and 8). Observations 2 and 6 are excluded because their row value is 1, which equals the time value for that row. In the second row, where time=2, there are only 5 remaining observations, so count=5. In this case, row 5 is dropped because v2=2 (in addition to the two observations previously dropped for time=1). In row 3, there are 4 observations remaining (obs 3, 4, 7, and 8). Lastly, in row 4, there are zero observations remaining because the last value in each of rows 3, 4, 7, and 8 equals 4. The count column should be the same length as the time column.

      I hope this clarifies a bit more.

      Thanks,
      Anoush


      • #4
        I'm sorry, but I still don't understand. What role do v1, v2, and v3 play in this? You mention them, but it isn't clear why you chose the particular one you mention when you do. What does "its row value is 1" mean? What is a "row value"?

        What do you mean by "The count column should be the same length as the time column"? In a Stata data set, all of the variables (columns) are necessarily of the same length.

        N.B. If somebody else following this thread understands what is wanted, do jump in with a solution.


        • #5
          Clyde Schechter I apologize, I will try to explain.

          v1, v2, and v3 are used to determine whether or not a row is included in the "count" variable. A row is not counted when its last nonmissing value among v1, v2, and v3 does not exceed the time value. In the first row, where time=1, count=6 because six rows remain after time period 1 (based on their v1, v2, and v3 values). In the first row, obs=2 and obs=6 are dropped from the count because v1=1, which indicates that they did not pass the time=1 threshold. In the second row, where time=2, obs=5 does not pass the time=2 threshold because the last value in that row is 2. Therefore, it is not included in the "count" variable. This is repeated until the last time period (time=4). Since observations 5-8 do not have a time value, they will not have a "count" value.

          A


          • #6
            I think I get it now.
            Code:
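            * Last nonmissing value among v1-v3 in each row
            gen last_value = .
            forvalues i = 1/3 {
                replace last_value = v`i' if !missing(v`i')
            }

            * Count rows whose last value is at least time + 1
            * (i.e. strictly above time when the values are integers)
            gen timep1 = time + 1
            rangestat (count) wanted = obs, interval(last_value, timep1, .)
            * No qualifying rows leaves wanted missing: set it to 0,
            * then blank out rows that have no time value
            replace wanted = 0 if missing(wanted)
            replace wanted = . if missing(time)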
            -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
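
            For readers who cannot install -rangestat-, the same counts can probably be reproduced with official Stata commands only. What follows is a minimal sketch and not the method posted above: it re-enters the example data from #1, uses -egen, rowlast()- in place of the loop over v1-v3, and counts directly with last_value > time, so it does not rely on the values being integers. The variable name wanted2 is made up for illustration.

```stata
clear
input float(obs time v1 v2 v3)
1 1 1 2 3
2 2 1 . .
3 3 3 4 .
4 4 4 . .
5 . 1 2 .
6 . 1 . .
7 . 3 4 .
8 . 2 4 .
end

* Last nonmissing value among v1-v3 in each row
egen last_value = rowlast(v1 v2 v3)

* For each row that has a time value, count the rows (over the whole
* data set) whose last value is strictly greater than that time.
* wanted2 is a hypothetical name, chosen to avoid clashing with wanted.
gen wanted2 = .
forvalues i = 1/`=_N' {
    if !missing(time[`i']) {
        quietly count if last_value > time[`i'] & !missing(last_value)
        quietly replace wanted2 = r(N) in `i'
    }
}

* Check against the counts requested in #1: 6, 5, 4, 0, then missing
assert wanted2 == 6 in 1
assert wanted2 == 5 in 2
assert wanted2 == 4 in 3
assert wanted2 == 0 in 4
assert missing(wanted2) in 5/8

list obs time last_value wanted2, noobs
```

            The loop runs one -count- per observation, so it will be slow on large data sets; -rangestat- is likely the better choice there.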


            • #7
              Clyde Schechter Thank you so much!
