Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating max by group

    Hi, I'm playing around with data
    Dataconsists of intervention, vehicle no, visitno - no of times the
    car went to the MOT.
    I would like to detect the max for brake and oil, for each time an intervention took
    place. This would mean taking the max of the preceeding values of the same vehicleno

    DATA:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(intervention vehicleno episodeno brake oil)
      . 1 1 1 0
    110 1 2 1 0
      . 2 1 0 1
      . 2 2 0 1
    112 2 3 1 1
      . 3 1 0 1
    113 3 2 0 1
    114 4 1 1 0
      . 4 2 1 0
      . 4 3 1 1
    115 4 4 1 1
    end
    I tried:

    collapse (max) brake oil, by (vehicleno intrevention)
    But then this wouldn't give me vehicleno: 4 episode 2-4

    So ideally data would look like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(intervention vehicleno episodeno brake oil) str1(maxBRAKE maxOIL)
      . 1 1 1 0 ""  "" 
    110 1 2 1 0 "1" "0"
      . 2 1 0 1 ""  "" 
      . 2 2 0 1 ""  "" 
    112 2 3 1 1 "1" "1"
      . 3 1 0 1 ""  "" 
    113 3 2 0 1 "0" "1"
    114 4 1 1 0 "1" "0"
      . 4 2 1 0 ""  "" 
      . 4 3 1 1 ""  "" 
    115 4 4 1 1 "1" "" 
    end

  • #2
    This would mean taking the max of the preceeding values of the same vehicleno
    What is the definition of "preceding values"?

    For example, for vehicle no 4, episode 1, there were not any "preceding value", why it took 1 and 0 instead of missing and missing?

    Same goes for the last case, what are they 1 and missing? Instead of 1 and 1?

    Comment


    • #3
      For example, for vehicle no 4, episode 1, there were not any "preceding value", why it took 1 and 0 instead of missing and missing?

      because the mechanic at the Mot/garage detected the brake failure (1) and not oil failure on that visit.

      everytime a brake or oil failure was detected - it’s coded as 1
      if it’s absent then coded as 0

      please note episode no is == visitno (everytime the owner took car to garage/mot) apologies I forgot to rename it

      Comment


      • #4
        For every time there was an intervention to the car , I want to calculate the max no for brake or oil from the preceeding visits
        In summary:

        eg : Max oil & brake of intervention 110; vehicle no 1, consisting of episodes 1-2

        Max oil & brake of intervention 112; vehicle no 2, consisting of episodes 1-3

        Max oil & brake of intervention 113; vehicle no 3, consisting of episodes 1-2

        Max oil & brake of intervention 114; vehicle no 4, consisting of episodes 1


        Max oil & brake of intervention 115; vehicle no 4, consisting of episodes 2-4

        Comment


        • #5
          Code:
          gsort vehicleno -episodeno
          by vehicleno: gen intervention_num = sum(!missing(intervention))
          
          sort vehicleno intervention_num
          by vehicleno intervention_num: egen byte max_brake = max(brake)
          by vehicleno intervention_num: egen byte max_oil = max(oil)
          
          replace max_brake = . if missing(intervention)
          replace max_oil = . if missing(intervention)
          drop intervention_num
          
          sort vehicleno episodeno
          which produces:

          Code:
          . li , noobs sepby(vehicleno)
          
            +-------------------------------------------------------------------+
            | interv~n   vehicl~o   episod~o   brake   oil   max_br~e   max_oil |
            |-------------------------------------------------------------------|
            |        .          1          1       1     0          .         . |
            |      110          1          2       1     0          1         0 |
            |-------------------------------------------------------------------|
            |        .          2          1       0     1          .         . |
            |        .          2          2       0     1          .         . |
            |      112          2          3       1     1          1         1 |
            |-------------------------------------------------------------------|
            |        .          3          1       0     1          .         . |
            |      113          3          2       0     1          0         1 |
            |-------------------------------------------------------------------|
            |      114          4          1       1     0          1         0 |
            |        .          4          2       1     0          .         . |
            |        .          4          3       1     1          .         . |
            |      115          4          4       1     1          1         1 |
            +-------------------------------------------------------------------+
          Last edited by Hemanshu Kumar; 07 Oct 2022, 20:08.

          Comment


          • #6
            This uses some similar ideas to the post of Hemanshu Kumar -- but #1 #5 and this don't all agree.

            For example in #1 for intervention 112 the max of the preceding values are 0 1 not 1 1 For 114, there are no preceding values, so what is the rule then? For 115, why a missing value for one variable? I guess that's a typo.

            On Hemanshu's trick of reversing time temporarily see also e.g.this FAQ from 2012 https://www.stata.com/support/faqs/d...issing-values/

            rangestat is from SSC.

            I agree with Hemanshu in not seeing any virtue in string result variables.

            I am extending this to what are the previous maximum values? Results not wanted can be ignored.


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(intervention vehicleno episodeno brake oil)
              . 1 1 1 0
            110 1 2 1 0
              . 2 1 0 1
              . 2 2 0 1
            112 2 3 1 1
              . 3 1 0 1
            113 3 2 0 1
            114 4 1 1 0
              . 4 2 1 0
              . 4 3 1 1
            115 4 4 1 1
            end
            
            rangestat (max) brake oil, int(episodeno . -1) by(vehicleno)
            
            list, sepby(vehicleno)
            
                 +-------------------------------------------------------------------+
                 | interv~n   vehicl~o   episod~o   brake   oil   brake_~x   oil_max |
                 |-------------------------------------------------------------------|
              1. |        .          1          1       1     0          .         . |
              2. |      110          1          2       1     0          1         0 |
                 |-------------------------------------------------------------------|
              3. |        .          2          1       0     1          .         . |
              4. |        .          2          2       0     1          0         1 |
              5. |      112          2          3       1     1          0         1 |
                 |-------------------------------------------------------------------|
              6. |        .          3          1       0     1          .         . |
              7. |      113          3          2       0     1          0         1 |
                 |-------------------------------------------------------------------|
              8. |      114          4          1       1     0          .         . |
              9. |        .          4          2       1     0          1         0 |
             10. |        .          4          3       1     1          1         0 |
             11. |      115          4          4       1     1          1         1 |
                 +-------------------------------------------------------------------+
            
            .

            Comment


            • #7
              Originally posted by Hemanshu Kumar View Post
              Code:
              gsort vehicleno -episodeno
              by vehicleno: gen intervention_num = sum(!missing(intervention))
              
              sort vehicleno intervention_num
              by vehicleno intervention_num: egen byte max_brake = max(brake)
              by vehicleno intervention_num: egen byte max_oil = max(oil)
              
              replace max_brake = . if missing(intervention)
              replace max_oil = . if missing(intervention)
              drop intervention_num
              
              sort vehicleno episodeno
              which produces:

              Code:
              . li , noobs sepby(vehicleno)
              
              +-------------------------------------------------------------------+
              | interv~n vehicl~o episod~o brake oil max_br~e max_oil |
              |-------------------------------------------------------------------|
              | . 1 1 1 0 . . |
              | 110 1 2 1 0 1 0 |
              |-------------------------------------------------------------------|
              | . 2 1 0 1 . . |
              | . 2 2 0 1 . . |
              | 112 2 3 1 1 1 1 |
              |-------------------------------------------------------------------|
              | . 3 1 0 1 . . |
              | 113 3 2 0 1 0 1 |
              |-------------------------------------------------------------------|
              | 114 4 1 1 0 1 0 |
              | . 4 2 1 0 . . |
              | . 4 3 1 1 . . |
              | 115 4 4 1 1 1 1 |
              +-------------------------------------------------------------------+
              Whilst I thank you for your code, the code will not work in the following circumstance (see vehicle 4, intervention 115) - I changed the last row value (see below) , just testing to see if it's full proof.

              As you can see in vehicle 4 (which is where my problem comes into play - see green ) - at visit 3 (episode3) - the vehicle develops a oil failure and so this should be counted as a 1 in the next subsequent visits, thus a MAX of 1 , but the person inputting the data forgets to accumulate this failure and in the last row - the person codes intervention 115 - as oil failure 0 when this should be 1 (seen preceeding visit)

              Thus I would like to calculate the MAX at each intervention accounting for the failures happing in the preceeding episodes belonging to the same vehicle where the values are binary 0 (absent) 1 (present)


              I am not sure if this is possible, perhaps one has to duplicate ? (I don't know...) but It seems like it's impossible to do .... As I understand one can calculate the max by each vehicleno but difficult to calculate the max at the intervention taking into consideration what is happening in the preceeding visits (epsiodenos)



              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(intervention vehicleno episodeno brake oil)
                . 1 1 1 0
              110 1 2 1 0
                . 2 1 0 1
                . 2 2 0 1
              112 2 3 1 1
                . 3 1 0 1
              113 3 2 0 1
              114 4 1 1 0
                . 4 2 1 0
                . 4 3 1 1
              115 4 4 1 0
              end

              Perhaps I'm not making myself clear

              I would like to consider different vehicle numbers as groups (see image in different coloured lines)
              I would then like to calculate the max failures at each intervention made officially at the car repair - which would take into consideration the previous visits (oil, brake) to the car repair where no intervention was made - coded as an episodeno

              The max values at each intervention are highlighted in yellow in my image
              Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	29.7 KB
ID:	1684801







              Comment


              • #8
                The max for intervention 112 would be 1 1 (as this would be the max of vehicle no 2) from episode 1-3 —> see blue circle
                For 114 —> There are no preceeding values, as this is the first time the VEHICLE 4 when to the car mechanic (Episode no 1), so thus automatically the max would be 1 0

                For 115 —> The Max would be 1 1 —> as seen in the green circle, the max from visits 1-4 for vehicle 4.


                As mentioned in my post #7, I have tried to explain what I would like to obtain .


                However, although I myself have managed to calculate the max at the intervention, I have not managed to calculate the MAX at the intervention which takes the max from the preceeding values for each vehicleno. THIS I’m afraid has been challenging, and perhaps I may need to duplicate to analyse them separaetly? Unless someone can present another way of doing this ?






                Comment


                • #9
                  There are crossing posts here. Also, one of my points in #6 was made previously by Ken Chui in #2. Sorry that I missed this somehow, perhaps a matter of how much one can read at any one time on a laptop.

                  But if the interest is in preceding (*) values -- meaning previous values -- there is nothing "automatic" about what you are calculating for intervention 114.

                  #6 presents systematic code for the maximum values over preceding observations. It seems to correspond to what you ask for in

                  the MAX at the intervention which takes the max from the preceeding values for each vehicleno
                  (*) The correct spellings are precede, preceding but succeed, succeeding, and that makes no sense that I can imagine even to people whose first language in English!

                  Comment


                  • #10
                    Thanks for your reply.

                    with regards to syntax

                    rangestat (max) brake oil, int(episodeno . -1) by(vehicleno) with regards to: int(episodeno . -1)

                    integer - means to run to the nearest figure
                    but I don’t understand why you include missing . And -1 ?

                    Comment


                    • #11
                      Denise Vella first, I think a basic clarification: my interpretation has been that when you say "preceding values", you really mean the values before but also including the current episode. This would answer some of the questions and comments in #2, #6 and #9.

                      Second, you say that the code would not work when we change the oil observation for intervention #115 to 0. But in fact, if you make that change and run the code I suggested in #5, it still produces a max_oil value of 1 for intervention #115, since it is taking the maximum over preceding (and current) values, including for episode no. 3 for that vehicle.

                      Third, however your language in #8 suggests that you want the max to be taken over ALL preceding (and current) values, not just those after the previous intervention, since you say that for vehicle 4, you want the maximum to be taken over episodes 1-4. This contradicts what you said in #1, where you said for vehicle 4, you want the maximum over episodes 2-4. I'm not sure what it is that you actually want.

                      If you want the maximum over ALL preceding values and the current value, the code can be simplified to the following:

                      Code:
                      sort vehicleno episodeno
                      
                      by vehicleno: gen byte tot_brake = sum(brake)
                      by vehicleno: gen byte tot_oil = sum(oil)
                      
                      gen byte max_brake = (tot_brake > 0) if !missing(intervention)
                      gen byte max_oil = (tot_oil > 0) if !missing(intervention)
                      drop tot_brake tot_oil
                      I changed your example a little bit to show how this would work:

                      Code:
                      . list , noobs sepby(vehicleno)
                        +-------------------------------------------------------------------+
                        | interv~n   vehicl~o   episod~o   brake   oil   max_br~e   max_oil |
                        |-------------------------------------------------------------------|
                        |        .          1          1       1     0          .         . |
                        |      110          1          2       1     0          1         0 |
                        |-------------------------------------------------------------------|
                        |        .          2          1       0     1          .         . |
                        |        .          2          2       0     1          .         . |
                        |      112          2          3       1     1          1         1 |
                        |-------------------------------------------------------------------|
                        |        .          3          1       0     1          .         . |
                        |      113          3          2       0     1          0         1 |
                        |-------------------------------------------------------------------|
                        |      114          4          1       1     1          1         1 |
                        |        .          4          2       1     0          .         . |
                        |        .          4          3       1     0          .         . |
                        |      115          4          4       1     0          1         1 |
                        +-------------------------------------------------------------------+
                      Last edited by Hemanshu Kumar; 08 Oct 2022, 07:49.

                      Comment


                      • #12
                        The answer to #10 lies in the help of rangestat which explains rules for its interval() option, such that . -1 means here extending back from the immediate previous value as far back as possible. If preceding for you also includes the present value, then -1 needs to be changed to 0.

                        Comment


                        • #13
                          Originally posted by Hemanshu Kumar View Post
                          Denise Vella first, I think a basic clarification: my interpretation has been that when you say "preceding values", you really mean the values before but also including the current episode. This would answer some of the questions and comments in #2, #6 and #9.

                          Second, you say that the code would not work when we change the oil observation for intervention #115 to 0. But in fact, if you make that change and run the code I suggested in #5, it still produces a max_oil value of 1 for intervention #115, since it is taking the maximum over preceding (and current) values, including for episode no. 3 for that vehicle.

                          Third, however your language in #8 suggests that you want the max to be taken over ALL preceding (and current) values, not just those after the previous intervention, since you say that for vehicle 4, you want the maximum to be taken over episodes 1-4. This contradicts what you said in #1, where you said for vehicle 4, you want the maximum over episodes 2-4. I'm not sure what it is that you actually want.

                          If you want the maximum over ALL preceding values and the current value, the code can be simplified to the following:

                          Code:
                          sort vehicleno episodeno
                          
                          by vehicleno: gen byte tot_brake = sum(brake)
                          by vehicleno: gen byte tot_oil = sum(oil)
                          
                          gen byte max_brake = (tot_brake > 0) if !missing(intervention)
                          gen byte max_oil = (tot_oil > 0) if !missing(intervention)
                          drop tot_brake tot_oil
                          I changed your example a little bit to show how this would work:

                          Code:
                          . list , noobs sepby(vehicleno)
                          +-------------------------------------------------------------------+
                          | interv~n vehicl~o episod~o brake oil max_br~e max_oil |
                          |-------------------------------------------------------------------|
                          | . 1 1 1 0 . . |
                          | 110 1 2 1 0 1 0 |
                          |-------------------------------------------------------------------|
                          | . 2 1 0 1 . . |
                          | . 2 2 0 1 . . |
                          | 112 2 3 1 1 1 1 |
                          |-------------------------------------------------------------------|
                          | . 3 1 0 1 . . |
                          | 113 3 2 0 1 0 1 |
                          |-------------------------------------------------------------------|
                          | 114 4 1 1 1 1 1 |
                          | . 4 2 1 0 . . |
                          | . 4 3 1  0 . . |
                          | 115 4 4 1 0 1 1 |
                          +-------------------------------------------------------------------+
                          Apologies for being dumb, but I like to understand the syntax rather than just getting a solution

                          some questions

                          1. Why do you ask stata to generate a sum of all the brake values ?
                          by vehicleno: gen byte tot_brake = sum(brake) 2. How does the code generate the max of the current value at the intervention and including PREVIOUS values of the same vehicle number, when this is telling stata to generate a variable which is the maximum of the new ‘summed variable you generated in 1.

                          And thus a summed column can contain the summation of many 1s therefore (2 or 3 or 4) for eg vehicle 4 - episode 4- brake - at intervention 115 - the sum(brake) would be 4. How did this become 1 in your results ?

                          gen byte max_brake = (tot_brake > 0) if !missing(intervention)

                          Comment


                          • #14
                            Originally posted by Denise Vella View Post
                            1. Why do you ask stata to generate a sum of all the brake values ?

                            by vehicleno: gen byte tot_brake = sum(brake)
                            The sum() function gives the cumulative sum, in this case of all brake values for that vehicle number, i.e. the preceding values and the current value. Notice that tot_brake will thus be zero only if there have been no brake failures, otherwise it will take on a positive value. I use this in the next part.


                            2. How does the code generate the max of the current value at the intervention and including PREVIOUS values of the same vehicle number, when this is telling stata to generate a variable which is the maximum of the new ‘summed variable you generated in 1.
                            The line
                            Code:
                            gen byte max_brake = (tot_brake > 0) if !missing(intervention)
                            now looks at whether tot_brake is positive or not, and returns a value of 1 if it is, and returns 0 if it is zero, i.e. it is evaluating the logical operation "is tot_brake positive?", and returning 1 or 0 depending on true/false. In other words, it takes on a value of 1 if there have been any brake failures in the current or past episodes for that vehicle number, else it takes a value 0. Also, it makes sure to take any value only for an intervention observation, which is what you seemed to want.

                            Comment


                            • #15
                              Originally posted by Hemanshu Kumar View Post

                              The line
                              Code:
                              gen byte max_brake = (tot_brake > 0) if !missing(intervention)
                              gen --> generates a variable (1)
                              byte --> type of variable where the integer may have a value of -127 to 100
                              max_brake --> name of the new variable in (1) which would consist of :

                              (tot_brake >0) --> the value calculated from the summation of all brake values by vehicle no ; for which the values of intervention are not missing represented in this part of the code: if !missing(intervention)


                              So the question still remains - if we have asked STATA to calculate the sum of all brake values eg. for vehicle 4, episodes 1-4, as it had a brake failure in each recorded episoded - wouldn't the sum be 4 in tot_brake

                              Part A
                              Therefore how did the code:

                              Code:
                              gen byte max_brake = (tot_brake > 0) if !missing(intervention)
                              Get a max of 1? Which is what I don't understand.

                              Part B
                              What Dr Hemanshu Kumar says here about the above code:
                              now looks at whether tot_brake is positive or not, and returns a value of 1 if it is, and returns 0 if it is zero, i.e. it is evaluating the logical operation "is tot_brake positive?", and returning 1 or 0 depending on true/false. In other words, it takes on a value of 1 if there have been any brake failures in the current or past episodes for that vehicle number, else it takes a value 0.

                              How did this logical explanation (Part B) come about when what I see here written in the syntax, I see it telling stata this (Part A) rather than the above? For Part B I would alternatively write it like this (see below)

                              gen max_brake = 0
                              replace max_brake = 1 if tot_brake > 0 if !missing(intervention)

                              Comment

                              Working...
                              X