Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variable in descending order of a date variable to replace an existing variable that isnot ordered based on the date variable

    In a given year, an organization has several trip information. Each trip for an organization in a year can be repeated. Below, I have provided the trips for Organization_Year == 1 and Organization_Year == 2. When Organization_Year == 2, the trips do not start from 1 but continue from the last trip of Organization_Year 1.
    Within the dataset, there are two date variables, Date1 and Date2. In most cases, these two date variables are the same, although occasionally they differ. The same trip does not have a different Date 1variable. The same trip does not also have a different date2 variable.
    My objective is to create two additional trip variables for each organization_year variable based on the Trip Variable. The first variable is a new_Trip1 ordered in descending order of Date1, while the second variable will be another new_Trip2 variable ordered in descending order of Date2.
    I have already presented the date in dataex after making it in descending order of date1 variable using (gsort Organization_Year -Date1).



    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(Organization_Year Trips) double(Date1 Date2)
    1 15 15270 15270
    1  9 15267 15267
    1  9 15267 15267
    1  8 15259 15259
    1  8 15259 15259
    1  8 15259 15259
    1 16 15256 15256
    1 16 15256 15256
    1 14 15244 15243
    1 14 15244 15243
    1 13 15135 15135
    1 13 15135 15135
    1  1 15133 15133
    1  1 15133 15133
    1  1 15133 15133
    1 17 15133 15133
    1  1 15133 15133
    1 17 15133 15133
    1 17 15133 15133
    1 17 15133 15133
    1 12 15131 15131
    1 12 15131 15131
    1 12 15131 15131
    1 12 15131 15131
    1  2 15050 15050
    1  5 15034 15034
    1  4 15029 15029
    1  3 15025 15025
    1 11 15004 15004
    1 11 15004 15004
    1 11 15004 15004
    1 11 15004 15004
    1 11 15004 15004
    1  6 14997 14997
    1  6 14997 14997
    1  6 14997 14997
    1  7 14990 14990
    1  7 14990 14990
    1  7 14990 14990
    1 10 14983 14983
    1 10 14983 14983
    1 10 14983 14983
    1 10 14983 14983
    2 24 15289 15289
    2 26 15283 15283
    2 26 15283 15283
    2 26 15283 15283
    2 25 15276 15276
    2 32 15272 15272
    2 32 15272 15272
    2 31 15254 15253
    2 31 15254 15253
    2 33 15250 15250
    2 27 15140 15140
    2 28 15140 15140
    2 29 15140 15140
    2 30 15140 15140
    2 19 15137 15137
    2 18 15137 15137
    2 18 15137 15137
    2 19 15137 15137
    2 19 15137 15137
    2 18 15137 15137
    2 22 15054 15054
    2 21 15045 15045
    2 20 15043 15043
    2 23 15019 15019
    end
    format %td Date1
    format %td Date2
    ------------------ copy up to and including the previous line ------------------

    Listed 67 out of 67 observations

    .
    end of do-file



    Thank you and looking forward to your suggestions
    Last edited by tig som; 16 May 2023, 08:20.

  • #2
    In my attempt to find a solution, I explored the Stata help documentation and came across information regarding the "egenmore" command, which pertains to the concept of "group2.

    ““Groups of varlist will have values 1 upwards according to their values on the results of a specified egen_call. For example, group2(rep78) sort(mean(mpg))
    will produce a variable such that the group of rep78 with the lowest mean of mpg will have value 1, that with the second lowest mean will have value 2, and so forth.”

    In my initial attempt, I tried with the following command:

    egen Newtrip1 = group2(Trips) sort(Date1)

    However, this command did not work. I don’t think I correctly specified the egen command for the "group2" function. Unfortunately, the Stata help documentation did not provide any specific examples or guidance on how to utilize the "egenmore" command for group2 with the additional option of "sort()".
    If you have an alternative solution to address my first question, I would greatly appreciate your suggestions.

    Comment


    • #3
      While attempting to find a solution, I executed the following command:

      egen Reverse_New_trip1 = group(Organization_Year Date1 Trips)

      gsort Organization_Year -newTrips, generate(New_trip1)


      In my dataset, I encountered no issues since each trip is associated with only one Date1 value. Therefore, creating the New_trip1 groups did not result in more groups than the original trips group. However, if each trip is linked to multiple date variables, the groupings of New_trip1 and trips will differ. I would appreciate your suggestions on how to address this scenario if it arises. Also, it would be nice to hear if you have any comments on the solution I took. Thank you
      Last edited by tig som; 17 May 2023, 01:22.

      Comment


      • #4
        #2 contradicts itself as you cite from the help an example of how to use that egen code and then state that there is no such example. As stated in the text you cite, the sort() option must be a separate call to egen naming an egen function, and your application doesn't do that.

        I wrote that extra egen function sometime in or before 2000 and I'd forgotten about it, but what it does is now better done by myaxis from the Stata Journal. However, I don't think either the egen function or that command has any bearing on what you want to do.

        I can't follow what you're trying to do with #3.

        Backing up, and thanks for the data example, I don't really understand what you're asking as your data already appear sorted in precisely the form you want. But this code may indicate some technique.

        Code:
        ​​​​​​​* Example generated by -dataex-. For more info, type help dataex
        clear
        input float(Organization_Year Trips) double(Date1 Date2)
        1 15 15270 15270
        1  9 15267 15267
        1  9 15267 15267
        1  8 15259 15259
        1  8 15259 15259
        1  8 15259 15259
        1 16 15256 15256
        1 16 15256 15256
        1 14 15244 15243
        1 14 15244 15243
        1 13 15135 15135
        1 13 15135 15135
        1  1 15133 15133
        1  1 15133 15133
        1  1 15133 15133
        1 17 15133 15133
        1  1 15133 15133
        1 17 15133 15133
        1 17 15133 15133
        1 17 15133 15133
        1 12 15131 15131
        1 12 15131 15131
        1 12 15131 15131
        1 12 15131 15131
        1  2 15050 15050
        1  5 15034 15034
        1  4 15029 15029
        1  3 15025 15025
        1 11 15004 15004
        1 11 15004 15004
        1 11 15004 15004
        1 11 15004 15004
        1 11 15004 15004
        1  6 14997 14997
        1  6 14997 14997
        1  6 14997 14997
        1  7 14990 14990
        1  7 14990 14990
        1  7 14990 14990
        1 10 14983 14983
        1 10 14983 14983
        1 10 14983 14983
        1 10 14983 14983
        2 24 15289 15289
        2 26 15283 15283
        2 26 15283 15283
        2 26 15283 15283
        2 25 15276 15276
        2 32 15272 15272
        2 32 15272 15272
        2 31 15254 15253
        2 31 15254 15253
        2 33 15250 15250
        2 27 15140 15140
        2 28 15140 15140
        2 29 15140 15140
        2 30 15140 15140
        2 19 15137 15137
        2 18 15137 15137
        2 18 15137 15137
        2 19 15137 15137
        2 19 15137 15137
        2 18 15137 15137
        2 22 15054 15054
        2 21 15045 15045
        2 20 15043 15043
        2 23 15019 15019
        end
        format %td Date1
        format %td Date2
        
        list if Date1 != Date2 
        
        levelsof Organization_Year, local(groups)
        
        gsort Organization_Year -Date1 
        
        gen NewDate2 = . 
        gen wanted = 0 
        foreach g of local groups { 
            replace wanted = Organization_Year == `g'
            mata : work = st_data(., "Date2", "wanted")
            mata : _sort(work, -1)
            mata : st_store(., "NewDate2", "wanted", work)
        } 
        format NewDate2 %td 
        drop wanted 
        list, sepby(Organization_Year)
        
        
            +------------------------------------------------------+
             | Organi~r   Trips       Date1       Date2    NewDate2 |
             |------------------------------------------------------|
          1. |        1      15   22oct2001   22oct2001   22oct2001 |
          2. |        1       9   19oct2001   19oct2001   19oct2001 |
          3. |        1       9   19oct2001   19oct2001   19oct2001 |
          4. |        1       8   11oct2001   11oct2001   11oct2001 |
          5. |        1       8   11oct2001   11oct2001   11oct2001 |
          6. |        1       8   11oct2001   11oct2001   11oct2001 |
          7. |        1      16   08oct2001   08oct2001   08oct2001 |
          8. |        1      16   08oct2001   08oct2001   08oct2001 |
          9. |        1      14   26sep2001   25sep2001   25sep2001 |
         10. |        1      14   26sep2001   25sep2001   25sep2001 |
         11. |        1      13   09jun2001   09jun2001   09jun2001 |
         12. |        1      13   09jun2001   09jun2001   09jun2001 |
         13. |        1       1   07jun2001   07jun2001   07jun2001 |
         14. |        1       1   07jun2001   07jun2001   07jun2001 |
         15. |        1      17   07jun2001   07jun2001   07jun2001 |
         16. |        1      17   07jun2001   07jun2001   07jun2001 |
         17. |        1      17   07jun2001   07jun2001   07jun2001 |
         18. |        1       1   07jun2001   07jun2001   07jun2001 |
         19. |        1      17   07jun2001   07jun2001   07jun2001 |
         20. |        1       1   07jun2001   07jun2001   07jun2001 |
         21. |        1      12   05jun2001   05jun2001   05jun2001 |
         22. |        1      12   05jun2001   05jun2001   05jun2001 |
         23. |        1      12   05jun2001   05jun2001   05jun2001 |
         24. |        1      12   05jun2001   05jun2001   05jun2001 |
         25. |        1       2   16mar2001   16mar2001   16mar2001 |
         26. |        1       5   28feb2001   28feb2001   28feb2001 |
         27. |        1       4   23feb2001   23feb2001   23feb2001 |
         28. |        1       3   19feb2001   19feb2001   19feb2001 |
         29. |        1      11   29jan2001   29jan2001   29jan2001 |
         30. |        1      11   29jan2001   29jan2001   29jan2001 |
         31. |        1      11   29jan2001   29jan2001   29jan2001 |
         32. |        1      11   29jan2001   29jan2001   29jan2001 |
         33. |        1      11   29jan2001   29jan2001   29jan2001 |
         34. |        1       6   22jan2001   22jan2001   22jan2001 |
         35. |        1       6   22jan2001   22jan2001   22jan2001 |
         36. |        1       6   22jan2001   22jan2001   22jan2001 |
         37. |        1       7   15jan2001   15jan2001   15jan2001 |
         38. |        1       7   15jan2001   15jan2001   15jan2001 |
         39. |        1       7   15jan2001   15jan2001   15jan2001 |
         40. |        1      10   08jan2001   08jan2001   08jan2001 |
         41. |        1      10   08jan2001   08jan2001   08jan2001 |
         42. |        1      10   08jan2001   08jan2001   08jan2001 |
         43. |        1      10   08jan2001   08jan2001   08jan2001 |
             |------------------------------------------------------|
         44. |        2      24   10nov2001   10nov2001   10nov2001 |
         45. |        2      26   04nov2001   04nov2001   04nov2001 |
         46. |        2      26   04nov2001   04nov2001   04nov2001 |
         47. |        2      26   04nov2001   04nov2001   04nov2001 |
         48. |        2      25   28oct2001   28oct2001   28oct2001 |
         49. |        2      32   24oct2001   24oct2001   24oct2001 |
         50. |        2      32   24oct2001   24oct2001   24oct2001 |
         51. |        2      31   06oct2001   05oct2001   05oct2001 |
         52. |        2      31   06oct2001   05oct2001   05oct2001 |
         53. |        2      33   02oct2001   02oct2001   02oct2001 |
         54. |        2      27   14jun2001   14jun2001   14jun2001 |
         55. |        2      29   14jun2001   14jun2001   14jun2001 |
         56. |        2      28   14jun2001   14jun2001   14jun2001 |
         57. |        2      30   14jun2001   14jun2001   14jun2001 |
         58. |        2      19   11jun2001   11jun2001   11jun2001 |
         59. |        2      19   11jun2001   11jun2001   11jun2001 |
         60. |        2      19   11jun2001   11jun2001   11jun2001 |
         61. |        2      18   11jun2001   11jun2001   11jun2001 |
         62. |        2      18   11jun2001   11jun2001   11jun2001 |
         63. |        2      18   11jun2001   11jun2001   11jun2001 |
         64. |        2      22   20mar2001   20mar2001   20mar2001 |
         65. |        2      21   11mar2001   11mar2001   11mar2001 |
         66. |        2      20   09mar2001   09mar2001   09mar2001 |
         67. |        2      23   13feb2001   13feb2001   13feb2001 |
             +------------------------------------------------------+

        Comment


        • #5
          I am very sorry for not being clear. Thank you very much for your answer, Nick cox.

          However, my main question is :
          I want to create a variable called "New_trip1" that is derived from my existing "Trip" variable. The key difference is that the values of "New_trip1" should be ordered in descending order of the "Date1" variable.
          Thus, "New_trip1" is group of a “Trip” variable with the oldest "Date1" will have a value of 1, then the second oldest "Date1" will have a value of 2, and so forth.

          I made a mistake in Comment #3. I edited it now.
          However, in the following -date example I used the following command.

          gsort Organization_Year -Date1 Trips, gen (New_trip1)

          "New_trip1" and "Trip" have the same number of categories (number of distinct observations) equal to 33. That is what I want, and it is pretty much good. Since each value of "Trip" is related to only one "Date1" value, the command I used (i.e. gsort Organization_Year -Date1 Trips, gen(New_trip1)) does not create more groups in the "New_trip1" variable than the existing groups in the "Trips" variable. The key difference between the two variables is the order of the categories, with "New_trip1" being ordered in the desired descending order based on the "Date1" variable.

          However, if a single value of “Trip” variable were associated with multiple values of "Date1" values, complications would arise as the number of categories in "New_trip1" and "Trip" would differ. I am still searching for a solution if such a situation arises. I hope I made my self-clearer now, and my solution is workable, right? I would appreciate any suggestions.
          Thank you.



          Below, I present what the old variables and my new variable "New_Trip1" looks like



          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float(Organization_Year Trips) double(Date1 Date2) byte New_trip1
          1 15 15270 15270  1
          1  9 15267 15267  2
          1  9 15267 15267  2
          1  8 15259 15259  3
          1  8 15259 15259  3
          1  8 15259 15259  3
          1 16 15256 15256  4
          1 16 15256 15256  4
          1 14 15244 15243  5
          1 14 15244 15243  5
          1 13 15135 15135  6
          1 13 15135 15135  6
          1  1 15133 15133  7
          1  1 15133 15133  7
          1  1 15133 15133  7
          1  1 15133 15133  7
          1 17 15133 15133  8
          1 17 15133 15133  8
          1 17 15133 15133  8
          1 17 15133 15133  8
          1 12 15131 15131  9
          1 12 15131 15131  9
          1 12 15131 15131  9
          1 12 15131 15131  9
          1  2 15050 15050 10
          1  5 15034 15034 11
          1  4 15029 15029 12
          1  3 15025 15025 13
          1 11 15004 15004 14
          1 11 15004 15004 14
          1 11 15004 15004 14
          1 11 15004 15004 14
          1 11 15004 15004 14
          1  6 14997 14997 15
          1  6 14997 14997 15
          1  6 14997 14997 15
          1  7 14990 14990 16
          1  7 14990 14990 16
          1  7 14990 14990 16
          1 10 14983 14983 17
          1 10 14983 14983 17
          1 10 14983 14983 17
          1 10 14983 14983 17
          2 24 15289 15289 18
          2 26 15283 15283 19
          2 26 15283 15283 19
          2 26 15283 15283 19
          2 25 15276 15276 20
          2 32 15272 15272 21
          2 32 15272 15272 21
          2 31 15254 15253 22
          2 31 15254 15253 22
          2 33 15250 15250 23
          2 27 15140 15140 24
          2 28 15140 15140 25
          2 29 15140 15140 26
          2 30 15140 15140 27
          2 18 15137 15137 28
          2 18 15137 15137 28
          2 18 15137 15137 28
          2 19 15137 15137 29
          2 19 15137 15137 29
          2 19 15137 15137 29
          2 22 15054 15054 30
          2 21 15045 15045 31
          2 20 15043 15043 32
          2 23 15019 15019 33
          end
          format %td Date1
          format %td Date2
          ------------------ copy up to and including the previous line ------------------

          Listed 67 out of 67 observations
          Last edited by tig som; 19 May 2023, 09:19.

          Comment


          • #6
            Originally posted by tig som View Post
            While attempting to find a solution, I executed the following command:

            egen Reverse_New_trip1 = group(Organization_Year Date1 Trips)

            gsort Organization_Year -Reverse_New_trip1, generate(New_trip1)


            In my dataset, I encountered no issues since each trip is associated with only one Date1 value. Therefore, creating the New_trip1 groups did not result in more groups than the original trips group. However, if each trip is linked to multiple date variables, the groupings of New_trip1 and trips will differ. I would appreciate your suggestions on addressing this scenario if it arises. Also, it would be nice to hear if you have any comments on the solution I took. Thank you
            correction for comment #3

            Comment


            • #7
              Sorry, but I can't hold all these words in my head at once, even focusing on a single post. So, I will just give one suggestion: a pattern that may help once you have observations in the desired sort order is just

              Code:
              gen wanted = sum(x != x[_n-1])
              which starts at 1 (usually) and jumps up by 1 every time x changes.

              Comment


              • #8
                Thank you for the suggestion, but the two questions are indeed different. The variables "New_trip" in this post and "New_trip" in the other post refer to completely different things. However, both "organizations_Year" and "Trip" variables represent similar information. I apologize for any confusion caused by using the same name, "New_trip”
                When I used your suggestion ==> bysort Organization_Year: gen wanted2 = sum (Trip!= Trip[_n-1]), I got the answer to my question in the other post. Thus, this answered my question for the other post, and thank you very much for that. If possible, I would be happy to get any suggestions for this post too.

                Kind regards,

                Comment


                • #9
                  I don't know.

                  Note I didn't suggest the prefix by Organization_Year: precisely because your count in #5 continues increasing even as Organization_Year changes.

                  Bailing out here. I hope someone else can help more.

                  Comment


                  • #10
                    Thank you for your remark again. I used it because my aim for the question in the other post is to create the “New_Trip” variable, as stated in the post. Generating a group of a variable based on (by) another variable - Statalist comment Number #1. When I combined your suggestion with an additional prefix by Organization_Year, it answered my question posted on Generating a group of a variable based on (by) another variable - Statalist comment Number #1


                    ----------------------- copy starting from the next line -----------------------
                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float(Organization_Year Trips New_Trip wanted)
                    1  1 1 1
                    1  1 1 1
                    1  1 1 1
                    1  1 1 1
                    1  2 2 2
                    1  3 3 3
                    1  4 4 4
                    1  5 5 5
                    1  6 6 6
                    1  6 6 6
                    1  6 6 6
                    2  7 1 1
                    2  7 1 1
                    2  7 1 1
                    2  8 2 2
                    2  8 2 2
                    2  8 2 2
                    2  9 3 3
                    2  9 3 3
                    2 10 4 4
                    2 10 4 4
                    2 10 4 4
                    2 10 4 4
                    end
                    ------------------ copy up to and including the previous line ------------------

                    Listed 23 out of 23 observations

                    kind regards,

                    Comment

                    Working...
                    X