Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • "Identifying and Categorizing Households Based on Cooking Fuel Usage Patterns in an Unbalanced Panel Dataset"

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year long unique_id double cooking_fuel
    2015 10001001  3
    2016 10001001  3
    2017 10001001  3
    2018 10001001  3
    2019 10001001 10
    2020 10001001 10
    2018 10001002  3
    2019 10001002  2
    2017 10001003  1
    2018 10001003  3
    2019 10001003  2
    2020 10001003  2
    2021 10001003  2
    2015 10001004  3
    2019 10001004  2
    2020 10001004  2
    2021 10001004  2
    2018 10001006  3
    2019 10001006 10
    2020 10001006 10
    2021 10001006  2
    2017 10001007  3
    2018 10001007  3
    2019 10001007  2
    2020 10001007  2
    2021 10001007  2
    2018 10001008  6
    2019 10001008 10
    2020 10001008 10
    2021 10001008  8
    end




    Hello everyone,

    In my unbalanced panel dataset, I have a categorical variable named cooking_fuel representing different types of cooking fuels. Between 2015 and 2018, the values 5 or 6 denote clean fuel. Similarly, for the years 2019-2020, values 8, 9, 10, or 11 indicate clean fuel. Finally, in 2021, clean fuel is represented by values 6, 7, 8, or 9. Additionally, the variable ranges from 1 to 11, covering all possible fuel types.

    Now, I aim to categorize households (identified by unique_id) into two groups while dropping others:
    1. The first group comprises households that never used clean energy throughout any year in the dataset. (Easy to make)
    2. The second group consists of households that started using clean energy from 2016 onwards but never reverted back to dirty energy (excluding clean energy) thereafter.
    It's important to note that some households may change their energy usage patterns over time, while others may enter the dataset after 2016, 2017, or later years. I tried to use a loop but ultimately failed to get my desired result.

    Thank you.

  • #2
    Originally posted by Sk Moniruzzaman View Post
    [*]The second group consists of households that started using clean energy from 2016 onwards but never reverted back to dirty energy (excluding clean energy) thereafter.
    .
    Just create a clean energy indicator and then check whether it is equal to 1 in all the years considered.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year long unique_id double cooking_fuel
    2015 10001001  3
    2016 10001001  3
    2017 10001001  3
    2018 10001001  3
    2019 10001001 10
    2020 10001001 10
    2018 10001002  3
    2019 10001002  2
    2017 10001003  1
    2018 10001003  3
    2019 10001003  2
    2020 10001003  2
    2021 10001003  2
    2015 10001004  3
    2019 10001004  2
    2020 10001004  2
    2021 10001004  2
    2018 10001006  3
    2019 10001006 10
    2020 10001006 10
    2021 10001006  2
    2017 10001007  3
    2018 10001007  3
    2019 10001007  2
    2020 10001007  2
    2021 10001007  2
    2018 10001008  6
    2019 10001008 10
    2020 10001008 10
    2021 10001008  8
    end
    
    gen clean = inlist(cooking_fuel, 5, 6) & inrange(year, 2015, 2018)|inrange(cooking_fuel, 8, 11) & inrange(year, 2019, 2020)|inrange(cooking_fuel, 6, 9) & year==2021
    bys unique_id (year): egen wanted=min(cond(!inrange(year, 2016, 2021), . , clean))
    Res.:

    Code:
    
    . l, sepby(unique_id)
    
         +---------------------------------------------+
         | year   unique~d   cookin~l   clean   wanted |
         |---------------------------------------------|
      1. | 2015   10001001          3       0        0 |
      2. | 2016   10001001          3       0        0 |
      3. | 2017   10001001          3       0        0 |
      4. | 2018   10001001          3       0        0 |
      5. | 2019   10001001         10       1        0 |
      6. | 2020   10001001         10       1        0 |
         |---------------------------------------------|
      7. | 2018   10001002          3       0        0 |
      8. | 2019   10001002          2       0        0 |
         |---------------------------------------------|
      9. | 2017   10001003          1       0        0 |
     10. | 2018   10001003          3       0        0 |
     11. | 2019   10001003          2       0        0 |
     12. | 2020   10001003          2       0        0 |
     13. | 2021   10001003          2       0        0 |
         |---------------------------------------------|
     14. | 2015   10001004          3       0        0 |
     15. | 2019   10001004          2       0        0 |
     16. | 2020   10001004          2       0        0 |
     17. | 2021   10001004          2       0        0 |
         |---------------------------------------------|
     18. | 2018   10001006          3       0        0 |
     19. | 2019   10001006         10       1        0 |
     20. | 2020   10001006         10       1        0 |
     21. | 2021   10001006          2       0        0 |
         |---------------------------------------------|
     22. | 2017   10001007          3       0        0 |
     23. | 2018   10001007          3       0        0 |
     24. | 2019   10001007          2       0        0 |
     25. | 2020   10001007          2       0        0 |
     26. | 2021   10001007          2       0        0 |
         |---------------------------------------------|
     27. | 2018   10001008          6       1        1 |
     28. | 2019   10001008         10       1        1 |
     29. | 2020   10001008         10       1        1 |
     30. | 2021   10001008          8       1        1 |
         +---------------------------------------------+
    
    
    
    .
    Last edited by Andrew Musau; 22 Feb 2024, 01:52.

    Comment


    • #3
      Thank you, Andrew.

      Your attempt was appreciated, but unfortunately, it does not fully meet my requirements. For instance, for unique_id 10001006, it switched back to dirty fuel in 2021. My criterion dictates that once a unique_id returns to using dirty fuel in any year, I need to remove the entire unique_id from the dataset. My objective is to divide the dataset into two groups: one consisting of households that never used clean energy in any year, and another comprising households that began using clean fuel according to the defined criteria but never reverted to dirty energy.

      Comment


      • #4
        Use the generated indicator to retrieve your sample.

        For instance, for unique_id 10001006, it switched back to dirty fuel in 2021.
        Sure, so the generated indicator ("wanted") is equal to 0 for this household.

        Comment


        • #5
          Code:
          . list unique_id year cooking_fuel wanted clean in 51/75, sepby(unique_id)
          
               +---------------------------------------------+
               | uniq~_id   year   cookin~l   wanted   clean |
               |---------------------------------------------|
           51. | 10001017   2015          1        0       0 |
           52. | 10001017   2018          3        0       0 |
           53. | 10001017   2019         10        0       1 |
           54. | 10001017   2020         10        0       1 |
           55. | 10001017   2021          8        0       1 |
               |---------------------------------------------|
           56. | 10001018   2018          6        1       1 |
           57. | 10001018   2019         10        1       1 |
           58. | 10001018   2020         10        1       1 |
           59. | 10001018   2021          8        1       1 |
               |---------------------------------------------|
           60. | 10001019   2015          1        0       0 |
           61. | 10001019   2018          6        0       1 |
           62. | 10001019   2019         10        0       1 |
           63. | 10001019   2020          2        0       0 |
           64. | 10001019   2021          8        0       1 |
               |---------------------------------------------|
           65. | 10001020   2018          3        0       0 |
           66. | 10001020   2019          2        0       0 |
           67. | 10001020   2020         10        0       1 |
           68. | 10001020   2021          2        0       0 |
               |---------------------------------------------|
           69. | 10001021   2015          2        0       0 |
           70. | 10001021   2016          6        0       1 |
           71. | 10001021   2017          3        0       0 |
           72. | 10001021   2018          3        0       0 |
           73. | 10001021   2019          8        0       1 |
           74. | 10001021   2020          2        0       0 |
           75. | 10001021   2021          2        0       0 |
               +---------------------------------------------+
          If you look at the result for 10001017, this household satisfies my condition because in 2019, it shifted to clean fuel but never returned to dirty fuel. The next household is also fine. However, for 10001019, this household shifted to clean fuel but again returned to dirty fuel. Therefore, I want to drop all households with the same issue. Similarly, for 10001020, this household also violates the condition.

          Comment


          • #6
            Originally posted by Sk Moniruzzaman View Post

            If you look at the result for 10001017, this household satisfies my condition because in 2019, it shifted to clean fuel but never returned to dirty fuel. The next household is also fine. However, for 10001019, this household shifted to clean fuel but again returned to dirty fuel. Therefore, I want to drop all households with the same issue. Similarly, for 10001020, this household also violates the condition.
            Your condition in #1 states

            households that started using clean energy from 2016 onwards
            So my code in #2 requires that the household be using clean energy for all observed years within this range.


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float year long unique_id double cooking_fuel
            2015 10001001  3
            2016 10001001  3
            2017 10001001  3
            2018 10001001  3
            2019 10001001 10
            2020 10001001 10
            2018 10001002  3
            2019 10001002  2
            2017 10001003  1
            2018 10001003  3
            2019 10001003  2
            2020 10001003  2
            2021 10001003  2
            2015 10001004  3
            2019 10001004  2
            2020 10001004  2
            2021 10001004  2
            2018 10001006  3
            2019 10001006 10
            2020 10001006 10
            2021 10001006  2
            2017 10001007  3
            2018 10001007  3
            2019 10001007  2
            2020 10001007  2
            2021 10001007  2
            2018 10001008  6
            2019 10001008 10
            2020 10001008 10
            2021 10001008  8
            end
            
            gen clean = inlist(cooking_fuel, 5, 6) & inrange(year, 2015, 2018)|inrange(cooking_fuel, 8, 11) & inrange(year, 2019, 2020)|inrange(cooking_fuel, 6, 9) & year==2021
            bys unique_id (year): gen sample= sum(clean)>=1 if year>=2016
            by unique_id: egen tag1= max(!clean & sample==1)
            by unique_id: egen tag2= max(clean & sample==1)
            gen wanted= tag2 & !tag1
            Res.:

            Code:
            . l, sepby(uniq)
            
                 +--------------------------------------------------------------------+
                 | year   unique~d   cookin~l   clean   sample   tag1   tag2   wanted |
                 |--------------------------------------------------------------------|
              1. | 2015   10001001          3       0        .      0      1        1 |
              2. | 2016   10001001          3       0        0      0      1        1 |
              3. | 2017   10001001          3       0        0      0      1        1 |
              4. | 2018   10001001          3       0        0      0      1        1 |
              5. | 2019   10001001         10       1        1      0      1        1 |
              6. | 2020   10001001         10       1        1      0      1        1 |
                 |--------------------------------------------------------------------|
              7. | 2018   10001002          3       0        0      0      0        0 |
              8. | 2019   10001002          2       0        0      0      0        0 |
                 |--------------------------------------------------------------------|
              9. | 2017   10001003          1       0        0      0      0        0 |
             10. | 2018   10001003          3       0        0      0      0        0 |
             11. | 2019   10001003          2       0        0      0      0        0 |
             12. | 2020   10001003          2       0        0      0      0        0 |
             13. | 2021   10001003          2       0        0      0      0        0 |
                 |--------------------------------------------------------------------|
             14. | 2015   10001004          3       0        .      0      0        0 |
             15. | 2019   10001004          2       0        0      0      0        0 |
             16. | 2020   10001004          2       0        0      0      0        0 |
             17. | 2021   10001004          2       0        0      0      0        0 |
                 |--------------------------------------------------------------------|
             18. | 2018   10001006          3       0        0      1      1        0 |
             19. | 2019   10001006         10       1        1      1      1        0 |
             20. | 2020   10001006         10       1        1      1      1        0 |
             21. | 2021   10001006          2       0        1      1      1        0 |
                 |--------------------------------------------------------------------|
             22. | 2017   10001007          3       0        0      0      0        0 |
             23. | 2018   10001007          3       0        0      0      0        0 |
             24. | 2019   10001007          2       0        0      0      0        0 |
             25. | 2020   10001007          2       0        0      0      0        0 |
             26. | 2021   10001007          2       0        0      0      0        0 |
                 |--------------------------------------------------------------------|
             27. | 2018   10001008          6       1        1      0      1        1 |
             28. | 2019   10001008         10       1        1      0      1        1 |
             29. | 2020   10001008         10       1        1      0      1        1 |
             30. | 2021   10001008          8       1        1      0      1        1 |
                 +--------------------------------------------------------------------+

            Comment


            • #7
              Get my desired result. Thank you Andrew for your help.

              Comment


              • #8
                Just another question. How I can mark those HH who use clean fuel all the observation year?

                Comment


                • #9
                  That is an FAQ: https://www.stata.com/support/faqs/d...t-occurrences/. Notably, when sorting on the clean fuel indicator, the first and last value need to be equal to one for such households.

                  Comment


                  • #10
                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float year long unique_id double cooking_fuel float(clean treated control firsttime lasttime)
                    2018 10001003  3 0 0 1    .    .
                    2019 10001003  2 0 0 1    .    .
                    2020 10001003  2 0 0 1    .    .
                    2021 10001003  2 0 0 1    .    .
                    2015 10001004  3 0 0 1    .    .
                    2019 10001004  2 0 0 1    .    .
                    2020 10001004  2 0 0 1    .    .
                    2021 10001004  2 0 0 1    .    .
                    2017 10001007  3 0 0 1    .    .
                    2018 10001007  3 0 0 1    .    .
                    2019 10001007  2 0 0 1    .    .
                    2020 10001007  2 0 0 1    .    .
                    2021 10001007  2 0 0 1    .    .
                    2018 10001008  6 1 1 0 2018 2021
                    2019 10001008 10 1 1 0 2018 2021
                    2020 10001008 10 1 1 0 2018 2021
                    end

                    Does it make any sense? I can't drop this 10001008 id.

                    Comment


                    • #11
                      Have you looked at the linked page?

                      Code:
                      bys uniq (clean): gen tag= clean[1] & clean[_N]
                      keep if tag
                      ADDED IN EDIT: Apologies, the link in #9 is not the correct one. Instead, I meant https://www.stata.com/support/faqs/d...ions-in-group/.
                      Last edited by Andrew Musau; 07 Mar 2024, 07:37.

                      Comment


                      • #12
                        Thanks a lot Andrew. I already solved the problem. The code is similar to what you provided in #11. Thank you for your cooperation.

                        Comment

                        Working...
                        X