"Identifying and Categorizing Households Based on Cooking Fuel Usage Patterns in an Unbalanced Panel Dataset"

Sk Moniruzzaman

Join Date: Jan 2024

Posts: 18
#1

"Identifying and Categorizing Households Based on Cooking Fuel Usage Patterns in an Unbalanced Panel Dataset"

22 Feb 2024, 00:24

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float year long unique_id double cooking_fuel 2015 10001001 3 2016 10001001 3 2017 10001001 3 2018 10001001 3 2019 10001001 10 2020 10001001 10 2018 10001002 3 2019 10001002 2 2017 10001003 1 2018 10001003 3 2019 10001003 2 2020 10001003 2 2021 10001003 2 2015 10001004 3 2019 10001004 2 2020 10001004 2 2021 10001004 2 2018 10001006 3 2019 10001006 10 2020 10001006 10 2021 10001006 2 2017 10001007 3 2018 10001007 3 2019 10001007 2 2020 10001007 2 2021 10001007 2 2018 10001008 6 2019 10001008 10 2020 10001008 10 2021 10001008 8 end

Hello everyone,

In my unbalanced panel dataset, I have a categorical variable named cooking_fuel representing different types of cooking fuels. Between 2015 and 2018, the values 5 or 6 denote clean fuel. Similarly, for the years 2019-2020, values 8, 9, 10, or 11 indicate clean fuel. Finally, in 2021, clean fuel is represented by values 6, 7, 8, or 9. Additionally, the variable ranges from 1 to 11, covering all possible fuel types.

Now, I aim to categorize households (identified by unique_id) into two groups while dropping others:
The first group comprises households that never used clean energy throughout any year in the dataset. (Easy to make)

The second group consists of households that started using clean energy from 2016 onwards but never reverted back to dirty energy (excluding clean energy) thereafter.

It's important to note that some households may change their energy usage patterns over time, while others may enter the dataset after 2016, 2017, or later years. I tried to use a loop but ultimately failed to get my desired result.

Thank you.
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 9945

22 Feb 2024, 01:47

Originally posted by Sk Moniruzzaman View Post

[*]The second group consists of households that started using clean energy from 2016 onwards but never reverted back to dirty energy (excluding clean energy) thereafter.
.

Just create a clean energy indicator and then check whether it is equal to 1 in all the years considered.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float year long unique_id double cooking_fuel
2015 10001001  3
2016 10001001  3
2017 10001001  3
2018 10001001  3
2019 10001001 10
2020 10001001 10
2018 10001002  3
2019 10001002  2
2017 10001003  1
2018 10001003  3
2019 10001003  2
2020 10001003  2
2021 10001003  2
2015 10001004  3
2019 10001004  2
2020 10001004  2
2021 10001004  2
2018 10001006  3
2019 10001006 10
2020 10001006 10
2021 10001006  2
2017 10001007  3
2018 10001007  3
2019 10001007  2
2020 10001007  2
2021 10001007  2
2018 10001008  6
2019 10001008 10
2020 10001008 10
2021 10001008  8
end

gen clean = inlist(cooking_fuel, 5, 6) & inrange(year, 2015, 2018)|inrange(cooking_fuel, 8, 11) & inrange(year, 2019, 2020)|inrange(cooking_fuel, 6, 9) & year==2021
bys unique_id (year): egen wanted=min(cond(!inrange(year, 2016, 2021), . , clean))

Res.:

Code:


. l, sepby(unique_id)

     +---------------------------------------------+
     | year   unique~d   cookin~l   clean   wanted |
     |---------------------------------------------|
  1. | 2015   10001001          3       0        0 |
  2. | 2016   10001001          3       0        0 |
  3. | 2017   10001001          3       0        0 |
  4. | 2018   10001001          3       0        0 |
  5. | 2019   10001001         10       1        0 |
  6. | 2020   10001001         10       1        0 |
     |---------------------------------------------|
  7. | 2018   10001002          3       0        0 |
  8. | 2019   10001002          2       0        0 |
     |---------------------------------------------|
  9. | 2017   10001003          1       0        0 |
 10. | 2018   10001003          3       0        0 |
 11. | 2019   10001003          2       0        0 |
 12. | 2020   10001003          2       0        0 |
 13. | 2021   10001003          2       0        0 |
     |---------------------------------------------|
 14. | 2015   10001004          3       0        0 |
 15. | 2019   10001004          2       0        0 |
 16. | 2020   10001004          2       0        0 |
 17. | 2021   10001004          2       0        0 |
     |---------------------------------------------|
 18. | 2018   10001006          3       0        0 |
 19. | 2019   10001006         10       1        0 |
 20. | 2020   10001006         10       1        0 |
 21. | 2021   10001006          2       0        0 |
     |---------------------------------------------|
 22. | 2017   10001007          3       0        0 |
 23. | 2018   10001007          3       0        0 |
 24. | 2019   10001007          2       0        0 |
 25. | 2020   10001007          2       0        0 |
 26. | 2021   10001007          2       0        0 |
     |---------------------------------------------|
 27. | 2018   10001008          6       1        1 |
 28. | 2019   10001008         10       1        1 |
 29. | 2020   10001008         10       1        1 |
 30. | 2021   10001008          8       1        1 |
     +---------------------------------------------+



.

Last edited by Andrew Musau; 22 Feb 2024, 01:52.

Comment

Sk Moniruzzaman

Join Date: Jan 2024

Posts: 18
#3

22 Feb 2024, 04:46

Thank you, Andrew.

Your attempt was appreciated, but unfortunately, it does not fully meet my requirements. For instance, for unique_id 10001006, it switched back to dirty fuel in 2021. My criterion dictates that once a unique_id returns to using dirty fuel in any year, I need to remove the entire unique_id from the dataset. My objective is to divide the dataset into two groups: one consisting of households that never used clean energy in any year, and another comprising households that began using clean fuel according to the defined criteria but never reverted to dirty energy.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9945
#4

22 Feb 2024, 05:15

Use the generated indicator to retrieve your sample.

For instance, for unique_id 10001006, it switched back to dirty fuel in 2021.

Sure, so the generated indicator ("wanted") is equal to 0 for this household.
Comment

Sk Moniruzzaman

Join Date: Jan 2024
Posts: 18

22 Feb 2024, 08:07

Code:

. list unique_id year cooking_fuel wanted clean in 51/75, sepby(unique_id)

     +---------------------------------------------+
     | uniq~_id   year   cookin~l   wanted   clean |
     |---------------------------------------------|
 51. | 10001017   2015          1        0       0 |
 52. | 10001017   2018          3        0       0 |
 53. | 10001017   2019         10        0       1 |
 54. | 10001017   2020         10        0       1 |
 55. | 10001017   2021          8        0       1 |
     |---------------------------------------------|
 56. | 10001018   2018          6        1       1 |
 57. | 10001018   2019         10        1       1 |
 58. | 10001018   2020         10        1       1 |
 59. | 10001018   2021          8        1       1 |
     |---------------------------------------------|
 60. | 10001019   2015          1        0       0 |
 61. | 10001019   2018          6        0       1 |
 62. | 10001019   2019         10        0       1 |
 63. | 10001019   2020          2        0       0 |
 64. | 10001019   2021          8        0       1 |
     |---------------------------------------------|
 65. | 10001020   2018          3        0       0 |
 66. | 10001020   2019          2        0       0 |
 67. | 10001020   2020         10        0       1 |
 68. | 10001020   2021          2        0       0 |
     |---------------------------------------------|
 69. | 10001021   2015          2        0       0 |
 70. | 10001021   2016          6        0       1 |
 71. | 10001021   2017          3        0       0 |
 72. | 10001021   2018          3        0       0 |
 73. | 10001021   2019          8        0       1 |
 74. | 10001021   2020          2        0       0 |
 75. | 10001021   2021          2        0       0 |
     +---------------------------------------------+

If you look at the result for 10001017, this household satisfies my condition because in 2019, it shifted to clean fuel but never returned to dirty fuel. The next household is also fine. However, for 10001019, this household shifted to clean fuel but again returned to dirty fuel. Therefore, I want to drop all households with the same issue. Similarly, for 10001020, this household also violates the condition.

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9945

22 Feb 2024, 10:48

Originally posted by Sk Moniruzzaman View Post

Your condition in #1 states

households that started using clean energy from 2016 onwards

So my code in #2 requires that the household be using clean energy for all observed years within this range.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float year long unique_id double cooking_fuel
2015 10001001  3
2016 10001001  3
2017 10001001  3
2018 10001001  3
2019 10001001 10
2020 10001001 10
2018 10001002  3
2019 10001002  2
2017 10001003  1
2018 10001003  3
2019 10001003  2
2020 10001003  2
2021 10001003  2
2015 10001004  3
2019 10001004  2
2020 10001004  2
2021 10001004  2
2018 10001006  3
2019 10001006 10
2020 10001006 10
2021 10001006  2
2017 10001007  3
2018 10001007  3
2019 10001007  2
2020 10001007  2
2021 10001007  2
2018 10001008  6
2019 10001008 10
2020 10001008 10
2021 10001008  8
end

gen clean = inlist(cooking_fuel, 5, 6) & inrange(year, 2015, 2018)|inrange(cooking_fuel, 8, 11) & inrange(year, 2019, 2020)|inrange(cooking_fuel, 6, 9) & year==2021
bys unique_id (year): gen sample= sum(clean)>=1 if year>=2016
by unique_id: egen tag1= max(!clean & sample==1)
by unique_id: egen tag2= max(clean & sample==1)
gen wanted= tag2 & !tag1

Res.:

Code:

. l, sepby(uniq)

     +--------------------------------------------------------------------+
     | year   unique~d   cookin~l   clean   sample   tag1   tag2   wanted |
     |--------------------------------------------------------------------|
  1. | 2015   10001001          3       0        .      0      1        1 |
  2. | 2016   10001001          3       0        0      0      1        1 |
  3. | 2017   10001001          3       0        0      0      1        1 |
  4. | 2018   10001001          3       0        0      0      1        1 |
  5. | 2019   10001001         10       1        1      0      1        1 |
  6. | 2020   10001001         10       1        1      0      1        1 |
     |--------------------------------------------------------------------|
  7. | 2018   10001002          3       0        0      0      0        0 |
  8. | 2019   10001002          2       0        0      0      0        0 |
     |--------------------------------------------------------------------|
  9. | 2017   10001003          1       0        0      0      0        0 |
 10. | 2018   10001003          3       0        0      0      0        0 |
 11. | 2019   10001003          2       0        0      0      0        0 |
 12. | 2020   10001003          2       0        0      0      0        0 |
 13. | 2021   10001003          2       0        0      0      0        0 |
     |--------------------------------------------------------------------|
 14. | 2015   10001004          3       0        .      0      0        0 |
 15. | 2019   10001004          2       0        0      0      0        0 |
 16. | 2020   10001004          2       0        0      0      0        0 |
 17. | 2021   10001004          2       0        0      0      0        0 |
     |--------------------------------------------------------------------|
 18. | 2018   10001006          3       0        0      1      1        0 |
 19. | 2019   10001006         10       1        1      1      1        0 |
 20. | 2020   10001006         10       1        1      1      1        0 |
 21. | 2021   10001006          2       0        1      1      1        0 |
     |--------------------------------------------------------------------|
 22. | 2017   10001007          3       0        0      0      0        0 |
 23. | 2018   10001007          3       0        0      0      0        0 |
 24. | 2019   10001007          2       0        0      0      0        0 |
 25. | 2020   10001007          2       0        0      0      0        0 |
 26. | 2021   10001007          2       0        0      0      0        0 |
     |--------------------------------------------------------------------|
 27. | 2018   10001008          6       1        1      0      1        1 |
 28. | 2019   10001008         10       1        1      0      1        1 |
 29. | 2020   10001008         10       1        1      0      1        1 |
 30. | 2021   10001008          8       1        1      0      1        1 |
     +--------------------------------------------------------------------+

Comment

Sk Moniruzzaman

Join Date: Jan 2024

Posts: 18
#7

22 Feb 2024, 19:39

Get my desired result. Thank you Andrew for your help.
Comment
Sk Moniruzzaman

Join Date: Jan 2024

Posts: 18
#8

07 Mar 2024, 04:24

Just another question. How I can mark those HH who use clean fuel all the observation year?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9945
#9

07 Mar 2024, 05:11

That is an FAQ: https://www.stata.com/support/faqs/d...t-occurrences/. Notably, when sorting on the clean fuel indicator, the first and last value need to be equal to one for such households.
Comment

Sk Moniruzzaman

Join Date: Jan 2024
Posts: 18

#10

07 Mar 2024, 06:45

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float year long unique_id double cooking_fuel float(clean treated control firsttime lasttime)
2018 10001003  3 0 0 1    .    .
2019 10001003  2 0 0 1    .    .
2020 10001003  2 0 0 1    .    .
2021 10001003  2 0 0 1    .    .
2015 10001004  3 0 0 1    .    .
2019 10001004  2 0 0 1    .    .
2020 10001004  2 0 0 1    .    .
2021 10001004  2 0 0 1    .    .
2017 10001007  3 0 0 1    .    .
2018 10001007  3 0 0 1    .    .
2019 10001007  2 0 0 1    .    .
2020 10001007  2 0 0 1    .    .
2021 10001007  2 0 0 1    .    .
2018 10001008  6 1 1 0 2018 2021
2019 10001008 10 1 1 0 2018 2021
2020 10001008 10 1 1 0 2018 2021
end

Does it make any sense? I can't drop this 10001008 id.

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 9945
#11

07 Mar 2024, 07:33

Have you looked at the linked page?

Code:

bys uniq (clean): gen tag= clean[1] & clean[_N] keep if tag

ADDED IN EDIT: Apologies, the link in #9 is not the correct one. Instead, I meant https://www.stata.com/support/faqs/d...ions-in-group/.

Last edited by Andrew Musau; 07 Mar 2024, 07:37.
Comment
Sk Moniruzzaman

Join Date: Jan 2024

Posts: 18
#12

07 Mar 2024, 19:59

Thanks a lot Andrew. I already solved the problem. The code is similar to what you provided in #11. Thank you for your cooperation.
Comment

Announcement