Hi everyone,
I have a master dataset of the following form:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long id str19 id_date long idcontrato double(date_contract_start date_contract_end)
1001 "1001_18887_21700"    1001 18887 21700
1001 "1001_21701_22431"  451697 21701 22431
1001 "1001_22432_22645" 1236132 22432 22645
1001 "1001_22646_22676" 1730454 22646 22676
1001 "1001_22677_22735" 2082075 22677 22735
1001 "1001_22736_23010" 2172904 22736 23010
1001 "1001_23011_23069" 2872183 23011 23069
1001 "1001_23070_."     3107888 23070     .
1005 "1005_18800_21639"    1005 18800 21639
1005 "1005_21640_21651"  420392 21640 21651
1005 "1005_21652_22066"  432684 21652 22066
1005 "1005_22067_22431"  720923 22067 22431
1005 "1005_22432_22456" 1124767 22432 22456
1005 "1005_22457_22645" 1288758 22457 22645
1005 "1005_22646_22676" 1742918 22646 22676
1005 "1005_22677_22735" 2036693 22677 22735
1005 "1005_22736_22888" 2322897 22736 22888
1005 "1005_22889_23010" 2598018 22889 23010
1005 "1005_23011_23041" 2728124 23011 23041
1005 "1005_23042_23130" 2991589 23042 23130
end
format %td date_contract_start
format %td date_contract_end
I have created a random sample of 2,000 unique households, following advice given previously on Statalist:
Code:
keep if inrange(date_contract_end, td(01jan2021), td(13nov2023)) | missing(date_contract_end)

// {
// --- This code is replicated from Statalist, with a few local adaptations. All credits go to Nick Cox --- //
egen tag = tag(id)
* set seed : random sample for better reproducibility
set seed 09012024 // today's date
gen shuffle = .
gen sampled = .
qui replace shuffle = runiform() & missing(fecha_operativa_pv) // households without Solar Panels
sort tag shuffle
* the tagged values are at the end; we select 2000 of them
qui replace sampled = inrange(_n, _N-1999, _N)
* spread selection
qui bysort id (sampled) : replace sampled = sampled[_N]
levelsof id if sampled
// }
keep if sampled
sort id date_contract_start
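A caveat on the snippet above: in Stata, `runiform() & missing(fecha_operativa_pv)` is evaluated as a logical expression, so `shuffle` ends up holding only 0 or 1 rather than a random draw, and the order within each group is left to how `sort` breaks ties. Below is a minimal sketch of one possible variant that keeps a genuine random number and handles eligibility separately. It runs on toy data: the `eligible` variable, the toy ids, the rule deciding which households have solar panels, and the sample size of 10 are all assumptions made for illustration, and it assumes `fecha_operativa_pv` is constant within a household.

```stata
* Toy data (hypothetical): 50 households, 3 contract rows each
clear
set obs 150
gen long id = 1000 + ceil(_n/3)
* Assumption for illustration: ids divisible by 10 have solar panels
gen double fecha_operativa_pv = cond(mod(id, 10) == 0, td(01jan2022), .)
format %td fecha_operativa_pv

set seed 09012024
egen tag = tag(id)
* eligible = the single tagged row of each household without solar panels
gen byte eligible = tag & missing(fecha_operativa_pv)
* a genuine random draw for every row
gen double shuffle = runiform()
* eligible rows sort last, in random order
sort eligible shuffle
* select the last 10 rows here (this would be 2000 in the real data)
gen byte sampled = inrange(_n, _N - 9, _N)
* spread the selection to all rows of each selected household
bysort id (sampled): replace sampled = sampled[_N]
keep if sampled
sort id
```

This only works as intended when there are at least as many eligible households as the requested sample size; otherwise the last rows would include ineligible ones.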
I have monthly files about household consumption from January 2021 to July 2023. Below is a -dataex- of January 2021:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long id str19 id_date long idcontrato double(date_contract_start date_contract_end)
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
1001 "1001_21701_22431" 451697 21701 22431
end
format %td date_contract_start
format %td date_contract_end
What I would like to do:
- Read in one monthly file at a time (as the full dataset is huge), keeping only the data for those 2,000 households.
- Append the filtered months to obtain a single dataset with the 2,000 households and their full data.
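One possible way to sketch these two steps is a loop that filters each monthly file against a list of sampled ids with `merge ..., keep(match)` and appends the result to a running file. The sketch below builds its own toy files so that it runs as a single do-file: the tempfile names (`sample_ids`, `month1` to `month3`, `combined`), the `month` and `kwh` variables, and the three toy months are assumptions for illustration, not the actual files. It also assumes the sampled ids have been reduced to one row per household, for example with `keep id` followed by `duplicates drop`.

```stata
* --- toy setup (hypothetical data so the sketch runs on its own) ---
clear
input long id
1001
1005
end
tempfile sample_ids
save `sample_ids'                    // one row per sampled household

forvalues m = 1/3 {
    clear
    set obs 5
    gen long id = 1000 + _n          // ids 1001 to 1005
    gen int month = `m'
    gen double kwh = 100 * `m' + _n  // made-up consumption values
    tempfile month`m'
    save `month`m''
}

* --- the pattern itself: filter each month, then append ---
clear
tempfile combined
save `combined', emptyok             // start from an empty dataset
forvalues m = 1/3 {
    use `month`m'', clear
    * keep only rows whose id appears in the sample list
    merge m:1 id using `sample_ids', keep(match) nogenerate
    append using `combined'
    save `combined', replace
}
use `combined', clear
sort id month
```

With real data, the toy tempfiles would be replaced by the actual monthly file names, and only one month plus the running sample is held in memory at any time.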
Thank you in advance for your help!
Michael