Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to do summary statistics for each household in each monthly separated file, and then append them together?

    Hi everyone,

    I have to do summary statistics for all households for each month, from January 2021 to July 2023. All file names are written as follows: clean_export_telemedida_yyyymm.dta. E.g.: clean_export_telemedida_202101.dta for January 2021, and so on.
    But I am stuck as I never had to do that in the past, so I need some help, please.

    I have a file for household electricity consumption for each month. The idea that I have in mind is to read each month's file and collapse to a single observation per household, as the dataset is huge.
    • For example, this could be the count of observations, or the monthly hourly electricity consumption for h_0 (12 a.m.), h_1 (1 a.m.),
      etc.
    I want to have a monthly summary for each household, so the dataset is manageable. I also want to append the data, to always have one entry per id.
    • For instance: I summarize January 2021, then I summarize February 2021 and March 2021, append it to January, and collapse again so I still just have one file, with the count of observations and the average consumption per hour, each line being one month.
    Below are three -dataex- for my monthly individual files for January, February and March 2021. I only put the first five observations to be succinct, so it contains only an ID per month:

    January 2021:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(date_elec_consumption h_0 h_1 h_2 h_3 h_4 h_5 h_6 h_7 h_8 h_9 h_10 h_11 h_12 h_13 h_14 h_15 h_16 h_17 h_18 h_19 h_20 h_21 h_22 h_23)
    1001 22281 .179 .113 .149 .115 .179 .114 .148 .115 .179 .113 .146 .113 .171 .117 .137 .125 .159 .134 .125 .136 .137 .157 .113 .148
    1001 22282 .115 .181 .114 .149 .115 .168 .131 .127 .134 .125 .166 .111 .142  .11 .175 .114 .148 .113 .143 .138 .146 .112 .146 .113
    1001 22283 .177 .115 .147 .113 .162 .144 .111 .141 .115 .151 .139 .112 .142 .112 .157 .149 .115 .147 .113 .183 .112 .147 .113 .161
    1001 22284 .136 .123 .144 .121 .175 .113 .147 .113 .172 .119 .135 .119 .135 .152 .112 .147 .112 .178 .111 .145 .113 .151  .14 .115
    1001 22285 .146 .119 .177 .115 .149 .115 .173 .121 .134 .122 .137 .153 .111 .145 .111  .18 .113 .144  .11 .154 .134  .12  .14 .122
    end
    format %td date_elec_consumption

    February 2021:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(date_elec_consumption h_0 h_1 h_2 h_3 h_4 h_5 h_6 h_7 h_8 h_9 h_10 h_11 h_12 h_13 h_14 h_15 h_16 h_17 h_18 h_19 h_20 h_21 h_22 h_23)
    1001 22312  .126  .07 .075 .081 .173 .096 .314 .577 .487 1.316 .875  .18 .116  .078  .43  .687   .23  .15 .174 .231 .475 1.415 1.253 1.754
    1001 22313  .106  .03 .174 .046 .073 .109 .075 .727  .31  .661 .221 .175 .461  .101 .444  .719  .754  .25 .263 .079 .434 2.168  .304  .736
    1001 22314  .159 .093 .222 .276 .147 .113 .194 .707 .358  .462 .305 .259 .148  .075 .277 1.487  .518 .384 .364 .352 .858 1.714  .435  .586
    1001 22315   .12 .074 .409 .217 .077 .062  .08 .609 .747  .553 .504  .71 .365  .806 .395 1.094 1.177 .084 .327 .314 .685 1.697 1.001 1.003
    1001 22316 1.249 .157  .04 .277 .463 .141 .141 .692 .364  .607 .479 .779 .545 1.063 .587  .666  .807 .786 .088 .046 .107  .033   .09  .057
    end
    format %td date_elec_consumption
    March 2021:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(date_elec_consumption h_0 h_1 h_2 h_3 h_4 h_5 h_6 h_7 h_8 h_9 h_10 h_11 h_12 h_13 h_14 h_15 h_16 h_17 h_18 h_19 h_20 h_21 h_22 h_23)
    1001 22340 .162 .078 .094 .043 .102 .055 .358 .701 .773  .37 .329 .147 .282 .049 .453 1.103 .435 .106 .231 .092   .33 1.537 1.043  .713
    1001 22341 .886 .143 .114 .154 .096 .171 .241 .772  .71 1.07 .666 .096 .061 .081 .134  .318 .295 .238 .101 .096  .453 1.246  .693   .19
    1001 22342  .16 .151 .124 .078 .058 .082 .247 .778 .104 .107 .082 .063 .107 .054 .114   .57 .153 .192 .848 1.07 1.699 1.378  .548  .834
    1001 22343 .257 .164 .132 .112 .166 .103 .094 .772 .084 .081 .087 .061 .102 .035 .181  .386  .19 .125 .182 .519  .784 1.867  .737 1.237
    1001 22344 .169 .142 .129 .068 .125 .113 .333 .684 .877 .973 .503 .337 .209 .092 .301  .588 .355 .329 .182 .127  .125  .117  .124  .116
    end
    format %td date_elec_consumption
    Do you know how can I proceed for that, please?

    Thank you for your help.
    All the best,

    Michael
    Last edited by Michael Duarte Goncalves; 10 Jan 2024, 02:25.

  • #2
    I want to collapse the data, by household, for each monthly file. Finally, I want to append them together.
    1. How could I loop everything to have, for each household ID, the number of observations for each month, the average monthly electricity consumption, etc.
    2. Then I want to save as a -tempfile- each individual monthly file computed,
    3. Finally, I want to append them, to have a full summary statistics for all the months from January 2021 to July 2023 for each unique ID from my sample.
    Here is what I tried, but I don't know how to do the proper averages, as I am only taking the average of -h_7-, and the count of non-missing observations for h_0. But I want to compute the non-missing observations from h_0 to h_23 and the monthly averages for each hour taken together.
    Could anyone help me, please?

    Code:
    collapse (mean) h_7 (count) h_0, by(id)
    Thank you in advance for your help!
    Michael

    Comment


    • #3
      I also tried that, but not sure it works:

      Code:
      clear all
      version 17
      set more off
      set varabbrev off
      
      
      global path "C:/Users/miduarte/Desktop/Ongoing_Projects/test_HolaLuz_Data/New_dataset_15112023/stata/2_export_telemedida/data/temp"
      cd "${path}"
      
      
      tempfile allmonths_sumstats
      save `allmonths_sumstats', emptyok
      
      local datafiles : dir . files "clean_export_telemedida_*.dta" // monthly files "clean_export_telemedida_yyyymm.dta"
      foreach file of local datafiles {
          
          clear*
          use "`file'", clear
          
          gen year_month = mofd(date_elec_consumption)
          format %tm year_month
          
          egen daily_elec = rowtotal(h_*)
          collapse (mean) daily_elec (count) no_obs=h_0 if !missing(h_0,h_1,h_2,h_3,h_4,h_5,h_6,h_7,h_8,h_9,h_10,h_11,h_12,h_13,h_14,h_15,h_16,h_17,h_18,h_19,h_20,h_21,h_22, h_23), by(id year_month)
          label var no_obs "No. Observations"
          label var year_month "Year and Month"
          
          append using `allmonths_sumstats'
          save `"`allmonths_sumstats'"', replace
      
      }
      
      exit, clear

      Comment

      Working...
      X