Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to take max number by group

    I've two cross-sectional datasets merging together. The data has variables about working from home pre-covid_19 ( wfh_pre) and during Covid-19 (wfh-post). Where are these variables is a dummy variable showing if an individual is working from home. I would like to calculate the maximum number (peak %) of people working from home within each occupation (jbsoc10_cc_agg), separately for the pre-COVID-19 and post-COVID-19 periods. For example, working from home in a Nurse occupation should have two columns, one for pre-covid-19 and the other for post_covid-19

    I try with this code, but I'm not sure if this is what I'm looking for :
    Code:
    * Collapse the data to get the total number of people working from home by occupation and year
    collapse (sum) wfh_pre wfh_post, by(jbsoc10_cc_agg )
    
    
    ///*Create Temporary Files for Pre and Post COVID-19 Data///
    * Save the collapsed data to a temporary file
    tempfile collapsed_datai
    save "collapsed_data", replace 
    
    * Create and save pre-COVID-19 data
    use "collapsed_data", clear
    collapse (max) wfh_pre, by(jbsoc10_cc_agg)
    tempfile pre_covid_peak
    save "pre_covid_peak", replace 
    
    * Create and save post-COVID-19 data
    use "collapsed_data", clear
    collapse (max) wfh_post, by(jbsoc10_cc_agg)
    tempfile post_covid_peak
    save "post_covid_peak", replace 
    
    ///*Merge Pre and Post COVID-19 Data///
    * Merge pre and post COVID-19 data
    use "pre_covid_peak", clear
    rename wfh_pre peak_wfh_pre  
    merge 1:1 jbsoc10_cc_agg using "post_covid_peak"
    rename wfh_post peak_wfh_post
    drop _merge
    
    ///*Merge with the Original Dataset///
    * Save the merged peak data to a temporary file
    tempfile combined_peak_wfh
    save "combined_peak_wfh", replace 
    
    * Merge with the original dataset
    use "path_to_original_dataset.dta", clear
    merge m:1 jbsoc10_cc_agg using "`combined_peak_wfh'"
    Could you help with that?

    This is my sample data. I provided it separately to just show who is working from home pre/ post :
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double pidp int jbsoc10_cc_agg float(tq wfh_post)
     70414697 721 241 1
     89358129   . 241 1
    137259367 415 241 1
    137330089 231 241 1
    138099169   . 241 1
    138519411 223 241 1
    204395091   . 241 1
    204791531   . 241 1
    205104333 223 241 1
    272014285 321 241 1
    272124451 222 241 1
    272148251 415 241 1
    272269291   . 241 1
    272291727 313 241 1
    272328451 242 241 1
    272401211 231 241 1
    272401891 213 241 1
    272408691 125 241 1
    272465127 612 241 1
    272533807 323 241 1
    272541971 212 241 1
    272588219 413 241 1
    272709313 312 241 1
    272971051 353 241 1
    273113847   . 241 1
    273143771 231 241 1
    273459293 221 241 1
    273729255 341 241 1
    273787045 125 241 1
    273864567 231 241 1
    275361245 222 241 1
    275957686 612 241 1
    277059298 323 241 1
    279072770 222 241 1
    282098764 242 241 1
    285908729 354 241 1
    293213965 244 241 1
    299462489   . 241 1
    300324725 511 241 1
    340182931 213 241 1
    340212233 211 241 1
    340274727 421 241 1
    340303291 211 241 1
    340304647 231 241 1
    340341371 212 241 1
    340540607 231 241 1
    340556931   . 241 1
    340568487 814 241 1
    340631727 113 241 1
    340727615   . 241 1
    340727619   . 241 1
    340899647   . 241 1
    340958127 115 241 1
    341117931 112 241 1
    341126767   . 241 1
    341404889 415 241 1
    341709525 313 241 1
    341709529 241 241 1
    341800653 231 241 1
    342484049   . 241 1
    342554091 231 241 1
    343941969   . 241 1
    354157609 412 241 1
    354273205 223 241 1
    367660369 212 241 1
    408017699 211 241 1
    408031971   . 241 1
    408120367   . 241 1
    408306687   . 241 1
    408348847 612 241 1
    408666407   . 241 1
    408975131 412 241 1
    409115211   . 241 1
    409208371 231 241 1
    409306975   . 241 1
    410520771 415 241 1
    410564971   . 241 1
    410836289 356 241 1
    410910534 411 241 1
    411714849   . 241 1
    419553966 125 241 1
    421648977 213 241 1
    421657129 244 241 1
    421855021   . 241 1
    422141293 221 241 1
    436324117 245 241 1
    436331529 614 241 1
    476322343 612 241 1
    476499811 412 241 1
    476532451 212 241 1
    477104327 231 241 1
    477104331   . 241 1
    477143087   . 242 1
    478384767   . 241 1
    478443925 415 241 1
    478643847   . 241 1
    478775773 231 241 1
    478894089 231 241 1
    479637329 125 242 1
    486968484 531 242 1
    end
    format %tq tq
    label values jbsoc10_cc_agg b_jbsoc10_cc
    label def b_jbsoc10_cc 112 "Production Managers and Directors", modify
    label def b_jbsoc10_cc 113 "Functional Managers and Directors", modify
    label def b_jbsoc10_cc 115 "Financial Institution Managers and Directors", modify
    label def b_jbsoc10_cc 125 "Managers and Proprietors in Other Services", modify
    label def b_jbsoc10_cc 211 "Natural and Social Science Professionals", modify
    label def b_jbsoc10_cc 212 "Engineering Professionals", modify
    label def b_jbsoc10_cc 213 "Information Technology and Telecommunications Professionals", modify
    label def b_jbsoc10_cc 221 "Health Professionals", modify
    label def b_jbsoc10_cc 222 "Therapy Professionals", modify
    label def b_jbsoc10_cc 223 "Nursing and Midwifery Professionals", modify
    label def b_jbsoc10_cc 231 "Teaching and Educational Professionals", modify
    label def b_jbsoc10_cc 241 "Legal Professionals", modify
    label def b_jbsoc10_cc 242 "Business, Research and Administrative Professionals", modify
    label def b_jbsoc10_cc 244 "Welfare Professionals", modify
    label def b_jbsoc10_cc 245 "Librarians and Related Professionals", modify
    label def b_jbsoc10_cc 312 "Draughtspersons and Related Architectural Technicians", modify
    label def b_jbsoc10_cc 313 "Information Technology Technicians", modify
    label def b_jbsoc10_cc 321 "Health Associate Professionals", modify
    label def b_jbsoc10_cc 323 "Welfare and Housing Associate Professionals", modify
    label def b_jbsoc10_cc 341 "Artistic, Literary and Media Occupations", modify
    label def b_jbsoc10_cc 353 "Business, Finance and Related Associate Professionals", modify
    label def b_jbsoc10_cc 354 "Sales, Marketing and Related Associate Professionals", modify
    label def b_jbsoc10_cc 356 "Public Services and Other Associate Professionals", modify
    label def b_jbsoc10_cc 411 "Administrative Occupations: Government and Related Organisations", modify
    label def b_jbsoc10_cc 412 "Administrative Occupations: Finance", modify
    label def b_jbsoc10_cc 413 "Administrative Occupations: Records", modify
    label def b_jbsoc10_cc 415 "Other Administrative Occupations", modify
    label def b_jbsoc10_cc 421 "Secretarial and Related Occupations", modify
    label def b_jbsoc10_cc 511 "Agricultural and Related Trades", modify
    label def b_jbsoc10_cc 531 "Construction and Building Trades", modify
    label def b_jbsoc10_cc 612 "Childcare and Related Personal Services", modify
    label def b_jbsoc10_cc 614 "Caring Personal Services", modify
    label def b_jbsoc10_cc 721 "Customer Service Occupations", modify
    label def b_jbsoc10_cc 814 "Construction Operatives", modify

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double pidp int jbsoc10_cc_agg float(tq wfh_pre)
       22445 242 226 1
       22445 242 233 1
      280165 215 235 1
     4794685 356 209 1
    68011567   . 200 1
    68011567   . 208 1
    68011567   . 216 1
    68011567 125 232 1
    68035367   . 200 1
    68041487 354 216 1
    68044207   . 200 1
    68044207   . 216 1
    68044207   . 224 1
    68060527 356 224 1
    68060527 356 232 1
    68111527 118 224 1
    68112207   . 200 1
    68112207 213 225 1
    68120371 125 232 1
    68122407   . 200 1
    68122407   . 208 1
    68136009 421 224 1
    68150975 213 233 1
    68155731   . 200 1
    68155731 231 225 1
    68163887   . 209 1
    68180887 415 232 1
    68180891   . 200 1
    68180891 214 208 1
    68180891 214 216 1
    68180891 214 232 1
    68195851   . 225 1
    68213531   . 233 1
    68216247   . 200 1
    68216247   . 208 1
    68216247 231 224 1
    68216247 231 232 1
    68216251 313 208 1
    68216251 213 224 1
    68216251 213 232 1
    68288327 356 232 1
    68293091   . 232 1
    68293099 331 232 1
    68293767   . 200 1
    68295131   . 225 1
    68297167   . 200 1
    68329127 354 217 1
    68329131   . 200 1
    68333207   . 200 1
    68333207   . 208 1
    68333207   . 224 1
    68333207   . 232 1
    68338651   . 200 1
    68338651   . 216 1
    68340011   . 200 1
    68340075 246 232 1
    68364491 354 225 1
    68395771 247 232 1
    68423647   . 208 1
    68435887   . 200 1
    68435887   . 208 1
    68438607   . 216 1
    68453571 415 233 1
    68501167   . 208 1
    68501167   . 216 1
    68501167 213 225 1
    68501167 213 232 1
    68501171 356 232 1
    68535847   . 200 1
    68539251 213 224 1
    68545367 113 232 1
    68546047   . 200 1
    68546047 213 224 1
    68550127 353 224 1
    68565087 113 216 1
    68567809   . 208 1
    68569167 125 225 1
    68569171   . 200 1
    68569171   . 216 1
    68580727   . 216 1
    68580727   . 225 1
    68615411 416 232 1
    68643291 113 209 1
    68646096 415 233 1
    68693607   . 200 1
    68708571   . 200 1
    68718771   . 200 1
    68718771   . 208 1
    68718771   . 216 1
    68731007   . 200 1
    68735087   . 200 1
    68735767   . 200 1
    68740599 112 217 1
    68740599 213 225 1
    68755487   . 200 1
    68756171   . 200 1
    68773847   . 200 1
    68774525   . 200 1
    68775213   . 200 1
    68785407 113 232 1
    end
    format %tq tq
    label values jbsoc10_cc_agg b_jbsoc10_cc
    label def b_jbsoc10_cc 112 "Production Managers and Directors", modify
    label def b_jbsoc10_cc 113 "Functional Managers and Directors", modify
    label def b_jbsoc10_cc 118 "Health and Social Services Managers and Directors", modify
    label def b_jbsoc10_cc 125 "Managers and Proprietors in Other Services", modify
    label def b_jbsoc10_cc 213 "Information Technology and Telecommunications Professionals", modify
    label def b_jbsoc10_cc 214 "Conservation and Environment Professionals", modify
    label def b_jbsoc10_cc 215 "Research and Development Managers", modify
    label def b_jbsoc10_cc 231 "Teaching and Educational Professionals", modify
    label def b_jbsoc10_cc 242 "Business, Research and Administrative Professionals", modify
    label def b_jbsoc10_cc 246 "Quality and Regulatory Professionals", modify
    label def b_jbsoc10_cc 247 "Media Professionals", modify
    label def b_jbsoc10_cc 313 "Information Technology Technicians", modify
    label def b_jbsoc10_cc 331 "Protective Service Occupations", modify
    label def b_jbsoc10_cc 353 "Business, Finance and Related Associate Professionals", modify
    label def b_jbsoc10_cc 354 "Sales, Marketing and Related Associate Professionals", modify
    label def b_jbsoc10_cc 356 "Public Services and Other Associate Professionals", modify
    label def b_jbsoc10_cc 415 "Other Administrative Occupations", modify
    label def b_jbsoc10_cc 416 "Administrative Occupations: Office Managers and Supervisors", modify
    label def b_jbsoc10_cc 421 "Secretarial and Related Occupations", modify

  • #2
    Code:
    egen igroup = group(jbsoc10_cc_agg)
    egen indN  = count(pidp), by(igroup)
    egen homeN = count(cond(wfh_post,pidp,.)), by(igroup)
    g homeNshr = homeN/indN

    Comment


    • #3
      I'm not sure I understand what you want to do, and your code is not workable as written and only gives me partial clues to what you are trying to accomplish. I'm guessing that you want to find, for each job category in both the pre- and post-covid periods, the number of people in the category, and then, for both the pre- and post-covid periods, the size of the job category with the most people in it. You can do that with:
      Code:
      byagg wfh_post jbsoc10_cc, sort: egen count_of_people = count(pidp)
       by wfh_post, sort: egen max_number_of_people = max(count_of_people)
      Note: You do say that you were expecting a separate "colulmn" for pre and post covid. Apart from the fact that the example data has only one value of wfh_post, even in your full data this code does not generate two separate variables. It creates a single variable, max_number_of_people, that takes on the appropriate value according to the value of wfh_post in each observation. Creating two "variables" would actually create two constants, since what you are asking for is the same for all observations in the same pre vs post period. There is no point in doing that.

      If this is not what you wanted to do, please post back. In addition to a clearer explanation of what you want, it probably would make sense to a) post a new data example that includes both pre- and post- observations, and then, for just your example data, show what the results should look like.

      Added: Crossed with #2.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        I'm not sure I understand what you want to do, and your code is not workable as written and only gives me partial clues to what you are trying to accomplish. I'm guessing that you want to find, for each job category in both the pre- and post-covid periods, the number of people in the category, and then, for both the pre- and post-covid periods, the size of the job category with the most people in it. You can do that with:
        Code:
        byagg wfh_post jbsoc10_cc, sort: egen count_of_people = count(pidp)
        by wfh_post, sort: egen max_number_of_people = max(count_of_people)
        Note: You do say that you were expecting a separate "colulmn" for pre and post covid. Apart from the fact that the example data has only one value of wfh_post, even in your full data this code does not generate two separate variables. It creates a single variable, max_number_of_people, that takes on the appropriate value according to the value of wfh_post in each observation. Creating two "variables" would actually create two constants, since what you are asking for is the same for all observations in the same pre vs post period. There is no point in doing that.

        If this is not what you wanted to do, please post back. In addition to a clearer explanation of what you want, it probably would make sense to a) post a new data example that includes both pre- and post- observations, and then, for just your example data, show what the results should look like.

        Added: Crossed with #2.
        Thanks, George and Clyde, for your reply
        I think George's suggestion is near to what I want

        I will try to clarify what I want by showing my estimation equation :
        〖job satisfaction 〗_it= ∝_i+ θ_t+ ∆〖wfh Cocid_19〗_occupation*Year^' s dummy+ ϵ_it
        where: job satisfaction is "jbsat" ,∝ is an individual fixed effect, θ_t is the time fixed effect , ∆〖wfh Cocid_19〗_occupation is the change of work from home by occupation during Covid-19.

        My idea would be something like the working from home(WFH) of an individual as a function of change in WFH in that occupation during COVID-19. I would take the last pre-COVID WFH and want to take the maximum WFH rate in each occupation pre-COVID-19 (calculate the Max (peak) of people pre-WFH within each occupation group). Then, I will do the same with post-COVID WFH and take the difference between the pre-WFH and post-WFH. This generates something like an instrument. Then interacts change of WFH (difference ) with the post-COVID Years dummies (2020,2021 and 2022).

        Could you help me generate this maximum number of pre and post-COVID-19 working from home at the occupation level and take the difference between them to be ready to use in my equation? please advise me if you have a better idea
        my image is to get the same columns in my original data :
        Id Occupation Pre_WFH Post_WFH Difference
        1 Therapy Professionals 0.5 0.8 -0.3
        2 Agricultural Occupations 0.7 0.6 0.2
        Hopefully, this is clear.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double pidp int jbsoc10_cc_agg float(tq wfh_pre wfh_post)
         612268607 722 218 1 0
         972032890 722 228 1 0
         409441605 722 209 1 0
         409229447 722 209 1 0
        1429094131 722 238 1 0
        1088197211 722 213 1 0
        1632379447 722 216 1 0
        1088197211 722 229 1 0
         340120379 812 232 1 0
         136127847 812 216 1 0
        1362426259 813 224 1 0
         204368579 813 216 1 0
         952320287 813 228 1 0
         544147567 813 219 1 0
         749550405 813 236 1 0
         300069729 813 234 1 0
        1362421008 813 222 1 0
         300069729 813 226 1 0
        1632210807 821 231 1 0
        1632210807 821 216 1 0
         421769329 821 234 1 0
         750183004 823 236 1 0
         750183004 823 228 1 0
        1224180899 911 230 1 0
        1428300571 912 215 1 0
         410802977 913 226 1 0
        1156499135 921 221 1 0
         544604527 923 226 1 0
        1429128807 923 215 1 0
         612268607 924 211 1 0
        1360318935 924 222 1 0
         477816410 924 233 1 0
        1496765011 926 239 1 0
        1496765011 926 231 1 0
        1230711684 927 213 1 0
        1020163964 927 236 1 0
        1224739847 927 230 1 0
         341117931 112 241 0 1
         340631727 113 241 0 1
         612214887 113 241 0 1
         544735767 113 241 0 1
         884797647 113 244 0 1
         340631727 113 241 0 1
         340958127 115 241 0 1
         884125811 125 244 0 1
         419553966 125 241 0 1
         272408691 125 241 0 1
         479637329 125 242 0 1
         273787045 125 241 0 1
         546315413 211 241 0 1
         340212233 211 241 0 1
         340303291 211 241 0 1
         408017699 211 241 0 1
         476532451 212 241 0 1
         340341371 212 241 0 1
         367660369 212 241 0 1
         272541971 212 241 0 1
         272541971 212 241 0 1
         421648977 213 241 0 1
         620778886 213 241 0 1
         340182931 213 241 0 1
        1020754884 213 244 0 1
         272401891 213 241 0 1
         421648977 213 241 0 1
         544654171 215 241 0 1
         422141293 221 241 0 1
         273459293 221 241 0 1
         273459293 221 241 0 1
         545428685 221 242 0 1
         544441343 221 241 0 1
        1020944531 221 244 0 1
         279072770 222 241 0 1
         275361245 222 241 0 1
         272124451 222 241 0 1
         279072770 222 241 0 1
         275361245 222 241 0 1
         138519411 223 241 0 1
         354273205 223 241 0 1
         205104333 223 241 0 1
         885066247 223 244 0 1
         273864567 231 241 0 1
         272401211 231 241 0 1
         341800653 231 241 0 1
         340304647 231 241 0 1
         884849327 231 244 0 1
         693770689 231 243 0 1
         342554091 231 241 0 1
         273143771 231 241 0 1
         544841171 231 242 0 1
         477104327 231 241 0 1
         478894089 231 241 0 1
         272401211 231 241 0 1
         409208371 231 241 0 1
         478775773 231 241 0 1
         341800653 231 241 0 1
         137330089 231 241 0 1
         340540607 231 241 0 1
         612736447 231 241 0 1
         504276449 231 241 0 1
         342554091 231 241 0 1
        end
        format %tq tq
        label values jbsoc10_cc_agg b_jbsoc10_cc
        label def b_jbsoc10_cc 112 "Production Managers and Directors", modify
        label def b_jbsoc10_cc 113 "Functional Managers and Directors", modify
        label def b_jbsoc10_cc 115 "Financial Institution Managers and Directors", modify
        label def b_jbsoc10_cc 125 "Managers and Proprietors in Other Services", modify
        label def b_jbsoc10_cc 211 "Natural and Social Science Professionals", modify
        label def b_jbsoc10_cc 212 "Engineering Professionals", modify
        label def b_jbsoc10_cc 213 "Information Technology and Telecommunications Professionals", modify
        label def b_jbsoc10_cc 215 "Research and Development Managers", modify
        label def b_jbsoc10_cc 221 "Health Professionals", modify
        label def b_jbsoc10_cc 222 "Therapy Professionals", modify
        label def b_jbsoc10_cc 223 "Nursing and Midwifery Professionals", modify
        label def b_jbsoc10_cc 231 "Teaching and Educational Professionals", modify
        label def b_jbsoc10_cc 722 "Customer Service Managers and Supervisors", modify
        label def b_jbsoc10_cc 812 "Plant and Machine Operatives", modify
        label def b_jbsoc10_cc 813 "Assemblers and Routine Operatives", modify
        label def b_jbsoc10_cc 821 "Road Transport Drivers", modify
        label def b_jbsoc10_cc 823 "Other Drivers and Transport Operatives", modify
        label def b_jbsoc10_cc 911 "Elementary Agricultural Occupations", modify
        label def b_jbsoc10_cc 912 "Elementary Construction Occupations", modify
        label def b_jbsoc10_cc 913 "Elementary Process Plant Occupations", modify
        label def b_jbsoc10_cc 921 "Elementary Administration Occupations", modify
        label def b_jbsoc10_cc 923 "Elementary Cleaning Occupations", modify
        label def b_jbsoc10_cc 924 "Elementary Security Occupations", modify
        label def b_jbsoc10_cc 926 "Elementary Storage Occupations", modify
        label def b_jbsoc10_cc 927 "Other Elementary Services Occupations", modify
        Last edited by Malik Saendman; 19 Jan 2024, 07:12.

        Comment


        • #5
          I'm not sure what you mean by max/peak in a cross section.

          As I understand it, you are not working with a panel. So, you want to use the typical experience by occupation as a proxy for WFH. This proxy is the change in the mean WFH between two periods (before after covid). Is that right?

          Comment


          • #6
            Sorry. While the explanation seems clearer, it seems incompatible with the data you have shown. In the example data, wfh_pre is always 1 when tq is 2019q4 or earlier, and 0 after 2019q4. And wfh-post is always exactly the boolean negation of wfh_pre. So wfh_post is just an indicator for the post-covid era. As a result, using these variables it seems thatthe percentage working from home pre-covid is 0 in all quarters and all occupations and is 100% in all quarters and all occupations after 100%.

            Comment


            • #7
              Originally posted by George Ford View Post
              I'm not sure what you mean by max/peak in a cross section.

              As I understand it, you are not working with a panel. So, you want to use the typical experience by occupation as a proxy for WFH. This proxy is the change in the mean WFH between two periods (before after covid). Is that right?
              Yes, I'm dealing with cross-sectional data; however, I'm not looking to take mean or average. By max/peak, I mean focusing on finding out the highest level of WFH adaptation across occupations. for example this code :
              Code:
              egen everwfh = max(wfh_pre), by(pidp)
              This code represents the maximum value of the wfh_pre variable for each group defined by pidp. This is more aligned with the approach of analyzing the peak or maximum WFH rates. Right? But this did not capture the occupation. I want something similar in each occupation. I'm not sure how to do this. Is it by collapsing the data by occupation or something else?

              Comment


              • #8
                Originally posted by Clyde Schechter View Post
                Sorry. While the explanation seems clearer, it seems incompatible with the data you have shown. In the example data, wfh_pre is always 1 when tq is 2019q4 or earlier, and 0 after 2019q4. And wfh-post is always exactly the boolean negation of wfh_pre. So wfh_post is just an indicator for the post-covid era. As a result, using these variables it seems thatthe percentage working from home pre-covid is 0 in all quarters and all occupations and is 100% in all quarters and all occupations after 100%.
                Yes, you are correct. My data is a bit complicated because it contains two data sets. I merged two data sets by "pidp" and "int_date". The wfh_pre does not reflect wfh_post because each variable came from different data.
                To be clear, I have the main data, which has the variable about working from home "jbfxuse7" from the period 2010 to 2022, so I generated the wfh_pre variable form "jbfxuse7" up to 2019q4 only and used this as pre-covid-19. (here, I could not use this "jbfxuse7" as post_covid working from home because it does not capture the effect of Coivd-19 directly ) Therefore, I merged other data from the same sources to show working from home during the Covid-19 period only, which has the variable " wfh." This variable can help me identify the effect directly.

                This is my original data, I generated variables wfh_pre , wfh_post as shown in #4
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input double pidp float(int_date tq) int jbsoc10_cc_agg byte jbfxuse7 float wfh
                272401211 22034 241 231  0 1
                272401211 22034 241 231  0 1
                272401215 19821 217 412  0 0
                272401215 21410 234 231  0 0
                272401215 22089 241 231  0 0
                272401219 19865 217 927  0 0
                272401219 22069 241 421  0 0
                272401891 20559 225 213  0 0
                272401891 21280 233 213  1 0
                272401891 19884 217 213  1 0
                272401891 19115 209 713 .c 0
                272401891 22032 241 213  0 1
                272401899 22021 241 925  0 0
                272404607 21269 232 341 .b 0
                272408007 19906 218 323  1 0
                272408007 22074 241 242  1 0
                272408007 20677 226 323  1 0
                272408007 21359 233 111  1 0
                272408011 19906 218 354 .c 0
                272408011 20677 226 354 .c 0
                272408011 21352 233 113  0 0
                272408011 19122 209 354 .c 0
                272408015 21347 233 341  0 0
                272408687 22065 241 213 .b 0
                272408687 20665 226 213 .b 0
                272408687 19901 217 213  0 0
                272408687 21337 233 213 .b 0
                272408687 19138 209 213  0 0
                272408691 20665 226 243 .c 0
                272408691 21336 233 125 .b 0
                272408691 19138 209 353 .c 0
                272408691 19901 217 243 .c 0
                272408691 22066 241 125 .b 1
                272416171 19136 209 511 .c 0
                272416171 22104 242 511 .c 0
                272416171 21402 234 511 .c 0
                272416171 20685 226 511 .c 0
                272420247 19905 218 124  0 0
                272420247 19170 209 614  0 0
                272420247 21329 233 614  0 0
                272420247 20677 226 614  0 0
                272420247 22063 241 614 .b 0
                272435410 19866 217 924 .b 0
                272442007 20550 225 923  0 0
                272442007 19877 217 923  0 0
                272442007 21317 233 923 .b 0
                272452891 19123 209 214  0 0
                272456287 22083 241 711  0 0
                272456287 21364 233 711 .b 0
                272456287 20576 225 711  0 0
                272456291 19904 217 924  0 0
                272456291 20576 225 924  0 0
                272456291 19172 209 924 .b 0
                272456291 21364 233 924 .b 0
                272456291 22075 241 924 .b 0
                272463091 21463 235 926 .b 0
                272463091 22089 241 926 .b 0
                272463765 19827 217 125 .b 0
                272463765 20591 225 125  0 0
                272465127 19239 210 612  0 0
                272465127 21312 233 612  0 0
                272465127 20568 225 612 .b 0
                272465127 19842 217 612 .b 0
                272465127 22063 241 612  0 1
                272465131 19842 217 614 .c 0
                272465135 20588 225 421 .b 0
                272465135 21312 233 614  0 0
                272465135 19842 217 927 .c 0
                272465135 22039 241 614  0 0
                272465139 22089 241 523 .b 0
                272465139 21308 233 523 .b 0
                272465139 20597 225 523 .b 0
                272465139 19842 217 523 .c 0
                272465811 19904 217 213 .b 0
                272465811 20580 225 213  1 0
                272465811 19131 209 213 .b 0
                272466493 19132 209 231 .c 0
                272471927 19131 209 614  0 0
                272473287 22063 241 821 .b 0
                272475327 22045 241 231  0 0
                272475327 20583 225 231  0 0
                272475327 21305 233 231  0 0
                272475331 20585 225 614  0 0
                272476011 21354 233 821 .b 0
                272476011 22065 241 821 .b 0
                272476015 20664 226 711  0 0
                272476015 22033 241 821  0 0
                272476015 21354 233 721 .b 0
                272476015 19887 217 711 .b 0
                272476699 20693 226 342 .b 0
                272476699 21334 233 342 .b 0
                272478047 19134 209 231  0 0
                272478047 19830 217 415  0 0
                272478047 22025 241 415  1 0
                272478047 20607 225 415  0 0
                272478047 21395 234 415  0 0
                272478051 20607 225 231  0 0
                272478051 19830 217 231  0 0
                272478051 21395 234 231  1 0
                272478051 19139 209 231  0 0
                end
                format %td int_date
                format %tq tq
                label values jbsoc10_cc_agg b_jbsoc10_cc
                label def b_jbsoc10_cc 111 "Chief Executives and Senior Officials", modify
                label def b_jbsoc10_cc 113 "Functional Managers and Directors", modify
                label def b_jbsoc10_cc 124 "Managers and Proprietors in Health and Care Services", modify
                label def b_jbsoc10_cc 125 "Managers and Proprietors in Other Services", modify
                label def b_jbsoc10_cc 213 "Information Technology and Telecommunications Professionals", modify
                label def b_jbsoc10_cc 214 "Conservation and Environment Professionals", modify
                label def b_jbsoc10_cc 231 "Teaching and Educational Professionals", modify
                label def b_jbsoc10_cc 242 "Business, Research and Administrative Professionals", modify
                label def b_jbsoc10_cc 243 "Architects, Town Planners and Surveyors", modify
                label def b_jbsoc10_cc 323 "Welfare and Housing Associate Professionals", modify
                label def b_jbsoc10_cc 341 "Artistic, Literary and Media Occupations", modify
                label def b_jbsoc10_cc 342 "Design Occupations", modify
                label def b_jbsoc10_cc 353 "Business, Finance and Related Associate Professionals", modify
                label def b_jbsoc10_cc 354 "Sales, Marketing and Related Associate Professionals", modify
                label def b_jbsoc10_cc 412 "Administrative Occupations: Finance", modify
                label def b_jbsoc10_cc 415 "Other Administrative Occupations", modify
                label def b_jbsoc10_cc 421 "Secretarial and Related Occupations", modify
                label def b_jbsoc10_cc 511 "Agricultural and Related Trades", modify
                label def b_jbsoc10_cc 523 "Vehicle Trades", modify
                label def b_jbsoc10_cc 612 "Childcare and Related Personal Services", modify
                label def b_jbsoc10_cc 614 "Caring Personal Services", modify
                label def b_jbsoc10_cc 711 "Sales Assistants and Retail Cashiers", modify
                label def b_jbsoc10_cc 713 "Sales Supervisors", modify
                label def b_jbsoc10_cc 721 "Customer Service Occupations", modify
                label def b_jbsoc10_cc 821 "Road Transport Drivers", modify
                label def b_jbsoc10_cc 923 "Elementary Cleaning Occupations", modify
                label def b_jbsoc10_cc 924 "Elementary Security Occupations", modify
                label def b_jbsoc10_cc 925 "Elementary Sales Occupations", modify
                label def b_jbsoc10_cc 926 "Elementary Storage Occupations", modify
                label def b_jbsoc10_cc 927 "Other Elementary Services Occupations", modify
                label values jbfxuse7 b_jbfxuse7
                label def b_jbfxuse7 0 "not mentioned", modify
                label def b_jbfxuse7 1 "mentioned", modify
                Last edited by Malik Saendman; 19 Jan 2024, 14:41.

                Comment


                • #9
                  Thank you for the additional information. I still don't think I fully understand what you're doing, and to the extent that I do understand it, it seems wrong to me.

                  First, you are saying that you get wfh from the jbfxuse7 variable in the post-covid period, but in the example data, there is no consistent relationship between jbfxuse7 and wfh in that era. Moreover, the only values of jbfxuse7 are missing values and "not mentioned," whatever that means. So something else is involved in this.

                  I also notice that wfh is set to 0 for everybody in the pre-covid period. From your explanation this is because in these situations working from home would not be attributable to Covid. That's true. BUT, in the post-covid period, you have no way to know which of the people working from home are doing so because of Covid and which for other reasons. If you want to estimate the fraction of people who took up working at home due to covid, the best you can do is look at the total prevalence of work at home in the post period and subtract the total prevalence of work at home in the pre-period. This is the best you can do, and it may not be very good because this was not a randomized experiment. It seems likely that there were some people who took up work from home during the pandemic only because their workplace was closed and this was necessary to keep working, but at the other end there may well be people who have wanted to work from home for a long time before the pandemic but only became allowed to during the pandemic--these people would have done so anyway had it been permitted, even absent a pandemic. And finally there will people who have a mix of causes. There is no way to tease these all out. The best you can do is a pre-post comparison. But the comparison must be of the prevalence of work from home in the two periods, with the same definition of work from home applied to both.

                  Comment


                  • #10
                    Originally posted by Clyde Schechter View Post
                    Thank you for the additional information. I still don't think I fully understand what you're doing, and to the extent that I do understand it, it seems wrong to me.

                    First, you are saying that you get wfh from the jbfxuse7 variable in the post-covid period, but in the example data, there is no consistent relationship between jbfxuse7 and wfh in that era. Moreover, the only values of jbfxuse7 are missing values and "not mentioned," whatever that means. So something else is involved in this.

                    I also notice that wfh is set to 0 for everybody in the pre-covid period. From your explanation this is because in these situations working from home would not be attributable to Covid. That's true. BUT, in the post-covid period, you have no way to know which of the people working from home are doing so because of Covid and which for other reasons. If you want to estimate the fraction of people who took up working at home due to covid, the best you can do is look at the total prevalence of work at home in the post period and subtract the total prevalence of work at home in the pre-period. This is the best you can do, and it may not be very good because this was not a randomized experiment. It seems likely that there were some people who took up work from home during the pandemic only because their workplace was closed and this was necessary to keep working, but at the other end there may well be people who have wanted to work from home for a long time before the pandemic but only became allowed to during the pandemic--these people would have done so anyway had it been permitted, even absent a pandemic. And finally there will people who have a mix of causes. There is no way to tease these all out. The best you can do is a pre-post comparison. But the comparison must be of the prevalence of work from home in the two periods, with the same definition of work from home applied to both.
                    Sorry for that. It might not be clear to you in my last reply, but I said that " I generated the "wfh_pre" variable form "jbfxuse7" up to 2019q4 only and used this as pre-covid-19 working form home"
                    I apologize. I updated the sample data before your reply in 4 minutes; you might have used the old one. I think the last data contains the jbfxuse7 variable and has values " mentioned," which means answer YES when asked people if they work from home on a regular basis.

                    The reason why I used this way is that jbfxuse7 asks people if they work from home. This is ENDOGENOUS. This is partially a choice. People might want to work from home, and people might select a job where they are able to work from home, and this is pre-COVID-19.

                    However, the variable "wfh" is asked only during the last 7 months of 2020, which only. In this variable, we asked the people to work from home FORCED by COVID-19 only, so this is the EXOGENOUS and post-COVID-19.
                    Then I will take the difference between the pre and post and interact with post-years dummy variable. Here, I use a kind of instrument that should give me an interpretation that effectively identifies the effect of working from home for people in occupations that significantly shifted during COVID-19.

                    Does this make sense?

                    Comment


                    • #11
                      I agree completely about jbfxuse7 being endogenous.

                      But I'm skeptical of the wfh variable from the last 7 months of 2020. When I was training in epidemiology, I was taught that self-reports of what people do can be valid if the questions are asked appropriately. But self-reports of why people do what they do should be taken with a grain of salt, if taken at all. I would put little stock in people's self-appraisal of whether they worked from home because of covid or for other reasons or for some mix. And as for the period before Covid, wfh should not be coded as 0, it should be coded as a missing value, as it was not asked.

                      Be that as it may, it is still the case in your example data that the jbfxuse7 variable does show variation in both eras. So if you wanted to use that, calculating the pre and post means by job category would go as follows:

                      Code:
                      by jbsoc10_cc_agg, sort: egen mean_pre = mean(cond(tq <= tq(2019q4), jbfxuse7, .))
                      by jbsoc10_cc_agg: egen mean_post = mean(cond(tq > tq(2019q4) & !missing(tq), jbfxuse7, .))
                      gen diff = mean_post - mean_pre
                      Since I don't know if I've convinced you that you need to use jbfxuse7 for this (or, rather, that wfh is not a usable substitute variable for this purpose), I will just comment that there is nothing magical about jbfxuse7 in this code: the same calculation can be applied to get the pre- and post- means of any variable in your data.

                      Comment

                      Working...
                      X