Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Wage increase for quarterly data in Germany

    Hello everyone! I have a huge dataset with 270864 observations on gross hourly wage split into different quarters from years 2012 to 2020. Here is the example of my dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double avg_gross_hourly_ea float qdate byte prf_group float(gender ter)
    18.86 224 6 3 1
    26.34 232 2 3 1
    17.28 223 2 2 2
    21.42 218 6 3 1
    12.95 229 4 3 2
     26.3 212 2 2 1
    17.82 221 2 1 2
    14.81 215 5 1 2
    22.54 233 6 1 1
    11.19 228 4 1 2
    15.18 231 3 1 2
    12.29 233 4 2 2
    17.15 241 4 2 1
    15.65 225 6 3 2
    21.49 235 3 3 2
    37.76 241 1 3 2
        . 239 6 2 2
    21.78 214 3 3 1
    26.31 225 2 2 1
    14.43 208 5 3 1
        . 225 5 2 2
    29.59 225 6 2 1
    41.86 237 6 2 1
    19.67 219 2 3 2
     36.5 228 6 2 1
    44.11 226 1 2 2
    43.53 213 6 3 1
     11.2 217 5 1 2
        . 218 1 1 2
        . 221 3 2 2
    20.73 226 6 1 1
     26.4 218 2 3 1
    43.87 225 1 2 1
    10.26 213 4 3 2
    14.22 225 3 1 2
    39.72 239 2 1 1
    23.69 211 2 1 1
    12.18 225 4 3 2
    33.14 217 1 2 1
    17.56 226 3 2 2
    27.92 225 3 3 1
    12.59 221 5 1 1
     32.4 239 1 3 2
        . 238 6 3 2
    14.11 242 5 3 2
    32.25 219 1 3 2
    12.28 229 5 3 2
    20.68 208 3 3 1
    22.05 213 1 1 2
    41.56 227 1 2 1
    24.78 214 2 1 1
    21.33 211 6 2 1
    12.06 217 3 2 2
    40.32 226 1 2 1
    29.81 219 2 2 1
        . 228 1 1 2
    16.96 233 4 1 1
     30.7 234 6 3 1
    14.21 220 5 2 1
    17.63 225 3 3 2
    35.37 209 1 1 2
    15.57 233 4 1 1
    21.97 224 1 1 2
    22.46 239 3 2 1
    12.28 218 3 3 2
    14.74 216 3 1 2
    16.43 240 4 3 1
    20.62 224 3 1 1
        . 221 6 1 1
     17.3 233 3 1 2
    17.36 216 2 1 2
    12.26 211 5 2 2
    22.42 240 6 3 2
    18.86 236 4 1 1
      8.7 209 5 1 2
    22.42 225 6 2 2
    28.31 208 3 2 1
    21.86 225 2 3 1
    13.27 212 4 1 1
    42.24 232 1 1 1
    14.02 215 3 2 2
    16.51 242 3 3 2
    14.24 221 4 3 1
    26.82 217 2 2 1
    10.45 231 5 2 2
    22.86 233 6 1 1
    14.34 208 6 2 2
    35.19 221 1 2 2
    44.58 237 1 1 1
    14.13 210 4 3 1
    17.35 235 4 2 1
    20.66 243 3 3 1
    22.55 225 3 3 1
    16.75 242 6 1 2
    22.55 221 2 2 1
    32.53 225 1 3 2
    14.48 209 5 1 1
    15.74 241 3 2 2
    12.43 231 4 1 2
    26.64 236 2 1 2
    end
    format %tq qdate
    label values prf_group prf_group
    label def prf_group 1 "unskilled", modify
    label def prf_group 2 "semiskilled", modify
    label def prf_group 3 "skilled", modify
    label def prf_group 4 "highly qualified", modify
    label def prf_group 5 "executive staff", modify
    label def prf_group 6 "total", modify
    label values gender gender
    label def gender 1 "Female", modify
    label def gender 2 "Male", modify
    label def gender 3 "Total", modify
    label values ter ter1
    label def ter1 1 "Former territory of the Federal Republic", modify
    label def ter1 2 "New Lander", modify
    I want to create this table (taken from http://nbn-resolving.de/urn:nbn:de:101:1-201602176403) with my variables "prf_group", "gender" "ter" corresponding each to the labels defined in the code, except for the fact that I have different territories.
    Thanks to anyone you who can share some tips/advices on how to handle this dataset.
    Click image for larger version

Name:	image.png
Views:	1
Size:	72.1 KB
ID:	1643352

  • #2
    You have a data problem: the example you show cannot support this kind of analysis because you have observations that agree on prf_group, gender, ter and qdate, but disagree on avg_gross_hourly_ea:

    Code:
           avg_gross_hour~a    qdate          prf_group   gender                                        ter  
     48.              27.92   2016q2            skilled    Total   Former territory of the Federal Republic  
     49.              22.55   2016q2            skilled    Total   Former territory of the Federal Republic  
     56.              16.96   2018q2   highly qualified   Female   Former territory of the Federal Republic  
     57.              15.57   2018q2   highly qualified   Female   Former territory of the Federal Republic  
     84.              22.54   2018q2              total   Female   Former territory of the Federal Republic  
     85.              22.86   2018q2              total   Female   Former territory of the Federal Republic
    Consequently, it is impossible to define change over same quarter last year in, for rexampe, 2019q2 for highly qualified females from the Former territory of the Federal Republic, because there are two conflicting values for the avg_gross_hourly_ea in that group in 2018q2. Similarly for total females from that territory, or, in 2016q3, skilled, Total Gender, from that territory. You have to fix the data before you can proceed.

    Comment


    • #3
      Clyde Schechter , thank you for your reply. I didn't include the industries column for my data example. I'm sorry that I didn't mention it beforehand. Here is the new code:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double avg_gross_hourly_ea float qdate byte prf_group float(gender ter) long industrycode
      18.86 224 6 3 1 155
      26.34 232 2 3 1 184
      17.28 223 2 2 2 155
      21.42 218 6 3 1 143
      12.95 229 4 3 2  95
       26.3 212 2 2 1 107
      17.82 221 2 1 2  94
      14.81 215 5 1 2 100
      22.54 233 6 1 1  27
      11.19 228 4 1 2 208
      15.18 231 3 1 2 145
      12.29 233 4 2 2 101
      17.15 241 4 2 1  23
      15.65 225 6 3 2 143
      21.49 235 3 3 2 147
      37.76 241 1 3 2  86
          . 239 6 2 2 120
      21.78 214 3 3 1  86
      26.31 225 2 2 1 202
      14.43 208 5 3 1  99
          . 225 5 2 2  82
      29.59 225 6 2 1  89
      41.86 237 6 2 1  15
      19.67 219 2 3 2  29
       36.5 228 6 2 1  90
      44.11 226 1 2 2  35
      43.53 213 6 3 1  35
       11.2 217 5 1 2  23
          . 218 1 1 2 104
          . 221 3 2 2 197
      20.73 226 6 1 1  94
       26.4 218 2 3 1  29
      43.87 225 1 2 1  96
      10.26 213 4 3 2  93
      14.22 225 3 1 2  96
      39.72 239 2 1 1 154
      23.69 211 2 1 1  96
      12.18 225 4 3 2  51
      33.14 217 1 2 1  26
      17.56 226 3 2 2  91
      27.92 225 3 3 1 112
      12.59 221 5 1 1  23
       32.4 239 1 3 2  84
          . 238 6 3 2 197
      14.11 242 5 3 2 103
      32.25 219 1 3 2 202
      12.28 229 5 3 2  26
      20.68 208 3 3 1 145
      22.05 213 1 1 2 192
      41.56 227 1 2 1 182
      24.78 214 2 1 1 107
      21.33 211 6 2 1  22
      12.06 217 3 2 2 155
      40.32 226 1 2 1 106
      29.81 219 2 2 1 107
          . 228 1 1 2  85
      16.96 233 4 1 1 184
       30.7 234 6 3 1  31
      14.21 220 5 2 1  37
      17.63 225 3 3 2  25
      35.37 209 1 1 2 147
      15.57 233 4 1 1 203
      21.97 224 1 1 2 164
      22.46 239 3 2 1 143
      12.28 218 3 3 2 208
      14.74 216 3 1 2 203
      16.43 240 4 3 1  21
      20.62 224 3 1 1 117
          . 221 6 1 1 197
       17.3 233 3 1 2 153
      17.36 216 2 1 2 145
      12.26 211 5 2 2  89
      22.42 240 6 3 2 146
      18.86 236 4 1 1 143
        8.7 209 5 1 2 203
      22.42 225 6 2 2 193
      28.31 208 3 2 1 136
      21.86 225 2 3 1 164
      13.27 212 4 1 1 192
      42.24 232 1 1 1 145
      14.02 215 3 2 2 192
      16.51 242 3 3 2  91
      14.24 221 4 3 1 192
      26.82 217 2 2 1  87
      10.45 231 5 2 2  96
      22.86 233 6 1 1  91
      14.34 208 6 2 2  52
      35.19 221 1 2 2  31
      44.58 237 1 1 1  85
      14.13 210 4 3 1 187
      17.35 235 4 2 1 186
      20.66 243 3 3 1  81
      22.55 225 3 3 1 103
      16.75 242 6 1 2 117
      22.55 221 2 2 1  60
      32.53 225 1 3 2 205
      14.48 209 5 1 1 117
      15.74 241 3 2 2 208
      12.43 231 4 1 2  52
      26.64 236 2 1 2  36
      end
      format %tq qdate
      label values prf_group prf_group
      label def prf_group 1 "unskilled", modify
      label def prf_group 2 "semiskilled", modify
      label def prf_group 3 "skilled", modify
      label def prf_group 4 "highly qualified", modify
      label def prf_group 5 "executive staff", modify
      label def prf_group 6 "total", modify
      label values gender gender
      label def gender 1 "Female", modify
      label def gender 2 "Male", modify
      label def gender 3 "Total", modify
      label values ter ter1
      label def ter1 1 "Former territory of the Federal Republic", modify
      label def ter1 2 "New Lander", modify
      label values industrycode industrycode
      label def industrycode 15 "Air transport", modify
      label def industrycode 21 "Business economy", modify
      label def industrycode 22 "Casting of metals", modify
      label def industrycode 23 "Civil engineering", modify
      label def industrycode 25 "Computer programming, consultancy, rel. activities", modify
      label def industrycode 26 "Construction", modify
      label def industrycode 27 "Construction of buildings", modify
      label def industrycode 29 "Economic services", modify
      label def industrycode 31 "Electric power generation, transmission, distrib.", modify
      label def industrycode 35 "Extraction of crude petroleum and natural gas", modify
      label def industrycode 36 "Financial and insurance activities", modify
      label def industrycode 37 "Financial service act., ex.insurance,pension fund.", modify
      label def industrycode 51 "Information and communication", modify
      label def industrycode 52 "Information service activities", modify
      label def industrycode 60 "Maintenance and repair of motor vehicles", modify
      label def industrycode 81 "Manuf.of bodies (coachwork) for motor vehicles etc", modify
      label def industrycode 82 "Manuf.of gas, distrib. of gaseous fuels thr. mains", modify
      label def industrycode 84 "Manuf.of other food products,prepared animal feeds", modify
      label def industrycode 85 "Manufacture of agricultural and forestry machinery", modify
      label def industrycode 86 "Manufacture of basic metals", modify
      label def industrycode 87 "Manufacture of beverages", modify
      label def industrycode 89 "Manufacture of chemicals and chemical products", modify
      label def industrycode 90 "Manufacture of coke and refined petroleum products", modify
      label def industrycode 91 "Manufacture of communication equipment", modify
      label def industrycode 93 "Manufacture of consumer electronics", modify
      label def industrycode 94 "Manufacture of dairy products", modify
      label def industrycode 95 "Manufacture of electrical equipment", modify
      label def industrycode 96 "Manufacture of electronic components and boards", modify
      label def industrycode 99 "Manufacture of furniture", modify
      label def industrycode 100 "Manufacture of general-purpose machinery", modify
      label def industrycode 101 "Manufacture of glass and glass products", modify
      label def industrycode 103 "Manufacture of machinery and equipment n.e.c.", modify
      label def industrycode 104 "Manufacture of man-made fibres", modify
      label def industrycode 106 "Manufacture of other non-metallic mineral products", modify
      label def industrycode 107 "Manufacture of other special-purpose machinery", modify
      label def industrycode 112 "Manufacture of tobacco products", modify
      label def industrycode 117 "Metal and electrical industry", modify
      label def industrycode 120 "Mining of metal ores", modify
      label def industrycode 136 "Pension funding", modify
      label def industrycode 143 "Processing of mineral products", modify
      label def industrycode 145 "Producers of capital goods", modify
      label def industrycode 146 "Professional, scientific and technical activities", modify
      label def industrycode 147 "Programming and broadcasting activities", modify
      label def industrycode 153 "Real estate activities", modify
      label def industrycode 154 "Reinsurance", modify
      label def industrycode 155 "Remediation activities, waste management services", modify
      label def industrycode 164 "Residential care activities", modify
      label def industrycode 182 "Service activities", modify
      label def industrycode 184 "Sewerage", modify
      label def industrycode 186 "Software publishing", modify
      label def industrycode 187 "Specialised construction activities", modify
      label def industrycode 192 "Technical testing and analysis", modify
      label def industrycode 193 "Telecommunications", modify
      label def industrycode 197 "Trusts, funds and similar financial entities", modify
      label def industrycode 202 "Water collection, treatment and supply", modify
      label def industrycode 203 "Water supply,sewerage,waste management,remediation", modify
      label def industrycode 205 "Wholesale (without wholesaling on a fee)", modify
      label def industrycode 208 "Wholesale, retail trade, repair of motor vehicles", modify

      Comment


      • #4
        But that's only a start on the data you need to successfully calculate average gross hourly earning by gender and territory.

        For each industry, you apparently have average gross hourly earning by gender and quarter and territory. But you can't just average these averages - there are (we hope) more individuals employed in "Professional, scientific and technical activities" than in "Sewerage", so the former make up a larger portion than the latter of the overall average.

        What you need in addition to what you have are the number of individuals for whom each average gross hourly earning was computed - matching the gender and quarter and territory.

        Comment


        • #5
          William Lisowski Clyde Schechter Thank you both for replying! As far as I know, the average gross hourly earnings are being calculated based on gender, quarter, territory and industry. They don't collect data for each individual employee and the dataset is a sample survey with one-stage stratified variables mentioned earlier. There are no additional weights or proportions. Calculations in the image that I showed you were done from the exact same dataset. Hope that is good enough.
          Last edited by Tursynbay Yeskendir; 02 Jan 2022, 16:47.

          Comment


          • #6
            Well, you don't need individual level data, but you do need to have the number of individuals in each combination of prof_group, gender, and ter. If that is not available, it cannot be done.

            If the graphs you show were created from the same data set you are working with, and there are no counts of individuals in the data, or at least something that lets you calculate those numbers, then the graphs are just nonsense--numbers jumbled together in some meaningless way and put on a graph.

            Comment

            Working...
            X