Hello Stata people;
I have this interesting data I'am working with, it has more than 600 000 observations in it, this is just an example:
The first variable is the ID of the taxpayer, the second variable is the taxable base (the revenu of the individual that is taxed), the third variable is the tax that the individual paid, the fourth variable ("CSS") is a social contribution that is also paid by the individual, and the fifth variable is the sum of the tax and the social contribution that got paid by each individual (here, I did consider that the social contribution is also a form of tax).
The idea behind this is to draw a table that could show me the distribution of the tax burden by income levels of those people:
-First, the dataset should be sorted by the second variable (taxable revenu)
-Then, I do wanna divide that variable by percentiles and deciles, like, I want to have the richest 10%, the richest 1% the richest 0.1%, the middle class, the poorest 10%, the poorest 1%, the poorest 0.1%, ...
-Then, for each sub-group of the population that has been defined by the revenu level, I want to have the size of that sub-group (the number of individuals in that group)
-Then, I want to have another variable that gives me the average revenu for each sub-group before taxation (the taxable revenu), like, how much revenu does the average individual of each sub-group have
- Then, another variable that gives me the tax paid by the average individual of each revenu category (here, by tax paid, I mean tax+social contribution, so, the work will be done on the last variable "Impot_Payé_et_CSS")
-Then, I want to have another variable that shows the average tax rate that each average individual paid (so, it means we're gonna devide the taxation the average individual paid by the average revenu of each sub-group)
-Then, as a last variable, I would like to have the total of tax revenus for each sub-group (like, how much the government collected from the richest 1%, the richest 10%, the middle class, the poorest 10%, the poorest 1%, ...)
So, I guess I can describe this as a study of the tax burden on each social group, and those groups are being represented by the average individual.
Any help, please? With thanks!
I have this interesting data I'am working with, it has more than 600 000 observations in it, this is just an example:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str10 Identifiant double(Revenu_Imposable_Année Impot_Payé CSS) float Impot_Payé_et_CSS "231******4" 24015.379 3722.48 211.203 3933.683 "205******0" 42206.545 8049.537 384.897 8434.434 "007******1" 36889.06 6598.107 334.361 6932.468 "009******2" 26109.679 4106.506 230.06 4336.566 "006******6" 39681.416 7370.442 359.466 7729.908 "010******3" 14385.029 1880.258 124.535 2004.793 "009******4" 12904.748 1623.988 113.234 1737.222 "143******4" 14070.842 1953.15 126.703 2079.853 "160******3" 11686.995 1457.566 102.383 1559.949 "115******2" 13509.571 1740.774 117.664 1858.438 "009******0" 13506.436 1845.043 121.638 1966.681 "009******0" 29359.533 5910.079 273.161 6183.24 "162******2" 15564.943 2283.732 140.218 2423.95 "228******3" 24029.455 4440.872 220.135 4661.007 "232******8" 25140.046 4533.566 224.718 4758.284 "002******7" 14385.029 1932.484 126.537 2059.021 "236******8" 24805.308 3109.181 173.455 3282.636 "004******8" 21922.531 3914.52 196.863 4111.383 "005******2" 25267.761 4577.579 231.139 4808.718 "007******1" 30787.104 5402.482 278.369 5680.851 "009******9" 35680.101 6712.119 326.701 7038.82 "005******6" 42282.843 7775.964 373.81 8149.774 "005******8" 41347.412 7860.543 376.261 8236.804 "010******0" 43959.38 5926.819 309.657 6236.476 "006******0" 46051.024 9052.386 421.724 9474.11 "006******6" 41367.373 7893.934 379.64 8273.574 "009******7" 23090.514 3558.203 205.73 3763.933 "004******6" 43895.24 8121.98 396.342 8518.322 "127******4" 33677.781 4252.627 242.962 4495.589 "004******7" 30403.981 4989.779 267.871 5257.65 "008******9" 26020.927 4736.07 234.545 4970.615 "008******8" 23343.019 4119.053 211.959 4331.012 "007******7" 39376.535 6611.73 328.691 6940.421 "008******4" 23614.8 3826.891 212.709 4039.6 "004******2" 43191.675 8250.269 392.614 8642.883 "232******2" 22990.661 3628.187 204.146 3832.333 "009******8" 20275.437 3323.298 180.903 3504.201 "233******5" 56839.433 12804.454 533.65 13338.104 "232******3" 22727.017 3835.354 202.674 4038.028 "232******1" 21925.397 3617.998 194.584 3812.582 "213******9" 17463.539 2682.333 157.68 2840.013 "010******3" 17638.807 2658.298 155.039 2813.337 "148******8" 23433.331 4017.634 209.947 4227.581 "223******4" 24772.533 4394.86 222.125 4616.985 "006******8" 42997.801 8281.398 392.906 8674.304 "005******8" 42884.521 8596.703 397.428 8994.131 "233******6" 11704.479 1426.606 105.41 1532.016 "223******9" 23390.023 4093.186 212.557 4305.743 "239******0" 12352.238 1575.335 111.234 1686.569 "009******3" 26754.953 4394.735 240.887 4635.622 "005******2" 28581.334 4687.517 253.519 4941.036 "009******6" 29817.422 5134.272 266.464 5400.736 "005******1" 46083.094 9090.305 420.898 9511.203 "129******5" 33551.781 6048.223 305.119 6353.342 "229******5" 14695.325 2116.135 132.341 2248.476 "009******6" 23837.271 4377.248 218.176 4595.424 "009******5" 16364.11 1653.296 115.645 1768.941 "153******7" 14462.969 1991.49 127.238 2118.728 "228******2" 13629.119 1742.136 117.729 1859.865 "257******8" 13480.118 1839.657 121.391 1961.048 "005******4" 15225.171 2137.324 132.861 2270.185 "210******6" 25082.852 4476.786 225.914 4702.7 "151******0" 14436.699 1262.543 99.767 1362.31 "010******0" 14147.463 1880.67 122.424 2003.094 "129******1" 24736.987 4360.352 220.807 4581.159 "223******3" 21994.715 3639.911 193.53 3833.441 "006******8" 26076.943 4745.761 234.303 4980.064 "004******4" 29916.472 5022.54 269.455 5291.995 "009******7" 37155.938 6965.568 339.859 7305.427 "010******4" 26833.7 4293.662 237.569 4531.231 "006******2" 34362.458 6326.259 309.756 6636.015 "004******9" 41451.511 7818.549 379.498 8198.047 "006******7" 37614.57 6733.238 336.737 7069.975 "010******7" 21163.171 3342.307 186.214 3528.521 "003******3" 28564.741 5158.398 249.792 5408.19 "235******7" 32265.449 5490.512 287.684 5778.196 "005******6" 34556.51 6371.166 311.785 6682.951 "009******9" 30326.373 5182.706 269.171 5451.877 "156******6" 22364.445 3710.858 196.233 3907.091 "210******0" 22327.509 3758.067 197.902 3955.969 "223******2" 18349.26 2917.352 165.332 3082.684 "004******0" 24882.902 4332.289 222.378 4554.667 "231******0" 25433.153 4003.618 225.967 4229.585 "003******9" 41274.971 7725.777 374.41 8100.187 "008******9" 52445.719 11035.108 484.15 11519.258 "010******5" 28943.441 4778.66 255.657 5034.317 "128******6" 40466.357 7743.468 369.666 8113.134 "200******5" 32317.757 5590.943 291.578 5882.521 "202******1" 25738.39 4293.533 233.133 4526.666 "207******2" 28380.104 4778.526 255.473 5033.999 "212******0" 31892.922 5457.956 284.768 5742.724 "006******3" 44198.93 8458.905 402.056 8860.961 "229******6" 34875.191 6331.529 317.464 6648.993 "007******8" 41518.507 7780.463 376.004 8156.467 "006******8" 42724.921 6928.062 346.154 7274.216 "007******2" 40438.143 7585.999 366.281 7952.28 "249******8" 18711.021 2995.258 168.495 3163.753 "215******1" 24772.533 4507.275 226.131 4733.406 "009******2" 31029.76 5302.662 275.765 5578.427 "005******7" 42017.476 8030.985 381.912 8412.897 end
The idea behind this is to draw a table that could show me the distribution of the tax burden by income levels of those people:
-First, the dataset should be sorted by the second variable (taxable revenu)
-Then, I do wanna divide that variable by percentiles and deciles, like, I want to have the richest 10%, the richest 1% the richest 0.1%, the middle class, the poorest 10%, the poorest 1%, the poorest 0.1%, ...
-Then, for each sub-group of the population that has been defined by the revenu level, I want to have the size of that sub-group (the number of individuals in that group)
-Then, I want to have another variable that gives me the average revenu for each sub-group before taxation (the taxable revenu), like, how much revenu does the average individual of each sub-group have
- Then, another variable that gives me the tax paid by the average individual of each revenu category (here, by tax paid, I mean tax+social contribution, so, the work will be done on the last variable "Impot_Payé_et_CSS")
-Then, I want to have another variable that shows the average tax rate that each average individual paid (so, it means we're gonna devide the taxation the average individual paid by the average revenu of each sub-group)
-Then, as a last variable, I would like to have the total of tax revenus for each sub-group (like, how much the government collected from the richest 1%, the richest 10%, the middle class, the poorest 10%, the poorest 1%, ...)
So, I guess I can describe this as a study of the tax burden on each social group, and those groups are being represented by the average individual.
Any help, please? With thanks!
Comment