Hi, I am working on an undergraduate thesis paper and need some help creating a new dataset in Stata!
I want to create a new dataset whose columns are values calculated from an existing dataset that I have loaded into Stata. For example, the columns I want in the new dataset look something like this:
| Year | Occupation | assoc_educ | bach_educ | hs_educ | N | Mean Wage |
My new dataset collapses rows from the original dataset: where each row was previously an individual, each row in the new dataset is a group of all the individuals who share an education level and occupation. For each group I also want the mean wage (individual wages already exist in the original dataset as incwage) and a variable N, which is the number of people in the group. For example, for a given row, I calculate N and the mean wage like this:
sum incwage if assoc_educ == 1 & occ == 3255
gen assoc_nurse_n = r(N)
gen assoc_nurse_mean = r(mean)
These two values are now stored under variable names, and I would like to enter them in the columns like this:
| Year | Occupation | assoc_educ | bach_educ | hs_educ | N | Mean Wage |
| 2018 | 3255 | 1 | 0 | 0 | 12526 | 49042.785 |
Variables like year, occupation, and the *_educ dummies are unchanged from the original dataset.
Thank you for any help. This is probably super basic, but I am new to Stata and to the Statalist forum. If you have any questions, ask away, and if I made any forum faux pas, please let me know.
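For reference, the grouping described above (one row per combination of year, occupation, and education dummies, with a count and a mean wage) looks like what Stata's collapse command is for. What follows is only a minimal sketch, not a confirmed solution. It assumes the individual-level data are in memory and that the variables are named year, occ, assoc_educ, bach_educ, hs_educ, and incwage, as in the snippets above. The output file name is hypothetical.

```stata
* Sketch only: assumes individual-level data are in memory with variables
* year, occ, assoc_educ, bach_educ, hs_educ, and incwage.
preserve

* One row per year/occupation/education group:
*   n         = number of nonmissing wage observations in the group
*   mean_wage = mean of incwage within the group
collapse (count) n = incwage (mean) mean_wage = incwage, ///
    by(year occ assoc_educ bach_educ hs_educ)

* Hypothetical file name; change it to whatever suits the project.
save "occ_educ_summary.dta", replace

* Return to the original individual-level data.
restore
```

Because (count) counts nonmissing values of incwage, n should match the r(N) that summarize reports for the same group. The preserve/restore pair keeps the original dataset in memory once the summary file has been saved.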