Average

Jana Romanyuk

Join Date: Apr 2022

Posts: 1
#1

04 Apr 2022, 15:34

How should I proceed in the following situation:
- I have the earnings of the population in 2004 and 2005, that is, earnings BEFORE a training in 2007, and the earnings of people in 2009. To compare earnings before and after training, is it okay to "egen" a new variable that is the average of the earnings in 2004 and 2005?

Thank you in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

04 Apr 2022, 16:19

That would not be an optimal approach, for technical reasons. The average of the earnings in 2004-2005, being based on two years, will have a different level of variation than the earnings for the single year 2009. Consequently, you would be introducing heteroscedasticity into the model, when that can easily be avoided by creating a variable, let's call it post_training, coded 1 for year 2009 and 0 for years 2004 and 2005, and running

Code:

regress earnings i.post_training

Added: I just noticed you talked about getting the 2004-2005 mean by using -egen-. If that is correct for your data set, it means you have your data in wide layout, where the incomes in 2004, 2005 and 2009 are three separate variables. That is not a good layout for data management or analysis in Stata. (And it won't work at all with the code I just suggested.) You should -reshape- your data to long, so that each of your present observations breaks into three observations, one each for years 2004, 2005, and 2009, with a single earnings variable, and year becomes a separate variable. The -reshape- command exists precisely to do such things.
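For illustration only, here is a minimal sketch of that -reshape- followed by the regression. It assumes a wide data set with a person identifier id and variables earnings2004, earnings2005 and earnings2009; those names and the toy values are made up, since no example data were posted.

```stata
* Hypothetical wide-layout toy data: names and values are invented
* purely to make the sketch runnable.
clear
input id earnings2004 earnings2005 earnings2009
1 100 110 150
2 200 190 260
3 150 160 170
end

* Go from one observation per person to one per person-year.
* "earnings" is the stub; the numeric suffix becomes the new variable year.
reshape long earnings, i(id) j(year)

* 1 for the post-training year (2009), 0 for 2004 and 2005.
generate byte post_training = (year == 2009)

* Difference in mean earnings, post-training versus pre-training.
regress earnings i.post_training
```

After the -reshape-, each person contributes three observations, so both pre-training years enter the regression individually rather than as an average.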

If you need more concrete advice, show example data when you post back, and use the -dataex- command to do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Last edited by Clyde Schechter; 04 Apr 2022, 16:21.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

04 Apr 2022, 18:29

I agree with Clyde Schechter that a data example is much needed here.

That said, the idea of using egen does not necessarily imply a wide layout, as this analogue shows.

Code:

webuse grunfeld, clear
egen investfirst2 = mean(cond(inlist(year, 1935, 1936), invest, .)), by(company)
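The same idiom carries over to earnings data held in long layout. The sketch below assumes hypothetical variables id, year and earnings with invented values; it only shows that -egen- can compute a 2004-2005 mean per person without a wide layout.

```stata
* Hypothetical long-layout toy data: names and values are invented.
clear
input id year earnings
1 2004 100
1 2005 110
1 2009 150
2 2004 200
2 2005 190
2 2009 260
end

* cond() returns earnings for 2004 and 2005 and missing otherwise;
* mean() ignores missings, and by(id) spreads the result to every
* observation of the same person.
egen pre_mean = mean(cond(inlist(year, 2004, 2005), earnings, .)), by(id)

list, sepby(id)
```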
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

04 Apr 2022, 19:38

Yes, good point, Nick.