Dear Statalist users,
I am a Stata rookie and I am currently trying to replicate the data and results of a regression model.
My data set is a panel of 119 cities with 11 time points from 1919 to 2002. For every period (1919-1925, 1925-1933, and so on) it contains the population growth.
I now want to run a simple difference-in-differences regression, with 20 cities in the treatment group and the remaining 99 cities in the control group.
Here is the formal regression model:
"popgrowth = beta*border + gamma*(border * division) + i.year + e (error term), where popgrowth is the annualized rate of population growth over the periods 1919–1925, 1925–1933, 1933–1939, 1950–1960, 1960–1970, 1970–1980, and 1980–1988 in West German city c at time t; border is a dummy equal to one when a city is a member of the treatment group of cities close to the East-West border and zero otherwise; division is a dummy equal to one when Germany is divided and zero otherwise; i.year are a full set of time dummies; and e is the error term."
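Written out in notation, this is how I read the model. The subscripts c (city) and t (period) come from the description above; d_t is my own shorthand for the year dummies (i.year), and gamma is the difference-in-differences coefficient I am trying to replicate:

```latex
\[
\text{popgrowth}_{ct}
  = \beta\,\text{border}_{c}
  + \gamma\,\bigl(\text{border}_{c} \times \text{division}_{t}\bigr)
  + d_{t}
  + \varepsilon_{ct}
\]
```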
(If you need more details, the paper I want to replicate is by Redding and Sturm, titled "The Costs of Remoteness".)
pop_growth is the population growth of the 119 cities and is my dependent variable.
division is a dummy which is zero, except for the years 1950–1988 when Germany was divided, in which case it takes the value one.
border is a dummy which is zero unless a city lies within 75 kilometers of the East-West German border, in which case it takes the value one (treatment group).
i.year adds the full set of time dummies.
It is also important to note that I exclude the 1939–1950 difference, as they do in the paper.
I have now implemented the model and the variables in Stata, but I cannot even replicate the first main result for (border * division): I get a positive coefficient, and it should be negative.
My code:
//xtset city year
gen border = 0
replace border = 1 if dist_gg_border <= 75
*tab city border
gen division = 0
replace division = 1 if year >= 1950 & year <= 1988
//gen border_division = border * division
//Regression
reg pop_growth border##division i.year if (!(year >= 1939 & year < 1950) & !(year > 1988 & year <= 2002)), robust cluster(city)
I have also often seen people use xtreg for panel data, with fe at the end of the command for fixed effects. When I do that, I get the same result.
I have two further questions. First, in my regression output I see coefficients for the time dummies, for example 1933, 1939, and so on. Is it normal that I cannot see 1919 and 1925? My only explanation is that 1919 does not appear because pop_growth is missing for that year, and that 1925 is used as the reference category. But I still find it strange.
Second, the time dummy for 1988 is reported as "omitted". When I run xtreg with fe, "border" is also omitted from the regression. What does that mean?
I get the correct number of observations, but the results just do not match those in the paper. Do you see any rookie mistake I made? I can of course post more information here. Thank you in advance for your help!