Calculating difference in differences, percentages and treatment effects

Chris Wong

Join Date: Oct 2020

Posts: 8
#1

27 Oct 2020, 18:28

Hi everyone,

I'm working on a project for my economic development course and am having difficulty determining what code to use for these computations. The dataset I am working with covers a deworming program that was implemented in Kenya in 1998 and 1999. There are three computations I want to do, and I am unsure which code to use for each.

I have the following variables in my dataset:
pupid: pupil index number.

pill98: dummy variable for whether the child took the deworming pill in 1998.

pill99: dummy variable for whether the child took the deworming pill in 1999.

grade98: the pupil's grade in 1998.

sex: dummy variable, 1 for male and 0 for female.

old_girl98: girls aged 13 or older in 1998.

totpar98: average school participation in 1998.

wgrp: assigned worm group (1, 2 or 3) in 1998.

treat_sch98: represents a school that was assigned to the deworming program in 1998.

infect_early99: moderate-heavy worm infection in early 1999.

1) First I want to compute the difference in differences through a regression comparing pre- and post-treatment attendance records for all schools. The issue is that I am not sure how to set up this regression or how to compute the difference in differences from it.
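A minimal sketch of such a regression, under assumptions that go beyond the listed variables: it supposes a long-format dataset with one row per pupil and year, a participation variable `totpar`, a post-period dummy `post`, a treated-school dummy `treat`, and a school identifier `schid`. All of these names are hypothetical (the variable list above only has 1998 participation, `totpar98`), and the toy values are made up purely so the code runs. In this saturated two-by-two model the interaction coefficient equals the difference between the before/after changes in the group means, which the last lines check by hand. Standard errors are clustered by school on the assumption that treatment was assigned by school.

```stata
* Made-up toy data; hypothetical long format, one row per pupil and year
clear
input pupid schid treat post totpar
1 1 1 0 0.70
1 1 1 1 0.90
2 1 1 0 0.60
2 1 1 1 0.85
3 2 0 0 0.65
3 2 0 1 0.70
4 2 0 0 0.55
4 2 0 1 0.60
5 3 1 0 0.75
5 3 1 1 0.95
6 4 0 0 0.80
6 4 0 1 0.82
end

* DiD regression: the coefficient on 1.treat#1.post is the DiD estimate
regress totpar i.treat##i.post, vce(cluster schid)

* The same estimate by hand from the four group means
summarize totpar if treat == 1 & post == 1, meanonly
local t1 = r(mean)
summarize totpar if treat == 1 & post == 0, meanonly
local t0 = r(mean)
summarize totpar if treat == 0 & post == 1, meanonly
local c1 = r(mean)
summarize totpar if treat == 0 & post == 0, meanonly
local c0 = r(mean)
local did = (`t1' - `t0') - (`c1' - `c0')
display "DiD by hand: " `did' "   from regression: " _b[1.treat#1.post]
```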

2) Next there are a few computations that I want to make and am not fully confident on the code:
I want to see how many observations there are per pupil and I'm assuming all I need to do is enter

Code:

su pupid

to summarize the variable and see how many observations there are for it.
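Note that -summarize- reports under Obs the number of non-missing values of `pupid` in the whole dataset, not the number of observations per pupil. A minimal sketch of counting rows per pupil, on made-up toy data (the new variable names `n_obs` and `first` are illustrative):

```stata
* Made-up toy data: pupil 1 appears twice, pupils 2 and 3 once
clear
input pupid
1
1
2
3
end

* Number of observations (rows) for each pupil
bysort pupid: generate n_obs = _N

* Tag one row per pupil so each pupil is counted once in the table
egen byte first = tag(pupid)
tabulate n_obs if first

* Built-in alternative: how many pupids occur once, twice, ...
duplicates report pupid

* Number of distinct pupils
quietly count if first
display "Distinct pupils: " r(N)
```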

I want to determine the percentage of pupils who are boys. I believe what I want to do is drop the observations where the sex dummy equals zero and then divide the number of remaining observations by the number of pupils. I'm just not sure how to drop the observations where the dummy equals 0. If there is an alternative method of computing this, I'm more than happy to discuss it as well.

I want to see what percentage of schools in 1998 were selected as part of the deworming program. Although this is relatively straightforward, I'm not sure how I can compute this with the given variables.
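A minimal sketch, assuming a school identifier, here called `schid` (hypothetical: no school ID appears in the variable list), and one row per pupil. Because `treat_sch98` is constant within a school, the data are first reduced to one observation per school; otherwise schools with more pupils would count more. The toy values are made up.

```stata
* Made-up toy pupil-level data; schid is a hypothetical school identifier
clear
input pupid schid treat_sch98
1 1 1
2 1 1
3 2 0
4 3 1
5 4 0
6 4 0
end

* Tag one observation per school so large schools are not over-counted
egen byte school_tag = tag(schid)

* The mean of a 0/1 variable is the share of schools treated in 1998
quietly summarize treat_sch98 if school_tag
local pct_treated = 100 * r(mean)
display "Percent of schools treated in 1998: " `pct_treated'
```

`tabulate treat_sch98 if school_tag` shows the same share as a frequency table.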

3) Finally, I want to calculate a series of different outcomes (i.e. treatment effects) based on the data I have. I want to compute the following:
Students' school participation conditional on whether they took deworming pills in 1998, i.e. the difference between students' school participation with and without the deworming pill.

School participation of students in treatment schools versus non-treatment schools in 1998.

The difference in the probability of taking the pill given that a student was in a treatment school and the probability of taking it if a student was not in a treatment school.

The Wald Estimator.
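Each of the first three comparisons is a difference in means, which a regression on a single 0/1 regressor reproduces as its slope coefficient. A minimal sketch on made-up toy data, assuming a hypothetical school identifier `schid` for clustering. Comparison (a) is only descriptive, since pupils who took the pill may differ in other ways from those who did not.

```stata
* Made-up toy data; schid is hypothetical (not in the listed variables)
clear
input pupid schid treat_sch98 pill98 totpar98
1 1 1 1 0.90
2 1 1 0 0.80
3 2 1 1 0.95
4 2 1 1 0.85
5 3 0 0 0.70
6 3 0 0 0.75
7 4 0 0 0.80
8 4 0 0 0.65
end

* (a) Participation with vs. without the pill (naive comparison)
regress totpar98 i.pill98, vce(cluster schid)
local diff_pill = _b[1.pill98]

* (b) Participation in treatment vs. non-treatment schools
regress totpar98 i.treat_sch98, vce(cluster schid)
local diff_school = _b[1.treat_sch98]

* (c) Probability of taking the pill, treatment vs. non-treatment schools
regress pill98 i.treat_sch98, vce(cluster schid)
local diff_takeup = _b[1.treat_sch98]

display "(a) " `diff_pill' "  (b) " `diff_school' "  (c) " `diff_takeup'
```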

Being completely honest here, I am not very well versed in Stata and often have difficulty doing these types of computations given my limited exposure and any help from the forum would be much appreciated.

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

27 Oct 2020, 19:52

We were all beginners once, so no shame in that. Most of what you ask for under 2) can be done with some very basic Stata commands. But presumably you're not familiar with them, and 2)1. suggests that you do not understand how the -summarize- command works. So rather than my showing you code, I think the most useful thing I can do is to advise you to postpone working on this project until you have read the [GS] Getting Started and [U] User's Guide sections of the PDF documentation that comes installed with Stata. Launch Stata and click on the Help menu. Then select PDF Documentation. The documentation will open to the complete contents. In the Bookmarks section of the PDF reader that opened it, you will find links to the sections I have recommended. Give them a careful read. They go over the basics of using Stata. It is likely that after doing that you will know how to do much of what you ask about in #1. Give it a try. And post back if you don't have it right, showing what you tried and what Stata gave you, and, if it isn't obvious to somebody not already familiar with your project, explain why it isn't correct.

Concerning the diff-in-diff regression, for help with that it would be best if you show example data, using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
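For illustration, a sketch of -dataex- on Stata's bundled auto dataset, used here only as a stand-in because the deworming data are not available; the commented last line shows what the analogous call might look like on the variables listed in #1.

```stata
* Stand-in demonstration on Stata's bundled auto dataset
sysuse auto, clear

* Print code that recreates the first 5 observations of these variables,
* ready to copy into a forum post
dataex make price mpg foreign in 1/5

* With the deworming data in memory, an analogous call might be:
* dataex pupid sex pill98 pill99 totpar98 treat_sch98 wgrp, count(20)
```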

In addition to showing some example data, you will need to explain a few things:

1. You say the outcome variable is school attendance. There is no such variable in your data set as you described it. Where will you get it from?
2. It is important to explain whether the treatment was assigned at the individual level, or the school level, or perhaps at the grade level within schools or something like that? The method of assigning treatment needs to be explained.
3. The explanation of the variable old_girl98 is obscure: what does it mean?
4. What are the 3 "worm groups" referred to in 8? Does this variable represent treatment assignment? If so, explain the details. If not, what is it?
5. What is the significance of the variables pill98 and pill99? Do these represent assignment to treatment, or do they represent actual compliance with the treatment? If the latter, how are they coded for students who are not in the treatment group?
6. Regarding 3)4., the Wald estimator of what?

Last edited by Clyde Schechter; 27 Oct 2020, 19:54.
Chris Wong

Join Date: Oct 2020

Posts: 8
#3

27 Oct 2020, 21:10

Hi Clyde,

Thank you for your response. I'll take a look at the PDFs and circle back with the code I have for part 2. In terms of the additional information please see below:
My apologies for the conflicting terms; I was using school attendance and participation interchangeably. The correct outcome is school participation (totpar98), not attendance.

The treatment was done at the school level but given that we have data for individuals, I believe that the treatment would also be done at the individual level.

This is a dummy variable for if the individual receiving treatment was a girl and 13 years or older.

In the study from which the data are drawn, there are three groups that received the treatment. The first group received deworming from 1998 to 2003, the second group from 1999 to 2003, and the third group, the control group, from 2001 to 2003.
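Taking this description at face value, treatment status in each year can be derived from `wgrp`. A minimal sketch on made-up toy data; the variable names `treated98` and `treated99` are hypothetical:

```stata
* Made-up toy data: one pupil per worm group
clear
input pupid wgrp
1 1
2 2
3 3
end

* Treatment status implied by the group timing described above:
* group 1 from 1998, group 2 from 1999, group 3 (control) from 2001
generate byte treated98 = (wgrp == 1) if !missing(wgrp)
generate byte treated99 = inlist(wgrp, 1, 2) if !missing(wgrp)

* With the real data, this would check the mapping against treat_sch98:
* assert treated98 == treat_sch98 if !missing(wgrp, treat_sch98)

list, noobs
```

Given this timing, one natural diff-in-diff compares group 2 (untreated in 1998, treated in 1999) with group 3 (untreated in both years) across 1998 and 1999, but that requires 1999 participation, which is not among the listed variables.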

Pill98 and pill99 represent compliance with treatment (i.e. they received the deworming treatment). I don't believe there is anything in the dataset that outlines how these variables are coded for the control/non-treatment group.
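How `pill98` is coded outside treatment schools can be checked directly with a cross-tabulation that shows missing values as their own category. A minimal sketch on made-up toy data; with the real data only the -tabulate- line is needed:

```stata
* Made-up toy data with some missing pill98 values
clear
input pupid treat_sch98 pill98
1 1 1
2 1 0
3 1 .
4 0 0
5 0 .
end

* Missing values get their own column, so 0 vs. missing is visible
* separately for treatment and non-treatment schools
tabulate treat_sch98 pill98, missing
```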

The Wald Estimator for students who took the pill and students who did not take the pill in 1998.
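Reading this as the effect of taking the pill in 1998 (`pill98`) on participation (`totpar98`), with school assignment (`treat_sch98`) as the instrument (an assumption about what is intended), the Wald estimator is the participation difference between treatment and non-treatment schools divided by the pill take-up difference between them. With one binary instrument and no covariates, -ivregress 2sls- gives the same point estimate. A minimal sketch on made-up toy data with a hypothetical school identifier `schid`:

```stata
* Made-up toy data; schid is hypothetical (not in the listed variables)
clear
input pupid schid treat_sch98 pill98 totpar98
1 1 1 1 0.90
2 1 1 0 0.80
3 2 1 1 0.95
4 2 1 1 0.85
5 3 0 0 0.70
6 3 0 0 0.75
7 4 0 0 0.80
8 4 0 0 0.65
end

* Numerator: difference in mean participation by school assignment
summarize totpar98 if treat_sch98 == 1, meanonly
local y1 = r(mean)
summarize totpar98 if treat_sch98 == 0, meanonly
local y0 = r(mean)

* Denominator: difference in pill take-up by school assignment
summarize pill98 if treat_sch98 == 1, meanonly
local d1 = r(mean)
summarize pill98 if treat_sch98 == 0, meanonly
local d0 = r(mean)

local wald = (`y1' - `y0') / (`d1' - `d0')
display "Wald estimate by hand: " `wald'

* Equivalent IV regression: same point estimate, plus standard errors
ivregress 2sls totpar98 (pill98 = treat_sch98), vce(cluster schid)
```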
Chris Wong

Join Date: Oct 2020

Posts: 8
#4

27 Oct 2020, 23:25

Hi Clyde,

Following up with some code for (2). The last hurdle I have is computing the percentage and I can't seem to figure out how to do it. This is what I have:

Code:

quitly drop if missing(sex)
quietly count if sex==1
local numerator = r(_n)
quietly count if sex
local denominator = r(_n)
gen per = 100 * `numerator'/r(_n)

I'm not sure if the last line of code is correct to generate the percentages.

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

28 Oct 2020, 17:15

When you write -quietly count if sex-, Stata takes that to mean -quietly count if sex != 0-, which, since sex is a 0/1 variable, is the same as your numerator. What you want is just -quietly count if !missing(sex)-, or, since you already dropped the observations where sex is missing, just -quietly count-. Then your code will work.
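Combining that fix with two further corrections to the posted code (`quitly` should be `quietly`, and -count- stores its result in `r(N)`, not `r(_n)`), a corrected sketch on made-up toy data, keeping the single result in a local macro rather than a variable, could look like:

```stata
* Made-up toy data with one missing value of sex
clear
input pupid sex
1 1
2 0
3 1
4 .
end

quietly drop if missing(sex)

* -count- saves the number of matching observations in r(N)
quietly count if sex == 1
local numerator = r(N)
quietly count
local denominator = r(N)

* Keep the single number in a local macro rather than a variable
local per = 100 * `numerator' / `denominator'
display "Percent boys: " `per'
```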

But, there is a much simpler way to do this:

Code:

tab sex

will give you the count and percentage of both boys and girls. It will not save those results for you, but it will show them to you. It is usually not a good idea to create a variable that contains a single number. So, even assuming you want to save the percentage of boys for later use, I wouldn't recommend creating a variable for that. That's best done with a local macro:

Code:

summ sex // THIS WORKS BECAUSE sex IS CODED 0/1
local percent_boys = 100*`r(mean)'
Chris Wong

Join Date: Oct 2020

Posts: 8
#6

02 Nov 2020, 11:48

Hi Clyde,

Thanks for this and my apologies for my tardy response.

I was able to solve the rest of my queries with help from the manuals in Stata, from some colleagues and from the code you provided here.

Thanks again for your help!