How to resolve this error in Exact Randomization Test code?

Prateek Mishra

Join Date: Nov 2024

Posts: 8
#1


Yesterday, 04:52

Hi everyone! The following question is based on a test that I am trying to run. Although I have included all possible details, please let me know if you need any other info!

Data:

My data contain the following variables: URBAN, AGE, SEX, NCHILD, FAMSIZE, Under14 (the treatment variable: 1 if age < 14, 0 otherwise), LIT (the outcome), and Post1986 (the post-period indicator). I am running a difference-in-differences (DiD) model.

Situation:

I am trying to run an exact randomization test, using code I wrote myself, to check the robustness of my results. The process goes something like this:

The YEAR variable takes the values 1983, 1987, 1993, 1999, 2004, and 2009. Post1986 is 1 for years after 1986 and 0 otherwise (including 1986). I used each of these years in turn as a placebo year and created a new dataset before running the ritest package, so I did this 6 times. Each time I created a new variable, PostXXXX, where XXXX is the placebo year, following the same logic: 1 for years after XXXX and 0 otherwise (including XXXX itself). I realize that my data contain no observations for 1986, the actual treatment year, but that is how the data are.
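A minimal Stata sketch of the placebo indicators described above, assuming a numeric YEAR variable; a toy six-row dataset stands in for the real data. Each PostXXXX is 1 for years strictly after XXXX and 0 for XXXX and earlier.

```stata
* Sketch: one placebo post-period indicator per candidate year.
* A toy YEAR variable stands in for the real data (assumption).
clear
input int YEAR
1983
1987
1993
1999
2004
2009
end

local years 1983 1987 1993 1999 2004 2009
foreach y of local years {
    * 1 for years strictly after the placebo year, 0 for that year and earlier
    generate byte Post`y' = YEAR > `y' if !missing(YEAR)
}
list, noobs
```

One consequence worth checking: with these years, Post2009 is 0 in every observation, so the 2009 placebo has no post period and regress would omit its interaction term.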

My end goal is to get n probability values from n simulated regressions, which I can then plot as a cumulative probability distribution, separately for each placebo year. The y-axis would show the cumulative probabilities and the x-axis the interaction coefficient (post*treat) from each simulation for that placebo year. A vertical red line cutting the distribution would mark the interaction coefficient from the actual regression (Under14*Post1986).

If the vertical line cuts the curve in one of its tails, the actual estimate is an outlier in the simulated distribution, which suggests that the estimated effect is unlikely to have arisen by chance.
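The plot described above can be sketched in Stata as follows. This is an illustration under stated assumptions, not the poster's procedure: auto.dta replaces the real data, the built-in permute command (which uses the same `permvar exp_list [, options] : command` form as ritest) replaces ritest, and the names b_foreign, draws, and actual_coeff are hypothetical. The observed coefficient is drawn with the xline() option, because vline is not a twoway plot type.

```stata
* Sketch: CDF of permuted coefficients with the observed estimate marked.
* auto.dta and built-in -permute- stand in for the real data and -ritest-.
sysuse auto, clear
quietly regress price foreign weight
* Observed estimate, drawn later as the red vertical line
local actual_coeff = _b[foreign]

set seed 2024
tempfile draws
permute foreign b_foreign=(_b[foreign]), reps(200) nodots ///
    saving("`draws'", replace): regress price foreign weight

* The saved file holds one row per replication
use "`draws'", clear
* Empirical CDF, values in (0, 1]
cumul b_foreign, generate(cdf)

twoway line cdf b_foreign, sort ///
    xline(`actual_coeff', lcolor(red)) ///
    title("Cumulative distribution of permuted coefficients") ///
    xtitle("Simulated coefficient") ytitle("Cumulative probability")

* Share of draws at least as extreme as the observed estimate (two-sided)
generate byte extreme = abs(b_foreign) >= abs(`actual_coeff')
summarize extreme, meanonly
display "two-sided permutation p-value: " %5.3f r(mean)
```

The last lines turn "where the red line cuts the curve" into a number: the share of permuted coefficients at least as large in absolute value as the observed one.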

However, because of collinearity issues among some variables, I had to first randomize the treated and control units (keeping the number of units in each group constant) and then proceed.
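A sketch of one random reassignment that keeps the treated-group size fixed, as described above. auto.dta and its 0/1 variable foreign stand in for the real data and Under14, and fake_treat is a hypothetical name. Note that ritest (like the built-in permute) reshuffles the resampling variable itself on every replication, so a hand-written loop like this is usually unnecessary; if one is used, each pass must drop the variables it creates, because generate stops with r(110) when a variable already exists.

```stata
* Sketch: one random reassignment of treatment that keeps the number of
* treated units fixed. auto.dta and -foreign- stand in for the real data.
sysuse auto, clear
count if foreign == 1
local n_treated = r(N)

set seed 2024
generate long orig_order = _n         // remember the original sort order
generate double u = runiform()
sort u
generate byte fake_treat = (_n <= `n_treated')
sort orig_order                       // restore the original order
drop u orig_order
```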

Code:

local years 1983 1987 1993 1999 2004 2009
local reps 1000

* Store the original number of treated and control units
gen original_treated = Under14
gen original_control = 1 - Under14
local treated_count = sum(original_treated)
local control_count = sum(original_control)

foreach year in `years' {
    forval i = 1/`reps' {
        * Randomize treated and control groups while maintaining original proportions
        gen random_assign = runiform()
        sort random_assign

        * Assign treated and control groups
        gen randomized_Under14 = 0
        replace randomized_Under14 = 1 if _n <= `treated_count'

        * Interaction term for placebo year
        generate Post`year'_randomized = Post`year' * randomized_Under14

        * Run ritest with the randomized groups
        ritest randomized_Under14, stat(_b[Post`year'_randomized]) reps(1): ///
            regress LIT Post`year'##randomized_Under14 URBAN AGE SEX NCHILD FAMSIZE randomized_Under14 Post`year' i.state_encoded
    }

    * Save results for the placebo year
    save results_randomized_`year', replace
}

* Visualization code (as given earlier)
foreach year in `years' {
    use results_randomized_`year', clear

    * Generate cumulative distribution
    gen cdf = _n / _N
    twoway (line cdf stat, sort) ///
        (vline `actual_coeff', lcolor(red)), ///
        title("Cumulative Distribution for `year'") ///
        xtitle("Simulated Coefficient") ///
        ytitle("Cumulative Probability")
}

Error:
I always get this error:

expression list required
r(100);

I traced it to the line that runs ritest:
ritest randomized_Under14, stat(_b[Post`year'_randomized]) reps(1)

Despite multiple attempts, I cannot work out why this is happening. My guess is that when I randomize Under14, I only change that column and not the variables that depend on it, such as AGE. That is, when a unit is randomly set to Under14 = 1, its AGE may still be 17. But that seems a minor issue, and there may be some other reason for this error.

Can anyone please help me out with this error? I understand it's a very long question, but I am stuck at the moment! Thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35059
#2

Yesterday, 06:15

I don't really follow what you are doing, but I started reading the code to see how far I could get.

I didn't get further than

Code:

local treated_count = sum(original_treated)
local control_count = sum(original_control)

I guess what is happening here is that you are guessing what sum() does by analogy with what a function with the same name does in some other software you know well -- or perhaps just supposing that there should be a function sum() to do what you want.

Unfortunately it's more complicated than that.

1. sum() means in Stata cumulative or running sum, and it's intended to be applied to a variable, or an expression based on one or more variables.

2. It's not illegal to take the result of sum() and put it into a local macro but if the argument is a variable Stata only uses the first observation with nothing else said.

Here is a demonstration.

Code:

. sysuse auto, clear
(1978 automobile data)

. local foo = sum(mpg)

. local bar = sum(price)

. l price mpg in 1

     +-------------+
     | price   mpg |
     |-------------|
  1. | 4,099    22 |
     +-------------+

. di "`foo'"
22

. di "`bar'"
4099

In short with a numeric variable varname

local whatever = sum(varname)

is only ever going to be a way to access varname[1]. That is likely to seem surprising, and I don't know that it is even documented, except in posts like this one. I suppose the lack of documentation arises because the intent behind sum() is, as said, to help in producing a variable, not a constant: a cumulative or running sum is a vector, sequence or series, just as the cumulative or running sum of the primes 2 3 5 7 is the vector 2 5 10 17.
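A small sketch of the point about sum(), using the primes example (prime and running are illustrative names): inside generate, sum() gives a running total with one value per observation; in a local macro assignment it only sees the first observation; the total comes from summarize.

```stata
* Sketch: sum() builds a running total, one value per observation.
clear
input byte prime
2
3
5
7
end

generate running = sum(prime)    // 2 5 10 17
list, noobs

* In a local macro, sum(prime) only sees the first observation
local first = sum(prime)
display "`first'"

* The total of the variable comes from summarize
summarize prime, meanonly
display r(sum)
```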

I am guessing that what you really want is

Code:

su original_treated, meanonly
local treated_count = r(sum)

although using count would be another way to get what you want.
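A sketch of the two approaches just mentioned, with auto.dta's 0/1 variable foreign standing in for Under14 (macro names are illustrative). For a 0/1 indicator with no missing values, r(sum) after summarize, meanonly and r(N) after count give the same number.

```stata
* Sketch: two ways to store the size of the treated group in a local macro.
* auto.dta's 0/1 variable -foreign- stands in for Under14.
sysuse auto, clear

summarize foreign, meanonly
local treated_sum = r(sum)       // sum of a 0/1 indicator = number of 1s

count if foreign == 1
local treated_count = r(N)

count if foreign == 0
local control_count = r(N)

display "treated: `treated_count'  control: `control_count'"
```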

I don't know how far this bears on your problem, but unless you intend to get what you got, this code is mistaken and won't help.

ritest is community-contributed. The latest version is from SSC. I've never used it, and there is no data example here, so I've stopped at that point.

Last edited by Nick Cox; Yesterday, 06:21.
Prateek Mishra

Join Date: Nov 2024

Posts: 8
#3

Yesterday, 07:26

Hi, thank you so much for your response! You are right. I am very confused about how exactly I should carry out this test, but I am trying my best. I realized I had created the two variables you pointed out in a way totally different from what I intended (I wanted them to hold fixed counts, to be used later to keep the treated and control group sizes constant). I corrected it in this way:

Code:

count if Under14 == 1
count if Under14 == 0
local treated_count = 671973
local control_count = 288194

Originally posted by Nick Cox

ritest is community-contributed. The latest version is from SSC. I've never used it, and there is no data example here, so I've stopped at that point.

I have attached a sample of my data below. I hope it helps! Some of the PostXXXX variables are not visible, but hopefully the first two are enough to make sense of it.

Last edited by Prateek Mishra; Yesterday, 07:31.
Nick Cox

Join Date: Mar 2014

Posts: 35059
#4

Yesterday, 09:27

Suggested code would be more like

Code:

count if Under14 == 1
local A = r(N)
count if Under14 == 0
local B = r(N)

where A, B should be the names you want.

Screenshots are unfortunately less helpful than you hope. They don't allow copy and paste and no one I know wants to type in so much data. Please see FAQ Advice #12.
Prateek Mishra

Join Date: Nov 2024

Posts: 8
#5

Yesterday, 23:48

Yes, this code also gives the desired output!

I understand the data-related inconvenience, but is there any way I can make it easier to access? I am very new to this sub; therefore, I was not aware of the rules about posting.

Also, can you please tell me if there is any way to tag an expert in my reply who can help me resolve this issue? I am asking this because I need to come up with a solution by the end of this week. Thanks!
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1136
#6

Today, 00:52

From the help for ritest, the command syntax is

Code:

ritest resampvar exp_list [, options] : command

In your code in #1, your command is missing exp_list (you have provided randomized_Under14 as your resampvar, and you have provided some options and regress as the command, but are missing the expression list) and this is the reason for your error. For more help, you'll have to hope that someone who is familiar with this specific procedure will see this thread and weigh in.
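To illustrate the missing piece, here is the prefix form with an expression list supplied, using the built-in permute command as a stand-in for ritest (the syntax quoted above has the same `resampvar exp_list [, options] : command` layout). auto.dta replaces the real data, and heavy is a hypothetical stand-in for a post-period dummy. Because the model is written with ##, the interaction coefficient is named in factor-variable notation, _b[1.heavy#1.foreign], rather than by a separately generated product variable.

```stata
* Sketch: prefix syntax  permvar exp_list [, options] : command,
* shown with built-in -permute-; auto.dta stands in for the real data.
sysuse auto, clear
* Hypothetical 0/1 "post" indicator so the model has a DiD-style interaction
generate byte heavy = weight > 3000 if !missing(weight)
set seed 2024
* exp_list names the interaction coefficient in factor-variable notation
permute foreign _b[1.heavy#1.foreign], reps(100) nodots: ///
    regress price i.heavy##i.foreign
matrix P = r(p)
display "permutation p-value for the interaction: " %5.3f P[1,1]
```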
Nick Cox

Join Date: Mar 2014

Posts: 35059
#7

Today, 02:54

#5

I am very new to this sub; therefore, I was not aware of the rules about posting.

The rules here are largely unstated, but essentially won't trouble you. One is no spamming. The other I need not spell out.

The Statalist home page and the prompt every time you post a new message contain a request to read the FAQ Advice at https://www.statalist.org/forums/help which as stated includes advice, and various requests. #12 there explains in detail how best to post data examples.
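For reference, a minimal dataex call, with auto.dta and a few of its variables standing in for one's own data. The output is a block of code that others can copy and paste to recreate the example.

```stata
* Sketch: producing a copy-and-paste data example with -dataex-.
* auto.dta stands in for the real data; list your own variables instead.
sysuse auto, clear
dataex make price mpg foreign in 1/5
```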

Also, can you please tell me if there is any way to tag an expert in my reply who can help me resolve this issue? I am asking this because I need to come up with a solution by the end of this week. Thanks!

There are ways to ping individuals but they are often used inappropriately -- for example, to alert someone active here who may know little or nothing about a topic. Essentially, we are all volunteers here and answering a question implies willingness to help in some way, but we are not your assistants. Pinging people at the start of a thread is not an especially good idea.

In your particular case, the obvious expert is the program author, who should be contacted directly.
Prateek Mishra

Join Date: Nov 2024

Posts: 8
#8

Today, 04:24

Originally posted by Hemanshu Kumar

From the help for ritest, the command syntax is

Code:

ritest resampvar exp_list [, options] : command

Thank you for the insight. I had accidentally left out that term. After adding it, the entire code snippet ran, but it did not generate any output. For reference, this is the updated line:

Code:

ritest randomized_Under14 _b[randomized_Under14], stat(_b[Post`year'_randomized]) reps(1):

Based on the code, I expected the plot to appear in a separate window or be saved in the source location. But there was no result. What might be the issue now (assuming that the error no longer pertains to the package itself but rather to some sort of syntactic error that I'm unable to notice)?

Last edited by Prateek Mishra; Today, 04:33.
Nick Cox

Join Date: Mar 2014

Posts: 35059
#9

Today, 04:37

The author of the program, Simon Heß (Hess), is a member here, but hasn't posted since November 2022. That strengthens the advice to email him directly for support.
Prateek Mishra

Join Date: Nov 2024

Posts: 8
#10

Today, 04:48

Sure thing. I just wanted to exhaust all other options before contacting him directly. I'll do it now, thanks.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1136
#11

Today, 06:30

Originally posted by Prateek Mishra

Based on the code, I expected the plot to appear in a separate window or be saved in the source location. But there was no result. What might be the issue now (assuming that the error no longer pertains to the package itself but rather to some sort of syntactic error that I'm unable to notice)?

Your problem may or may not have to do with the ritest command itself, since I assume from your message that the new version of your command produced no error. The later part of your code intends to produce a series of graphs in a -for- loop. It might be useful to pick any one year, and show us the contents of the results_randomized_`year' dataset for that year (use dataex and not a screenshot, as suggested in the FAQ linked in #7), and then show us also the exact output, if any, of the twoway command you run on that dataset.