How to resolve this error in Exact Randomization Test code?

Prateek Mishra

Join Date: Nov 2024

Posts: 13
#1

22 Jan 2025, 03:52

Hi everyone! The following question is based on a test that I am trying to run. Although I have included all possible details, please let me know if you need any other info!

Data:

My data contains the following variables: URBAN, AGE, SEX, NCHILD, FAMSIZE, Under14 (treated variable which is 1 for age < 14, and 0 otherwise), LIT (outcome), Post1986 (post-term). I am running a DiD model.

Situation:

I am trying to run an Exact Randomization Test to check the robustness of my data, for which I have written my own code. The process goes something like this:

The YEAR column has 1983, 1987, 1993, 1999, 2004, 2009. The Post1986 term is 1 for years after 1986 and 0 otherwise (including 1986). I used each of the given years as a placebo year and created a new data set before proceeding with the ritest package; that is, I did it 6 times. For this purpose, I had to create a new variable each time, PostXXXX, where XXXX is the year, following the same logic, 1 for years after it and 0 otherwise (and including XXXX). I understand that my data has no 1986 year, which is, in fact, the treatment year, but that's just how it is.

My end objective is to get 'n' probability values for the n-simulations of regressions, which I can then plot separately as a cumulative probability distribution for each placebo year. The y-axis would have the probabilities, and the x-axis would have the interaction term (post*treat) for each simulation for that placebo year. Additionally, a vertical red line on the plot cutting the distribution would represent the actual interaction term for the actual regression (Under14*Post1986).

If the vertical line cuts the probability curve at an extremity, then it would mean that the actual estimate is an outlier in the distribution, hence validating that the intervention is unlikely to have occurred randomly.

However, due to collinearity issues among some variables in the process, I had to first randomize the treated and control units (keeping the number of units in each group constant) and then proceed further.

Code:

local years 1983 1987 1993 1999 2004 2009
local reps 1000

* Store the original number of treated and control units
gen original_treated = Under14
gen original_control = 1 - Under14
local treated_count = sum(original_treated)
local control_count = sum(original_control)

foreach year in `years' {
    forval i = 1/`reps' {
        * Randomize treated and control groups while maintaining original proportions
        gen random_assign = runiform()
        sort random_assign

        * Assign treated and control groups
        gen randomized_Under14 = 0
        replace randomized_Under14 = 1 if _n <= `treated_count'

        * Interaction term for placebo year
        generate Post`year'_randomized = Post`year' * randomized_Under14

        * Run ritest with the randomized groups
        ritest randomized_Under14, stat(_b[Post`year'_randomized]) reps(1): ///
            regress LIT Post`year'##randomized_Under14 URBAN AGE SEX NCHILD FAMSIZE randomized_Under14 Post`year' i.state_encoded
    }

    * Save results for the placebo year
    save results_randomized_`year', replace
}

* Visualization code (as given earlier)
foreach year in `years' {
    use results_randomized_`year', clear

    * Generate cumulative distribution
    gen cdf = _n / _N

    twoway (line cdf stat, sort) ///
        (vline `actual_coeff', lcolor(red)), ///
        title("Cumulative Distribution for `year'") ///
        xtitle("Simulated Coefficient") ///
        ytitle("Cumulative Probability")
}

Error:
I am always getting this error: expression list required r(100);

I pinpointed the line of code for which I am getting it; it is in the place where the ritest is being run:
ritest randomized_Under14, stat(_b[Post`year'_randomized]) reps(1)

Despite multiple attempts, I am not able to understand the exact reason why this is happening. What I feel is that while I am randomizing the Under14 variable, I am only doing so for that column and not for the ones dependent on it, such as AGE. I mean, when randomly making (Under14 = 1) for a unit, the code fails to account for the corresponding unit's AGE, which may be 17. But it's a very minor issue, and there might be some other reason for this error.

Can anyone please help me out with this error? I understand it's a very long question, but I am stuck at the moment! Thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35697
#2

22 Jan 2025, 05:15

I don't really follow what you are doing, but I started reading the code to see how far I could get.

I didn't get further than

Code:

local treated_count = sum(original_treated)
local control_count = sum(original_control)

I guess what is happening here is that you are guessing what sum() does by analogy with what a function with the same name does in some other software you know well -- or perhaps just supposing that there should be a function sum() to do what you want.

Unfortunately it's more complicated than that.

1. sum() means in Stata cumulative or running sum, and it's intended to be applied to a variable, or an expression based on one or more variables.

2. It's not illegal to take the result of sum() and put it into a local macro but if the argument is a variable Stata only uses the first observation with nothing else said.

Here is a demonstration.

Code:

. sysuse auto, clear
(1978 automobile data)

. local foo = sum(mpg)

. local bar = sum(price)

. l price mpg in 1

     +-------------+
     | price   mpg |
     |-------------|
  1. | 4,099    22 |
     +-------------+

. di "`foo'"
22

. di "`bar'"
4099

In short with a numeric variable varname

local whatever = sum(varname)

is only ever going to be a way to access varname[1]. That is likely to seem surprising, and I don't know that it is even documented, except in posts like this one. I suppose the lack of documentation arises because the intent behind sum() is, as said, to be helpful in producing a variable, not a constant: a cumulative or running sum is a vector, sequence or series, just as the cumulative or running sum of the primes 2 3 5 7 is a vector 2 5 10 17.

I am guessing that what you really want is

Code:

su original_treated, meanonly
local treated_count = r(sum)

although using count would be another way to get what you want.

I don't know how far this bears on your problem, but unless you intend to get what you got, this code is mistaken and won't help.
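
As a quick check of the points above, here is a minimal sketch on the auto data shipped with Stata. The variable name running is made up for illustration. It shows sum() producing a running total as a variable, and summarize with meanonly leaving the grand total in r(sum).

Code:

sysuse auto, clear

* sum() builds a running total: one value per observation
gen running = sum(mpg)
list mpg running in 1/3

* the grand total is left behind in r(sum)
summarize mpg, meanonly
display r(sum)

* the last value of the running total equals the grand total
assert running[_N] == r(sum)

For a 0/1 indicator such as Under14, r(sum) obtained this way is the number of observations coded 1, which is the count wanted in #1.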

ritest is community-contributed. The latest version is from SSC. I've never used it, and there is no data example here, so I've stopped at that point.

Last edited by Nick Cox; 22 Jan 2025, 05:21.
Prateek Mishra

Join Date: Nov 2024

Posts: 13
#3

22 Jan 2025, 06:26

Hi, thank you so much for your response! You are right. I am very confused about how exactly I should process this test, but I am trying my best. I realized I had created the two variables you pointed out in a way totally different from what I intended (I wanted to include them as a fixed count value, to use them later to keep the treated and control group size constant). I corrected it in this way:

Code:

count if Under14 == 1
count if Under14 == 0
local treated_count = 671973
local control_count = 288194

Originally posted by Nick Cox

ritest is community-contributed. The latest version is from SSC. I've never used it, and there is no data example here, so I've stopped at that point.

I have attached an exemplary data sample below. Hope it helps! Some of the PostXXXX terms are not visible, but hopefully, the first two would be enough to make sense.

Last edited by Prateek Mishra; 22 Jan 2025, 06:31.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

22 Jan 2025, 08:27

Suggested code would be more like

Code:

count if Under14 == 1
local A = r(N)
count if Under14 == 0
local B = r(N)

where A, B should be the names you want.

Screenshots are unfortunately less helpful than you hope. They don't allow copy and paste and no one I know wants to type in so much data. Please see FAQ Advice #12.
Prateek Mishra

Join Date: Nov 2024

Posts: 13
#5

22 Jan 2025, 22:48

Yes, this code also gives the desired output!

I understand the data-related inconvenience, but is there any way I can make it easier to access? I am very new to this sub; therefore, I was not aware of the rules about posting.

Also, can you please tell me if there is any way to tag an expert in my reply who can help me resolve this issue? I am asking this because I need to come up with a solution by the end of this week. Thanks!
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#6

22 Jan 2025, 23:52

From the help for ritest, the command syntax is

Code:

ritest resampvar exp_list [, options] : command

In your code in #1, your command is missing exp_list (you have provided randomized_Under14 as your resampvar, and you have provided some options and regress as the command, but are missing the expression list) and this is the reason for your error. For more help, you'll have to hope that someone who is familiar with this specific procedure will see this thread and weigh in.
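
To make the syntax concrete, here is a minimal sketch on the auto data, assuming ritest has been installed from SSC. The regression is made up purely for illustration. The point is only that the statistic to be collected, here _b[foreign], sits directly after the resampling variable and before the comma. The reps() and seed() options are as I read them in the ritest help and are worth checking there.

Code:

* ssc install ritest      // only needed once
sysuse auto, clear

* resampvar = foreign, exp_list = _b[foreign]
ritest foreign _b[foreign], reps(100) seed(12345): ///
    regress price foreign mpg

As far as I understand the help file, ritest re-randomizes the resampling variable itself on each replication, so reps() is where the number of simulations would normally go.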
Nick Cox

Join Date: Mar 2014

Posts: 35697
#7

23 Jan 2025, 01:54

#5

I am very new to this sub; therefore, I was not aware of the rules about posting.

The rules here are largely unstated, but essentially won't trouble you. One is no spamming. The other I need not spell out.

The Statalist home page and the prompt every time you post a new message contain a request to read the FAQ Advice at https://www.statalist.org/forums/help which as stated includes advice, and various requests. #12 there explains in detail how best to post data examples.

Also, can you please tell me if there is any way to tag an expert in my reply who can help me resolve this issue? I am asking this because I need to come up with a solution by the end of this week. Thanks!

There are ways to ping individuals but they are often used inappropriately -- for example, to alert someone active here who may know little or nothing about a topic. Essentially, we are all volunteers here and answering a question implies willingness to help in some way, but we are not your assistants. Pinging people at the start of a thread is not an especially good idea.

In your particular case, the obvious expert is the program author, who should be contacted directly.
Prateek Mishra

Join Date: Nov 2024

Posts: 13
#8

23 Jan 2025, 03:24

Originally posted by Hemanshu Kumar

From the help for ritest, the command syntax is

Code:

ritest resampvar exp_list [, options] : command

Thank you for the insight. I might have accidentally missed out on that term, but now, after including it in my code, I was able to run the entire code snippet, but it did not generate any output. For your reference, this is the updated code line:

Code:

ritest randomized_Under14 _b[randomized_Under14], stat(_b[Post`year'_randomized]) reps(1):

Based on the code, I expected the plot to appear in a separate window or be saved in the source location. But there was no result. What might be the issue now (assuming that the error no longer pertains to the package itself but rather to some sort of syntactic error that I'm unable to notice)?

Last edited by Prateek Mishra; 23 Jan 2025, 03:33.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#9

23 Jan 2025, 03:37

The author of the program, Simon Heß (Hess), is a member here, but hasn't posted since November 2022. That strengthens the advice to email him directly for support.
Prateek Mishra

Join Date: Nov 2024

Posts: 13
#10

23 Jan 2025, 03:48

Sure thing. I just wanted to exhaust all other options before contacting him directly. I'll do it now, thanks.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#11

23 Jan 2025, 05:30

Originally posted by Prateek Mishra

Based on the code, I expected the plot to appear in a separate window or be saved in the source location. But there was no result. What might be the issue now (assuming that the error no longer pertains to the package itself but rather to some sort of syntactic error that I'm unable to notice)?

Your problem may or may not have to do with the ritest command itself, since I assume from your message that the new version of your command produced no error. The later part of your code intends to produce a series of graphs in a -for- loop. It might be useful to pick any one year, and show us the contents of the results_randomized_`year' dataset for that year (use dataex and not a screenshot, as suggested in the FAQ linked in #7), and then show us also the exact output, if any, of the twoway command you run on that dataset.
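
Two details in the plotting code in #1 are worth checking when you look at that twoway output: vline is not a twoway plot type (the xline() option draws a vertical reference line), and the local actual_coeff is never defined in the code shown. Here is a minimal sketch of the intended graph, under loud assumptions: the stat values are simulated stand-ins for the saved coefficients, and the value 0.12 for actual_coeff is hypothetical.

Code:

clear
set seed 12345
set obs 1000

* stand-in for the simulated interaction coefficients (hypothetical values)
gen double stat = rnormal(0, 0.05)

* stand-in for the estimate from the actual regression (hypothetical value)
local actual_coeff = 0.12

* empirical cumulative distribution of the simulated coefficients
cumul stat, generate(cdf)

* xline() draws the vertical reference line
twoway line cdf stat, sort xline(`actual_coeff', lcolor(red)) ///
    title("Cumulative Distribution (placebo year)") ///
    xtitle("Simulated Coefficient") ///
    ytitle("Cumulative Probability")

Using cumul avoids relying on the sort order of the data, which gen cdf = _n / _N depends on.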

Prateek Mishra

Join Date: Nov 2024
Posts: 13

#12

24 Jan 2025, 03:41

I tried to pinpoint the place of the error and realized that the dta file containing the simulated probabilities of each placebo year is never created in the for loop. I think it is because for each iteration, the dta file is being replaced for the next one. That is the reason why no plot is being generated. This is the code line to be exact:

Code:

* Save results for the placebo year
    save results_randomized_`year', replace

I looked it up online and found that storing multiple datasets in memory is only available from Stata 16 onwards, and I'm using Stata 15. To tackle this, I tried to individually run it for a single placebo year using this code:

Code:

local reps 1000

* Store the original number of treated and control units
count if Under14 == 1
count if Under14 == 0
local treated_count = 671973
local control_count = 288194

gen Post1983 = YEAR >= 1983

forval i = 1/`reps' {
        * Randomize treated and control groups while maintaining original proportions
        gen random_assign = runiform()
        sort random_assign
        
        * Assign treated and control groups
        gen randomized_Under14 = 0
        replace randomized_Under14 = 1 if _n <= `treated_count'

        * Interaction term for placebo year
        generate Post1983_randomized = Post1983 * randomized_Under14

        * Run ritest with the randomized groups (i.state_encoded )
        ritest randomized_Under14 _b[randomized_Under14], stat(_b[Post1983_randomized]) reps(1): ///
            regress LIT Post1983##randomized_Under14 URBAN AGE SEX NCHILD FAMSIZE randomized_Under14 Post1983 i.state_encoded
    }

But then this error comes up: invalid syntax r(198);
And I can't find which part of the code has the error now.


Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#13

24 Jan 2025, 03:46

Originally posted by Prateek Mishra

I tried to pinpoint the place of the error and realized that the dta file containing the simulated probabilities of each placebo year is never created in the for loop. I think it is because for each iteration, the dta file is being replaced for the next one. That is the reason why no plot is being generated. This is the code line to be exact:

Code:

* Save results for the placebo year
save results_randomized_`year', replace

No, I don't think that is the issue. While Stata may have only one dataset in memory, it can save as many datasets to disk as you want. The save command you are using creates (or replaces) files on disk, and it should have no problems saving one for each year, as you intend to do. The problem lies elsewhere.
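
A minimal sketch of that point, using the auto data and made-up file names (auto_rep3.dta and so on, written to the current working directory): one dataset is in memory at a time, yet the loop leaves three separate files on disk.

Code:

sysuse auto, clear

foreach r in 3 4 5 {
    preserve
    keep if rep78 == `r'
    * each pass writes its own file; replace only overwrites a same-named file
    save auto_rep`r', replace
    restore
}

* all three files should now be listed
dir auto_rep*.dta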
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#14

24 Jan 2025, 03:50

There are several debugging tools available in Stata. One is to trace what is happening during program execution, another is to pause execution at critical points and then check what the dataset, or particular variables, etc are storing at that point.

You might want to check

Code:

help trace
help pause
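
As a small illustration of both tools, here is a sketch meant to be run from a do-file on the auto data. The loops are made up and merely stand in for the code being debugged.

Code:

sysuse auto, clear

* trace: echo each command as it runs, with macros expanded
set tracedepth 1
set trace on
foreach v in price mpg {
    quietly summarize `v', meanonly
    display "`v': " r(mean)
}
set trace off

* pause: stop inside the loop so the data can be inspected
pause on
forvalues i = 1/2 {
    quietly count if rep78 == `i'
    pause finished pass `i'
}
pause off

At each pause Stata hands control back to the Command window. Typing q resumes the do-file, and typing BREAK stops it.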
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#15

24 Jan 2025, 03:59

Glancing at your code in #1, I think there is one major problem: if I am understanding it correctly, your outer foreach year ... loop saves one file per year, but within that loop you execute another forval i ... loop that runs 1000 times. Yet, nothing stores the results of each of those 1000 iterations. So the file for any given year will only store the results for the last iteration, since everything before that is being discarded every time a new iteration runs.

This, of course, has nothing to do with the syntax error you mentioned in #12. That is a separate problem you will need to investigate.
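
On storing one result per iteration, here is a minimal sketch using official Stata's postfile on the auto data. Everything specific in it is an assumption for illustration: the regression, the variable names u and fake, the file name placebo_results, and the 200 replications. Note also that gen inside a loop stops with an "already defined" error on the second pass unless the variable is dropped first, so the sketch drops its helper variables at the end of each pass.

Code:

sysuse auto, clear
set seed 12345

* number of treated units to hold fixed across shuffles
quietly count if foreign == 1
local n_treated = r(N)

* open a results file with one row per replication
tempname memhold
postfile `memhold' rep b using placebo_results, replace

forvalues i = 1/200 {
    * shuffle the rows, then mark the first n_treated as treated
    gen double u = runiform()
    sort u
    gen byte fake = _n <= `n_treated'

    quietly regress price fake mpg

    * store this replication's coefficient before it is overwritten
    post `memhold' (`i') (_b[fake])

    drop u fake
}

postclose `memhold'

* the results file now holds 200 coefficients, one per replication
use placebo_results, clear
summarize b

The same pattern would apply inside a loop over placebo years, with a different file name for each year.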