  • How to resolve this error in Exact Randomization Test code?

    Hi everyone! The following question is based on a test that I am trying to run. Although I have included all possible details, please let me know if you need any other info!

    Data:

    My data contains the following variables: URBAN, AGE, SEX, NCHILD, FAMSIZE, Under14 (treated variable which is 1 for age < 14, and 0 otherwise), LIT (outcome), Post1986 (post-term). I am running a DiD model.

    Situation:

    I am trying to run an Exact Randomization Test to check the robustness of my data, for which I have written my own code. The process goes something like this:

    The YEAR column has 1983, 1987, 1993, 1999, 2004, 2009. The Post1986 term is 1 for years after 1986 and 0 otherwise (including 1986). I used each of the given years as a placebo year and created a new data set before proceeding with the ritest package; that is, I did it 6 times. For this purpose, I had to create a new variable each time, PostXXXX, where XXXX is the year, following the same logic, 1 for years after it and 0 otherwise (and including XXXX). I understand that my data has no 1986 year, which is, in fact, the treatment year, but that's just how it is.

    My end objective is to get 'n' probability values for the n-simulations of regressions, which I can then plot separately as a cumulative probability distribution for each placebo year. The y-axis would have the probabilities, and the x-axis would have the interaction term (post*treat) for each simulation for that placebo year. Additionally, a vertical red line on the plot cutting the distribution would represent the actual interaction term for the actual regression (Under14*Post1986).

    If the vertical line cuts the probability curve at an extremity, then it would mean that the actual estimate is an outlier in the distribution, hence validating that the intervention is unlikely to have occurred randomly.

    However, due to collinearity issues among some variables in the process, I had to first randomize the treated and control units (keeping the number of units in each group constant) and then proceed further.

    Code:
    local years 1983 1987 1993 1999 2004 2009
    local reps 1000
    
    * Store the original number of treated and control units
    gen original_treated = Under14
    gen original_control = 1 - Under14
    
    local treated_count = sum(original_treated)
    local control_count = sum(original_control)
    
    foreach year in `years' {
        forval i = 1/`reps' {
            * Randomize treated and control groups while maintaining original proportions
            gen random_assign = runiform()
            sort random_assign
            
            * Assign treated and control groups
            gen randomized_Under14 = 0
            replace randomized_Under14 = 1 if _n <= `treated_count'
    
            * Interaction term for placebo year
            generate Post`year'_randomized = Post`year' * randomized_Under14
    
            * Run ritest with the randomized groups
            ritest randomized_Under14, stat(_b[Post`year'_randomized]) reps(1): ///
                regress LIT Post`year'##randomized_Under14 URBAN AGE SEX NCHILD FAMSIZE randomized_Under14 Post`year' i.state_encoded
        }
        
        * Save results for the placebo year
        save results_randomized_`year', replace
    }
    
    * Visualization code (as given earlier)
    foreach year in `years' {
        use results_randomized_`year', clear
    * Generate cumulative distribution
        gen cdf = _n / _N
        
        twoway (line cdf stat, sort) ///
               (vline `actual_coeff', lcolor(red)), ///
               title("Cumulative Distribution for `year'") ///
               xtitle("Simulated Coefficient") ///
               ytitle("Cumulative Probability")
    }
    Error:
    I am always getting this error: expression list required r(100);

    I pinpointed the line of code for which I am getting it; it is in the place where the ritest is being run:
    ritest randomized_Under14, stat(_b[Post`year'_randomized]) reps(1)

    Despite multiple attempts, I am not able to understand the exact reason why this is happening. What I feel is that while I am randomizing the Under14 variable, I am only doing so for that column and not for the ones dependent on it, such as AGE. I mean, when randomly setting Under14 = 1 for a unit, the code fails to account for the corresponding unit's AGE, which may be 17. But it's a very minor issue, and there might be some other reason for this error.

    Can anyone please help me out with this error? I understand it's a very long question, but I am stuck at the moment! Thank you!

  • #2
    I don't really follow what you are doing, but I started reading the code to see how far I could get.

    I didn't get further than

    Code:
    local treated_count = sum(original_treated)  
    
    local control_count = sum(original_control)
    I guess what is happening here is that you are guessing what sum() does by analogy with what a function with the same name does in some other software you know well -- or perhaps just supposing that there should be a function sum() to do what you want.

    Unfortunately it's more complicated than that.

    1. In Stata, sum() means cumulative or running sum, and it's intended to be applied to a variable, or an expression based on one or more variables.

    2. It's not illegal to take the result of sum() and put it into a local macro, but if the argument is a variable, Stata only uses the first observation with nothing else said.

    Here is a demonstration.
    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . local foo = sum(mpg)
    
    . local bar = sum(price)
    
    . l price mpg in 1
    
         +-------------+
         | price   mpg |
         |-------------|
      1. | 4,099    22 |
         +-------------+
    
    . di "`foo'"
    22
    
    . di "`bar'"
    4099


    In short with a numeric variable varname

    local whatever = sum(varname)

    is only ever going to be a way to access varname[1]. That is likely to seem surprising, and I don't know that it is even documented, except in posts like this one. I suppose the lack of documentation arises because the intent behind sum() is, as said, to be helpful in producing a variable, not a constant: a cumulative or running sum is a vector, sequence or series, just as the cumulative or running sum of the primes 2 3 5 7 is a vector 2 5 10 17.

    I am guessing that what you really want is

    Code:
    su original_treated, meanonly
    local treated_count = r(sum)
    although using count would be another way to get what you want.
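To make the contrast concrete, here is a minimal sketch on the auto data (chosen only for illustration; running_total and total are made-up names). It uses sum() as intended, to build a running-total variable, and then gets the grand total as a single number from summarize.

```stata
sysuse auto, clear

* sum() used as intended: a running total stored in a new variable
generate running_total = sum(foreign)

* summarize, meanonly leaves the grand total behind in r(sum)
summarize foreign, meanonly
local total = r(sum)

* The last value of the running sum equals the grand total
assert running_total[_N] == `total'
display "total = `total'"
```

Because the block uses a local macro, it should be run in one go from a do-file or the do-file editor.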

    I don't know how far this bears on your problem, but unless you intend to get what you got, this code is mistaken and won't help.

    ritest is community-contributed. The latest version is from SSC. I've never used it, and there is no data example here, so I've stopped at that point.
    Last edited by Nick Cox; 22 Jan 2025, 06:21.



    • #3
      Hi, thank you so much for your response! You are right. I am very confused about how exactly I should process this test, but I am trying my best. I realized I had created the two variables you pointed out in a way totally different from what I intended (I wanted to include them as a fixed count value, to use them later to keep the treated and control group size constant). I corrected it in this way:

      Code:
      count if Under14 == 1
      count if Under14 == 0
      local treated_count = 671973
      local control_count = 288194
      Originally posted by Nick Cox

      ritest is community-contributed. The latest version is from SSC. I've never used it, and there is no data example here, so I've stopped at that point.
      I have attached an example data sample below. Hope it helps! Some of the PostXXXX terms are not visible, but hopefully, the first two would be enough to make sense.

      [Image: image_36659.png (screenshot of the data sample)]

      Last edited by Prateek Mishra; 22 Jan 2025, 07:31.



      • #4
        Suggested code would be more like

        Code:
        count if Under14 == 1
        local A = r(N)
        count if Under14 == 0
        local B = r(N)
        where A, B should be the names you want.
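As a hedged sketch of how such a stored count might then be used for one fixed-size random reassignment, as described in #1: the auto data and the names u and placebo_treat are placeholders, not the real variables. Note that generate refuses to overwrite an existing variable, so inside a loop the helper variables would need to be dropped (or updated with replace) before the next draw.

```stata
sysuse auto, clear
set seed 2025

* Store the number of treated units instead of typing it in by hand
count if foreign == 1
local treated_count = r(N)

* One placebo draw: shuffle the rows, then flag the first treated_count rows
generate double u = runiform()
sort u
generate byte placebo_treat = _n <= `treated_count'

* The placebo group has the same size as the real treated group
count if placebo_treat == 1
assert r(N) == `treated_count'

* Clean up so the same generate lines can run again in the next iteration
drop u placebo_treat
```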

        Screenshots are unfortunately less helpful than you hope. They don't allow copy and paste and no one I know wants to type in so much data. Please see FAQ Advice #12.



        • #5
          Yes, this code also gives the desired output!

          I understand the data-related inconvenience, but is there any way I can make it easier to access? I am very new to this sub; therefore, I was not aware of the rules about posting.

          Also, can you please tell me if there is any way to tag an expert in my reply who can help me resolve this issue? I am asking this because I need to come up with a solution by the end of this week. Thanks!



          • #6
            From the help for ritest, the command syntax is

            Code:
            ritest resampvar exp_list [, options] : command
            In your code in #1, your command is missing exp_list (you have provided randomized_Under14 as your resampvar, and you have provided some options and regress as the command, but are missing the expression list) and this is the reason for your error. For more help, you'll have to hope that someone who is familiar with this specific procedure will see this thread and weigh in.
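For illustration only, a call that fills in all the pieces of that syntax might look like the sketch below. It assumes ritest has been installed from SSC and uses the auto data with made-up roles: foreign as resampvar, _b[foreign] as the exp_list, and reps() as the only option. It is not a recommendation for the design in #1.

```stata
* Assumption: ritest is already installed (ssc install ritest)
sysuse auto, clear
set seed 12345

* resampvar = foreign, exp_list = _b[foreign], command = regress ...
ritest foreign _b[foreign], reps(100): regress price foreign mpg
```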



            • #7
              #5

              I am very new to this sub; therefore, I was not aware of the rules about posting.
              The rules here are largely unstated, but essentially won't trouble you. One is no spamming. The other I need not spell out.

              The Statalist home page and the prompt every time you post a new message contain a request to read the FAQ Advice at https://www.statalist.org/forums/help which as stated includes advice, and various requests. #12 there explains in detail how best to post data examples.

              Also, can you please tell me if there is any way to tag an expert in my reply who can help me resolve this issue? I am asking this because I need to come up with a solution by the end of this week. Thanks!
              There are ways to ping individuals but they are often used inappropriately -- for example, to alert someone active here who may know little or nothing about a topic. Essentially, we are all volunteers here and answering a question implies willingness to help in some way, but we are not your assistants. Pinging people at the start of a thread is not an especially good idea.

              In your particular case, the obvious expert is the program author, who should be contacted directly.



              • #8
                Originally posted by Hemanshu Kumar
                From the help for ritest, the command syntax is

                Code:
                ritest resampvar exp_list [, options] : command
                Thank you for the insight. I might have accidentally missed out on that term, but now, after including it in my code, I was able to run the entire code snippet, but it did not generate any output. For your reference, this is the updated code line:

                Code:
                ritest randomized_Under14 _b[randomized_Under14], stat(_b[Post`year'_randomized]) reps(1):
                Based on the code, I expected the plot to appear in a separate window or be saved in the source location. But there was no result. What might be the issue now (assuming that the error no longer pertains to the package itself but rather to some sort of syntactic error that I'm unable to notice)?
                Last edited by Prateek Mishra; Yesterday, 04:33.



                • #9
                  The author of the program, Simon Heß (Hess), is a member here, but hasn't posted since November 2022. That strengthens the advice to email him directly for support.



                  • #10
                    Sure thing. I just wanted to exhaust all other options before contacting him directly. I'll do it now, thanks.



                    • #11
                      Originally posted by Prateek Mishra
                      Based on the code, I expected the plot to appear in a separate window or be saved in the source location. But there was no result. What might be the issue now (assuming that the error no longer pertains to the package itself but rather to some sort of syntactic error that I'm unable to notice)?
                      Your problem may or may not have to do with the ritest command itself, since I assume from your message that the new version of your command produced no error. The later part of your code intends to produce a series of graphs in a -for- loop. It might be useful to pick any one year, and show us the contents of the results_randomized_`year' dataset for that year (use dataex and not a screenshot, as suggested in the FAQ linked in #7), and then show us also the exact output, if any, of the twoway command you run on that dataset.
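A sketch of that kind of check is below. It uses official Stata's permute as a stand-in for ritest, a deliberate swap made only because permute ships with Stata and uses a similar prefix layout. The auto data and the names draws, b, cdf and actual are all illustrative. Two things differ from the plotting code in #1: the actual coefficient is stored in a local before it is used (in #1 the local actual_coeff is never defined), and the vertical line is drawn with the xline() option, since as far as I know twoway has no vline plot type.

```stata
sysuse auto, clear
set seed 12345
tempfile draws

* Save one coefficient per permutation in a dataset (variable b)
permute foreign b=_b[foreign], reps(200) saving(`draws') nodots: ///
    regress price foreign mpg

* The actual estimate, kept in a local for the reference line
regress price foreign mpg
local actual = _b[foreign]

* Look at what was saved before trying to plot it
use `draws', clear
describe
list in 1/5

* Empirical CDF of the simulated coefficients, with the actual estimate marked
cumul b, generate(cdf)
twoway line cdf b, sort xline(`actual', lcolor(red)) ///
    title("Cumulative distribution of placebo coefficients") ///
    xtitle("Simulated coefficient") ytitle("Cumulative probability")
```

If describe or list shows an empty dataset or no coefficient variable, the problem lies upstream of the graph command. The block relies on a tempfile and a local macro, so it needs to be run in one go.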
