
  • Small treatment group in comparison with the control group

    I am conducting a causal analysis, but my treatment group is tiny compared to the control group, almost 70 to 80 times smaller. How should I proceed? I was thinking of filtering out the households in the treatment group and then randomly assigning households from the remaining sample to the control group. Is this approach right?

  • #2
    I am conducting a causal analysis
    This is extremely vague. Causal inference extends from experimental datasets to regression discontinuity, synthetic control methods, matching, difference-in-differences, interrupted time series, instrumental variables, and many others I've forgotten or omitted.
    I was thinking of filtering out the households in the treatment group and then randomly assigning households from the remaining sample to the control group. Is this approach right?
    Definitely not, especially if the treatment wasn't randomly assigned. I'm not even quite sure what this would look like.

    Sounds like a good recipe for bootstrapping one's SEs and betas to deal with small samples. But you definitely cannot choose random controls for your control group.

    Can you give a little more context on how you're identifying the causal effect please? Welcome to Statalist, Saransh.
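To make the bootstrap suggestion concrete, here is a minimal sketch (in Python rather than Stata, purely for illustration) of a percentile bootstrap of a simple difference in means with a tiny treated group and a roughly 80-times-larger control group. All numbers are simulated and hypothetical, not from the poster's dataset.

```python
# Percentile bootstrap of a difference in means with a small treated group.
# Resampling is done within each group, preserving the group sizes.
import numpy as np

rng = np.random.default_rng(0)
treated = rng.normal(1.0, 1.0, size=20)      # small treatment group
control = rng.normal(0.0, 1.0, size=1600)    # ~80x larger control group

def estimate(t, c):
    return t.mean() - c.mean()

point = estimate(treated, control)

boot = np.array([
    estimate(rng.choice(treated, treated.size, replace=True),
             rng.choice(control, control.size, replace=True))
    for _ in range(2000)
])
se = boot.std(ddof=1)                # bootstrap standard error
ci = np.percentile(boot, [2.5, 97.5])  # percentile 95% interval
print(point, se, ci)
```

Note that the bootstrap SE here is dominated by the treated-group term: with only 20 treated units, no amount of extra controls can shrink it much, which is exactly the small-sample concern.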



    • #3
      Perhaps I am missing something, but why do you want to throw away data? There is almost no statistical disadvantage I can think of to having a control group that is vastly larger than the treatment group. If the treatment group is small in absolute terms, that is a problem. But reducing the size of the control group will not help that. In fact it will only make matters worse. If the treatment group is sufficiently large, having a much larger control group does you no harm.

      Perhaps, perhaps, if you are going to do a very computationally intensive analysis of this data, reducing the size of the control group might be worthwhile to reduce the length of time required to do the analysis, or perhaps reduce demands on memory. But other than that this seems like a non-problem to me.
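A quick back-of-the-envelope calculation illustrates this point. Under a common outcome standard deviation sigma, the standard error of a difference in means is sigma * sqrt(1/n_treated + 1/n_control); with the treated-group size fixed, shrinking the control group can only inflate it. A small sketch with hypothetical numbers:

```python
# With n_treated fixed, the SE of a difference in means only grows as
# controls are discarded. Numbers are hypothetical.
import math

sigma, n_treated = 1.0, 20

def se_diff(n_control):
    # SE of (mean_treated - mean_control), common sigma in both groups
    return sigma * math.sqrt(1 / n_treated + 1 / n_control)

for n_control in (1600, 200, 20):
    print(n_control, round(se_diff(n_control), 3))
# Discarding controls moves the SE from 0.225 toward 0.316;
# it never improves it. The 1/n_treated term is the binding constraint.
```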



      • #4
        The only OTHER justification I could think of is when the method is predicated on dimensional reduction/regularization. For example, synthetic control methods do this implicitly with convex constraints upon the unit weights in the OLS objective function.
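As an illustration of those convex constraints (a hypothetical sketch, not the fdid command or any published implementation): synthetic-control-style weights can be found by projected gradient descent on the least-squares objective, restricting the donor weights to the probability simplex (non-negative, summing to one). The data and names below are invented.

```python
# Synthetic-control-style weights: min_w ||y1 - Y0 w||^2 subject to
# w >= 0 and sum(w) = 1, via projected gradient descent. The simplex
# projection is the standard sort-based algorithm.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

rng = np.random.default_rng(3)
T0, J = 30, 40                        # pre-treatment periods, donor units
Y0 = rng.normal(size=(T0, J))         # donor pre-treatment outcomes
y1 = Y0[:, :3] @ np.array([0.5, 0.3, 0.2])  # treated unit: a convex combo

w = np.full(J, 1.0 / J)
step = 1.0 / np.linalg.norm(Y0, 2) ** 2     # 1/L with L = sigma_max(Y0)^2
for _ in range(2000):
    w = project_simplex(w - step * Y0.T @ (Y0 @ w - y1))

print((w > 1e-3).sum())   # number of donors receiving non-negligible weight
```

The convexity of the weight constraints is what drives the implicit "throwing away" of controls: most donors end up with zero or near-zero weight.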

        My Forward DID method, fdid, throws away many, many controls, on the grounds that DID's parallel trends assumption holds for the control group it selects. My issue mainly comes from
        randomly assign households from the remaining sample size to the control group

        If there's some more formal scheme to select the right controls ("right" by some metric, that is), then fine, but you can't randomly select 10 controls and say, "All right, you're my new control group."
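One hypothetical example of such a formal scheme, with simulated data: rank candidate controls by how close their pre-treatment outcome path is to the average treated path, and keep only the nearest ones, rather than sampling at random. This is a crude nearest-neighbor sketch, not a recommendation of a specific estimator.

```python
# Select controls by pre-treatment similarity instead of at random:
# keep the donors whose pre-period outcome paths are closest (in
# Euclidean distance) to the average treated path.
import numpy as np

rng = np.random.default_rng(2)
pre_treated = rng.normal(1.0, 0.3, size=(10, 5))   # 10 treated x 5 pre periods
pre_control = rng.normal(0.0, 1.0, size=(500, 5))  # 500 candidate controls

target = pre_treated.mean(axis=0)                   # average treated path
dist = np.linalg.norm(pre_control - target, axis=1) # distance to each donor
keep = np.argsort(dist)[:50]                        # the 50 closest controls
print(keep.size, round(dist[keep].max(), 2))
```

The point of the contrast: every kept control is, by construction, at least as close to the treated group as every discarded one, which a random draw cannot guarantee.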



        • #5
          Yes, I agree with what is said in #4. But, even here, the reason for discarding controls in the synthetic control method is to get a control group that is similar, pre-treatment, to the treatment group. The goal is not to reduce the number of controls but rather to weed out the "bad" ones and "concentrate" the "good" ones. The number of controls used in a synthetic control analysis is just an epiphenomenon. O.P., on the other hand, only expresses concern about the large size of the control group--which is not a problem per se.



          • #6
            Originally posted by Jared Greathouse View Post
            This is extremely vague. Causal Inference extends from experimental datasets to regression discontinuity, synthetic control methods, matching, Difference-in-Differences, interrupted time series, instrumental variables, and many others I've forgotten or omitted.

            Definitely not, especially if the treatment wasn't randomly assigned. I'm not even quite sure what this would look like.

            Sounds like a good recipe for bootstrapping one's SEs and betas to deal with small samples. But you definitely cannot choose random controls for your control group.

            Can you give a little more context on how you're identifying the causal effect please? Welcome to Statalist, Saransh.
            Thank you for your insights. I am working with longitudinal household survey data and trying to do a DID analysis of how different types of parental shocks affect children's school enrolment. But the set of individuals who received these shocks is small compared to the control group (and I don't know whether this will affect statistical significance), which I think could violate the parallel trends assumption of DID. Can you suggest any other methods of analysis that would be appropriate here?
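For reference, here is what the canonical 2x2 DID looks like mechanically, as a hedged sketch with simulated data (not the poster's survey): a small treated group, a large control group, and the estimate computed both from the four group means and as the interaction coefficient in OLS. In the saturated model the two coincide exactly.

```python
# Canonical 2x2 difference-in-differences, two ways: from the four group
# means, and as the treated*post interaction coefficient in OLS.
import numpy as np

rng = np.random.default_rng(1)
n_t, n_c = 50, 4000            # small treated group, large control group
effect = -0.15                 # hypothetical enrolment effect of the shock

def outcomes(n, base, trend, eff):
    pre = base + rng.normal(0, 1, n)
    post = base + trend + eff + rng.normal(0, 1, n)
    return pre, post

t_pre, t_post = outcomes(n_t, 0.5, 0.2, effect)   # treated: shifted level
c_pre, c_post = outcomes(n_c, 0.0, 0.2, 0.0)      # controls: same trend

did_means = (t_post.mean() - t_pre.mean()) - (c_post.mean() - c_pre.mean())

# Equivalent OLS: y ~ 1 + treated + post + treated*post
y = np.concatenate([t_pre, t_post, c_pre, c_post])
treated = np.r_[np.ones(2 * n_t), np.zeros(2 * n_c)]
post = np.r_[np.zeros(n_t), np.ones(n_t), np.zeros(n_c), np.ones(n_c)]
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(did_means, beta[3])      # the two estimates coincide
```

Note that the level difference between groups (the 0.5 shift) does not bias the DID estimate; only a difference in *trends* would. The small treated group shows up as a wide standard error on the interaction term, not as bias.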



            • #7
              Originally posted by Clyde Schechter View Post
              Perhaps I am missing something, but why do you want to throw away data? There is almost no statistical disadvantage I can think of to having a control group that is vastly larger than the treatment group. If the treatment group is small in absolute terms, that is a problem. But reducing the size of the control group will not help that. In fact it will only make matters worse. If the treatment group is sufficiently large, having a much larger control group does you no harm.

              Perhaps, perhaps, if you are going to do a very computationally intensive analysis of this data, reducing the size of the control group might be worthwhile to reduce the length of time required to do the analysis, or perhaps reduce demands on memory. But other than that this seems like a non-problem to me.
              I guess I was wrong, but I thought that having fewer individuals who face a shock, compared with the many who do not in the control group, would introduce some bias into the analysis, and that I would not get any strong relation between my outcome variable and the main independent variable.



              • #8
                A small treatment group means that the change following the shock in the treatment group will be imprecisely estimated, which, in turn, limits the precision with which the difference in differences can be estimated. But having a larger control group does not make that problem worse. In fact, you are still better off with a larger control group than a smaller one (again, unless you are up against memory or compute-time limitations). And reducing the size of the control group will actually further decrease the precision of the DID estimate, though perhaps not by very much.

                The solution to the problem is to get data on a larger treatment group, not to throw out some of the controls.



                • #9
                  Originally posted by Clyde Schechter View Post
                  A small treatment group means that the change following the shock in the treatment group will be imprecisely estimated, which, in turn, limits the precision with which the difference in differences can be estimated. But having a larger control group does not make that problem worse. In fact, you are still better off with a larger control group than a smaller one (again, unless you are up against memory or compute-time limitations). And reducing the size of the control group will actually further decrease the precision of the DID estimate, though perhaps not by very much.

                  The solution to the problem is to get data on a larger treatment group, not to throw out some of the controls.
                  Finding a longitudinal dataset with recent time periods is very difficult, so I am not able to get data on a larger treatment group. If you have any other suggestions in mind, it would be great if you could share them.



                  • #10
                    Well, as the generals say, you fight your war with the army you have, not the army you wish you had. Similarly, statisticians attack problems with the data we can get, not the data we wish we could get. But just as the generals will also tell you that some wars cannot be won with the army they have, statisticians acknowledge that some problems cannot be answered with the data that are available.

                    All of which is just a long-winded way of saying, go ahead and analyze your existing data set as best you can. Be prepared mentally for the possibility that your analysis may turn out inconclusive if the data in the treatment group are too meager. But, again, discarding controls to try to balance the control group's size against that of the treatment group definitely won't help you--it can only make matters worse.



                    • #11
                      Originally posted by Clyde Schechter View Post
                      Well, as the generals say, you fight your war with the army you have, not the army you wish you had. Similarly, statisticians attack problems with the data we can get, not the data we wish we could get. But just as the generals will also tell you that some wars cannot be won with the army they have, statisticians acknowledge that some problems cannot be answered with the data that are available.

                      All of which is just a long-winded way of saying, go ahead and analyze your existing data set as best you can. Be prepared mentally for the possibility that your analysis may turn out inconclusive if the data in the treatment group are too meager. But, again, discarding controls to try to balance the control group's size against that of the treatment group definitely won't help you--it can only make matters worse.
                      Thank you, Sir, for your insights. I guess I will try to study the dataset in more detail and find out whether I can do something with it.

