  • Balance test between treatment and control group at baseline?

    Hi,

    This might be a really stupid question, but I'm a rookie.
    I have a panel data set. Can I run balance tests comparing the (later) treatment group and the control group at baseline?
    I hope the question is general enough as asked, but I can also describe my data.


    Thanks

  • #2
    Yes, you can. You just -keep- the baseline observations from the data set and then for discrete variables, -tab variable treatment_vs_control-, and for continuous variables -tabstat variable, by(treatment_vs_control) statistics(mean sd p25 p50 p75)- (or whatever descriptive statistics you think are relevant).
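
    The steps above can be sketched roughly as follows. This is only an illustration on made-up data: the variable names (wave, female, age) and all values are hypothetical placeholders, and baseline is assumed to be coded as wave 0.

    ```stata
    * Toy data: names and values are invented for illustration only
    clear
    input byte(wave treatment_vs_control female) float age
    0 0 1 34
    0 0 0 41
    0 1 1 29
    0 1 0 38
    1 0 1 35
    1 0 0 42
    1 1 1 30
    1 1 0 39
    end

    * Keep only the baseline wave (assumed here to be coded 0)
    keep if wave == 0

    * Discrete variable: cross-tabulate against group, with column percentages
    tab female treatment_vs_control, column

    * Continuous variable: descriptive statistics by group
    tabstat age, by(treatment_vs_control) statistics(mean sd p25 p50 p75)
    ```

    Because -keep- drops the later waves from memory, it is probably safest to wrap this in -preserve- and -restore- (or use an -if- qualifier instead) when the full panel is still needed afterwards.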



    • #3
      That doesn't really work, because at baseline there are no observations with treatment yet.
      I thought Stata would let me look, at baseline, at those individuals who will receive treatment at a later point, but apparently it doesn't work like that.
      Last edited by Kuno Mafis; 22 Aug 2022, 06:57.



      • #4
        Why are there no baseline observations for the treatment group? There should be baseline observations for both groups (or, less desirably, for neither group) in a study design comparing two groups. If it is really true that there are no baseline observations for the treatment group then, clearly, there is no way to compare the two groups at baseline.

        Perhaps the real problem here is that the baseline observations for the treatment group occur at a different time from those in the control group, so it is not clear to you how to select the baseline values for both groups. If that is the case, then post back with example data using the -dataex- command and I'll try to figure out how you can do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
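
        As a rough illustration of what a -dataex- call looks like, here is a sketch using the auto data set that ships with Stata. The choice of variables and of the first five observations is arbitrary and only for demonstration; in your case you would list your own identifier, time, and treatment variables.

        ```stata
        * Load a data set that ships with Stata (stand-in for your own data)
        sysuse auto, clear

        * Produce a copy-and-paste example of three variables, first 5 observations
        dataex make price mpg in 1/5
        ```

        The output between the two "copy" marker lines is what gets pasted into the forum post.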



        • #5
          . dataex ind_id year treatment

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str4 ind_id float(year treatment)
          "1"    0 0
          "1"    1 1
          "10"   0 0
          "10"   1 0
          "100"  0 0
          "100"  1 1
          "1000" 0 0
          "1000" 1 1
          "1001" 0 0
          "1001" 1 0
          "1002" 0 0
          "1002" 1 0
          "1005" 0 0
          "1005" 1 0
          "1006" 0 0
          "1006" 1 0
          "1009" 0 0
          "1009" 1 0
          "101"  0 0
          "101"  1 0
          "1010" 0 0
          "1010" 1 0
          "1011" 0 0
          "1011" 1 0
          "1013" 0 0
          "1013" 1 0
          "1016" 0 0
          "1016" 1 0
          "1017" 0 0
          "1017" 1 0
          "1018" 0 0
          "1018" 1 0
          "1019" 0 0
          "1019" 1 0
          "102"  0 0
          "102"  1 0
          "1022" 0 0
          "1022" 1 0
          "1024" 0 0
          "1024" 1 0
          "1025" 0 0
          "1025" 1 0
          "1026" 0 0
          "1026" 1 0
          "1028" 0 0
          "1028" 1 0
          "103"  0 0
          "103"  1 0
          "1032" 0 0
          "1032" 1 0
          "1034" 0 0
          "1034" 1 0
          "1035" 0 0
          "1035" 1 0
          "1036" 0 0
          "1036" 1 0
          "1037" 0 0
          "1037" 1 0
          "1038" 0 0
          "1038" 1 0
          "1039" 0 0
          "1039" 1 0
          "104"  0 0
          "104"  1 0
          "1041" 0 0
          "1041" 1 0
          "1042" 0 0
          "1042" 1 0
          "1043" 0 0
          "1043" 1 0
          "1044" 0 0
          "1044" 1 0
          "1046" 0 0
          "1046" 1 0
          "1049" 0 0
          "1049" 1 0
          "105"  0 0
          "105"  1 1
          "1050" 0 0
          "1050" 1 0
          "1051" 0 0
          "1051" 1 0
          "1052" 0 0
          "1052" 1 0
          "1053" 0 0
          "1053" 1 0
          "1055" 0 0
          "1055" 1 0
          "1056" 0 0
          "1056" 1 0
          "1058" 0 0
          "1058" 1 0
          "1059" 0 0
          "1059" 1 0
          "106"  0 0
          "106"  1 0
          "1062" 0 0
          "1062" 1 0
          "1063" 0 0
          "1063" 1 0
          end
          label values year label_year
          label def label_year 0 "2015", modify
          label def label_year 1 "2016", modify
          ------------------ copy up to and including the previous line ------------------

          Listed 100 out of 9562 observations
          Use the count() option to list more



          Is this what you meant?
          ind_id is the identifier for each individual and year is 2015 or 2016, so each individual has observations for both years. The treatment took place in 2016.

          But when I try to look at another variable, for example mental health, at baseline for those who later received treatment, Stata tells me there are no observations:

          tab pds if year==0 & treatment==1
          no observations

          Or do I have to do it another way?



          • #6
            The problem is that your data set lacks a variable that distinguishes those who are ultimately treated from those who never are. But you can calculate that from the variable treatment, which identifies when the person is actually in treatment:
            Code:
            by ind_id (year), sort: egen byte in_treatment_group = max(treatment)
            
            tab year in_treatment_group
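
            To show how this plays out for a baseline comparison, here is a small sketch on invented data. The ind_id, year, and treatment layout mimics the -dataex- example above, but the pds values are made up, and treating pds as a roughly continuous score (so that -ttest- applies) is an assumption on my part.

            ```stata
            * Toy panel: structure follows the posted example, pds values are invented
            clear
            input str4 ind_id float(year treatment pds)
            "1"    0 0 12
            "1"    1 1 10
            "10"   0 0  8
            "10"   1 0  9
            "100"  0 0 15
            "100"  1 1 11
            "1001" 0 0  7
            "1001" 1 0  7
            end

            * Flag everyone who is treated in any year (constant within person)
            by ind_id (year), sort: egen byte in_treatment_group = max(treatment)

            * Baseline rows now exist for both groups
            tab in_treatment_group if year == 0

            * Assuming pds is continuous: compare baseline means across groups
            ttest pds if year == 0, by(in_treatment_group)
            ```

            The earlier -tab pds if year==0 & treatment==1- returned no observations because treatment is 0 for everyone in the baseline year; conditioning on in_treatment_group instead selects the later-treated individuals in any year.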



            • #7
              That looks great, thanks a lot. It's exactly what I needed.

              I'll probably run into more issues and questions while working with this data set,
              so I might come back to you again.

              For now, have a nice day.



              • #8
                Can anyone guide me on how to run a balancing test between a treated and a control group before the treatment year?
