
  • Synthetic control. How to deal with missing observations?

    Hi Statalist users,

    I'm using the synth command (synthetic control) on a balanced panel and am facing the following situation. osin3 is my dependent variable; osin3(1996(1)1998) and lgdp are the two predictor variables. This is what I'm running:
    synth osin3 osin3(1996(1)1998) lgdp, mspeperiod(1996(1)1998) trunit(26) trperiod(1999) nested keep(test1) replace

    To get the following:

    control units: for 8 of out 42 units missing obs for predictor osin3(1996(1)1998) in period 1996 -ignored for averaging
    control units: for 2 of out 42 units missing obs for predictor osin3(1996(1)1998) in period 1997 -ignored for averaging
    control units: for 2 of out 42 units missing obs for predictor lgdp in period 1994 -ignored for averaging

    control units: for 8 of out 42 units outcome variable osin3 is missing in 1996 pre-intervention MSPE period - check mspeperiod()
    invalid syntax
    r(198);


    My question is the following: do you need to have NO missing observations to run synth? Or, more generally, how does synth deal with missing observations? From looking at the synth ado file, it looks like missing observations could stop the routine from running. Does anyone have a suggestion or know what could be happening? For instance, is there a rule of thumb for why 8 out of 42 units would prevent the code from running correctly, or 2 out of 42? Any thoughts would be greatly appreciated and extremely helpful.

    Thanks!
    Cesar

  • #2
    Welcome to Statalist, Cesar!

    In future posts, please put Stata commands and results between CODE delimiters, as described in FAQ 12.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      Thanks Steve for your message. I'll try this again.
      This is what I'm trying to run: a synthetic control using osin3 as my dependent variable and osin3(1996(1)1998) and lgdp as my predictors. The intervention year is 1999.

      Code:
      synth osin3 osin3(1996(1)1998) lgdp, mspeperiod(1996(1)1998) trunit(26) trperiod(1999) nested keep(test1) replace
      This is what Stata is doing:
      Code:
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Synthetic Control Method for Comparative Case Studies
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      First Step: Data Setup
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      control units: for 8 of out 42 units missing obs for predictor osin3(1996(1)1998) in period 1996 -ignored for averaging
      control units: for 2 of out 42 units missing obs for predictor osin3(1996(1)1998) in period 1997 -ignored for averaging
      control units: for 2 of out 42 units missing obs for predictor lgdp in period 1994 -ignored for averaging
      control units: for 8 of out 42 units outcome variable osin3 is missing in 1996 pre-intervention MSPE period - check mspeperiod()
      invalid syntax
      r(198);
      
      end of do-file
      
      r(198);
      My question is, do we need to have NO missing observations to run synth? How is synth dealing with missing observations?
      Any comment/suggestion is greatly appreciated. Thanks in advance!!!



      • #4
        Thanks for using CODE delimiters.

        synth (from SSC) averages all the available values for a period, so the "missing" messages are just informative. The error is a syntax error and lies elsewhere. I'm not experienced with synth and don't see where. Be sure you've updated to the latest version. Then run
        Code:
        set trace on
        before your synth command, and
        Code:
        set trace off
        after. Report (inside CODE delimiters) the line with the error message and the 4 or 5 lines prior.
        Steve Samuels


        • #5
          Thanks Steve. I checked, and it looks like I'm using the latest version, so that should be fine.
          I have now added set trace on/off around the command.
          This is what's happening:
          Code:
           
              - local pvar `r(panelvar)'
              = local pvar code
              - local tvar `r(timevar)'
              = local tvar year
              - local tino : list sizeof timeno
              - local cono : list sizeof unitno
              - foreach tum of local timeno {
              - qui sum `cvar' if `tvar' == `tum' & `sub' == 1 , meanonly
              = qui sum osin3 if year == 1996 & __000008 == 1 , meanonly
              - tempname checkdimis checkdimshould
              - qui scalar define `checkdimis' = `r(N)'
              = qui scalar define __00000B = 34
              - qui scalar define `checkdimshould' = `cono'
              = qui scalar define __00000C = 42
              - qui scalar define `checkdimis' = `checkdimshould' - `checkdimis'
              = qui scalar define __00000B = __00000C - __00000B
              - if `checkdimis' != 0 {
              = if __00000B != 0 {
              - qui local checkdimis : display `checkdimis'
              = qui local checkdimis : display __00000B
              - di as err "`ulabel': for `checkdimis' of out `cono' units outcome variable `cvar' is missing in `tum' `tlabel'"
              = di as err "control units: for 8 of out 42 units outcome variable osin3 is missing in 1996 pre-intervention MSPE period - check mspeperiod()"
          control units: for 8 of out 42 units outcome variable osin3 is missing in 1996 pre-intervention MSPE period - check mspeperiod()
              - error 198
          invalid syntax
                }
                }
          Thanks for your help!



          • #6
            I can reproduce this error using an example in the synth help, if I set the outcome variable for one year in one control unit to missing.

            synth is not bothered by missing predictors for some years in the pretreatment period. It uses only the average over all years in the period and ignores the missing values when computing that average. So the first three missing-observation messages are purely informative.
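            To see what that averaging amounts to, the sketch below computes, for each unit, the mean of osin3 over 1996-1998 using only the non-missing years, plus a count of how many years went into it. It is an illustration of the behaviour described above, not synth's own code; it assumes the panel variable is code and the time variable is year (as shown in the trace output in #5), and the new variable names osin3_9698 and n_osin3_9698 are hypothetical.

```stata
* Per-unit mean of osin3 over 1996-1998; egen's mean() skips missing values,
* so the average rests on whichever of the three years are available
bysort code: egen osin3_9698 = mean(cond(inrange(year, 1996, 1998), osin3, .))

* Number of non-missing years behind each unit's average
bysort code: egen n_osin3_9698 = count(cond(inrange(year, 1996, 1998), osin3, .))

* List (once per unit) the units whose average uses fewer than three years
list code n_osin3_9698 if n_osin3_9698 < 3 & year == 1998
```

            Units that appear in this list are the ones counted in the "ignored for averaging" messages; they still contribute a predictor value as long as at least one year is non-missing.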

            The fourth message is misleading (there's no syntax error), but it does show the cause of the problem: your outcome variable is missing in 1996 (I think) for eight control units. So either exclude those units or start the pretreatment period in 1997.
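            A sketch of both options, assuming the panel variable is code and the time variable is year (from the trace output in #5) and that unit 26 is the treated unit. The first line only lists the offending control units; the synth call shows the second option, moving both the lagged-outcome predictor and mspeperiod() to start in 1997. If other pre-treatment years also have missing outcomes, synth may simply stop at the next such year, so this is not guaranteed to run.

```stata
* Control units with a missing outcome in 1996
list code if year == 1996 & missing(osin3) & code != 26

* Option 2: start the pretreatment fit in 1997 instead of 1996
synth osin3 osin3(1997(1)1998) lgdp, mspeperiod(1997(1)1998) trunit(26) trperiod(1999) nested keep(test1) replace
```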

            Be warned that there are other problems with synth. I downloaded the latest package from SSC. When I tried to run the example in the help on the enclosed data set (smoking.dta), I got yet another error ("variable index not found"). Luckily I had an older version of the data available from work with Ariel Linden's itsa program, available at SSC.


            Steve Samuels


            • #7
              Thanks. Yes, it looks like I need to have no missing observations of my dependent variable (pre- and post-treatment). That seems to be the problem.
              I excluded the units missing the outcome in 1996 but still got a similar message for 1997, 2002 and 2003. After excluding all the units with missing observations in those years (1996, 1997, 2002 and 2003), synth runs.

              Also, you might be getting the "variable index not found" because you first need to tsset the panel.
              Code:
              tsset state year
              And then run synth, it should work.
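              A complete sequence for the help-file example might look like the sketch below. It assumes smoking.dta, which ships with the synth package, can be loaded with sysuse, and the synth call is adapted from the example in help synth (California is state 3, treated in 1989); check the help file for the exact predictor list it uses.

```stata
* Load the example data distributed with synth
sysuse smoking, clear

* Declare the panel before calling synth: state is the unit, year the time variable
tsset state year

* Synthetic control for California (state 3) with intervention in 1989
synth cigsale beer lnincome retprice age15to24 cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989)
```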

              Thanks



              • #8
                You're right; I overlooked the tsset statement. Thanks.
                Steve Samuels


                • #9
                  Hi all,

                  I have a similar issue. I'm running synth, and in some periods there are no observations for any of my variables: it reports that 10 units have no data in period 228. It's not that there is an observation in period 228 with only the dependent variable m_age missing; there is nothing at all. I thought I could just exclude this one period, but the missing entries seem to be scattered randomly through the data, so I would be excluding too much. Is it simply not possible to run synth unless there is data in every single period? If so, would it be fine to generate a value for each missing period equal to the previous one, or something like that, so that there is an entry in every period?

                  Thanks.

                  Code:
                   synth m_age c_number, trunit(84) trperiod(468) nested fig
                  --------------------------------------------------------------------------------------
                  Synthetic Control Method for Comparative Case Studies
                  --------------------------------------------------------------------------------------
                  
                  First Step: Data Setup
                  --------------------------------------------------------------------------------------
                  control units: for 10 of out 36 units outcome variable m_age is missing in 228 pre-intervention MSPE period - check mspeperiod()
                  invalid syntax
                  r(198);
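                  Before deciding how to handle the gaps, it may help to see exactly where they are. The diagnostic sketch below uses hypothetical names id (panel variable) and period (time variable); replace them with your own. tsfill, full adds a row for every missing unit-period combination, with missing values in all other variables, so the tabulation counts both absent rows and rows with m_age missing. It does not fix anything: synth will still stop on these periods.

```stata
* Declare the panel (hypothetical variable names id and period)
tsset id period

* Show the participation pattern of units across periods
xtdescribe

* Make every unit-period combination an explicit row;
* the added rows have missing values for all other variables
tsfill, full

* How many units lack the outcome in each period
tabulate period if missing(m_age)
```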
