I'm writing code for this method. It's a form of synthetic control analysis that predicts the counterfactual of a treated unit using principal components analysis. The theory, at least, is that PCA de-noises the outcomes matrix in the pre-intervention period, after which we can run a linear regression to predict the post-intervention counterfactual. My question here, however, is about the principal components analysis step. Let's look at my dataset. What we see here is the real, non-normalized GDP per capita of the Basque Country, and the normalized GDP per capita of the other 16 regions in Spain.
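Written out, the two-step procedure amounts to something like the following. The notation is my own shorthand, not taken from the method's source: y_t is the Basque GDP per capita (gdpcap15) in year t, s_kt is the k-th principal component score for year t (for post-1975 years, the pre-1975 loadings applied to the donors' later outcomes), T_0 = 1975 is the intervention year, and K is the number of scores kept, which is exactly the quantity I'm asking how to choose.

```latex
\begin{aligned}
y_t &= \alpha + \sum_{k=1}^{K} \beta_k \, s_{kt} + \varepsilon_t, && t < T_0, \\
\hat{y}^{N}_t &= \hat{\alpha} + \sum_{k=1}^{K} \hat{\beta}_k \, s_{kt}, && t \ge T_0.
\end{aligned}
```

With K = 2 this corresponds to -reg gdpcap15 pred1-pred2 if year < 1975- followed by -predict cf, xb-.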
To generate the principal components, I run -pca- on the donors' pre-1975 outcomes (the first code block after the data listing).
Okay, so this leaves us with our main outcome, our principal component scores, and the time variable, and it also produces a scree plot. How do I choose the optimal number of principal components to include in my regression? That is, how do I choose the number of score variables (pred*) beyond which adding more doesn't improve the model by much?
Currently, the regression code (the last block below) doesn't look so bad. Here, I use two principal components, and the pre-intervention fit is good. But what if I had more donors, or had used covariates? What if I should use the top 3 or top 4 principal components instead? How could I objectively (or rather, defensibly) choose the optimal number of scores to include? I know I can eyeball the scree plot, but I was hoping for a more objective criterion. Any ideas?
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double(year gdpcap15) float(normgdpcap1 normgdpcap2 normgdpcap3 normgdpcap4 normgdpcap5 normgdpcap6 normgdpcap7 normgdpcap8 normgdpcap9 normgdpcap10 normgdpcap11 normgdpcap12 normgdpcap13 normgdpcap14 normgdpcap16 normgdpcap17)
1955 3.853184630005267 -1.596707 -1.3327312 -.9565113 -1.4974374 -1.2136703 -1.5789266 -1.7555073 -.7793649 -1.206382 -1.792608 -1.620488 -.3183888 -1.6007596 -1.215555 -1.238519 -1.287997
1956 3.9456582961508766 -1.5660152 -1.2639334 -.8668543 -1.4281683 -1.1545168 -1.5308938 -1.7170875 -.7160962 -1.1348827 -1.7534026 -1.5804973 -.2338523 -1.5634706 -1.1526319 -1.1889784 -1.2243198
1957 4.033561734872626 -1.535606 -1.194319 -.7780455 -1.360313 -1.0988817 -1.4827982 -1.678165 -.6560946 -1.0638859 -1.7138518 -1.5405066 -.15606996 -1.525616 -1.0903056 -1.1404744 -1.1606112
1958 4.023421896896646 -1.524548 -1.1786431 -.7371125 -1.3626063 -1.0730591 -1.4723686 -1.6659132 -.634607 -1.0358956 -1.7060295 -1.529763 -.18126445 -1.509312 -1.0718025 -1.1290082 -1.1401918
1959 4.013781968405232 -1.5134274 -1.1618992 -.6965562 -1.3658735 -1.0445975 -1.4619076 -1.6536303 -.6143446 -1.0083137 -1.698176 -1.5189877 -.20755836 -1.493165 -1.0502521 -1.1177617 -1.1197723
1960 4.285918396222732 -1.4553105 -1.071991 -.5540287 -1.3024162 -.9595584 -1.3987017 -1.6060373 -.4735448 -.923369 -1.6639656 -1.4671224 -.06911468 -1.407592 -.9479037 -1.0342307 -1.0330998
1961 4.574336095797406 -1.4029425 -.976051 -.4214283 -1.263682 -.8757132 -1.3544072 -1.5686854 -.3268078 -.8603829 -1.637389 -1.4572268 .1383151 -1.3253802 -.8720691 -.9565427 -.9524587
1962 4.898957353563045 -1.336438 -.8618279 -.29564452 -1.1745907 -.7755323 -1.259944 -1.494767 -.2112338 -.7690924 -1.5892934 -1.3780937 .22991987 -1.2303828 -.7455943 -.8555137 -.8419426
1963 5.197014981629133 -1.2701535 -.7496468 -.17890836 -1.088138 -.680472 -1.1640353 -1.419341 -.1042046 -.6832051 -1.5399725 -1.2985836 .31071785 -1.1346314 -.6261879 -.757092 -.7254261
1964 5.3389029787527225 -1.2359117 -.7206513 -.11768155 -1.0413303 -.6505656 -1.1176047 -1.3794446 -.070434034 -.6574768 -1.5113225 -1.2579334 .3436404 -1.0852792 -.5864485 -.7239498 -.6661471
1965 5.465153005251848 -1.2025496 -.6919699 -.05874794 -.9959993 -.6212243 -1.0716767 -1.33892 -.04159556 -.6352667 -1.482264 -1.2179426 .3668243 -1.0348276 -.5478087 -.6928181 -.6143131
1966 5.545915627064143 -1.1542655 -.6312456 .065370746 -.9178715 -.5657779 -1.025089 -1.293306 .006280098 -.58952725 -1.445415 -1.1651664 .348541 -.9762081 -.4875243 -.6197794 -.56552655
1967 5.614895726639487 -1.1067982 -.5728147 .1845259 -.8423824 -.5113051 -.9782501 -1.2475665 .04894103 -.54762024 -1.4080317 -1.1131753 .3251686 -.9198505 -.4248838 -.5477458 -.50938886
1968 5.8521849330715785 -1.0254031 -.472728 .3709708 -.729353 -.4145485 -.8868653 -1.1478255 .1566927 -.4568323 -1.3517684 -1.0293615 .4113071 -.816968 -.3348814 -.4441409 -.41580495
1969 6.0814054173695915 -.9410553 -.3704112 .5558452 -.6182399 -.31119475 -.7906428 -1.0429639 .2627793 -.3659816 -1.2917353 -.9418093 .4915711 -.7091849 -.2430567 -.3380855 -.3143991
1970 6.17009424134957 -.8639642 -.3176034 .6901736 -.4809588 -.23174755 -.7248607 -.9596213 .3293466 -.29187495 -1.2316707 -.8606028 .53897566 -.6210988 -.14903314 -.3020532 -.23086786
1971 6.283633404546246 -.7897945 -.26152843 .8140724 -.3627773 -.15333693 -.6582934 -.8792316 .4000921 -.22043823 -1.1720461 -.7845797 .5967782 -.5418713 -.05215097 -.2721151 -.1473054
1972 6.5555553986528405 -.6871634 -.13942033 .9909357 -.23592573 -.0716278 -.5439448 -.7430497 .53709066 -.07888454 -1.0874155 -.6752887 .7781027 -.41297776 .065370746 -.13005887 -.03876837
1973 6.810768561103078 -.5853176 -.015270197 1.160574 -.11582801 .009044589 -.4279623 -.6036323 .6685919 .0577056 -.999769 -.5667203 .9488406 -.286503 .18138444 .012060257 .068826266
1974 7.105184302810804 -.55710745 .05462705 1.1677995 -.15415373 .083654 -.37546885 -.56753707 .7343109 .10548712 -.9745118 -.52167183 1.028319 -.2729319 .2932514 .081078 .11334074
1975 7.377891682175629 -.53056216 .1243985 1.1684276 -.1921025 .15706965 -.3242633 -.53065634 .7948152 .1485877 -.950888 -.4782884 1.0990016 -.26196828 .3980188 .1468285 .1574466
1976 7.232933621922754 -.482341 .1840546 1.19513 -.16266713 .2102229 -.25757018 -.4652201 .7994331 .19495553 -.9043317 -.4219623 1.0509377 -.23485756 .4309412 .12056603 .21923886
1977 7.089831372119127 -.43399405 .2411663 1.2152352 -.1364988 .25725052 -.19072025 -.3985899 .802763 .2388102 -.8590949 -.3662644 1.003816 -.20837517 .4690784 .09700516 .2804971
1978 6.786703607144611 -.4456487 .2669575 1.1982714 -.12198532 .2632506 -.1633896 -.3835739 .7483845 .2167572 -.8055646 -.35040015 .9114573 -.1995477 .44256455 .13039878 .329221
1979 6.6398173868571035 -.48007905 .27917776 1.1819359 -.03726034 .25878978 -.175013 -.4000977 .7447091 .20117553 -.7929046 -.33466145 .8812994 -.21195647 .4053697 .16985537 .3443943
1980 6.562839171369564 -.4868645 .28882203 1.2940855 .05230221 .255554 -.18673053 -.435282 .7745529 .2079924 -.782475 -.3240119 .9233946 -.2441563 .3916416 .21063133 .3477556
1981 6.50078545499277 -.48884365 .3048433 1.4134606 .149593 .26249668 -.1923224 -.4649687 .820041 .25580537 -.7678673 -.3077078 .9739722 -.2684082 .3781963 .2716382 .3615152
1982 6.545058606999563 -.4703406 .3587193 1.4976516 .1623787 .26337618 -.17381923 -.4289364 .8662204 .3079221 -.7510605 -.26721445 .9786844 -.22307713 .4592142 .2338467 .4362188
1983 6.595329801139407 -.4516175 .4146999 1.586869 .1767037 .27427712 -.1528658 -.3930923 .9149128 .3614523 -.7343479 -.22518185 .9855954 -.1809504 .53957254 .19834834 .5208494
1984 6.761496750091492 -.4221193 .4639893 1.741114 .21254756 .3126343 -.0900681 -.3379285 .9529243 .4041445 -.6955512 -.1842803 1.0474818 -.12591213 .6506541 .25037074 .5757306
1985 6.937160671727721 -.3922441 .5179279 1.906668 .24961662 .36079255 -.02472599 -.28179082 .9906216 .4478107 -.6553719 -.1409596 1.1090544 -.0719735 .7608874 .30490625 .6411355
1986 7.332191151300521 -.2881366 .7075457 2.173063 .4129093 .4849112 .14478648 -.14849916 1.2139786 .6252084 -.5087605 -.01772062 1.3327255 .06744402 .9375311 .4032336 .8125016
1987 7.742788123594152 -.18368343 .8976348 2.421866 .569479 .6137106 .3195767 -.00996114 1.4367074 .8033913 -.3605784 .10891119 1.5535696 .20692444 1.1266465 .5046396 .9953338
1988 8.12053664075889 -.06977446 1.0930328 2.55255 .7297558 .7501752 .4558528 .15823197 1.6449857 .986538 -.24139184 .2601406 1.7316896 .3548866 1.333668 .6026841 1.1809933
1989 8.509711162324157 .04378901 1.2865463 2.6791506 .8885245 .8866398 .5964641 .328687 1.8510646 1.1662288 -.1173673 .411904 1.9091815 .49810535 1.5334642 .7018282 1.379219
1990 8.776777889074104 .11748728 1.381418 2.7250156 .891666 .9381596 .6828225 .4228676 1.9650993 1.240681 -.03700916 .4993306 1.9745237 .5543059 1.706558 .7324574 1.5576532
1991 9.02527866619582 .18961503 1.4737767 2.7985256 .8926086 1.0022452 .7676731 .508692 2.0819612 1.314505 .035244167 .582202 2.0436356 .612454 1.8799658 .757746 1.7206944
1992 8.873892824706335 .14211632 1.3851875 2.6401966 .8366907 .9431858 .7598823 .4454232 1.9883457 1.2152352 .02079358 .5394782 1.9892883 .59156346 1.771586 .7059121 1.6993325
1993 8.718223539089278 .0953401 1.2928293 2.4862654 .7804273 .8841264 .7520288 .3829085 1.8947308 1.1169081 .006342988 .4968173 1.9358838 .57108116 1.671374 .6511567 1.677971
1994 9.018137849286365 .14104815 1.4323093 2.684177 .9205676 .9821399 .7725423 .4015372 2.0624843 1.209895 .05352742 .5447558 2.007823 .6452194 1.780696 .7197031 1.838813
1995 9.440873861653367 .17708066 1.552313 2.839993 1.011041 1.0817237 .8643355 .4445751 2.20919 1.306966 .08057525 .602904 2.1030087 .6896394 1.953476 .7911084 1.9506484
1996 9.68651813767495 .2981521 1.6622636 2.90722 1.0993156 1.1392124 1.0088422 .5702331 2.3131716 1.368852 .25850698 .679367 2.2016501 .7598509 2.0863593 .8354342 2.0844743
1997 10.170665872808662 .4323548 1.847923 3.093508 1.2064394 1.279635 1.1307302 .6806549 2.519565 1.4989083 .3987413 .8008153 2.3819695 .8696759 2.2896109 .9491547 2.2691915
end
format %ty year
Code:
pca norm* if year < 1975
screeplot
quietly {
    capture drop pred*          // so the block can be rerun
    predict double pred*        // scores for every year, based on the pre-1975 PCA
}
drop norm*
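As a less eyeball-driven complement to the scree plot, the eigenvalues stored by -pca- can be turned into two common rules of thumb: the Kaiser rule (keep components with eigenvalue above 1, which makes sense here because -pca- works on the correlation matrix by default) and a cumulative-variance cutoff. A minimal sketch, assuming it is run right after the -pca- block while e() still holds the pca results; the 90% threshold is an arbitrary choice of mine, not a standard.

```stata
* Sketch: Kaiser rule and cumulative-variance cutoff from e(Ev).
* Assumes the -pca- block above was just run (e() still holds pca results).
assert "`e(cmd)'" == "pca"
matrix ev = e(Ev)
local p = colsof(ev)

* Total variance = sum of eigenvalues (equals p for a correlation-matrix PCA)
local total = 0
forvalues j = 1/`p' {
    local total = `total' + ev[1,`j']
}

local kaiser = 0   // Kaiser rule: number of eigenvalues greater than 1
local k90 = 0      // smallest K whose cumulative share reaches 90% (assumed cutoff)
local cum = 0
forvalues j = 1/`p' {
    if ev[1,`j'] > 1 {
        local ++kaiser
    }
    local cum = `cum' + ev[1,`j'] / `total'
    if `k90' == 0 & `cum' >= 0.90 {
        local k90 = `j'
    }
}
display "Kaiser rule (eigenvalue > 1): `kaiser' component(s)"
display "90% of cumulative variance:   `k90' component(s)"
```

Both rules only describe how much of the donors' variation each component captures; neither looks at how well the scores predict the Basque series itself.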
Code:
reg gdpcap15 pred1-pred2 if year < 1975   // intervention begins here
predict cf, xb
line gdpcap15 cf year, lcolor(black red) xline(1975, lcolor(blue) lpattern(dash) lwidth(thick))
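One way to make the choice of K defensible in terms of the regression itself is a pre-period holdout: fit the regression on the earlier pre-intervention years for each candidate K, then compare out-of-sample RMSE on the last few pre-1975 years. The sketch below assumes pred1-pred16 exist from the -pca- block; the 1969-1974 validation window and the range K = 1..8 are my own arbitrary choices, not part of the method. Note that the PCA itself was fit on all pre-1975 years, so this is not a fully clean holdout; rerunning -pca- on years before 1969 inside the loop would be stricter.

```stata
* Sketch: choose the number of scores K by pre-period holdout RMSE.
* Assumptions (mine): train on 1955-1968, validate on 1969-1974, try K = 1..8.
local best_k = 0
local best_rmse = .
forvalues k = 1/8 {
    quietly regress gdpcap15 pred1-pred`k' if year < 1969
    tempvar fit sqerr
    quietly predict double `fit', xb
    quietly generate double `sqerr' = (gdpcap15 - `fit')^2 if inrange(year, 1969, 1974)
    quietly summarize `sqerr', meanonly
    local rmse = sqrt(r(mean))
    display "K = `k'" _col(12) "holdout RMSE = " %8.4f `rmse'
    * missing (.) compares greater than any number, so the first K always wins
    if `rmse' < `best_rmse' {
        local best_k = `k'
        local best_rmse = `rmse'
    }
    drop `fit' `sqerr'
}
display "K with the lowest holdout RMSE: `best_k'"
```

With only 14 training years, larger K quickly eats degrees of freedom, which is one reason to cap the candidate range.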