Model with interactions vs joint estimation

Zurab Buadze

Join Date: Jan 2022
Posts: 5


23 Jan 2022, 04:59

Hello!

I would appreciate some advice about choosing between a model with interactions (data in long format) and a joint estimation (in wide format).

I am running a model to estimate the effect of several expenditure categories on revenue. To illustrate, here is an example of the data in wide and long format, respectively.

Code:

year     cost1   cost2     revenue
2019 10300.48 4035.11 190495.36 
2019 7199.57 10395.63 210668.33
2019 11203.21 7352.16 215108.37

Code:

year source        cost revenue
2019 "source1"  10300.48 190495.36 
2019 "source2"  4035.11 190495.36 
2019 "source1"  7199.57 210668.33
2019 "source2" 10395.63 210668.33
2019 "source1" 11203.21 215108.37
2019 "source2" 7352.16 215108.37
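For reference, the long layout can be produced from the wide one with -reshape-. A minimal sketch using the three rows of the excerpt; `id` is a hypothetical row key added because -reshape- needs a unique identifier, and `source` comes out numeric (1, 2) rather than as the strings "source1"/"source2":

Code:

```stata
clear
* Store the values at full precision
set type double
* The three wide-format rows from the excerpt above
input year cost1 cost2 revenue
2019 10300.48  4035.11 190495.36
2019  7199.57 10395.63 210668.33
2019 11203.21  7352.16 215108.37
end

* -reshape- needs a unique row key; id is hypothetical, not in the original data
generate id = _n
reshape long cost, i(id) j(source)

* Each original row now appears twice, with revenue repeated on both copies
list id year source cost revenue, sepby(id)
```

Because every revenue value is repeated once per source, a regression on the long data has twice as many observations as the same regression on the wide data.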

Now, this is the regression output if we run a simple model with both cost1 and cost2 included as explanatory variables (wide data).

Code:

. reg revenue cost1 cost2 if year > 2020

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(2, 49)        =     89.46
       Model |  9.2743e+11         2  4.6371e+11   Prob > F        =    0.0000
    Residual |  2.5399e+11        49  5.1836e+09   R-squared       =    0.7850
-------------+----------------------------------   Adj R-squared   =    0.7762
       Total |  1.1814e+12        51  2.3165e+10   Root MSE        =     71997

------------------------------------------------------------------------------
     revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cost1 |   10.83604   1.989934     5.45   0.000      6.83712    14.83497
       cost2 |   3.998844   1.036033     3.86   0.000     1.916857    6.080831
       _cons |   64755.98   29273.31     2.21   0.032     5929.067    123582.9
------------------------------------------------------------------------------

On the other hand, if we run a model with interactions (long data), we get the output below. I had assumed this model would replicate the one above. However, the margins command (and the coefficient on cost itself) shows that its coefficients reproduce not the joint model above, where cost1 and cost2 are estimated together, but two separate bivariate regressions (reg revenue cost1 and reg revenue cost2).

My question is: is it possible to replicate the above model in panel data format? It seems logical that the more robust approach is to include both variables in the regression (notwithstanding multicollinearity). I thought the interaction approach did exactly this, only in a different data format, but it turns out to estimate the coefficients from separate regressions for source = 1 and source = 2. I would expect the data format (wide vs long) to have no impact on the results as long as the same commands are used, so I would like to ask: (1) how can the wide-data model above be replicated in long format, and (2) is there any reason not to include both cost measures in the model and to proceed with the interaction model below instead?

Thank you!

Code:

. reg revenue c.cost##i.source if year > 2020

      Source |       SS           df       MS      Number of obs   =       104
-------------+----------------------------------   F(3, 100)       =     73.26
       Model |  1.6239e+12         3  5.4131e+11   Prob > F        =    0.0000
    Residual |  7.3892e+11       100  7.3892e+09   R-squared       =    0.6873
-------------+----------------------------------   Adj R-squared   =    0.6779
       Total |  2.3628e+12       103  2.2940e+10   Root MSE        =     85960

-------------------------------------------------------------------------------
      revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         cost |   8.269876   .8081747    10.23   0.000      6.66648    9.873271
              |
       source |
      Google  |   19917.72   46326.79     0.43   0.668    -71993.32    111828.8
              |
source#c.cost |
      Google  |   8.380857   1.750063     4.79   0.000     4.908781    11.85293
              |
        _cons |   95964.47   34274.35     2.80   0.006     27965.14    163963.8
-------------------------------------------------------------------------------
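Why the interaction model behaves this way can be seen from its sum of squared residuals. A short derivation, with notation introduced here for illustration: the long data are rows (i, s) with s = 1 (base) and s = 2 for the two sources, x_{is} is cost, y_i is revenue (repeated on both rows of observation i), and D_{is} = 1 for the non-base source.

```latex
% Long model for row (i, s), s = 1 (base), 2; D_{is} = 1 if s = 2; y_{i1} = y_{i2} = y_i
y_{is} = \alpha + \beta x_{is} + \gamma D_{is} + \delta D_{is} x_{is} + \varepsilon_{is}

% Its sum of squared residuals splits into two sums with no parameter in common
\mathrm{SSR}(\alpha, \beta, \gamma, \delta)
  = \sum_{i} \bigl( y_i - \alpha - \beta x_{i1} \bigr)^2
  + \sum_{i} \bigl( y_i - (\alpha + \gamma) - (\beta + \delta) x_{i2} \bigr)^2
```

The first sum involves only (α, β) and the second only (α + γ, β + δ), and the mapping between the two parameterizations is one-to-one, so minimizing the total is the same as running two separate bivariate regressions. The outputs in the next reply bear this out: the base-level slope 8.269876 equals the reg revenue cost2 slope, and 8.269876 + 8.380857 = 16.650733 matches the reg revenue cost1 slope. Only the standard errors differ, because the stacked model estimates one pooled residual variance: (3.3122e+11 + 4.0770e+11)/100 = 7.3892e+09, the Residual MS of the interaction model.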


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17603
#2

23 Jan 2022, 05:11

Zurab:
the two regressions are completely different (the simplest difference is the number of observations: 52 vs 104).
That said, if you want to go panel, you should have the same sample measured on the same variables at (preferably) equally spaced time intervals.

Last edited by Carlo Lazzaro; 23 Jan 2022, 05:25.

Kind regards,
Carlo
(StataNow 18.5)

Zurab Buadze

Join Date: Jan 2022
Posts: 5

23 Jan 2022, 05:21

Carlo,

Thanks for the quick reply!

I completely understand that the models are different, but the interaction model reproduces the coefficients of other (also different) models, namely separate regressions:

Code:

. reg revenue cost1 if year > 2020

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(1, 50)        =    128.35
       Model |  8.5020e+11         1  8.5020e+11   Prob > F        =    0.0000
    Residual |  3.3122e+11        50  6.6243e+09   R-squared       =    0.7196
-------------+----------------------------------   Adj R-squared   =    0.7140
       Total |  1.1814e+12        51  2.3165e+10   Root MSE        =     81390

------------------------------------------------------------------------------
     revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cost1 |   16.65073   1.469751    11.33   0.000     13.69865    19.60281
       _cons |   115882.2   29510.85     3.93   0.000     56607.89    175156.5
------------------------------------------------------------------------------

. reg revenue cost2 if year > 2020

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(1, 50)        =     94.89
       Model |  7.7372e+11         1  7.7372e+11   Prob > F        =    0.0000
    Residual |  4.0770e+11        50  8.1540e+09   R-squared       =    0.6549
-------------+----------------------------------   Adj R-squared   =    0.6480
       Total |  1.1814e+12        51  2.3165e+10   Root MSE        =     90300

------------------------------------------------------------------------------
     revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cost2 |   8.269876   .8489707     9.74   0.000     6.564668    9.975083
       _cons |   95964.47   36004.49     2.67   0.010     23647.32    168281.6
------------------------------------------------------------------------------

And panel data:

Code:

. reg revenue c.cost##i.source if year > 2020

      Source |       SS           df       MS      Number of obs   =       104
-------------+----------------------------------   F(3, 100)       =     73.26
       Model |  1.6239e+12         3  5.4131e+11   Prob > F        =    0.0000
    Residual |  7.3892e+11       100  7.3892e+09   R-squared       =    0.6873
-------------+----------------------------------   Adj R-squared   =    0.6779
       Total |  2.3628e+12       103  2.2940e+10   Root MSE        =     85960

-------------------------------------------------------------------------------
      revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         cost |   8.269876   .8081747    10.23   0.000      6.66648    9.873271
              |
       source |
      Google  |   19917.72   46326.79     0.43   0.668    -71993.32    111828.8
              |
source#c.cost |
      Google  |   8.380857   1.750063     4.79   0.000     4.908781    11.85293
              |
        _cons |   95964.47   34274.35     2.80   0.006     27965.14    163963.8
-------------------------------------------------------------------------------

. margins, dydx(cost) at(source ==2)

Average marginal effects                        Number of obs     =        104
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : cost
at           : source          =           2

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cost |   16.65073   1.552281    10.73   0.000     13.57105    19.73041
------------------------------------------------------------------------------
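The equivalence in these outputs can be checked on synthetic data. A minimal sketch, with all data and variable names hypothetical: it fits the two bivariate regressions on wide data, stacks the data with -reshape-, fits the fully interacted model (written i.source##c.cost, which is the same model as c.cost##i.source), and stores the slopes for comparison:

Code:

```stata
clear
set seed 2022
set obs 52
* Hypothetical wide data: one row per original observation
generate id = _n
generate double cost1 = runiform(5000, 15000)
generate double cost2 = runiform(3000, 12000)
generate double revenue = 60000 + 10*cost1 + 4*cost2 + rnormal(0, 50000)

* Separate bivariate regressions on the wide data
regress revenue cost1
scalar b_sep1 = _b[cost1]
regress revenue cost2
scalar b_sep2 = _b[cost2]

* Stack to long: one row per id-source pair, revenue repeated
reshape long cost, i(id) j(source)

* Fully interacted model; source 1 is the base level
regress revenue i.source##c.cost
scalar b_int1 = _b[cost]
scalar b_int2 = _b[cost] + _b[2.source#c.cost]
```

b_int1 and b_int2 should match b_sep1 and b_sep2 up to rounding; the standard errors will not, since the stacked model pools the residual variance across sources.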

So what I would like to know is whether it is possible to replicate the joint model in panel data format as well. I am asking because, as I mentioned in my post, I believe including both variables in the regression is the more robust approach (so as not to lose information), and I assume the approach should not depend on whether the data are in wide or long format, all other things being equal. Thank you again!


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17603

23 Jan 2022, 05:49

Zurab:
as per your data excerpt in -long- format, this is what I obtain with -xtreg- and -regress-. Please note that -regress- is not the first choice when you have a panel dataset:

Code:

. encode source, g(num_source)
. xtset num_source year
repeated time values within panel
r(451);

. xtset num_source

Panel variable: num_source (balanced)

. xtreg revenue i.num_source##c.cost, fe
note: 2.num_source omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =          6
Group variable: num_source                      Number of groups  =          2

R-squared:                                      Obs per group:
     Within  = 0.3171                                         min =          3
     Between =      .                                         avg =        3.0
     Overall = 0.0402                                         max =          3

                                                F(2,2)            =       0.46
corr(u_i, Xb) = -0.9344                         Prob > F          =     0.6829

-----------------------------------------------------------------------------------
          revenue | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
       num_source |
         source2  |          0  (omitted)
             cost |  -.8525919   5.162163    -0.17   0.884    -23.06359     21.3584
                  |
num_source#c.cost |
         source2  |   4.087677   6.185454     0.66   0.577    -22.52619    30.70154
                  |
            _cons |   197757.8   28320.88     6.98   0.020     75902.87    319612.7
------------------+----------------------------------------------------------------
          sigma_u |  22377.976
          sigma_e |  15330.755
              rho |  .68057873   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------
F test that all u_i=0: F(1, 2) = 0.31                        Prob > F = 0.6326

. xtreg revenue i.num_source##c.cost, re

Random-effects GLS regression                   Number of obs     =          6
Group variable: num_source                      Number of groups  =          2

R-squared:                                      Obs per group:
     Within  = 0.0000                                         min =          3
     Between = 0.0000                                         avg =        3.0
     Overall = 0.3171                                         max =          3

                                                Wald chi2(3)      =       0.93
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.8185

-----------------------------------------------------------------------------------
          revenue | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
       num_source |
         source2  |  -31647.24   56641.76    -0.56   0.576    -142663.1    79368.58
             cost |  -.8525919   5.162163    -0.17   0.869    -10.97025    9.265061
                  |
num_source#c.cost |
         source2  |   4.087677   6.185454     0.66   0.509    -8.035591    16.21095
                  |
            _cons |   213581.4   50177.14     4.26   0.000       115236    311926.8
------------------+----------------------------------------------------------------
          sigma_u |          0
          sigma_e |  15330.755
              rho |          0   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------

. reg revenue i.num_source##c.cost

      Source |       SS           df       MS      Number of obs   =         6
-------------+----------------------------------   F(3, 2)         =      0.31
       Model |   218244724         3  72748241.3   Prob > F        =    0.8215
    Residual |   470064091         2   235032046   R-squared       =    0.3171
-------------+----------------------------------   Adj R-squared   =   -0.7073
       Total |   688308815         5   137661763   Root MSE        =     15331

-----------------------------------------------------------------------------------
          revenue | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
       num_source |
         source2  |  -31647.24   56641.76    -0.56   0.633    -275357.1    212062.6
             cost |  -.8525919   5.162163    -0.17   0.884    -23.06359     21.3584
                  |
num_source#c.cost |
         source2  |   4.087677   6.185454     0.66   0.577    -22.52619    30.70154
                  |
            _cons |   213581.4   50177.14     4.26   0.051    -2313.417    429476.2
-----------------------------------------------------------------------------------

.

Please also note that the -regress- standard errors are not clustered on -num_source- (that is, your -panelid-), because two groups (i.e., panels) of observations are too few for the cluster machinery to work properly.

Kind regards,
Carlo
(StataNow 18.5)


Zurab Buadze

Join Date: Jan 2022
Posts: 5

23 Jan 2022, 06:19

Carlo,

Thanks again for your time!

Sorry for not being able to provide the full dataset; I included only an excerpt to show its structure. I use -reg- on the panel data only to model "source" explicitly, which corresponds to the fixed effects. I am aware the models are different, but they are numerically equivalent for the purposes here.

I know I can replicate "xtreg revenue cost, fe" with "reg revenue cost i.source" (or other combinations). This is not what I am trying to do. I will illustrate on a public dataset.
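For reference, the equivalence mentioned here (the within estimator of -xtreg, fe- and -regress- with panel indicators give the same slope) can be checked on a small synthetic panel. A minimal sketch; the variables panel, t, x and y are hypothetical:

Code:

```stata
clear
set seed 7
set obs 30
* Hypothetical balanced panel: 3 panels observed over 10 periods
generate panel = ceil(_n / 10)
generate t = _n - 10*(panel - 1)
generate double x = rnormal()
generate double y = 2*panel + 1.5*x + rnormal()

* Within (fixed-effects) estimator
xtset panel t
xtreg y x, fe
scalar b_fe = _b[x]

* Least-squares dummy variables: the panel id enters as indicators
regress y x i.panel
scalar b_lsdv = _b[x]
```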

First, I take a wide dataset and reshape it into long. Then I run the model with interactions:

Code:

. use https://stats.idre.ucla.edu/stat/stata/modules/kidshtwt, clear

. drop wt2

. reshape long ht, i(famid birth) j(age)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        9   ->      18
Number of variables                   5   ->       5
j variable (2 values)                     ->   age
xij variables:
                                ht1 ht2   ->   ht
-----------------------------------------------------------------------------

. reg wt1 c.ht##i.age

      Source |       SS           df       MS      Number of obs   =        18
-------------+----------------------------------   F(3, 14)        =      0.46
       Model |  4.59754985         3  1.53251662   Prob > F        =    0.7138
    Residual |  46.5135613        14  3.32239723   R-squared       =    0.0900
-------------+----------------------------------   Adj R-squared   =   -0.1051
       Total |  51.1111111        17  3.00653595   Root MSE        =    1.8227

------------------------------------------------------------------------------
         wt1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ht |  -1.916342   1.705495    -1.12   0.280    -5.574265    1.741581
       2.age |   -2.61139   6.239886    -0.42   0.682    -15.99461    10.77183
             |
    age#c.ht |
          2  |   1.375146   2.307385     0.60   0.561    -3.573703    6.323996
             |
       _cons |   25.52335   3.875808     6.59   0.000     17.21056    33.83613
------------------------------------------------------------------------------

. margins, dydx(ht) at (age==2)

Average marginal effects                        Number of obs     =         18
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : ht
at           : age             =           2

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ht |  -.5411956   1.554128    -0.35   0.733    -3.874468    2.792077
------------------------------------------------------------------------------

Now I run the models on the dataset in its original (wide) form. As you can see, the coefficients on "ht" above (long format) correspond to the coefficients on "ht1" and "ht2" when each is regressed separately on "wt1" (wide format). What I am trying to replicate in long format is the third model below, where "ht1" and "ht2" are included together. Suppose I have panel data and wish to include the two variables together in the model. I assumed interactions did this, but apparently they estimate the coefficients separately for each level of the interacting variable. So I am trying to see whether there is another way to estimate the coefficients with both variables in the model without reshaping the data.
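Starting from the long data produced by the -reshape- above, one way to fit the joint model without reshaping back is to copy each age's height onto both rows of the same child with -egen- and then keep a single row per child. A minimal sketch, assuming the dataset is still reachable at the URL used in this thread; ht_age1 and ht_age2 are hypothetical names:

Code:

```stata
use https://stats.idre.ucla.edu/stat/stata/modules/kidshtwt, clear
drop wt2

* Joint model on the wide data, kept for comparison
regress wt1 ht1 ht2
scalar b1_wide = _b[ht1]
scalar b2_wide = _b[ht2]

* Long format, as in the commands above
reshape long ht, i(famid birth) j(age)

* Copy each age's height onto both rows of the same child
bysort famid birth: egen ht_age1 = max(cond(age == 1, ht, .))
bysort famid birth: egen ht_age2 = max(cond(age == 2, ht, .))

* One row per child, so each child counts once, as in the wide model
regress wt1 ht_age1 ht_age2 if age == 1
scalar b1_long = _b[ht_age1]
scalar b2_long = _b[ht_age2]
```

The last regression uses exactly the nine children and the same three columns as reg wt1 ht1 ht2, so its coefficients should match that model (-4.258625 and 2.620094 in the output below).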

Code:

. reg wt1 ht1

      Source |       SS           df       MS      Number of obs   =         9
-------------+----------------------------------   F(1, 7)         =      1.37
       Model |  4.19465975         1  4.19465975   Prob > F        =    0.2794
    Residual |  21.3608958         7  3.05155654   R-squared       =    0.1641
-------------+----------------------------------   Adj R-squared   =    0.0447
       Total |  25.5555556         8  3.19444444   Root MSE        =    1.7469

------------------------------------------------------------------------------
         wt1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ht1 |  -1.916342   1.634502    -1.17   0.279    -5.781324     1.94864
       _cons |   25.52335   3.714473     6.87   0.000     16.74001    34.30668
------------------------------------------------------------------------------

. reg wt1 ht2

      Source |       SS           df       MS      Number of obs   =         9
-------------+----------------------------------   F(1, 7)         =      0.11
       Model |  .402890105         1  .402890105   Prob > F        =    0.7475
    Residual |  25.1526655         7  3.59323792   R-squared       =    0.0158
-------------+----------------------------------   Adj R-squared   =   -0.1248
       Total |  25.5555556         8  3.19444444   Root MSE        =    1.8956

------------------------------------------------------------------------------
         wt1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ht2 |  -.5411956   1.616233    -0.33   0.748    -4.362979    3.280588
       _cons |   22.91196   5.085644     4.51   0.003     10.88632    34.93759
------------------------------------------------------------------------------

. reg wt1 ht1 ht2

      Source |       SS           df       MS      Number of obs   =         9
-------------+----------------------------------   F(2, 6)         =      1.22
       Model |  7.37114101         2  3.68557051   Prob > F        =    0.3603
    Residual |  18.1844145         6  3.03073576   R-squared       =    0.2884
-------------+----------------------------------   Adj R-squared   =    0.0512
       Total |  25.5555556         8  3.19444444   Root MSE        =    1.7409

------------------------------------------------------------------------------
         wt1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ht1 |  -4.258625   2.808546    -1.52   0.180    -11.13089    2.613639
         ht2 |   2.620094    2.55928     1.02   0.345    -3.642238    8.882427
       _cons |   22.59995   4.675176     4.83   0.003     11.16021     34.0397
------------------------------------------------------------------------------


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17603

23 Jan 2022, 07:09

Zurab:
fixed-effects regression via -regress- requires that you put the -panelid- on the right-hand side of your regression:

Code:

. reg wt1 i.famid c.ht##i.age

      Source |       SS           df       MS      Number of obs   =        18
-------------+----------------------------------   F(5, 12)        =      1.37
       Model |  18.5355756         5  3.70711512   Prob > F        =    0.3037
    Residual |  32.5755355        12  2.71462796   R-squared       =    0.3627
-------------+----------------------------------   Adj R-squared   =    0.0971
       Total |  51.1111111        17  3.00653595   Root MSE        =    1.6476

------------------------------------------------------------------------------
         wt1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       famid |
          2  |   3.129854   1.384912     2.26   0.043      .112389    6.147319
          3  |   1.688644   1.077032     1.57   0.143    -.6580073    4.035295
             |
          ht |   1.112259   2.040362     0.55   0.596    -3.333309    5.557826
       2.age |  -1.786838   5.687297    -0.31   0.759    -14.17839    10.60472
             |
    age#c.ht |
          2  |   .2595977   2.152542     0.12   0.906    -4.430388    4.949584
             |
       _cons |   17.11965   5.106807     3.35   0.006     5.992877    28.24643
------------------------------------------------------------------------------

This way, the coefficients common to -regress- and -xtreg, fe- are identical:

Code:

. xtset famid

Panel variable: famid (balanced)

. xtreg wt1  c.ht##i.age, fe

Fixed-effects (within) regression               Number of obs     =         18
Group variable: famid                           Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.0603                                         min =          6
     Between = 0.9793                                         avg =        6.0
     Overall = 0.0584                                         max =          6

                                                F(3,12)           =       0.26
corr(u_i, Xb) = -0.6870                         Prob > F          =     0.8551

------------------------------------------------------------------------------
         wt1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          ht |   1.112259   2.040362     0.55   0.596    -3.333309    5.557826
       2.age |  -1.786838   5.687297    -0.31   0.759    -14.17839    10.60472
             |
    age#c.ht |
          2  |   .2595977   2.152542     0.12   0.906    -4.430388    4.949584
             |
       _cons |   18.72582   4.612295     4.06   0.002     8.676493    28.77515
-------------+----------------------------------------------------------------
     sigma_u |  1.5665563
     sigma_e |  1.6476128
         rho |  .47479753   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2, 12) = 2.57                       Prob > F = 0.1180

.

The previous comment about clustered standard errors still holds.

Last edited by Carlo Lazzaro; 23 Jan 2022, 07:13.

Kind regards,
Carlo
(StataNow 18.5)


Zurab Buadze

Join Date: Jan 2022

Posts: 5
#7

23 Jan 2022, 07:29

Carlo,

I used "age" as the -panelid- in the example because I reshaped the data by "age", not by "famid". The example was simply meant to illustrate that the panel data model with interactions on age replicates the coefficients from the wide-data models in which "ht1" and "ht2" (ages 1 and 2) are regressed separately on the dependent variable, not together. I may not be explaining this clearly, but it seems you think I am trying to replicate "xtreg, fe" with "reg ... i.panelid" on panel data. That is not the issue; I can replicate FE with "reg" perfectly. I am trying to replicate in panel data the model that is estimated in wide data (with both ht1 and ht2 included). I also agree with your comment about standard errors, but my issue is first of all coefficient replication: in panel data I can replicate the "ht" coefficients from the first two models at the end of post #5, but not from the third one. Apologies if I am misunderstanding your comments.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17603

23 Jan 2022, 08:20

Zurab:
the dataset available from https://stats.idre.ucla.edu/stat/stata/modules/kidshtwt is actually in -long- format.
If you want to go panel with it (assuming that you are interested in an -fe- specification):

Code:

. use https://stats.idre.ucla.edu/stat/stata/modules/kidshtwt, clear

. xtset famid birth

Panel variable: famid (strongly balanced)
 Time variable: birth, 1 to 3
         Delta: 1 unit

. xtreg wt1 c.ht1##c.ht2, fe

Fixed-effects (within) regression               Number of obs     =          9
Group variable: famid                           Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.1344                                         min =          3
     Between = 0.5812                                         avg =        3.0
     Overall = 0.0020                                         max =          3

                                                F(3,3)            =       0.16
corr(u_i, Xb) = -0.5639                         Prob > F          =     0.9198

------------------------------------------------------------------------------
         wt1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         ht1 |   7.933671   31.90006     0.25   0.820    -93.58654    109.4539
         ht2 |   6.985955   17.20438     0.41   0.712    -47.76607    61.73799
             |
 c.ht1#c.ht2 |  -2.454112   8.288878    -0.30   0.786    -28.83302     23.9248
             |
       _cons |  -.9201627   64.19629    -0.01   0.989    -205.2214    203.3811
-------------+----------------------------------------------------------------
     sigma_u |  1.6018417
     sigma_e |  2.2363795
         rho |  .33907738   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2, 3) = 0.29                        Prob > F = 0.7653

. reg wt1 i.famid c.ht1##c.ht2

      Source |       SS           df       MS      Number of obs   =         9
-------------+----------------------------------   F(5, 3)         =      0.42
       Model |  10.5513764         5  2.11027527   Prob > F        =    0.8130
    Residual |  15.0041792         3  5.00139307   R-squared       =    0.4129
-------------+----------------------------------   Adj R-squared   =   -0.5657
       Total |  25.5555556         8  3.19444444   Root MSE        =    2.2364

------------------------------------------------------------------------------
         wt1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       famid |
          2  |   3.177099   4.320447     0.74   0.515    -10.57249    16.92669
          3  |   1.231869   2.923588     0.42   0.702    -8.072293    10.53603
             |
         ht1 |   7.933671   31.90006     0.25   0.820    -93.58654    109.4539
         ht2 |   6.985955   17.20438     0.41   0.712    -47.76607    61.73799
             |
 c.ht1#c.ht2 |  -2.454112   8.288878    -0.30   0.786    -28.83302     23.9248
             |
       _cons |  -2.389819   65.32599    -0.04   0.973    -210.2863    205.5066
------------------------------------------------------------------------------

.

I might not be on a roll today, but I do not understand what you want to obtain from your data.

Kind regards,
Carlo
(StataNow 18.5)


Zurab Buadze

Join Date: Jan 2022

Posts: 5
#9

23 Jan 2022, 10:26

Carlo,

What I am trying to do is obtain the same coefficients whether my data are in long or wide form. In this particular case, I have the raw data and can prepare the estimation dataset in either long or wide format. Since the dataset is the same and only its format differs, I assume every model can be replicated in both formats (with different techniques, of course). This is why I used the public dataset as an example, 1) with the data long and 2) with the data wide.

Now, if I prepare my dataset in long format and run the model with interactions on the dummy variable of interest, which is age in this case (e.g. reg wt1 c.ht##i.age in post #5), this is equivalent to preparing the dataset in wide format and running separate models for age = 1 and age = 2 (e.g. reg wt1 ht1; reg wt1 ht2 in post #5). However, with the data in wide format I would run the model with both variables together rather than separately (e.g. reg wt1 ht1 ht2 in post #5). The issue is that I cannot replicate the latter model with the data in long format, since the model with interactions estimates coefficients equivalent to separate models for age = 1 and age = 2, as demonstrated above. So I am asking whether there is a way to replicate this model (reg wt1 ht1 ht2) in panel data. I assume there is, because the dataset is the same. Of course, reshaping the data gives this option, but I assume it should be possible in either form.
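The reason the interaction specification cannot deliver the joint model shows up in the joint model's normal equations. Writing h_{i1} and h_{i2} for child i's heights at ages 1 and 2, y_i for wt1, and n for the number of children (notation introduced here for illustration):

```latex
% Joint (wide) model for child i
y_i = a + b_1 h_{i1} + b_2 h_{i2} + u_i

% Normal equations, n = number of children
\begin{pmatrix}
  n             & \sum_i h_{i1}        & \sum_i h_{i2} \\
  \sum_i h_{i1} & \sum_i h_{i1}^2      & \sum_i h_{i1} h_{i2} \\
  \sum_i h_{i2} & \sum_i h_{i1} h_{i2} & \sum_i h_{i2}^2
\end{pmatrix}
\begin{pmatrix} a \\ b_1 \\ b_2 \end{pmatrix}
=
\begin{pmatrix} \sum_i y_i \\ \sum_i h_{i1} y_i \\ \sum_i h_{i2} y_i \end{pmatrix}
```

The cross-product term pairs the age-1 and age-2 heights of the same child. In long format those two values sit on different rows, and OLS only forms products of variables within a row, so no specification in ht alone can produce that term. Both heights have to be placed on the same row first, either with -reshape- or by copying values across each child's rows (for example with -egen-).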