xtdcce2 is not working

Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#1

07 Jan 2025, 04:44

Hi everyone,

I am learning how to use the command xtdcce2 developed by Jan Ditzen.

To do this, I am attempting to replicate the first example provided by Ditzen (2018, Stata Journal) for estimating the Solow Growth model using the Jackknife bias-corrected Dynamic CCEMG estimator.

I downloaded the dataset xtdcce2_sample_dataset.dta and entered the following commands:

use xtdcce2_sample_dataset.dta, clear
xtset id year
xtdcce2 log_rgdpo L.log_rgdpo log_ck log_ngd, crosssectional(log_rgdpo log_ck log_ngd) cr_lags(3) jackknife

However, I encountered the following error:

xtdcce_m_reg(): 3301 subscript invalid
xtdcce_m_reg(): - function returned error
<istmt>: - function returned error
r(3301);

I'm using Stata 17 and xtdcce2 4.7 - 03.06.2024.

The problem occurs only when I add the jackknife correction.

Last edited by Frank Giaquinto; 07 Jan 2025, 04:59.
George Ford

Join Date: Aug 2014

Posts: 3120
#2

07 Jan 2025, 09:19

Did you install moremata?
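
In case it helps others who land here: xtdcce2 relies on the moremata package, so one quick check is to (re)install it from SSC and rebuild Mata's library index. This is a minimal sketch and assumes an internet connection and write access to the ado directory.

```stata
* install or refresh moremata from SSC
ssc install moremata, replace
* rebuild the index of Mata libraries so newly installed functions are found
mata: mata mlib index
```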
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#3

07 Jan 2025, 10:39

Dear George,

Thank you very much for your response. Yes, I have installed moremata.

The code works if I omit jackknife, but doing so means the jackknife correction is not applied.
George Ford

Join Date: Aug 2014

Posts: 3120
#4

07 Jan 2025, 11:20

I can replicate the problem. I'd send a note to Stata.
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#5

07 Jan 2025, 11:39

Thanks a lot!
Frode Andre

Join Date: Oct 2023

Posts: 46
#6

08 Jan 2025, 13:10

Somewhat surprisingly, I am on Stata 18 with this version:

Code:

xtdcce2 2.0 - 13.07.2019; update 22.12.2020

And it works fine.
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#7

10 Jan 2025, 08:13

Actually, even the Dynamic CCEMG IV Estimator cannot be implemented. The following command (taken from the Stata Journal article published by Ditzen, 2018) does not work:

xtdcce2 log_rgdpo L.log_rgdpo log_ngd (log_ck = L.log_ck L2.log_ck), crosssectional(log_rgdpo log_ck log_ngd) cr_lags(3) ivreg2options(noid)

Indeed, I got the following error:

__000014_SMF not found

Is there a way to install the version released just before the latest one?

Last edited by Frank Giaquinto; 10 Jan 2025, 09:04.
JanDitzen

Join Date: Jan 2015

Posts: 348
#8

13 Jan 2025, 01:59

A bug in the recently implemented calculation of information criteria caused both errors. This bug is now fixed. Please install the latest version from GitHub:

Code:

net install xtdcce2 , from("https://janditzen.github.io/xtdcce2/")
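
If an older copy is already installed, the following sketch may help confirm that the update took effect. It assumes the sample dataset from post #1 is in the working directory; the `replace` option and the `which` check are additions that were not part of the original instructions.

```stata
* reinstall from the author's GitHub page, overwriting the installed copy
net install xtdcce2, from("https://janditzen.github.io/xtdcce2/") replace
* show the path and version header of the copy Stata will actually run
which xtdcce2
* rerun the jackknife example from post #1
use xtdcce2_sample_dataset.dta, clear
xtset id year
xtdcce2 log_rgdpo L.log_rgdpo log_ck log_ngd, crosssectional(log_rgdpo log_ck log_ngd) cr_lags(3) jackknife
```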
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#9

13 Jan 2025, 07:09

Dear Jan,

Thank you very much for fixing the bug so quickly - everything works now. I have a clarification question about your code if I may ask.

Does the (Dynamic) Common Correlated Effects Estimator - Mean Group IV implement the methodology proposed by Neal (2015)? If so, what kind of weight matrix does the GMM procedure use in the example (the replication of your paper)? Does it return an efficient HAC weight matrix?

Thank you for your tremendous work and research on these topics.
JanDitzen

Join Date: Jan 2015

Posts: 348
#10

14 Jan 2025, 02:17

In general no - however there is a way around it. xtdcce2 can use ivreg2 to estimate the IV model. Hence if you pass the appropriate options to ivreg2, it should be possible to use GMM with a HAC weight matrix.
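
As a rough, untested sketch of what passing such options could look like, the IV example from post #7 is reused below with ivreg2's `gmm2s`, `robust` and `bw()` options added. The bandwidth of 2 is an arbitrary placeholder, and whether xtdcce2 forwards these options as intended is an assumption that should be checked against the ivreg2 output.

```stata
* IV example from post #7, with two-step GMM and a HAC weight matrix requested via ivreg2
* bw(2) is an arbitrary illustrative bandwidth, not a recommendation
xtdcce2 log_rgdpo L.log_rgdpo log_ngd (log_ck = L.log_ck L2.log_ck), ///
    crosssectional(log_rgdpo log_ck log_ngd) cr_lags(3) ///
    ivreg2options(noid gmm2s robust bw(2))
```

The `///` line continuations only work in a do-file; in the Command window the command has to be entered on one line.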

I would like to point out two things, however: 1) The paper by Neal (2015) was - to the best of my knowledge - never published. 2) The literature on (D)CCE + IV is very scarce, and it depends on the source of endogeneity. In the large N,T setting with interactive fixed effects, endogeneity can result from a) reverse causality ("classical micro setting"; Y <-> X), b) lags (dynamic GMM setting), c) spatial endogeneity via a spatial lag or d) strong cross-section dependence. There is a literature on b), c) and d), but besides Neal (2015) none on a). Hence I would be very careful employing the CCE + IV estimator (+ MG, which complicates things even more) and only do it if absolutely necessary.
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#11

14 Jan 2025, 07:17

Dear Jan,
Thank you for clarifying this point. I have been trying to implement the Pesaran and Smith (1995) Mean Group (MG) Estimator using an ARDL specification, but I encountered a technical issue. To ensure that I am running the code correctly, I am using the dataset jasa2 from Ditzen (2021, Stata Journal). The article provides the following code to implement the CS-ARDL estimator:

xtdcce2 c if year >= 1962, lr(L.c L(0/1).y pi L.pi) ///
lr_options(ardl) crosssectional(_all) cr_lags(3)

This code works perfectly. However, let us suppose that cross-sectional dependence between errors is not an issue, and I wish to drop the cross-sectional averages. In principle, this adjustment should allow me to estimate the MG-Estimator (Pesaran and Smith, 1995). The modified command I use is:

xtdcce2 c if year >= 1962, lr(L.c L(0/1).y pi L.pi) ///
lr_options(ardl) crosssectional(_none)

However, I get the following error:

invsym(): 3300 argument out of range
m_xtdcce_inverter(): - function returned error
xtdcce2_ic(): - function returned error
xtdcce_m_reg(): - function returned error
<istmt>: - function returned error

I would appreciate your guidance on resolving this issue.

Finally, I have a couple of methodological questions that I believe would be very enlightening. Thank you in advance for your time and assistance:

1) Assuming slope heterogeneity, would you agree that a preliminary (and imprecise) approach to address unobserved common factors is to apply the between transformation to each series to eliminate the time dummies?

2) On page 699 of your article in the Stata Journal (2021), you provide a table with the point estimates of the coefficients for an ARDL(1,1,1) model where cross-sectional averages are included. For example, the point estimate of the long-run coefficient for pi is reported as -0.5976.

Is it correct that this value cannot be consistently derived by taking the ratio of the sum of the short-run coefficients for pi (-0.113 - 0.0146) to (1 minus the sum of the short-run coefficients for the dependent variable, 0.3888)? This is because the long-run coefficient is not linear in the short-run parameters, and the ratio of two unbiased estimators does not necessarily yield an unbiased estimator.

The paper mentions that long-run coefficients are computed differently compared to the Maximum Likelihood method used by the xtpmg command. Are these methods numerically equivalent?

Are the short-run coefficients directly interpretable in this context? For instance, if the dependent variable is log consumption and pi is log inflation, would it be accurate to interpret that a 1% increase in inflation is expected to reduce consumption in the short term by (-0.113 - 0.0146)%?

3) When using an ARDL specification, are weak stationarity and cointegration still concerns? My understanding is that the ARDL specification should be robust to the presence of cointegration and to different orders of integration I(d), provided that d<2. However, do you still recommend conducting panel unit root tests to ensure that the variables are not I(2)? Thank you very much!

Last edited by Frank Giaquinto; 14 Jan 2025, 07:36.
JanDitzen

Join Date: Jan 2015

Posts: 348
#12

23 Jan 2025, 04:35

I have overlooked this post.

If you do not want to add cross-section averages, i.e. use the MG estimator, the option is "nocrosssectional".
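
Applied to the command from post #11, that would look roughly like this (a sketch that assumes the jasa2 data are already loaded and xtset, and that only swaps `crosssectional(_none)` for `nocrosssectional`):

```stata
* MG estimation of the ARDL specification without cross-section averages
xtdcce2 c if year >= 1962, lr(L.c L(0/1).y pi L.pi) ///
    lr_options(ardl) nocrosssectional
```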

On your questions:
1) Yes, but it would be very imprecise. One would have to write this down properly, but I think there will still be a bias in it. Hence I would avoid it.
2) You need to distinguish carefully between taking the average of the individual long-run coefficients and calculating the long-run coefficients from the averaged short-run coefficients. Results from xtpmg and xtdcce2 are not numerically equivalent. See also the discussion in the paper.
3) Yes, I would definitely recommend running those tests.

Hope this clarifies your questions.
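
To illustrate point 2) with the numbers quoted in post #11, the sketch below computes the ratio from the averaged short-run coefficients. It assumes that 0.3888 is the coefficient on the lagged dependent variable; the scalar names are made up for the example. The ratio comes out at roughly -0.21, which is not the reported -0.5976, in line with the two calculations not being interchangeable.

```stata
* values quoted in post #11 (averaged short-run estimates)
scalar sr_pi       = -0.113 - 0.0146   // sum of the short-run coefficients on pi
scalar sr_dep      = 0.3888            // assumed: coefficient on the lagged dependent variable
scalar lr_reported = -0.5976           // long-run coefficient for pi as reported in the table

* long-run coefficient implied by the averaged short-run coefficients
scalar lr_ratio = sr_pi / (1 - sr_dep)

display "LR from averaged SR coefficients: " lr_ratio
display "LR reported in the table:         " lr_reported
```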
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#13

23 Jan 2025, 13:47

Dear Jan, thank you very much!

Abdul Sazali

Join Date: Mar 2022
Posts: 10

#14

27 Jan 2025, 07:32

Dear JanDitzen,

I am trying to use -xtdcce2-, following the example you provided in this journal (page 602), to address endogeneity with instrumental variables in the presence of cross-sectional dependence, heteroskedasticity, and autocorrelation.

I am using a balanced panel dataset with the following details:
- 819 groups
- 8 years (2012-2019)
- 5032 observations.

My dependent variable is the natural log of fare, lnfare, and my endogenous regressor is the natural log of the number of passengers, lnpax. I use two instruments for lnpax: lnpop and lntrade. The other independent variables are the DID dummy asam, plus totalcarriers, desigcarriers, fscper, lngdp, fuelprice, and atii.

The error structure is assumed to be heteroskedastic and autocorrelated up to 7 lags, as indicated by the Breusch-Godfrey Lagrange Multiplier (LM) test for serial correlation.

I am using the following code to run the model:

Code:

. xtset routeid year

Panel variable: routeid (strongly balanced)
 Time variable: year, 2012 to 2019
         Delta: 1 unit

. xtdcce2 lnfare asam totalcarriers desigcarriers fscper lngdp countryfuel atii (lnpax = lnpop lntrade), crosssectional(lnfare lnpax totalcarriers desigcarriers fscper lngdp countryfuel atii) cr_lags(7) ivreg2options(noid)

and the result is as follows:

Code:

. xtdcce2 lnfare asam totalcarriers desigcarriers fscper lngdp countryfuel atii (lnpax = lnpop lntrade), crosssectional(lnfare lnpax totalcarriers desigcarriers fscper lngdp countryfuel atii) cr_lags(7) ivreg2options(noid)

Large number of observations, xtdcce2 might be very slow and problems occur if maximum of matrix size is reached.
Consider the use of xtdcce2fast instead of xtdcce2.

Units (routeid) to be removed due to insufficient numbers of observations:  1 2 3 4 5 6 7 8 9 10 12 13 14 16 17 18 19 20 22 23 24 25 26 27 28 29 30 32 34 35 36 37 38
>  39 40 41 42 43 44 45 46 47 49 50 51 52 53 54 55 56 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 80 81 83 84 85 87 88 89 90 91 93 94 95 96 99 100 10
> 1 102 103 104 105 106 107 108 111 112 113 114 115 116 117 118 119 120 126 127 128 129 130 131 132 134 137 138 139 140 141 142 144 145 146 149 150 151 152 153 154 1
> 55 156 159 160 161 165 168 169 170 171 172 173 174 175 176 178 181 182 183 184 185 186 187 188 189 190 191 192 193 194 196 197 198 199 200 201 205 206 207 208 210 
> 211 212 213 214 215 216 217 218 219 220 221 223 225 226 227 228 229 230 231 232 236 238 239 240 241 242 243 244 245 246 247 248 249 250 253 254 255 256 257 258 259
>  260 262 263 264 265 266 268 269 270 271 272 273 274 275 276 278 279 280 282 283 284 286 287 288 289 290 291 292 293 294 297 298 299 300 301 302 303 304 305 307 30
> 8 309 310 311 312 313 314 315 317 318 320 321 322 323 326 327 329 330 331 332 333 334 335 337 339 340 341 345 346 347 348 349 351 355 356 361 362 363 364 366 367 3
> 68 370 371 372 373 374 375 377 378 380 381 382 383 385 386 387 389 390 391 392 393 394 395 396 399 402 403 404 405 407 408 409 410 412 413 414 415 416 417 418 419 
> 420 423 425 426 431 433 434 435 436 437 439 440 441 444 446 447 448 450 452 453 454 455 456 458 461 462 463 464 466 467 469 470 471 472 473 474 475 476 486 487 488
>  489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 506 507 511 512 513 514 515 516 517 519 521 522 523 524 525 527 528 529 530 531 532 534 535 536 53
> 7 538 539 540 541 542 544 545 547 548 549 550 552 553 554 555 556 560 563 566 567 570 572 575 579 580 581 583 584 585 587 588 589 590 593 594 595 596 597 599 601 6
> 02 603 604 605 606 607 608 609 610 611 613 614 615 616 617 618 619 620 621 623 624 625 626 627 628 629 630 631 632 633 634 635 637 638 640 641 643 645 646 648 649 
> 650 651 652 653 654 655 656 657 658 661 662 663 664 665 666 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 684 685 686 687 688 689 690 691 692 693 694
>  695 696 697 698 699 700 701 702 703 704 705 706 708 710 711 712 713 715 717 720 721 722 724 727 728 729 730 733 734 735 736 737 738 740 741 742 743 744 745 746 74
> 8 749 750 751 752 753 754 756 757 758 761 763 764 765 766 767 770 771 772 774 775 778 779 780 782 784 785 786 789 791 792 793 794 795 797 798 799 800 802 803 804 8
> 05 806 807 808 810 811 812 813 814 815 816 818 819

No observations left.
r(2001);

end of do-file

Could you please advise why the units are removed, given that my dataset has a large number of observations, and how I can resolve this?

Many thanks,
Abdul


JanDitzen

Join Date: Jan 2015

Posts: 348
#15

27 Jan 2025, 07:41

I think you should not use the CCE-MG estimator and xtdcce2 given your data. Please keep in mind that the CCE and MG estimators are designed for datasets with large N and T. While your N is large, T = 8 is not.

In detail, the MG estimator estimates a separate time-series equation for each cross-section. You have 8 time periods, hence 8 observations per unit to estimate such an equation. The number of explanatory variables (10) is larger than the number of observations over time. Even without the cross-section averages, it is not possible to estimate such an equation.
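
The counting argument can be written out as a small sketch. The macro names are illustrative, and it is an assumption here that each variable in crosssectional() enters with its contemporaneous average plus cr_lags() lags.

```stata
* per-unit counts for the mean-group regressions (illustrative only)
local T       = 8    // years per route (2012 to 2019)
local k       = 10   // explanatory variables, as counted above
local n_csa   = 8    // variables listed in crosssectional()
local cr_lags = 7    // lags requested in cr_lags()

* observations minus regressors for one unit, before any cross-section averages
display "df without cross-section averages: " `T' - `k'

* assumed: each cross-section average enters contemporaneously plus cr_lags lags
local k_csa = `n_csa' * (`cr_lags' + 1)
display "added cross-section average terms: " `k_csa'

* periods left once the lagged averages are available
display "usable periods after lagging:      " `T' - `cr_lags'
```

With 8 periods per unit, the regressor count already exceeds the observation count, and seven lags of the averages leave only a single usable period, which matches the "No observations left" message.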