  • global NTij = `: word count `r(varlist)'' - An Advanced Guide to Trade Policy Analysis (Chpt. 2 Applications)

    Dear Statalist community,

    I'm working through the Yotov et al. (2016) guide on the gravity model, which is a lot of fun. I'm still learning how to make proper posts here, so I apologize in advance if something requires clarification. Please find a dataex sample of the data at the end of this post.

    I have one technical question. I read the help files on global and word count, but I don't understand why the authors are doing the following:

    Code:
    * Set additional exogenous parameters
    quietly ds EXPORTER_TIME_FE*
    global NT = `: word count `r(varlist)''

    quietly tabulate year, gen(TIME_FE)
    quietly ds TIME_FE*
    global Nyr = `: word count `r(varlist)''
    global NT_yr = $NT - $Nyr

    quietly ds PAIR_FE*
    global NTij = `: word count `r(varlist)''
    global NTij_1 = $NTij - 1
    global NTij_8 = $NTij - 8
    The help file says that -global- assigns strings, and -word count- apparently counts words, but that doesn't get me any further. Why do they subtract 8 at the end?
    I see that they use this macro later on. For example here:

    Code:
    * Construct the trade costs from the pair fixed effects
    forvalues ijt = 1(1)$NTij_8 {
        qui replace PAIR_FE`ijt' = PAIR_FE`ijt' * _b[PAIR_FE`ijt']
    }

    egen gamma_ij = rowtotal(PAIR_FE1-PAIR_FE$NTij)
    replace gamma_ij = . if gamma_ij == 1 & exporter != importer
    replace gamma_ij = 0 if gamma_ij == 1 & exporter == importer
    generate tij_bar = exp(gamma_ij)
    generate tij_bln = exp(gamma_ij + RTA_est*RTA + NAFTA_est*NAFTA)
    In the code above, why would they use NTij_8 rather than NTij_1 or NTij? And could I not simply replace it with a scalar (i.e. the number of the last PAIR_FE)?
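The loop and the rowtotal() step quoted above can be traced on a toy dataset. This is a minimal sketch under loud assumptions. There are three hypothetical pairs, and the coefficients -0.5 and 0.25 are made up; they are stored in locals rather than taken from _b[]. The local Nest stands in for $NTij_8. Each observation has exactly one PAIR_FE indicator equal to 1. After the loop, rowtotal() therefore returns that pair's coefficient, except for pairs beyond the loop bound, whose indicator is still 1.

```stata
* Hypothetical toy data: one observation for each of three pairs
clear
input pair_id PAIR_FE1 PAIR_FE2 PAIR_FE3
1 1 0 0
2 0 1 0
3 0 0 1
end

* Hypothetical coefficients for the pair effects that were estimated
local b1 = -0.5
local b2 = 0.25
local Nest 2    // stands in for $NTij_8: only pair effects 1..Nest get a coefficient

* Multiply each estimated indicator by its coefficient, as the authors' loop does
forvalues ijt = 1/`Nest' {
    quietly replace PAIR_FE`ijt' = PAIR_FE`ijt' * `b`ijt''
}

* rowtotal() now gives the coefficient of the observation's own pair,
* or 1 for a pair whose indicator was never multiplied (pair 3 here)
egen gamma_ij = rowtotal(PAIR_FE1-PAIR_FE3)
list pair_id gamma_ij
```

This suggests why the authors test gamma_ij == 1 afterwards: the test flags observations whose pair effect lies outside 1..$NTij_8. In principle, that flag would also catch an estimated coefficient that happened to equal exactly 1.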

    If anyone could explain what this global/word count code is good for, I'd be very grateful.
    Thank you so much.

    Best wishes,
    Marc


    --- Data sample ---

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3(exporter importer) float(pair_id year) double(trade DIST) byte(CNTG LANG CLNY) float RTA double(ln_DIST Y E E_R_BLN E_R exp_time imp_time NAFTA)
    "AUS" "ARG" 1 2006  35.89710967541014 12044.573995607949 0 0 0 0 9.396369546686529  305793.0057226912  60231.60725245608 . 1771970.0083047857 12  6 0
    "AUS" "ARG" 1 1990  38.84238577440381 12044.573995607949 0 0 0 0 9.396369546686529 128876.86963476577  74599.21484265546 . 1030207.9358447487  8  2 0
    "ARG" "AUS" 1 2006 107.80197615921497 12044.574133735276 0 0 0 0 9.396369558154541  59561.41971842663  362227.3912414866 . 1771970.0083047857  6 12 0
    "AUS" "ARG" 1 1986 52.298013541936875 12044.573995607949 0 0 0 0 9.396369546686529  71626.43612518166  64607.14601827428 .  813980.1273545903  7  1 0
    "ARG" "AUS" 1 1990  60.70578616046905 12044.574133735276 0 0 0 0 9.396369558154541  79001.25715625827 143264.11254164524 . 1030207.9358447487  2  8 0
    "ARG" "AUS" 1 1994 39.490870571225884 12044.574133735276 0 0 0 0 9.396369558154541  99852.35341475373 142061.07233178918 . 1133906.9303312835  3  9 0
    "AUS" "ARG" 1 1994 36.527244535446165 12044.573995607949 0 0 0 0 9.396369546686529 123514.13780373444 108707.03506064929 . 1133906.9303312835  9  3 0
    "ARG" "AUS" 1 1998  40.19169009739161 12044.574133735276 0 0 0 0 9.396369558154541 103362.35019355347 195931.14177788427 . 1114154.4194502595  4 10 0
    "AUS" "ARG" 1 1998  58.88467556142807 12044.573995607949 0 0 0 0 9.396369546686529  170744.1683652075  114167.5652533337 . 1114154.4194502595 10  4 0
    "ARG" "AUS" 1 1986  27.76487390676141 12044.574133735276 0 0 0 0 9.396369558154541 64622.553433544614  83731.51190672339 .  813980.1273545903  1  7 0
    "ARG" "AUS" 1 2002  79.18093992342428 12044.574133735276 0 0 0 0 9.396369558154541  44737.55993643325 167623.98246846374 . 1240242.3194290341  5 11 0
    "AUS" "ARG" 1 2002  8.707986437678336 12044.573995607949 0 0 0 0 9.396369546686529 139927.46492191666   36521.0887583743 . 1240242.3194290341 11  5 0
    "ARG" "AUT" 2 1994   8.79075589519739 11751.146520555496 0 0 0 0 9.371706091029544  99852.35341475373 104209.97342940612 . 1133906.9303312835  3 15 0
    "AUT" "ARG" 2 1986 12.302489494144917 11751.146581945419 0 0 0 0 9.371706096253709  56548.99256757468  64607.14601827428 .  813980.1273545903 13  1 0
    "AUT" "ARG" 2 1990 27.157683624386788 11751.146581945419 0 0 0 0 9.371706096253709  93829.32896953366  74599.21484265546 . 1030207.9358447487 14  2 0
    "ARG" "AUT" 2 1990  9.750014431774616 11751.146520555496 0 0 0 0 9.371706091029544  79001.25715625827  98710.67501031503 . 1030207.9358447487  2 14 0
    "AUT" "ARG" 2 2006 100.60377620289475 11751.146581945419 0 0 0 0 9.371706096253709 170426.57188511224  60231.60725245608 . 1771970.0083047857 18  6 0
    "AUT" "ARG" 2 1998 119.87082076358796 11751.146581945419 0 0 0 0 9.371706096253709  106049.7432038412  114167.5652533337 . 1114154.4194502595 16  4 0
    "AUT" "ARG" 2 2002  29.18210575020313 11751.146581945419 0 0 0 0 9.371706096253709 102992.46192809373   36521.0887583743 . 1240242.3194290341 17  5 0
    "ARG" "AUT" 2 1998  4.868750989615918 11751.146520555496 0 0 0 0 9.371706091029544 103362.35019355347 114632.90923208142 . 1114154.4194502595  4 16 0
    end
    Last edited by Marc Leet; 22 Sep 2018, 15:46.

  • #2
    You have to interpret it in the context of the immediately preceding commands:

    Code:
    quietly ds EXPORTER_TIME_FE*
    global NT = `: word count `r(varlist)''
    The -ds- command creates a list of all variables named EXPORTER_TIME_FE* (i.e. all variables whose names begin with EXPORTER_TIME_FE) and places that list in r(varlist). The -global NT- command then takes the contents of r(varlist), i.e. that list of variables, and counts how many of them there are. It stores that count in global macro NT. So the global macro NT now contains the number of variables in the data set whose names begin with EXPORTER_TIME_FE. Why they want that number, I do not know. The subsequent code is obscure to somebody who does not know what these variables are and what they are attempting to do.
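Here are the two steps just described, run on a made-up dataset. The three EXPORTER_TIME_FE* variables and the variable "other" are hypothetical. The -ds- pattern picks up only the three matching names, and -: word count- counts the blank-separated words in r(varlist), that is, the variable names. A local is used here instead of the global.

```stata
* Toy data: three hypothetical EXPORTER_TIME_FE* variables plus one unrelated variable
clear
set obs 5
generate byte EXPORTER_TIME_FE1 = 0
generate byte EXPORTER_TIME_FE2 = 0
generate byte EXPORTER_TIME_FE3 = 0
generate byte other = 1

* -ds- stores the names of the matching variables in r(varlist)
quietly ds EXPORTER_TIME_FE*
display "`r(varlist)'"    // EXPORTER_TIME_FE1 EXPORTER_TIME_FE2 EXPORTER_TIME_FE3

* -: word count- counts the words in that list, i.e. the number of variables
local NT : word count `r(varlist)'
display `NT'              // 3
```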

    Code:
    quietly tabulate year, gen(TIME_FE)
    quietly ds TIME_FE*
    global Nyr = `: word count `r(varlist)''
    global NT_yr = $NT - $Nyr
    does something similar. The -tabulate, gen()- command creates a set of indicator ("dummy") variables, TIME_FE1, TIME_FE2, and so on, one for each distinct value taken on by the variable year. Then -ds- lists all of those in r(varlist), and -global Nyr- counts how many of those indicator variables there are. So global Nyr ends up holding the number of distinct values of year. The final -global NT_yr = $NT - $Nyr- sets global NT_yr equal to the amount by which the number of variables whose names begin with EXPORTER_TIME_FE exceeds the number of distinct values of year. Again, without understanding the context, I can't explain why that is of interest. It does sound like the kind of thing one might do to calculate degrees of freedom in some regression analysis, but they might have totally unrelated reasons for doing this.
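The -tabulate, generate()- step can be reproduced on a hypothetical toy variable year that takes three distinct values. The output shows that one indicator is created per distinct year, and that counting those indicators gives the number of distinct years.

```stata
* Toy data: four observations covering three distinct years
clear
input int year
1986
1990
1990
1994
end

* One indicator per distinct year: TIME_FE1 (1986), TIME_FE2 (1990), TIME_FE3 (1994)
quietly tabulate year, generate(TIME_FE)
quietly ds TIME_FE*
local Nyr : word count `r(varlist)'
display `Nyr'    // 3
list year TIME_FE*
```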

    Code:
    quietly ds PAIR_FE*
    global NTij = `: word count `r(varlist)''
    global NTij_1 = $NTij - 1
    global NTij_8 = $NTij - 8
    is very much like the first situation. It sets global NTij equal to the number of variables in the data set whose names begin with PAIR_FE. The next two globals are equal to that value minus 1 and 8 respectively. Why the numbers 1 and 8, I have no idea. Perhaps somebody who is familiar with the overall context here will respond with an explanation.

    All of that said, I would not call this high quality code at all. First, there is the use of global macros in the first place. It's an unsafe programming practice that should be used only as a last resort. Now, as I don't understand the context here, it may be that they really do need to use global macros here. But I'd be surprised. I've been programming Stata for 24 years now, and only once in all that time have I encountered a situation where I had to resort to a global macro. Local macros are almost always able to fill the role and they are safe to use.
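One way to see why globals are risky is to give a local and a global the same name. This is a small sketch; the program setboth is hypothetical. A local defined inside a program disappears when the program ends. A global assigned inside it silently overwrites whatever the caller had.

```stata
* Hypothetical program that sets a local and a global, both named n
capture program drop setboth
program define setboth
    local n 99     // local: exists only while -setboth- runs
    global n 99    // global: replaces any global n in the whole Stata session
end

local n 3
global n 3
setboth
display `n'    // 3: the caller's local is untouched
display $n     // 99: the caller's global has been overwritten
```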

    Next, the -global whatever = `:word count `whatever_else''- construction does not require an equals sign, nor does it require the outermost `' quotes. It can be written more simply as
    Code:
    global whatever : word count `whatever_else'
    // OR EVEN BETTER:
    local whatever : word count `whatever_else'
    Admittedly, little harm is done by the longer code, but it does execute less efficiently. (It would have to be deep inside multiple loops executing millions of times to make a noticeable difference, however.)
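The equals sign can be dropped from the -: word count- form, but it is what makes the arithmetic lines such as "global NTij_8 = $NTij - 8" work. The sketch below uses a local with the hypothetical value 20 in place of $NTij. With "=", Stata evaluates the expression and stores the result. Without it, Stata stores the text exactly as typed, and that text would not be a valid loop bound.

```stata
* Hypothetical count standing in for $NTij
local NTij 20

local NTij_8 = `NTij' - 8       // with "=": the expression is evaluated
local NTij_8_text `NTij' - 8    // without "=": the text after the name is stored as typed

display "`NTij_8'"         // 12
display "`NTij_8_text'"    // 20 - 8
```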

    There is also a simpler, better way to generate macro Nyr. Unless the TIME_FE* indicator variables are actually needed for some other purpose, creating them just wastes execution time and memory. You can more easily create a macro with the number of distinct values of year as:
    Code:
    levelsof year, local(tfe)
    local Nyr : word count `tfe'
    and that takes care of it. It defines local macro Nyr without creating unnecessary variables.**

    Now, I hasten to add that the other purpose for which the TIME_FE* indicator variables are actually needed must not simply be to serve as time fixed effects (the name is suggestive, no?) in a regression. That can be accomplished better with factor-variable notation (-help fvvarlist- for details). Thus if there is some regression that uses these, you can get it done simply with:
    Code:
    regression_command depvar indvars covariates i.year
    Again, no need to waste time or memory on these. (There may, however, be some other legitimate reason to create the indicators--I don't know, but nothing in the code you show really suggests that there is.)
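The claim that factor-variable notation does the same job as hand-made indicators can be checked on Stata's shipped auto dataset. The variable rep78 merely plays the role that year plays in the gravity code. The sketch fits the same model twice, once with i.rep78 and once with indicators from -tabulate, generate()-, and compares one coefficient. Both versions use rep78 == 1 as the omitted base category, and both drop the observations where rep78 is missing, so the coefficients should agree.

```stata
sysuse auto, clear

* Factor-variable notation: Stata builds the rep78 indicators on the fly
quietly regress price mpg i.rep78
scalar b_fv = _b[3.rep78]

* Same model with explicit indicators from -tabulate, generate()-
quietly tabulate rep78, generate(REP)
quietly regress price mpg REP2-REP5    // REP1 (rep78 == 1) is the omitted base
scalar b_dummy = _b[REP3]

display b_fv - b_dummy
```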

    So, I think I've explained and critiqued the coding aspects of your question. You now know what is being calculated with those statements, and how it could be done better. But as for why those things are being calculated in the first place, we await a response from somebody else who knows and understands this model.

    **Added: There is also a command, -distinct-, written by Nick Cox and available from SSC. It counts the number of distinct values a variable takes on and returns that count in r(ndistinct). So, a more direct way to calculate local Nyr would be
    Code:
    distinct year
    local Nyr `r(ndistinct)'
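If installing from SSC is not an option, the official -tabulate- command can do the same count without creating any variables. A one-way -tabulate- without generate() stores the number of rows, i.e. the number of distinct values, in r(r). The sketch below uses hypothetical toy years. It is still subject to -tabulate-'s limit on the number of rows, so it suits a variable like year rather than one with very many distinct values.

```stata
* Toy data: three distinct years
clear
input int year
1986
1990
1990
1994
end

* No generate() option, so no indicator variables are created
quietly tabulate year
local Nyr = r(r)
display `Nyr'    // 3
```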
    Last edited by Clyde Schechter; 22 Sep 2018, 17:56.



    • #3
      Dear Clyde,

      thank you. This already helped a lot.

      For completeness, and in case someone wants to explain why the numbers 1 and 8 are used, I attach the complete do-file of their model.

      Best,
      Marc

      Update: I think I understand it now. There are 8 country pairs with zero trade flows in all years. These pairs are dropped during estimation because they never trade, so the authors use the global macro to work with only the N-8 pair fixed effects that are actually estimated.
      Still, it seems somewhat unwieldy.
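Instead of hard-coding 8, the number of never-trading pairs could be counted from the data. Here is a minimal sketch using hypothetical toy data with the pair_id, year and trade variables from the dataex sample, in which pair 2 never trades. Note that total() treats missing trade as 0. Also, this only counts the all-zero pairs. It does not check that they correspond to the last eight PAIR_FE variables, which is what the loop bound 1(1)$NTij_8 appears to assume; that depends on how the PAIR_FE* variables were numbered in the full do-file.

```stata
* Hypothetical toy data: pair 2 has zero trade in every year
clear
input pair_id year trade
1 1986 5
1 1990 0
2 1986 0
2 1990 0
3 1986 2
3 1990 7
end

* Total trade over all years for each pair
bysort pair_id: egen double pair_trade = total(trade)

* Tag one observation per pair so that each pair is counted once
egen byte pair_tag = tag(pair_id)
count if pair_tag & pair_trade == 0
local Nzero = r(N)
display `Nzero'    // 1
```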
      Last edited by Marc Leet; 24 Sep 2018, 08:58.
