Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reduce the number of zero trade values in my panel dataset

    Good morning everyone,

    I am using Stata 13.0; I have a panel dataset of 60605 observations. I have to analyze the impact of free trade agreement (trade creation and trade diversion effect) on trade.

    Reporters include states of the USA and partners are different countries in the world. My dependent variable is export of vegetables. However there is a lot of trade zero values (I don't have any explanations of the zeros trade so far). precisely 55536 observations have 0 values.

    Now my boss asked my to reduce the number of zeros to 50% if possible in order to estimates our gravity panel equations first using OLS (mainly for comparison with the appropriate estimators).

    What the appropriate method /correct Stata syntax to reduce my panel dataset (removing zeros without messing up the panel structure).


    Here is a look a my dataset (small portions)

    HTML Code:
    
    . list  YEAR QUARTER REPORTINGCOUNTRY STATE PARTNERCOUNTRY VALUE , sepby ( YEAR )
    
           +--------------------------------------------------------------------------------------------------+
           | YEAR   QUARTER   REPORTIN~Y                  STATE                     PARTNERCOUNTRY      VALUE |
           |--------------------------------------------------------------------------------------------------|
        1. | 1996         1   U.S. State                Alabama                            Grenada          0 |
        2. | 1996         2   U.S. State                Alabama                            Grenada          0 |
        3. | 1996         3   U.S. State                Alabama                            Grenada          0 |
        4. | 1996         4   U.S. State                Alabama                            Grenada          0 |
           |--------------------------------------------------------------------------------------------------|
        5. | 1997         1   U.S. State                Alabama                            Grenada          0 |
        6. | 1997         2   U.S. State                Alabama                            Grenada          0 |
        7. | 1997         3   U.S. State                Alabama                            Grenada          0 |
        8. | 1997         4   U.S. State                Alabama                            Grenada          0 |
           |--------------------------------------------------------------------------------------------------|
        9. | 1998         1   U.S. State                Alabama                            Grenada          0 |
       10. | 1998         2   U.S. State                Alabama                            Grenada          0 |
       11. | 1998         3   U.S. State                Alabama                            Grenada          0 |
       12. | 1998         4   U.S. State                Alabama                            Grenada          0

    here is the summary statisctics

    Code:
    
     sum YEAR QUARTER VALUE Kilograms
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            YEAR |     60605    2006.129    6.135364       1996       2017
         QUARTER |     60605    2.482353    1.123154          1          4
           VALUE |     60605    14987.31    280650.2          0   1.59e+07
       Kilograms |     60605    13937.74    235654.2          0   1.29e+07
    
    . sum  VALUE Kilograms if VALUE==0
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           VALUE |     55536           0           0          0          0
       Kilograms |     55536           0           0          0          0
    Please let me know what and how I can do it. if there is something I need to show you to be able to help me, I will do it. I have never dealt with so many zeros before.

    Thanks again for your help.

  • #2
    I Apologize for the little grammar or orthography in my previous post.

    Comment


    • #3
      There is no need to apologize for minor errors in spelling and grammar here. This is an international Forum, and many members speak English only as a second language. From a linguistic perspective, your post is understandable, and the number of errors you have made doesn't seem out of proportion to what is commonly seen in posts from some native speakers who are just rushing and don't take time to correct their little mistakes.

      But I think there is a reason you are not getting a response after several hours. What you are asking help in doing seems rather strange, and, at least to me, seems like something that shouldn't be done at all. Removing inconvenient data is not science!

      So you need to get an understanding of what is going on in your data generating process that leads to the large number of zeroes, and what it means in terms of your research questions. If zero is simply not a plausible value for the variable, and it represents an error, then the best response is to find a more accurate data source and replace it with the correct value. If corrected values are not obtainable, then perhaps one could convert them to missing values and consider whether a technique such as multiple-imputation might be appropriate for filling them in. (Although it sounds like a large majority of your data consists of these zeroes, and using multiple imputation when the bulk of the data is missing is probably not a great idea.)

      But if the zeroes are really correct values, then you have to take them as they are, and perhaps switch your analytic plan to one which is well -suited to analyzing variables whose distributions include a large spike at zero.

      Comment


      • #4
        Thank you very much M. Clyde. This really help.

        My boss is actually trying to figure out why so many zeros. and if possible find more accurate data. Thanks again sir.

        Comment


        • #5
          Dear Cynthia,

          I am not surprised by the large number of zeros; I expect that most states do not export vegetables for most countries. So, that looks fine.

          Now, dropping part of the zeros to estimate the gravity equation is a double error. 1) the dependent variable is the log of the flow, so to estimate by OLS you need to drop all zeros; 2) estimating the gravity equation by OLS is a big mistake, not least because you will have to drop most of your data.

          The right approach in this case is to estimate the gravity equation by Poisson regression (PPML), which has no problem with all the zeros.

          Best wishes,

          Joao

          Comment


          • #6
            M. Clyde,

            I have one more question. How can I xset my panel with such data. Which variable can I use as my panel variable here.

            I have quarterly data from 1996 to 2017 quarter1. hence I have exports data from each state to a country variable for each quarter till 2017 quarter 1.

            Exporters are different states of the country while Importers are countries.

            the variables are shown in post #1. I also have the following: StateCountryYearQuarter YEAR QUARTER REPORTINGCOUNTRY STATE PARTNERCOUNTRY VALUE (Exports).


            here are the codes I have tried so far to find the right panel for declare my data as a panel .


            Code:
            (1):
            
            
            . egen pair = group(StateCountry)
            
            
            
            . list  YEAR QUARTER StateCountry pair VALUE , sepby ( YEAR )
            
                   +--------------------------------------------------------------------------------+
                   | YEAR   QUARTER                                  StateCountry   pair      VALUE |
                   |--------------------------------------------------------------------------------|
                1. | 1996         1                               Alabama Grenada      1          0 |
                2. | 1996         2                               Alabama Grenada      1          0 |
                3. | 1996         3                               Alabama Grenada      1          0 |
                4. | 1996         4                               Alabama Grenada      1          0 |
                   |--------------------------------------------------------------------------------|
                5. | 1997         1                               Alabama Grenada      1          0 |
                6. | 1997         2                               Alabama Grenada      1          0 |
                7. | 1997         3                               Alabama Grenada      1          0 |
                8. | 1997         4                               Alabama Grenada      1          0 |
                   |--------------------------------------------------------------------------------|
                9. | 1998         1                               Alabama Grenada      1          0 |
               10. | 1998         2                               Alabama Grenada      1          0 |
               11. | 1998         3                               Alabama Grenada      1          0 |
               12. | 1998         4                               Alabama Grenada      1          0 |
            
            ....
                |--------------------------------------------------------------------------------|
               81. | 2016         1                               Alabama Grenada      1          0 |
               82. | 2016         2                               Alabama Grenada      1          0 |
               83. | 2016         3                               Alabama Grenada      1          0 |
               84. | 2016         4                               Alabama Grenada      1       3791 |
                   |--------------------------------------------------------------------------------|
               85. | 2017         1                               Alabama Grenada      1          0 |
                   |--------------------------------------------------------------------------------|
               86. | 1996         1                                Alabama Panama      2          0 |
               87. | 1996         2                                Alabama Panama      2          0 |
               88. | 1996         3                                Alabama Panama      2          0 |
               89. | 1996         4                                Alabama Panama      2          0 |
                   |--------------------------------------------------------------------------------|
               90. | 1997         1                                Alabama Panama      2          0 |
               91. | 1997         2                                Alabama Panama      2          0 |
               92. | 1997         3                                Alabama Panama      2


            or
            Code:
            (2);
            
             egen pair_Yr_quarter = group (STATE PARTNERCOUNTRY YEAR QUARTER)
            
            . list  pair_Yr_quarter StateCountryYearQuarter , sepby ( YEAR )
            
                   +---------------------------------------------------------------+
                   | pair_Y~r                              StateCountryYearQuarter |
                   |---------------------------------------------------------------|
                1. |        1                               Alabama Grenada 1996 1 |
                2. |        2                               Alabama Grenada 1996 2 |
                3. |        3                               Alabama Grenada 1996 3 |
                4. |        4                               Alabama Grenada 1996 4 |
                   |---------------------------------------------------------------|
                5. |        5                               Alabama Grenada 1997 1 |
                6. |        6                               Alabama Grenada 1997 2 |
                7. |        7                               Alabama Grenada 1997 3 |
                8. |        8                               Alabama Grenada 1997 4 |
                   |---------------------------------------------------------------|
                9. |        9                               Alabama Grenada 1998 1 |
               10. |       10                               Alabama Grenada 1998 2 |
               11. |       11                               Alabama Grenada 1998 3 |
               12. |       12                               Alabama Grenada 1998 4 |
                   |---------------------------------------------------------------|


            Comment


            • #7
              Thank you M. Santos,

              I am familiar with the PPML and wanted to used it but I wasn't sure about the PPML with a large number of zeros in my case.

              Thanks again .

              However can you suggest about the panelvar to set up my data as a panel (post #6)? I am not sure what to do with quarterly data instead of annual data.

              Comment


              • #8
                Do not worry about the large proportion of zeros; the results in the following paper suggest it will be OK:

                Santos Silva, J.M.C. and Tenreyro, Silvana (2011), Further Simulation Evidence on the Performance of the Poisson Pseudo-Maximum Likelihood Estimator, Economics Letters, 112(2), pp. 220-222.

                Using quarterly data for this is unusual, but I do not see major problems if you account for the seasonality.

                Best wishes,

                Joao

                Comment


                • #9
                  I have tried also to group exporter importer Year all together and got the following

                  Code:
                  (3)
                  
                   egen pair_Yr = group (STATE PARTNERCOUNTRY YEAR)
                  
                  . list  pair_Yr pair_Yr_quarter StateCountryYearQuarter , sepby ( YEAR )
                  
                         +-------------------------------------------------------------------------+
                         | pair_Yr   pair_~er                              StateCountryYearQuarter |
                         |-------------------------------------------------------------------------|
                      1. |       1          1                               Alabama Grenada 1996 1 |
                      2. |       1          2                               Alabama Grenada 1996 2 |
                      3. |       1          3                               Alabama Grenada 1996 3 |
                      4. |       1          4                               Alabama Grenada 1996 4 |
                         |-------------------------------------------------------------------------|
                      5. |       2          5                               Alabama Grenada 1997 1 |
                      6. |       2          6                               Alabama Grenada 1997 2 |
                      7. |       2          7                               Alabama Grenada 1997 3 |
                      8. |       2          8                               Alabama Grenada 1997 4 |
                         |-------------------------------------------------------------------------|
                      9. |       3          9                               Alabama Grenada 1998 1 |
                     10. |       3         10                               Alabama Grenada 1998 2 |
                     11. |       3         11                               Alabama Grenada 1998 3 |
                     12. |       3         12                               Alabama Grenada 1998 4 |
                         |-------------------------------------------------------------------------|
                  --more--

                  Comment


                  • #10
                    Thanks M. Santos S. for your reply and thank you very much for the article. it will really help.

                    Comment


                    • #11
                      The problem with your data is that you have year and quarter as separate variables. You need to combine them into a single quarterly date variable. Doing that with -egen, group()- is not the right way to do with that because the resulting variables will not have numeric values that calculate the way dates do.

                      Code:
                      gen int qdate = quarterly(string(YEAR)+"q"+string(QUARTER), "YQ")
                      format qdate %tq
                      xtset StateCountry qdate, quarterly
                      If you are going to be working with panel data in Stata, you need to get familiar with Stata internal format dates. Read the manual section linked to at -help datetime-, It is a long read, and you won't absorb it all. Even advanced Stata users who work with them all the time find themselves referring back to the help files and manual to remind themselves of the details. But it will expose you to the basic ways in which Stata works with dates internally, and the required data for doing this. And you will see the functions for transforming between strings and numeric dates, display formats, and formats for piecing dates together from parts or extracting parts from dates. You will then know which functions to look for when situations like this come up, and then you can return to the help files and manual for details.

                      Comment


                      • #12
                        Thank you very much M. Clyde S. this is helpful. I appreciate it.

                        Comment


                        • #13
                          This is my output

                          Code:
                          . xtset StateCountry qdate, quarterly
                                 panel variable:  StateCountry (strongly balanced)
                                  time variable:  qdate, 2007q1 to 2016q4
                                          delta:  1 quarter
                          
                          
                          . list Year Quarter qdate StateCountry CountryStateYearQuarter, sepby (StateCountry)
                          
                                  +---------------------------------------------------------------------------------+
                                  | Year   Quarter    qdate   StateC~y                      CountryStateYearQuarter |
                                  |---------------------------------------------------------------------------------|
                               1. | 2007         1   2007q1          1                     Argentina Alabama 2007 1 |
                               2. | 2007         2   2007q2          1                     Argentina Alabama 2007 2 |
                               3. | 2007         3   2007q3          1                     Argentina Alabama 2007 3 |
                               4. | 2007         4   2007q4          1                     Argentina Alabama 2007 4 |
                               5. | 2008         1   2008q1          1                     Argentina Alabama 2008 1 |
                               6. | 2008         2   2008q2          1                     Argentina Alabama 2008 2 |
                               7. | 2008         3   2008q3          1                     Argentina Alabama 2008 3 |
                               8. | 2008         4   2008q4          1                     Argentina Alabama 2008 4 |
                               9. | 2009         1   2009q1          1                     Argentina Alabama 2009 1 |
                          Thanks Again sir

                          Comment


                          • #14
                            Clyde,

                            I have a question, related to this thread and one of my regression and I am not sure if I have to start a new thread.

                            I have to run a simple OLS with my variables above*.

                            I have to include quarterly dummy variables. At first I was including i.Quarter in the regressors, but then I was wondering if I was supposed to use i.qdate (my timevar) instead.

                            I saw one previous threat where you suggested to use i.qdate. (I don't know how to attach references)
                            I did it and this my output below.


                            now I don't know how to interpret that or explain what I see under i.qdate? (201 202 203 ...219) I know they are dummies, right but why is it like that and not 2004q1...

                            Could you tell what it is so that I can explain it and know if I was right.

                            Code:
                            
                            
                            . reg lnTrade l.lnTrade NAFTAmn TTIPmm ASEANm MERCOSURm lnExpGDP lnImpGDP lnExpPop lnImpPop lnExpProd lnImpPro
                            > d lnFarmInc lnDistance BorderDummy i.qdate , robust
                            
                            Linear regression                                      Number of obs =   20583
                                                                                   F( 33, 20549) = 2411.01
                                                                                   Prob > F      =  0.0000
                                                                                   R-squared     =  0.8061
                                                                                   Root MSE      =   .9872
                            
                            ------------------------------------------------------------------------------
                                         |               Robust
                                 lnTrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                 lnTrade |
                                     L1. |   .8065222   .0057878   139.35   0.000     .7951778    .8178667
                                         |
                                 NAFTAmn |   .3849063   .0759979     5.06   0.000     .2359445    .5338682
                                  TTIPmm |  -.0843905   .0190203    -4.44   0.000    -.1216718   -.0471092
                                  ASEANm |   .1827272     .03313     5.52   0.000     .1177897    .2476647
                               MERCOSURm |    .106616   .0260004     4.10   0.000     .0556531    .1575789
                                lnExpGDP |    .176421   .0228181     7.73   0.000     .1316958    .2211463
                                lnImpGDP |    .284607   .0206201    13.80   0.000       .24419     .325024
                                lnExpPop |    .217896   .0121813    17.89   0.000     .1940198    .2417723
                                lnImpPop |  -.1024257   .0165326    -6.20   0.000     -.134831   -.0700205
                               lnExpProd |   -.205941    .024995    -8.24   0.000    -.2549331   -.1569488
                               lnImpProd |   .3564517   .0929342     3.84   0.000     .1742933    .5386101
                               lnFarmInc |   .0627696   .0178918     3.51   0.000     .0277002     .097839
                              lnDistance |  -.2099424   .0231056    -9.09   0.000    -.2552312   -.1646535
                             BorderDummy |  -.0189073   .1467043    -0.13   0.897    -.3064595    .2686448
                                         |
                                   qdate |
                                    201  |   .1115018   .0438794     2.54   0.011     .0254947    .1975089
                                    202  |    .085171   .0449058     1.90   0.058    -.0028479    .1731899
                                    203  |   .1976359   .0454145     4.35   0.000     .1086199    .2866519
                                    204  |   .0568291   .0462319     1.23   0.219     -.033789    .1474472
                                    205  |     .04672    .045316     1.03   0.303     -.042103    .1355429
                                    206  |   .0344724   .0450258     0.77   0.444    -.0537818    .1227266
                                    207  |   .1413654    .045559     3.10   0.002     .0520661    .2306647
                                    208  |   .0189634   .0444991     0.43   0.670    -.0682584    .1061852
                                    209  |   .1413246    .044507     3.18   0.001     .0540872    .2285619
                                    210  |   .1144702   .0431502     2.65   0.008     .0298925     .199048
                                    211  |   .2292631   .0444827     5.15   0.000     .1420735    .3164528
                                    212  |   .0496247   .0460296     1.08   0.281     -.040597    .1398463
                                    213  |   .0975865   .0441832     2.21   0.027     .0109839    .1841891
                                    214  |    .032016   .0441452     0.73   0.468    -.0545121    .1185441
                                    215  |   .2071304   .0457063     4.53   0.000     .1175424    .2967184
                                    216  |   .0559767   .0451166     1.24   0.215    -.0324555    .1444088
                                    217  |   .1510984   .0451696     3.35   0.001     .0625625    .2396344
                                    218  |   .0459828   .0437547     1.05   0.293    -.0397798    .1317455
                                    219  |   .2100515   .0496358     4.23   0.000     .1127615    .3073416
                                         |
                                   _cons |  -1.746531   .4084736    -4.28   0.000    -2.547172   -.9458904
                            ------------------------------------------------------------------------------
                            
                            .

                            * I have to run OLS, and then later will do the PPML and others

                            Comment


                            • #15
                              Whether to use i.qdate or i.Quarter depends on what your attempting to adjust for here. If you want to adjust for quarter of the year (which is kind of like season) so that it is the same for first quarter 2010 as it is for first quarter 2011 and first quarter 2012, etc., then you want i.Quarter. If what you want is an adjustment for every single quarter separately (so 2010q1 is not equivalent to 2011q1, etc.) then you use i.qdate.

                              The output you have looks good to me. The mysterious numbers 201 through 219 that you see under qdate are the internal numeric representations that Stata uses for 2010q2 through 2010q4. You can visualize this directly yourself, if you like, by running:

                              Code:
                              forvalues i = 201/219 {
                                  display `i', %tq `i'
                              }
                              When you have valued-labeled numeric variables and you use factor-variable notation, Stata picks up the value labels and uses them in the output. But Stata does not pick up display formats when using factor-variables, and it is the display format that shows the equivalence between 201 and 2010q2, etc.

                              Comment

                              Working...
                              X