

  • Handle data with blanks and zeros with PPML

    Dear all.
    I am trying to run a gravity model with PPML.
    I have a question that may be simple, or may even be ill-posed, but I would like to ask it anyway.
    My export data contain many blanks because there was no trade between some countries in some years.
    My question is: should I leave the blanks in the export data and run the model, or should I replace the blanks with zeros and then run the model?
    Best wishes to all of you.

  • #2
    If the blanks are due to lack of trade, they should be replaced with zeros.

    Best wishes,
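As a minimal sketch of the recoding described in #2, assuming the blank cells really do mean "no trade" (the records, field names and the `blanks_to_zero` helper are hypothetical, made up for illustration):

```python
# Hypothetical bilateral export records; None marks a blank cell.
flows = [
    {"exporter": "A", "importer": "B", "year": 2000, "exports": 12.5},
    {"exporter": "A", "importer": "C", "year": 2000, "exports": None},
    {"exporter": "B", "importer": "C", "year": 2000, "exports": 3.0},
]


def blanks_to_zero(rows, key="exports"):
    """Return copies of the rows with missing `key` values recoded to 0.0."""
    return [{**row, key: 0.0 if row[key] is None else row[key]} for row in rows]


filled = blanks_to_zero(flows)
for row in filled:
    print(row["exporter"], row["importer"], row["year"], row["exports"])
```

The helper returns new rows rather than editing in place, so the original blanks stay available if some of them later turn out to be unreported rather than zero.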



    • #3
      Professor Santos.

      Thanks for your quick answer.

      Best wishes.



      • #4
        Professor Joao Santos Silva.

        Another simple and similar question.

        Should I also replace blanks in the GDP data with zeros? Note that these blanks are not due to a lack of production in the country.

        Best wishes.



        • #5
          No, that is a different problem. However, if you include time-varying importer and exporter FEs you do not need GDP.
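One way to see why GDP becomes redundant, sketched in generic gravity notation (the symbols are assumptions for illustration and do not come from the thread): with exporter-year effects $\alpha_{it}$ and importer-year effects $\gamma_{jt}$, any regressor that varies only by country and year is collinear with those effects.

```latex
% X_{ijt}: exports from i to j in year t; z_{ijt}: pair-level regressors
\begin{align*}
E[X_{ijt} \mid \cdot]
  &= \exp\left(\alpha_{it} + \gamma_{jt} + z_{ijt}'\beta
      + \delta_1 \ln GDP_{it} + \delta_2 \ln GDP_{jt}\right) \\
  &= \exp\left(\tilde{\alpha}_{it} + \tilde{\gamma}_{jt} + z_{ijt}'\beta\right),
\qquad
\tilde{\alpha}_{it} = \alpha_{it} + \delta_1 \ln GDP_{it}, \quad
\tilde{\gamma}_{jt} = \gamma_{jt} + \delta_2 \ln GDP_{jt}.
\end{align*}
```

So $\delta_1$ and $\delta_2$ are not separately identified in that specification. With time-invariant country effects plus year effects instead, the GDP terms are not absorbed and stay in the model.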



          • #6
            I am going to use Time Invariant Fixed Effects for importers and exporters and Time Fixed Effects.
            In this case, shall I replace blanks in GDPs with zeros or shall I run the model with these blanks?


            • #7
              Missing GDP values should be replaced with zero if and only if you think the GDP was really zero in those observations. I can't imagine circumstances in which that makes sense economically for the kind of data I think you have. Trade and GDP are not alike in that respect.

              It is not clear what you mean by running the model with these blanks. Observations with missing values -- what I think you mean -- will be ignored by most modelling commands. Joao certainly can tell you if PPML does anything different.
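The complete-case behaviour mentioned above can be sketched as follows. This is a plain-Python illustration of listwise deletion in general, not a description of what any particular PPML command does internally; the rows and the `complete_cases` helper are hypothetical.

```python
import math

# Hypothetical rows: zero exports are genuine zeros, missing GDP is unreported.
rows = [
    {"pair": "A-B", "exports": 12.5, "gdp_exporter": 100.0},
    {"pair": "A-C", "exports": 0.0, "gdp_exporter": 100.0},
    {"pair": "D-B", "exports": 0.0, "gdp_exporter": None},
]


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows, variables):
    """Keep only rows where every listed variable is observed."""
    return [r for r in rows if not any(is_missing(r[v]) for v in variables)]


sample = complete_cases(rows, ["exports", "gdp_exporter"])
print([r["pair"] for r in sample])
```

The zero-export row with observed GDP stays in the sample, while the row with unreported GDP drops out. Recoding that GDP to zero would instead keep the row with a value nobody believes.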


              • #8
                Hello Nick Cox.

                I meant missing values (blanks). I think my data have missing GDP values because some countries (North Korea, Syria, Venezuela and others) have not reported data to the World Bank or any other institution, so GDP was not actually zero.

                By running the model I mean that I am testing some Stata commands.

                Following your answer, I understand that I should not replace the missing values with zero.

                Thanks for your time.

                Best wishes.



                • #9
                  Picking up on this thread again: I have a somewhat similar setup, with data on the number of transactions conducted by agents on a platform per week (so at the agent-week level). Not every agent transacts every week, so I could have data for agent A in weeks 1, 3 and 6 only. I am estimating the effect of a policy change on the platform on agent transactions using a difference-in-differences design with PPML. I ran the estimation in two ways: (i) filling in 0s for the gaps, for example weeks 2, 4 and 5 for agent A when she did not transact; (ii) not filling in 0s and using only the non-zero weeks. As expected, I get different coefficients. How should I interpret these coefficients differently? Are both acceptable but with different meanings?

                  This becomes even more relevant when I run sub-group analyses (e.g. how the policy affected certain kinds of businesses vs. others), when the results for the two sub-groups flip depending on whether I fill 0s or not.

                  Another relevant detail is that agents transact in only about 35% of weeks, so the remaining weeks would all be 0 if I were to fill in the gaps.

                  I would really appreciate any thoughts. Professor Joao Santos Silva, Nick Cox
                  Last edited by Aparajita Agarwal; 22 Feb 2024, 07:05.


                  • #10
                    I don't have any thoughts different from #7. What makes sense for your goals and your data is entirely your call.


                    • #11
                      Thank you, Nick Cox. My question is mainly about understanding how PPML treats these two scenarios (filling in 0s in the dependent variable versus not) and what that implies for interpreting the results.
                      Last edited by Aparajita Agarwal; 22 Feb 2024, 08:24.


                      • #12
                        Dear Aparajita Agarwal,

                        If I understand correctly, you have count data where zero counts are not recorded, but can be identified. If that is the case, you really need to fill in the zeros, because otherwise you truncate the data.

                        Best wishes,
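To make the truncation point concrete, here is a small sketch that rebuilds the full agent-week grid for the agent observed in weeks 1, 3 and 6. The transaction counts and the `fill_zero_weeks` helper are hypothetical; only the week pattern comes from the thread.

```python
# Hypothetical counts for agent A, observed only in weeks 1, 3 and 6.
observed = {("A", 1): 4, ("A", 3): 2, ("A", 6): 3}


def fill_zero_weeks(counts, weeks):
    """Build the full agent-by-week grid, using 0 where no row was recorded."""
    agents = sorted({agent for agent, _ in counts})
    return {(a, w): counts.get((a, w), 0) for a in agents for w in weeks}


panel = fill_zero_weeks(observed, range(1, 7))

# Mean over recorded (positive) weeks only versus mean over all weeks.
mean_truncated = sum(observed.values()) / len(observed)
mean_full = sum(panel.values()) / len(panel)
print(mean_truncated, mean_full)
```

The two means answer different questions: the first is the average count given that the agent transacted at all, the second is the average count per week. A model fitted without the zero rows targets the former kind of conditional mean, which is what truncation amounts to here.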



                        • #13
                          Thank you, Prof. Joao Santos Silva.

                          There are two sources of zeros in my data:
                          (i) an agent is active but does not transact in some weeks, so there are gaps in between;
                          (ii) an agent becomes completely inactive and leaves the platform, so after a certain week her data are completely missing (as one would expect).

                          I suppose I should fill in 0s for the missing weeks in the first case but not in the second, because in the second case the absence of transactions is due to the agent having become completely inactive. Otherwise, wouldn't I be artificially inflating the zeros in my sample when it is known that the agent is no longer active on the platform?
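The rule proposed above (zeros for gaps while active, nothing after exit) could be sketched as below. It rests on a loud assumption: an agent's active span is proxied by her first and last observed weeks, which is only one possible definition. Whether weeks before the first observation should be zeros is deliberately left open. The data and the `fill_within_active_span` helper are hypothetical.

```python
# Hypothetical counts: A has gaps while active, B leaves after week 2.
observed = {("A", 1): 4, ("A", 3): 2, ("A", 6): 3, ("B", 1): 5, ("B", 2): 1}


def fill_within_active_span(counts):
    """Fill zeros only between each agent's first and last observed week.

    Assumption: the first and last observed weeks bound the active span.
    """
    spans = {}
    for agent, week in counts:
        lo, hi = spans.get(agent, (week, week))
        spans[agent] = (min(lo, week), max(hi, week))
    return {
        (agent, week): counts.get((agent, week), 0)
        for agent, (lo, hi) in sorted(spans.items())
        for week in range(lo, hi + 1)
    }


panel = fill_within_active_span(observed)
for (agent, week), n in panel.items():
    print(agent, week, n)
```

Agent A gets zero rows for weeks 2, 4 and 5, while agent B contributes no rows after week 2. If exit dates are recorded directly, using them would be preferable to inferring exit from the last observed transaction.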


                          • #14
                            I think I agree with you.


                            • #15
                              Prof. Joao Santos Silva, I wanted to ask another related question. If units enter my data at different times, e.g. agents join the company in different weeks, should I fill in 0s for the weeks before an agent starts? To make it more specific, suppose agent A joins in week 1 and agent B joins in week 4, and I have 5 weeks of data. Then:

                              Scenario 1
                              Agent Week Transactions
                              A 1 5
                              A 2 5
                              A 3 5
                              A 4 4
                              A 5 2
                              B 1 0
                              B 2 0
                              B 3 0
                              B 4 xx
                              B 5 xx

                              Scenario 2
                              Agent Week Transactions
                              A 1 5
                              A 2 5
                              A 3 5
                              A 4 4
                              A 5 2
                              B 4 xx
                              B 5 xx

                              PPML gives very different results depending on whether I fill the 3 pre-entry rows for B with 0 or not. Can you please advise what the right data setup would be for PPML?

