Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issues using tsfill command

    I am facing challenges using tsfill command. I am using Stata 15.1. I am trying to create a company-year panel. The research question looks at the impact of investors on companies. Currently data set the data set has a row each time investors invests in a company. The first investment marks the entry to the investor in the company and the last investment marks the exit. A company can have multiple investors, entering and exiting in any year. I use tsfill command to create a panel for the years investor(s) remain invested in a company. But I have to account for all the amounts investors have invested cumulatively in a company. When I use tsfill to create a company-year panel, I loose the information of other investors (if there are more than 1 investor).

    Currently, the data looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(company_id year investor_id amount)
    1 2014 11 80
    1 2015 11 84
    1 2016 11 82
    2 2014 23 75
    2 2014 25 78
    2 2017 23 77
    2 2017 25 70
    2 2018 25 80
    3 2014 11 50
    3 2015 23 78
    3 2018 23 77
    3 2019 11 55
    end
    I want data to be in this format:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(company_id year total_amount)
    1 2014  80
    1 2015  84
    1 2016  82
    2 2014 153
    2 2015 153
    2 2016 153
    2 2017 147
    3 2014  50
    3 2015 128
    3 2016 128
    3 2017 128
    3 2018 127
    3 2019  55
    end
    Kindly suggest proper way for handling this data requirement.Thanks.

  • #2
    Your additions seem error-prone to me, and why gaps are filled with just the previous known value I don't understand. But this may help:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(company_id year investor_id amount)
    1 2014 11 80
    1 2015 11 84
    1 2016 11 82
    2 2014 23 75
    2 2014 25 78
    2 2017 23 77
    2 2017 25 70
    2 2018 25 80
    3 2014 11 50
    3 2015 23 78
    3 2018 23 77
    3 2019 11 55
    end
    
    collapse (sum) amount, by(company_id year) 
    
    tsset company_id year 
    
    tsfill 
    
    replace amount = L.amount if missing(amount) 
    
    list, sepby(company_id) 
    
         +--------------------------+
         | compan~d   year   amount |
         |--------------------------|
      1. |        1   2014       80 |
      2. |        1   2015       84 |
      3. |        1   2016       82 |
         |--------------------------|
      4. |        2   2014      153 |
      5. |        2   2015      153 |
      6. |        2   2016      153 |
      7. |        2   2017      147 |
      8. |        2   2018       80 |
         |--------------------------|
      9. |        3   2014       50 |
     10. |        3   2015       78 |
     11. |        3   2016       78 |
     12. |        3   2017       78 |
     13. |        3   2018       77 |
     14. |        3   2019       55 |
         +--------------------------+

    Comment


    • #3
      Nick, thanks for the code. The issue is that the variable amount varies by investor for each year. It comes from another dataset, which is an investor-year panel. In such a scenario
      replace amount = L.amount if missing(amount), will not work. Probably this issue was not clear in my earlier message, apologies for that. Do you have a work around for this issue.

      clear
      input float(investor_id year amount)
      11 2014 80
      11 2015 84
      11 2016 82
      11 2017 86
      23 2014 75
      23 2015 77
      23 2016 70
      23 2017 79
      23 2018 73
      25 2014 78
      25 2015 77
      25 2016 77
      25 2017 80
      25 2018 77
      end
      [/CODE]

      Comment


      • #4
        Sorry, no. I don’t have a work-around for what you didn’t explain. I can’t see that giving us another data example helps at all here.

        What’s wrong with my code in terms of your example? Alternatively please explain the logic for company 3 in producing the output in #1 from the input.

        Comment


        • #5
          The challenge I am facing is that the value of variable amount is for each investor-year and not company-investor. The rows 11 and 12 in the results you generated, the variable amount has the value 78, which comes from row 10. But value of this variable changes for each investor-year. The value for the variable amount for year investor-year may/may not be present in the master data set. For example for the investor_id 23 the value of variable amount for 2017 is 77 (given in the row 6 of the first data set,as given below)

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(company_id year investor_id amount)
          1 2014 11 80
          1 2015 11 84
          1 2016 11 82
          2 2014 23 75
          2 2014 25 78
          2 2017 23 77
          2 2017 25 70
          2 2018 25 80
          3 2014 11 50
          3 2015 23 78
          3 2018 23 77
          3 2019 11 55
          end
          Last edited by Shekhar Misra; 09 Mar 2020, 16:11.

          Comment


          • #6
            This is wide open for anyone else who can follow it.

            Comment

            Working...
            X