Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to generate a variable containing correlation coefficients of two variables in a panel data set

    Good day everyone

    I was kindly requesting assistance

    I am doing research using firm-level data, that has an unbalanced panel of 1 445 042 firms and around 6 million observations, the firm ID or identifier is (FID) and the time period is annual data from the years 2008 to 2014. I am trying to generate a variable which contains the correlation of operating cash flow (Op_cash) and the repo rate (int_rate) from my data set. I would like to use this correlation variable in my panel regression.

    The Op_cash variable does have both missing (.) and zeros (0) observations.

    I’ve tried different methods that I’ve seen on this forum but I think I am not doing something right as I get error messages or my programs run continuously without any output. I’d really appreciate assistance. Thanks.

    Here is an example of the data set :
    year FID int_rate Op_cash
    2008 1 8.6 .
    2009 1 8.2 641671
    2010 1 6.5 697992
    2011 1 5.6 671114
    2012 1 5.4 737932
    2013 1 5.1 745722
    2014 1 5.9 157756
    2008 2 8.6 .
    2009 2 8.2 .
    2013 2 5.1 0
    2008 3 8.6 .
    2009 3 8.2 .
    2010 3 6.5 .
    2010 4 6.5 .
    2011 4 5.6 .
    2012 4 5.4 .
    2013 4 5.1 .
    2014 4 5.9 .
    2015 4 6.2 .
    2008 5 8.6 .
    2009 5 8.2 1250708
    2010 5 6.5 1545442
    2011 5 5.6 1311604
    2012 5 5.4 1103661
    2013 5 5.1 1411393

  • #2
    the correlation of operating cash flow (Op_cash) and the repo rate (int_rate) from my data set
    is, taken literally, one number yielded by

    Code:
    corr Op_cash int_rate
    Presumably you want this by panel, by year, or otherwise. So precisely how, as we can't see example code even to comment on?

    Comment


    • #3
      Hi Nick

      Thanks

      I would like to generate a correlation coefficient variable for each firm, so each firm (firm ID) would have the same correlation coefficient across the time dimension from 2008 to 2015


      The command i used is :

      sort FID
      egen corr_cashrepo = corr ( operating_cash int_rate), by (FID)

      However, the program runs for long continuously and not sure if i am doing it correctly.


      Here is an example of what output I am looking for, check the corr_cashrepo column, ( I just punched in random numbers for demonstration)
      year FID int_rate Op_cash corr_cashrepo
      2008 1 8.6 . -0.2
      2009 1 8.2 641671 -0.2
      2010 1 6.5 697992 -0.2
      2011 1 5.6 671114 -0.2
      2012 1 5.4 737932 -0.2
      2013 1 5.1 745722 -0.2
      2014 1 5.9 157756 -0.2
      2008 2 8.6 . -0.5
      2009 2 8.2 . -0.5
      2013 2 5.1 0 -0.5
      2008 3 8.6 . -0.8
      2009 3 8.2 . -0.8
      2010 3 6.5 . -0.8
      2010 4 6.5 . -0.6
      2011 4 5.6 . -0.6
      2012 4 5.4 . -0.6
      2013 4 5.1 . -0.6
      2014 4 5.9 . -0.6
      2015 4 6.2 . -0.6
      2008 5 8.6 . -0.2
      2009 5 8.2 1250708 -0.2
      2010 5 6.5 1545442 -0.2
      2011 5 5.6 1311604 -0.2
      2012 5 5.4 1103661 -0.2
      2013 5 5.1 1411393 -0.2

      Comment


      • #4
        rangestat (SSC) will be faster.

        Comment


        • #5
          Hi Nick

          Thanks, will read on the command and try it. Appreciate your assistance.

          Regards
          Keagile

          Comment


          • #6
            Hi Nick

            I ran the following rangestat command:

            Code:
             rangestat (corr) operating_cash  int_rate, int(finyear 2009 20015) by (FID)
            But I get the following error message :

            no result for all obs: corr operating_cash int_rate


            What could I be possibly getting wrong? thanks



            Comment


            • #7
              Please excuse the "20015" typo in the code I just posted , in Stata it is typed correctly as "2015" but I still get the error message

              Thanks
              Keagile

              Comment


              • #8
                The second and third arguments of interval() are offsets, not values.

                If you want industry-year combinations that would be

                Code:
                 rangestat (corr) operating_cash  int_rate, int(finyear 0 0) by(FID)
                If you want something else, you may need to tell us what that is.
                Last edited by Nick Cox; 18 May 2018, 07:10.

                Comment


                • #9
                  Hi Nick

                  It worked!!!

                  Thanks a million lot for your help, very grateful!

                  Regards
                  Keagile

                  Comment

                  Working...
                  X