  • Generate correlation variable but value missing

    Hello everyone:

    I am trying to generate a variable holding the correlation between two variables, using the code below:

    bysort gvkey fyear: egen corrcash = corr(CashFlow median_RD)

    However, every time I run it, it generates missing values.

    Can you please help me with this?

  • #2
    Just a guess: gvkey fyear jointly identify observations. If so you are in effect working with a scatter plot with one data point and asking for a correlation, which can’t be done. Perhaps what you really want is a correlation for each year.

    If the guess is wrong I don’t have another one without seeing example data.
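
    A minimal sketch of how that guess could be checked, assuming the variable names from the first post. The names n_pair and corr_year are made up for illustration, and corr() is assumed to be the egen function from the community-contributed egenmore package on SSC.

    ```stata
    * Do gvkey and fyear jointly identify observations?
    duplicates report gvkey fyear

    * How many usable pairs of values are there in each gvkey-fyear group?
    * (n_pair is a hypothetical name)
    bysort gvkey fyear: egen n_pair = total(!missing(CashFlow, median_RD))
    tabulate n_pair

    * One possible alternative: a correlation for each year, across firms
    * (corr_year is a hypothetical name)
    bysort fyear: egen corr_year = corr(CashFlow median_RD)
    ```

    If n_pair is 1 in every group, each group is a single data point and no correlation can be computed for it.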


    • #3
      Originally posted by Nick Cox (#2):
      Dear Nick:

      Thanks for the reply.

      I am trying to generate a variable holding the correlation between a firm’s cash flow from current operations (CashFlow) and its industry-level median R&D expenditures. CashFlow differs for each company in each year, but the R&D median is the same for all companies that have the same SIC code.

      Whenever I run the code I wrote above, it generates a missing value for all observations.


      • #4
        I don’t get a clear picture here, as despite my request you don’t show example data. But I will try again. With your syntax all that matters is what is true for the data for each group, with fixed gvkey fyear.

        In particular, if one variable is constant then its SD is zero and any related correlation is indeterminate. This is standard stuff: the correlation is the covariance divided by the product of the standard deviations, and division by zero spells doom.
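
        Written out, in standard notation that the thread itself does not use (x and y stand for the two variables within one group):

        ```latex
        r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
               = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
                      {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
        % If y is constant within the group, y_i = \bar{y} for every i, so
        % \sum_i (x_i - \bar{x})(y_i - \bar{y}) = 0 and \sum_i (y_i - \bar{y})^2 = 0,
        % which gives r_{xy} = 0/0: indeterminate, not zero.
        ```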

        I find that often when people get indeterminate correlations they forget what their introductory course should have emphasized, drawing a scatter plot to see why the result is not as expected.
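
        A rough sketch of that kind of check, assuming the variable names from the first post. The names sd_cf and sd_rd and the year 2010 are placeholders, not taken from the data.

        ```stata
        * SD of each variable within each gvkey-fyear group
        * (sd_cf and sd_rd are hypothetical names)
        bysort gvkey fyear: egen sd_cf = sd(CashFlow)
        bysort gvkey fyear: egen sd_rd = sd(median_RD)
        summarize sd_cf sd_rd

        * Scatter plot for a single year; 2010 is a placeholder value
        scatter CashFlow median_RD if fyear == 2010
        ```

        A group SD that is zero or missing for either variable means the correlation for that group cannot be determined.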


        • #5
          Originally posted by Nick Cox (#4):
          Dear Nick:

          Again, thank you for your reply. I appreciate it.

          I found the reason, as you mentioned: all the firms under the same SIC code have the same median, which is why I am getting all missing values. I should instead compute the median over firms in the same industry for each year. That way the median will differ from year to year, and I think the correlation won't be zero in that case.

          Again, thank you for your time.
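
          One possible reading of that plan, as a sketch only. The names sic (industry code), RD (firm-level R&D expenditure) and median_RD_iy are assumptions that do not appear in the thread, and computing one correlation per firm across its years is a guess at the intended grouping. corr() is assumed to come from the egenmore package on SSC.

          ```stata
          * Industry-year median of R&D (sic, RD, median_RD_iy are hypothetical names)
          bysort sic fyear: egen median_RD_iy = median(RD)

          * One correlation per firm, computed across that firm's years
          bysort gvkey: egen corrcash = corr(CashFlow median_RD_iy)
          ```

          Grouping by gvkey alone gives each firm several data points, one per year, which a correlation needs.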


          • #6
            Meanwhile note that your correlations so far are not zero. They are indeterminate, so returned as missing, utterly different from zero.
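
            A short sketch of how the two cases could be told apart, assuming the corrcash variable from the first post:

            ```stata
            * Indeterminate correlations are stored as missing
            count if missing(corrcash)

            * Correlations that are exactly zero
            count if corrcash == 0

            * Numeric missing values sort above every number in Stata,
            * so exclude them explicitly when using inequalities
            count if corrcash > 0 & !missing(corrcash)
            ```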
