  • Significance

    Hi guys.

    For my research (master's thesis) I have two control variables for segments: one for the number of business segments and one for the number of geographic segments. Of the 2318 observations in my sample, approximately 700 are missing the number of geographic segments.

    When I assign a value of 1 to these observations (because every firm has at least one segment), I get significant results in the direction I expected.

    When I instead run the regression on only the roughly 1,600 observations (2318 - 700) that do have a value for geographic segments, the coefficient drops to nearly zero and the significance is gone (t-value 0.06).

    Does anyone know what the best approach is here? I don't know whether I can simply drop this control variable, because it is widely used in this kind of research.

    FYI: I'm examining the readability of US firms' annual reports.

  • #2
    The best thing to do would be to go back to the SEC's EDGAR and code the geographic segments manually (assuming they're in the original documents). I'd also check with an accountant about what factors require a firm to report geographic segments. While the task may seem immense, if the data are available in the original documents, you can code the 700 firm-years in a day or two.

    There are two approaches to missing data currently in fashion: multiple imputation and maximum likelihood. The PDF documentation includes a full manual on multiple imputation, and the SEM/GSEM documentation includes an example on missing data.

    Arbitrarily assigning a value to a variable (even the mean of the observed values) is not a good solution.
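
To illustrate why filling in a constant can manufacture a result, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration, not the poster's data or model: the variable names, the data-generating process, and the choice to blank out 700 values completely at random. The regressor of interest `x` has a true coefficient of zero but is correlated with the geographic-segment count `geo`.

```python
import random

random.seed(0)
n, n_missing = 2318, 700  # sample sizes taken from the question

# Hypothetical data: geo is the control, x is the regressor of interest.
geo = [random.randint(1, 5) for _ in range(n)]
x = [0.8 * g + random.gauss(0, 1) for g in geo]   # x is correlated with geo
y = [1.0 * g + random.gauss(0, 1) for g in geo]   # true coefficient on x is 0

# Assumption: 700 geo values go missing completely at random.
missing = set(random.sample(range(n), n_missing))


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def coef_on_x(y, x, c):
    """OLS coefficient on x from regressing y on x, c and a constant."""
    num = cov(y, x) * cov(c, c) - cov(y, c) * cov(x, c)
    den = cov(x, x) * cov(c, c) - cov(x, c) ** 2
    return num / den


# 1) Benchmark: the control is fully observed.
b_full = coef_on_x(y, x, geo)

# 2) Missing control values replaced by the constant 1.
geo_filled = [1 if i in missing else g for i, g in enumerate(geo)]
b_filled = coef_on_x(y, x, geo_filled)

# 3) Complete cases only (rows with a missing control are dropped).
keep = [i for i in range(n) if i not in missing]
b_cc = coef_on_x([y[i] for i in keep], [x[i] for i in keep],
                 [geo[i] for i in keep])

print(f"full: {b_full:.3f}  filled with 1: {b_filled:.3f}  "
      f"complete cases: {b_cc:.3f}")
```

In this setup the filled-in control no longer absorbs the effect of `geo` for the 700 rows, so `x` picks it up and its coefficient moves well away from zero, while the complete-case estimate stays close to the benchmark. That last part depends on the missing-completely-at-random assumption built into the sketch; if the real data are missing for systematic reasons, the complete-case estimate can be biased too.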


    • #3
      It helps if you follow the FAQ on asking questions: we don't know what model you've run. There is a sample selection problem with the missing geographic data (the data are almost certainly not missing at random, so estimation without those observations can give you biased coefficients). That said, if you're just running a regression with a few variables, 1600 usable observations is plenty of data. Another approach would be a Heckman correction - see heckman in the documentation.
