  • Dealing with outliers

    Greetings Statalisters

    I'm conducting research with panel data (6 years) on the impact of corporate governance on financial performance. I used box plots (-graph box-) and -extremes- to detect outliers. Is this the right way to go?
    If it is, I found that 5 of my independent variables have very extreme outliers (I checked for data errors but found none). I have read a lot about how to deal with such values, but I'm still lost! I don't want to delete those observations because my dataset is small (228 observations), and those extreme values can be considered characteristics of the individuals in the population being studied. I thought about winsorizing, but I didn't find any theoretical foundation for it.
    I would really appreciate any help, and I'm sorry if my question is trivial.
    Many thanks in advance.
    I'm using Stata 15.1.






    [Attached: box plots of the independent variables]

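A minimal sketch of the detection step just described, using Stata's shipped auto dataset as a stand-in for the thesis data (price is chosen only because it is right-skewed). In a box plot, values more than 1.5 interquartile ranges beyond the quartiles are drawn as individual points; listing the sorted extremes shows which observations they are.

```stata
* Stand-in data shipped with Stata; replace with your own variables
sysuse auto, clear

* Box plot: values beyond 1.5 IQR from the quartiles appear as separate points
graph box price, name(price_box, replace)

* Percentiles, skewness and kurtosis in one table
summarize price, detail

* List the five lowest and the five highest prices with their identifiers
sort price
list make price in 1/5
list make price in -5/l

* The community-contributed -extremes- (ssc install extremes) gives a similar listing:
* extremes price make
```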
  • #2
    Work on logarithmic scale then! (Your first two variables look fine to me. Nothing surprising to this non-economist. The last three seem to call for consideration of log scale.)
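A sketch of what a logarithmic scale does to a right-skewed variable, again with the auto data as a stand-in (price is an assumed example of a skewed, strictly positive variable). Logarithms are defined only for positive values, so this works as written only when the variable has no zeros.

```stata
sysuse auto, clear

* ln() is defined only for positive values; price is always positive here
generate ln_price = ln(price)

* Compare skewness on the raw and the log scale
summarize price, detail
summarize ln_price, detail

* Box plots on the two scales, side by side
graph box price, name(raw_scale, replace) nodraw
graph box ln_price, name(log_scale, replace) nodraw
graph combine raw_scale log_scale
```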


    • #3
      Nick Cox Thank you so much, Mr Cox, for your quick answer. About the three variables you mentioned: the last two are measured as percentages of capital (%). Can we still work on a logarithmic scale? Should I take logarithms first, for example for my variable CAP_INVESINS:
      Code:
      gen log_CAPINVES = ln(CAP_INVESINS)
      Thank you again for your help.


      • #4
        I forgot to mention that those two variables take the value 0 for some observations.


        • #5
          Zeros are certainly an issue for logarithms. Many researchers would leave such variables be. More exotic transformations which get round this are certainly possible.
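A sketch on simulated data of why zeros are a problem and of two transformations that are defined at zero. The variable own_pct is hypothetical, standing in for an ownership percentage with some exact zeros; the cube root and asinh() (inverse hyperbolic sine) are just examples of the "more exotic" possibilities, not a recommendation made in this thread.

```stata
clear
set seed 2019
set obs 228

* Hypothetical ownership percentage (0 to 100), roughly 30% exact zeros
generate own_pct = cond(runiform() < 0.3, 0, 100 * runiform()^3)

* ln(0) is missing in Stata, so zeros would silently drop out of any model
generate ln_own = ln(own_pct)
count if missing(ln_own)
count if own_pct == 0

* Two transformations that are defined at zero and map 0 to 0
generate cbrt_own = own_pct^(1/3)
generate asinh_own = asinh(own_pct)
```

Note that asinh(x) is close to x for small x and close to ln(2x) for large x, so its shape depends on units: on proportions (0 to 1) it is nearly linear, while on percentages (0 to 100) it behaves more like a log transform over most of the range.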

          You don't say precisely what lies downstream of this but bear in mind that for regression-like models there are typically no assumptions about the marginal distributions of either the response or any predictors. What here counts as financial performance?
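A small simulation with made-up data illustrates that point: the predictor x below is strongly right-skewed, yet because the response depends linearly on x with well-behaved errors, ordinary regression should estimate a slope close to the true value of 2 without any transformation of x.

```stata
clear
set seed 2019
set obs 228

* Strongly right-skewed (lognormal) predictor
generate x = exp(rnormal())

* The response is linear in x with normal errors; the true slope is 2
generate y = 1 + 2 * x + rnormal()

* x is very skewed, but that alone is not a problem for the regression
summarize x, detail
regress y x
```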


          • #6
            Nick Cox Thank you so much for your prompt reply. Sorry, but I did not get your last point: are you referring to how my dependent variable (financial performance) is observed and measured? If so, I used a "Market to Book Ratio".


            • #7
              The box plot (-graph box-) of my dependent variable (Y):
              [Attached: box plot of the dependent variable]


              • #8
                Nick Cox As I said, Mr Cox, my last two variables are institutional ownership and employee ownership, which are measured as percentages of the shares (for example 30%, 20% and 0%). So do you think I can use logs, and for the zeros just leave such variables be? I would appreciate any help and advice. Thank you so much.


                • #9
                  Market to book ratio looks nicely symmetrical. The main issue with predictors is getting the functional form about right.

                  It's not absolutely clearcut to me that you need any transformations.
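One informal check on functional form, sketched with the auto data (mpg and weight are stand-ins for a response and a predictor): overlay a smooth on the scatter plot, then compare the predictor on its original and log scales. Comparing the fits with AIC/BIC from -estat ic- is an illustrative choice, not the only criterion.

```stata
sysuse auto, clear

* Scatter plot with a lowess smooth: marked curvature suggests a straight line fits poorly
twoway (scatter mpg weight) (lowess mpg weight), name(form_check, replace)

* Fit the predictor on its original scale, then on a log scale
generate ln_weight = ln(weight)
regress mpg weight
estat ic
regress mpg ln_weight
estat ic
```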


                  • #10
                    Nick Cox Thank you, Mr Cox. So how can I justify in my thesis report that I don't need such transformations, and that my outliers look "fine" and will not affect my results? And in your view, Mr Cox, what is the systematic approach to determining the underlying relationship, or functional form, of predictor variables when it isn't obvious and is sometimes "non-existent"?
                    Once again, thank you so much. I'm very pleased to read your responses and advice.


                    • #11
                      "Mr Cox" is, I am sure, your way of being polite, but it reads oddly to me for several reasons. I call myself Nick here and almost everywhere else, and you should feel free to call me that, or not use my name at all if that seems over-familiar.

                      That said, you're asking much more than I can give. You have presented some results and -- although my inclination is to wonder whether working with logarithms might help -- you are telling us specifically that two of your variables have zeros, which implies either that you don't do that or that you might need to consider something more complicated, which I sense might be too much of a side-issue at your level.

                      Now you want advice on what to write in your thesis report and I really can't, and shouldn't, try to tell you what to write. Not being clear that you need transformations is not the same as being clear that you don't need them! Also, marginal distributions alone are an incomplete guide to what should be done before modelling. A variable that is very skewed might turn out to be unproblematic in modelling, so that could be good news. Or an outlier or outliers might dominate a model's results. Hard to be confident in abstraction.

                      It seems that you have some kind of panel model in mind, in which case the best diagnostics of what might need to be done differently would be plots of residuals versus fitted and of residuals versus predictors. If you try out a model and show us the syntax you used, and the results you get, it should be easier to make specific suggestions.
                      Last edited by Nick Cox; 21 Nov 2019, 07:04.
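A sketch of those diagnostics for a random-effects panel model, on simulated data; id, year, x1, x2 and y are hypothetical names. -rvfplot- and -rvpplot- are documented as -regress- postestimation commands, so after -xtreg- this sketch predicts the fitted values and residuals and plots them with -scatter-.

```stata
clear
set seed 2019

* Hypothetical balanced panel: 38 firms observed for 6 years (228 observations)
set obs 38
generate id = _n
generate u_true = rnormal(0, 0.5)
expand 6
bysort id: generate year = 2013 + _n

* One well-behaved predictor and one right-skewed predictor
generate x1 = rnormal()
generate x2 = exp(rnormal())
generate y = 1 + 0.5 * x1 + 0.3 * x2 + u_true + rnormal()

xtset id year
xtreg y x1 x2, re vce(cluster id)

* Fitted values (xb) and idiosyncratic residuals (e)
predict fitted, xb
predict resid, e

* Residuals versus fitted, then versus each predictor
scatter resid fitted, yline(0) name(rvf, replace) nodraw
scatter resid x1, yline(0) name(rvp_x1, replace) nodraw
scatter resid x2, yline(0) name(rvp_x2, replace) nodraw
graph combine rvf rvp_x1 rvp_x2
```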


                      • #12
                        Nick Cox Thank you so much, Nick, that was very helpful. I will try that and come back with the results. Once again, thank you for your valuable time and feedback.


                        • #13
                          Hi Nick.
                          I ran the -LM- and -hausman- tests, which tell me that -re- is the way to go. To account for heteroskedasticity and serial correlation, I ran my model with -vce(cluster i)-:

                          Code:
                          xtreg MARKET_BOOKR HH_INDEX FAM_CONTR EMPLOYEE_OWNER INSTITU_OWNER CONC_PACT STO_OPTION AUD_BIG4 DEBT ROA LOG_ASSESTS i.INDUSTRY_*, re vce(cluster i)
                          PS: i.INDUSTRY_* is one of my control variables (six dummies).

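The LM and Hausman steps mentioned above might look like this, sketched on simulated panel data with hypothetical names. -xttest0- (the Breusch-Pagan LM test for random effects) is run straight after -xtreg, re-. The standard -hausman- test is based on the default (non-clustered) VCE, so both models are fitted without vce(cluster) for the test itself.

```stata
clear
set seed 2019

* Hypothetical balanced panel: 38 firms x 6 years
set obs 38
generate id = _n
generate u_true = rnormal(0, 0.5)
expand 6
bysort id: generate year = 2013 + _n
generate x1 = rnormal()
generate x2 = rnormal()
generate y = 1 + 0.5 * x1 + 0.3 * x2 + u_true + rnormal()
xtset id year

* Random effects, then the Breusch-Pagan LM test for a panel effect
xtreg y x1 x2, re
xttest0
estimates store re

* Fixed effects, then Hausman: consistent estimator first, efficient second
xtreg y x1 x2, fe
estimates store fe
hausman fe re
```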
                          Then I predicted the residuals -e-, which are due only to the idiosyncratic disturbance, and -ue-, the individual effects plus the disturbance.
                          I also predicted the fitted values -xb-, explained by the independent variables alone, and -xbu-, explained by the Xi and the time-invariant individual effects.
                          Code:
                          predict e, e
                          predict ue, ue
                          predict xb, xb
                          predict  xbu, xbu
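To make these definitions concrete, a sketch on simulated data (all names hypothetical) that checks the identities they imply after -xtreg, re-: ue = u + e, xbu = xb + u, and xb + ue reproduces the response. Predictions are stored as doubles so the checks hold up to rounding error.

```stata
clear
set seed 2019

* Hypothetical balanced panel: 38 firms x 6 years
set obs 38
generate id = _n
generate double u_true = rnormal(0, 0.5)
expand 6
bysort id: generate year = 2013 + _n
generate double x1 = rnormal()
generate double y = 1 + 0.5 * x1 + u_true + rnormal()
xtset id year
xtreg y x1, re

* Store predictions in double precision
predict double xb, xb
predict double u, u
predict double e, e
predict double ue, ue
predict double xbu, xbu

* Check: ue = u + e, xbu = xb + u, and y = xb + ue
assert reldif(ue, u + e) < 1e-6
assert reldif(xbu, xb + u) < 1e-6
assert reldif(y, xb + ue) < 1e-6
```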
                          Then, using the following syntax:
                          Code:
                          crossplot (e ue) xbu xb, combine(imargin(small))
                          Following your last comment, I produced the following plots (hopefully as you intended):
                          [Image: re.png, residuals (e, ue) versus fitted values (xbu, xb)]

                          Thank you so much for considering my post and for sharing your knowledge with us. I'm waiting patiently for your valuable feedback.
                          Last edited by Amel Kenthari; 21 Nov 2019, 19:40.


                          • #14
                            Nick Cox I forgot to plot the residuals (e, ue) versus the predictors (I chose only the continuous predictors).

                            Here are the plots:
                            [Image: pre.png, residuals (e, ue) versus the continuous predictors]


                            • #15
                              I see nothing pathological here. My main suggestion would be to consider simplifying the model.
