  • Removing outliers from survey data in Stata (time variables)

    Hi guys, I would appreciate any help.
    Basically, I want to remove outliers from certain time variables in my dataset.
    I have about 4 of these variables, and the removal of outliers should follow criteria that I have set.

    The issue is that I first want to show the outliers visually, and the histogram I have so far does not make them very clear.
    Maybe you would advise that a histogram is not even the best way to show this.
    Otherwise, how could I improve the histogram so that I can really see how many respondents fall under a certain time?
    To clarify: in the example below I am measuring how long people spend reading a pitch deck; if they spend less than a certain number of seconds, they need to be removed.

    What tips can you give me: better ways to visualise, more clarity, etc.?

    Then, after visualising it, I can use an if condition to exclude data below a certain time from further analysis.

    I use this command: hist TimePitchdeck, freq normal bin(212)


    [Attached image: Screenshot 2021-05-28 at 12.54.29.png]

  • #2
    To remove extreme values you can always use the -if- qualifier. However, the problem with your data is that they most likely do not come from a process that produces normally distributed values; non-normality is typical of response latencies and count variables. Therefore, the normal distribution is not suited to judging whether a value is an outlier or not.
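
    A minimal sketch of how the -if- qualifier could be used here. The 10-second cutoff and the variable name too_fast are assumptions for illustration only; TimePitchdeck is the variable from the question.

    ```stata
    * Assumed threshold: 10 seconds. Replace with your own criterion.
    * Count how many respondents fall below the cutoff
    count if TimePitchdeck < 10

    * Flag them instead of dropping them, so the exclusion is reversible
    * (the if qualifier keeps missing times missing in the flag)
    generate byte too_fast = TimePitchdeck < 10 if !missing(TimePitchdeck)
    tabulate too_fast, missing

    * Restrict later commands to the retained respondents
    summarize TimePitchdeck if too_fast == 0, detail
    ```

    Flagging rather than using -drop- keeps the full dataset available, so the same analysis can be rerun with a different cutoff.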

    Depending on the data-generating process and the aims of your analysis, you could either try models that do not assume a normally distributed dependent variable or transform the values in some way; see http://fmwww.bc.edu/repec/bocode/t/transint.html (there may be another version of this text by Nick Cox that I am not aware of) or, for a very first impression, https://www.stata.com/stata-news/news34-2/spotlight/ .
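
    As one possible illustration of such a transformation, the sketch below plots the times on a log scale, which often spreads out the short end of a right-skewed variable. The log is only one candidate, log_time is a hypothetical variable name, and the code assumes TimePitchdeck is recorded in seconds.

    ```stata
    * Natural log of reading time; zero or negative times are left missing
    generate double log_time = ln(TimePitchdeck) if TimePitchdeck > 0

    * Histogram on the log scale, without the normal overlay
    histogram log_time, frequency xtitle("ln(seconds spent on pitch deck)")
    ```

    Whether the log is a sensible choice depends on the data-generating process, so it is worth comparing against the untransformed plot.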



    • #3
      You can use the sort command: once you have sorted on the specific variable, the extreme values sit at the top and bottom of the data, where they are easy to inspect and remove.
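
      A rough sketch of that approach, assuming the dataset has at least 20 observations (the 20 is arbitrary):

      ```stata
      * Sort ascending so the shortest reading times come first
      sort TimePitchdeck

      * Inspect the 20 shortest and the 20 longest times
      list TimePitchdeck in 1/20
      list TimePitchdeck in -20/l
      ```

      Note that missing values sort last in Stata, so the second listing may show missing times rather than the longest ones.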



