
  • How to identify respondents who provide the same value to each question in a survey

    Hi Everyone,

    I administered a survey and I think a few respondents did not pay close attention when answering the questions. I would like to remove those who gave the same value to every question. For example, in the attached sample dataset, ID 1 answered "5" to every question. How do I identify and remove ID 1? Your help will be appreciated.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id motiv1 motiv2 motiv3 motiv4 motiv5 achieve1 achieve2 achieve3 achieve4 achieve5)
     1 5 5 5 5 5 5 5 5 5 5
     2 2 5 4 4 3 4 5 3 5 4
     3 5 4 2 4 4 5 5 5 5 2
     4 1 1 1 1 1 1 1 1 1 1
     5 4 4 1 4 1 5 4 4 5 5
     6 4 5 5 2 2 4 5 4 5 4
     7 2 4 3 4 3 5 4 5 4 5
     8 1 5 3 5 4 5 4 5 5 4
     9 4 4 3 3 4 5 4 4 3 4
    10 5 5 3 4 5 5 3 4 4 3
    end
    Regards,
    Al Bothwell



  • #2
    Use egen to find observations in which the row (observation) minimum and maximum are identical.
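
    A minimal sketch of that suggestion, assuming the example data from #1 are in memory. The variable names rmin, rmax, and straightline are made up for illustration. It flags the observations rather than dropping them, so they can be inspected first:

    ```stata
    * Row-wise minimum and maximum across all question items
    egen rmin = rowmin(motiv* achieve*)
    egen rmax = rowmax(motiv* achieve*)

    * Flag rows where every non-missing answer is the same value;
    * !missing(rmin) keeps all-missing rows from being flagged
    generate byte straightline = (rmin == rmax) & !missing(rmin)

    * Inspect the flagged respondents before deciding what to do with them
    list id motiv* achieve* if straightline
    ```

    On the example data this should flag ID 4 (all 1s) as well as ID 1 (all 5s).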



    • #3
      #2 will work perfectly if there are no missing values in any of the question variables.

      If some question variables can have missing values (item non-response), then it depends on how you want to handle that. If you want to remove observations where all the responses are the same except for non-responses (which might be numerous), then the approach in #2 is still fine. But if, say, somebody responded 5 to every question, but skipped some, you might want to keep that observation--I don't know, that's up to you, and is a substantive issue. Anyway, if you do want to retain observations that have a mixture of non-response and a single distinct, but repeated, valid response in the rest:
      Code:
      egen low = rowmin(motiv* achieve*)
      egen high = rowmax(motiv* achieve*)
      egen mcount = rowmiss(motiv* achieve*)
      drop if low == high & mcount == 0 // ONLY DROP IF NO NON-RESPONSE
      You could also play with that code to allow dropping when there is more than a certain number of missing responses by changing the final mcount == 0 to some other relational expression.
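
      As an illustration only, a variant with such a threshold might look like the sketch below. The cutoff of 2 skipped items and the names maxmiss, lowval, highval, and nmiss are arbitrary assumptions, not recommendations:

      ```stata
      * Hypothetical cutoff: still drop a straight-liner who skipped at most 2 items
      local maxmiss 2

      egen lowval  = rowmin(motiv* achieve*)
      egen highval = rowmax(motiv* achieve*)
      egen nmiss   = rowmiss(motiv* achieve*)

      * Drop rows whose non-missing answers are all identical, provided
      * no more than `maxmiss' items were left unanswered
      drop if lowval == highval & nmiss <= `maxmiss'
      ```

      Keep the cutoff below the number of items: in a row with every item missing, lowval and highval are both missing and therefore compare as equal.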



      • #4
        I agree with Clyde: You must answer the substantive questions. However, the idea that straight-lining (providing the same answer to all items of a battery) is always a consequence of not paying attention (or otherwise invalid answers) might not hold. To quote Reuning and Plutzer (2020, 440):
        Straightlining is one manifestation of satisficing. However, if satisficing frequently leads to straightlining, it does not logically follow that straightlining is a good indicator of data quality. While a sleepless night can lower test performance, it would be incorrect to view poor test performance as a good indicator of sleeplessness.
        You might want to consider carefully whether your items justify treating straightlining as an unambiguous signal for poor data quality and, thus, dropping all respective respondents.


        Reuning, K., & Plutzer, E. (2020). Valid vs. Invalid Straightlining: The Complex Relationship Between Straightlining and Data Quality. Survey Research Methods, 14(5), 439--459.



        • #5
          Everyone's an expert here by virtue of having some experience, are they not?

          This experience must be common: When I get requests for feedback on some purchase, I often just ignore them. But when I am impressed, I want to do my bit to give praise and promote the product. So, it may well be 5/5 on everything, quickly, and I am out of there. A researcher wants to read my mind on the grounds that I should have a more nuanced view and determine that I am satisficing. Well, no; perhaps I am just being honest and direct.



          • #6
            Thank you all for the insightful comments. I posted a sample dataset because I couldn't post the real one for confidentiality reasons. The actual data is a battery of psychological questions. It will be rare for someone to provide the same response on each question. I will safely conclude that a response such as all 5s should be removed. Regarding #3, thanks Clyde for the code. Thankfully the data has no missing values. Best regards to everyone who contributed.



            • #7
              Not my field, but in what passes as my field deleting data points that are unusual -- with a defence that they are rare -- would not be regarded as defensible.
