
  • Unstable Stata Outputs/Results

    Dear all,

    Every time I run my do-file, in both Stata 15 and Stata 17, I get different results from the same dataset, with the same do-file, on the same machine. I have about 11 different sections that I merge together for the final analysis. In each section, the data are sorted by the "household_id" variable. I tried "sort household_id, stable", but it does not solve my problem. A sample of my data is below. Thank you in advance for your help.


    dataex household_id diarrhea_2wks diarrhea_4wks insurance lntotal_med_fee lnwaterscarcity Rf_shock Temp_shock

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 household_id byte diarrhea_2wks float diarrhea_4wks byte insurance float(lntotal_med_fee lnwaterscarcity Rf_shock Temp_shock)
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601002" . 0 0         .         0 5.127945 5.001626
    "01010101601017" . 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" . 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" . 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" . 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" . 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" 0 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" . 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" 0 0 0         .  .6251451 5.127945 5.001626
    "01010101601017" 0 0 0         .  .6251451 5.127945 5.001626
    "01010101601034" . 0 0  5.703794         0 5.127945 5.001626
    "01010101601049" . 0 0         .  2.464003 5.127945 5.001626
    "01010101601049" . 0 0  4.787561  2.464003 5.127945 5.001626
    "01010101601049" . 0 0 4.6052704  2.464003 5.127945 5.001626
    "01010101601049" . 0 0 4.6052704  2.464003 5.127945 5.001626
    "01010101601064" . 0 1         0         0 5.127945 5.001626
    "01010101601064" . 0 1         0         0 5.127945 5.001626
    "01010101601064" . 0 1         0         0 5.127945 5.001626
    "01010101601064" . 0 1         0         0 5.127945 5.001626
    "01010101601080" . 0 1         . 1.8184465 5.127945 5.001626
    "01010101601080" . 0 1         . 1.8184465 5.127945 5.001626
    "01010101601080" . 0 1         0 1.8184465 5.127945 5.001626
    "01010101601080" . 0 1         0 1.8184465 5.127945 5.001626
    "01010101601087" . 0 1         0  .8813736 5.127945 5.001626
    "01010101601087" . 0 1         0  .8813736 5.127945 5.001626
    "01010101601087" . 0 1         0  .8813736 5.127945 5.001626
    "01010101601087" 0 0 1         0  .8813736 5.127945 5.001626
    "01010101601087" 0 0 1         0  .8813736 5.127945 5.001626
    "01010101601087" 0 0 1         0  .8813736 5.127945 5.001626
    "01010101601101" . 0 0  4.382183  1.047593 5.127945 5.001626
    "01010101601101" . 0 0  5.135833  1.047593 5.127945 5.001626
    "01010101601101" . 0 0  3.689504  1.047593 5.127945 5.001626
    "01010101601101" 0 0 0 4.0256705  1.047593 5.127945 5.001626
    "01010101601101" . 0 0  3.689504  1.047593 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" 0 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" 1 0 1         0 1.4436355 5.127945 5.001626
    "01010101601116" . 0 1         0 1.4436355 5.127945 5.001626
    "01010101601131" . 0 0  5.991471 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  6.214612 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  5.991471 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  5.298342 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  5.703794 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  6.396933 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  6.429722 2.0947125 5.127945 5.001626
    "01010101601131" . 0 0  6.429722 2.0947125 5.127945 5.001626
    "01010101601146" . 0 1         0 1.3258978 5.127945 5.001626
    "01010101601146" . 0 1         0 1.3258978 5.127945 5.001626
    "01010101601146" . 0 1         0 1.3258978 5.127945 5.001626
    "01010101601146" 0 0 1         0 1.3258978 5.127945 5.001626
    "01010101601146" . 0 1         0 1.3258978 5.127945 5.001626
    "01010101601146" . 0 1         0 1.3258978 5.127945 5.001626
    "01010101601146" . 0 1         0 1.3258978 5.127945 5.001626
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403001" 0 0 0  5.991471 .24746646  5.22204 4.802176
    "01010201403001" 0 0 0  5.991471 .24746646  5.22204 4.802176
    "01010201403011" . 0 0 4.6052704  .4812118  5.22204 4.802176
    "01010201403011" . 0 0  8.006368  .4812118  5.22204 4.802176
    "01010201403011" . 0 0         .  .4812118  5.22204 4.802176
    "01010201403011" . 0 0   6.08678  .4812118  5.22204 4.802176
    "01010201403011" . 0 0         .  .4812118  5.22204 4.802176
    "01010201403011" . 0 0         .  .4812118  5.22204 4.802176
    "01010201403011" . 0 0         .  .4812118  5.22204 4.802176
    "01010201403016" . 0 0         . .24746646  5.22204 4.802176
    "01010201403016" . 0 0         0 .24746646  5.22204 4.802176
    "01010201403016" . 0 0         . .24746646  5.22204 4.802176
    "01010201403016" . 0 0         . .24746646  5.22204 4.802176
    "01010201403016" 0 0 0         . .24746646  5.22204 4.802176
    "01010201403016" 0 0 0         . .24746646  5.22204 4.802176
    "01010201403016" 0 0 0         . .24746646  5.22204 4.802176
    "01010201403016" 0 0 0  4.382183 .24746646  5.22204 4.802176
    "01010201403026" . 0 0         .  .4812118  5.22204 4.802176
    "01010201403026" . 0 0         .  .4812118  5.22204 4.802176
    "01010201403031" . 0 0  5.298342  .9047485  5.22204 4.802176
    "01010201403031" . 0 0   6.32794  .9047485  5.22204 4.802176
    "01010201403031" . 0 0  5.703794  .9047485  5.22204 4.802176
    "01010201403031" . 0 0  5.991471  .9047485  5.22204 4.802176
    "01010201403031" . 0 0 4.6052704  .9047485  5.22204 4.802176
    "01010201403031" . 0 0 4.6052704  .9047485  5.22204 4.802176
    "01010201403031" . 0 0  5.991471  .9047485  5.22204 4.802176
    "01010201403031" 1 0 0  5.298342  .9047485  5.22204 4.802176
    end
    label values diarrhea_2wks diarrhea_2wks
    label def diarrhea_2wks 0 "no", modify
    label def diarrhea_2wks 1 "yes", modify
    label values diarrhea_4wks diarrhea_4wks
    label def diarrhea_4wks 0 "no", modify
    label values insurance insurance
    label def insurance 0 "no", modify
    label def insurance 1 "yes", modify

  • #2
    Thank you for the -dataex-, but it's not clear which commands you are running that lead to different results. My guess is that it may have to do with ties in your dataset (probably in the household id). This can happen when Stata sorts your data, explicitly or implicitly, and several observations share the same values of the sort variables: unless the sort is stable, Stata breaks those ties randomly. My first suggestion is to narrow down which command returns different results. You can also use -set sortseed-, which sets the seed for the random numbers Stata uses to break ties when sorting. If results do not vary after setting the sort seed at the top of your do-file, then you can be sure the problem comes from this kind of mechanism.
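A minimal sketch of the tie-breaking mechanism described above, using four rows copied from household 01010101601101 in the -dataex- excerpt (only two variables kept). The rows tie on household_id but differ on lntotal_med_fee, so which value lands in the first row after -sort household_id- is left to Stata's tie-breaking, which -set sortseed- controls. Whether any particular pair of seeds produces a different order is not guaranteed, so no output is shown.

```stata
* Four rows from household 01010101601101 in the -dataex- excerpt:
* identical household_id, different lntotal_med_fee
clear
input str14 household_id float lntotal_med_fee
"01010101601101"  4.382183
"01010101601101"  5.135833
"01010101601101"  3.689504
"01010101601101" 4.0256705
end

* Sort on household_id alone under several sort seeds. The order within
* the tie is not determined by the data, so the first value may change.
* restore brings back the original, unsorted rows before each attempt.
foreach seed in 1 2 3 4 5 {
    preserve
    set sortseed `seed'
    sort household_id
    display "sortseed `seed': first lntotal_med_fee = " lntotal_med_fee[1]
    restore
}

* By contrast, a stable sort keeps the incoming row order within ties
sort household_id, stable
display "stable: first lntotal_med_fee = " lntotal_med_fee[1]
```

Note that -stable- only preserves whatever order the rows already have. If an earlier step (a non-stable sort, -bysort-, or -duplicates drop-) already broke ties randomly, -stable- faithfully keeps that random order, which may be why -sort household_id, stable- did not help in the original post.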



    • #3
      If you need to reproduce old results and suspect the difference is caused by the different sort methods used by Stata 17 and Stata 15, then in Stata 17, before running the do-file, type

      Code:
      set sortmethod qsort

      See -help set_sortmethod- for more information.



      • #4
        Dear Hua Peng,
        Thanks for your response. I tried that, but it did not work. I would like to post the data-cleaning sections, but they are too long to post here.



        • #5
          Dear Leonardo,

          Thanks for your help, but I don't really understand how ties arise or how to set the sort seed in my do-file. I would appreciate a bit more explanation. Thanks.



          • #6
            At the top of your do-file, put the line -set sortseed 17-, where 17 is an arbitrary number. Run your do-file multiple times and check whether the results change.

            See -help sortseed- and also -help seed- for explanations.
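One way to make "run it multiple times" more systematic is to run the same step under two different sort seeds and compare the two results with -cf-. The sketch below is self-contained: the rows come from the -dataex- excerpt, and section_step is a hypothetical stand-in for one section of the real do-file that keeps one row per household without saying which. If -cf- reports differences, that step depends on how ties are broken.

```stata
* Rows from the -dataex- excerpt: household 01010101601049 has several
* different lntotal_med_fee values
clear
input str14 household_id float lntotal_med_fee
"01010101601049"         .
"01010101601049"  4.787561
"01010101601049" 4.6052704
"01010101601064"         0
end
tempfile raw seed1 seed2
save `raw'

* Hypothetical stand-in for one section: keeps the first row per household,
* but "first" is not defined by any variable, so the result is order-sensitive
capture program drop section_step
program define section_step
    bysort household_id: keep if _n == 1
end

* Run the same step under two sort seeds, saving each result
foreach s in 1 2 {
    use `raw', clear
    set sortseed `s'
    section_step
    save `seed`s''
}

* cf exits with a nonzero return code if any value differs between the runs
use `seed1', clear
capture noisily cf _all using `seed2'
display cond(_rc, "step depends on the sort order", "no difference for these two seeds")
```

Two seeds can happen to give the same order, so "no difference" is weak evidence; a difference, on the other hand, pins down an order-sensitive step.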



            • #7
              Bahre Kiros posted his code in a new topic rather than in this one. I have copied it here:

              Originally posted by Bahre Kiros
              Dear Stata users,

              Sorry! I am running a do-file in Stata 17, but the results vary every time I run it, even though everything stays the same: the dataset, the do-file, and the computer. This is an update to my previous post, because I was asked to post the commands I use. Here are the commands from one of the sections. What all the remaining sections have in common is that I use "sort household_id" in each of them before I merge. Any hint is much appreciated.

              Code:
               
              use "raw/ETH_HouseholdGeovars_Y3", clear
              rename af_bio_1 annmtemp
              gen household_id = substr(household_id2,1,7) + substr(household_id2,12,18)
              duplicates report household_id 
              duplicates list household_id 
              duplicates list household_id 
              keep household_id annmtemp 
              sort household_id
              save "temp/Pub_ETH_HouseholdGeovariables_Y3R", replace



              • #8
                Dear Leonardo,

                Many thanks. Your recommendation "set sortseed 17" precisely solves my problem.



                • #9
                  I hate to spoil the party, but does it "solve" your problem or just sweep it under the rug?

                  The inescapable conclusion of the thread so far is that your code does something that is sensitive to the sort order of the data. Now, sometimes that is appropriate: for example, if it is picking out the chronologically earliest observation from among all the observations for each household, or that of the oldest person in the household, or something order-relevant like that. But if you are not doing something that is explicitly related to ordering, then the results shouldn't depend on the sort order. The conclusion then is that your code is wrong and you are just covering up the problem by stipulating a sort order.
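For the legitimate order-relevant case, a minimal sketch: the ordering lives in a variable, here a hypothetical interview_date that is not in the posted data (its values are made up for illustration; the lntotal_med_fee values are copied from the -dataex- excerpt). The variable in parentheses orders rows within each household without changing the groups, and because it has no ties inside a household, the row kept is the same regardless of the sort seed or sort method.

```stata
* Illustrative data: interview_date is hypothetical, made up for this sketch
clear
input str14 household_id float(interview_date lntotal_med_fee)
"01010101601131" 3 5.991471
"01010101601131" 1 6.214612
"01010101601131" 2 5.298342
"01010201403031" 2 5.703794
"01010201403031" 1  6.32794
end

* Check that household_id and interview_date together identify each row;
* otherwise "earliest" would itself be ambiguous
isid household_id interview_date

* Sort by interview_date within household and keep the earliest row
bysort household_id (interview_date): keep if _n == 1
list
```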

                  At least in the code that you have shown so far in this post, I see nothing that is order-related. Perhaps there are order-related operations in other unseen parts of your code. So you really need to carefully review what you are doing: are you trying to calculate something that is related to some inherent ordering of the data? If so, you may be OK. If not, you are in trouble and you don't even recognize it. A thorough code review is needed.
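As one possible starting point for that review, a sketch that flags variables whose values differ across rows of the same household_id. For any such variable, code that takes one value per household (for example `[1]`, -keep if _n == 1-, or -duplicates drop household_id, force-) gets an answer that depends on the sort order. The rows are copied from the -dataex- excerpt; in practice the varlist would be whatever variables the section uses.

```stata
* Rows copied from the -dataex- excerpt
clear
input str14 household_id byte(diarrhea_2wks insurance) float lntotal_med_fee
"01010101601049" . 0         .
"01010101601049" . 0  4.787561
"01010101601049" . 0 4.6052704
"01010101601087" . 1         0
"01010101601087" 0 1         0
end

foreach v of varlist diarrhea_2wks insurance lntotal_med_fee {
    * Sorting on `v' within household puts the smallest value first and
    * missing last, so first and last differ only if `v' is not constant
    bysort household_id (`v'): generate byte varies = `v'[1] != `v'[_N]
    quietly count if varies
    display "`v': " r(N) " rows in households where it varies"
    drop varies
}
```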

                  When a quick fix makes an error go away, that is not necessarily a good thing. It's often a red flag. Heed it.



                  • #10
                    I would like to add to Clyde's post and start by stating that you definitely never need to set sortseed and you probably should never do so either. Using Clyde's example, if you were to

                    Originally posted by Clyde Schechter
                    pick[...] out the chronologically earliest observation from among all the observations for each household
                    then I argue that this observation should be identified by two variables: a household identifier and a variable holding the chronological order. More generally, if your results depend on the sort order, then that order is informative and the relevant information should be represented by variables in the dataset. Sometimes that means it is best to add variables that record the sort order. In my view, this is always better than set sortseed.
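A minimal sketch of that advice, assuming the order of rows in the raw file is the order that carries meaning: store it in a variable right after loading (row_in_file is a hypothetical name) and include it in every sort. Once the sort key uniquely identifies rows, there are no ties left to break, so neither -set sortseed- nor the sort method affects that sort.

```stata
* Rows copied from the -dataex- excerpt; in a real do-file this would
* directly follow the -use- of the raw file
clear
input str14 household_id float lntotal_med_fee
"01010101601131"  5.991471
"01010101601131"  6.214612
"01010101601101"  4.382183
"01010101601101"  5.135833
end

* Record the file order once, before any sort, as an explicit variable
generate long row_in_file = _n

* household_id and row_in_file together identify each row, so this sort
* has no ties and gives the same order every time
isid household_id row_in_file
sort household_id row_in_file
list
```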


                    The remainder of the post is based on guesswork. I think Clyde would agree that

                    Originally posted by Clyde Schechter
                    in the code that you have shown so far
                    the duplicates command is explicitly based on the sort order of the specified variables. I suspect that the repeated line

                    Code:
                    duplicates list household_id
                    is probably

                    Code:
                    duplicates drop household_id, force
                    in the code that was really used, because repeating that statement, especially in a do-file, does not make any sense. Always showing the exact code that was used is a different but equally important topic. Anyway, if the goal is to identify identical households, then the code is fine. But then the results cannot depend on which of the duplicate households is chosen. If they do, then you have not included all relevant variables that define a duplicate. From the snippet that you show, you probably want

                    Code:
                    duplicates list household_id annmtemp
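A sketch of the check this implies, on made-up rows (the real annmtemp values are not shown in the thread): before dropping, confirm that rows sharing a household_id also share annmtemp, so that it does not matter which duplicate survives.

```stata
* Made-up values for illustration only; household_id and annmtemp are the
* two variables kept in the posted code
clear
input str14 household_id float annmtemp
"01010101601002" 210
"01010101601002" 210
"01010101601017" 205
end

* Compare duplicates on household_id alone with duplicates on both variables
duplicates report household_id
duplicates report household_id annmtemp

* Stop with an error if any household has conflicting annmtemp values
bysort household_id (annmtemp): assert annmtemp[1] == annmtemp[_N]

* Now the surviving row does not depend on the sort order
duplicates drop household_id annmtemp, force
isid household_id
```

If the -assert- fails, dropping on household_id alone would keep an arbitrary annmtemp per household, which is exactly the kind of order dependence discussed above.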

