  • #16
    The reason I said the variable tv is discrete is because the medians turn out to be exactly 1 and 4, respectively. But it looks like it's not an integer; maybe, as Nick suggested, it's recorded every half an hour?

    Unless TV takes on a wide range of values, you can't trust either the usual standard errors or those obtained via bootstrap. The median is not a smooth function of the data in that case.

    You could assume that tv has a particular distribution -- such as that for a Tobit model, which allows true zeros -- and then use interval regression. Then you could compute the difference in medians from the lognormal distribution and easily obtain a valid standard error for the difference.

    • #17
      It's definitely collected as hourly; the question asked was "In a typical day, how many hours do you spend watching TV. Put 0 if you do not spend any time watching TV".

      Code:
      . tab tv
      
       Total time |
            spent |
      watching tv |
         in hours |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        112       11.57       11.57
                1 |         69        7.13       18.70
                2 |        147       15.19       33.88
                3 |        173       17.87       51.76
                4 |        163       16.84       68.60
                5 |        100       10.33       78.93
                6 |         77        7.95       86.88
                7 |         35        3.62       90.50
                8 |         39        4.03       94.52
                9 |         14        1.45       95.97
               10 |         18        1.86       97.83
               11 |          2        0.21       98.04
               12 |          6        0.62       98.66
               13 |          3        0.31       98.97
               14 |          1        0.10       99.07
               15 |          1        0.10       99.17
               16 |          1        0.10       99.28
               19 |          1        0.10       99.38
               20 |          1        0.10       99.48
               21 |          4        0.41       99.90
               24 |          1        0.10      100.00
      ------------+-----------------------------------
            Total |        968      100.00
      Based on this, is interval regression still the appropriate way forward?

      Thanks!

      • #18
        Joe:
        Are there people who scored 0 because they actually did not have (access to) a TV set?
        Kind regards,
        Carlo
        (Stata 19.0)

        • #19
          Hi Carlo,

          Unfortunately I do not have that information, but it's possible, particularly given that the participants are people with serious mental illness, although relatively well functioning at baseline. It is also possible that they answered incorrectly for whatever reason, or underestimated.
          Last edited by Joe Tuckles; 23 Mar 2020, 04:16.

          • #20
            A minor point about the histograms: I would use the discrete option and not let histogram choose the bin width.

            Jeff Wooldridge: My point was that if the data were integers, then the median must be an integer or half-integer. I wasn't ruling out finer resolution in the data, which none of us could see, but it's now clear that the data are integers and cover the possible range [0, 24], credibly or not.
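
            A minimal sketch of the discrete option, for anyone who wants to try it. It rebuilds the pooled tv values from the frequency table in #17; the group split cannot be recovered from that table, so groups are ignored here, and the variable name freq is an assumption introduced only for illustration.

            ```stata
            * Rebuild the pooled data from the frequency table posted in #17
            clear
            input tv freq
            0 112
            1 69
            2 147
            3 173
            4 163
            5 100
            6 77
            7 35
            8 39
            9 14
            10 18
            11 2
            12 6
            13 3
            14 1
            15 1
            16 1
            19 1
            20 1
            21 4
            24 1
            end
            * Turn each table row into freq identical observations (one per respondent)
            expand freq
            * discrete draws one bar per distinct integer value instead of a chosen bin width
            histogram tv, discrete percent
            ```

            With the real dataset, adding by(group) to the same command should give one panel per group.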

            • #21
              Thanks Nick. Please see the updated histograms. Do I also need to run intreg?

              [Attached image: Graph00.png]
              [Attached image: Graph11.png]

              • #22
                Originally posted by Nick Cox
                A minor point about the histograms: I would use the discrete option and not let histogram choose the bin width.

                Jeff Wooldridge: My point was that if the data were integers, then the median must be an integer or half-integer. I wasn't ruling out finer resolution in the data, which none of us could see, but it's now clear that the data are integers and cover the possible range [0, 24], credibly or not.
                Oh right. Stata uses the convention of averaging two values if both are medians (and then, technically, so is any point in between).
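
                A tiny illustration of that convention, using made-up values (1, 2, 3, 4) rather than the thread's data. The variable name x is hypothetical.

                ```stata
                * Four made-up values: 1, 2, 3, 4
                clear
                set obs 4
                generate x = _n
                * With an even number of observations, the reported median is the
                * average of the two middle values, here (2 + 3)/2
                summarize x, detail
                display r(p50)
                ```

                This should display 2.5, although any value in [2, 3] satisfies the definition of a median for these four values.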

                • #23
                  As I have written somewhere, non-mathematical readers are told that averaging two middle values to get a median is a rule, while more mathematical readers are told that it is only a convention, for the reason you give.
                  Last edited by Nick Cox; 23 Mar 2020, 13:50.

                  • #24
                    Hi, apologies, but could I request some clarification? I have the two median values, and I have been advised to formally test for a difference if I'm going to report them. Should I therefore just report the qreg bootstrap finding, and/or the histograms, and/or run intreg?

                    Thanks

                    • #25
                      Is this coursework?

                      • #26
                        No

                        • #27
                          Do the following statistics provide differing information?

                          Code:
                          . cendif tv, by(group)
                          Y-variable: tv (Total time spent watching tv in hours)
                          Grouped by: group
                          Group numbers:
                          
                                group |      Freq.     Percent        Cum.
                          ------------+-----------------------------------
                                    0 |        806       83.26       83.26
                                    1 |        162       16.74      100.00
                          ------------+-----------------------------------
                                Total |        968      100.00
                          Transformation: Fisher's z
                          95% confidence interval(s) for percentile difference(s)
                          between values of tv in first and second groups:
                             Percent    Pctl_Dif     Minimum     Maximum 
                                  50           0          -1           0 
                          
                          . cid tv, by(group) median unpaired
                          
                          Rank-based confidence interval for difference in  medians by group
                          
                          Variable |     Obs     Estimate           K        [95% Conf. Interval]
                          ---------+-------------------------------------------------------------
                                tv |     968            0       58922              -1           0

                          • #28
                            Apologies for seeking further clarification. Hopefully it will help both me and future visitors. Looking at research papers, it seems many people just report two medians followed by a p-value. Would it therefore be appropriate for me to use Mood's median test?

                            Code:
                            Median test
                            
                               Greater |
                              than the |         group
                                median |         0          1 |     Total
                            -----------+----------------------+----------
                                    no |       426         75 |       501
                                   yes |       380         87 |       467
                            -----------+----------------------+----------
                                 Total |       806        162 |       968
                            
                                      Pearson chi2(1) =   2.3228   Pr = 0.127
                            
                               Continuity corrected:
                                      Pearson chi2(1) =   2.0677   Pr = 0.150
                            I note, though, that this p-value differs from the qreg bootstrap p-value.
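
                            As a cross-check, the Pearson statistic in that table can be recomputed from the four cell counts alone with the immediate command tabi. This is only a sketch of the arithmetic behind the reported chi2; it says nothing about whether the test is suitable for discrete data like these.

                            ```stata
                            * 2 x 2 counts copied from the median test table above:
                            * rows = not greater / greater than the pooled median,
                            * columns = group 0 / group 1
                            tabi 426 75 \ 380 87, chi2
                            * r(chi2) and r(p) hold the uncorrected Pearson statistic and its p-value
                            display %6.4f r(chi2) "  " %5.3f r(p)
                            ```

                            The result should agree with the uncorrected line of the table (chi2 of about 2.32, Pr = 0.127). The split of 501 "no" against 467 "yes" is also consistent with a pooled median of 3 in the frequency table in #17, where 501 respondents report 3 hours or fewer.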

                            • #29
                              Joe: I suggest backing up and re-reading the comments on your post(s), including #16 and thereabouts, and also reading relevant literature before you post again. A key point in the comments was the distinction between distributions that are continuous and those that are not; this has implications for the definition of the "median" and thence for tests. My understanding is that virtually all tests of differences in medians developed to date are for the continuous distribution case. That refers to tests available via centile (built-in), cendif (by Roger Newson, part of his somersd package on SSC), or cid (by Patrick Royston, posted on Statalist in 1995). I had not heard of Mood's median test, but I have just looked at https://en.wikipedia.org/wiki/Median_test and I suspect that it is also for the continuous distribution case. (Statistical experts -- not me -- can advise.)

                              BTW please have another look at the Forum FAQ and note the request to state the provenance of community-contributed commands.

                              • #30
                                Hi, I apologise for posting again. The reason I am confused is that Nick Cox's and Jeff Wooldridge's posts seem to be contradictory.

                                It appears the variable tv is discrete and it is an integer. I have run the qreg and bootstrapped it and provided histograms using the discrete option as per Nick's advice. However, Jeff states that unless TV takes on a wide range of values, you can't trust either the usual standard errors or those obtained via bootstrap.

                                Therefore I am not clear whether the histograms and qreg output are usable, whether I need to run intreg, or whether I should not report median differences for this variable at all.
