Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to summarize data with numerous conditions

    Hi,

    I want to obtain summary data (specifically total sum) by using a couple of conditions. I want the sum total of variables "patients" and "providers" categorized by variables "year" and "region" for a specific variable "stress_echo". I tried a couple of different codes, including the one below, and I cannot seem to get it.

    Thanks for the help


    . tab year region, sum (patients providers) if stress_echo==1
    option if not allowed
    r(198);

    . total patients providers by year region if stress_echo==1
    variable by not found
    r(111);



  • #2
    You need to master the basic Stata syntax that applies to the vast majority of Stata commands. -if- and -in- clauses precede the comma, and options follow the comma. And the value of an option is always wrapped in parentheses. These are very general rules in Stata and you cannot violate them. The syntax is highly constrained: it is not like Englsh where you can move clauses around with considerable freedom.

    Also, you can only use options that the particular command supports. In particular, -total- does not have a -by- option: it has an -over()- option instead. So
    Code:
    tab year region if stress_echo==1, sum(patients providers)
    total patients providers if stress_echo == 1, over(year region)
    Last edited by Clyde Schechter; 06 Sep 2022, 09:57.

    Comment


    • #3
      Thanks. I tried using the codes and I still get error messages (I am not sure why for the second code it reports no observations).


      . tab year region if stress_echo==1, sum (patients providers)
      option sum() incorrectly specified
      r(198);

      . total patients providers if stress_echo == 1, over(year region)
      no observations
      r(2000);

      Comment


      • #4
        . tab year region if stress_echo==1, sum (patients providers)
        option sum() incorrectly specified
        r(198);
        Ah yes, in the -tab- command, the -sum()- option only allows one variable to be specified. So you can either break this up into two commands, one for patients and one for providers, or you can use a more complicated tabulation command such as -table-. If you are running version 17, the syntax of -table- is pretty complicated and will probably prove more forbidding than the commands you are already wrestling with. If you use it, I suggest working interactively with the drop-down menus of the graphical user interface for this. In addition to creating the table, it will also show you the command, which you can then incorporate into your do-file. If you are using version 16 or earlier, the -table- syntax is more similar to that of basic Stata commands and if you read -help table- carefully and look at the examples you should be able to get what you want.

        . total patients providers if stress_echo == 1, over(year region)
        no observations
        r(2000);
        The error message means exactly what it says. There are no observations in your data set that fulfill the requirements for inclusion in this analysis. Run
        Code:
        count if stress_echo == 1 & !missing(year) & !missing(region) & !missing(patients) & !missing(providers)
        and you will get 0. If that comes as a surprise, then there is something wrong with your data set and you will have to fix that. There's nothing wrong with the code in this case.

        Comment


        • #5
          Thanks for the help. So if I understand correctly, in my case it would be better to use the -total- command so I do not have to use -tab- twice (once for provider and once for patients).

          for using -tab- I still received an error message

          . tab year region if stress_echo=1, sum (patients)
          invalid syntax
          r(198);

          .


          I also ran the code to check my data and did have values:

          . count if stress_echo == 1 & !missing(year) & !missing(region) & !missing(patients) & !missing(
          > providers)
          1576

          Do you have any recommendations?
          Thanks
          Last edited by Yosef Cohen; 06 Sep 2022, 10:17.

          Comment


          • #6
            So if I understand correctly, in my case it would be better to use the -total- command so I do not have to use -tab- twice (once for provider and once for patients).
            But the -total- command gives different statistics from what you get with -tab, sum()-. So if one of them is suitable, the other is not.

            As for the other problem, this is very bizarre. It suggests that there is something wrong with your Stata installation. But before we conclude that, please post back with example data that reproduces your problem and I will try to troubleshoot your problem. Use the -dataex- command to show the example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            Comment


            • #7
              [CODE]
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long(patients providers) float year str9 region
              4747 192 13 "West"
              350 28 13 "West"
              2308 140 13 "South"
              1717 103 13 "Midwest"
              383 30 13 "Midwest"
              184995 7699 13 ""
              439 73 13 "Midwest"
              1810 69 13 "West"
              601 81 13 "Midwest"
              4831 419 13 "Northeast"
              480 32 13 "South"
              719 42 13 "West"
              70 9 13 ""
              2439 109 13 "West"
              295 34 13 "South"
              44 4 13 "West"
              355 60 13 "Midwest"
              4759 334 13 "South"
              3958 172 13 "South"
              587 83 13 "West"
              1423 68 13 "Midwest"
              6561 428 13 "South"
              334 36 13 "West"
              528 36 13 "West"
              7156 276 13 "Midwest"
              459 94 13 "South"
              1063 54 13 "Northeast"
              696 30 13 "Northeast"
              1172 53 13 "Midwest"
              75 5 13 "West"
              3572 308 13 "West"
              2527 73 13 "South"
              119 9 13 "West"
              1263 79 13 "South"
              483 51 13 "Midwest"
              660 48 13 "South"
              149 18 13 "South"
              502 76 13 "Midwest"
              994 54 13 "South"
              81 16 13 "West"
              59 23 13 "South"
              296 66 13 "South"
              119 21 13 "Northeast"
              1184 126 13 "South"
              427 33 13 "West"
              908 40 13 "Northeast"
              19960 2248 13 ""
              101 27 13 "South"
              156 19 13 "West"
              54 7 13 "South"
              9390 421 13 "Midwest"
              2233 167 13 "West"
              216 20 13 "South"
              44 20 13 "Midwest"
              346 14 13 "Northeast"
              39 2 13 "Midwest"
              614 43 13 "West"
              148 5 13 "Midwest"
              24948 839 13 "Northeast"
              294 26 13 "South"
              1703 215 13 "South"
              2112 69 13 "South"
              97 17 13 "Midwest"
              4546 354 13 "South"
              3914 194 13 "Midwest"
              259 11 13 "West"
              104 12 13 "Midwest"
              327 88 13 "Midwest"
              433 38 13 "Northeast"
              4836 388 13 "Northeast"
              2015 226 13 "South"
              1142 62 13 "Northeast"
              5260 274 13 "West"
              177 24 13 "South"
              627 51 13 "Midwest"
              101 12 13 "West"
              3103 165 13 "South"
              2636 65 13 "South"
              1473 61 13 "West"
              4877 256 13 "Northeast"
              205 39 13 "Northeast"
              4928 277 13 "Northeast"
              1114 83 13 "Midwest"
              129 13 13 "West"
              230 14 13 "South"
              2294 134 13 "Midwest"
              1182 98 13 "South"
              82107 4712 13 ""
              492 41 13 "West"
              206 12 13 "South"
              1424 149 13 "South"
              104 11 13 "South"
              2579 94 13 "South"
              197 28 13 "West"
              2765 208 13 "Northeast"
              1583 122 13 "South"
              586 19 13 "West"
              57 15 13 "South"
              1636 113 13 "Northeast"
              4564 363 13 "Northeast"
              end


              Thanks you very much

              Comment


              • #8
                Your example data did not include the stress_echo variable, so I cannot test the implications of including that in the command. It may be that when you add that condition is when you lose your observations.

                Aside from that, your region variable is a string, and to appear in the -over()- option of -total-, it must be converted to numeric. So if I run:
                Code:
                encode region, gen(n_region)
                total patients providers, over(year n_region)
                Stata responds with:
                Code:
                . encode region, gen(n_region)
                
                . total patients providers /* if stress_echo == 1 */, over(year n_region)
                
                Total estimation                                         Number of obs = 96
                
                ---------------------------------------------------------------------------
                                          |      Total   Std. err.     [95% conf. interval]
                --------------------------+------------------------------------------------
                 c.patients@year#n_region |
                              13#Midwest  |      32329   11208.01      10078.28    54579.72
                            13#Northeast  |      58297   24001.07      10648.86    105945.1
                                13#South  |      50274   9376.389      31659.51    68888.49
                                 13#West  |      27332    7371.44      12697.84    41966.16
                                          |
                c.providers@year#n_region |
                              13#Midwest  |       1898   456.3874      991.9564    2804.044
                            13#Northeast  |       3161   893.5084      1387.162    4934.838
                                13#South  |       3434   611.9343      2219.157    4648.843
                                 13#West  |       1658   413.3998      837.2976    2478.702
                ---------------------------------------------------------------------------
                
                .
                So, this example does not reproduce your problem. Perhaps if the example had included the stress_echo variable it would have.

                But given the results you got from the -count- command in #5 this seems unlikely.

                I think the two likely possibilities are:

                1. You ran some command that deleted some or all of the data set before you tried the -total- command, but then refreshed the data set before running the subsequent -count- command.
                2. Something is wrong with your Stata installation.

                My advice at this point is to just go back and do it over. Make sure that the full data set is in memory, and you don't do anything to disturb it, before you run the -total- command. If you still get the same problem, then I suggest you first update your Stata installation with -update all- and try again. If that does not solve the problem, I would uninstall Stata and re-install it, and then try again.
                Last edited by Clyde Schechter; 06 Sep 2022, 12:05.

                Comment


                • #9
                  Thanks, that worked. Is there a way to make my data look like your table where instead of "_subpop_#" I have "13#midwest"

                  . encode region, gen(n_region)

                  .
                  . total patients providers if stress_echo == 1, over(year n_region)

                  Total estimation Number of obs = 1576

                  Over: year n_region
                  _subpop_1: 13 Midwest
                  _subpop_2: 13 Northeast
                  _subpop_3: 13 South
                  _subpop_4: 13 West
                  _subpop_5: 14 Midwest
                  _subpop_6: 14 Northeast
                  _subpop_7: 14 South
                  _subpop_8: 14 West
                  _subpop_9: 15 Midwest
                  _subpop_10: 15 Northeast
                  _subpop_11: 15 South
                  _subpop_12: 15 West
                  _subpop_13: 16 Midwest
                  _subpop_14: 16 Northeast
                  _subpop_15: 16 South
                  _subpop_16: 16 West
                  _subpop_17: 17 Midwest
                  _subpop_18: 17 Northeast
                  _subpop_19: 17 South
                  _subpop_20: 17 West
                  _subpop_21: 18 Midwest
                  _subpop_22: 18 Northeast
                  _subpop_23: 18 South
                  _subpop_24: 18 West
                  _subpop_25: 19 Midwest
                  _subpop_26: 19 Northeast
                  _subpop_27: 19 South
                  _subpop_28: 19 West
                  _subpop_29: 20 Midwest
                  _subpop_30: 20 Northeast
                  _subpop_31: 20 South
                  _subpop_32: 20 West

                  --------------------------------------------------------------
                  Over | Total Std. Err. [95% Conf. Interval]
                  -------------+------------------------------------------------
                  patients |
                  _subpop_1 | 104901 18212.29 69178.11 140623.9
                  _subpop_2 | 80957 25116.4 31691.9 130222.1
                  _subpop_3 | 104222 19000.88 66952.32 141491.7
                  _subpop_4 | 107777 53166.89 3491.667 212062.3
                  _subpop_5 | 97758 17013.18 64387.14 131128.9
                  _subpop_6 | 80127 24368.91 32328.07 127925.9
                  _subpop_7 | 99015 17787.05 64126.2 133903.8
                  _subpop_8 | 103601 52290.87 1033.95 206168.1
                  _subpop_9 | 92232 15965.65 60915.83 123548.2
                  _subpop_10 | 79054 24022.09 31935.36 126172.6
                  _subpop_11 | 95584 16546.04 63129.42 128038.6
                  _subpop_12 | 100707 50283.38 2077.599 199336.4
                  _subpop_13 | 89140 13669.28 62328.09 115951.9
                  _subpop_14 | 81545 24759.05 32980.83 130109.2
                  _subpop_15 | 94208 15963.13 62896.78 125519.2
                  _subpop_16 | 98473 49162.47 2042.216 194903.8
                  _subpop_17 | 84689 12603.18 59968.23 109409.8
                  _subpop_18 | 79538 24875.91 30744.61 128331.4
                  _subpop_19 | 90948 15759.55 60036.09 121859.9
                  _subpop_20 | 96948 49037.19 762.9546 193133
                  _subpop_21 | 79070 11955.27 55620.07 102519.9
                  _subpop_22 | 76435 24448.79 28479.41 124390.6
                  _subpop_23 | 86133 15402.73 55920.99 116345
                  _subpop_24 | 95030 49006.35 -1094.543 191154.5
                  _subpop_25 | 78453 11504.38 55887.48 101018.5
                  _subpop_26 | 75752 25023.37 26669.37 124834.6
                  _subpop_27 | 84529 15560.61 54007.31 115050.7
                  _subpop_28 | 94925 49427.65 -2025.923 191875.9
                  _subpop_29 | 56020 8054.335 40221.65 71818.35
                  _subpop_30 | 54289 18418.46 18161.72 90416.28
                  _subpop_31 | 61527 11553.73 38864.69 84189.31
                  _subpop_32 | 70611 36933.75 -1833.496 143055.5
                  -------------+------------------------------------------------
                  providers |
                  _subpop_1 | 6056 800.5845 4485.677 7626.323
                  _subpop_2 | 4590 1002.247 2624.122 6555.878
                  _subpop_3 | 6827 964.4013 4935.354 8718.646
                  _subpop_4 | 4423 1293.595 1885.65 6960.35
                  _subpop_5 | 5851 760.5423 4359.218 7342.782
                  _subpop_6 | 4678 1012.677 2691.663 6664.337
                  _subpop_7 | 6819 946.1216 4963.21 8674.79
                  _subpop_8 | 4391 1320.716 1800.453 6981.547
                  _subpop_9 | 5757 762.5106 4261.357 7252.643
                  _subpop_10 | 4703 995.9955 2749.383 6656.617
                  _subpop_11 | 6687 909.7881 4902.477 8471.523
                  _subpop_12 | 4483 1323.424 1887.143 7078.857
                  _subpop_13 | 5998 749.0821 4528.697 7467.303
                  _subpop_14 | 4944 1038.237 2907.528 6980.472
                  _subpop_15 | 7063 925.0516 5248.538 8877.462
                  _subpop_16 | 4555 1323.913 1958.183 7151.817
                  _subpop_17 | 5881 724.3367 4460.234 7301.766
                  _subpop_18 | 5008 1075.978 2897.5 7118.5
                  _subpop_19 | 7021 915.265 5225.734 8816.266
                  _subpop_20 | 4589 1352.614 1935.886 7242.114
                  _subpop_21 | 5771 708.9699 4380.376 7161.624
                  _subpop_22 | 5012 1073.048 2907.247 7116.753
                  _subpop_23 | 6960 916.0772 5163.141 8756.859
                  _subpop_24 | 4651 1358.909 1985.538 7316.462
                  _subpop_25 | 5834 715.0595 4431.431 7236.569
                  _subpop_26 | 4920 1076.065 2809.329 7030.671
                  _subpop_27 | 6940 931.4363 5113.014 8766.986
                  _subpop_28 | 4745 1365.306 2066.991 7423.009
                  _subpop_29 | 5369 686.9638 4021.54 6716.46
                  _subpop_30 | 4439 979.6207 2517.502 6360.498
                  _subpop_31 | 6293 854.0405 4617.824 7968.176
                  _subpop_32 | 4236 1258.982 1766.542 6705.458
                  --------------------------------------------------------------

                  Comment


                  • #10
                    What version of Stata are you using?

                    Comment


                    • #11
                      13.1

                      Comment


                      • #12
                        Well, I'm afraid your pretty much stuck with the output you have. Version 13.1 didn't do the kind of labeling that the more modern versions do. I'm not sure exactly when the more modern results table display it was introduced, but I do recall that in version 13.1, you got what you are getting.

                        It is in principle possible to use the matrices that -total- leaves behind, e(b) or r(table) and write some code that pulls the results out and posts them to a temporary file that you can then apply value labels to and list out to get something that looks a lot like the more modern output. But the code is fairly complicated. On top of that, it may be that version 13.1 labeled those matrices differently from the way they are labeled in current Stata, and, as I no longer have version 13.1 installed on any of my computers, I have no way to check. So I don't think it would be a good idea for me to try to write that code here. We'd probably both end up wasting a lot of time trying to debug it.

                        Comment


                        • #13
                          Would I be able to export this to excel? I've tried using putexcel and it did not work. Thanks
                          Last edited by Yosef Cohen; 07 Sep 2022, 06:25.

                          Comment


                          • #14
                            Well, you don't show your -putexcel- command so it's not possible to say why it did not work. I hardly ever use -putexcel- myself, so I'm not familiar with all of its capabilities and don't really remember much about its syntax. But it seems to me that if after -total- you grab the matrix e(b), -matrix M = e(b)- you should be able to export that matrix M using -putexcel-.

                            Comment


                            • #15
                              Thanks

                              Comment

                              Working...
                              X