  • How to add an (empty) category to a variable?

    Dear community,

    I conducted a survey with the software LimeSurvey. I exported the data to Excel and imported them into Stata. When viewing the data, I noticed a problem: not every possible response category is present.

    For example:
    "What is your gender (according to birth registry)?" with response options 1: "male", 2: "female", 3: "divers", and 4: "unspecified according to birth registry".

    Option 4 was not chosen by any participant, so in Stata the variable "gender" only takes the values 1 to 3. For completeness and for descriptive statistics, I would like to add category 4 retrospectively. How can I do this?

    Now:
    . tab gender

         gender |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         20       31.75       31.75
              2 |         39       61.90       93.65
              3 |          4        6.35      100.00
    ------------+-----------------------------------
          Total |         63      100.00

    My goal is to tabulate gender and see output like this:

    . tab gender

         gender |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         20       31.75       31.75
              2 |         39       61.90       93.65
              3 |          4        6.35      100.00
              4 |          0        0.00      100.00
    ------------+-----------------------------------
          Total |         63      100.00

    Can anyone help? I have tried hard but could not fix it. Many thanks in advance!

    Note: This is only one (short) example among others.
    Last edited by Vera Schmidt; 04 Jul 2024, 07:06.

  • #2
    tabulate cannot display absent categories. For one-way tabulations with absent categories, one possibility is fre from SSC. For a labeled variable, see the option -includelabeled-; for an unlabeled variable, see the option -include()-.

    Code:
    sysuse auto, clear
    lab define rep78 1 "Poor"  2 "Fair" 3 "Average" 4 "Good" 5 "Excellent"
    lab values rep78 rep78
    tab rep78 if foreign
    fre rep78 if foreign, includelabeled
    Results:

    Code:
    . tab rep78 if foreign
    
         Repair |
    record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        Average |          3       14.29       14.29
           Good |          9       42.86       57.14
      Excellent |          9       42.86      100.00
    ------------+-----------------------------------
          Total |         21      100.00
    
    .
    . fre rep78 if foreign, includelabeled
    
    rep78 -- Repair record 1978
    -----------------------------------------------------------------
                        |      Freq.    Percent      Valid       Cum.
    --------------------+--------------------------------------------
    Valid   1 Poor      |          0       0.00       0.00       0.00
            2 Fair      |          0       0.00       0.00       0.00
            3 Average   |          3      13.64      14.29      14.29
            4 Good      |          9      40.91      42.86      57.14
            5 Excellent |          9      40.91      42.86     100.00
            Total       |         21      95.45     100.00          
    Missing .           |          1       4.55                      
    Total               |         22     100.00                      
    -----------------------------------------------------------------
    
    .
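Applied to the variable from the question, a minimal sketch might look like this. It assumes fre has been installed from SSC (ssc install fre) and rebuilds the 63 responses from the tabulation in the first post (20 x code 1, 39 x code 2, 4 x code 3) purely as illustration data. The value label name gender is chosen here for illustration; the label texts come from the question.

```stata
* Illustration data: rebuild the 63 responses from the question's tabulation
* (20 x code 1, 39 x code 2, 4 x code 3; code 4 was never chosen)
clear
input byte gender int n
1 20
2 39
3 4
end
expand n
drop n

* Define a value label covering all four response options, including 4
label define gender 1 "male" 2 "female" 3 "divers" 4 "unspecified according to birth registry"
label values gender gender

* fre is community-contributed (SSC); install once with: ssc install fre
* includelabeled lists every labeled value, so code 4 appears with a count of 0
fre gender, includelabeled

* Without a value label, name the wanted values explicitly instead
label values gender
fre gender, include(1/4)
```

Defining the full value label is worth doing in any case, since it documents the complete response scale; official tabulate will still list only the observed codes.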



    • #3
      Originally posted by Andrew Musau
      tabulate cannot handle absent categories. For one way tabulations with absent categories, one possibility is fre from SSC. For a labeled variable, see the option -includelabeled- whereas for an unlabeled variable, see the option -include()-.

      Thank you so much! It works really well. But how can I get summary statistics that also cover the unused category?


      . sum gender

          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
            gender |         63    1.746032    .5670596          1          3

      My goal:

      . sum gender

          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
            gender |         63    1.746032    .5670596          1          4



      • #4
        Originally posted by Vera Schmidt
        But how can I show summary statistics which includes the categories?



        I do not think it makes sense to do this for summary statistics, as the information would be misleading. Here the summary would state that the largest value in the dataset is 4, which is false. Showing absent categories may make sense for tabulations. Others may have different views.
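One way to act on this, sketched below on illustration data rebuilt from the question's tabulation, is to let summarize keep reporting what was actually observed and to record the full set of possible codes in the value label and a variable note instead. The note wording is hypothetical.

```stata
* Illustration data rebuilt from the question's tabulation (63 responses)
clear
input byte gender int n
1 20
2 39
3 4
end
expand n
drop n

label define gender 1 "male" 2 "female" 3 "divers" 4 "unspecified according to birth registry"
label values gender gender

* summarize describes the observed data, so Min is 1 and Max is 3
summarize gender

* The possible codes live in the value label, not in the data
label list gender

* Hypothetical note recording the offered-but-unused category
notes gender: Response options are coded 1-4; code 4 was offered but not chosen.
notes list gender
```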



        • #5
          I agree strongly with #4. #3 is (to be blunt) an indefensible proposal in my view. If you want to write your own command flagging what is possible in principle as compared with what happens in practice, that's fine.
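To give a sense of what a do-it-yourself table of that kind involves, here is a minimal sketch using only official Stata commands (contract, set obs, egen). It builds a separate frequency table in which the unobserved code 4 is added by hand with a count of zero, leaving the data themselves and any summary statistics untouched. The illustration data and the variable names freq, total, percent and cum are assumptions for this example.

```stata
* Illustration data rebuilt from the question's tabulation (63 responses)
clear
input byte gender int n
1 20
2 39
3 4
end
expand n
drop n

preserve
* One row per observed code, with its frequency in freq
contract gender, freq(freq)

* Add a row for the possible but unobserved code 4, with a zero count
set obs `=_N + 1'
replace gender = 4 in L
replace freq = 0 in L

* Percentages and cumulative percentages over all responses
egen total = total(freq)
generate percent = 100 * freq / total
generate cum = sum(percent)
list gender freq percent cum, noobs

* Return to the original 63 observations
restore
```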

          On the main issue, see also tabcount from SSC.
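A minimal sketch with tabcount, assuming it has been installed from SSC (ssc install tabcount), that its v1() option names the values to tabulate whether or not they occur, and that zero prints 0 rather than a blank for empty cells; check help tabcount for the exact syntax of the installed version. The data are again rebuilt from the question's tabulation for illustration.

```stata
* Illustration data rebuilt from the question's tabulation (63 responses)
clear
input byte gender int n
1 20
2 39
3 4
end
expand n
drop n

* tabcount is community-contributed (SSC); install once with: ssc install tabcount
* v1(1/4) lists every value to show, observed or not; zero displays 0 for code 4
tabcount gender, v1(1/4) zero
```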



          • #6
            Thank you so much!
