  • Auto data: graphing encoded variable with a dummy variable

    Hello everyone!
    I'm a high-school student who is writing a research paper in applied statistics and who is also a complete newbie in Stata. Right now I'm faced with a problem to which I couldn't find a clear solution on this forum.
    My dataset is rather simple; in fact, it's so simple that I'll use "auto.dta" in Stata 16.0 to explain my problem. I did the following: 1) I encoded the string variable "make"
    Code:
    encode make, gen(make1)
    2) I created a dummy variable indicating whether price is less than 5000
    Code:
     gen price5000=cond(price<5000, 1, 0)
    3) I tried to create a bar graph of the dummy variable
    Code:
    catplot make1, by(price5000)
    but I get this:
    [Image: Graph.png]
    I can see that the models are all jumbled, but I want to know why the frequencies don't come out the way I expect.
    Even the slightest bit of help is much appreciated.
    Best regards,
    Yeskendir
    Last edited by Tursynbay Yeskendir; 30 Dec 2021, 15:10.
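A side note on step 2: in Stata a comparison such as price < 5000 already evaluates to 1 (true) or 0 (false), so the cond() call is not needed. The sketch below, on auto.dta, checks that the two versions agree; the name price5000_b is only for illustration. Both versions would also treat a missing price the same way (as not below 5000), since Stata sorts missing above every number.

```stata
sysuse auto, clear

* Approach from the question
generate price5000 = cond(price < 5000, 1, 0)

* More direct: the comparison itself yields 1 or 0
generate byte price5000_b = price < 5000

* Stops with an error if the two indicators ever differ
assert price5000 == price5000_b
tabulate price5000_b
```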

  • #2
    Your dataset (that is, auto.dta) has precisely one observation for each model, which is what your plots are telling you.
    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . encode make, gen(make1)
    
    . gen price5000=cond(price<5000, 1, 0)
    
    . tab make1 price5000
    
                      |       price5000
       Make and model |         0          1 |     Total
    ------------------+----------------------+----------
          AMC Concord |         0          1 |         1 
            AMC Pacer |         0          1 |         1 
           AMC Spirit |         0          1 |         1 
            Audi 5000 |         1          0 |         1 
             Audi Fox |         1          0 |         1 
             BMW 320i |         1          0 |         1 
        Buick Century |         0          1 |         1 
        Buick Electra |         1          0 |         1 
        Buick LeSabre |         1          0 |         1 
           Buick Opel |         0          1 |         1 
    ...
            VW Dasher |         1          0 |         1 
            VW Diesel |         1          0 |         1 
            VW Rabbit |         0          1 |         1 
          VW Scirocco |         1          0 |         1 
            Volvo 260 |         1          0 |         1 
    ------------------+----------------------+----------
                Total |        37         37 |        74
    What was it that you were expecting?
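One way to check this before graphing is isid, which exits with an error if a variable does not uniquely identify the observations, together with duplicates report, which tabulates how many copies of each value exist. A short sketch on auto.dta:

```stata
sysuse auto, clear

* No error here means each value of make occurs exactly once
isid make

* Counts observations by number of copies; here all 74 are single copies
duplicates report make
```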



    • #3
      Thank you for your reply, William Lisowski!
      I hadn't noticed that every value of "make" in auto.dta is unique. In my data, there are multiple observations for each encoded string.
      Here is an example of my dataset:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double v1 float v2 byte v3 float(v4 v5) long v6 float v7 int v8
      4.14 215 4 2 2  19 2013 0
      4.32 209 5 3 2 170 2012 0
      4.32 209 5 2 2 170 2012 0
      4.56 208 2 2 2 170 2012 0
       4.7 213 4 2 2  19 2013 0
      4.93 209 4 2 2 198 2012 0
      4.94 208 5 3 2  19 2012 0
      4.94 208 5 1 2  19 2012 0
      5.08 212 5 1 2 114 2013 0
      5.09 211 5 1 1 142 2012 0
      5.14 208 4 1 2 168 2012 0
      5.16 211 5 3 1 142 2012 0
      5.21 209 5 2 1 138 2012 0
      5.21 208 5 2 1 138 2012 0
      5.25 212 5 2 1 138 2013 0
      5.25 213 5 2 1 138 2013 0
      5.29 211 5 2 1 142 2012 0
       5.3 214 5 2 1 138 2013 0
       5.3 212 5 3 2 114 2013 0
      5.32 215 5 2 1 138 2013 0
      5.33 211 5 1 2 114 2012 0
      5.39 211 5 2 1 138 2012 0
      5.39 210 5 2 1 138 2012 0
      5.42 211 5 3 2 114 2012 0
      5.43 209 5 2 2  92 2012 0
      5.43 209 5 1 2 114 2012 0
      5.45 209 5 3 2 114 2012 0
      5.45 213 5 1 2 114 2013 0
      5.52 208 5 1 2 114 2012 0
      5.53 215 6 2 2  19 2013 0
      5.57 208 4 3 2 168 2012 0
      5.59 208 5 3 2 114 2012 0
      5.61 213 5 3 2 114 2013 0
      5.64 215 3 2 2  19 2013 0
      5.66 208 2 3 2 170 2012 0
      5.67 210 5 1 2 114 2012 0
      5.69 218 5 1 1 142 2014 1
       5.7 210 5 3 2 114 2012 0
      5.79 208 5 2 2  92 2012 0
      5.82 214 5 1 1 142 2013 0
      5.84 215 4 2 2 198 2013 0
      5.85 208 4 1 2 170 2012 0
      5.89 214 5 1 2 114 2013 0
      5.91 213 5 2 2 114 2013 0
      5.92 210 5 1 2 172 2012 0
      5.92 212 5 2 2 114 2013 0
      6.01 208 5 1 2  44 2012 0
      6.01 209 4 1 2 170 2012 0
      6.03 209 4 1 2 168 2012 0
      6.04 208 5 3 2  44 2012 0
      6.06 214 6 2 2  19 2013 0
       6.1 217 5 1 2 114 2014 1
       6.1 214 5 3 2 114 2013 0
      6.12 213 5 3 2 196 2013 0
      6.12 213 5 2 2 196 2013 0
      6.12 214 4 2 2  57 2013 0
      6.12 208 5 2 2  44 2012 0
      6.17 208 4 1 2  19 2012 0
      6.17 212 6 2 2  19 2013 0
      6.18 213 6 2 2  19 2013 0
       6.2 215 4 2 2  57 2013 0
      6.21 216 4 2 2 198 2014 1
      6.21 214 5 1 2  46 2013 0
      6.21 216 5 1 2 114 2014 1
      6.26 217 3 1 2 174 2014 1
      6.27 215 5 1 2 114 2013 0
      6.28 208 5 2 2   1 2012 0
      6.29 208 5 2 2  46 2012 0
       6.3 208 5 2 2  34 2012 0
       6.3 216 4 2 2  57 2014 1
      6.31 213 5 3 2  44 2013 0
      6.31 208 5 1 2 172 2012 0
      6.32 214 3 2 2  19 2013 0
      6.36 217 5 3 2 170 2014 1
      6.36 210 5 1 2  42 2012 0
      6.36 217 5 3 2 114 2014 1
      6.38 209 5 2 2  34 2012 0
      6.39 214 5 1 2   1 2013 0
      6.41 210 5 3 2  42 2012 0
      6.42 216 5 3 2 114 2014 1
      6.43 210 5 1 2  44 2012 0
      6.44 211 5 1 2  44 2012 0
      6.44 218 5 1 2 114 2014 1
      6.45 214 5 2 2 114 2013 0
      6.45 208 5 1 2  42 2012 0
      6.45 215 5 1 1 168 2013 0
      6.45 219 5 1 2 114 2014 1
      6.46 217 5 1 2  46 2014 1
      6.47 215 6 3 2  19 2013 0
      6.48 212 5 1 2  42 2013 0
      6.48 212 5 3 2  44 2013 0
      6.49 215 5 3 2 114 2013 0
       6.5 208 5 3 2  42 2012 0
      6.51 215 3 3 2  19 2013 0
      6.52 209 4 1 2  19 2012 0
      6.53 209 5 1 2  42 2012 0
      6.55 208 3 1 2 174 2012 0
      6.56 216 3 1 2 174 2014 1
      6.57 209 5 1 1 169 2012 0
      6.57 214 5 3 2  46 2013 0
      end
      format %tq v2
      v6 holds the numeric codes produced by encoding a string variable, and v8 is a binary (0/1) variable.
      When I use
      Code:
      catplot v6 if v8==1, blabel(bar) var1opts(sort(1) descending) percent(v8)
      it doesn't show me, for each category of v6, the percentage of observations with v8 equal to 1.
      Now I know that catplot is just a wrapper for graph bar, so I tried to do this:
      Code:
      graph hbar v8 if v8==1, over(v6) ytitle(%) yla(0 0.25 "25" .5 "50" .75 "75" 1 "100")
      but it doesn't resolve my issue either.
      What other ways are there of showing a bar graph of v8 for the categories of v6?
      Last edited by Tursynbay Yeskendir; 31 Dec 2021, 01:46.
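Why the graph hbar attempt above shows no variation: with if v8==1, only observations with v8 equal to 1 are used, so the mean of v8 in every category of v6 is exactly 1. The sketch below shows the same effect on auto.dta, with foreign (0/1) standing in for v8 and rep78 standing in for v6; these substitutions are assumptions for illustration only.

```stata
sysuse auto, clear

* Restricted to foreign == 1: every category mean is 1, so all bars look alike
tabstat foreign if foreign == 1, by(rep78) statistics(mean)

* Unrestricted: each mean is the fraction of foreign cars in that rep78 category
tabstat foreign, by(rep78) statistics(mean)
```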



      • #4
        catplot is from SSC, as you are asked to explain (FAQ Advice #12). catplot shows results (frequencies, fractions, percents) for the data selected, not for the selected data as a subset of what might have been shown. So with if v8==1, any percents are calculated among the observations with v8 equal to 1 only.
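The difference is easy to see in numbers with tabulate. A sketch on auto.dta, again assuming foreign stands in for v8 and rep78 for v6: row percents give, within each category, the percent with the indicator equal to 1 (what #3 asks for), while restricting with if gives percents of the selected observations only (what catplot showed).

```stata
sysuse auto, clear

* Row percents: within each rep78 category, % domestic and % foreign
tabulate rep78 foreign, row nofreq

* Restricted sample: how the foreign cars alone are spread across rep78
tabulate rep78 if foreign == 1
```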

        The frequencies of a 0, 1 variable can be summarized by the mean of such a variable. So if the values are 0, 0, 0, 1, 1, 1, 1, 1, 1, the mean is 6/9, about 0.67, which is just the fraction of values that are 1. You can recast that mean as a percent by fixing the axis labels. A bar chart with percents for such data is redundant, since percent of 0 = 100 - percent of 1, and conversely. Rather than two bars, there need be just one piece of information per category.
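The arithmetic is easy to check in Stata, since summarize leaves the mean in r(mean). A minimal sketch with the nine example values typed in as a hypothetical variable y:

```stata
clear
input byte y
0
0
0
1
1
1
1
1
1
end

* The mean of a 0/1 variable is the fraction of 1s: 6/9
summarize y
display r(mean)

* The same number as a percent, as the relabelled axis shows it
display 100 * r(mean)
```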

        With your example data (thanks!), several categories have a fraction of 1s equal to zero, so for that reason alone I would not use a bar chart. It's easier to spot those categories with graph dot.

        graph dot by default shows means, which is what you seem to need.

        Code:
        set scheme s1color 
        graph dot v8, over(v6, sort(1)) linetype(line) lines(lc(gs12) lw(thin)) l1title(use better text here) ytitle(... and here say % of whatever) ysc(r(-0.02 .)) yla(0 .2 "20" .4 "40" .6 "60", grid)
        Some of the code here is a matter of taste. I find that the default grid with graph dot often degrades on export to other software, so I reach in and use thin grey solid lines instead. As the results include zeros, ysc(r(-0.02 .)) pushes the category axis slightly to the left of zero so that those points stay visible.

        [Image: Tursynbay.png]



        • #5
          Nick Cox, thank you very much; you've cleared up a lot of things for me!
          However, I don't see a problem with using a bar graph. Here I used your code, replacing dot with hbar:
          [Image: Graph1.png]

          I apologize for not explaining my problem clearly. What I'm looking for is a way of showing this bar graph without the zeros, that is, without the categories of v6 in which no observation has v8 equal to 1. Do I need to create a new variable to hold the proportions? If so, how would one do that? Thanks again for your reply.
          Last edited by Tursynbay Yeskendir; 31 Dec 2021, 16:25.



          • #6
            Is this about what you want?


            EDIT: Belay my original response; Nick Cox's graph was much better the first time.

            Code:
            graph dot v8, over(v6, sort(1)) linetype(line) lines(lc(gs12) lw(thin)) l1title(use better text here) ytitle(... and here say % of whatever) ysc(r(-0.02 .)) yla(0 .2 "20" .4 "40" .6 "60", grid)
            Last edited by Jared Greathouse; 31 Dec 2021, 17:31.



            • #7
              Does this help?

              Code:
              set scheme s1color
              
              egen mean = mean(v8), by(v6)
              
              graph dot v8 if mean > 0, over(v6, sort(1))
              with extra options, or other options, as needed.
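If you do want the percents stored as a variable, as asked in #5, another route is collapse. Note that collapse replaces the data in memory with one observation per category, so save your data first or wrap the commands in preserve and restore. A sketch on auto.dta, assuming foreign stands in for v8 and rep78 for v6; pct_foreign is a hypothetical name.

```stata
sysuse auto, clear

* One observation per rep78 category, holding the mean of foreign
collapse (mean) pct_foreign = foreign, by(rep78)

* Convert fractions to percents
replace pct_foreign = 100 * pct_foreign

* Drop the missing category and the categories with no foreign cars
drop if missing(rep78) | pct_foreign == 0

* (asis) plots the stored values directly, one bar per category
graph hbar (asis) pct_foreign, over(rep78, sort(1) descending) ytitle("% foreign")
```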



              • #8
                That works very well. Thank you for answering my basic questions. I'm sure that you are busy but I'm a slow learner, so I really appreciate it! Happy New Year!
