  • Help producing line graph

    Hi all,

    I have a somewhat embarrassing question. I am trying to make a line graph that displays what percentage of the observations (or individuals) fall into different categories over time, as in the attached image. Does anyone have any idea how to do this?

    Thanks,
    Claire

  • #2
    Code:
    // open example data
    sysuse nlsw88, clear
    
    // those who are not married and not never married are divorced or widowed
    gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)
    
    // the mean of an indicator (dummy) variable is the proportion
    // so this command creates a dataset that per age contains the proportion of
    // married, never_married, and divorced/widowed individuals
    collapse (mean) married never_married divorced, by(age)
    
    // turn proportions into percentages
    replace married = married*100
    replace never_married = never_married*100
    replace divorced = divorced*100
    
    // nicer labels
    label var married "married"
    label var never_married "never married"
    label var divorced "divorced or widowed"
    
    // graph
    twoway line married never_married divorced age
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      That worked great. Thank you! If I wanted to make different graphs separated by sex how would I do so? This collapses the data frame and gets rid of other variables I would want to control by.


      • #4
        The example dataset I used is all women, so separating by sex would be very uninteresting. So here I separated by college degree:

        Code:
        // open example data
        sysuse nlsw88, clear
        
        // those who are not married and not never married are divorced or widowed
        gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)
        
        // the mean of an indicator (dummy) variable is the proportion
        // so this command creates a dataset that per age contains the proportion of
        // married, never_married, and divorced/widowed individuals
        collapse (mean) married never_married divorced, by(age collgrad) //<-- changed the by() option
        
        // turn proportions into percentages
        replace married = married*100
        replace never_married = never_married*100
        replace divorced = divorced*100
        
        // nicer labels
        label var married "married"
        label var never_married "never married"
        label var divorced "divorced or widowed"
        
        // graph
        twoway line married never_married divorced age, by(collgrad) // <-- changed the graph command


        • #5
          One more question-- how would you add a moving average to this type of line graph? I tried to use tssmooth ma managers_ma = managers, window(1 1 1) but was getting a very jagged graph...


          • #6
            A moving average that is just the mean of the lagging, present and leading values will indeed not smooth much. It sounds as if you need to smooth (much) more, and better results may follow from using a longer window and also unequal weights. Equal weights mean that smoothed values jump up or down as values enter or leave the window, whereas tapered weights reduce that effect.
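
            A minimal sketch of a longer, tapered window, assuming the collapsed nlsw88 data from #2. It swaps window() for the weights() option of tssmooth ma; the five-point span and the weights 1 2 3 2 1 are illustrative choices only, not a recommendation for any particular dataset.

            ```stata
            // open example data and build the percentages as in #2
            sysuse nlsw88, clear
            gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)
            collapse (mean) married never_married divorced, by(age)
            foreach v of varlist married never_married divorced {
                replace `v' = `v' * 100
            }

            // one observation per age, so age alone can serve as the time variable
            tsset age

            // five-point moving average with tapered weights (assumed values):
            // two lags weighted 1 and 2, the current value 3, two leads 2 and 1
            foreach v of varlist married never_married divorced {
                tssmooth ma `v'_ma = `v', weights(1 2 <3> 2 1)
            }

            label var married_ma "married"
            label var never_married_ma "never married"
            label var divorced_ma "divorced or widowed"

            twoway line married_ma never_married_ma divorced_ma age
            ```

            With only a handful of distinct ages in this example, a much longer window would leave little of the original variation, so the span is worth experimenting with.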

            Also, considering the kind of data shown in #1:

            1. Special effects can follow from what is legal, traditional or possible at different ages.

            2. There can be side-effects with percents that are components of a total.
            Last edited by Nick Cox; 22 Aug 2020, 07:36.


            • #7
              The problem is more that, since I have collapsed my dataset (as seen above), I cannot create a moving average: I get the error message "variable wave not found". Is there any way to create a moving average from the code seen above?


              • #8
                #5 says you have a moving average which isn't smooth enough and #7 says that you can't create a moving average at all. Which is it? In neither case do you give any data example or code.

                I suggest that you back up and read https://www.statalist.org/forums/help#stata and come back with a data example and show the code you tried. Or, equivalently, create an analogue of your problem using a standard Stata dataset, as Maarten Buis did in #2 and #4.


                • #9
                  Okay, using the dataset Maarten used above, I have now figured out how to create the graph with a moving average by adding three lines to the code Maarten provided.

                  Code:
                   
                   // open example data sysuse nlsw88, clear  // those who are not married and not never married are divorced or widowed gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)  // the mean of an indicator (dummy) variable is the proportion // so this command creates a dataset that per age contains the proportion of // married, never_married, and divorced/widowed individuals collapse (mean) married never_married divorced, by(age) //<-- changed the by() option  // turn proportions into percentages replace married = married*100 replace never_married = never_married*100 replace divorced = divorced*100  //MA Variable tsset age tssmooth ma married_ma = married, window(1 1 1) tssmooth ma never_married_ma = never_married, window(1 1 1) tssmooth ma divorced_ma = divorced, window(1 1 1)  // nicer labels label var married_ma "married" label var never_married_ma "never married" label var divorced_ma "divorced or widowed"  // graph twoway line married_ma never_married_ma divorced_ma age
                  Yet when I try to create separate graphs by category (such as by(collgrad)), I get (1) the error "repeated time values in sample" following tsset age and (2) the error "variable wave not found" when creating the moving average variables. I have tried to include the wave ID variable when collapsing the data, but then I get the error "window() invalid -- invalid numlist has elements outside of allowed range" when creating my moving average variables. Do you have any idea how I could fix this? Thank you, Claire
                  Last edited by Claire Wright; 28 Aug 2020, 10:06.


                  • #10
                    This works for me:


                    Code:
                    // open example data 
                    sysuse nlsw88, clear  
                    // those who are not married and not never married are divorced or widowed 
                    gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)  
                    // the mean of an indicator (dummy) variable is the proportion 
                    // so this command creates a dataset that per age contains the proportion of 
                    // married, never_married, and divorced/widowed individuals 
                    collapse (mean) married never_married divorced, by(age) 
                    //<-- changed the by() option  
                    // turn proportions into percentages 
                    replace married = married*100 
                    replace never_married = never_married*100 
                    replace divorced = divorced*100  
                    
                    list 
                    
                    //MA Variable 
                    tsset age 
                    tssmooth ma married_ma = married, window(1 1 1) 
                    tssmooth ma never_married_ma = never_married, window(1 1 1) 
                    tssmooth ma divorced_ma = divorced, window(1 1 1)  
                    
                    // nicer labels 
                    label var married_ma "married" 
                    label var never_married_ma "never married" 
                    label var divorced_ma "divorced or widowed"  
                    
                    // graph 
                    
                    twoway line married_ma never_married_ma divorced_ma age


                    • #11
                      Yes that works for me as well. But, I am getting the errors mentioned above when I try to make different graphs by collgrad.


                      • #12
                        // example

                        // open example data
                        sysuse nlsw88, clear
                        // those who are not married and not never married are divorced or widowed
                        gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)

                        // the mean of an indicator (dummy) variable is the proportion
                        // so this command creates a dataset that per age contains the proportion of
                        // married, never_married, and divorced/widowed individuals
                        collapse (mean) married never_married divorced, by(age collgrad) //<-- changed the by() option

                        // turn proportions into percentages
                        replace married = married*100
                        replace never_married = never_married*100
                        replace divorced = divorced*100
                        list

                        //MA Variable
                        tsset age
                        tssmooth ma married_ma = married, window(1 1 1), by(collgrad)
                        tssmooth ma never_married_ma = never_married, window(1 1 1), by(collgrad)
                        tssmooth ma divorced_ma = divorced, window(1 1 1), by(collgrad)

                        // nicer labels
                        label var married_ma "married"
                        label var never_married_ma "never married"
                        label var divorced_ma "divorced or widowed"

                        // graph

                        twoway line married_ma never_married_ma divorced_ma age, by(collgrad)


                        • #13
                          OK, but as usual, it helps mightily if you show the code that failed as otherwise we're obliged to guess what you tried (at best).

                          If with the nlsw88 data you collapsed by collgrad and age, then you are producing the equivalent of a panel dataset, and the implication is that you must tsset collgrad age. As you have seen, a time variable alone isn't enough, because each age now occurs once per collgrad group.
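
                          A minimal sketch of that panel setup with the nlsw88 data, assuming the collapse by age and collgrad from #4. To keep the example to commands that certainly respect panels, it builds the three-point average by hand with the L. and F. time-series operators rather than tssmooth ma, so it is a stand-in for window(1 1 1), not the same command. One difference to note: the first and last age in each group come out missing, because a lag or lead is unavailable there.

                          ```stata
                          // open example data and build the percentages by age and collgrad
                          sysuse nlsw88, clear
                          gen divorced = married == 0 & never_married == 0 if !missing(married, never_married)
                          collapse (mean) married never_married divorced, by(age collgrad)
                          foreach v of varlist married never_married divorced {
                              replace `v' = `v' * 100
                          }

                          // collgrad is the panel identifier, age is the time variable
                          tsset collgrad age

                          // equal-weight three-point moving average, computed within each
                          // collgrad group because L. and F. never cross panel boundaries
                          foreach v of varlist married never_married divorced {
                              gen `v'_ma = (L.`v' + `v' + F.`v') / 3
                          }

                          label var married_ma "married"
                          label var never_married_ma "never married"
                          label var divorced_ma "divorced or widowed"

                          twoway line married_ma never_married_ma divorced_ma age, by(collgrad)
                          ```

                          The same pattern should carry over to other data by substituting the grouping variable and the time variable in the tsset line, provided the grouping variable is numeric.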

                          There is no wave variable in that dataset. I guess you're referring to a variable in your own dataset.

                          There is a choice. Either give enough of an example of your own dataset for us to understand and ideally replicate what you did, and state your code in terms of those variables, or use another example, such as nlsw88.

                          EDIT noting #12:


                          #13 is a reply to #11, but the guess seems to have been correct.
                          Last edited by Nick Cox; 28 Aug 2020, 13:12.


                          • #14
                            preserve

                            collapse (mean) changeoccup_female changeoccup_male, by(year agegroup)

                            replace changeoccup_female = changeoccup_female*100
                            replace changeoccup_male = changeoccup_male*100

                            tsset year
                            *This is where I get the error of multiple years
                            tssmooth ma changeoccup_female_ma = changeoccup_female, window(1 1 1) by(agegroup)
                            * Wave not found error
                            tssmooth ma changeoccup_male_ma = changeoccup_male, window(1 1 1) by(agegroup)

                            label var changeoccup_male_ma "Male"
                            label var changeoccup_female_ma "Female"

                            twoway line changeoccup_male_ma changeoccup_female_ma year, by(agegroup) ytitle("Percentage of Individuals") xtitle("Year")
                            restore
                            Last edited by Claire Wright; 28 Aug 2020, 13:10.


                            • #15
                              That is the code from my own project. I am graphing occupation change for males and females over time for different age groups. There is a wave variable, but I am not sure how to keep it when collapsing the data, or whether it would make more sense to generate a new variable.
                              Last edited by Claire Wright; 28 Aug 2020, 13:11.
