  • Summing columns

    This is a really stupid question, but I've been searching for help to no avail.

    I'm pretty new to Stata. I've used it in school to run regressions with pre-cleaned data, but that's about it.

    I'm trying to get some summary stats and I'm at my wit's end. I have the number of car crashes by state and year, and I want to calculate the total number of crashes per year in each state.

    In Excel the formula would be something like =SUMIFS(number of crashes, state column, specific state, year column, specific year). Hopefully that makes sense; I'm new to asking for help with data stuff. I could use something like that to make a table with the information I want.

    I have no clue how to do it in Stata. summarize (sum) gives me the mean, min, max, std. dev., etc., but not a column total. The closest I've been able to get is sorting by year and doing:

    by year: tab number_crashes if state == x

    but that just gives me a table of frequencies. It seems like there should be an option or something to get a column total of the values instead of frequencies.

  • #2
    Branden,

    Try the following:

    Code:
    total number_crashes, over(state year)
    If you want instead to create a variable that contains the total:

    Code:
    bysort state year: egen tot_crashes=total(number_crashes)
    If you want a data set that contains just the totals:

    Code:
    collapse (sum) number_crashes, by(state year)
    Regards,
    Joe
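
    A minimal sketch of the three suggestions above on a made-up dataset, so the difference between them is visible. The six observations and the helper variable state_id are invented for illustration. The encode step assumes state is stored as a string, since over() may not accept string variables in every Stata version; if state is already numeric, it can be skipped. The first step is the closest analogue to a single SUMIFS cell.

    ```stata
    * Hypothetical toy data: two states, two years, several rows per state-year
    clear
    input str2 state int year int number_crashes
    "AL" 2010 5
    "AL" 2010 7
    "AL" 2011 4
    "AK" 2010 2
    "AK" 2011 3
    "AK" 2011 6
    end

    * 0. One state-year total, like a single SUMIFS cell (AL in 2010: 5 + 7 = 12)
    summarize number_crashes if state == "AL" & year == 2010, meanonly
    display r(sum)

    * 1. Report all state-year totals without changing the data
    *    (state_id is a numeric, labelled copy of state, assumed here for over())
    encode state, gen(state_id)
    total number_crashes, over(state_id year)

    * 2. Add the state-year total as a new variable on every observation
    bysort state year: egen tot_crashes = total(number_crashes)
    list state year number_crashes tot_crashes, sepby(state year)

    * 3. Reduce the data to one observation per state-year
    *    (preserve/restore brings the original observations back afterwards)
    preserve
    collapse (sum) number_crashes, by(state year)
    list
    restore
    ```

    Which one fits depends on what happens next: total is enough for a quick look, egen keeps the original observations for further work, and collapse is the natural choice when only the totals are needed.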


    • #3
      Thanks! I'm so used to working with Excel that it's taking some time to adapt to creating new variables and datasets for tasks like this. In Excel I'd just set up formulas and get the information I want right there. It's a weird adjustment creating whole new variables for a simple total, or collapsing an entire dataset into a smaller form.

      There are so many variations on commands too. I know sort and by, but this is the first I've heard of bysort. Similarly, I use gen all the time to combine variables or create dummies, but I have no clue what egen is. I can appreciate how much more powerful Stata is than Excel, and I know I'll end up saving a lot of time in the long run if I learn all this stuff, but it's really tempting to just stick with what I know.

      Anyway, I'll try out your suggestions now. Thanks again!
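
      For anyone with the same question about bysort and egen, here is a rough sketch of how they relate to sort, by, and gen. The toy observations and the variable names tot_a, tot_b, running, and is_last are made up for illustration.

      ```stata
      * Hypothetical toy data
      clear
      input str2 state int year int number_crashes
      "AL" 2010 5
      "AL" 2010 7
      "AL" 2011 4
      "AK" 2010 2
      "AK" 2011 3
      "AK" 2011 6
      end

      * bysort is shorthand for sorting and then using the by: prefix
      sort state year
      by state year: egen tot_a = total(number_crashes)

      * The same thing in one line
      bysort state year: egen tot_b = total(number_crashes)
      assert tot_a == tot_b

      * gen with sum() gives a running (cumulative) sum within each group,
      * while egen with total() puts the group total on every observation
      bysort state year: gen running = sum(number_crashes)
      bysort state year: gen byte is_last = (_n == _N)

      * The running sum only matches the group total on the last observation
      assert running == tot_b if is_last
      list, sepby(state year)
      ```

      In short, gen works with ordinary expressions evaluated observation by observation, while egen offers a set of functions such as total() that summarize across observations or groups.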


      • #4
        Welcome to Statalist!

        It sounds like you're starting to use Stata in a serious way, not just to make it to the end of the semester. When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata, for example through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques (like the by: prefix and its variant bysort:), so that when the time came that I needed them, I might recall their existence, if not the full syntax.

        Stata supplies exceptionally good documentation that amply repays the time spent studying it.
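
        Besides the Help menu, the help command typed in the Command window opens the help file for a command. As a small sketch, these are the entries for the commands that came up in this thread:

        ```stata
        * Open the help files for the commands used in this thread
        help by
        help egen
        help collapse
        help total
        ```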


        • #5
          When gazing at the view of a Stata dataset provided in the Browse window, or some -list- output, it is tempting to think of a Stata dataset as a spreadsheet and, if one has been using Excel extensively, to then draw on one's Excel-based experience in approaching data analysis. But a Stata dataset is not a spreadsheet, and your Excel-driven instincts are almost never helpful, and often get badly in the way. To help break the tendency to apply Excel habits in Stata, it is helpful not to use Excel-based vocabulary when talking about Stata. So, in Stata we don't speak of columns: we speak of variables. In Stata we don't speak of rows: we speak of observations. Using a distinct vocabulary helps keep the thought processes of using Stata and using Excel separate in your mind.
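
          A small sketch of that vocabulary in practice, using the auto example dataset that ships with Stata (price_k is a made-up variable name). Commands act on whole variables across all observations, rather than on individual cells as in a spreadsheet.

          ```stata
          * Load an example dataset that ships with Stata
          sysuse auto, clear

          * How many observations and variables are there?
          describe, short
          count

          * One command creates a variable and fills it in for every observation
          generate price_k = price / 1000

          * Look at the first five observations of three variables
          list make price price_k in 1/5
          ```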


          • #6
            Originally posted by William Lisowski
            Welcome to Statalist!

            It sounds like you're starting to use Stata in a serious way, not just to make it to the end of the semester. When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata, for example through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques (like the by: prefix and its variant bysort:), so that when the time came that I needed them, I might recall their existence, if not the full syntax.

            Stata supplies exceptionally good documentation that amply repays the time spent studying it.
            Yeah, I recently graduated and I'm starting to do my own projects with messier data than I'm used to. I'm trying to get a better grasp of Stata, but thinking ahead, I'm not sure how much time I should devote to it. It's hard to tell what will be best for my career with regard to learning Stata vs. R vs. whatever else.

            Thanks for the advice! Clyde as well, I appreciate it.
