
  • Drop observations based on values in certain variables

    Hi,
    I am trying to find observations where the areaXX variables have changed over time.
    Thus, I begin by trying to delete observations where the areaXX code has not changed at all over the years (1997 to 2015).

    I realize that I cannot use the drop command here, as it is meant for variables as a whole.
    I am not sure how to use the replace command here either. Any suggestions?

    Code:
    gen check=.
    replace check=1 if (area97==area98 & area98==area99 & area99==area00 & area00==area01 ///
    & area01==area02 & area02==area03 & area03==area05 & area05==area06   ///        //2003==2004
                                      & area06==area08 ///                            //2006==2007
                                      & area08==area10 & area10==area11 ///         //2008==2009
                                      & area11==area15 )                            //2011==2012==2013==2014

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2(fp stab) str20 state str3(countycode towncode) str40 countytownname str6(area97 area98 area99 area00 area01 area02 area03) str7(area05 area06 area08 area10 area11 area15) str72 area15name float check
    "01" "AL" "Alabama" "001" "000" "Autauga County"  "5240"   "5240"   "5240"   "5240"   "5240"   "5240"   "5240"   "  33860" "  33860" "  33860" "  33860" "  33860" "  33860" "Montgomery, AL"                         .
    "01" "AL" "Alabama" "003" "000" "Baldwin County"  "5160"   "5160"   "5160"   "5160"   "5160"   "5160"   "5160"   "0100003" "0100003" "0100003" "0100003" "0100003" "  19300" "Daphne-Fairhope-Foley, AL"              .
    "01" "AL" "Alabama" "005" "000" "Barbour County"  "010004" "010004" "010004" "010004" "010004" "010004" "010004" "0100004" "0100004" "0100004" "0100004" "0100004" "0100004" "Southeast Alabama nonmetropolitan area" .
    "01" "AL" "Alabama" "007" "000" "Bibb County"     "010001" "010001" "010001" "010001" "010001" "010001" "010001" "  13820" "  13820" "  13820" "  13820" "  13820" "  13820" "Birmingham-Hoover, AL"                  .
    "01" "AL" "Alabama" "009" "000" "Blount County"   "1000"   "1000"   "1000"   "1000"   "1000"   "1000"   "1000"   "  13820" "  13820" "  13820" "  13820" "  13820" "  13820" "Birmingham-Hoover, AL"                  .
    "01" "AL" "Alabama" "011" "000" "Bullock County"  "010004" "010004" "010004" "010004" "010004" "010004" "010004" "0100004" "0100004" "0100004" "0100004" "0100004" "0100004" "Southeast Alabama nonmetropolitan area" .
    "01" "AL" "Alabama" "013" "000" "Butler County"   "010004" "010004" "010004" "010004" "010004" "010004" "010004" "0100004" "0100004" "0100004" "0100004" "0100004" "0100004" "Southeast Alabama nonmetropolitan area" .
    "01" "AL" "Alabama" "015" "000" "Calhoun County"  "0450"   "0450"   "0450"   "0450"   "0450"   "0450"   "0450"   "  11500" "  11500" "  11500" "  11500" "  11500" "  11500" "Anniston-Oxford-Jacksonville, AL"       .
    "01" "AL" "Alabama" "017" "000" "Chambers County" "010002" "010002" "010002" "010002" "010002" "010002" "010002" "0100002" "0100002" "0100002" "0100002" "0100002" "0100002" "Northeast Alabama nonmetropolitan area" .
    "01" "AL" "Alabama" "019" "000" "Cherokee County" "010002" "010002" "010002" "010002" "010002" "010002" "010002" "0100002" "0100002" "0100002" "0100002" "0100002" "0100002" "Northeast Alabama nonmetropolitan area" .
    end

  • #2
    It is not clear what your problem is. In your sample data, there are no observations where check has been set to 1. Is that also the problem in your full dataset? If so, I note that between area03 and area05 a number of your codes gained an extra zero. For example, Cherokee County starts with several years of area being 010002 but ends with the remaining years being 0100002.
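
    One way to spot this kind of change without eyeballing the codes is to compare the trimmed values and their lengths on either side of the break. This is only a diagnostic sketch: the variables differs0305, len03 and len05 are hypothetical names that are not in the original post, and strtrim() assumes Stata 14 or later.

    ```stata
    * Diagnostic sketch; differs0305, len03 and len05 are hypothetical names
    * strtrim() removes the leading blanks stored in the str7 variables
    gen byte differs0305 = strtrim(area03) != strtrim(area05)
    gen byte len03 = strlen(strtrim(area03))
    gen byte len05 = strlen(strtrim(area05))
    list countytownname area03 area05 len03 len05 differs0305
    ```

    In the example data every row is flagged, which is consistent with check never being set to 1.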

    But if indeed you have some observations in your full data where the entire set of area variables is the same, then I think a misunderstanding of the drop command comes into play. You write

    "Thus, I begin by trying to delete observations where the areaXX code has not changed at all over the years (1997 to 2015).

    I realize that I cannot use the drop command here, as it is meant for variables as a whole."

    That's not correct. There are two syntaxes for the drop command, as the output of help drop explains.
    Code:
        Drop variables
    
            drop varlist
    
    
        Drop observations
    
            drop if exp
    So
    Code:
    drop if check==1
    would eliminate the observations for which area is unchanging.
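
    In the wide layout, the long chain of equality tests can also be written as a loop over adjacent pairs of variables. The following is a sketch rather than a tested solution: it assumes the variables area98 through area15 sit next to each other in the dataset in the order shown in the -dataex- listing, it uses strtrim() (Stata 14 or later), and unchanged is a hypothetical variable name.

    ```stata
    * Hypothetical flag variable: unchanged (not in the original post)
    gen byte unchanged = 1
    local prev area97
    foreach v of varlist area98-area15 {
        * Any difference between neighbouring years clears the flag
        replace unchanged = 0 if strtrim(`v') != strtrim(`prev')
        local prev `v'
    }
    count if unchanged == 1
    drop if unchanged == 1
    ```

    Running count before drop shows how many observations are about to be removed, which makes a surprise such as "nothing was dropped" easier to notice.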

    With all that said, with your data containing multiple variables for the same quantity (area) at different times, your data is in what is called a wide layout. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. In particular, dropping all the observations for a state/county/town combination with an unchanging area would be as simple as
    Code:
    bysort fp countycode towncode (area): drop if area[1]==area[_N]
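
    For completeness, here is a rough sketch of how the example data might be converted to a long layout first. It assumes that fp, countycode and towncode together uniquely identify observations; name15, yr and year are hypothetical names introduced only for illustration.

    ```stata
    * Sketch only: assumes fp countycode towncode uniquely identify rows
    * Rename area15name so that it does not match the stub "area"
    rename area15name name15
    * The string option keeps two-digit suffixes such as "00" intact
    reshape long area, i(fp countycode towncode) j(yr) string
    * Map suffixes 97-99 to 1997-1999 and 00-15 to 2000-2015
    gen int year = real(yr) + cond(real(yr) >= 97, 1900, 2000)
    drop yr
    * Sorting on area within each place means first == last only when
    * every year holds the same code
    bysort fp countycode towncode (area): drop if area[1] == area[_N]
    ```

    Sorting on area inside each group is a deliberate choice: comparing the first and last observations then detects any change, including a code that changes and later reverts.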


    • #3
      Thank you William.
      I made an eyeballing error and did not notice the extra "0", so I was puzzled about why no observations were being dropped.

      Thanks for the long form data conversion insight too.
