  • Drop values in between a range

    Hi everyone,

    I am quite new to Stata, so I would really appreciate your help. I have a dataset containing firms from all industries, identified by SIC code. I'd like to remove the finance, insurance, and real estate industries, whose SIC codes start with 60 to 67. Is there a simple command that drops all firms with SIC codes between 60xx and 67xx?

    I tried the drop if command, but so far it has not worked. From what I've googled, I only see one-sided conditions such as drop if value>xxx.

    Thanks very much in advance.

  • #2
    You could try
    Code:
     drop if SIC>5999 & SIC<6800
    *if* your SIC codes are all numeric four-digit codes. If this is not the case (some are two or three digit and/or the variable is actually text), then there are more complex ways to handle the situation.
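
    For the text case, a minimal sketch, assuming SIC is a string variable whose values contain only digits (sicnum is a made-up name):

    ```stata
    * convert the string code to a number; non-numeric strings become missing
    gen sicnum = real(SIC)
    * show anything that failed to convert before relying on sicnum
    list SIC if missing(sicnum) & !missing(SIC)
    drop if inrange(sicnum, 6000, 6799)
    ```

    Two- or three-digit codes would need a rule for how they map onto the four-digit scheme, which this sketch does not attempt.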



    • #3
      They are indeed all four-digit codes. Thanks very much, Ben.



      • #4
        Note that inrange() can help here too.

        Code:
        drop if inrange(SIC, 6000, 6799)
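
        A small sketch, assuming SIC is numeric: inrange(z, a, b) is true when a <= z <= b, so both endpoints are included. Counting the affected observations before dropping them is a cheap check:

        ```stata
        * how many observations fall in SIC 6000-6799?
        count if inrange(SIC, 6000, 6799)
        * the same selection expressed on the two-digit major group
        count if inrange(floor(SIC/100), 60, 67)
        ```

        The two counts should agree when every code is a four-digit integer.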



        • #5
          Thank you too Nick.

          In addition, I also need to restrict the dataset so that it

          1) keeps only brands that have data available in every consecutive year;
          2) keeps only brands that appear in at least three consecutive years.

          Is this the right command to use? Thanks very much again.

          bys gvkey: gen nyear=[_N]
          keep if nyear>=3



          • #6
            The two problems are quite different. I think you may need to show us more about your data; otherwise we have to make guesses. For example, does "brand" mean gvkey? Is there at most one observation for each gvkey and year?





            • #7
              You are right, it is indeed gvkey. It is a panel dataset covering a 10-year period. However, not every gvkey has 10 years of data, and some have only 1 or 2 years available. The dataset is very large, so I am not sure whether the available data are always in consecutive years or whether there are gaps (for example, only 1997 and 2001 are available). In addition, I have quite a few variables per gvkey, so there is more than one observation for each gvkey and year.



              • #8
                You can get the number of years represented for each firm by

                Code:
                egen tag  = tag(gvkey year)
                egen ntags = total(tag), by(gvkey)
                and then

                Code:
                keep if ntags >= 3
                Consecutive years? Try

                Code:
                bysort gvkey (tag year) : gen consec = tag & tag[_n-1] & (year ==  year[_n-1] + 1)
                by gvkey : egen nconsec = total(consec)
                drop if nconsec < 3
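
                If what is needed is a single unbroken run of at least three years, one possible sketch is to track the run length directly. It assumes year is numeric; first, run, and maxrun are made-up names:

                ```stata
                * flag one observation per firm-year
                egen byte first = tag(gvkey year)
                * among flagged observations, start every run at 1
                bysort gvkey first (year) : gen run = 1 if first
                * extend the run whenever the previous flagged year is exactly one earlier
                by gvkey first : replace run = run[_n-1] + 1 if first & _n > 1 & year == year[_n-1] + 1
                * longest run per firm, spread to all of its observations
                egen maxrun = max(run), by(gvkey)
                keep if maxrun >= 3
                ```

                The replace works because Stata updates observations in order, so run[_n-1] already holds the updated value.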



                • #9
                  Thanks very much Nick!

                  When I used

                  Code:
                  keep if ntags >=3

                  no observations were dropped. But when I tried the second set of commands, some observations were deleted successfully.

                  What I still don't understand well is how to do 1): keep only firms that have data available in every year of the sample.



                  • #10
                    You just look for firms with 10 tags.
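
                    As a sketch under the same setup (firstobs, nyears, and yeartag are made-up names), the number of sample years can be counted rather than typed in as 10:

                    ```stata
                    * one flagged observation per firm-year, then distinct years per firm
                    egen firstobs = tag(gvkey year)
                    egen nyears = total(firstobs), by(gvkey)
                    * count distinct years in the whole sample instead of hard-coding 10
                    egen yeartag = tag(year)
                    count if yeartag
                    keep if nyears == r(N)
                    ```

                    A firm observed in every sample year has no gaps, so it also satisfies the consecutive-year requirement.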



                    • #11
                      Nick Cox, is there also a simple command to drop observations where the location variable is not equal to USA? Maybe something with tags as well? Let me phrase it differently: I only need to keep firms that are located in the USA.



                      • #12
                        You could try
                        Code:
                        keep varlist
                        This command keeps only the variables in varlist. keep also accepts an if qualifier, but not together with a varlist, so the two forms have to be issued as separate commands, such as

                        Code:
                        keep if locationvar=="USA"
                        keep varlist
                        or just

                        Code:
                        keep if locationvar=="USA"
                        when you want to keep all variables in the dataset.
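
                        A small sketch, assuming locationvar is a string variable: it can help to look at the stored values first, since differences in case or stray blanks would make an exact comparison with "USA" miss rows.

                        ```stata
                        * inspect the distinct values actually stored
                        tabulate locationvar, missing
                        * tolerate differences in case and surrounding blanks
                        keep if upper(trim(locationvar)) == "USA"
                        ```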



                        • #13
                          thanks!



                          • #14
                            One more question. I have a variable, Total Assets, and I would like to drop every firm in the dataset that has any missing value for it. In other words, I only want to keep firms with nonmissing total assets in all years. Some firms have missing values in only, say, 4 of 15 years, but the whole firm still needs to be removed. What is the best way to deal with this?
                            Last edited by Roy Steinvoort; 22 Jan 2016, 04:18.
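
                            One possible sketch for this, assuming the variable is numeric and called totalassets (a hypothetical name, as is anymiss):

                            ```stata
                            * 1 for every observation of a firm that has at least one missing value
                            egen anymiss = max(missing(totalassets)), by(gvkey)
                            * remove those firms entirely
                            drop if anymiss
                            ```

                            The max() of a 0/1 indicator within gvkey is 1 as soon as a single year is missing, which is what removes the whole firm rather than only the incomplete years.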



                            • #15
                              I have another question. My dataset covers 1992-2007. The variable costat indicates whether a firm is active (A) or inactive (I) in each year. I need to keep only those firms that were active in 1993 (even if they have since become inactive); this is crucial for further research. Can someone help me with the commands to keep only the firms that were active in 1993? Thanks in advance.
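
                              A minimal sketch of one way to approach this, assuming costat is a string variable holding "A" or "I" and year is numeric (active93 is a made-up name):

                              ```stata
                              * 1 for every observation of a firm that was active in 1993
                              egen active93 = max(year == 1993 & costat == "A"), by(gvkey)
                              * keep all years of those firms
                              keep if active93
                              ```

                              Firms with no 1993 observation get active93 equal to 0 and are dropped as well.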

