Drop values in between a range

janine law

Join Date: Jan 2016

Posts: 5
#1

14 Jan 2016, 08:44

Hi everyone,

I am quite new to Stata, so I would really appreciate your help. I have a dataset containing firms from all industries, identified by SIC code. I'd like to remove the finance, insurance, and real estate industries, whose SIC codes begin with 60 through 67. Is there a simple command that drops all firms with SIC codes between 6000 and 6799?

I tried the drop if command, but so far it has not worked. From what I've found by searching, the examples use only a one-sided condition, such as drop if value>xxx.

Thanks very much in advance.
ben earnhart

Join Date: May 2014

Posts: 1027
#2

14 Jan 2016, 09:12

You could try

Code:

drop if SIC>5999 & SIC<6800

*if* your SIC codes are all numeric four-digit codes. If this is not the case (some are two or three digit and/or the variable is actually text), then there are more complex ways to handle the situation.
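
As a minimal sketch of the text case, assuming SIC is a string variable that holds the four-digit codes (sic_num is a hypothetical name for the new variable), you could convert it to a number first:

```stata
* Assumption: SIC is a string variable with values such as "6021"
* real() returns missing for values that cannot be read as numbers
generate sic_num = real(SIC)

* inrange() is false when sic_num is missing, so unreadable codes are kept
drop if inrange(sic_num, 6000, 6799)
```

Codes with fewer than four digits would need a rule for how to pad or interpret them before any range comparison makes sense.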
janine law

Join Date: Jan 2016

Posts: 5
#3

14 Jan 2016, 09:17

They are indeed all four-digit codes. Thanks very much ben.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

14 Jan 2016, 09:20

Note that inrange() can help here too.

Code:

drop if inrange(SIC, 6000, 6799)
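
For what it's worth, inrange(SIC, 6000, 6799) is true when 6000 <= SIC <= 6799 and false when SIC is missing, so on four-digit integer codes it selects the same observations as a two-sided comparison. A cautious way to run it, as a sketch, is to count first and check afterwards:

```stata
* How many observations fall in the range before dropping?
count if inrange(SIC, 6000, 6799)

drop if inrange(SIC, 6000, 6799)

* Stops with an error if any observation in the range remains
assert !inrange(SIC, 6000, 6799)
```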
janine law

Join Date: Jan 2016

Posts: 5
#5

14 Jan 2016, 09:35

Thank you too Nick.

In addition, I also need to restrict the dataset so that it

1). keeps only brands that have data available in every consecutive year, and
2). keeps only brands that appear in at least three consecutive years.

Is this the right command to use? Thanks very much again.

bys gvkey: gen nyear=[_N]
keep if nyear>=3
Nick Cox

Join Date: Mar 2014

Posts: 35697
#6

14 Jan 2016, 09:40

The two problems are quite different. I think you may need to show us more about your data; otherwise we have to make guesses. For example, does "brand" mean gvkey? Is there at most one observation for each gvkey and year?
janine law

Join Date: Jan 2016

Posts: 5
#7

14 Jan 2016, 09:50

You are right, it is indeed gvkey. It is a panel dataset covering a 10-year period. However, not every gvkey has 10 years of data, and some have only 1 or 2 years available. The dataset is very large, so I am not sure whether the available data always fall in consecutive years or whether there are gap years (for example, only 1997 and 2001 data being available). In addition, I have quite a few variables per gvkey, so there is more than one observation for each gvkey and year.

Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

14 Jan 2016, 10:00

You can get the number of distinct years represented for each firm by

Code:

egen tag  = tag(gvkey year)
egen ntags = total(tag), by(gvkey)

and then

Code:

keep if ntags >= 3

Consecutive years? Try

Code:

bysort gvkey (tag year) : gen consec = tag & tag[_n-1] & (year ==  year[_n-1] + 1)
by gvkey : egen nconsec = total(consec)
drop if nconsec < 3
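
One caveat, offered as a sketch rather than a definitive fix: total(consec) counts adjacent-year pairs over the whole firm, so exactly three consecutive years give nconsec equal to 2, and several separate two-year spells can add up to 3. If the goal is a run of at least three consecutive years, tracking the length of each run may be closer. The toy data and the names run and maxrun below are made up for illustration:

```stata
clear
input gvkey year
1 1997
1 1998
1 2000
1 2001
1 2003
1 2004
2 1997
2 1998
2 1999
end

egen tag = tag(gvkey year)

* run = length of the current spell of consecutive tagged years
bysort gvkey (tag year) : gen run = tag
by gvkey : replace run = run[_n-1] + 1 if tag & (tag[_n-1] == 1) & (year == year[_n-1] + 1)

* longest spell per firm; keep firms with a spell of three or more years
by gvkey : egen maxrun = max(run)
keep if maxrun >= 3
```

In this toy example firm 1 has three separate two-year spells and is dropped, while firm 2 has one three-year spell and is kept.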


janine law

Join Date: Jan 2016

Posts: 5
#9

14 Jan 2016, 10:39

Thanks very much Nick!

When I used

Code:
keep if ntags >=3

no observations were dropped. But when I tried the second set of commands, some observations were deleted successfully.

What I still don't understand well is how to do 1), that is, keep only firms that have data available in every year of the sample.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#10

14 Jan 2016, 10:42

You just look for firms with 10 tags.
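
In code, that might look like the following sketch. It assumes ntags was created as in the earlier post and that the sample covers exactly 10 calendar years, so a firm with 10 distinct years cannot have a gap:

```stata
* Assumption: the panel spans exactly 10 years, and ntags counts
* the distinct years observed for each gvkey
keep if ntags == 10
```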
Roy Steinvoort

Join Date: Jan 2016

Posts: 30
#11

22 Jan 2016, 03:29

Nick Cox, is there also a simple command to drop observations whose location variable is not equal to USA? Maybe something with tags as well? Let me phrase it differently: I only need to keep those firms that are located in the USA.
Rachel Sleeps

Join Date: May 2015

Posts: 64
#12

22 Jan 2016, 03:35

You could try

Code:

keep

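This command keeps all variables within varlist. There is also a form with an if clause, which keeps only the observations that satisfy a condition. The two forms cannot be combined in a single command, so to restrict both observations and variables, issue them one after the other:

Code:

keep if locationvar=="USA"
keep varlist

or just

Code:

keep if locationvar=="USA"

when you want to keep all variables within the set.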
Roy Steinvoort

Join Date: Jan 2016

Posts: 30
#13

22 Jan 2016, 03:48

thanks!
Roy Steinvoort

Join Date: Jan 2016

Posts: 30
#14

22 Jan 2016, 04:14

One more question. I have this variable Total Assets and I would like to drop all firms in the dataset that have missing values for this Total assets variable. In other words I only want to keep the firms that have nonmissing total assets. But for some firms there are missing values for only 4/15 years for example (but the whole firm needs to be removed....) What is the best way to deal with this?

Last edited by Roy Steinvoort; 22 Jan 2016, 04:18.
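
One possible approach, as a sketch only: flag every firm that has a missing value in any year, then drop all of that firm's observations. This assumes the firm identifier is gvkey, as elsewhere in this thread, and that total assets is a numeric variable; at and anymiss are hypothetical names to replace with your own:

```stata
* Assumption: at is the numeric total assets variable
* anymiss is 1 for every observation of a firm with at least one missing value
bysort gvkey : egen anymiss = max(missing(at))

* Drop the whole firm, not only the years with missing values
drop if anymiss
```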
Roy Steinvoort

Join Date: Jan 2016

Posts: 30
#15

22 Jan 2016, 05:07

I have another question. My dataset is from 1992-2007. There is this variable costat which indicates if a firm is A=active or I=inactive per year. I need to only keep those firms that were active in 1993 (although they might be inactive for a while now), this is crucial for further research. Can someone help me out with the commands to keep only those firms that were (once) active in 1993? Thanks in advance
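
A sketch of one way this might be done, assuming the firm identifier is gvkey, the year variable is named year, and costat is a string variable holding "A" or "I" (active93 is a hypothetical name):

```stata
* active93 is 1 for every observation of a firm that was active in 1993
bysort gvkey : egen active93 = max(year == 1993 & costat == "A")

* Keep all years of those firms, whatever their status in other years
keep if active93
```

Firms with no 1993 observation at all get active93 equal to 0 and are dropped as well.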