Dropping observations

Anoush Khachatryan

Join Date: Sep 2021

Posts: 56
#1

12 Mar 2022, 20:06

Hello Members,

I have a dataset that looks like this:

Code:

clear
input float(time v1 v2)
1 500 20
2 500 15
3 500 12
1 200 20
2 200 15
3 200 12
1 100 20
2 100 15
3 100 12
end
tempfile dataset1
save `dataset1'

I want to replace the 500 values at time=2 and time=3 with 200 and 100. The actual dataset I'm working with is much larger, so a simple -replace- or -drop- would take a very long time to compute. The end result should look something like this:

Code:

clear
input float(time v1 v2)
1 500 20
2 200 15
3 100 12
end
tempfile dataset1
save `dataset1'

I would appreciate any assistance!

Thanks,
Anoush K.
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#2

12 Mar 2022, 20:49

Code:

replace v1 = 200 if time == 2
replace v1 = 100 if time == 3

"The actual dataset I'm working with is much larger, so a simple -replace- or -drop- would take a very long time to compute."

-replace- is very fast. I expanded your example data to a total of 900,000 observations, and this code took 0.016 seconds to run. How much bigger than that is your data set?
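
For anyone who wants to repeat this kind of check, here is a rough sketch of one way to do it. It is not necessarily the exact code behind the figure above: it assumes the example data from post #1, uses -expand- to reach 900,000 observations, and uses timer number 1, which is an arbitrary choice. Timings will differ from machine to machine.

Code:

```stata
clear
input float(time v1 v2)
1 500 20
2 500 15
3 500 12
1 200 20
2 200 15
3 200 12
1 100 20
2 100 15
3 100 12
end

* replace each observation with 100,000 copies: 9 x 100,000 = 900,000 observations
expand 100000

* time only the two -replace- commands
timer clear 1
timer on 1
replace v1 = 200 if time == 2
replace v1 = 100 if time == 3
timer off 1

* show the elapsed time in seconds
timer list 1
```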
Anoush Khachatryan

Join Date: Sep 2021

Posts: 56
#3

13 Mar 2022, 07:55

Clyde Schechter thank you for your response. My dataset contains several thousand observations with many different time and v1 combinations. Is there some sort of loop I can run that would replace these observations and hold true for each different time/v1 combination?

Thanks,
Anoush
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

13 Mar 2022, 11:24

In the data you present in post #1 you show, apparently, one group of observations with particular time and v1 combinations. Perhaps you should present several groups of example data, and explain how groups can be distinguished from each other - do you have a variable that identifies each group of observations?
Anoush Khachatryan

Join Date: Sep 2021

Posts: 56
#5

13 Mar 2022, 11:48

William Lisowski Thank you for your response. I am unsure how to create groups with this particular dataset. For example, groups should be based on time and v1. Each time (1-3) corresponds to a v1 value (500, 200, 100). These should be their own group. This is how I would like to form the groups:

Code:

clear
input float(time v1 v2 group)
1 500 20 1
2 500 15 1
3 500 12 1
1 200 20 2
2 200 15 2
3 200 12 2
1 100 20 3
2 100 15 3
3 100 12 3
end
tempfile dataset1
save `dataset1'

I tried forming groups before but was unsuccessful. My previous code grouped all the time=1 together, which would not be correct in this case. Once I have the groups, I think I can use the -drop- command for the group variable if _n>1.

Thanks,
Anoush K.
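
A possible sketch of the grouping described in this post, under assumptions the thread does not confirm: that group 1 is the largest v1, group 2 the next largest, and so on; that time is coded 1, 2, 3, ...; and that the observation to keep is the one whose group number equals its time. The helper variable negv1 is a hypothetical name used only to make -egen group()- number the groups from the largest v1 down.

Code:

```stata
clear
input float(time v1 v2)
1 500 20
2 500 15
3 500 12
1 200 20
2 200 15
3 200 12
1 100 20
2 100 15
3 100 12
end

* egen group() numbers groups in ascending order of its argument,
* so negating v1 gives group 1 to the largest v1 (assumed rule)
generate negv1 = -v1
egen group = group(negv1)
drop negv1

* keep the observation whose group number matches its time value
keep if group == time
drop group

sort time
list, clean noobs
```

On the example data this should leave one observation per time, matching the target layout in post #1. Whether the same rule holds in the full dataset depends on how the real groups are defined.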
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

13 Mar 2022, 12:22

We cannot tell whether you would describe your real data as

many sets, each containing 9 observations total: 3 different times, each time with the same 3 pairs of values of v1 and v2,

or as

one set containing 10,000 observations total: 100 different times, each with 100 pairs of values of v1 and v2.

So, is the total number of observations in your data a perfect square, like 10,000 = 100x100, or is it some multiple of 9, or is it something else altogether that we cannot imagine from your description?

We need to see more data to clarify this.
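
As a rough illustration of how those questions might be answered, the commands below summarize the structure of the data and produce a pasteable example. This is only a sketch: it assumes the real dataset is already in memory with variables named time, v1 and v2 as in post #1, that it has at least 30 observations, and the names pair_tag and n_v1 are hypothetical.

Code:

```stata
* total number of observations
count

* number of observations at each value of time
tabulate time

* number of distinct v1 values within each time:
* tag one observation per time-v1 pair, then total the tags by time
egen byte pair_tag = tag(time v1)
egen n_v1 = total(pair_tag), by(time)
tabulate time n_v1

* copy-and-paste example of the first 30 observations for posting
dataex time v1 v2 in 1/30
```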
Anoush Khachatryan

Join Date: Sep 2021

Posts: 56
#7

13 Mar 2022, 17:57

William Lisowski Thank you for your assistance. I realized I made a slight coding error with the -group- command. I was able to fix it and drop the appropriate observations.

Thanks,
A