
  • Dropping observations

    Hello Members,

    I have a dataset that looks like this:

    Code:
    clear
    input float(time v1 v2)
    1 500 20
    2 500 15 
    3 500 12  
    1 200 20 
    2 200 15
    3 200 12 
    1 100 20
    2 100 15
    3 100 12
    end
    tempfile dataset1
    save `dataset1'


    I want to replace the 500 values at time=2 and time=3 with 200 and 100, respectively, so that only one observation remains for each time. The actual dataset I'm working with is much larger, so a simple -replace- or -drop- would take a very long time to compute. The end result should look something like this:

    Code:
    clear
    input float(time v1 v2)
    1 500 20
    2 200 15 
    3 100 12  
    end
    tempfile dataset1
    save `dataset1'


    I would appreciate any assistance!

    Thanks,
    Anoush K.


  • #2
    Code:
    replace v1 = 200 if time == 2
    replace v1 = 100 if time == 3

    "The actual dataset I'm working with is much larger, so a simple -replace- or -drop- would take a very long time to compute."

    -replace- is very fast. I expanded your example data to a total of 900,000 observations, and this code took 0.016 seconds to run. How much bigger than that is your data set?
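
A minimal sketch of how such a timing check could be set up, assuming the nine-row example from post #1 is simply replicated with -expand- (the thread does not say how the data were expanded, and the timer slot number 1 is an arbitrary choice). Timings will differ by machine, so none is shown here.

```stata
* Rebuild the example from post #1
clear
input float(time v1 v2)
1 500 20
2 500 15
3 500 12
1 200 20
2 200 15
3 200 12
1 100 20
2 100 15
3 100 12
end

* ASSUMPTION: replicate each row 100,000 times (9 x 100,000 = 900,000)
expand 100000

* Time the two -replace- commands from post #2
timer clear 1
timer on 1
replace v1 = 200 if time == 2
replace v1 = 100 if time == 3
timer off 1
timer list 1
```

Note that these two commands change v1 in every observation at the given time, so the number of observations is the same before and after; they do not by themselves drop anything.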



    • #3
      Clyde Schechter thank you for your response. My dataset contains several thousand observations with many different time and v1 combinations. Is there some sort of loop I can run that would replace these observations and hold true for each different time/v1 combination?

      Thanks,
      Anoush



      • #4
        In the data you present in post #1 you show, apparently, one group of observations with particular time and v1 combinations. Perhaps you should present several groups of example data, and explain how groups can be distinguished from each other - do you have a variable that identifies each group of observations?



        • #5
          William Lisowski Thank you for your response. I am unsure how to create groups with this particular dataset. For example, groups should be based on time and v1. Each time (1-3) corresponds to a v1 value (500, 200, 100). These should be their own group. This is how I would like to form the groups:

          Code:
          clear
          input float(time v1 v2 group)
          1 500 20 1
          2 500 15 1
          3 500 12 1
          1 200 20 2
          2 200 15 2
          3 200 12 2
          1 100 20 3
          2 100 15 3
          3 100 12 3
          end
          tempfile dataset1
          save `dataset1'

          I tried forming groups before but was unsuccessful. My previous code grouped all the time=1 observations together, which would not be correct in this case. Once I have the groups, I think I can use -drop- by group with the condition _n>1.

          Thanks,
          Anoush K.
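
One possible way to build the group variable shown in post #5 and arrive at the three-row result from post #1 is sketched below. It rests on two assumptions that the thread does not confirm: that groups are defined by v1 alone, numbered from the largest v1 downward, and that the observation to keep is the one whose time equals its group number. The names rank_asc and group are illustrative.

```stata
* Rebuild the example from post #1
clear
input float(time v1 v2)
1 500 20
2 500 15
3 500 12
1 200 20
2 200 15
3 200 12
1 100 20
2 100 15
3 100 12
end

* egen's group() numbers the categories in ascending order of v1 (100 -> 1)
egen rank_asc = group(v1)

* Flip the numbering so the largest v1 becomes group 1, as in post #5
summarize rank_asc, meanonly
generate group = r(max) + 1 - rank_asc

* ASSUMED rule: keep the row whose time index equals its group index
keep if time == group
sort time
list time v1 v2 group, noobs
```

With the example data this leaves the rows (1, 500, 20), (2, 200, 15) and (3, 100, 12). Dropping with _n>1 inside each group would instead keep only the first row of each group, so whether that gives the intended result depends on how the data are sorted.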



          • #6
            We cannot tell whether you would describe your real data as
            • many sets, each containing 9 observations in total: 3 different times, each time with the same 3 pairs of values of v1 and v2, or
            • one set containing 10,000 observations in total: 100 different times, each with 100 pairs of values of v1 and v2.
            So, is the total number of observations in your data a perfect square, like 10,000 = 100x100, or is it some multiple of 9, or is it something else altogether that we cannot imagine from your description?

            We need to see more data to clarify this.
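
The structure questions above could be checked with a few summary commands before posting more data. The following is only a sketch run on the nine-row example from post #1; the variable names tag_time, tag_v1 and obs_per_time are made up for illustration, and the real data may need different checks.

```stata
* Rebuild the example from post #1 (stand-in for the real data)
clear
input float(time v1 v2)
1 500 20
2 500 15
3 500 12
1 200 20
2 200 15
3 200 12
1 100 20
2 100 15
3 100 12
end

* Count distinct times and distinct v1 values
egen byte tag_time = tag(time)
egen byte tag_v1 = tag(v1)
count if tag_time
local n_times = r(N)
count if tag_v1
local n_v1 = r(N)
display "obs = " _N ", times = `n_times', v1 values = `n_v1'"

* Does every time carry the same number of observations?
bysort time: generate obs_per_time = _N
tabulate obs_per_time

* Is each (time, v1) pair unique? -isid- exits with an error if not
isid time v1
```

For the example data this reports 9 observations, 3 times and 3 values of v1, with 3 observations at every time.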



            • #7
              William Lisowski Thank you for your assistance. I realized I made a slight coding error with the -group- command. I was able to fix it and drop the appropriate observations.

              Thanks,
              A
