
  • Deleting unknown variables

    Hi,
    I have a dataset with several variables that I need to subtract from each other, as in the example:

    (A-B)+(C-D)
    A B C D
    . 2 . 3
    . 1 1 .
    . 3 2 .
    1 . 2 .
    2 . . 2
    5 . . 5

    Since each variable has the same number of non-missing observations, I would like to rearrange them so that they look like this:
    A B C D
    1 1 1 2
    2 2 2 3
    5 3 2 5

    Note that I would also like the values to be in increasing order, if possible.

    Does anybody know how to do this?


  • #2
    Hi Fredrik,

    It's true that each variable has the same number of non-missing observations (N = 3), and also that there is the same number of total observations (N = 6), but the issue is that the missing data pattern isn't consistent within a single observation (row). What you are asking can be achieved, but you should carefully consider whether it makes sense given what the data are measuring and how you plan to analyze them.

    Removing the interspersed missingness and collapsing columns as you have requested implies that the rows don't matter. Is this reasonable? If the rows are people or some other meaningful unit of measure, collapsing does not make sense, in my mind.



    • #3
      Hi Matt, thanks for answering.

      The rows are supposed to be month/year, but I no longer need the values to stay in the same year.

      How should I use collapse? I checked the help file, but the only way I found to drop missing values is the firstnm or lastnm statistic, and that leaves me with just one observation per variable.
      Last edited by Fredrik Jones; 30 May 2018, 08:50.



      • #4
        To expand on Matt's comments, I'm not sure you get what he is saying.

        I don't fully understand what you are trying to do. If each observation (in Stata we normally talk about observations and variables; Excel uses rows and columns) comes from a different date, then you have to think about how you are going to combine across dates and whether that makes sense. Just because some data are missing, you don't want to combine one variable measured in 2000 with another measured in 2001 and a third in 1999. It just doesn't work. [The exception is that there are models where such lags make structural sense, but that is not what you seemed to indicate here.]

        I suppose it could be that it doesn't matter which year something is measured in, but then you would need to have a different data structure. You still need a better way to align the observations than just order in the data set.

        While there are probably more sophisticated ways to do it, you can certainly move values to the earliest observations with complex replace statements using if conditions.

        You can always sort the data using sort after you get it aligned. You might even be able to use sort to move the observations, but I'm not sure how.
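        One way to let sort do the moving is sketched below. It assumes each variable may be reordered on its own, which throws away the month/year alignment discussed above. The idea is to save the data, sort each variable separately, and merge the sorted copies back by observation number. Stata sorts missing (.) after every number, so the non-missing values rise to the top in increasing order. The lowercase result names a b c d are my own choice for illustration.

```stata
clear
input A B C D
. 2 . 3
. 1 1 .
. 3 2 .
1 . 2 .
2 . . 2
5 . . 5
end

* Keep a copy of the original data on disk
tempfile master
save `master'

* Sort each variable on its own; missing (.) sorts after every number,
* so non-missing values end up first, in increasing order
foreach v in A B C D {
    use `v' using `master', clear
    sort `v'
    local lc = lower("`v'")
    rename `v' `lc'
    tempfile s`v'
    save `s`v''
}

* Reload the original data and attach each sorted copy by observation number
use `master', clear
foreach v in A B C D {
    merge 1:1 _n using `s`v'', nogenerate
}
list, sep(0)
```

        The original A to D are left untouched; a b c d hold the rearranged values (1 2 5, 1 2 3, 1 2 2 and 2 3 5 in the first three observations, missing below).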



        • #5
          A combination of xpose and sortrows (you might need to install this package if you have not) would help, provided that the number of observations is not too large.
          Code:
          xpose, clear varname
          sortrows v*, replace
          xpose, clear



          • #6
            Like others, I don't understand what is being done here, but there are various routes from start to finish in #1.

            In #5 sortrows is from SSC (Jeff Arnold). It refers to a program of mine with the same name; but I think Jeff meant rowsort (older version on SSC; newer version from Stata Journal).

            Consider also fixsort (SSC), which is more direct.

            Code:
            clear
            input A    B    C    D
            .    2    .    3
            .    1    1    .
            .    3    2    .
            1    .    2    .
            2    .    .    2
            5    .    .    5
            end
            fixsort *, gen(a b c d)  
            list , sep(0)
            
            
                 +-------------------------------+
                 | A   B   C   D   a   b   c   d |
                 |-------------------------------|
              1. | .   2   .   3   1   1   1   2 |
              2. | .   1   1   .   2   2   2   3 |
              3. | .   3   2   .   5   3   2   5 |
              4. | 1   .   2   .   .   .   .   . |
              5. | 2   .   .   2   .   .   .   . |
              6. | 5   .   .   5   .   .   .   . |
                 +-------------------------------+
            Last edited by Nick Cox; 31 May 2018, 17:40.
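            If installing from SSC is not an option, a sketch using only official Mata functions (st_data(), sort(), st_addvar(), st_store()) should give the same a b c d columns as the fixsort output above. It relies on missing values counting as larger than any number, so sorting each column separately pushes them to the bottom. It assumes A to D are numeric and that the names a b c d are not already in use.

```stata
clear
input A B C D
. 2 . 3
. 1 1 .
. 3 2 .
1 . 2 .
2 . . 2
5 . . 5
end

mata:
// Copy A-D into a matrix, one column per variable
X = st_data(., ("A", "B", "C", "D"))
// Sort each column independently; missing values go to the bottom
for (j = 1; j <= cols(X); j++) {
    X[., j] = sort(X[., j], 1)
}
// Create a-d as new double variables and store the sorted columns in them
st_store(., st_addvar("double", ("a", "b", "c", "d")), X)
end

list, sep(0)
```

            Unlike a file-based approach, this keeps everything in memory, which is convenient for a handful of variables.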



            • #7
              Nick Cox sensei, thanks for introducing fixsort, one of the many convenient packages you have created. What inspires you to give so much to the Stata world?



              • #8
                Romalpa: Thanks for your kind words. I like Stata programming and I was fortunate to come across it a while ago.

