Deleting unknown variables

Fredrik Jones

Join Date: May 2018

Posts: 2
#1

30 May 2018, 08:22

Hi,
I have a dataset with several variables that I need to subtract from each other, as in the example:

(A-B)+(C-D)

A B C D
. 2 . 3
. 1 1 .
. 3 2 .
1 . 2 .
2 . . 2
5 . . 5

Since each variable has the same number of non-missing observations, I would like to rearrange them so that

A B C D
1 1 1 2
2 2 2 3
5 3 2 5

Note that I would also like to have them in increasing order, if possible.

Does somebody know how to do it?
Matt Warkentin

Join Date: May 2016

Posts: 104
#2

30 May 2018, 08:33

Hi Fredrik,

It's true that each variable has the same number of non-missing observations (N = 3) and the same number of total observations (N = 6), but the issue is that the missing-data pattern isn't consistent within a single observation (row). What you are asking can be achieved, but you should carefully scrutinize whether it makes sense based on what the data are measuring and how you plan to analyze them.

Removing the interspersed missingness and collapsing columns as you have requested implies that the rows don't matter. Is this reasonable? If the rows are people or some other meaningful unit of measure, collapsing does not make sense, in my mind.
Fredrik Jones

Join Date: May 2018

Posts: 2
#3

30 May 2018, 08:42

Hi Matt, thanks for answering.

The rows are supposed to be month/year, but I no longer need the values to be in the same year.

How should I use collapse? I checked the help file, but the only way I found to drop missing values is by using

Code:

firstnm

or

Code:

lastnm

and it leaves me with just one observation per variable.

Last edited by Fredrik Jones; 30 May 2018, 08:50.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

31 May 2018, 12:48

To expand on Matt's comments, I'm not sure you get what he is saying.

I don't fully understand what you are trying to do. If each observation (in Stata we normally talk about observations and variables; Excel uses rows and columns) comes from a different date, then you have to think about how you are going to combine across dates and whether that makes sense. Just because of missing data, you don't want to use one variable measured in 2000 and another in 2001 and a third in 1999. It just doesn't work. [The exception is that there are models where such lags make structural sense, but that is not what you seemed to indicate here.]

I suppose it could be that it doesn't matter which year something is measured in, but then you would need to have a different data structure. You still need a better way to align the observations than just order in the data set.

While there are probably more sophisticated ways to do it, you can certainly move variables to the earliest observations with complex replace statements using if conditions.

You can always sort the data using sort after you get it aligned. You actually might be able to use sort to move the observations, but I'm not sure how.
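One way the sort idea might be sketched in base Stata, with no extra packages: sort on each variable in turn (numeric missing values sort last), then copy the k-th sorted value into the observation whose original position was k. The names id, n_ok and the _s suffix are made up for illustration, and the sketch deliberately breaks any link between values and their original observations, so the caveats above still apply.

```stata
clear
input A B C D
. 2 . 3
. 1 1 .
. 3 2 .
1 . 2 .
2 . . 2
5 . . 5
end

* Remember the original position of each observation (hypothetical name)
gen long id = _n

foreach v of varlist A B C D {
    * Missing values sort last, so non-missing values move to the top
    sort `v'
    * `v'[id] is the value held by observation number id in the current
    * sort order, i.e. the id-th smallest value of `v'
    gen `v'_s = `v'[id]
}

* Restore the original order and drop observations that are now empty
sort id
egen byte n_ok = rownonmiss(A_s B_s C_s D_s)
drop if n_ok == 0
drop n_ok

list A_s B_s C_s D_s, sep(0)
```

Ties within a variable are harmless here because tied values are identical, so it does not matter which of them lands in which position.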
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#5

31 May 2018, 16:20

A combination of xpose and sortrows (you might need to install this package if you have not) would help, provided that the number of observations is not too large.

Code:

xpose, clear varname
sortrows v*, replace
xpose, clear

Nick Cox

Join Date: Mar 2014
Posts: 35405
#6

31 May 2018, 17:03

Like others I don't understand what is being done here but to get from start to finish in #1 there are various routes.

In #5 sortrows is from SSC (Jeff Arnold). It refers to a program of mine with the same name; but I think Jeff meant rowsort (older version on SSC; newer version from Stata Journal).

Consider also fixsort (SSC), which is more direct.

Code:

clear
input A    B    C    D
.    2    .    3
.    1    1    .
.    3    2    .
1    .    2    .
2    .    .    2
5    .    .    5
end
fixsort *, gen(a b c d)  
list , sep(0)


     +-------------------------------+
     | A   B   C   D   a   b   c   d |
     |-------------------------------|
  1. | .   2   .   3   1   1   1   2 |
  2. | .   1   1   .   2   2   2   3 |
  3. | .   3   2   .   5   3   2   5 |
  4. | 1   .   2   .   .   .   .   . |
  5. | 2   .   .   2   .   .   .   . |
  6. | 5   .   .   5   .   .   .   . |
     +-------------------------------+
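To get from this listing to the compact layout asked for in #1, a possible follow-up is to keep only the sorted copies and drop the observations that end up empty. This is a sketch only: n_ok and result are made-up names, the grouped rename syntax needs Stata 12 or later, and whether the final calculation is meaningful once values no longer share an observation is exactly the question raised in #2 and #4.

```stata
clear
input A B C D
. 2 . 3
. 1 1 .
. 3 2 .
1 . 2 .
2 . . 2
5 . . 5
end

* Install fixsort from SSC only if it is not already available
capture which fixsort
if _rc ssc install fixsort

fixsort A B C D, gen(a b c d)

* Keep the sorted copies under the original names
drop A B C D
rename (a b c d) (A B C D)

* Drop observations in which every variable is now missing
egen byte n_ok = rownonmiss(A B C D)
drop if n_ok == 0
drop n_ok

* The calculation from #1, on the realigned values
gen result = (A - B) + (C - D)
list, sep(0)
```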

Last edited by Nick Cox; 31 May 2018, 17:40.


Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#7

31 May 2018, 17:42

Nick Cox sensei, thanks for introducing fixsort, one of so many convenient packages you have created. What inspires you to give so much to the Stata world?
Nick Cox

Join Date: Mar 2014

Posts: 35405
#8

31 May 2018, 17:58

Romalpa: Thanks for your kind words. I like Stata programming and I was fortunate to come across it a while ago.