Generate a variable equal to 1 when a country shows missing data for a specific variable

Julia Simon

Join Date: Apr 2022

Posts: 37
#1


12 Apr 2022, 19:14

Dear Statalisters,

I have appended standardized datasets for different countries, with the same variable names across datasets. However, some countries have no observations for specific variables. For each variable `var' stored in a macro, I'd like to identify which countries have all of their data missing. I intend to do this by generating a variable m_`var' that equals 1 when a country has all of its data missing in `var'. I have no code to show you, as I am new to Stata and all of my attempts were unsuccessful.

Could anyone help me?

Regards,

Julia
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

12 Apr 2022, 20:38

for a specific variable

Wouldn't this just be

Code:

g tag =1 if var ==.

or

Code:

g tag =1 if mi(var)

If not, why not?

I have no code to show you as I am new to Stata and all of my attempts remained unsuccessful.

You must have some code: if you made attempts, you must have tried something that didn't give you the result you wanted. I don't mind you being new to Stata at all. Sometimes I feel like I'm new to Stata, and I've used it in some form since 2016 (I think). But for us to get anywhere, we need a minimal worked example.

So, let's get to the bottom of this. For me (or anyone) to help you, firstly, I'll need to see an example of your dataset. Use the dataex command and show me what your dataset looks like (see the FAQ for more info on this).

Secondly, I'd like to see the code you tried to tag the missing observations.

If nobody's told you yet, welcome to Statalist.
Ken Chui

Join Date: Aug 2014

Posts: 1058
#3

12 Apr 2022, 20:56

I'd suggest collapse with either first non-missing (firstnm) or last non-missing (lastnm). This works for both string and numeric variables. Here is a demonstration:

Code:

clear
input str5 country x1 x2 str3 x3
A 1 2 ""
A 1 2 ""
A 1 2 ""
B . 2 "No"
B . . "Yes"
B . 2 "No"
C 4 . "No"
C . . "No"
C 4 . "No"
D 2 . ""
D 2 . "Yes"
D 2 . ""
end

collapse (firstnm) x1 x2 x3, by(country)
list

The results will be a new data set with one country per line. If any country shows a "." for a numeric variable or an empty cell "" for a string variable, it'd mean that variable was all missing for that country in the original long data.

Code:

     +-------------------------+
     | country   x1   x2    x3 |
     |-------------------------|
  1. |       A    1    2       |
  2. |       B    .    2    No |
  3. |       C    4    .    No |
  4. |       D    2    .   Yes |
     +-------------------------+
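Note that collapse replaces the data in memory. If the indicator is wanted alongside the original long data instead, one possible sketch is to count the nonmissing values per country and flag the countries with none. The names n_x1 and m_x1 are illustrative only and are not from the original post; the example assumes the demonstration data above are in memory before collapse is run.

```stata
* Count nonmissing x1 values within each country.
* !missing(x1) is 1 for a nonmissing value and 0 otherwise,
* so the total is the number of nonmissing observations.
bysort country : egen n_x1 = total(!missing(x1))

* Flag countries where that count is zero (all values missing).
* n_x1 and m_x1 are hypothetical names chosen for this sketch.
generate byte m_x1 = (n_x1 == 0)

* Check the result by country.
tabulate country m_x1
```

With the demonstration data, only country B should end up with m_x1 equal to 1. Because missing() accepts both numeric and string arguments, the same pattern should also work for x3.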

Last edited by Ken Chui; 12 Apr 2022, 20:58.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

13 Apr 2022, 01:22

Code:

bysort country (whatever) : gen wanted = missing(whatever[1]) & missing(whatever[_N])

The logic is that if, after sorting on whatever within each country, both the first and the last values are missing, then all values for that country are missing. Here whatever stands for the name of the variable being checked.

This works for string and numeric variables, but includes extended missing values .a to .z as missing.

You mention a value of 1 for the indicator being true. This code also creates values of 0 for the indicator being false, which is by far the most useful flavour of indicator variable in Stata.
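To apply the same idea to every variable held in a macro, as the original question asks, a loop along the following lines might work. This is a minimal sketch: the list x1 x2 x3 is a placeholder to be replaced with the actual variable names, and the m_ prefix follows the naming proposed in #1.

```stata
* Placeholder list of variables to check; replace with your own names.
local vars x1 x2 x3

foreach var of local vars {
    * Sort each country's observations on the variable, then test
    * whether both the first and the last values are missing.
    * The result is 1 if all values are missing for that country, 0 otherwise.
    bysort country (`var') : gen byte m_`var' = missing(`var'[1]) & missing(`var'[_N])
}
```

Each m_ variable is constant within country, so something like tabulate country m_x1 gives a quick check of which countries are flagged.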

See #4 of the recent thread https://www.statalist.org/forums/for...s-are-the-same for more links and references.