  • Generate a variable equal to 1 when a country shows missing data for a specific variable

    Dear Statalisters,

    I have appended standardized datasets for different countries, with the same variable names across datasets. However, some countries have no observations for specific variables. For each variable `var' stored in a macro, I'd like to identify which countries have all of their data missing. I intend to do this by generating a variable m_`var' that equals 1 when a country has all of its values missing in `var'. I have no code to show you as I am new to Stata and all of my attempts remained unsuccessful.

    Could anyone help me?

    Regards,

    Julia

  • #2
    "for a specific variable"
    Wouldn't this just be
    Code:
    g tag =1 if var ==.
    or
    Code:
    g tag =1 if mi(var)
    If not, why not?

    "I have no code to show you as I am new to Stata and all of my attempts remained unsuccessful."
    You must have code: if you made attempts, you must have tried some code that didn't give you the result you wanted. I don't mind you being new to Stata at all. Sometimes I feel like I'm new to Stata, and I've used it in some form since 2016 (I think). But for us to get anywhere, we need a minimal worked example.

    So, let's get to the bottom of this. For me (or anyone) to help you, firstly, I'll need to see an example of your dataset. Use the dataex command and show me what your dataset looks like (see the FAQ for more info on this).

    Secondly, I'd like to see the code you tried to tag the missing observations.

    If nobody's told you yet, welcome to Statalist.



    • #3
      I'd suggest collapse with either first non-missing (firstnm) or last non-missing (lastnm). This works for both string and numeric variables. Here is a demonstration:

      Code:
      clear
      input str5 country x1 x2 str3 x3
      A 1 2 ""
      A 1 2 ""
      A 1 2 ""
      B . 2 "No"
      B . . "Yes"
      B . 2 "No"
      C 4 . "No"
      C . . "No"
      C 4 . "No"
      D 2 . ""
      D 2 . "Yes"
      D 2 . ""
      end
      
      collapse (firstnm) x1 x2 x3, by(country)
      list
      The result is a new dataset with one row per country. If a country shows "." for a numeric variable or an empty string "" for a string variable, that variable was entirely missing for that country in the original long data.

      Code:
           +-------------------------+
           | country   x1   x2    x3 |
           |-------------------------|
        1. |       A    1    2       |
        2. |       B    .    2    No |
        3. |       C    4    .    No |
        4. |       D    2    .   Yes |
           +-------------------------+
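      If the goal is an indicator in the original long data rather than a collapsed dataset, one possible extension is to build the flags from the collapsed result and merge them back. This is only a sketch under assumptions: the m_ prefix follows the naming in the question, while the tempfile name flags and the hard-coded list x1 x2 x3 are illustrative choices.

      ```stata
      clear
      input str5 country x1 x2 str3 x3
      A 1 2 ""
      A 1 2 ""
      A 1 2 ""
      B . 2 "No"
      B . . "Yes"
      B . 2 "No"
      C 4 . "No"
      C . . "No"
      C 4 . "No"
      D 2 . ""
      D 2 . "Yes"
      D 2 . ""
      end

      * flags is a hypothetical tempfile name used only for this sketch
      tempfile flags

      preserve
      * one row per country, holding the first nonmissing value of each variable
      collapse (firstnm) x1 x2 x3, by(country)
      foreach var of varlist x1 x2 x3 {
          * still missing after collapse means no nonmissing value existed
          gen byte m_`var' = missing(`var')
      }
      keep country m_*
      save `flags'
      restore

      * attach the country-level flags to every original observation
      merge m:1 country using `flags', nogenerate
      list, sepby(country)
      ```

      The merge keeps all original observations and adds one 0/1 flag per variable, constant within country.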
      Last edited by Ken Chui; 12 Apr 2022, 20:58.



      • #4

        Code:
        bysort country (whatever) : gen wanted = missing(whatever[1]) & missing(whatever[_N])
        The logic is that if, after sorting on whatever, the first and last values for a particular country are both missing, then all of them are.


        This works for string and numeric variables, but includes extended missing values .a to .z as missing.

        You mention a value of 1 for the indicator being true. This code also creates values of 0 for the indicator being false, which is by far the most useful flavour of indicator variable in Stata.
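
        To apply the same logic to every variable named in a macro, as the original question asks, a loop along the following lines might work. It is a sketch under assumptions: the local macro vars and its contents are hypothetical, the example data are borrowed from #3, and the m_ prefix follows the naming in the question.

        ```stata
        clear
        input str5 country x1 x2 str3 x3
        A 1 2 ""
        A 1 2 ""
        A 1 2 ""
        B . 2 "No"
        B . . "Yes"
        B . 2 "No"
        C 4 . "No"
        C . . "No"
        C 4 . "No"
        D 2 . ""
        D 2 . "Yes"
        D 2 . ""
        end

        * vars is a hypothetical macro; substitute your own list of variables
        local vars x1 x2 x3

        foreach var of local vars {
            * numeric missings sort last and empty strings sort first, so
            * checking both ends of the sorted group covers both types
            bysort country (`var') : gen byte m_`var' = missing(`var'[1]) & missing(`var'[_N])
        }

        list, sepby(country)
        ```

        Each m_ variable is 1 for every observation of a country whose values of that variable are all missing, and 0 otherwise.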

        See #4 of the recent thread https://www.statalist.org/forums/for...s-are-the-same for more links and references.

