  • help with generating parental status in EU-SILC

    Dear Statalist Community,

    I am currently undertaking a research project focusing on labor market participation and outcomes in Kosovo, with a specific emphasis on gender disparities. My objective is to measure labor market outcomes separately for men and women, investigate the gender pay gap, and explore whether parenthood is associated with either penalties (motherhood penalty) or premiums (fatherhood premium). Given that Kosovo has implemented EU-SILC only since 2018, I plan to merge the separate data files (H, D, P, R) into a unified dataset for each year (2018, 2019, 2020, 2021) and subsequently run regressions, including probit, Mincer, and Oaxaca analyses.

    I am facing challenges related to demographic variables, particularly in determining the parental status of individuals. There is no direct variable indicating parental status, and I aim to generate this variable for all individuals, additionally noting those with children under 25 and 30. In addition to determining parental status, I am faced with the challenge of obtaining the number of children for each individual.
    To provide context, I have a subsample with a 'dataex' of RB220 (father ID) and RB230 (mother ID), where PB030 represents the personal ID.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6(RB220 RB230)
    " "      " "    
    "10001"  "10002"
    " "      " "    
    "170001" "170002"
    " "      " "    
    "170003" "170005"
    "170003" "170005"
    " "      " "    
    "170001" "170002"
    " "      " "    
    " "      " "    
    " "      " "    
    " "      " "    
    "200001" "200002"
    " "      " "    
    "200003" " "    
    "200003" " "    
    "200003" "200004"
    " "      " "    
    " "      " "    
    " "      " "    
    " "      " "    
    "30003"  "30004"
    "60003"  "60004"
    "60001"  " "    
    " "      " "    
    " "      " "    
    "60001"  " "    
    "200003" "200004"
    "200003" ""     
    ""       ""     
    ""       ""     
    "200001" "200002"
    "200003" ""     
    ""       ""     
    ""       ""     
    ""       ""     
    ""       ""     
    ""       ""     
    "30003"  "30004"
    ""       ""     
    ""       ""     
    ""       ""     
    ""       ""     
    end
    The EU-SILC 2021 guidelines shed light on these variables:
    1. PB030: PERSONAL ID
      • Topic: Technical items / Identification
      • Variable Type: Annual
      • Unit: All current household members aged 16 and over
      • Reference Period: Constant
      • Mode of Collection: Frame, register, or interviewer
      • In Use Since: First year of EU-SILC data collection
    2. RB220: FATHER ID (equivalent to PB160)
      • Topic: Person and household characteristics / Demography
      • Variable Type: Annual
      • Unit: All current household members (of any age)
      • Reference Period: Current
      • Mode of Collection: Derived
      • In Use Since: First year of EU-SILC data collection
      • Series’ Differences: From 2021 onwards, foster fathers are excluded
    3. RB230: MOTHER ID (equivalent to PB170)
      • Topic: Person and household characteristics / Demography
      • Variable Type: Annual
      • Unit: All current household members (of any age)
      • Reference Period: Current
      • Mode of Collection: Derived
      • In Use Since: First year of EU-SILC data collection
      • Series’ Differences: From 2021 onwards, foster mothers are excluded
    I am seeking guidance on creating the parental status variable, determining the number of children, and resolving the “Error 2000 - No Observations” message (it might be due to the storage type of the variables).

    Your help with the syntax on creating these two variables or guiding me in how to understand this would be greatly appreciated!

    If anyone has the time or needs more info, I am providing the link to the methodological guidelines here: https://circabc.europa.eu/sd/a/f8853...09.12.2020.pdf

    Thank you in advance for your expertise and support!


  • #2
    Your example data doesn't contain the PB030 variable, which is critical, so this code could not be tested and may contain errors. Hopefully, it will point you in the right direction.

    You refer to things such as the age of the children, but there is nothing in the example data that represents that, so no help can be provided for that.

    Code:
    frame put RB220 if !missing(RB220), into(fathers)
    frame fathers {
        duplicates drop
    }
    frame put RB230 if !missing(RB230), into(mothers)
    frame mothers {
        duplicates drop
    }
    
    frlink m:1 PB030, frame(fathers RB220)
    gen byte is_father = !missing(fathers)
    drop fathers
    frame drop fathers
    
    frlink m:1 PB030, frame(mothers RB230)
    gen byte is_mother = !missing(mothers)
    drop mothers
    frame drop mothers
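
    One caveat, offered as a sketch under an assumption rather than as part of the tested logic: in the example data some empty parent IDs are stored as a single space (" ") instead of an empty string, and for string variables missing() treats only "" as missing. Assuming those single-space values carry no information, trimming them before running the code above keeps blank IDs from being carried into the fathers and mothers frames as if they were real parents:

    Code:
    * Assumption: a single-space ID means "no parent recorded"
    foreach v of varlist RB220 RB230 {
        replace `v' = strtrim(`v')
    }
    * After trimming, blank IDs are "" and missing() recognizes them
    count if missing(RB220)
    count if missing(RB230)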



    • #3
      Thank you so much for such a prompt response and sorry for not having provided more info. I am providing a 'dataex' of PB030 (renamed pers_id) from a subsample and also the variable for age (there are three potential variables but I want to use RB082 "age in completed years at the time of the interview"), RB090 is sex (if needed to differentiate between mothers and fathers) and RB240 (spouse/partner ID) - maybe it would help to see differences between mothers and fathers who are together whenever they are employed.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str6 pers_id
      "10002" 
      "10003" 
      "10001" 
      "170004"
      "170005"
      "170007"
      "170006"
      "170002"
      "170003"
      "170001"
      "20002" 
      "20001" 
      "200002"
      "200003"
      "200001"
      "200006"
      "200007"
      "200005"
      "200004"
      "30002" 
      "30003" 
      "30004" 
      "30001" 
      "60005" 
      "60003" 
      "60001" 
      "60004" 
      "60002" 
      "200005"
      "200007"
      "200004"
      "200001"
      "200003"
      "200006"
      "200002"
      "30002" 
      "30004" 
      "30005" 
      "30003" 
      "30001" 
      "30006" 
      ""      
      ""      
      ""      
      end


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte RB082 str6 RB240 byte RB090
       . "10001"  2
       . " "      1
       . "10002"  1
       . " "      1
       . "170003" 2
       . " "      2
       . " "      2
       . "170001" 2
       . "170005" 1
       . "170002" 1
       . "20001"  2
       . "20002"  1
       . "200001" 2
       . "200004" 1
       . "200002" 1
       . " "      1
       . " "      1
       . " "      1
       . "200003" 2
       . "30001"  2
       . "30004"  1
       . "30003"  2
       . "30002"  1
       . " "      1
       . "60004"  1
       . " "      1
       . "60003"  2
       . " "      1
      20 ""       1
      10 ""       1
      39 "200003" 2
      69 "200002" 1
      41 "200004" 1
      15 ""       1
      66 "200001" 2
      30 "30001"  2
      62 "30003"  2
       7 ""       2
      64 "30004"  1
      34 "30002"  1
       4 ""       2
       . ""       .
       . ""       .
       . ""       .
      end


      Also, do you have a quick solution for when I get a "not enough observations" message (I have enough)? What could the problem be? Thank you a million times!!!



      • #4
        I put them all together here:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str4 year str2 country str6 pers_id byte RB082 str6(RB220 RB230 RB240) byte RB090
        "2020" "XK" "10002"   . " "      " "      "10001"  2
        "2020" "XK" "10003"   . "10001"  "10002"  " "      1
        "2020" "XK" "10001"   . " "      " "      "10002"  1
        "2020" "XK" "170004"  . "170001" "170002" " "      1
        "2020" "XK" "170005"  . " "      " "      "170003" 2
        "2020" "XK" "170007"  . "170003" "170005" " "      2
        "2020" "XK" "170006"  . "170003" "170005" " "      2
        "2020" "XK" "170002"  . " "      " "      "170001" 2
        "2020" "XK" "170003"  . "170001" "170002" "170005" 1
        "2020" "XK" "170001"  . " "      " "      "170002" 1
        "2020" "XK" "20002"   . " "      " "      "20001"  2
        "2020" "XK" "20001"   . " "      " "      "20002"  1
        "2020" "XK" "200002"  . " "      " "      "200001" 2
        "2020" "XK" "200003"  . "200001" "200002" "200004" 1
        "2020" "XK" "200001"  . " "      " "      "200002" 1
        "2020" "XK" "200006"  . "200003" " "      " "      1
        "2020" "XK" "200007"  . "200003" " "      " "      1
        "2020" "XK" "200005"  . "200003" "200004" " "      1
        "2020" "XK" "200004"  . " "      " "      "200003" 2
        "2020" "XK" "30002"   . " "      " "      "30001"  2
        "2020" "XK" "30003"   . " "      " "      "30004"  1
        "2020" "XK" "30004"   . " "      " "      "30003"  2
        "2020" "XK" "30001"   . "30003"  "30004"  "30002"  1
        "2020" "XK" "60005"   . "60003"  "60004"  " "      1
        "2020" "XK" "60003"   . "60001"  " "      "60004"  1
        "2020" "XK" "60001"   . " "      " "      " "      1
        "2020" "XK" "60004"   . " "      " "      "60003"  2
        "2020" "XK" "60002"   . "60001"  " "      " "      1
        "2021" "XK" "200005" 20 "200003" "200004" ""       1
        "2021" "XK" "200007" 10 "200003" ""       ""       1
        "2021" "XK" "200004" 39 ""       ""       "200003" 2
        "2021" "XK" "200001" 69 ""       ""       "200002" 1
        "2021" "XK" "200003" 41 "200001" "200002" "200004" 1
        "2021" "XK" "200006" 15 "200003" ""       ""       1
        "2021" "XK" "200002" 66 ""       ""       "200001" 2
        "2021" "XK" "30002"  30 ""       ""       "30001"  2
        "2021" "XK" "30004"  62 ""       ""       "30003"  2
        "2021" "XK" "30005"   7 ""       ""       ""       2
        "2021" "XK" "30003"  64 ""       ""       "30004"  1
        "2021" "XK" "30001"  34 "30003"  "30004"  "30002"  1
        "2021" "XK" "30006"   4 ""       ""       ""       2
        "2020" "XK" ""        . ""       ""       ""       .
        "2020" "XK" ""        . ""       ""       ""       .
        "2020" "XK" ""        . ""       ""       ""       .
        end



        • #5
          Thank you for your response. But in order for the example data to be useful here, what is needed is a single -dataex- output containing all of the variables (including the ones in the original post). Please post back with that.

          "Also, do you have a quick solution for when I get a 'not enough observations' message (I have enough)? What could the problem be? Thank you a million times!!!"
          You need to explain the context in which you are getting this message to get more specific advice. However, I will make one general point: we often see questions like this on Statalist where somebody is getting an insufficient observations message but insists they have enough. It always turns out that they, in fact, do not have enough observations. Stata has never been wrong about this, as far as I have seen in over 29 years on Statalist.

          Remember that in almost all Stata commands, any observation that has a missing value for any variable mentioned in the command is automatically excluded from the calculations. Looking at the data you have shown, your data set is bristling with missing values in every variable. Whatever analysis you are undertaking, it would not surprise me to learn that after excluding the observations where some mentioned variable is missing, there is either nothing at all left, or some tiny number of observations that is too small for the requested calculations to support.
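
          As a rough illustration of that point, here is a minimal sketch using variable names from the example data in #4 (RB082, RB090, RB220). Substitute the variables that appear in your own command. Comparing the total count with the count of observations that are non-missing on every variable shows how many observations an estimation command would actually have to work with:

          Code:
          * Total number of observations in memory
          count
          * Observations with no missing value in any of the listed variables
          count if !missing(RB082, RB090)
          * A string holding only a space is not missing to Stata,
          * so count blank IDs separately
          count if strtrim(RB220) == ""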



          • #6
            Hi, thank you for the feedback on the observations. The observations I am providing here are from a sub-sample of 2021, since I only have access to the full database at the premises of a statistical agency. I am trying to write the code in do-files so that I have a basis to work from when I get to the full sample.


            Here's the -dataex- output with:

            1. year (=PB010, HB010, RB010, DB010; year of the survey)
            2. country (=PB020; country of residence)
            3. pers_id (=PB030; it should really be RB030, since RB030 covers all hh members while PB030 covers only members aged 16+. I merged the data files after renaming PB030, RB030, HB030 and DB030 to pers_id)
            4. RB032 (=sequential number of the persons in the household)
            5. RB090 (=sex of all hh members)
            6. RB082 (=age in completed years at the time of interview)
            7. RB220 (=father id)
            8. RB230 (=mother id)
            9. RB240 (=spouse_id)



            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str4 year str2 country str6 pers_id byte(RB032 RB090 RB082) str6(RB220 RB230 RB240)
            "2020" "XK" "10002"  . 2  . " "      " "      "10001"
            "2020" "XK" "10003"  . 1  . "10001"  "10002"  " "    
            "2020" "XK" "10001"  . 1  . " "      " "      "10002"
            "2020" "XK" "170004" . 1  . "170001" "170002" " "    
            "2020" "XK" "170005" . 2  . " "      " "      "170003"
            "2020" "XK" "170007" . 2  . "170003" "170005" " "    
            "2020" "XK" "170006" . 2  . "170003" "170005" " "    
            "2020" "XK" "170002" . 2  . " "      " "      "170001"
            "2020" "XK" "170003" . 1  . "170001" "170002" "170005"
            "2020" "XK" "170001" . 1  . " "      " "      "170002"
            "2020" "XK" "20002"  . 2  . " "      " "      "20001"
            "2020" "XK" "20001"  . 1  . " "      " "      "20002"
            "2020" "XK" "200002" . 2  . " "      " "      "200001"
            "2020" "XK" "200003" . 1  . "200001" "200002" "200004"
            "2020" "XK" "200001" . 1  . " "      " "      "200002"
            "2020" "XK" "200006" . 1  . "200003" " "      " "    
            "2020" "XK" "200007" . 1  . "200003" " "      " "    
            "2020" "XK" "200005" . 1  . "200003" "200004" " "    
            "2020" "XK" "200004" . 2  . " "      " "      "200003"
            "2020" "XK" "30002"  . 2  . " "      " "      "30001"
            "2020" "XK" "30003"  . 1  . " "      " "      "30004"
            "2020" "XK" "30004"  . 2  . " "      " "      "30003"
            "2020" "XK" "30001"  . 1  . "30003"  "30004"  "30002"
            "2020" "XK" "60005"  . 1  . "60003"  "60004"  " "    
            "2020" "XK" "60003"  . 1  . "60001"  " "      "60004"
            "2020" "XK" "60001"  . 1  . " "      " "      " "    
            "2020" "XK" "60004"  . 2  . " "      " "      "60003"
            "2020" "XK" "60002"  . 1  . "60001"  " "      " "    
            "2021" "XK" "200005" 5 1 20 "200003" "200004" ""     
            "2021" "XK" "200007" 7 1 10 "200003" ""       ""     
            "2021" "XK" "200004" 4 2 39 ""       ""       "200003"
            "2021" "XK" "200001" 1 1 69 ""       ""       "200002"
            "2021" "XK" "200003" 3 1 41 "200001" "200002" "200004"
            "2021" "XK" "200006" 6 1 15 "200003" ""       ""     
            "2021" "XK" "200002" 2 2 66 ""       ""       "200001"
            "2021" "XK" "30002"  2 2 30 ""       ""       "30001"
            "2021" "XK" "30004"  4 2 62 ""       ""       "30003"
            "2021" "XK" "30005"  5 2  7 ""       ""       ""     
            "2021" "XK" "30003"  3 1 64 ""       ""       "30004"
            "2021" "XK" "30001"  1 1 34 "30003"  "30004"  "30002"
            "2021" "XK" "30006"  6 2  4 ""       ""       ""     
            "2020" "XK" ""       . .  . ""       ""       ""     
            "2020" "XK" ""       . .  . ""       ""       ""     
            "2020" "XK" ""       . .  . ""       ""       ""     
            end

            Thanks a million!




            • #7
              Here's code to create a parental status variable:
              Code:
              rename RB032 seq
              rename RB090 sex
              rename RB082 age
              rename RB220 father_id
              rename RB230 mother_id
              rename RB240 spouse_id
              
              foreach x in father mother {
                  frame put `x'_id if !missing(`x'_id), into(`x's)
                  frame `x's: duplicates drop
                  frlink m:1 pers_id, frame(`x's `x'_id)
              }
              
              label define parental_status    0    "Not a parent"    ///
                                              1    "Is Father"    ///
                                              2    "Is Mother"
                                              
              //    VERIFY CONSISTENCY OF DATA
              assert missing(fathers) | missing(mothers)
              assert sex == 1 if !missing(fathers)
              assert sex == 2 if !missing(mothers)
              
              gen parental_status:parental_status = 0
              replace parental_status = 1 if !missing(fathers)
              replace parental_status = 2 if !missing(mothers)
              
              drop fathers mothers
              frame drop fathers
              frame drop mothers
              I renamed a bunch of the variables because I find it difficult to work with variable names like RB###--it's hard to keep straight which is which. And it makes the code impossible to understand if you are not familiar with the data set. I prefer to work with variable names that are descriptive. And the code does rely crucially on having renamed RB220 and RB230 to father_id and mother_id, as it exploits the occurrence of the words father and mother in the variable names. If you are very familiar with the original variable names and comfortable working with them, you can always rename them back to the original after running the code.
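
              The original question also asked for the number of children. One possible extension is sketched below under several assumptions: the renames above are in place, pers_id identifies a person uniquely within year, single-space IDs can be trimmed to empty, and at least one father and one mother are recorded. The names child, child_u25, n_child_*, n_u25_* and n_children are made up for illustration. Because RB220 and RB230 are defined for current household members, this counts only children living in the same household as the parent.

              Code:
              * Assumption: a single-space ID means "no parent recorded"
              replace father_id = strtrim(father_id)
              replace mother_id = strtrim(mother_id)
              
              foreach x in father mother {
                  preserve
                  keep if !missing(`x'_id)
                  gen byte child = 1
                  gen byte child_u25 = age < 25 if !missing(age)
                  // one row per year and parent, holding that parent's counts
                  collapse (sum) n_child_`x' = child n_u25_`x' = child_u25, by(year `x'_id)
                  rename `x'_id pers_id
                  tempfile kids_`x'
                  save `kids_`x''
                  restore
              }
              
              // attach the counts to each parent's own record
              foreach x in father mother {
                  merge m:1 year pers_id using `kids_`x'', keep(master match) nogenerate
              }
              
              // rowtotal() treats missing as 0, so non-parents get 0 children
              egen n_children = rowtotal(n_child_father n_child_mother)

              Replacing 25 with 30 gives the under-30 count. Where age is missing for all of a parent's children (as in the 2020 rows of the example data), collapse returns 0 rather than missing for the under-25 count, so treat that value with care.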
