Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Balancing panel dataset for country pairs by expanding for missing data

    Hi all,

    I have an unbalanced panel dataset consisting of 217 exporting countries, 172 importing countries, and for 30 years.
    I need to expand it by filling missing years, exporters, and importer, to get a balanced panel with a total of 30*217^2 observations.
    I tried the command as in
    https://www.statalist.org/forums/for...783#post843783
    but STATA replies "NOT SORTED"
    Kindly find below an example of my dataset where there is 3 exporters, 4 importers, and 4 years.
    Basically, I want first to get 4^3= 64 observations (48 observations between country pairs and 16 between themselves), and second to fill exporter's and importer's names for the corresponding missing data.
    any help is highly appreciated, thank you so much.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str30(exporter importer) long year double simple float pair_id
    "Afghanistan" "Australia" 2014 2.55 1
    "Afghanistan" "Australia" 2015 1.68 1
    "Afghanistan" "Australia" 2016 3.39 1
    "Afghanistan" "Australia" 2017 2.72 1
    "Afghanistan" "Chile" 2015 6 2
    "Afghanistan" "Chile" 2016 6 2
    "Afghanistan" "Chile" 2017 6 2
    "Afghanistan" "United States" 2014 3.58 3
    "Afghanistan" "United States" 2015 3.45 3
    "Afghanistan" "United States" 2016 3.46 3
    "Afghanistan" "United States" 2017 2.75 3
    "Australia" "Australia" 2014 3.23 4
    "Australia" "Australia" 2015 2.94 4
    "Australia" "Australia" 2016 3.04 4
    "Australia" "Australia" 2017 2.97 4
    "Australia" "Chile" 2015 6 5
    "Australia" "Chile" 2016 6 5
    "Australia" "Chile" 2017 6 5
    "Australia" "United States" 2014 3.52 6
    "Australia" "United States" 2015 3.45 6
    "Australia" "United States" 2016 3.46 6
    "Australia" "United States" 2017 3.24 6
    "Chile" "Australia" 2014 3.07 7
    "Chile" "Australia" 2015 2.81 7
    "Chile" "Australia" 2016 2.92 7
    "Chile" "Australia" 2017 2.93 7
    "Chile" "Chile" 2015 6 8
    "Chile" "Chile" 2016 6 8
    "Chile" "United States" 2014 3.28 9
    "Chile" "United States" 2015 3.16 9
    "Chile" "United States" 2016 3.18 9
    "Chile" "United States" 2017 6.16 9
    end
    [/CODE]
    ------------------ copy up to and including the previous line ------------------

    Listed 32 out of 32 observations


  • #2
    The below code snippet gives a full panel with pair_id as the panel variable:

    Code:
    xtset pair_id year
    tsfill, full
    bysort pair_id: egen temp = mode(exporter)
    replace exporter = temp if missing(exporter)
    drop temp
    bysort pair_id: egen temp = mode(importer)
    replace importer = temp if missing(importer) 
    drop temp
    However, this only gives 36 observations as opposed to the 64 you want. Could you explain what other observations you want to be in the dataset that aren't there now?

    Comment


    • #3
      Thank you, David

      In the data sample I provided, I have three exporters e.g. ( Afghanistan, Australia, and Chile) and four importers e.g. (same countries and the USA).
      I need each country pair to have 4 observations as follow:
      AFG AFG from 2014 to 2017
      AFG USA from 2014 to 2017
      AFG AUS from 2014 to 2017
      AFG Chile from 2014 to 2017
      thus for every importer, I need to have 16 observations.
      in the case of Afganistan, 4 observations are missing and I need Stata to generate these missing observations e.g. AFG AFG from 2014 to 2017
      while for Australia and Chile 4 observations each, e.g. Chile AFG and AUS AFG from 2014 to 2017
      whereas for the USA 16 observations are missing, so I need to generate these missing observations as explained above.
      At the end, I will have 4 exporters, 4 importers and 4 years, which gives 64 observations.
      These are the data that are not there.

      Comment


      • #4
        Rabin:
        In general, your approach is not recommended.
        Stata can handle both unbalanced and balanced panel with no problem.
        More substantively, the way you fill missing values affect your subsequent inference; if the missing mechanism is not ignorable and you skip modeling it, the results you get are far from being reliable.
        Last edited by Carlo Lazzaro; 19 Apr 2019, 02:26.
        Kind regards,
        Carlo
        (StataNow 19.0)

        Comment


        • #5
          Thanks, Carlo,

          However, after expanding my dataset with the missing observations I will fill them with data from available resources, but I need to get this first done.

          Comment


          • #6
            Rabin:
            the main issue is if the dataset thet you get after filling in the missing values according to your approach, resembles (or not) your original sample.
            Kind regards,
            Carlo
            (StataNow 19.0)

            Comment

            Working...
            X