Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing values by matching observations using a different variable

    Hello all,

    I am trying to replace missing values in my data by matching them with a different observation. The data I am working with is from a household survey in Mali. The missing value I am interested in is the region, and the one I want to match it with is a household ID called Main. So, basically I want to replace all the missing values for region with the region value that is listed in a different record by matching the observation that is missing the region with an observation that is not missing the region by the variable Main (which is a household ID).

    Thanks,
    David

  • #2
    I don't understand your situation. Do you have two data sets, with observations that are linked by the household ID variable called Main, and one of these data sets has a region variable that is missing in the other (either entirely, or for some observations)? Then you want to just update the missing values of region from the value of region for the observation having the same value of Main in the second data set? That would be:

    Code:
    use dataset1  // THE DATA SET WITH MISSING VALUES OF REGION
    merge 1:1 MAIN using dataset2, update keepusing(region)
    Or do you have a single data set, where some of the observations have missing values for the variable region and you want to match those observations with some other observation (having a different value of MAIN) in the same data set and copy that value. This would be an entirely different approach. And it require your telling us what kind of criteria you need to apply to a pair of observations to consider them to be a satisfactory matched pair for this purpose.

    Comment


    • #3
      Thanks for the quick reply.

      It's a single data set. But I want to match those that have the same value of MAIN. So it looks like this once I've sorted it:

      Main region
      6623
      6624 Sikasso
      6624
      6625 Sikasso
      6625
      6626 Sikasso
      6626

      6647 Koulikoro
      6647 Koulikoro
      6648
      6649 Koulikoro

      For the above, I would like to replace the missing region values with the region listed for the matching Main value.

      Comment


      • #4
        See the groupwise option of mipolate or stripolate (SSC). http://www.statalist.org/forums/foru...-interpolation

        Comment


        • #5
          This should do it:

          bysort Main (region) : replace region=region[1] if mi(region)

          This assumes that there is no more than one distinct value of region per Main; if there are multiple values, then you probably don't want to do this anyway.

          hth,
          Jeph

          Comment


          • #6
            The solution in #4 ensures that the problem of multiple candidates mentioned in #5 by Jeph does not bite.

            Comment


            • #7
              Thanks all for the help! In this case there is only one distinct value of region per Main so I was able to use Jeph's solution and it worked great.

              Comment


              • #8
                Hi all,

                I have a similar problem, I am trying to merge Indian Human Development Survey (IHDS) Household file with Non resident file. (1:m) as there is multiple non residents from same household.

                Household file uses stateid (2digit)+distid (2digit)+psuid (3digit) +hhid (3digit) ? hhsplitid (1 digit) to create a 10 digit unique id called idhh as you can below.

                Household File dataex

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int(STATEID DISTID PSUID HHID HHSPLITID) str10 IDHH
                1 2 1  10 1 "0102010101"
                1 2 1  20 1 "0102010201"
                1 2 1  30 1 "0102010301"
                1 2 1  40 1 "0102010401"
                1 2 1  50 1 "0102010501"
                1 2 1  60 1 "0102010601"
                1 2 1  70 1 "0102010701"
                1 2 1  80 1 "0102010801"
                1 2 1  90 1 "0102010901"
                1 2 1  90 2 "0102010902"
                1 2 1 100 1 "0102011001"
                1 2 1 110 1 "0102011101"
                1 2 1 120 1 "0102011201"
                1 2 1 130 1 "0102011301"
                1 2 1 140 1 "0102011401"
                1 2 1 160 1 "0102011601"
                1 2 1 170 1 "0102011701"
                1 2 1 180 1 "0102011801"
                1 2 1 190 1 "0102011901"
                1 2 1 200 1 "0102012001"
                1 2 2  10 1 "0102020101"
                1 2 2  20 1 "0102020201"
                1 2 2  30 1 "0102020301"
                1 2 2  40 1 "0102020401"
                1 2 2  50 1 "0102020501"
                1 2 2  60 1 "0102020601"
                1 2 2  70 1 "0102020701"
                1 2 2  80 1 "0102020801"
                1 2 2  90 1 "0102020901"
                1 2 2 100 1 "0102021001"
                1 2 2 110 1 "0102021101"
                1 2 2 120 1 "0102021201"
                1 2 2 120 2 "0102021202"
                1 2 2 120 3 "0102021203"
                1 2 2 130 1 "0102021301"
                1 2 2 140 1 "0102021401"
                1 2 2 150 1 "0102021501"
                1 2 2 160 1 "0102021601"
                1 2 2 170 1 "0102021701"
                1 2 2 180 1 "0102021801"
                1 2 2 190 1 "0102021901"
                1 2 2 200 1 "0102022001"
                1 2 3  10 1 "0102030101"
                1 2 3  20 1 "0102030201"
                1 2 3  40 1 "0102030401"
                1 2 3  50 1 "0102030501"
                1 2 3  60 1 "0102030601"
                1 2 3  70 1 "0102030701"
                1 2 3  80 1 "0102030801"
                1 2 3  80 2 "0102030802"
                1 2 3  90 1 "0102030901"
                1 2 3 100 1 "0102031001"
                1 2 3 100 2 "0102031002"
                1 2 3 110 1 "0102031101"
                1 2 3 120 1 "0102031201"
                1 2 3 130 1 "0102031301"
                1 2 3 140 1 "0102031401"
                1 2 3 150 1 "0102031501"
                1 2 3 160 1 "0102031601"
                1 2 3 170 1 "0102031701"
                1 2 3 180 1 "0102031801"
                1 2 3 190 1 "0102031901"
                1 2 3 200 1 "0102032001"
                1 2 4  10 1 "0102040101"
                1 2 4  20 1 "0102040201"
                1 2 4  30 1 "0102040301"
                1 2 4  40 1 "0102040401"
                1 2 4  50 1 "0102040501"
                1 2 4  60 1 "0102040601"
                1 2 4  70 1 "0102040701"
                1 2 4  80 1 "0102040801"
                1 2 4  80 2 "0102040802"
                1 2 4  90 1 "0102040901"
                1 2 4 100 1 "0102041001"
                1 2 4 110 1 "0102041101"
                1 2 4 120 1 "0102041201"
                1 2 4 130 1 "0102041301"
                1 2 4 130 2 "0102041302"
                1 2 4 130 3 "0102041303"
                1 2 4 140 1 "0102041401"
                1 2 4 150 1 "0102041501"
                1 2 4 160 1 "0102041601"
                1 2 4 170 1 "0102041701"
                1 2 4 170 2 "0102041702"
                1 2 4 180 1 "0102041801"
                1 2 4 190 1 "0102041901"
                1 2 4 200 1 "0102042001"
                1 2 5  10 1 "0102050101"
                1 2 5  20 1 "0102050201"
                1 2 5  30 1 "0102050301"
                1 2 5  40 1 "0102050401"
                1 2 5  50 1 "0102050501"
                1 2 5  50 2 "0102050502"
                1 2 5  60 1 "0102050601"
                1 2 5  70 1 "0102050701"
                1 2 5  80 1 "0102050801"
                1 2 5  90 1 "0102050901"
                1 2 5 100 1 "0102051001"
                1 2 5 110 1 "0102051101"
                1 2 5 120 1 "0102051201"
                end
                label values STATEID STATEID
                label def STATEID 1 "Jammu & Kashmir 01", modify


                Non-resident file doesn't have 10 digit unique id called idhh but has the stateid, distid, psuid, hhid, hhsplitd. I created the unique ID and 13,821 observations match while 421 doesn't.
                The problem I think is if you take the first idhh from the non resident data below and look for its idhh in the household file you will see that stateid is 00 but the unique idhh has 0404010109 instead of 0400010109. The survey people used some logic which is missing from their data documentation for this irregularity.

                list distid if stateid==04 & psuid==01 & hhid==010 & hhsplitid==9

                +--------+
                | distid |
                |--------|
                3899. | 0 |
                +--------+

                . list idhh if stateid==04 & psuid==01 & hhid==010 & hhsplitid==9

                +------------+
                | idhh |
                |------------|
                3899. | 0404010109 |
                +------------+

                I want to replace the idhh values in the non-resident data with the idhh values from household data if the stateid,psuid,hhid,hhsplitid matches for each row.

                Non-resident file dataex
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int(STATEID DISTID PSUID HHID HHSPLITID) str10 IDHH
                 4 0 1  10 9 "0400010109"
                 4 0 1  30 9 "0400010309"
                 4 0 1  30 9 "0400010309"
                 4 0 1  60 9 "0400010609"
                 4 0 1  60 9 "0400010609"
                 4 0 1  60 9 "0400010609"
                 4 0 1  60 9 "0400010609"
                 4 0 1  70 1 "0400010701"
                 4 0 1  70 2 "0400010702"
                 4 0 1  80 1 "0400010801"
                 4 0 1  80 1 "0400010801"
                 4 0 1  90 1 "0400010901"
                 4 0 1  90 1 "0400010901"
                 4 0 1 100 9 "0400011009"
                 4 0 1 120 9 "0400011209"
                 4 0 1 130 9 "0400011309"
                 4 0 1 150 1 "0400011501"
                 4 0 2 110 1 "0400021101"
                 4 0 2 120 1 "0400021201"
                 4 0 2 130 1 "0400021301"
                 4 0 3  10 1 "0400030101"
                 4 0 3  50 1 "0400030501"
                 4 0 3 140 1 "0400031401"
                 4 0 3 140 1 "0400031401"
                 4 0 4  10 9 "0400040109"
                 4 0 4  10 9 "0400040109"
                 4 0 4  20 1 "0400040201"
                 4 0 4  20 1 "0400040201"
                 4 0 4  40 1 "0400040401"
                 4 0 4  50 9 "0400040509"
                 4 0 4  80 1 "0400040801"
                 4 0 4  80 1 "0400040801"
                 4 0 4  80 2 "0400040802"
                 4 0 4  90 1 "0400040901"
                 4 0 4  90 1 "0400040901"
                 4 0 4  90 1 "0400040901"
                 4 0 4 100 1 "0400041001"
                 4 0 4 100 1 "0400041001"
                 4 0 4 100 1 "0400041001"
                 4 0 4 100 1 "0400041001"
                 4 0 4 100 1 "0400041001"
                 4 0 4 110 1 "0400041101"
                 4 0 4 120 9 "0400041209"
                 4 0 4 120 9 "0400041209"
                 4 0 5  20 1 "0400050201"
                 4 0 5  30 9 "0400050309"
                 4 0 5  40 1 "0400050401"
                 4 0 5  60 9 "0400050609"
                 4 0 5  70 1 "0400050701"
                 4 0 5  80 1 "0400050801"
                 4 0 5 140 9 "0400051409"
                 4 0 6  10 2 "0400060102"
                 4 0 6  10 2 "0400060102"
                 4 0 6  90 9 "0400060909"
                11 0 1  70 1 "1100010701"
                11 0 1  80 1 "1100010801"
                11 0 1  80 1 "1100010801"
                11 0 1 120 1 "1100011201"
                11 0 1 160 1 "1100011601"
                11 0 1 190 1 "1100011901"
                11 0 2  90 1 "1100020901"
                11 0 2 120 1 "1100021201"
                11 0 2 120 1 "1100021201"
                11 0 2 120 1 "1100021201"
                11 0 2 120 1 "1100021201"
                11 0 2 130 1 "1100021301"
                11 0 3  30 1 "1100030301"
                11 0 3  50 1 "1100030501"
                11 0 3  60 1 "1100030601"
                11 0 3  60 1 "1100030601"
                11 0 3  80 2 "1100030802"
                11 0 3  90 1 "1100030901"
                11 0 3  90 1 "1100030901"
                11 0 3  90 2 "1100030902"
                11 0 3 130 1 "1100031301"
                11 0 3 140 1 "1100031401"
                11 0 3 150 1 "1100031501"
                11 0 3 150 1 "1100031501"
                11 0 3 160 1 "1100031601"
                11 0 3 180 1 "1100031801"
                11 0 3 190 1 "1100031901"
                11 0 4  40 9 "1100040409"
                11 0 4 100 1 "1100041001"
                11 0 4 120 1 "1100041201"
                11 0 5  30 9 "1100050309"
                11 0 5  70 1 "1100050701"
                11 0 5  80 1 "1100050801"
                11 0 5  90 1 "1100050901"
                11 0 5 110 1 "1100051101"
                11 0 5 110 1 "1100051101"
                11 0 6  10 1 "1100060101"
                11 0 6  10 1 "1100060101"
                11 0 6  10 2 "1100060102"
                11 0 6  20 1 "1100060201"
                11 0 6  30 1 "1100060301"
                11 0 6  40 1 "1100060401"
                11 0 6  50 9 "1100060509"
                11 0 6  60 9 "1100060609"
                11 0 6 100 1 "1100061001"
                11 0 6 100 1 "1100061001"
                end
                label values STATEID STATEID
                label def STATEID 4 "Chandigarh 04", modify
                label def STATEID 11 "Sikkim 11", modify

                Please let me know if anything is unclear.

                Best,
                Anustup

                Comment

                Working...
                X