
  • Joinby not consistently joining datasets

    I am trying to join a set of weather variables identified by longitude (longnum), latitude (latnum), and date (bdate) to individual data keyed by the same three variables. There are 61,158 files, one per unique combination of longitude and latitude points, and so 61,158 joins, which I process in batches. Some of the files have joined successfully, but not all of the resulting joined datasets have observations, which means the join is failing for some files.

    Upon checking manually, I find that the combination of latnum, longnum, and bdate exists in both datasets, yet those observations are not joined. No observations have missing longnum, latnum, or bdate in either dataset.

    My code is:

    Code:
    clear all
    set more off
    
    global cdd "/Users/tahre/cdd"
    global cdd_dhs "/Users/tahre/cdd_dhs"
    
    forvalues i = 35001/36000 {
        display `i'
        use "$cdd/heat_`i'.dta", clear
        joinby bdate longnum latnum using "/Users/tahre/Data/allvars_SA_clean.dta"
        save "$cdd_dhs/merged_`i'.dta", replace
    }
    The data type for longnum and latnum in heat_*.dta files is float in format %9.0g and bdate is int in format %td.

    In the allvars_SA_clean.dta file: longnum and latnum are type double in %10.0g format, while bdate is type float in %td.

    I tried to change the data type and format in allvars_SA_clean.dta to match the heat_*.dta with the following code. However, this did not solve the problem.

    Code:
    use allvars_SA_clean.dta, clear
    *format longitude and latitude to match gridpoints
    format %9.0g latnum
    format %9.0g longnum
    
    recast int bdate
    I am using Stata 18.0.

    Is there a way to make sure all the joins are successful?

  • #2
    Well, as you have not provided any example data, it is hard to give a confident answer. But I suspect this is a precision problem.

    With the -joinby- command you have written, Stata will require an exact match on longitude, latitude, and bdate. I believe the problem is that your longitude and latitude variables are double in one file and float in the other. These will, in general, not match exactly. That can happen even if what your eyes see in -list- output or the browser looks like an exact match: there may be far-out decimal places that are non-zero in the double version, but do not appear in the float version.

    I think what you have to do is go to the allvars_SA_clean.dta file and
    Code:
    recast float longnum, force
    recast float latnum, force
    NOW, BE VERY CAREFUL ABOUT THIS. You are going to lose some low-order decimal places in longnum and latnum. I would not overwrite the existing file after doing this: save this as a different file and use this different file for the joins. You might conceivably need the low-order decimal places for some purpose later. Note also that in the joins, there may be observations with values of latnum or longnum from the revised version of allvars_SA_clean.dta that were distinct in the original data set but are now the same in the revised version. In other words, there may now be a problem of "overjoining." But given the differences in precision of the longnum and latnum variables in the two sets of data, there is no choice but to have underjoining or overjoining.
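
    A quick illustration of that overjoining risk, shown here in Python since the underlying floating-point behavior is language-agnostic (the second coordinate is hypothetical, invented for the demonstration):

```python
import struct

def as_float32(x):
    # Store a Python float (a C double) in 32 bits and read it back,
    # mimicking what -recast float- does to a Stata double.
    return struct.unpack('f', struct.pack('f', x))[0]

# Two hypothetical coordinates that are distinct as doubles ...
a = 84.41391
b = 84.413910001
assert a != b

# ... collapse to the same value once stored as 32-bit floats, so a
# join on the float versions pairs observations that did not truly match.
assert as_float32(a) == as_float32(b)
```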

    There is one safer procedure, if it is feasible. Regenerate all of the heat_*.dta datasets so that the longnum and latnum variables are doubles in the first place. But this will only work if the source data from which these variables originally come has the same precision (number of decimal places) as the allvars_SA_clean.dta has.



    • #3
      (Crossed with Clyde Schechter's comments, with which I agree.)

      It sounds like you have done a good job of narrowing this down to a small number of observations that you *know* exhibit the problem. Solving your difficulty without seeing examples would be very hard, so I'd strongly encourage you to post two small example datasets that illustrate the problem. To do this, it's essential that you use the -dataex- command in Stata to prepare these examples, as described in the StataList FAQ for new members.

      That being said:
      1) I wonder about precision issues in the original data: if the lat and long in the files being joined differ in (say) the 6th decimal place, they won't join properly. Note in this regard that the Stata format only affects the display of the variable, not its storage.

      2) Are you *sure* you want -joinby- rather than -merge-? -joinby- forms all pairwise combinations of observations that match on the group variables (bdate, latnum, and longnum). Of course I don't know what your ultimate goal is, but from what you describe, I can imagine a possible confusion here. This would not make your -joinby- go wrong, of course, but maybe you don't actually need it. Further, using either of these commands with large numbers of files, as you describe, is unusual, which makes me wonder if there might be an easier way to do what you want. (I do appreciate the memory problems that can require processing files a batch at a time). In this regard, some substantive description of what you are doing might help people here understand and clarify this issue.
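
      To make the distinction in item 2 concrete, here is a toy sketch in Python (the keys and values are invented): -joinby- forms all pairwise combinations of observations within a key group, whereas an m:1 -merge- attaches each using observation at most once.

```python
# Two individuals and two heat records sharing one key group.
left = [("key1", "person_A"), ("key1", "person_B")]
right = [("key1", "heat_X"), ("key1", "heat_Y")]

# -joinby--style result: every pairwise combination within the key group.
joinby_style = [(l[1], r[1]) for l in left for r in right if l[0] == r[0]]
print(len(joinby_style))  # 4 rows (2 x 2), not 2
```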



      • #4
        Here are examples of my dataset. First, is allvars_SA_clean.dta file:

        Code:
        * dataex longnum latnum bdate
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double(longnum latnum) int bdate
        91.85778045654297 24.884700775146484 13631
        88.86125183105469 26.099369049072266 13941
        90.17142486572266 22.189411163330078 13013
        88.83942413330078 23.648990631103516 14400
         90.3418197631836  23.76909637451172 14127
        90.35098266601563 23.799299240112305 13180
        91.39692687988281 22.970067977905273 13766
        90.55355072021484 24.440805435180664 12885
        91.33464813232422 24.045835494995117 14214
        92.08505249023438 22.460346221923828 14490
        91.79778289794922 22.532541275024414 13891
        90.79085540771484 24.191125869750977 14502
        91.07923126220703 22.810792922973633 14065
         91.8548355102539 22.372676849365234 14168
        89.99523162841797 22.063182830810547 14368
        90.73096466064453 23.024248123168945 13294
        90.11418914794922 23.871458053588867 13996
        90.43378448486328 24.013629913330078 13468
        88.35013580322266   25.7891845703125 13302
        90.93897247314453  24.21148109436035 13866
        88.88182067871094 24.031126022338867 14245
        89.41320037841797 23.510133743286133 13924
        89.43077087402344 23.643150329589844 13837
        90.18814086914063 22.642501831054688 12893
        88.88182067871094 24.031126022338867 12966
        88.88182067871094 24.031126022338867 14287
        90.94422912597656 24.232086181640625 13218
        89.92398071289063 23.259105682373047 13624
        89.83019256591797 23.861656188964844 13149
        90.79085540771484 24.191125869750977 12923
        89.36486053466797  24.54058265686035 13166
        90.29139709472656 22.741519927978516 14582
        90.11465454101563 24.422611236572266 13999
        89.60989379882813  23.13684844970703 14228
        90.64472961425781 22.684001922607422 13686
         91.0171127319336 22.889406204223633 14262
        89.05474090576172      24.0830078125 13552
        89.72599792480469 22.962783813476563 13206
        91.85778045654297 24.884700775146484 13513
         89.8399658203125 23.595176696777344 13807
         91.0171127319336 22.889406204223633 13025
        91.46865844726563 22.550188064575195 13885
         91.8995132446289 21.624465942382813 14184
        89.37201690673828  23.46474266052246 13948
        91.85778045654297 24.884700775146484 14139
        89.48926544189453 25.068864822387695 13512
        88.57640838623047  26.35506820678711 13550
         88.7499008178711  25.07000160217285 13085
        90.11465454101563 24.422611236572266 14593
         89.1942367553711 23.556812286376953 12927
        89.81500244140625 25.157245635986328 14112
        90.54685974121094 22.439199447631836 13020
        90.40995788574219 23.712308883666992 13746
        91.87504577636719 24.881223678588867 12799
        89.55054473876953 22.809101104736328 13463
        89.76048278808594  22.49187660217285 14209
        91.41016387939453  25.06570816040039 13760
        89.43077087402344 23.643150329589844 13153
        89.54611206054688  25.32240867614746 13897
        91.17279815673828  23.44853973388672 13381
        90.43460083007813 22.998729705810547 13486
        91.85083770751953 24.891639709472656 14192
        89.22294616699219 22.529712677001953 14203
        90.11465454101563 24.422611236572266 13535
        90.41873931884766 23.745874404907227 14613
        90.37071990966797 22.716875076293945 13105
         91.7061538696289  22.52454376220703 13546
        88.28621673583984 24.589248657226563 14490
        90.02623748779297 24.627826690673828 13894
         89.0627212524414 23.598957061767578 13957
        92.08505249023438 22.460346221923828 14589
        92.01692199707031 22.564403533935547 14326
         91.2950668334961 23.255586624145508 13880
        91.85778045654297 24.884700775146484 13331
        90.71637725830078  22.59821891784668 13268
        89.08354949951172 22.711524963378906 12917
        92.17169189453125 22.145431518554688 14308
        90.30548858642578 23.631616592407227 14429
        91.87432861328125 24.387239456176758 14458
        91.87432861328125 24.387239456176758 14548
        89.82540893554688  23.28156089782715 14493
        90.86991119384766 23.822275161743164 13512
        90.58907318115234 22.316837310791016 13829
        91.05269622802734  25.06073570251465 14515
        89.11957550048828  23.13471221923828 13430
        88.99678802490234  23.05318260192871 14212
        91.11457061767578  23.99697494506836 13608
        91.87504577636719 24.881223678588867 14137
         89.7067642211914  26.11971092224121 14225
        89.76048278808594  22.49187660217285 13929
        91.05269622802734  25.06073570251465 14300
        90.93897247314453  24.21148109436035 14318
        91.32881164550781 24.642860412597656 14097
         90.1637191772461 23.662221908569336 12849
        90.30548858642578 23.631616592407227 14124
        90.79085540771484 24.191125869750977 13468
        88.54264831542969 24.523544311523438 13036
         90.4012680053711 23.905513763427734 13171
        89.51409149169922 25.322484970092773 14335
        91.85083770751953 24.891639709472656 13404
        end
        format %td bdate
        Next, is an example of heat_35001.dta. This is an example file for which the matching does not work:

        Code:
        * dataex longnum latnum bdate cdd_29 cdd_32 cdd_35
        
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(longnum latnum) int bdate byte(cdd_29 cdd_32 cdd_35)
        84.41391 27.21664 12054 0 0 0
        84.41391 27.21664 12055 0 0 0
        84.41391 27.21664 12056 0 0 0
        84.41391 27.21664 12057 0 0 0
        84.41391 27.21664 12058 0 0 0
        84.41391 27.21664 12059 0 0 0
        84.41391 27.21664 12060 0 0 0
        84.41391 27.21664 12061 0 0 0
        84.41391 27.21664 12062 0 0 0
        84.41391 27.21664 12063 0 0 0
        84.41391 27.21664 12064 0 0 0
        84.41391 27.21664 12065 0 0 0
        84.41391 27.21664 12066 0 0 0
        84.41391 27.21664 12067 0 0 0
        84.41391 27.21664 12068 0 0 0
        84.41391 27.21664 12069 0 0 0
        84.41391 27.21664 12070 0 0 0
        84.41391 27.21664 12071 0 0 0
        84.41391 27.21664 12072 0 0 0
        84.41391 27.21664 12073 0 0 0
        84.41391 27.21664 12074 0 0 0
        84.41391 27.21664 12075 0 0 0
        84.41391 27.21664 12076 0 0 0
        84.41391 27.21664 12077 0 0 0
        84.41391 27.21664 12078 0 0 0
        84.41391 27.21664 12079 0 0 0
        84.41391 27.21664 12080 0 0 0
        84.41391 27.21664 12081 0 0 0
        84.41391 27.21664 12082 0 0 0
        84.41391 27.21664 12083 0 0 0
        84.41391 27.21664 12084 0 0 0
        84.41391 27.21664 12085 0 0 0
        84.41391 27.21664 12086 0 0 0
        84.41391 27.21664 12087 0 0 0
        84.41391 27.21664 12088 0 0 0
        84.41391 27.21664 12089 0 0 0
        84.41391 27.21664 12090 0 0 0
        84.41391 27.21664 12091 0 0 0
        84.41391 27.21664 12092 0 0 0
        84.41391 27.21664 12093 0 0 0
        84.41391 27.21664 12094 0 0 0
        84.41391 27.21664 12095 0 0 0
        84.41391 27.21664 12096 0 0 0
        84.41391 27.21664 12097 0 0 0
        84.41391 27.21664 12098 0 0 0
        84.41391 27.21664 12099 0 0 0
        84.41391 27.21664 12100 0 0 0
        84.41391 27.21664 12101 0 0 0
        84.41391 27.21664 12102 0 0 0
        84.41391 27.21664 12103 0 0 0
        84.41391 27.21664 12104 0 0 0
        84.41391 27.21664 12105 0 0 0
        84.41391 27.21664 12106 0 0 0
        84.41391 27.21664 12107 0 0 0
        84.41391 27.21664 12108 0 0 0
        84.41391 27.21664 12109 0 0 0
        84.41391 27.21664 12110 0 0 0
        84.41391 27.21664 12111 0 0 0
        84.41391 27.21664 12112 0 0 0
        84.41391 27.21664 12113 0 0 0
        84.41391 27.21664 12114 0 0 0
        84.41391 27.21664 12115 0 0 0
        84.41391 27.21664 12116 0 0 0
        84.41391 27.21664 12117 0 0 0
        84.41391 27.21664 12118 0 0 0
        84.41391 27.21664 12119 0 0 0
        84.41391 27.21664 12120 0 0 0
        84.41391 27.21664 12121 0 0 0
        84.41391 27.21664 12122 0 0 0
        84.41391 27.21664 12123 0 0 0
        84.41391 27.21664 12124 0 0 0
        84.41391 27.21664 12125 0 0 0
        84.41391 27.21664 12126 0 0 0
        84.41391 27.21664 12127 0 0 0
        84.41391 27.21664 12128 0 0 0
        84.41391 27.21664 12129 0 0 0
        84.41391 27.21664 12130 0 0 0
        84.41391 27.21664 12131 0 0 0
        84.41391 27.21664 12132 0 0 0
        84.41391 27.21664 12133 0 0 0
        84.41391 27.21664 12134 0 0 0
        84.41391 27.21664 12135 0 0 0
        84.41391 27.21664 12136 0 0 0
        84.41391 27.21664 12137 0 0 0
        84.41391 27.21664 12138 0 0 0
        84.41391 27.21664 12139 0 0 0
        84.41391 27.21664 12140 0 0 0
        84.41391 27.21664 12141 0 0 0
        84.41391 27.21664 12142 0 0 0
        84.41391 27.21664 12143 0 0 0
        84.41391 27.21664 12144 0 0 0
        84.41391 27.21664 12145 0 0 0
        84.41391 27.21664 12146 0 0 0
        84.41391 27.21664 12147 0 0 0
        84.41391 27.21664 12148 0 0 0
        84.41391 27.21664 12149 0 0 0
        84.41391 27.21664 12150 0 0 0
        84.41391 27.21664 12151 0 0 0
        84.41391 27.21664 12152 0 0 0
        84.41391 27.21664 12153 0 0 0
        end
        format %td bdate
        Next, this is heat_1.dta, which produces successful matches:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(longnum latnum) int bdate float(cdd_29 cdd_32) byte cdd_35
        92.11084 24.784885 12054 0 0 0
        92.11084 24.784885 12055 0 0 0
        92.11084 24.784885 12056 0 0 0
        92.11084 24.784885 12057 0 0 0
        92.11084 24.784885 12058 0 0 0
        92.11084 24.784885 12059 0 0 0
        92.11084 24.784885 12060 0 0 0
        92.11084 24.784885 12061 0 0 0
        92.11084 24.784885 12062 0 0 0
        92.11084 24.784885 12063 0 0 0
        92.11084 24.784885 12064 0 0 0
        92.11084 24.784885 12065 0 0 0
        92.11084 24.784885 12066 0 0 0
        92.11084 24.784885 12067 0 0 0
        92.11084 24.784885 12068 0 0 0
        92.11084 24.784885 12069 0 0 0
        92.11084 24.784885 12070 0 0 0
        92.11084 24.784885 12071 0 0 0
        92.11084 24.784885 12072 0 0 0
        92.11084 24.784885 12073 0 0 0
        92.11084 24.784885 12074 0 0 0
        92.11084 24.784885 12075 0 0 0
        92.11084 24.784885 12076 0 0 0
        92.11084 24.784885 12077 0 0 0
        92.11084 24.784885 12078 0 0 0
        92.11084 24.784885 12079 0 0 0
        92.11084 24.784885 12080 0 0 0
        92.11084 24.784885 12081 0 0 0
        92.11084 24.784885 12082 0 0 0
        92.11084 24.784885 12083 0 0 0
        92.11084 24.784885 12084 0 0 0
        92.11084 24.784885 12085 0 0 0
        92.11084 24.784885 12086 0 0 0
        92.11084 24.784885 12087 0 0 0
        92.11084 24.784885 12088 0 0 0
        92.11084 24.784885 12089 0 0 0
        92.11084 24.784885 12090 0 0 0
        92.11084 24.784885 12091 0 0 0
        92.11084 24.784885 12092 0 0 0
        92.11084 24.784885 12093 0 0 0
        92.11084 24.784885 12094 0 0 0
        92.11084 24.784885 12095 0 0 0
        92.11084 24.784885 12096 0 0 0
        92.11084 24.784885 12097 0 0 0
        92.11084 24.784885 12098 0 0 0
        92.11084 24.784885 12099 0 0 0
        92.11084 24.784885 12100 0 0 0
        92.11084 24.784885 12101 0 0 0
        92.11084 24.784885 12102 0 0 0
        92.11084 24.784885 12103 0 0 0
        92.11084 24.784885 12104 0 0 0
        92.11084 24.784885 12105 0 0 0
        92.11084 24.784885 12106 0 0 0
        92.11084 24.784885 12107 0 0 0
        92.11084 24.784885 12108 0 0 0
        92.11084 24.784885 12109 0 0 0
        92.11084 24.784885 12110 0 0 0
        92.11084 24.784885 12111 0 0 0
        92.11084 24.784885 12112 0 0 0
        92.11084 24.784885 12113 0 0 0
        92.11084 24.784885 12114 0 0 0
        92.11084 24.784885 12115 0 0 0
        92.11084 24.784885 12116 0 0 0
        92.11084 24.784885 12117 0 0 0
        92.11084 24.784885 12118 0 0 0
        92.11084 24.784885 12119 0 0 0
        92.11084 24.784885 12120 0 0 0
        92.11084 24.784885 12121 0 0 0
        92.11084 24.784885 12122 0 0 0
        92.11084 24.784885 12123 0 0 0
        92.11084 24.784885 12124 0 0 0
        92.11084 24.784885 12125 0 0 0
        92.11084 24.784885 12126 0 0 0
        92.11084 24.784885 12127 0 0 0
        92.11084 24.784885 12128 0 0 0
        92.11084 24.784885 12129 0 0 0
        92.11084 24.784885 12130 0 0 0
        92.11084 24.784885 12131 0 0 0
        92.11084 24.784885 12132 0 0 0
        92.11084 24.784885 12133 0 0 0
        92.11084 24.784885 12134 0 0 0
        92.11084 24.784885 12135 0 0 0
        92.11084 24.784885 12136 0 0 0
        92.11084 24.784885 12137 0 0 0
        92.11084 24.784885 12138 0 0 0
        92.11084 24.784885 12139 0 0 0
        92.11084 24.784885 12140 0 0 0
        92.11084 24.784885 12141 0 0 0
        92.11084 24.784885 12142 0 0 0
        92.11084 24.784885 12143 0 0 0
        92.11084 24.784885 12144 0 0 0
        92.11084 24.784885 12145 0 0 0
        92.11084 24.784885 12146 0 0 0
        92.11084 24.784885 12147 0 0 0
        92.11084 24.784885 12148 0 0 0
        92.11084 24.784885 12149 0 0 0
        92.11084 24.784885 12150 0 0 0
        92.11084 24.784885 12151 0 0 0
        92.11084 24.784885 12152 0 0 0
        92.11084 24.784885 12153 0 0 0
        end
        format %td bdate
        Finally, here is an example of a file with successful matches, merged_1.dta, which has 9 observations:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double(longnum latnum) float(bdate cdd_29)
        92.11083984375 24.78488540649414 13153  4.160034
        92.11083984375 24.78488540649414 13384  3.660034
        92.11083984375 24.78488540649414 13424  4.160034
        92.11083984375 24.78488540649414 13649 3.8500366
        92.11083984375 24.78488540649414 13709  3.660034
        92.11083984375 24.78488540649414 13949        .5
        92.11083984375 24.78488540649414 14231  4.160034
        92.11083984375 24.78488540649414 14272 1.9100037
        92.11083984375 24.78488540649414 14331        .5
        end
        format %td bdate

        What I am trying to do: I started with an index of lat/long points from allvars_SA_clean.dta and used it to retrieve temperature data from netCDF files in Python. I then used those temperature data to create some heat measures and stored them, as time series for each lat/long index, in the heat_*.dta files. This step was very time consuming and took roughly 80 hours to complete. Next, I match these heat_*.dta files to the individual observations in allvars_SA_clean.dta. The end goal is to have every observation matched to the heat variables created in the heat_*.dta files by the group variables (bdate, latnum, and longnum).

        I have a total of 477,573 observations in the allvars_SA_clean.dta that are distributed among 61,158 lat long combinations.

        I have also tried to -merge- the heat_*.dta and allvars_SA_clean.dta by the group variables (bdate, latnum, and longnum) but it did not give a different outcome.

        Given these examples, would the approach in #2 still be a good way to go?



        • #5
          Your examples actually do illustrate the point about precision. Because these examples are shown in decimal, it is not obvious what is going on, in binary, internally in Stata. The decimal number 92.11083984375 actually does have a finite binary representation, and that finite binary representation involves a number of bits that is completely contained within a float. So both the -double- version of this number and the -float- version will match each other exactly.

          By contrast 84.41391 has no finite representation in binary (just as 1/3 has no finite representation in decimal). The first 8 hexadecimal digits (which is what fits inside a float) of the double version agree with the float version. But the double version has additional non-zero digits beyond that, whereas the float does not. If we display both of these in hexadecimal notation you can see the similarities and differences:
          Code:
          . display %21x =float(84.41391)
          +1.51a7d80000000X+006
          
          . display %21x =84.41391
          +1.51a7d805e5f31X+006
          This difference between them prevents the two files from matching with this value of longnum.
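
          The same contrast can be reproduced outside Stata. A minimal Python check (round-tripping a double through 32-bit storage) shows one of these two values surviving intact and the other not:

```python
import struct

def as_float32(x):
    # Store a Python float (a C double) in 32 bits and read it back,
    # mimicking Stata's float storage type.
    return struct.unpack('f', struct.pack('f', x))[0]

# 84.41391 has no finite binary representation, so the float version
# differs from the double in the low-order bits: no exact match.
print(as_float32(84.41391) == 84.41391)              # False

# 92.11083984375 fits exactly in a float's 24-bit significand, so the
# float and double versions agree and the join succeeds.
print(as_float32(92.11083984375) == 92.11083984375)  # True
```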

          The only "perfect" solution would be to regenerate the heat_*.dta files in such a way as to hold the values of longnum and latnum in double precision. I do not program in Python, so I do not know whether that is impossible, difficult, or easy to do. Then, when importing the Python output files to Stata, be sure to preserve double precision. Given that the original computations took 80 hours, this may or may not be a practical solution for you.

          The alternative is to -recast- the longnum and latnum variables to float in the allvars_SA_clean.dta file (which requires using the dreaded -force- option). This will be pretty fast. The drawback is that it may result in some false matches. You also need to judge whether the false matches are actually a problem or not. I am no geographer and I have no intuitions about longitude and latitude numbers; I never work with this kind of data. It may be that, for practical purposes, all those extra digits in the double precision version correspond to negligible distances on the surface of the earth. Or it may be quite the opposite, and the differences really do matter. I defer to your judgment on this issue.

          Added: There is a concern about the "perfect" solution. If the Python programs you are using are just reading in the numbers and spitting them back out, this solution should work well (if feasible). But if the programs are actually calculating these numbers in some way, then there is the possibility that rounding errors during the course of the calculations might lead to slightly inaccurate double-precision results that would turn out not to match anyway. Floating point numbers are difficult to work with in this way.

          You might want to consider combining these files in a different way that does not require exact matching of longnum and latnum. Instead, identify some threshold level of disagreement that is acceptable in a match, and then use -rangejoin- to create this approximate match on longnum, and then -keep- only the pairs that also agree on latnum to this acceptable amount. -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Roberto Ferrer, and Nick Cox, also available from SSC.
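
          As a sketch of the idea behind that approximate match (in Python rather than via -rangejoin- itself; the tolerance and sample coordinates are invented for illustration):

```python
# Keys from the float-precision file and the double-precision file
# disagree in far-out decimal places, so exact equality fails, but a
# tolerance-based comparison still pairs them up.
TOL = 1e-4  # assumed acceptable disagreement, in degrees

heat = [(84.41391, 27.21664, 12054, 0.0)]                    # float-precision keys + heat value
allvars = [(84.413910058898926, 27.216640472412109, 12054)]  # double-precision keys

matched = [(h, a)
           for h in heat
           for a in allvars
           if abs(h[0] - a[0]) <= TOL     # longnum within tolerance
           and abs(h[1] - a[1]) <= TOL    # latnum within tolerance
           and h[2] == a[2]]              # bdate must match exactly
print(len(matched))  # 1
```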
          Last edited by Clyde Schechter; 02 Apr 2024, 11:03.



          • #6
            Thank you so much for the detailed response Clyde Schechter. I will try the -rangejoin- option and I think that would be the best way.

            I have tried the solution suggested by Mike Lacy. It seems to work well with minimal overmatching. So I am running with this solution for now.

            I appreciate all the help!



            • #7
              Having read Tahreen's description at #4 above, I have a response that if I'm right is important conceptually, but which probably doesn't matter operationally in the current situation.

              I have interpreted Tahreen's description at #4 to mean that there are: 1) a file with observations on individuals, for which the key variables are combinations of Lat/Long and bdate; and 2) a file with measures of heat with the same key variables. The goal is to attach the proper heat measures from file 2 onto each corresponding observation in file 1.

              If that is the correct understanding at a molar level-- and I'm not sure it is -- what is desired here *conceptually* would be either: a) 1:1 -merge-, if there is only one instance of each Lat/Long/bdate in file 1; or b) an m:1 -merge- if there are potentially multiple individuals with each Lat/Long/bdate in file 1.

              That's the conceptual point, and *if* I'm right, then -joinby- was *not* the right way to think about this problem; understanding that might help Tahreen in the future. Operationally, however, given that exact matches on Lat/Long are fraught with potential for precision errors, Clyde's suggestion to use -rangejoin- is the right approach. There is an old maxim in data processing, "Never compare floating point numbers for equality," and the current problem resonates with that.

              I once had a situation similar to the current one, namely to attach temperature data from one file, keyed via Lat/Long, to a relatively large number of geographical locations in another file. In that situation, I used the user-written program -geonear- (-ssc describe geonear-) to put the temperature data from the closest 5 or so locations onto the other file. While this approach is likely not applicable or preferable in the current situation, it's similar enough that it might help a future user.
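
              For completeness, the nearest-location idea can be sketched like this (Python, with invented coordinates; real work should use great-circle distances, which -geonear- computes, rather than this naive Euclidean distance in degrees):

```python
import math

# Candidate grid points carrying temperature data (invented values).
stations = [(92.11084, 24.784885), (84.41391, 27.21664)]

# A survey location whose coordinates differ slightly from any grid point.
target = (84.4139, 27.2166)

# Pick the closest grid point instead of demanding exact key equality.
nearest = min(stations,
              key=lambda s: math.hypot(s[0] - target[0], s[1] - target[1]))
print(nearest)  # (84.41391, 27.21664)
```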

