  • #16
    Dear Andrew, suppose that I save the county and county+town information in two separate Stata data files, county.dta and county-town.dta (attached). I have the same addresses to split:
    Code:
    clear
    input str144 t_addr2
    "台北市市民大道三段2號5樓"                          
    "台北市南港區園區街3之1號G棟8樓"                  
    "508 彰化縣 和美鎮 西鄉路161巷2號"                  
    "302 新竹縣竹北市文興路一段372號"                  
    "320 桃園市 中壢區 桃園市中壢區中山路201號4樓"
    end
    Do I need to use frames to do this, and if so, how? Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)



    • #17
      Quote:
      For the 239th observation in county-town.dta, it should be
      Code:
      臺南市 新市區
      Thus, I guess we should use your earlier suggestion
      Code:
      gen county = ustrregexs(0) if ustrregexm(" " + t_addr2 + " ", "(`counties')")
      to extract cities/counties, instead of gen county = ustrregexra(countytown,"(.*縣|.*市)?(.*)?", "$1"). Any comments?
      I agree with your suggested solution. Since we have the list of counties, there is no need to extract them with a generic regular expression, and we avoid misclassifications such as the one in observation 239. With the counties and towns available in the datasets, there is no need for frames either: preserve and restore are enough to read the list of towns into a local macro. Here is the full code using your datasets from #16:

      Code:
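      use "county.dta", clear
      * Store all county names as one alternation pattern: county1|county2|...
      levelsof county, local(counties) sep(|) clean
      clear
      input str144 t_addr2
      "台北市市民大道三段2號5樓"                          
      "台北市南港區園區街3之1號G棟8樓"                  
      "508 彰化縣 和美鎮 西鄉路161巷2號"                  
      "302 新竹縣竹北市文興路一段372號"                  
      "320 桃園市 中壢區 桃園市中壢區中山路201號4樓"
      end
      * Take the first county from the list that appears in the address
      gen county = ustrregexs(0) if ustrregexm(" " + t_addr2 + " ", "(`counties')")
      preserve
      use "county-town.dta", clear
      * Split countytown into county and town using the same county list
      gen county = ustrregexs(0) if ustrregexm(" " + countytown + " ", "(`counties')")
      gen town = ustrregexra(countytown, county, "")
      list in 239
      * Store all town names as a second alternation pattern
      levelsof town, local(towns) sep(|) clean
      restore
      gen town = ustrregexs(0) if ustrregexm(t_addr2, "(`towns')")
      * Remove county and town, skip a leading postal code, and keep the street
      * name ending in 街, 路 or 大道. ustrregexra() replaces every match; its
      * optional fourth argument (1 here) only requests case-insensitive matching.
      gen road = ustrregexra(ustrregexra(ustrregexra(t_addr2, county, "", 1), town, "", 1), "^([0-9]*)(.*街|.*路|.*大道)?(.*)?", "$2")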
      Result:

      Code:
      . list in 239
      
           +--------------------------------+
           |   countytown   county     town |
           |--------------------------------|
      239. | 臺南市新市區   臺南市   新市區 |
           +--------------------------------+
      
      . l
      
           +----------------------------------------------------------------------------+
           |                                      t_addr2   county     town        road |
           |----------------------------------------------------------------------------|
        1. |                     台北市市民大道三段2號5樓   台北市             市民大道 |
        2. |               台北市南港區園區街3之1號G棟8樓   台北市   南港區      園區街 |
        3. |             508 彰化縣 和美鎮 西鄉路161巷2號   彰化縣   和美鎮      西鄉路 |
        4. |              302 新竹縣竹北市文興路一段372號   新竹縣   竹北市      文興路 |
        5. | 320 桃園市 中壢區 桃園市中壢區中山路201號4樓   桃園市   中壢區      中山路 |
           +----------------------------------------------------------------------------+
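      The misclassification in observation 239 comes from greediness: in a generic pattern such as .*市, the .* runs to the last 市 in the string, so 臺南市新市區 yields 臺南市新市 as the "county". The minimal sketch below, which assumes Stata 14 or later for the ustr* regex functions, contrasts the two approaches on two strings. The two-entry county list is hypothetical and only stands in for the levelsof output stored in the counties local above.

      ```stata
      clear
      input str40 countytown
      "臺南市新市區"
      "彰化縣和美鎮"
      end

      * Generic pattern: the greedy .* before 市 runs to the LAST 市 in the string
      gen county_greedy = ustrregexs(1) if ustrregexm(countytown, "^(.*縣|.*市)")

      * List-based pattern: hypothetical two-entry list standing in for the
      * levelsof output used in the thread
      local counties "臺南市|彰化縣"
      gen county = ustrregexs(0) if ustrregexm(countytown, "(`counties')")

      * The town is whatever remains after removing the matched county
      gen town = ustrregexra(countytown, county, "")

      list
      ```

      For the first string, county_greedy holds 臺南市新市, while the list-based county is 臺南市 and town is 新市區. The second string is split the same way by both approaches, because it contains only one 縣.
      
      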



      • #18
        Dear Andrew, thanks again for your very useful suggestions.

