  • help with strmatch to select firm-years that meet certain conditions

    Dear statalist,

    I have panel data for 2010-2019 with the variables Year, Symbol, disclosure and dlist. The disclosure variable takes three values: 0, 1, and 2. I created dlist, a string variable that concatenates a firm's yearly disclosure values over the sample period. I want to create an indicator variable, one_zero, that equals 1 in the firm-year in which disclosure switches from 1 to 0. I require that the firm has disclosure==1 in every year before the switch and disclosure==0 in every year from the switch to the end of the sample period. The indicator equals 0 in all other cases. Here are some example data:
    Code:
    Year    Symbol    disclosure    dlist
    2010    900    1    1111100000
    2011    900    1    1111100000
    2012    900    1    1111100000
    2013    900    1    1111100000
    2014    900    1    1111100000
    2015    900    0    1111100000
    2016    900    0    1111100000
    2017    900    0    1111100000
    2018    900    0    1111100000
    2019    900    0    1111100000
    2010    903    0    0000010100
    2011    903    0    0000010100
    2012    903    0    0000010100
    2013    903    0    0000010100
    2014    903    0    0000010100
    2015    903    1    0000010100
    2016    903    0    0000010100
    2017    903    1    0000010100
    2018    903    0    0000010100
    2019    903    0    0000010100
    2010    932    1    1100000000
    2011    932    1    1100000000
    2012    932    0    1100000000
    2013    932    0    1100000000
    2014    932    0    1100000000
    2015    932    0    1100000000
    2016    932    0    1100000000
    2017    932    0    1100000000
    2018    932    0    1100000000
    2019    932    0    1100000000
    2010    961    0    0002110211
    2011    961    0    0002110211
    2012    961    0    0002110211
    2013    961    2    0002110211
    2014    961    1    0002110211
    2015    961    1    0002110211
    2016    961    0    0002110211
    2017    961    2    0002110211
    2018    961    1    0002110211
    2019    961    1    0002110211
    So in the example above, the two firm-years where the switch happens (Symbol 900 in 2015 and Symbol 932 in 2012) should be coded 1 for one_zero. Basically, firms that meet my condition have a dlist of the form 111000: the number of 1s before the switch and the number of 0s after it can vary. I believe -strmatch- can do this. Can anyone kindly help me with this? Thanks a lot!

  • #2
    Code:
    gen wanted = strmatch(dlist,"1*0*") & disclosure<disclosure[_n-1]



    • #3
      Hi Øyvind,

      Thanks a lot for your kind help. I tried your code and I think there is a minor problem: the following firm-year observations are wrongly coded as 1. I want wanted to equal 1 only for firms that switch exactly once from disclosure==1 to disclosure==0 during the sample period, not for firms that switch from 1 to 0 and then back to 1 (and possibly to 0 again, such as the firm with symbol 707). None of the firms listed below should be coded 1 for wanted. Sorry that I didn't make this clear in #1. Any suggestions on how to take this into account? Thanks!

      Code:
      Year    Symbol    disclosure    dlist    wanted
      2010    2174    1    1111110111    0
      2011    2174    1    1111110111    0
      2012    2174    1    1111110111    0
      2013    2174    1    1111110111    0
      2014    2174    1    1111110111    0
      2015    2174    1    1111110111    0
      2016    2174    0    1111110111    1
      2017    2174    1    1111110111    0
      2018    2174    1    1111110111    0
      2019    2174    1    1111110111    0
      2010    707    1    1000001000    0
      2011    707    0    1000001000    1
      2012    707    0    1000001000    0
      2013    707    0    1000001000    0
      2014    707    0    1000001000    0
      2015    707    0    1000001000    0
      2016    707    1    1000001000    0
      2017    707    0    1000001000    1
      2018    707    0    1000001000    0
      2019    707    0    1000001000    0
      2010    564    1    1000001111    0
      2011    564    0    1000001111    1
      2012    564    0    1000001111    0
      2013    564    0    1000001111    0
      2014    564    0    1000001111    0
      2015    564    0    1000001111    0
      2016    564    1    1000001111    0
      2017    564    1    1000001111    0
      2018    564    1    1000001111    0
      2019    564    1    1000001111    0
      Edited:
      I found another observation that might be wrong: the 2010 observation below is coded 1. Only the firm with symbol 600487 in 2014 should have wanted==1.
      Code:
      Year    Symbol    disclosure    dlist    wanted
      2010    600487    1    1111000000    1
      2011    600487    1    1111000000    0
      2012    600487    1    1111000000    0
      2013    600487    1    1111000000    0
      2014    600487    0    1111000000    1
      2015    600487    0    1111000000    0
      2016    600487    0    1111000000    0
      2017    600487    0    1111000000    0
      2018    600487    0    1111000000    0
      2019    600487    0    1111000000    0
      Last edited by Flora Yin; 24 Apr 2022, 04:13.
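
One possible reading of the results in #3, offered as a sketch and not as a diagnosis confirmed anywhere in this thread: the pattern "1*0*" only asks for a leading 1 with a 0 somewhere after it, and the comparison with disclosure[_n-1] is not restricted to rows of the same firm. In the first row of the data set disclosure[_n-1] is missing, and Stata treats missing as larger than any number. The example below uses two firms copied from the listings above; the variable step_down is a hypothetical name introduced only for illustration.

```stata
* Illustration only: two firms copied from the listings in this thread.
clear
input int Year long Symbol byte disclosure str10 dlist
2010 600487 1 "1111000000"
2011 600487 1 "1111000000"
2012 600487 1 "1111000000"
2013 600487 1 "1111000000"
2014 600487 0 "1111000000"
2015 600487 0 "1111000000"
2016 600487 0 "1111000000"
2017 600487 0 "1111000000"
2018 600487 0 "1111000000"
2019 600487 0 "1111000000"
2010   2174 1 "1111110111"
2011   2174 1 "1111110111"
2012   2174 1 "1111110111"
2013   2174 1 "1111110111"
2014   2174 1 "1111110111"
2015   2174 1 "1111110111"
2016   2174 0 "1111110111"
2017   2174 1 "1111110111"
2018   2174 1 "1111110111"
2019   2174 1 "1111110111"
end

* The code from #2, unchanged.
gen wanted = strmatch(dlist,"1*0*") & disclosure<disclosure[_n-1]

* "*" matches any run of characters, so a firm that returns to 1 still passes.
assert strmatch("1111110111", "1*0*")

* Row 1 has no previous row: disclosure[_n-1] is missing and 1 < . is true.
assert wanted == 1 in 1

* Comparing within firm, and testing for an explicit 1 -> 0 step,
* avoids the first-row problem (step_down is a hypothetical name).
bysort Symbol (Year): gen byte step_down = disclosure == 0 & disclosure[_n-1] == 1
list Year Symbol disclosure wanted step_down, sepby(Symbol)
```

Note that step_down still flags firm 2174 in 2016, so the within-firm comparison alone is not enough; the pattern itself also has to rule out a return to 1.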



      • #4
        I suppose there is a way to do this relying on -strmatch()- or perhaps it requires regular expressions. But it can be done without the string variable dlist at all, and I think this code is simpler:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int year long symbol byte disclosure str10 dlist
        2010    900 1 "1111100000"
        2011    900 1 "1111100000"
        2012    900 1 "1111100000"
        2013    900 1 "1111100000"
        2014    900 1 "1111100000"
        2015    900 0 "1111100000"
        2016    900 0 "1111100000"
        2017    900 0 "1111100000"
        2018    900 0 "1111100000"
        2019    900 0 "1111100000"
        2010    903 0 "0000010100"
        2011    903 0 "0000010100"
        2012    903 0 "0000010100"
        2013    903 0 "0000010100"
        2014    903 0 "0000010100"
        2015    903 1 "0000010100"
        2016    903 0 "0000010100"
        2017    903 1 "0000010100"
        2018    903 0 "0000010100"
        2019    903 0 "0000010100"
        2010    932 1 "1100000000"
        2011    932 1 "1100000000"
        2012    932 0 "1100000000"
        2013    932 0 "1100000000"
        2014    932 0 "1100000000"
        2015    932 0 "1100000000"
        2016    932 0 "1100000000"
        2017    932 0 "1100000000"
        2018    932 0 "1100000000"
        2019    932 0 "1100000000"
        2010    961 0 "0002110211"
        2011    961 0 "0002110211"
        2012    961 0 "0002110211"
        2013    961 2 "0002110211"
        2014    961 1 "0002110211"
        2015    961 1 "0002110211"
        2016    961 0 "0002110211"
        2017    961 2 "0002110211"
        2018    961 1 "0002110211"
        2019    961 1 "0002110211"
        2010   2174 1 "1111110111"
        2011   2174 1 "1111110111"
        2012   2174 1 "1111110111"
        2013   2174 1 "1111110111"
        2014   2174 1 "1111110111"
        2015   2174 1 "1111110111"
        2016   2174 0 "1111110111"
        2017   2174 1 "1111110111"
        2018   2174 1 "1111110111"
        2019   2174 1 "1111110111"
        2010    707 1 "1000001000"
        2011    707 0 "1000001000"
        2012    707 0 "1000001000"
        2013    707 0 "1000001000"
        2014    707 0 "1000001000"
        2015    707 0 "1000001000"
        2016    707 1 "1000001000"
        2017    707 0 "1000001000"
        2018    707 0 "1000001000"
        2019    707 0 "1000001000"
        2010    564 1 "1000001111"
        2011    564 0 "1000001111"
        2012    564 0 "1000001111"
        2013    564 0 "1000001111"
        2014    564 0 "1000001111"
        2015    564 0 "1000001111"
        2016    564 1 "1000001111"
        2017    564 1 "1000001111"
        2018    564 1 "1000001111"
        2019    564 1 "1000001111"
        2010 600487 1 "1111000000"
        2011 600487 1 "1111000000"
        2012 600487 1 "1111000000"
        2013 600487 1 "1111000000"
        2014 600487 0 "1111000000"
        2015 600487 0 "1111000000"
        2016 600487 0 "1111000000"
        2017 600487 0 "1111000000"
        2018 600487 0 "1111000000"
        2019 600487 0 "1111000000"
        end
        
        
        by symbol (year), sort: gen run_num = sum(disclosure != disclosure[_n-1])
        by symbol (year): gen ones_zeroes = (run_num[_N] == 2 & disclosure[1] == 1 ///
            & disclosure[_N] == 0)
        by symbol run_num (year), sort: gen byte wanted = (ones_zeroes & run_num == 2 ///
            & _n == 1)
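
For comparison with the three commands above, here is a minimal sketch of the regular-expression route mentioned at the start of this post. It is not the approach used above, and it assumes that dlist lists each firm's disclosure values in year order with no gaps, which the thread suggests but does not state. The variable name wanted_rx and the two-firm data set are illustrative only.

```stata
* Sketch only: wanted_rx is a hypothetical name; the data are two firms
* copied from the example above.
clear
input int year long symbol byte disclosure str10 dlist
2010 932 1 "1100000000"
2011 932 1 "1100000000"
2012 932 0 "1100000000"
2013 932 0 "1100000000"
2014 932 0 "1100000000"
2015 932 0 "1100000000"
2016 932 0 "1100000000"
2017 932 0 "1100000000"
2018 932 0 "1100000000"
2019 932 0 "1100000000"
2010 707 1 "1000001000"
2011 707 0 "1000001000"
2012 707 0 "1000001000"
2013 707 0 "1000001000"
2014 707 0 "1000001000"
2015 707 0 "1000001000"
2016 707 1 "1000001000"
2017 707 0 "1000001000"
2018 707 0 "1000001000"
2019 707 0 "1000001000"
end

* ^1+0+$ : one or more 1s, then one or more 0s, and nothing else.
* Within each firm, flag the year in which disclosure steps from 1 to 0.
by symbol (year), sort: gen byte wanted_rx = regexm(dlist, "^1+0+$") ///
    & disclosure == 0 & disclosure[_n-1] == 1

* Only firm 932 in 2012 should be flagged; firm 707 switches back to 1.
assert wanted_rx == (symbol == 932 & year == 2012)
```

Unlike the run-number code, this version depends on dlist being correct, so the version without the string variable is probably the safer choice if dlist was built by hand.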
        In the future, when showing data examples, please use the -dataex- command to do so, as I have here. (It took me much more time to wrestle your example data into a usable Stata data set than it took me to write and test the solution code! -dataex- output would have saved me all that excess time and effort.) If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



        • #5
          Hi Clyde,

          Thanks a lot for your kind help! My apologies that I didn't use dataex to show example data, and sorry to cause you inconvenience; I'll make sure to use it next time.

