  • -fillin- command doesn't change anything in my dataset

    Dear Statalist,

    I am trying to rectangularize my dataset on two variables, source and target. In other words, I would like every possible combination of source and target to appear exactly once in my dataset. I understood that the command -fillin- does this quite simply, so I wrote:

    Code:
    fillin source target
    A new _fillin variable has been generated, but it never takes the value 1 and nothing has changed in my dataset. I checked whether I might be hitting a limit on the number of observations, but target has few distinct values, no more than 80, so there should be at most 80*80 observations, which is totally fine. Here is an example of my dataset before the command, after sorting by source and target:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3(source target) long conn
    "13" "0"  1
    "13" "1"  1
    "13" "10" 1
    "13" "11" 1
    "13" "12" 1
    "13" "13" 1
    "13" "14" 1
    "13" "15" 1
    "13" "16" 1
    "13" "17" 2
    "13" "18" 2
    "13" "19" 1
    "13" "2"  1
    "13" "20" 2
    "13" "21" 1
    "13" "22" 2
    "13" "23" 1
    "13" "24" 1
    "13" "25" 1
    "13" "26" 1
    end
    label values conn connec
    label def connec 1 "No", modify
    label def connec 2 "Yes", modify
    As you can see, my dataset starts directly with 13 values for source when I was expecting it to start with 0, 0 / 0, 1 / 0, 10, etc. I used this command in the past and I don't remember having this problem, so it's probably a typo or a data structure problem coming from me. In that case I would appreciate some help on the matter.

    I don't know whether this is a string problem. I am using string variables in this example because some other files have letters in their source/target variables and I process all the files in a loop.

    Many thanks!

    EDITED : I changed my dataex example because it was confusing. To be clear, the numbers displayed in source and target refer to the same unit in a network. Therefore there should never be more than n*n observations. There are n possible values in target, but a subset of N in source.
    Last edited by Adam Sadi; 07 Feb 2023, 05:28.

  • #2
    There is no typo or data structure problem here.

    Simply, fillin does not know that you expect the variable to start at 0, or 00, or wherever it should start -- or where it should finish. It relies entirely on what is already present in the data.
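
    A minimal sketch of that point, using a made-up three-row dataset rather than the original data: source has one distinct value and target has three, so all 1*3 combinations of the values present already exist. fillin therefore adds no observations and _fillin is 0 throughout.

    ```stata
    clear
    input str3(source target)
    "13" "0"
    "13" "1"
    "13" "2"
    end

    * fillin only crosses the distinct values it finds in each variable
    fillin source target

    * no row was added, so no observation is flagged
    count if _fillin == 1
    list
    ```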

    See https://www.stata-journal.com/articl...article=dm0011 which goes a bit beyond the help file -- but is referenced in the documentation.
    Something like this may be closer to what you need.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3(source target) long conn
    "13" "10" 1
    "13" "11" 1
    "13" "12" 1
    "13" "13" 1
    "13" "14" 1
    "13" "15" 1
    "13" "16" 1
    "13" "17" 2
    "13" "18" 2
    "13" "19" 1
    "13" "20" 2
    "13" "21" 1
    "13" "22" 2
    "13" "23" 1
    "13" "24" 1
    "13" "25" 1
    "13" "26" 1
    end
    label values conn connec
    label def connec 1 "No", modify
    label def connec 2 "Yes", modify
    
    local N1 = _N
    local N2 = _N + 80
    set obs `N2'
    replace source = strofreal(_n - `N1' - 1, "%02.0f") if _n > `N1'
    replace target = source if _n >  `N1'
    fillin source target
    
    sort source target
    
    list in 1/20
    
    list if conn < .
    
    tab1 *
    Last edited by Nick Cox; 07 Feb 2023, 05:53.


    • #3
      Nick: I think I understood how the command works thanks to your code. If I understood correctly, if I want a "square" network dataset of n*n possible interactions, every individual must appear at least once in every variable specified for the fillin. For instance,

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str10(fruit1 fruit2)
      "apple" "apple" 
      "apple" "orange"
      "apple" "pear"  
      "pear"  "apple" 
      "pear"  "orange"
      "pear"  "pear"  
      end
      isn't suited for a fillin command, but

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str10(fruit1 fruit2)
      "apple"  "orange"
      "pear"   "apple" 
      "orange" "pear"  
      end
      this dataset is. Got it! I will try working around your code because the "80" must vary across datasets. Thank you for your help.
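
      One possible way to avoid hard-coding the 80 is sketched below. It assumes source and target are string variables, as in the example above, and uses a made-up four-row dataset; the variable name added and the local names s, t, ids, n and i are illustrative only. The idea is to collect every id found in either variable, append one (id, id) row per id so that each id occurs in both variables, and then call fillin.

      ```stata
      clear
      input str3(source target) long conn
      "13" "0"  1
      "13" "1"  1
      "13" "13" 1
      "13" "a"  2
      end

      * collect every id that appears in either variable
      levelsof source, local(s)
      levelsof target, local(t)
      local ids : list s | t
      local n : list sizeof ids

      * append one (id, id) row per id and flag the appended rows
      gen byte added = 0
      local i = _N
      set obs `=_N + `n''
      foreach id of local ids {
          local ++i
          replace source = `"`id'"' in `i'
          replace target = `"`id'"' in `i'
          replace added = 1 in `i'
      }

      * drop an appended row when the same pair was already in the data
      bysort source target (added): drop if added & _n > 1

      fillin source target
      drop added
      sort source target
      list, sepby(source)
      ```

      With these four made-up rows there are 4 distinct ids (0, 1, 13, a), so the result has 4*4 = 16 observations, and conn is nonmissing only for the 4 original pairs.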


      • #4
        I'd describe the need a bit differently than what Adam said in #3: In this case, one needs two variables each of which has all the names (values), but not necessarily all the pairs. In this context, the uncommonly used command -stack- offers an alternative approach that might be intuitive.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str10(fruit1 fruit2)
        "apple" "apple"
        "apple" "orange"
        "apple" "pear"  
        "pear"  "apple"
        "pear"  "orange"
        "pear"  "pear"  
        end
        //
        // Obtain one variable that lists all the fruitnames and clean up.
        stack fruit1 fruit2, into(fruitname1) clear
        drop _stack
        duplicates drop fruitname1, force
        //
        // Another variable with all the fruitnames
        gen fruitname2 = fruitname1
        //
        // Create all fruitname pairs
        fillin fruitname1 fruitname2
        //
        // Presuming pair order doesn't matter
        keep if fruitname1 <= fruitname2
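
        Since -stack- with the clear option replaces the data in memory, any other variable (such as conn in #1) is no longer there afterwards. A hedged sketch of one way to bring such variables back: save the original data first, build the full grid of pairs, then merge. The data, the tempfile name original and the variable names name1 and name2 are made up for illustration, and the sketch assumes each pair occurs at most once in the original data and that pair order matters.

        ```stata
        clear
        input str10(fruit1 fruit2) byte conn
        "apple" "orange" 1
        "pear"  "apple"  2
        end
        tempfile original
        save `original'

        * one variable holding every name from either column
        stack fruit1 fruit2, into(name1) clear
        drop _stack
        duplicates drop name1, force

        * cross the names with themselves to get all ordered pairs
        gen name2 = name1
        fillin name1 name2
        drop _fillin

        * restore the original variable names and bring conn back
        rename (name1 name2) (fruit1 fruit2)
        merge 1:1 fruit1 fruit2 using `original'
        sort fruit1 fruit2
        list
        ```

        Here the three names give 3*3 = 9 pairs, and only the two original pairs come back with a nonmissing conn.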
