Dear Statalist,
I am trying to rectangularize my dataset based on two variables, source and target. In other words, I would like each possible combination of source and target to appear as exactly one observation in my dataset. I understood that the command -fillin- does this quite simply, so I ran:
Code:
fillin source target
A new _fillin variable has been generated, but it never takes the value 1 and nothing has changed in my dataset. I checked whether this could be a problem with the number of observations, but the number of expected values of my target variable is very small, no more than 80, so at most there should be 80*80 observations, which is totally fine. Please find below an example of my dataset, taken after running the command and sorting by source and target:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str3(source target) long conn
"13" "0" 1
"13" "1" 1
"13" "10" 1
"13" "11" 1
"13" "12" 1
"13" "13" 1
"13" "14" 1
"13" "15" 1
"13" "16" 1
"13" "17" 2
"13" "18" 2
"13" "19" 1
"13" "2" 1
"13" "20" 2
"13" "21" 1
"13" "22" 2
"13" "23" 1
"13" "24" 1
"13" "25" 1
"13" "26" 1
end
label values conn connec
label def connec 1 "No", modify
label def connec 2 "Yes", modify
As you can see, my dataset starts directly with the value 13 for source, when I was expecting it to start with 0, 0 / 0, 1 / 0, 10, etc. I have used this command in the past and I don't remember having this problem, so it is probably a typo or a data structure problem on my side. In either case, I would appreciate some help on the matter.
I don't know if this is a string problem. I am using string variables in this example because some of my other files have letters in their source/target variables and I process all the files in a loop.
Many thanks!
EDITED: I changed my dataex example because it was confusing. To be clear, the numbers displayed in source and target refer to the same units in a network, so there should never be more than n*n observations. There are n possible values in target, but only a subset N of them appears in source.
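To make the expectation concrete, here is a minimal sketch on toy data. The values "a", "b", "x", "y", "z" and the helper variables first_source and first_target are hypothetical and are not taken from my files. It counts the distinct values of each variable, runs -fillin-, and checks that the result has one observation per pairing of the values that actually occur in the data.

```stata
* Toy data: hypothetical values, not taken from my real files
clear
input str3(source target) long conn
"a" "x" 1
"a" "y" 2
"b" "z" 1
end

* Count the distinct values of each variable before filling in
* (first_source and first_target are illustrative helper names)
egen byte first_source = tag(source)
egen byte first_target = tag(target)
quietly count if first_source
local n_source = r(N)
quietly count if first_target
local n_target = r(N)
drop first_source first_target

* -fillin- adds every missing pairing of the values already present
fillin source target

* The filled-in data should hold n_source * n_target observations
assert _N == `n_source' * `n_target'
list, sepby(source)
```

Here the three original observations should become six, with _fillin equal to 1 on the three added pairings. Applied to the 20-observation excerpt from my data, the same count would give 1 source value times 20 target values, i.e. 20 pairings, which is already the number of observations in the excerpt.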