Hi,
I am using Stata/SE 13.0 on Windows.
I am struggling with vertically combining string values into a single string (that is, combining several observations into one). I have text data that I had to import into Stata in a way that puts each line of text in a separate observation. This is useful because it lets me isolate the particular lines of text I need (so I do not want to merge all the text into a single observation before importing into Stata), but it also means that I now have to combine some of the lines back together.
I am trying to combine them using the following command (and a loop around it, as I will describe shortly):
gen var2 = var1[_n]+var1[_n+1]
This works fine for two lines, but the problem is that the number of lines of text that need to be combined varies unpredictably. I want to combine all lines of text until an empty line is encountered, and then start over with the next block, as there are many such blocks. In other words, my data looks like var1 and I want to create a new variable, var2, that looks as follows, without knowing in advance how many lines need to be combined in each case (here I first need to combine 3 lines and then 4 lines):
var1                           var2
Some text describing           Some text describing something but only until some point
something but only until
some point
(empty line)
Then again some text           Then again some text Describing something But this time it runs until Line four
Describing something
But this time it runs until
Line four
Is there a way to do this with a loop? E.g. ‘carry out gen var2 = var1[_n]+var1[_n+1]+…+var1[_n+X] if observations _n through _n+X are non-empty and observation _n+X+1 is empty’.
I think I need a loop with a stopping rule, but I do not know how to write one that will do exactly this.
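For reference, here is a rough sketch of one way this might be done without an explicit loop, by numbering the blocks and building a running concatenation within each block. It assumes that var1 is a string variable, that an "empty line" is an observation where var1 is exactly "", and that the pieces should be joined with a single space. The helper variables order and block are names made up for this illustration, and I have not checked it against very long texts.

```stata
* Sketch only: assumes var1 is a string variable with one line of text per
* observation and blocks separated by observations where var1 == "".
* "order" and "block" are hypothetical helper variables.

gen long order = _n                 // remember the original line order
gen long block = sum(var1 == "")    // running count of empty lines = block id

* Running concatenation within each block. replace works through the
* observations in order, so var2[_n-1] already holds the text built so far.
gen var2 = var1
bysort block (order): replace var2 = var2[_n-1] + " " + var1 if _n > 1 & var1 != ""

* The last observation of each block now holds the full text; copy it to
* every line of the block and trim the leading space left by the empty line.
by block: replace var2 = trim(var2[_N])

* Keep the combined text only on the first non-empty line of each block.
by block: replace var2 = "" if var1 == "" | (_n > 1 & var1[_n-1] != "")

sort order
drop order block
```

The stopping rule is implicit here: an empty line starts a new block, so each block ends wherever the next empty line appears, however many lines it contains.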
Thank you so much for all your help in advance!!
All the best,
Victoria