Dear Statafriends,
I got a tricky issue. The thing is, I have to create new observations, not variables.
Obviously, creating new observations in Stata is neither comfortable nor does it make sense in most occassions, but this time it is necessary.
I already found this helpful thread, but my issue is more complex: http://www.stata.com/statalist/archi.../msg00875.html
My Data
My Dataset
What do I have to do?
Create 2 new observations, before the first month_id of each company_id.
It does not matter what happens with uniq_company no_company, they are only support-variables,
but I have to create two new (previous) month_id s for each company.
Why do I need this?
The intuition is, that I use this list as a criterion which is merged with other datasets and defines whether to discard a specific company_id in a specific month.
Unfortunately, the current, non-adjusted data would lag everything by 2 months, which implies that my list would tell the other dataset to exclude some months that would be necessary, creating a bias in my subsequently used data.
My Approach up to now
First, created the necessary number of empty cells for at least 1 month [set obs `=_N+no_company'] (how can I double that number?).
Then I tried to fill the empty cells with company_id and month_id less 1 month. I tried every way I could think of, this seemed to be the most promising, but it just does not work.
Any suggestions would be highly welcome!
Thank you,
Carlos
I got a tricky issue. The thing is, I have to create new observations, not variables.
Obviously, creating new observations in Stata is neither comfortable nor does it make sense in most occassions, but this time it is necessary.
I already found this helpful thread, but my issue is more complex: http://www.stata.com/statalist/archi.../msg00875.html
My Data
- I have panel data, identified by company_id and month_id.
- uniq_company: unique value, one for each company_id
- no_company: total number of distinct company_ids
My Dataset
Code:
clear input double company_id float(month_id uniq_company no_company) 10001 337 10001 11306 10001 338 . 11306 10001 339 . 11306 10001 340 . 11306 10001 341 . 11306 10001 342 . 11306 10001 343 . 11306 10001 344 . 11306 10001 345 . 11306 10001 346 . 11306 10001 347 . 11306 10001 348 . 11306 10001 349 . 11306 10002 414 10002 11306 10002 415 . 11306 10002 416 . 11306 10002 417 . 11306 10002 418 . 11306 10002 419 . 11306 10002 420 . 11306 10002 421 . 11306 10002 422 . 11306 10002 423 . 11306 10002 424 . 11306 10002 425 . 11306 10002 426 . 11306 10002 427 . 11306 10002 428 . 11306 10002 429 . 11306 10003 426 10003 11306 10003 427 . 11306 10003 428 . 11306 10003 429 . 11306 10003 430 . 11306 10003 431 . 11306 end format %tm month_id
Create 2 new observations, before the first month_id of each company_id.
It does not matter what happens with uniq_company no_company, they are only support-variables,
but I have to create two new (previous) month_id s for each company.
Code:
clear input double company_id float(month_id uniq_company no_company) 10001 335 . 11306 10001 336 . 11306 10001 337 10001 11306 10001 338 . 11306 10001 339 . 11306 . . . . 10002 412 . 11306 10002 413 . 11306 10002 414 10002 11306 10002 415 . 11306 10002 416 . 11306 end format %tm month_id
Why do I need this?
The intuition is, that I use this list as a criterion which is merged with other datasets and defines whether to discard a specific company_id in a specific month.
Unfortunately, the current, non-adjusted data would lag everything by 2 months, which implies that my list would tell the other dataset to exclude some months that would be necessary, creating a bias in my subsequently used data.
My Approach up to now
First, created the necessary number of empty cells for at least 1 month [set obs `=_N+no_company'] (how can I double that number?).
Then I tried to fill the empty cells with company_id and month_id less 1 month. I tried every way I could think of, this seemed to be the most promising, but it just does not work.
Code:
sort company_id month_id generate uniq_company = . by company_id: replace uniq_company = company_id if _n==1 egen no_company = count(uniq_company) set obs `=_N+no_company' by company_id month_id: replace company_id = uniq_company[_n==1] if missing(company_id) by company_id month_id: replace month_id = month_id[_n==1] - 1 if missing(month_id)
Any suggestions would be highly welcome!
Thank you,
Carlos
Comment