Hello,
I have read the Stata manuals and the book "An Introduction to Survival Analysis Using Stata", but I am still unsure which data format I should use for survival analysis.
I have a dataset in which I observe purchases of different items. For each item, the dataset includes variables such as the price at which the product is sold, the market, the product code, the number of days the product has been online, and the number of days since the last discount.
If a product is sold without a discount, n_days_after_discount is NULL.
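In the data example in this post, n_days_after_discount is a str4 variable that holds the text "NULL" for sales without a discount, so it cannot serve as a time variable until it is numeric. A minimal sketch, assuming the data look like that example (four of its rows and some of its columns are copied below): recode "NULL" to an empty string, convert the variable with destring, and build a hypothetical indicator named discounted.

```stata
* Four rows (and a subset of columns) copied from the data example
clear
input str4 n_days_after_discount int days_online float lnp str2 market
"NULL"  81 5.129899 "FR"
"NULL" 130 5.068904 "FR"
"23"   156 4.890349 "IT"
"40"    83 4.4998097 "IT"
end

* "NULL" means the item was sold without a discount: blank it out,
* then convert the string variable to numeric ("" becomes missing, .)
replace n_days_after_discount = "" if n_days_after_discount == "NULL"
destring n_days_after_discount, replace

* Hypothetical indicator: 1 if the sale happened after a discount
generate byte discounted = !missing(n_days_after_discount)
list
```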
When I stset the data, should I create two separate observations for each item sold with a discount? In the first observation I would use days_online - n_days_after_discount as the time variable, with failure set to 0; in the second observation I would use n_days_after_discount as the time variable, with the DV (failure) set to 1.
Or can I simply use n_days_after_discount as the time variable for discounted products and days_online for products sold without a discount?
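The first option resembles what Stata calls multiple-record survival data: each discounted sale is split at the discount date, so the discount can enter as a time-varying covariate. A minimal sketch, assuming each row is one sale (so a hypothetical sale_id is built from _n), that "NULL" has already been recoded to missing, and that both records use the same clock (days since the item went online), which is what stset's id() and time0() options expect. Under that clock the post-discount record still lasts n_days_after_discount days, but it starts at days_online - n_days_after_discount rather than at 0. The names sale_id, discounted, post, t_start, t_end, sold and after_discount are illustrative only.

```stata
* Four rows copied from the data example, with "NULL" already recoded to .
clear
input int n_days_after_discount int days_online float lnp str2 market
.   81 5.129899  "FR"
.  130 5.068904  "FR"
23 156 4.890349  "IT"
40  83 4.4998097 "IT"
end

* Hypothetical identifier: each row (sale) is its own subject
generate long sale_id = _n
generate byte discounted = !missing(n_days_after_discount)

* Duplicate each discounted row; post = 1 marks the added copy
expand 2 if discounted, generate(post)

* Common clock: days since the item went online
generate int t_start = 0
generate int t_end = days_online
* Pre-discount record: from 0 to the discount date, no sale yet
replace t_end = days_online - n_days_after_discount if discounted & !post
* Post-discount record: from the discount date to the sale
replace t_start = days_online - n_days_after_discount if post

* Sale indicator: undiscounted rows and post-discount copies end in a sale
generate byte sold = !discounted | post
* Time-varying covariate: 1 while the item is discounted
generate byte after_discount = post

sort sale_id t_start
stset t_end, id(sale_id) failure(sold) time0(t_start)
list sale_id t_start t_end sold after_discount _t0 _t _d, sepby(sale_id)
```

With data in this layout, a model such as stcox lnp after_discount could be fitted; whether to keep strata(product_code) and vce(cluster product_code) from the original code is a modelling choice. The simpler alternative in the question would instead measure discounted sales on a different clock (days since the discount) than undiscounted sales (days online).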
Ultimately, I want to calculate the weights of quantity sold for a given duration, say 100 days. Do you have any idea how to do this?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 n_days_after_discount int days_online long(brand product_code) float lnp str2 market
"NULL" 81 16084 11000002 5.129899 "FR"
"NULL" 96 16084 11000002 5.129899 "FR"
"NULL" 130 16084 11000002 5.068904 "FR"
"23" 156 16084 11000002 4.890349 "IT"
"23" 156 16084 11000002 4.820282 "IT"
"49" 182 16084 11000002 5.087596 "IT"
"11" 203 16084 11000002 5.164786 "FR"
"18" 223 16084 11000002 4.5849676 "FR"
"37" 242 16084 11000002 4.5849676 "IT"
"40" 83 16084 11000002 4.4998097 "IT"
end
I am running these regressions separately for each market, using the following code:
Code:
stset days, failure(q)
stcox lnp, strata(product_code) nolog vce(cluster product_code)