
  • Replication study (sample selection)

    Hi everyone,

    I have to do a replication study for my Master's, but I'm a beginner with Stata. I have some problems with the sample selection (I don't know whether I am doing everything correctly).
    The paper describes the sample as follows:
    We begin by gathering all observations on Compustat with non-missing values of the amount of the increase to UTB from current year positions (TXTUBPOSINC) aggregated over the years t-4 to t, and average total assets (AT) over the years t-4 to t greater than $10 million, with t ranging from 2012 to 2014. We exclude observations that have negative pretax earnings (PI) aggregated over the period t-4 to t, and observations with missing values of cash tax paid (TXPD). These screens are necessary to have interpretable cash effective tax rates. In addition, we remove firms that are incorporated outside of the U.S., and firms that are missing any of the other variables used in the regressions. Finally, we exclude firms that had no additions to their UTB from current year positions (TXTUBPOSINC) over the period t-4 to t. After applying these screens, the main sample used in our tests has 1,896 observations corresponding to 861 firms.

    The variables I need in Stata are:
    • Gvkey
    • Fyear
    • Country
    • AT (Total assets)
    • PI (Pretax income)
    • Sales
    • TXPD (income taxes paid)
    • TXTUBPOSINC (Increase – current tax positions)
    • SPI (Special item)
    • DLTT (Long-Term Debt – Total)
    • DLC (Debt in Current Liabilities – Total)
    • TLCF (Tax Loss Carry Forward)
    I have written the following in Stata, but I don't know if it is correct:

    keep if !missing(txtubposinc) & !missing(at)
    keep if txtubposinc > 0

    sort gvkey year
    by gvkey: egen avg_AT_08_12 = mean(at) if year >= 2008 & year <= 2012
    by gvkey: egen avg_AT_09_13 = mean(at) if year >= 2009 & year <= 2013
    by gvkey: egen avg_AT_10_14 = mean(at) if year >= 2010 & year <= 2014
    keep if avg_AT_08_12 > 10000000 | avg_AT_09_13 > 10000000 | avg_AT_10_14 > 10000000

    keep if pi > 0
    drop if missing(txpd)
    drop if country != "USD"

    drop if missing(dlc) | missing(dltt) | missing(sale) | missing(spi) | missing(tlcf)

    I was thinking that I may need lagged variables to deal with the missing observations of some variables, but I'm not sure.
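    On the lagged-variable idea, here is a rough sketch of how the t-4 to t aggregates in the paper's description might be built with Stata's time-series operators. It is only an illustration under several assumptions: one row per firm and fiscal year, the year variable named fyear, gvkey stored as a string, and Compustat's usual reporting of AT in millions (so $10 million is 10). The names sum_utb, sum_pi and avg_at are made up for this sketch, and treating a window with any missing year as missing is one reading of the paper, not necessarily the authors' procedure.

```stata
* Sketch only: assumes one row per gvkey-fyear (no duplicates)
destring gvkey, replace          // assumption: gvkey arrives as a string
xtset gvkey fyear                // declare the panel so L1., L2., ... work

* Aggregates over t-4..t; any missing or absent year makes the result missing
gen sum_utb = txtubposinc + L1.txtubposinc + L2.txtubposinc ///
            + L3.txtubposinc + L4.txtubposinc
gen sum_pi  = pi + L1.pi + L2.pi + L3.pi + L4.pi
gen avg_at  = (at + L1.at + L2.at + L3.at + L4.at) / 5

* Apply the screens only after the lags have been computed
keep if inrange(fyear, 2012, 2014)
keep if !missing(sum_utb) & !missing(avg_at) & avg_at > 10   // AT in millions
drop if missing(sum_pi) | sum_pi < 0   // negative aggregated pretax earnings
drop if missing(txpd)                  // missing cash taxes paid
drop if sum_utb == 0                   // no UTB additions over t-4..t
```

    The !missing() checks matter because Stata treats a missing value as larger than any number, so a condition such as avg_at > 10 on its own would also keep observations where the average is missing. The incorporation screen and the other regression variables are not covered here.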
    If someone could help me, that would be great.