
  • Split my sample and run a t-test

    Hello,
    I need help grouping my data as follows:
    I am working with cross-sectional data covering 2011 to 2015, with around 21 variables. I am testing a variable called CSR before and after new regulations introduced in 2014, so I need to split my sample into two groups around the benchmark year 2014. In other words, I want to run a t-test comparing the first group (years 2011 and 2012) with the second group (years 2014 and 2015).
    I know that I need to run a paired t-test, but to do that I first have to split the sample into pre- and post-regulation groups based on the year.
    I ran the following code to split the sample:
    Code:
    gen post = csr if inrange(year, 2014, 2015)
    gen pre = csr if inrange(year, 2011, 2012)

    but when I tried to run the paired t-test with the following code, Stata reported "no observations":
    Code:
    ttest pre == post

    I am stuck at this point and would be grateful for professional advice.
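The cause of the "no observations" message can be reproduced on simulated data. A paired test uses only rows where both variables are non-missing, and the split above never fills both on the same row. This is a minimal sketch; the firm identifier `firm` and all values are hypothetical, not the poster's data.

```stata
* Simulated data for illustration only: 100 hypothetical firms, 2011-2015
clear
set seed 12345
set obs 500
gen firm = ceil(_n/5)
bysort firm: gen year = 2010 + _n
gen csr = rnormal(50, 10)

* The split from the question: each row gets a value in pre OR post, never both
gen post = csr if inrange(year, 2014, 2015)
gen pre  = csr if inrange(year, 2011, 2012)

* Rows where both pre and post are non-missing: this count is 0,
* so -ttest pre == post- has no pairs to compare
count if !missing(pre) & !missing(post)
```

Running -ttest pre == post- after this sketch gives the same "no observations" error as in the question.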

  • #2
    Ruba:
    some comments on your query.
    Stata reports no observations because, on every row, whenever one of your two new variables has a value, the other one is missing.
    Please note that Stata omits observations with missing values in any of the variables involved.
    The simplest approach that springs to my mind is:
    Code:
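    gen pre_post = 0 if inrange(year, 2011, 2012)      // pre-regulation years
    replace pre_post = 1 if inrange(year, 2014, 2015)  // post-regulation years; 2013 stays missing
    ttest CSR, by(pre_post) unequal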
    Warning: code not tested.
    Last edited by Carlo Lazzaro; 30 Jan 2018, 03:32.
    Kind regards,
    Carlo
    (Stata 19.0)
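The code above runs an unpaired two-group comparison. If the same firms are observed both before and after 2014 (an assumption; the question does not say so), a genuinely paired test becomes possible once each firm's pre- and post-period CSR sit on the same row. A minimal sketch on simulated data, assuming a hypothetical firm identifier `firm` and using each firm's period average as the paired value:

```stata
* Simulated data; `firm` is a hypothetical panel identifier
clear
set seed 2018
set obs 500
gen firm = ceil(_n/5)
bysort firm: gen year = 2010 + _n
gen csr = rnormal(50, 10) + 3*(year >= 2014)

* Period indicator: 0 = 2011-2012, 1 = 2014-2015, missing for 2013
gen period = 0 if inrange(year, 2011, 2012)
replace period = 1 if inrange(year, 2014, 2015)
drop if missing(period)

* One row per firm and period, holding that period's mean CSR
collapse (mean) csr, by(firm period)

* Put the pre value (csr0) and post value (csr1) side by side
reshape wide csr, i(firm) j(period)

* Paired t-test: each firm now contributes one pre and one post value
ttest csr0 == csr1
```

Averaging within period is one choice among several; whether pairing is appropriate at all depends on the data actually being a panel of the same firms.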



    • #3
      Thanks, Carlo, for that comprehensive answer.

      I have a similar issue and am not sure whether to use the prtest or ztest command:

      Data: Representative sample survey data on firms (almost exclusively binary and indicator variables)
      Variables of interest (I have many more, but these two serve as examples):
      1. Industry (I have created a dummy for each of the 6 industries)
      2. Mortgage (0 if a firm has no mortgage, 1 if it has one)
      Groups: I split the sample into two groups:
      1. Firms that applied for external finance
      2. Firms that did not apply for external finance
      Now I want to know whether firms differ across the two groups with respect to the two variables of interest (this is my first step; in a second step I will run a heckprobit with the selection variable "needing finance").

      After researching which test is appropriate, I ran both prtest and ztest. I am still unsure which to choose: the confidence intervals are much narrower with prtest (see attachment). Why is that? What am I missing?

      [Attached screenshot: prtest and ztest output (Bildschirmfoto 2018-01-30 um 11.53.01.png)]

      Thanks for your valuable time!
      Reto



      • #4
        Dear Carlo, thank you so much for the help; the code works perfectly.



        • #5
          Re #3: -ztest- is not appropriate for these data. -ztest- is for testing the equality of means of two samples drawn from populations with known standard deviations; you don't have those here, or if you do, you have not said so in your question.

          If you did have those standard deviations, you would have to specify them in the options of the -ztest- command. You don't show the command you used, but from the output I infer that you either specified -sd(1)- or specified nothing at all and let -sd()- default to 1, which is not even a possible value of the standard deviation of an indicator variable. In any case, the issue is not really one of specifying the right values for -sd()-: you don't have known population standard deviations for these data anyway.
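The difference in interval width in #3 follows from this. For a 0/1 variable with proportion p, the standard deviation is sqrt(p(1-p)), at most 0.5, so the standard error -prtest- uses is at most half of what -ztest- computes with sd(1), and its confidence interval is correspondingly narrower. A sketch on simulated data; `applied` and `mortgage` are hypothetical stand-ins for Reto's variables:

```stata
* Simulated 0/1 data; variable names are hypothetical
clear
set seed 101
set obs 1000
gen applied  = runiform() < 0.4
gen mortgage = runiform() < cond(applied, 0.35, 0.25)

* Two-sample test of proportions, suited to a 0/1 outcome
prtest mortgage, by(applied)

* With no sd() option, ztest assumes a known population SD of 1
ztest mortgage, by(applied)

* The actual SD of the 0/1 variable is far below 1
summarize mortgage
display "sample SD = " r(sd)
```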



          • #6
            Reto:
            welcome to the list.
            Two asides to Clyde's helpful advice:
            -Why create the dummies yourself when, with data in -long- format, you can exploit the wonderful capabilities of -fvvarlist-?
            -If you are dealing with survey data, any inference should be -svy- prefixed (see -help svy-).
            Kind regards,
            Carlo
            (Stata 19.0)
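Both asides can be sketched together on simulated data. This assumes the survey design consists only of sampling weights in a hypothetical variable `myweight` (a real design may also need strata or clusters in -svyset-), and uses hypothetical variables `applied`, `industry` (one categorical variable instead of six dummies) and `mortgage`:

```stata
* Simulated survey-style data; all variable names are hypothetical
clear
set seed 7
set obs 2000
gen myweight = runiform(0.5, 3)              // non-integer sampling weights
gen applied  = runiform() < 0.4
gen industry = floor(6*runiform()) + 1       // one categorical variable, levels 1-6
gen mortgage = runiform() < cond(applied, 0.35, 0.25)

* Declare the design once: here only sampling weights, no strata or PSUs
svyset [pweight = myweight]

* Design-based group comparison: the coefficient on 1.applied is the
* weighted difference in mortgage rates, with its t-test
svy: regress mortgage i.applied

* Industry by application status, using the categorical variable directly,
* with the design-based Pearson test of independence
svy: tabulate industry applied
```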



            • #7
              Thank you very much, Clyde and Carlo, for your quick and helpful replies.
              So -prtest- is the way to go, then? (Its p-values seem identical to those from -ttest-.)
              Also, assuming the results are identical to a t-test, I have just found the convenient package -ttable2- (by Xuan Zhang and Chuntao Li), which lets me test several variables at once and produces a nice table:
              [Attached screenshot: -ttable2- output table (Bildschirmfoto 2018-01-31 um 10.49.58.png)]

              .
              .
              .

              -Why create the dummies yourself when, with data in -long- format, you can exploit the wonderful capabilities of -fvvarlist-?
              -If you are dealing with survey data, any inference should be -svy- prefixed (see -help svy-).
              These are two very good points. For my regressions I am indeed using the i. prefix for factor variables such as industry, and the svy prefix to weight the observations. So I have been scratching my head over how to apply this to my descriptive statistics and tests.
              • -svy- is not allowed with tabstat, so I am using frequency weights instead: tabstat Ind* Size* [fw=myweight], stat(mean sd) col(stat)
              • Factor variables are not allowed with tabstat either; that is why I created those dummies for industry, size, etc.
              • I have not found any way to use weights (or factor variables) in the tests. The only workaround I have found so far is to run a separate regression for each variable of interest, but I was hoping there would be a more elegant way.
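The regression workaround can at least be automated: -svy: mean- with -over()- gives weighted descriptive statistics by group, and a loop runs one design-based test per variable. A minimal sketch, assuming sampling weights only in a hypothetical `myweight` and hypothetical 0/1 variables `mortgage`, `exporter` and `largefirm`:

```stata
* Simulated data; all variable names are hypothetical
clear
set seed 11
set obs 1500
gen myweight  = runiform(0.5, 3)
gen applied   = runiform() < 0.4
gen mortgage  = runiform() < cond(applied, 0.35, 0.25)
gen exporter  = runiform() < 0.3
gen largefirm = runiform() < cond(applied, 0.5, 0.4)

svyset [pweight = myweight]

* Weighted means by group, with design-based standard errors
svy: mean mortgage exporter largefirm, over(applied)

* One design-based test per variable: the coefficient on 1.applied is the
* weighted difference in means between the two groups
foreach v of varlist mortgage exporter largefirm {
    quietly svy: regress `v' i.applied
    quietly test 1.applied
    display %-10s "`v'" "  diff = " %7.4f _b[1.applied] "  p = " %6.4f r(p)
}
```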
              Many thanks,
              Reto



              • #8
                Hi Reto,
                I ran the -ttable2- command by Xuan Zhang and Chuntao Li, but my output does not mark significance levels with stars (*), whereas yours does.
                Could you please tell me how you obtained the stars, i.e., the significance levels?
                My output is attached.
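Independently of -ttable2-'s own options (not covered here), stars can be attached by hand from the p-value each -ttest- returns. A sketch on simulated data with hypothetical variables `group`, `x1` and `x2`; the thresholds are the common 0.01/0.05/0.10 convention, which may differ from what -ttable2- uses:

```stata
* Simulated data; -ttable2- itself is not used here
clear
set seed 3
set obs 400
gen group = _n > 200
gen x1 = rnormal(0, 1) + 0.3*group
gen x2 = rnormal(0, 1)

foreach v of varlist x1 x2 {
    quietly ttest `v', by(group)
    local p = r(p)
    * Conventional thresholds: *** p<0.01, ** p<0.05, * p<0.10
    local stars = cond(`p' < 0.01, "***", cond(`p' < 0.05, "**", cond(`p' < 0.10, "*", "")))
    display %-4s "`v'" "  diff = " %7.4f (r(mu_1) - r(mu_2)) "  p = " %6.4f `p' " `stars'"
}
```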