Univariate T-test

Jens Heinrich

Join Date: Oct 2015

Posts: 11
#1

16 Oct 2015, 06:14

Hello, I want to run a command in Stata that allows me to do a univariate analysis of my variables. I want to show the mean of every independent variable when the dependent variable = 1, and likewise when the dependent variable = 0.

My problem in Stata is showing the significance levels based on the t-statistics. The command I am looking for would produce the t-statistics for the given variables together with their significance levels. I have tried several things, but I have not been able to get it to work, so I am asking for help here on this great forum. This is, admittedly, a fairly easy question, but I am new to Stata (I previously used R and SPSS) and find myself stuck. I also want to transition to using Stata more, as I find it better suited to my purpose.

The table I am looking for would look something like this one, from the article by Dedman, Lennox and Pitman (2014), "Demand for audits in private firms".

Best regards, Jens H.

Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#2

16 Oct 2015, 08:23

There are a variety of ways to do this. One using postfile is illustrated below. (The do-file is attached if you want to explore this approach further.) For further information, type

Code:

help postfile
help list

in Stata's command line.

. version 14.0

. 
. clear *

. set more off

. sysuse auto
(1978 Automobile Data)

. 
. tempfile tmpfil0

. tempname file_handle

. 
. postfile `file_handle' str244 item double(average_0 average_1 t) using `tmpfil0'

. 
. foreach var of varlist price mpg headroom weight {
  2.         local varlabel : variable label `var'
  3.         quietly ttest `var', by(foreign)
  4.         post `file_handle' ("`varlabel'") (r(mu_1)) (r(mu_2)) (r(t))
  5. }

. 
. postclose `file_handle'

. use `tmpfil0', clear

. 
. label variable item Item

. label variable average_0 "Foreign = 0"

. label variable average_1 "Foreign = 1"

. label variable t "t Statistic"

. foreach var of varlist _all {
  2.         local varlabel : variable label `var'
  3.         char define `var'[varname] `varlabel'
  4. }

. 
. format average_0-t %8.2f

. list, noobs subvarname separator(0) abbreviate(15)

  +----------------------------------------------------------+
  |           Item   Foreign = 0   Foreign = 1   t Statistic |
  |----------------------------------------------------------|
  |          Price       6072.42       6384.68         -0.41 |
  |  Mileage (mpg)         19.83         24.77         -3.63 |
  | Headroom (in.)          3.15          2.61          2.61 |
  |  Weight (lbs.)       3317.12       2315.91          6.25 |
  +----------------------------------------------------------+

. 
. exit

end of do-file

.
Attached Files

Heinrich.do (764 Bytes, 1 view)
Rune Sollihagen

Join Date: Jun 2015

Posts: 29
#3

22 Oct 2015, 14:37

Hello, I have a similar problem. If I want to find out whether the t-statistics are significant, how can I check this in Stata?
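
One way to check significance, offered as a minimal sketch rather than a definitive answer: after ttest, Stata leaves the two-sided p-value in r(p) and the degrees of freedom in r(df_t), next to the r(t), r(mu_1) and r(mu_2) results used in #2. The choice of mpg and foreign from the auto data is only for illustration.

```stata
sysuse auto, clear

* Run the test quietly, then read the stored results
quietly ttest mpg, by(foreign)
display "t = " %6.2f r(t) "   df = " r(df_t) "   two-sided p = " %6.4f r(p)

* The same two-sided p-value computed by hand from the t distribution's upper tail
display "p (by hand) = " %6.4f 2*ttail(r(df_t), abs(r(t)))
```

In the postfile approach from #2, the same idea would mean declaring one more double variable and posting (r(p)) alongside (r(t)).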
William Mensah

Join Date: May 2022

Posts: 1
#4

27 May 2022, 10:24

How can I find the mean differences of two variables (A and B) between the high and low quartiles of another variable, C?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

27 May 2022, 10:43

William:
welcome to this forum.
Please read (and act on) the FAQ to post more effectively. Thanks.
That said, you may want to consider something along the following lines:

Code:

. use "C:\Program Files\Stata17\ado\base\a\auto.dta"

. xtile quart = trunk , nq(4)

. bysort quart: ttest price if quart<=3, by(foreign) unequal

------------------------------------------------------------------------------------------------------------------------------------------
-> quart = 1

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Domestic |      11    4326.818    251.6949    834.7776    3766.007    4887.629
 Foreign |       8    5245.875    511.8339    1447.685     4035.58     6456.17
---------+--------------------------------------------------------------------
Combined |      19    4713.789    273.3334    1191.433    4139.537    5288.042
---------+--------------------------------------------------------------------
    diff |           -919.0568     570.372                 -2183.8    345.6867
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -1.6113
H0: diff = 0                     Satterthwaite's degrees of freedom =  10.3703

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0685         Pr(|T| > |t|) = 0.1371          Pr(T > t) = 0.9315

------------------------------------------------------------------------------------------------------------------------------------------
-> quart = 2

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Domestic |      10      5426.7    1168.236    3694.287    2783.966    8069.434
 Foreign |       9    7507.333    1109.595    3328.785    4948.603    10066.06
---------+--------------------------------------------------------------------
Combined |      19    6412.263    823.5962    3589.973    4681.952    8142.575
---------+--------------------------------------------------------------------
    diff |           -2080.633    1611.204                -5479.99    1318.723
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -1.2914
H0: diff = 0                     Satterthwaite's degrees of freedom =  16.9991

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1069         Pr(|T| > |t|) = 0.2139          Pr(T > t) = 0.8931

------------------------------------------------------------------------------------------------------------------------------------------
-> quart = 3

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Domestic |      20     6304.95    714.5812    3195.704    4809.314    7800.586
 Foreign |       5        6186    978.0249     2186.93    3470.568    8901.432
---------+--------------------------------------------------------------------
Combined |      25     6281.16    596.1337    2980.669      5050.8     7511.52
---------+--------------------------------------------------------------------
    diff |              118.95    1211.263               -2626.873    2864.773
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =   0.0982
H0: diff = 0                     Satterthwaite's degrees of freedom =  8.87792

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.5380         Pr(|T| > |t|) = 0.9240          Pr(T > t) = 0.4620

Please note that the chunk of code:

Code:

if quart<=3

was added because the 4th quartile of -trunk- includes -domestic- cars only; therefore, no comparison by -foreign- is feasible there.
Otherwise, the code could have stretched over the 4th quartile, too.

Kind regards,
Carlo
(Stata 19.0)
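
If the goal in #4 is instead to compare the high quartile with the low quartile directly, a minimal sketch along the same lines follows. It assumes price and mpg stand in for A and B and trunk stands in for C; those mappings are illustrative only.

```stata
sysuse auto, clear
xtile quart = trunk, nq(4)

* Keep only the lowest and highest quartiles of trunk and use the
* quartile indicator itself as the grouping variable
foreach v of varlist price mpg {
    ttest `v' if inlist(quart, 1, 4), by(quart) unequal
}
```

Because the if qualifier leaves exactly two values of quart, by(quart) is a valid two-group comparison, and the reported diff is mean(quartile 1) - mean(quartile 4).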


Mohamed Mahmoud

Join Date: Apr 2022
Posts: 36

05 Sep 2022, 12:45

Dear Carlo,

I have two questions related to this issue.

First, how do I split my sample into high debts and low debts based on the median of debts: for instance, high debts (debts > median) and low debts (debts < median)?

Second, how do I find the mean differences for all variables (high - low) and the t-statistics?

My data look like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float debts double y_variable float(ROA LEV FSIZE)
  .5504027  38.88888888888889   .07166211   .684252 10.813444
  .9041479  69.44444444444444  .019451436  .8125423 12.294147
         0                  0  .036151733  .5517906 12.306182
  .7389823 63.888888888888886   .01946511  .7916672 12.549685
         0                  0    .0426926  .5370785 12.585744
  .3450878  38.88888888888889   .08990926  .6094201  11.17907
         0                  0   .03930702  .5381348 12.821323
  .8781649 63.888888888888886   .02334722  .7889516  12.44568
  .3578038  72.22222222222221    .0932817  .6039031  11.26648
  .3673007  72.22222222222221   .10062575  .6206505 11.287158
  .7828102                 50 -.010739316  .7907212 12.585073
 .05649431                  0    .0501465  .7490519  9.374003
         0                  0   .04466254 .53169084 12.693089
         0                  0   .04104301 .53876597 12.904175
  .4299507  77.77777777777779   .09296167  .6389391 11.386666
  .6621854 61.111111111111114   .02058342  .7795188  12.52912
         0  77.77777777777779   .02915452  .7062657  11.65406
 .11937457  72.22222222222221    .0820868 .57202095  9.615317
         0                  0  .067939125  .4987089 13.052382
 1.1094313 61.111111111111114  -.01553972  .7740809 11.928183
         0                  0  .005703995 .50197375 13.052382
 .18014494  72.22222222222221   -.1598295  .7020952  8.969523
  .0717049  77.77777777777779    .1154831  .6821179 11.476207
 1.1363796 61.111111111111114  -.06507305  .8322286 11.095528
 .07698283  94.44444444444444  -.01731293  .8861108 11.487823
 .03770439  88.88888888888889    .1055027  .6810955 11.742963
  .2143007  72.22222222222221  .006391194  .6806035  9.004505
         0                  0    .1067398  .4759107 13.052382
         0                  0   .05027733  .4834371 13.052382
 .10043196          72.222222  -.05075473  .7346939 9.1060915
 .04139112                100   .07754026  .7244449  11.90638
.018972583          94.444444  .021963427  .8518023 11.458124
 .02423795          94.444444 -.028824344  .7900178 11.550463
 .05935454          72.222222   .03961356  .7114747  9.137008
         0                  0   .09800203  .4629343 13.052382
  .3855635          94.444444  .016014863  .4123426  10.26767
  .8736585          94.444444   .04139164  .3948848 10.444363
  .4901554  27.77777777777778   .04618609  .9328766  10.94983
  .1758431  8.333333333333332   .06800962  .6945833 11.229963
 .27302766                100     .050507  .8381554 11.532824
 .04818375  33.33333333333333  .067577444 .59092194 10.427258
 .12436452 19.444444444444446  .063565604  .6320949 11.555207
   .165586 19.444444444444446  .068395615  .6417531  11.55769
  .7304869  94.44444444444444   .05677027  .9113894 11.428026
 .03016507  5.555555555555555   .07155064  .6654221 10.730125
 .28472638 19.444444444444446   .08511353  .6688995 11.296506
 .02605753  41.66666666666667    .0880449  .6644061 10.668767
 1.4097238                100   .05346942  .9322409  11.47613
 .09976367 2.7777777777777777   .04755569  .6624966  9.351868
.018100241  41.66666666666667    .0911067  .6660988 10.863138
end

Wanted format:

variables    high debts    low debts    diff    t-test


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

06 Sep 2022, 00:30

Mohamed:
you may want to try:

Code:

. quietly summarize debts, detail

. g wanted=0 if debts<=r(p50)
(25 missing values generated)

. replace wanted=1 if debts>r(p50)
(25 real changes made)

. regress y_variable debts ROA LEV FSIZE i.wanted

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(5, 44)        =      4.82
       Model |  23026.4763         5  4605.29527   Prob > F        =    0.0013
    Residual |  42022.2876        44  955.051992   R-squared       =    0.3540
-------------+----------------------------------   Adj R-squared   =    0.2806
       Total |   65048.764        49   1327.5258   Root MSE        =    30.904

------------------------------------------------------------------------------
  y_variable | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       debts |    23.7144   20.17187     1.18   0.246    -16.93933    64.36813
         ROA |   .2160795   95.33655     0.00   0.998    -191.9221    192.3543
         LEV |    87.3874   38.88847     2.25   0.030     9.012841     165.762
       FSIZE |  -8.170314   4.123506    -1.98   0.054    -16.48069    .1400664
    1.wanted |   3.438655   13.27273     0.26   0.797    -23.31077    30.18808
       _cons |   74.28713   57.71717     1.29   0.205    -42.03418    190.6084
------------------------------------------------------------------------------

or, as far as the OLS is concerned:

Code:

. bysort wanted: regress y_variable debts ROA LEV FSIZE

------------------------------------------------------------------------------------------------------------------------------------------
-> wanted = 0

      Source |       SS           df       MS      Number of obs   =        25
-------------+----------------------------------   F(4, 20)        =     12.91
       Model |  28557.0422         4  7139.26056   Prob > F        =    0.0000
    Residual |  11055.9199        20  552.795993   R-squared       =    0.7209
-------------+----------------------------------   Adj R-squared   =    0.6651
       Total |  39612.9621        24  1650.54009   Root MSE        =    23.512

------------------------------------------------------------------------------
  y_variable | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       debts |   213.5224   260.4035     0.82   0.422    -329.6697    756.7145
         ROA |   120.5173   123.7947     0.97   0.342    -137.7139    378.7486
         LEV |   338.9003   56.44631     6.00   0.000     221.1554    456.6453
       FSIZE |   12.24027   6.478521     1.89   0.073    -1.273692    25.75422
       _cons |   -333.261   96.45075    -3.46   0.003    -534.4537   -132.0682
------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------
-> wanted = 1

      Source |       SS           df       MS      Number of obs   =        25
-------------+----------------------------------   F(4, 20)        =      2.14
       Model |  5261.19732         4  1315.29933   Prob > F        =    0.1134
    Residual |  12292.5058        20  614.625292   R-squared       =    0.2997
-------------+----------------------------------   Adj R-squared   =    0.1597
       Total |  17553.7032        24  731.404298   Root MSE        =    24.792

------------------------------------------------------------------------------
  y_variable | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       debts |   47.95644    17.8264     2.69   0.014     10.77121    85.14166
         ROA |   26.10073   102.9631     0.25   0.802    -188.6765    240.8779
         LEV |  -22.39877   44.02392    -0.51   0.616     -114.231    69.43351
       FSIZE |  -11.40662   6.731531    -1.69   0.106    -25.44835    2.635111
       _cons |   177.7442   65.08515     2.73   0.013     41.97898    313.5095
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)
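
For the table of means, differences and t-statistics requested above, a minimal sketch follows. It assumes the -dataex- example is already in memory with the second variable named y_variable; the indicator name high is hypothetical, and observations exactly at the median are placed in the low group, as in the code above. It is meant to be run from a do-file because of the /// line continuation.

```stata
* Split at the median of debts: high = 1 if debts > median, 0 otherwise
quietly summarize debts, detail
generate byte high = debts > r(p50) if !missing(debts)   // hypothetical name

display "variable" _col(14) "high debts" _col(27) "low debts" _col(42) "diff" _col(50) "t" _col(58) "p"
foreach v of varlist y_variable ROA LEV FSIZE {
    quietly ttest `v', by(high)
    * ttest orders groups by value: group 1 is high==0 (low), group 2 is high==1 (high).
    * r(t) tests mean(low) - mean(high), so its sign is flipped to match high - low.
    display "`v'" _col(14) %10.4f r(mu_2) _col(27) %10.4f r(mu_1) ///
        _col(39) %10.4f (r(mu_2) - r(mu_1)) _col(50) %6.2f (-r(t)) _col(58) %6.4f r(p)
}
```

The same stored results could be collected with postfile, as in #2, if the table needs to end up in a dataset rather than in the Results window.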
