  • Basic Regression Analysis Help

    Hello all,

    I'm new to regression analysis in general. I've read all the rules, so I hope you can all bear with me and help me become better with Stata in general!
    My first batch of questions:


    Here is the dataset. The variables are:
    gvkey: a numerical code that the parent dataset assigns to each company
    fyear: year. The dataset only includes one year.
    tic: the public abbreviation of the company name
    conm: company name
    opinc: operating income of the company
    assets: the assets a company has
    ind: industry the company is in
    return: operating income / assets
    logassets: log(assets)

    The first thing: I am trying to destring ind and it doesn't allow me. Running "destring ind, replace" returns "ind: contains nonnumeric characters; no replace".

    I want to do this because I want to regress return on logassets with industry (ind) fixed effects. Would the code be:
    xtreg return logassets, fe [ind] ? Am I correct here? Also, do I have to run xtset first even though the dataset covers only one year?

    Another question: when I reg return logassets, I get the coefficient of the regression of logassets being .04. This means that for an increase of 1 in logassets, we would get an increase of .04 return. Correct?

    After I get the fixed effects regression done, I would like to test adding a squared term into the regression. Would this be the correct way of doing it? reg return logassets logassets*logassets, or is it logassets^2?


    Sorry if these are very basic questions! It's only my third week into this world! Hopefully I can grow to learn.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long gvkey int fyear str7 tic str29 conm float(opinc assets) str2 ind float(return logassets)
    1004 2015 "AIR"   "AAR CORP"                         66.1    1442.1 "20"    .04583593  7.273856
    1045 2015 "AAL"   "AMERICAN AIRLINES GROUP INC"      7284     48415 "20"    .15044925 10.787565
    1050 2015 "CECE"  "CECO ENVIRONMENTAL CORP"        28.651   598.819 "20"    .04784584  6.394959
    1062 2015 "ASA"   "ASA GOLD AND PRECIOUS METALS"   -1.713    162.35 "40"  -.010551278  5.089755
    1072 2015 "AVX"   "AVX CORP"                      169.302  2409.819 "45"    .07025506  7.787307
    1075 2015 "PNW"   "PINNACLE WEST CAPITAL CORP"    854.602 15028.258 "55"    .05686634  9.617687
    1076 2015 "AAN"   "AARON'S INC"                   244.117  2658.875 "25"    .09181214  7.885658
    1078 2015 "ABT"   "ABBOTT LABORATORIES"              3372     41247 "35"     .0817514 10.627334
    1094 2015 "ACET"  "ACETO CORP"                     53.827   489.774 "35"    .10990171  6.193944
    1097 2015 "ACMTA" "ACMAT CORP  -CL A"               1.214    73.518 "40"   .016512964   4.29753
    1104 2015 "ACU"   "ACME UNITED CORP"                7.747    81.421 "20"    .09514745 4.3996334
    1117 2015 "BKTI"  "BK TECHNOLOGIES"                  1.43    39.449 "45"   .036249332  3.675009
    1121 2015 "AE"    "ADAMS RESOURCES & ENERGY INC"   -2.359   243.215 "10"  -.009699238  5.493946
    1161 2015 "AMD"   "ADVANCED MICRO DEVICES"           -319      3109 "45"   -.10260534  8.042056
    1166 2015 "ASMIY" "ASM INTERNATIONAL NV"          140.052  2254.303 "45"    .06212652  7.720596
    1177 2015 "AET"   "AETNA INC"                      4762.9   53424.1 "35"    .08915264 10.886017
    1186 2015 "AEM"   "AGNICO EAGLE MINES LTD"        168.035   6683.18 "15"    .02514297  8.807349
    1209 2015 "APD"   "AIR PRODUCTS & CHEMICALS INC"   1870.3   17438.1 "15"    .10725366  9.766413
    1210 2015 "AIRT"  "AIR T INC"                       6.285    52.155 "20"    .12050618   3.95422
    1224 2015 "EGN1"  "ALABAMA GAS CORP"                 89.2      1519 "55"    .05872284  7.325808
    1225 2015 "SO1"   "ALABAMA POWER CO"                 1563     21721 "55"    .07195801  9.986034
    1230 2015 "ALK"   "ALASKA AIR GROUP INC"             1330      6533 "20"     .2035818  8.784621
    1234 2015 "ATRI"  "ATRION CORP"                     42.51   164.336 "35"    .25867733  5.101913
    1254 2015 "MATX"  "MATSON INC"                      212.1    1669.8 "20"     .1270212  7.420459
    1257 2015 "ALX"   "ALEXANDER'S INC"                95.205  1447.808 "60"    .06575803  7.277806
    1266 2015 "ALCO"  "ALICO INC"                      32.702    460.58 "30"    .07100178  6.132486
    1274 2015 "Y"     "ALLEGHANY CORP"                851.346  22846.33 "40"    .03726401 10.036546
    1300 2015 "HON"   "HONEYWELL INTERNATIONAL INC"      7374     49316 "20"     .1495255 10.806004
    1327 2015 "SWKS"  "SKYWORKS SOLUTIONS INC"         1026.7    3719.4 "45"    .27603915  8.221317
    1356 2015 "AA.3"  "ALCOA INC"                        1993     36528 "15"    .05456088 10.505835
    1380 2015 "HES"   "HESS CORP"                       -2359     34195 "10"   -.06898669 10.439835
    1388 2015 "AMR1"  "AMERICAN AIRLINES INC"            7258     50439 "20"     .1438966  10.82852
    1393 2015 "UHAL"  "AMERCO"                        773.111  8150.725 "20"     .0948518  9.005862
    1397 2015 "ABLT"  "AMERICAN BILTRITE INC"            .709   114.836 "15"   .006174022  4.743505
    1410 2015 "ABM"   "ABM INDUSTRIES INC"               95.5    2149.8 "20"    .04442273   7.67313
    1414 2015 "PRI"   "PRIMERICA INC"                 324.488  10612.12 "40"    .03057712  9.269752
    1439 2015 "ECOL"  "US ECOLOGY INC"                 79.531   771.987 "20"    .10302116  6.648968
    1440 2015 "AEP"   "AMERICAN ELECTRIC POWER CO"     3333.5   61683.1 "55"    .05404235 11.029765
    1447 2015 "AXP"   "AMERICAN EXPRESS CO"              8968    161184 "40"    .05563828 11.990302
    1448 2015 "AXP1"  "AMERICAN EXPRESS CREDIT CORP"      590     33285 "40"   .017725702 10.412862
    1449 2015 "AFL"   "AFLAC INC"                        4413    118296 "40"    .03730473 11.680945
    1487 2015 "AIG"   "AMERICAN INTERNATIONAL GROUP"     5136    496943 "40"    .01033519  13.11623
    1491 2015 "HAIPF" "HADERA PAPER LTD"                9.077   656.126 "15"   .013834233  6.486353
    1526 2015 "ANAT"  "AMERICAN NATIONAL INSURANCE"   268.688  23746.96 "40"   .011314625  10.07521
    1545 2015 "ARL"   "AMERICAN REALTY INVESTORS"      22.982  1117.368 "60"    .02056798  7.018731
    1554 2015 "ASEI"  "AMERICAN SCIENCE ENGINEERING"    -2.23   171.229 "20"  -.013023495  5.143002
    1559 2015 "AMS"   "AMERICAN SHARED HSPTL SERV"      3.219    54.114 "35"    .05948553  3.991093
    1562 2015 "AMSWA" "AMERICAN SOFTWARE  -CL A"       13.527   136.724 "45"    .09893655  4.917964
    1585 2015 "AVD"   "AMERICAN VANGUARD CORP"         11.524   443.539 "15"   .025981933  6.094786
    1598 2015 "AME"   "AMETEK INC"                    944.321   6664.53 "20"    .14169356  8.804555
    1602 2015 "AMGN"  "AMGEN INC"                        8603     71576 "35"    .12019392 11.178515
    1613 2015 "AP"    "AMPCO-PITTSBURGH CORP"          -4.222   506.156 "15"  -.008341302  6.226845
    1618 2015 "AXR"   "AMREP CORP"                     -6.174   120.628 "20"   -.05118215  4.792711
    1632 2015 "ADI"   "ANALOG DEVICES"               1064.392  7062.178 "45"    .15071723  8.862509
    1633 2015 "ALOG"  "ANALOGIC CORP"                  40.283    627.97 "35"    .06414797  6.442492
    1659 2015 "ANDE"  "ANDERSONS INC"                  42.624  2359.101 "30"     .0180679  7.766036
    1661 2015 "NBR"   "NABORS INDUSTRIES LTD"         148.767   9537.84 "10"   .015597557  9.163022
    1678 2015 "APA"   "APACHE CORP"                    -25913     18842 "10"   -1.3752786  9.843843
    1686 2015 "APOG"  "APOGEE ENTERPRISES INC"         97.393    657.44 "20"    .14813974  6.488354
    1689 2015 "AEP1"  "APPALACHIAN POWER"               710.8   11648.3 "55"    .06102178  9.362915
    1690 2015 "AAPL"  "APPLE INC"                       71230    290479 "45"    .24521565 12.579287
    1704 2015 "AMAT"  "APPLIED MATERIALS INC"            1692     15308 "45"    .11053044  9.636131
    1706 2015 "ATU"   "ACTUANT CORP  -CL A"           142.207  1636.917 "20"     .0868749   7.40057
    1712 2015 "TREC"  "TRECORA RESOURCES"              36.041   258.811 "15"    .13925606  5.556098
    1722 2015 "ADM"   "ARCHER-DANIELS-MIDLAND CO"        2010     40157 "30"    .05005354 10.600552
    1742 2015 "PNW1"  "ARIZONA PUBLIC SERVICE CO"     872.127 14982.182 "55"    .05821095  9.614617
    1743 2015 "ARCB"  "ARCBEST CORP"                    75.17  1262.909 "20"    .05952131  7.141173
    1745 2015 "ETR1"  "ENTERGY ARKANSAS"              179.406  8747.774 "55"    .02050876  9.076554
    1773 2015 "ARW"   "ARROW ELECTRONICS INC"         893.247  13021.93 "45"   .068595596   9.47439
    1783 2015 "ARTW"  "ARTS WAY MFG INC"                 .005    31.332 "20" .00015958125   3.44464
    1794 2015 "ASH"   "ASHLAND GLOBAL HOLDINGS INC"       477     10064 "15"    .04739666   9.21672
    1820 2015 "ALOT"  "ASTRONOVA INC"                   6.043    77.963 "45"    .07751113 4.3562346
    1823 2015 "ATRO"  "ASTRONICS CORP"                101.601   609.243 "20"    .16676597  6.412217
    1837 2015 "SO7"   "SOUTHERN CO GAS"                   804     14754 "55"     .0544937   9.59927
    1860 2015 "ATW"   "ATWOOD OCEANICS"                607.51  4809.011 "10"    .12632743  8.478247
    1864 2015 "REX"   "REX AMERICAN RESOURCES CORP"    31.021   414.685 "10"    .07480618  6.027519
    1878 2015 "ADSK"  "AUTODESK INC"                      1.3    5515.3 "45" .00023570794  8.615281
    1891 2015 "ADP"   "AUTOMATIC DATA PROCESSING"        2014   33110.5 "45"    .06082663 10.407606
    1906 2015 "AVHI"  "A V HOMES INC"                  22.171   742.016 "25"    .02987941  6.609371
    1913 2015 "AVY"   "AVERY DENNISON CORP"             537.7    4133.7 "15"    .13007717  8.326928
    1919 2015 "AVT"   "AVNET INC"                     918.478 10799.953 "45"    .08504463  9.287297
    1920 2015 "AVP"   "AVON PRODUCTS"                   351.6    3879.5 "30"    .09063024  8.263461
    1926 2015 "AZZ"   "AZZ INC"                       123.188   983.371 "20"    .12527114  6.890986
    1932 2015 "BTI"   "BRITISH AMER TOBACCO PLC"      6968.96  46472.02 "30"    .14996034 10.746606
    1949 2015 "BRT"   "BRT APARTMENTS CORP"             7.567   835.879 "60"   .009052745  6.728484
    1968 2015 "BMI"   "BADGER METER INC"               49.769    355.48 "45"    .14000507  5.873469
    1976 2015 "BHI"   "BAKER HUGHES INC"                   78     24080 "10"  .0032392025 10.089137
    1979 2015 "BCPC"  "BALCHEM CORP  -CL B"            95.742   881.223 "15"    .10864673  6.781311
    1982 2015 "PTVCB" "PROTECTIVE INSURANCE CORP"      35.639  1085.771 "40"    .03282368  6.990046
    1988 2015 "BLL"   "BALL CORP"                       799.9      9777 "15"    .08181447  9.187788
    2002 2015 "BPOP"  "POPULAR INC"                  1004.036 35769.535 "40"   .028069586 10.484852
    2005 2015 "BOH"   "BANK OF HAWAII CORP"           325.198 15455.016 "40"   .021041583  9.645689
    2019 2015 "BK"    "BANK OF NEW YORK MELLON CORP"     6860    393780 "40"   .017420895 12.883548
    2044 2015 "BCR"   "BARD (C.R.) INC"                   855    4942.9 "35"    .17297538  8.505708
    2049 2015 "B"     "BARNES GROUP INC"              190.679  2061.866 "20"    .09247886  7.631367
    2052 2015 "BRN"   "BARNWELL INDUSTRIES"            -4.515    41.553 "10"    -.1086564   3.72697
    2055 2015 "ABX"   "BARRICK GOLD CORP"                 798     26308 "15"    .03033298 10.177629
    2080 2015 "BSET"  "BASSETT FURNITURE INDS"         26.463   282.543 "25"    .09366008  5.643831
    2086 2015 "BAX"   "BAXTER INTERNATIONAL INC"          735     20975 "35"   .035041716  9.951087
    2101 2015 "AIT"   "APPLIED INDUSTRIAL TECH INC"   184.619  1434.968 "20"    .12865722  7.268898
    end

  • #2
    Welcome to Statalist.

    The first thing: I am trying to destring ind and it doesn't allow me. Running "destring ind, replace" returns "ind: contains nonnumeric characters; no replace".
    It is not apparent in the example data, but some values of ind in your dataset have non-numeric characters, as Stata told you. Since the objective of destring is to convert a string into a numeric value, I assume you expected that ind is a string containing a number.

    In the code below, I took the first 5 observations of your example data and changed ind in the fifth observation. You'll see in the output that I then got the same error you did, and the next command helps identify the observation (or observations) where the problem occurs.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long gvkey int fyear str7 tic str29 conm float(opinc assets) str2 ind float(return logassets)
    1004 2015 "AIR"   "AAR CORP"                         66.1    1442.1 "20"    .04583593  7.273856
    1045 2015 "AAL"   "AMERICAN AIRLINES GROUP INC"      7284     48415 "20"    .15044925 10.787565
    1050 2015 "CECE"  "CECO ENVIRONMENTAL CORP"        28.651   598.819 "20"    .04784584  6.394959
    1062 2015 "ASA"   "ASA GOLD AND PRECIOUS METALS"   -1.713    162.35 "40"  -.010551278  5.089755
    1072 2015 "AVX"   "AVX CORP"                      169.302  2409.819 "x5"    .07025506  7.787307
    end
    destring ind, replace
    list gvkey fyear ind if real(ind)==.
    Code:
    . destring ind, replace
    ind: contains nonnumeric characters; no replace
    
    . list gvkey fyear ind if real(ind)==.
    
         +---------------------+
         | gvkey   fyear   ind |
         |---------------------|
      5. |  1072    2015    x5 |
         +---------------------+
    when I reg return logassets, I get the coefficient of the regression of logassets being .04. This means that for an increase of 1 in logassets, we would get an increase of .04 return. Correct?
    Yes.
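
    To put that on the scale of assets: the example data show that logassets is the natural log (ln(1442.1) is about 7.27), so a one-unit increase in logassets means multiplying assets by e, roughly 2.72. A minimal sketch of translating the coefficient into more familiar changes, taking the .04 quoted above at face value:

    ```stata
    * Predicted change in return when assets double: b * ln(2)
    display .04 * ln(2)

    * Predicted change in return for a 10% increase in assets: b * ln(1.10)
    display .04 * ln(1.10)
    ```

    Under this estimate, doubling assets corresponds to a change in return of roughly .028.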

    I would like to test adding a squared term into the regression. Would this be the correct way of doing it? reg return logassets logassets*logassets, or is it logassets^2?
    Neither. You should start by reading the output of help factor variables and you will see that the correct syntax would be
    Code:
    reg return c.logassets##c.logassets
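
    As a rough illustration of what that notation expands to (logassets_sq is a made-up variable name, not something in the original data): the ## operator enters both the main effect and the product, so the two regressions below should return the same coefficients.

    ```stata
    * Factor-variable form: logassets plus logassets squared
    regress return c.logassets##c.logassets

    * Manual equivalent using a generated squared term
    * (logassets_sq is a hypothetical name)
    generate logassets_sq = logassets^2
    regress return logassets logassets_sq
    ```

    The factor-variable form is usually preferable because postestimation commands such as margins then know that the two terms move together.
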
    Let me give one further piece of advice. It seems that along with being new to regression analysis, you are new to Stata as well. In general, inventing syntax as you did in your third question is not a helpful approach. I'm sympathetic to you as a new user of Stata - it's a lot to absorb. I'd like to encourage you to take a step back from your immediate tasks and do some reading.

    When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

    All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

    Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.



    • #3
      Andreu:
      as an aside to William's (as always) excellent reply: if you're dealing with a dataset that includes only one wave of data and your regressand is continuous, you should consider -regress- instead of -xtreg-, as the latter is designed for panel datasets (i.e., at least two waves of data).
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Thank you both.

        Carlo, if I want to include fixed effects at the industry level, I have to use xtreg, right? Or is there another way to do it?

        William, thank you for your tips. Is there a way to destring the variable, so that it ignores the letters or that when I replace it, it leaves them out?



        • #5
          Andreu:
          the first question should be: are you dealing with a panel dataset or a cross-sectional one?
          If, as I surmised from your original post, your dataset is cross-sectional, you can include -industry- as a predictor on the right-hand side of your regression equation:
          Code:
          reg return c.logassets##c.logassets i.industry
          Kind regards,
          Carlo
          (StataNow 18.5)
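
          A sketch of how this might look with the variable names from post #1, where the industry variable is called ind and is stored as a string. Factor-variable notation needs a numeric variable, so the sketch first builds one. Here ind_id is a hypothetical name, and the sketch assumes any invalid industry codes have already been dealt with.

          ```stata
          * Build a numeric industry identifier from the string variable
          * (ind_id is a hypothetical name)
          encode ind, generate(ind_id)

          * Industry fixed effects entered as a set of indicator variables
          regress return logassets i.ind_id

          * Same slope on logassets, with the industry effects absorbed rather than listed
          areg return logassets, absorb(ind_id)
          ```

          Both commands fit the same model; areg simply omits the industry indicators from the output table.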



          • #6
            Carlo, you are correct. It is indeed cross sectional.

            I have the results of the two regressions (below are the images. I don't know how to import them in another format without them looking wonky).
            I have a few questions about the results; perhaps someone can help me understand them a bit better.

            1. If I want to find the industry sector with the highest profitability, is that the industry sector with the highest coefficient?
            Or rather, could I do summarize ind if max(return) ?
            2. In the regression I did before, "logassets" had a coefficient of 0.4. Now I have a coef of 0.37. This means that when industry fixed effects are accounted for, a one-unit increase in logassets has less of an effect, since each industry has different asset characteristics and the average effect net of these differences is smaller. Is this a correct line of thought?


            For the quadratic regression, I am still thinking and reading on it.
            What does it mean that the quadratic term has a negative coefficient?
            If I wanted to find the level of assets at which profitability (return) is at a minimum or maximum, how could I do that?
            What is the best graph to illustrate the relationship here between return and assets?
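
            For question 1, a possible starting point is a plain comparison of returns by industry. Note that max() in Stata takes the largest of its arguments within an observation rather than the maximum of a variable across observations, so the "if max(return)" idea would not pick out the top firm. A minimal sketch using the variables from post #1:

            ```stata
            * Mean and median return, and number of firms, by industry
            tabstat return, by(ind) statistics(mean median n)

            * The single firm with the highest return
            summarize return
            list conm ind return if return == r(max)
            ```

            These are raw comparisons; the coefficients on the industry indicators answer a different question, namely differences between industries at the same level of logassets.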



            [Attached image: ind fe.png]

            [Attached image: quadratic form.png]



            • #7
              Andreu:
              1) the industry with the highest effect on -return- (other things being equal) seems to be #30;
              2) 0.4 vs 0.37 is something I would never ever consider as different;
              3) and 4) the negative coefficient of the quadratic term means that you have a "crying parabola" (concave, opening downward) with a maximum at -.1959865/(2*(-.0106253)), about 9.22, of -logassets-;
              5) see https://www.statalist.org/forums/for...c-relationship.
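
              The turning point in 3) and 4) follows from setting the derivative of b1*x + b2*x^2 to zero, which gives x = -b1/(2*b2). A sketch of letting Stata compute it, with a standard error, right after the quadratic regression. The coefficient names assume the model was fit with the factor-variable syntax from post #2 and the variable names from post #1; "turn" is just a label chosen for this example.

              ```stata
              regress return c.logassets##c.logassets

              * Turning point on the logassets scale: -b1 / (2*b2)
              nlcom (turn: -_b[logassets] / (2 * _b[c.logassets#c.logassets]))

              * The same point expressed in units of assets
              display exp(-_b[logassets] / (2 * _b[c.logassets#c.logassets]))
              ```

              Whether that point is a maximum or a minimum depends only on the sign of the squared term: negative gives a maximum, positive a minimum.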

              For the future, please use CODE delimiters (see the FAQ) to share what you typed and what Stata gave you back. Thanks.
              As an aside, you might find the following textbook helpful: https://www.stata.com/bookstore/micr...metrics-stata/.
              Kind regards,
              Carlo
              (StataNow 18.5)



              • #8
                From post #4

                Is there a way to destring the variable, so that it ignores the letters or that when I replace it, it leaves them out?
                From the results in post #6 it appears you read the output of help destring and accomplished what you needed. The output of the help command should be your first source of information about a command.

                I will add that before using the ignore option on destring, you should follow the process I suggested in post #2 to understand your data and determine whether ignoring the nonnumeric characters results in the "correct" industry code for the firm, or whether there is a more serious problem with that observation. For example, if an industry code is "1o" rather than "10", and you ignore the "o", you will have industry 1 (Agricultural Production - Crops) rather than industry 10 (Metal Mining), assuming these are 2-digit SIC codes.
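
                A minimal sketch of that workflow, assuming the variable names from post #1; ind_num is a hypothetical name for the numeric copy:

                ```stata
                * See which nonnumeric values occur, and how often
                tabulate ind if missing(real(ind))

                * Keep the original string and create a numeric copy;
                * with force, values that are not numbers become missing
                destring ind, generate(ind_num) force

                * Check the observations that were set to missing
                list gvkey conm ind if missing(ind_num)
                ```

                Using generate() rather than replace leaves the original string in place, so the conversion can be checked before anything is overwritten.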



                • #9
                  Thanks again for all the help.
                  William, I saw that the main problem was that 19 observations had the industry recorded as "NA". I decided to drop them.

                  I have a question regarding the quadratic form:
                  1. How did you find out that that value is the maximum, Carlo? And how could I find the minimum if I wanted to?
                  2. I ran margins, dydx(lat) at(lat = (6.4(0.1)9.8)) but I do not know how to interpret the results. I graphed it and I get a downward-sloping curve, which seems odd. I want to find an explanation for the relationship between RoA and company size. However, the plot didn't show me the parabola (which seems reasonable in my mind); it showed me a downward-sloping line. Is this because I may have set the range of the plot wrongly?



                  • #10
                    Andreu:
                    1) because a negative squared term identifies a crying parabola (hence, it does not have a minimum).
                    2) you should adapt the example reported in that thread to your data.
                    Kind regards,
                    Carlo
                    (StataNow 18.5)
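
                    One possible explanation for the downward-sloping plot in post #9: with the dydx() option, margins reports the slope of the fitted curve, b1 + 2*b2*logassets, which is a straight line that falls when b2 is negative. Plotting predicted values instead shows the parabola. A sketch assuming the variable names from post #1 and an evaluation grid of 3 to 13, chosen only because it roughly covers the example data:

                    ```stata
                    regress return c.logassets##c.logassets

                    * Predicted return over a grid of logassets values (no dydx)
                    margins, at(logassets = (3(1)13))
                    marginsplot

                    * Raw data with a fitted quadratic overlaid
                    twoway (scatter return logassets) (qfit return logassets)
                    ```

                    The point where the dydx() line crosses zero is the same turning point that the -b1/(2*b2) formula gives.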



                    • #11
                      Thank you both!
                      I'm having a hard time understanding the code for margins and so forth, but I think I'm on the right path.
                      You've both been very helpful. Thank you.
