Dear all,
I've already found really helpful advice in this forum, so first of all a big thank you to everybody contributing here. Everything I've done so far is self-taught, so I'm grateful for any advice pointing me to things I should investigate further. I've gathered a few questions (mostly in bold) that hopefully somebody can answer, since I'm feeling a bit stuck. I can't share the data because it's confidential, but I hope you can still help. If not, I will try to generate dummy data. Apologies in advance for the lengthy post.
Context:
- I want to analyze whether certain linguistic cues correlate with a binary outcome, so I apply computer-aided text analysis to count occurrences of words that are related to different intentions.
- The measure counts the occurrences of specified words (e.g., "support, helpful, reiterate...") and divides this number by the total word count of, e.g., a forum posting or blog article. This is then "normalized" to 100 words, so the measure reads: "number of words related to an intention per 100 words".
- Let's assume I want to find out whether Statalist forum members who use language related to solving problems (IV1: go_score_n, continuous) or to offering emotional support (IV2: ecvo_score_n, continuous) are more likely to become administrators/moderators (DV: furoyn, binary).
- I also want to analyze whether additional linguistic signals moderate those relationships (Mod1: innovativeness_n, continuous; Mod2: proactiveness_n, continuous; Mod3: risk_taking_n, continuous).
- I'm using the most current version of Stata
- I'm using logit
- I'm concerned that IV1 is 0 in 95% of the observations. There are just not many posts that include terms related to problem solving at all. I looked into zero inflation, right-skewed data, and similar issues, but it does not seem to be a problem. Should I investigate this further?
Code:
. sum go_score_n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  go_score_n |     13,847     .074657    .3567864          0   8.064516

. sum ecvo_score_n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ecvo_score_n |     13,847    2.927646    2.970964          0   22.64151

. sum furoyn

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      furoyn |     13,847    .5579548    .4966478          0          1

. sum innovativeness_n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
innovative~n |     13,847    .7639856    1.125253          0   11.53846

. sum proactiveness_n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
proactiven~n |     13,847     .163814    .5410325          0   7.692308

. sum risk_taking_n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
risk_takin~n |     13,847    .0687102     .349786          0   7.619048
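One way I could look further into the many zeros in IV1 is sketched below. This is only a rough check, not a recommended procedure: go_any is a hypothetical helper variable that does not exist in my data, and whether a "used any such word" indicator is a sensible alternative to the continuous score is an assumption on my part.

```stata
* Share of postings with no problem-solving words at all
count if go_score_n == 0
display "Share of zeros: " %5.3f (r(N)/_N)

* Distribution of the score among postings that contain such words
* (missing values count as > 0 in Stata, hence the explicit exclusion)
summarize go_score_n if go_score_n > 0 & !missing(go_score_n), detail

* Hypothetical indicator: does the posting use any such word?
generate byte go_any = go_score_n > 0 if !missing(go_score_n)

* Cross-tabulate the indicator against the binary outcome
tabulate go_any furoyn, row chi2
```

If the indicator and the continuous score tell the same story about furoyn, the mass at zero is probably less of a worry for the logit itself.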
2: Standardization of variables
- IV2 takes higher values than IV1 by design (generally more words related to emotional support)
- I control for several variables (some binary) that are on different scales (age of the member, number of posts per member).
- I therefore decided to standardize (mean 0, std. dev. 1) all variables except the binary ones to make the results easier to interpret, using:
Code:
egen zgo_score = std(go_score_n)
- As far as I know, this should not change the results of the regression, and that holds true as long as I don't include interaction terms.
- Once I include interaction terms with the different moderators, the results change substantially: the p-values of innovativeness and proactiveness (rows 3 and 4 of the output) differ, while everything else stays largely unchanged.
- Can anybody explain why this could be the case?
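A possible explanation, sketched for one predictor x and one moderator m only. The symbols β, γ, x̄, m̄, s_x and s_m are notation introduced here for illustration, and the sketch assumes the z-variables are exact linear transformations of the raw ones on the same sample. Substituting x = x̄ + s_x z_x and m = m̄ + s_m z_m into the raw model gives:

```latex
\begin{align*}
\operatorname{logit} P(y=1)
  &= \beta_0 + \beta_1 x + \beta_2 m + \beta_3\, x m \\
  &= \underbrace{\beta_0 + \beta_1\bar{x} + \beta_2\bar{m} + \beta_3\bar{x}\bar{m}}_{\gamma_0}
   + \underbrace{s_x\,(\beta_1 + \beta_3\bar{m})}_{\gamma_1}\, z_x
   + \underbrace{s_m\,(\beta_2 + \beta_3\bar{x})}_{\gamma_2}\, z_m
   + \underbrace{s_x s_m\,\beta_3}_{\gamma_3}\, z_x z_m
\end{align*}
```

Under these assumptions the interaction coefficient is only rescaled, so its z statistic and p-value stay the same. The "main effect" coefficients are different quantities in the two models: in the raw model, the coefficient on m is the effect of m when x = 0, while in the standardized model it is the effect of m when x is at its mean. With two IVs, each moderator's coefficient picks up one such term per interaction, which would matter most for moderators interacted with an IV whose mean is far from zero (here ecvo_score_n, with a mean of about 2.9).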
With z transformation:
Code:
Logistic regression                             Number of obs     =     13,742
                                                LR chi2(106)      =    1692.27
                                                Prob > chi2       =     0.0000
Log likelihood = -8576.1432                     Pseudo R2         =     0.0898

---------------------------------------------------------------------------------------------------
                           furoyn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------------+----------------------------------------------------------------
                        zgo_score |   .0573774   .0202306     2.84   0.005     .0177261    .0970287
                      zecvo_score |   .0667573   .0202507     3.30   0.001     .0270667    .1064478
                  zinnovativeness |  -.0337875    .018816    -1.80   0.073    -.0706662    .0030912
                   zproactiveness |  -.0512685   .0188153    -2.72   0.006    -.0881458   -.0143913
                     zrisk_taking |   .0579722   .0204958     2.83   0.005     .0178011    .0981433
                   single_founder |  -.3199351   .0609761    -5.25   0.000     -.439446   -.2004242
                  con_team_degree |   .6123062   .0475026    12.89   0.000     .5192028    .7054095
        con_team_previous_venture |   .0386609   .0477949     0.81   0.419    -.0550153    .1323371
          con_team_tech_education |   .2128431   .0525024     4.05   0.000     .1099403    .3157459
      con_team_business_education |  -.1074691   .0527645    -2.04   0.042    -.2108857   -.0040526
                  zcount_founders |   .1913459   .0326648     5.86   0.000     .1273241    .2553676
                      zword_count |  -.0543813   .0183626    -2.96   0.003    -.0903713   -.0183913
                     foundingyear |   .0021185   .0114429     0.19   0.853    -.0203093    .0245462
                                  |
                           region |
                           Alaska |  -1.109519   .8046796    -1.38   0.168    -2.686662    .4676237
                          Arizona |  -.7251805   .4376708    -1.66   0.098       -1.583    .1326386
                         Arkansas |   .3232572   .5457719     0.59   0.554    -.7464361    1.392951
                       California |  -.3185187   .3969132    -0.80   0.422    -1.096454    .4594168
                         Colorado |  -.6014498   .4106057    -1.46   0.143    -1.406222    .2033226
                      Connecticut |  -.4330013    .465866    -0.93   0.353    -1.346082    .4800793
                         Delaware |  -.7672515   .4569004    -1.68   0.093     -1.66276    .1282568
             District of Columbia |   -.690878   .4276217    -1.62   0.106    -1.529001    .1472451
                          Florida |  -1.010855    .408132    -2.48   0.013    -1.810779   -.2109306
                          Georgia |  -.7572474   .4176518    -1.81   0.070     -1.57583    .0613351
                           Hawaii |   .4668803   .6618001     0.71   0.481    -.8302241    1.763985
                            Idaho |  -.1884798   .6145301    -0.31   0.759    -1.392937    1.015977
                         Illinois |  -.9273684   .4063148    -2.28   0.022    -1.723731    -.131006
                          Indiana |  -.2560208   .4787829    -0.53   0.593    -1.194418    .6823765
                             Iowa |  -.2581004   .6201546    -0.42   0.677    -1.473581    .9573804
                           Kansas |  -.2179371   .6186952    -0.35   0.725    -1.430557    .9946832
                         Kentucky |  -.6792111   .4947542    -1.37   0.170    -1.648912    .2904893
                        Louisiana |  -1.214356   .6802647    -1.79   0.074     -2.54765    .1189382
                            Maine |    .153582   .7241754     0.21   0.832    -1.265776     1.57294
                         Maryland |   -.480212     .43328    -1.11   0.268    -1.329425    .3690011
                    Massachusetts |  -.3841625   .4064893    -0.95   0.345    -1.180867    .4125419
                         Michigan |  -.8431387   .4462405    -1.89   0.059    -1.717754    .0314765
                        Minnesota |   -.442101   .4466314    -0.99   0.322    -1.317482    .4332804
                      Mississippi |          0  (empty)
                         Missouri |  -.1174401   .4509215    -0.26   0.795     -1.00123    .7663497
                          Montana |  -1.538069   .9945414    -1.55   0.122    -3.487335     .411196
                         Nebraska |   .6689198   .5867952     1.14   0.254    -.4811776    1.819017
                           Nevada |  -1.042005   .4543644    -2.29   0.022    -1.932543   -.1514666
                    New Hampshire |  -1.173678   .5665448    -2.07   0.038    -2.284085   -.0632702
                       New Jersey |  -.9301801   .4344686    -2.14   0.032    -1.781723   -.0786372
                       New Mexico |  -.9545475   .8320588    -1.15   0.251    -2.585353    .6762578
                         New York |  -.3207621   .3983335    -0.81   0.421    -1.101482    .4599573
                   North Carolina |  -.4658146   .4251189    -1.10   0.273    -1.299032    .3674031
                     North Dakota |   -2.44293   1.147913    -2.13   0.033    -4.692797   -.1930628
                             Ohio |  -.2814488    .433807    -0.65   0.516    -1.131695    .5687973
                         Oklahoma |  -.2217325   .5522796    -0.40   0.688    -1.304181    .8607156
                           Oregon |  -.2773838   .4381029    -0.63   0.527     -1.13605    .5812822
                     Pennsylvania |  -.3560048   .4180474    -0.85   0.394    -1.175363    .4633532
                     Rhode Island |  -.6084539   .6234432    -0.98   0.329     -1.83038    .6134723
                   South Carolina |  -.2939946   .5118774    -0.57   0.566    -1.297256    .7092666
                     South Dakota |  -.5589396   1.131233    -0.49   0.621    -2.776115    1.658236
                        Tennessee |   .0421886    .438486     0.10   0.923    -.8172281    .9016053
                            Texas |  -.6331516   .4036925    -1.57   0.117    -1.424374    .1580712
                             Utah |  -.4721358   .4403477    -1.07   0.284    -1.335201    .3909298
                          Vermont |    1.67578    1.15223     1.45   0.146    -.5825492    3.934109
                         Virginia |  -.4078477    .425317    -0.96   0.338    -1.241454    .4257584
                       Washington |  -.2937702   .4100078    -0.72   0.474    -1.097371    .5098304
                        Wisconsin |   .1017823   .4835633     0.21   0.833    -.8459843    1.049549
                          Wyoming |  -.2999933   .7248317    -0.41   0.679    -1.720637    1.120651
                                  |
                        industry1 |
                      Advertising |   -.325256   .1719015    -1.89   0.058    -.6621767    .0116647
          Agriculture and Farming |   .8336457   .2669505     3.12   0.002     .3104324    1.356859
                             Apps |   .0223488    .163654     0.14   0.891    -.2984071    .3431047
          Artificial Intelligence |   .5262165   .1693271     3.11   0.002     .1943414    .8580915
                    Biotechnology |   .9522141   .2025336     4.70   0.000     .5552554    1.349173
             Clothing and Apparel |  -.0796111   .1914206    -0.42   0.677    -.4547886    .2955664
            Commerce and Shopping |  -.0517229   .1639394    -0.32   0.752    -.3730382    .2695924
          Community and Lifestyle |  -.3504756   .1795559    -1.95   0.051    -.7023986    .0014475
             Consumer Electronics |    .519147    .181004     2.87   0.004     .1643856    .8739084
                   Consumer Goods |   .3690071   .2614281     1.41   0.158    -.1433826    .8813967
           Content and Publishing |  -.4925994   .1872672    -2.63   0.009    -.8596363   -.1255624
               Data and Analytics |   .0854468   .1712017     0.50   0.618    -.2501024     .420996
                           Design |  -.8222705   .2427893    -3.39   0.001    -1.298129   -.3464123
                        Education |  -.5124529   .1779941    -2.88   0.004    -.8613149    -.163591
                           Energy |  -.0274701    .262089    -0.10   0.917    -.5411551    .4862149
                           Events |  -.5981025   .2268772    -2.64   0.008    -1.042774   -.1534313
               Financial Services |  -.1599506   .1690107    -0.95   0.344    -.4912055    .1713043
                Food and Beverage |   .4842532   .2271892     2.13   0.033     .0389706    .9295359
                           Gaming |   .0481925   .2457615     0.20   0.845    -.4334913    .5298763
          Government and Military |  -.0933653   .3277165    -0.28   0.776    -.7356777    .5489472
                         Hardware |   .2492227   .1899067     1.31   0.189    -.1229875     .621433
                      Health Care |   .2881997   .1743739     1.65   0.098    -.0535667    .6299662
           Information Technology |  -.2723342   .1713707    -1.59   0.112    -.6082147    .0635463
                Internet Services |   -.305543     .17835    -1.71   0.087    -.6551026    .0440165
                    Manufacturing |   .2425024   .3271221     0.74   0.458     -.398645    .8836499
          Media and Entertainment |  -.5598453   .2147598    -2.61   0.009    -.9807668   -.1389239
 Messaging and Telecommunications |  -.4363826   1.462768    -0.30   0.765    -3.303355     2.43059
                           Mobile |  -.2656979   .2275671    -1.17   0.243    -.7117213    .1803255
                Natural Resources |          0  (empty)
           Navigation and Mapping |   .2282353   .9033722     0.25   0.801    -1.542342    1.998812
                        Platforms |          0  (empty)
             Privacy and Security |  -.8363825   .6651409    -1.26   0.209    -2.140035    .4672696
            Professional Services |  -.8785513   .2525927    -3.48   0.001    -1.373624   -.3834787
                      Real Estate |  -.4114463   .2387765    -1.72   0.085    -.8794397    .0565471
              Sales and Marketing |  -1.337407   .3007298    -4.45   0.000    -1.926826   -.7479872
          Science and Engineering |   .0059357   .5217708     0.01   0.991    -1.016716    1.028588
                         Software |  -.6327325   .2042772    -3.10   0.002    -1.033108   -.2323566
                           Sports |  -.5129398    .386754    -1.33   0.185    -1.270964    .2450841
                   Sustainability |  -.7314088   .6993522    -1.05   0.296    -2.102114    .6392963
                   Transportation |  -.1276911   .2870238    -0.44   0.656    -.6902473    .4348651
               Travel and Tourism |   -.905871    .315702    -2.87   0.004    -1.524636   -.2871064
                                  |
    c.zgo_score#c.zinnovativeness |  -.0141557   .0194638    -0.73   0.467     -.052304    .0239926
                                  |
     c.zgo_score#c.zproactiveness |  -.0353104   .0250604    -1.41   0.159     -.084428    .0138071
                                  |
       c.zgo_score#c.zrisk_taking |   .0585036   .0368148     1.59   0.112    -.0136521    .1306593
                                  |
  c.zecvo_score#c.zinnovativeness |   .0018892   .0201918     0.09   0.925     -.037686    .0414644
                                  |
   c.zecvo_score#c.zproactiveness |  -.0585501   .0189837    -3.08   0.002    -.0957574   -.0213428
                                  |
     c.zecvo_score#c.zrisk_taking |  -.0370386   .0186474    -1.99   0.047    -.0735869   -.0004903
                                  |
                            _cons |  -3.756265   23.06196    -0.16   0.871    -48.95688    41.44435
---------------------------------------------------------------------------------------------------
Without z transformation:
Code:
Logistic regression                             Number of obs     =     13,742
                                                LR chi2(106)      =    1692.63
                                                Prob > chi2       =     0.0000
Log likelihood = -8575.9622                     Pseudo R2         =     0.0898

---------------------------------------------------------------------------------------------------
                           furoyn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------------+----------------------------------------------------------------
                       go_score_n |   .1857598   .0651084     2.85   0.004     .0581498    .3133698
                     ecvo_score_n |   .0304762   .0081967     3.72   0.000     .0144111    .0465413
                 innovativeness_n |  -.0290167   .0225704    -1.29   0.199    -.0732539    .0152206
                  proactiveness_n |   .0254278    .047654     0.53   0.594    -.0679723     .118828
                    risk_taking_n |    .234966   .0834066     2.82   0.005     .0714921    .3984398
                   single_founder |  -.3206551   .0609715    -5.26   0.000    -.4401571    -.201153
                  con_team_degree |   .6121687   .0475049    12.89   0.000     .5190608    .7052766
        con_team_previous_venture |   .0385023   .0477944     0.81   0.420    -.0551729    .1321775
          con_team_tech_education |   .2134653    .052501     4.07   0.000     .1105653    .3163654
      con_team_business_education |  -.1083126   .0527666    -2.05   0.040    -.2117332   -.0048921
               con_count_founders |   .1973004   .0337704     5.84   0.000     .1311117    .2634891
                   con_word_count |  -.0008846   .0002945    -3.00   0.003    -.0014618   -.0003075
                     foundingyear |   .0022315   .0114411     0.20   0.845    -.0201927    .0246557
                                  |
                           region |
                           Alaska |  -1.109111   .8047711    -1.38   0.168    -2.686434     .468211
                          Arizona |  -.7249093   .4377157    -1.66   0.098    -1.582816    .1329978
                         Arkansas |   .3393901   .5460102     0.62   0.534    -.7307703    1.409551
                       California |  -.3174591   .3969562    -0.80   0.424    -1.095479    .4605606
                         Colorado |  -.5999918    .410643    -1.46   0.144    -1.404837    .2048537
                      Connecticut |   -.432064    .465898    -0.93   0.354    -1.345207    .4810793
                         Delaware |  -.7621999    .456989    -1.67   0.095    -1.657882    .1334821
             District of Columbia |  -.6909295   .4276567    -1.62   0.106    -1.529121    .1472623
                          Florida |  -1.008612   .4081808    -2.47   0.013    -1.808632   -.2085927
                          Georgia |  -.7568877   .4176902    -1.81   0.070    -1.575545    .0617701
                           Hawaii |   .4645618   .6618281     0.70   0.483    -.8325973    1.761721
                            Idaho |  -.1881197   .6145961    -0.31   0.760    -1.392706    1.016467
                         Illinois |  -.9261033   .4063537    -2.28   0.023    -1.722542   -.1296647
                          Indiana |  -.2501133   .4788137    -0.52   0.601    -1.188571    .6883443
                             Iowa |  -.2580802   .6201997    -0.42   0.677    -1.473649     .957489
                           Kansas |  -.2162479    .618802    -0.35   0.727    -1.429078    .9965817
                         Kentucky |  -.6796527   .4947599    -1.37   0.170    -1.649364    .2900588
                        Louisiana |  -1.212405   .6802093    -1.78   0.075     -2.54559     .120781
                            Maine |   .1536597    .724234     0.21   0.832    -1.265813    1.573132
                         Maryland |   -.479964   .4333462    -1.11   0.268    -1.329307    .3693789
                    Massachusetts |  -.3830277   .4065299    -0.94   0.346    -1.179812    .4137562
                         Michigan |  -.8424482   .4462722    -1.89   0.059    -1.717126    .0322291
                        Minnesota |  -.4407286   .4466884    -0.99   0.324    -1.316222    .4347647
                      Mississippi |          0  (empty)
                         Missouri |  -.1165391   .4509421    -0.26   0.796    -1.000369    .7672912
                          Montana |  -1.540827    .994091    -1.55   0.121    -3.489209     .407556
                         Nebraska |   .6693885   .5868211     1.14   0.254    -.4807597    1.819537
                           Nevada |  -1.039474   .4544065    -2.29   0.022    -1.930094   -.1488536
                    New Hampshire |  -1.176375   .5664964    -2.08   0.038    -2.286688   -.0660625
                       New Jersey |  -.9297181   .4344896    -2.14   0.032    -1.781302   -.0781342
                       New Mexico |  -.9611424   .8323167    -1.15   0.248    -2.592453    .6701684
                         New York |  -.3195699   .3983761    -0.80   0.422    -1.100373    .4612329
                   North Carolina |  -.4638198   .4251485    -1.09   0.275    -1.297096     .369456
                     North Dakota |  -2.438739   1.147858    -2.12   0.034      -4.6885   -.1889781
                             Ohio |  -.2810673   .4338334    -0.65   0.517    -1.131365    .5692305
                         Oklahoma |  -.2231829   .5522519    -0.40   0.686    -1.305577     .859211
                           Oregon |   -.273123   .4382201    -0.62   0.533    -1.132019    .5857727
                     Pennsylvania |  -.3555153   .4180836    -0.85   0.395    -1.174944    .4639134
                     Rhode Island |  -.6073499   .6234847    -0.97   0.330    -1.829357    .6146577
                   South Carolina |  -.2950072   .5118405    -0.58   0.564    -1.298196    .7081818
                     South Dakota |   -.557638   1.131027    -0.49   0.622    -2.774411    1.659135
                        Tennessee |   .0420345   .4385161     0.10   0.924    -.8174412    .9015101
                            Texas |   -.629307   .4037416    -1.56   0.119    -1.420626    .1620121
                             Utah |  -.4716458   .4403674    -1.07   0.284     -1.33475    .3914586
                          Vermont |   1.672156   1.152024     1.45   0.147     -.585769    3.930081
                         Virginia |  -.4056221   .4253884    -0.95   0.340    -1.239368    .4281238
                       Washington |  -.2928753   .4100479    -0.71   0.475    -1.096554    .5108039
                        Wisconsin |    .107011   .4836524     0.22   0.825    -.8409303    1.054952
                          Wyoming |  -.3031328    .724674    -0.42   0.676    -1.723468    1.117202
                                  |
                        industry1 |
                      Advertising |  -.3235814   .1718969    -1.88   0.060    -.6604931    .0133302
          Agriculture and Farming |   .8334421   .2669152     3.12   0.002      .310298    1.356586
                             Apps |   .0245531   .1636442     0.15   0.881    -.2961836    .3452898
          Artificial Intelligence |   .5269334   .1693098     3.11   0.002     .1950923    .8587745
                    Biotechnology |   .9526859   .2025142     4.70   0.000     .5557655    1.349606
             Clothing and Apparel |  -.0786473   .1914114    -0.41   0.681    -.4538068    .2965122
            Commerce and Shopping |  -.0502132    .163925    -0.31   0.759    -.3715003    .2710739
          Community and Lifestyle |  -.3506888   .1795289    -1.95   0.051    -.7025591    .0011815
             Consumer Electronics |   .5191581    .180977     2.87   0.004     .1644497    .8738664
                   Consumer Goods |   .3713495   .2614782     1.42   0.156    -.1411384    .8838373
           Content and Publishing |  -.4915518   .1872494    -2.63   0.009    -.8585539   -.1245496
               Data and Analytics |   .0860204   .1711798     0.50   0.615    -.2494858    .4215265
                           Design |  -.8209494   .2427359    -3.38   0.001    -1.296703   -.3451959
                        Education |  -.5116925   .1779732    -2.88   0.004    -.8605136   -.1628714
                           Energy |  -.0266922    .262059    -0.10   0.919    -.5403183    .4869339
                           Events |  -.5980316   .2268579    -2.64   0.008    -1.042665   -.1533983
               Financial Services |  -.1580492    .168994    -0.94   0.350    -.4892713    .1731729
                Food and Beverage |   .4862599   .2271832     2.14   0.032      .040989    .9315308
                           Gaming |   .0474327   .2457226     0.19   0.847    -.4341747    .5290401
          Government and Military |  -.0926754   .3276943    -0.28   0.777    -.7349444    .5495936
                         Hardware |   .2505003   .1898872     1.32   0.187    -.1216718    .6226725
                      Health Care |   .2897721   .1743523     1.66   0.097    -.0519522    .6314964
           Information Technology |  -.2716878   .1713481    -1.59   0.113    -.6075239    .0641483
                Internet Services |  -.3034245    .178332    -1.70   0.089    -.6529487    .0460997
                    Manufacturing |   .2436891   .3270453     0.75   0.456    -.3973079    .8846861
          Media and Entertainment |  -.5577907   .2147538    -2.60   0.009    -.9787004   -.1368811
 Messaging and Telecommunications |  -.4438552   1.462689    -0.30   0.762    -3.310673    2.422963
                           Mobile |  -.2658972   .2275387    -1.17   0.243    -.7118648    .1800705
                Natural Resources |          0  (empty)
           Navigation and Mapping |     .23026    .903679     0.25   0.799    -1.540918    2.001438
                        Platforms |          0  (empty)
             Privacy and Security |  -.8250114   .6663974    -1.24   0.216    -2.131126    .4811036
            Professional Services |  -.8783809   .2525578    -3.48   0.001    -1.373385   -.3833767
                      Real Estate |  -.4079659   .2387648    -1.71   0.088    -.8759362    .0600044
              Sales and Marketing |  -1.338194   .3006448    -4.45   0.000    -1.927447   -.7489409
          Science and Engineering |   .0078115   .5217189     0.01   0.988    -1.014739    1.030362
                         Software |  -.6324878   .2042539    -3.10   0.002    -1.032818   -.2321575
                           Sports |  -.5118419   .3867198    -1.32   0.186    -1.269799     .246115
                   Sustainability |   -.727901   .6994042    -1.04   0.298    -2.098708    .6429062
                   Transportation |  -.1287526   .2870052    -0.45   0.654    -.6912724    .4337672
               Travel and Tourism |  -.9046213   .3156622    -2.87   0.004    -1.523308   -.2859346
                                  |
  c.go_score_n#c.innovativeness_n |  -.0352607   .0484613    -0.73   0.467     -.130243    .0597217
                                  |
   c.go_score_n#c.proactiveness_n |  -.1833083   .1298296    -1.41   0.158    -.4377696     .071153
                                  |
     c.go_score_n#c.risk_taking_n |    .466917    .294718     1.58   0.113    -.1107196    1.044554
                                  |
c.ecvo_score_n#c.innovativeness_n |   .0005422   .0060391     0.09   0.928    -.0112942    .0123786
                                  |
 c.ecvo_score_n#c.proactiveness_n |  -.0364081    .011813    -3.08   0.002    -.0595611   -.0132551
                                  |
   c.ecvo_score_n#c.risk_taking_n |   -.035593   .0179387    -1.98   0.047    -.0707521   -.0004339
                                  |
                            _cons |  -4.327526   23.06105    -0.19   0.851    -49.52635     40.8713
---------------------------------------------------------------------------------------------------
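A rough arithmetic check of whether the two sets of results are the same model in different units. It assumes the differing main effects only reflect where the other variables are held (at zero in the raw model, at their means in the standardized one). The scalar names are made up for this sketch, and the means and standard deviations are taken from the summarize output for all 13,847 observations, while the regressions use 13,742, so agreement can only be approximate.

```stata
* Sample moments copied from the -summarize- output (full sample)
* mean of go_score_n
scalar m_go   = .074657
* mean of ecvo_score_n
scalar m_ecvo = 2.927646
* standard deviation of proactiveness_n
scalar sd_pro = .5410325
* standard deviation of innovativeness_n
scalar sd_inn = 1.125253

* Effect of proactiveness_n with both IVs at their means, per one SD;
* compare with the zproactiveness coefficient (-.0512685)
display (.0254278 + (-.1833083)*m_go + (-.0364081)*m_ecvo) * sd_pro

* Same calculation for innovativeness_n;
* compare with the zinnovativeness coefficient (-.0337875)
display (-.0290167 + (-.0352607)*m_go + (.0005422)*m_ecvo) * sd_inn
```

If both values land close to the standardized coefficients, the two outputs describe essentially the same fit. To get the corresponding standard errors from the unstandardized model, something like `margins, dydx(innovativeness_n proactiveness_n) at((mean) go_score_n ecvo_score_n) predict(xb)` run directly after that logit should work, assuming the interactions were entered with factor-variable notation as shown in the output.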
CONTINUED IN NEXT COMMENT DUE TO CHARACTER LIMIT