No observations for different size groups

Marie Luna

Join Date: Nov 2024

Posts: 3
#1

02 Dec 2024, 12:47

Hi all,

I have a problem with the following Stata code:

********************************************************************************
* TABLE 3
********************************************************************************

* 0 General settings

set more off
clear all

cd "XXX" // path to the working directory

* 1 Load and prepare data

use "combined.dta", clear

********************************************************************************
* 2 Computing the variables
********************************************************************************

gen prccd_mil = prccd / 1000000 // linear transformation of the prccd variable

gen debt_maturity_1 = (dltt - dd1) / (dltt + dlc) // share of debt maturing in more than 1 year
gen size = csho * prccd
gen market_to_book = (at + csho * prccd - ceq) / at
gen leverage = (dltt + dlc) / at
gen rd = xrd / at
gen capex = capx / at
gen ppe = ppent / at
gen roa = ebitda / at
gen cash = che / at
gen ipo_year = year(ipodate)
gen age = fyear - ipo_year
gen taxes = txt / pi

********************************************************************************
* 3. Grouping the data
********************************************************************************

* Compute the 20th and 50th percentiles by year
egen size_20th = pctile(size), p(20) by(year)
egen size_50th = pctile(size), p(50) by(year)

* Group firms by size
gen size_group = ""
replace size_group = "small" if size <= size_20th
replace size_group = "medium" if size > size_20th & size <= size_50th
replace size_group = "large" if size > size_50th

* Grouping for the other variables
foreach var in market_to_book leverage ppe roa cash age {

    * Compute the median (50th percentile) of the variable by year
    egen `var'_median = pctile(`var'), p(50) by(year)

    * Group relative to the median
    gen `var'_group = ""
    replace `var'_group = "low" if `var' < `var'_median
    replace `var'_group = "high" if `var' >= `var'_median

}

* Compute the 75th percentile of R&D by year
egen rd_75th = pctile(rd), p(75) by(year)

* Group relative to the 75th percentile
gen rd_group = ""
replace rd_group = "low" if rd < rd_75th
replace rd_group = "high" if rd >= rd_75th

********************************************************************************
* 4. Computing the time periods
********************************************************************************

gen year_group = floor((year - 1990) / 4) * 4 + 1990 // first year of the 4-year period containing year
label var year_group "4-Year Period Start"

********************************************************************************
* 5. Computing and displaying the results
********************************************************************************

********************************************************************************
* Convert size_group to numeric values
********************************************************************************

gen size_group_num = .
replace size_group_num = 1 if size_group == "small"
replace size_group_num = 2 if size_group == "medium"
replace size_group_num = 3 if size_group == "large"

label define size_groups 1 "small" 2 "medium" 3 "large"
label values size_group_num size_groups

********************************************************************************
* Loop over 4-year periods, processing each size category separately
********************************************************************************

local periods_start 1990
local periods_end 2024

* Store the results
matrix results = J(3, 2, .) // 3 groups x 2 values (trend x100 and p-value)

forvalues start = `periods_start'(4)`periods_end' {
    local end = `start' + 3

    * Restrict the data to the current period
    preserve
    keep if year >= `start' & year <= `end'

    di "Period: `start' - `end'"

    * Process each size category
    forvalues size_group = 1/3 {
        di " Size Group: " `size_group'

        * Restrict the data to the current group
        keep if size_group_num == `size_group'

        if _N == 0 {
            di " No data for Size Group " `size_group'
            continue
        }

        * Compute the median
        quietly summarize debt_maturity_1, detail
        local median_dm = r(p50)
        di " Median Debt Maturity (Group " `size_group' "): " %6.3f `median_dm'

        * Estimate the trend
        reg debt_maturity_1 year

        * Extract the coefficient and compute the p-value
        local trend = _b[year] * 100 // extract the coefficient
        local pval = 2 * ttail(e(df_r), abs(_b[year] / _se[year])) // compute the p-value

        * Display the computed values
        di " Trend x100 (Group " `size_group' "): " %6.3f `trend'
        di " P-value (Group " `size_group' "): " %6.3f `pval'

        * Store the results in the matrix
        matrix results[`size_group', 1] = `trend'
        matrix results[`size_group', 2] = `pval'
    }

    restore
}

********************************************************************************
* Display the results
********************************************************************************

di "Trends and p-values by size category:"
matrix list results

For my size groups medium and large there are no observations. Any ideas how I can solve this?
Clyde Schechter

Join Date: Apr 2014

Posts: 29792
#2

02 Dec 2024, 13:22

Code:

forvalues size_group = 1/3 {
    di " Size Group: " `size_group'

    * Restrict the data to the current group
    keep if size_group_num == `size_group'

    if _N == 0 {
        di " No data for Size Group " `size_group'
        continue
    }

    * Compute the median
    quietly summarize debt_maturity_1, detail
    local median_dm = r(p50)
    di " Median Debt Maturity (Group " `size_group' "): " %6.3f `median_dm'

    * Estimate the trend
    reg debt_maturity_1 year

    * Extract the coefficient and compute the p-value
    local trend = _b[year] * 100 // extract the coefficient
    local pval = 2 * ttail(e(df_r), abs(_b[year] / _se[year])) // compute the p-value

    * Display the computed values
    di " Trend x100 (Group " `size_group' "): " %6.3f `trend'
    di " P-value (Group " `size_group' "): " %6.3f `pval'

    * Store the results in the matrix
    matrix results[`size_group', 1] = `trend'
    matrix results[`size_group', 2] = `pval'
}

will, on its first iteration, execute the command -keep if size_group_num == `size_group'-, which expands to -keep if size_group_num == 1-. At that point all of the data from size groups 2 and 3 have been discarded, lost and gone forever as far as this loop is concerned: the -restore- sits outside the loop, so the data do not come back until the next period starts. So there is nothing left to generate results for the medium and large groups.
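A small illustration of the same effect, as a sketch only: it uses the auto dataset that ships with Stata as a stand-in (an assumption, since combined.dta is not available here), with foreign playing the role of size_group_num.

```stata
* Stand-in data: auto.dta shipped with Stata (not the poster's data)
sysuse auto, clear
count                        // all observations are still in memory

forvalues g = 0/1 {
    * First pass keeps foreign == 0 and drops everything else for good
    keep if foreign == `g'
    count                    // second pass reports 0: nothing with foreign == 1 is left
}
```

The second -count- returns 0 for the same reason the medium and large groups come up empty in post #1.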

There are a few ways to fix this:

1. Get rid of the -keep if size_group_num == `size_group'- command and apply -if size_group_num == `size_group'- to the appropriate commands inside the loop, or

2. get rid of the loop, wrapping its code in a program, and use the -runby- command to iterate the program over the size_group_num-defined subsets of the data (-runby- is written by Robert Picard and me, and is available from SSC), or

3. save the data in a tempfile just before entering the -forvalues- loop, and then -use- the tempfile at the bottom of the loop to bring back the data for later iterations.

Added: Of these fixes, I think 1 is the simplest and best solution here because there are only two commands, -regress- and -summarize-, that require modification (plus the removal of the -keep- command), and unless your data set is really huge, it will run efficiently enough. If that turns out to run unacceptably slowly, 2 will speed it up considerably. I would only use 3 for a situation where the data set is pretty small, so the disk operations are not very time-consuming, but the computations in the loop are very intensive and sensitive to the size of the data set--which does not appear to be the case here.
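A rough sketch of what fix 1 might look like, not tested against the original data. It assumes the variables debt_maturity_1, year and size_group_num exist as created in post #1. The loop index g, the local cond, and the r(N) < 3 guard are illustrative choices, not part of the original code. The period filter is folded into the same -if- condition via inrange(), so -preserve-/-restore- are not needed in this version.

```stata
* Sketch of fix 1: no -keep- inside the loop; every command gets an -if- qualifier.
* Assumes debt_maturity_1, year and size_group_num exist as in post #1.
forvalues start = 1990(4)2024 {
    local end = `start' + 3
    di "Period: `start' - `end'"

    forvalues g = 1/3 {
        * Condition shared by all commands in this iteration (hypothetical local name)
        local cond size_group_num == `g' & inrange(year, `start', `end')

        * Skip the group if too few complete observations for a regression
        quietly count if `cond' & !missing(debt_maturity_1, year)
        if r(N) < 3 {
            di "  No usable data for size group `g'"
            continue
        }

        * Median of debt maturity within the group and period
        quietly summarize debt_maturity_1 if `cond', detail
        di "  Median debt maturity (group `g'): " %6.3f r(p50)

        * Linear time trend within the group and period
        quietly regress debt_maturity_1 year if `cond'
        local trend = _b[year] * 100
        local pval  = 2 * ttail(e(df_r), abs(_b[year] / _se[year]))
        di "  Trend x100 (group `g'): " %6.3f `trend'
        di "  P-value    (group `g'): " %6.3f `pval'
    }
}
```

Because the data in memory are never reduced, each group and each period sees the full dataset. The -matrix results[...]- lines from post #1 can be added back after the -regress- step if the stored values are needed.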

Last edited by Clyde Schechter; 02 Dec 2024, 13:30.
Marie Luna

Join Date: Nov 2024

Posts: 3
#3

02 Dec 2024, 15:51

Option 1 fixed my problem, thank you so much!