Dear Statalist,
I am currently running a random forest classification algorithm in Stata, and I wish to construct the cost function needed to run a weighted RF (using the crtrees algorithm).
To do this, I need to create an N x N matrix that starts off as an identity matrix, where the diagonal represents the relative weight of each observation in the dataset (i.e., using the identity matrix itself means that all outcomes are equally weighted). For context, my dataset consists of 19,000 data points, so it will be a 19,000 x 19,000 matrix.
I have figured out how to create an identity matrix equal to the number of observations in the dataset using Mata. To provide a concrete example, take the following:
sysuse auto
gen N=_N
global N=N
mata: st_matrix("MyMatrix", I($N))
Here is where I am currently stumped, however.
To go from the identity matrix to the matrix that I aim to build (i.e., the cost function), I need to change specific elements of the matrix based on values of a variable in the dataset (dummy variable).
Using the "Automobile" dataset again, this is comparable to changing values of elements in "MyMatrix" for all observations where foreign==1.
For the sake of illustration, assume that I would, for example, wish to attribute the value "2" to all of those observations.
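To make the goal concrete, here is a minimal sketch of the result I have in mind, assuming Mata's st_data() and diag() behave as I understand them. The vector name w is only illustrative, and I have not confirmed that this is the form crtrees expects:

```stata
sysuse auto, clear

mata:
// Column vector of weights: 1 where foreign==0, 2 where foreign==1
w = st_data(., "foreign") :+ 1

// Diagonal matrix with w on the diagonal and zeros elsewhere,
// i.e. the identity matrix with the foreign==1 entries changed to 2
st_matrix("MyMatrix", diag(w))
end
```

With the auto data this should give a 74 x 74 matrix. I am unsure whether the same approach is practical for 19,000 observations.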
My question is: Is it possible to perform this operation in this manner, or have I misunderstood how Mata works?
Sincerely
Johan Karlsson