  • STATA SE 17.0 - Issue with graph options in overlaying plots after an MCA

    Hello everybody,

    I'm currently using data that are characterized by a large number of dummies. To avoid the curse of dimensionality, I want to reduce them using multiple correspondence analysis (MCA).
    The MCA itself runs perfectly, but its plot isn't as clear as I would like, particularly the overlaid version, which I need in order to visualize the coordinates of all the dummies simultaneously in the first 2 dimensions.

    I run an MCA on 22 dummies and only get 15 different colors to represent them, which makes some points in the scatterplot ambiguous. I would also like to connect the two points of each dummy to see whether that improves readability.

    This is a working example of what I'm doing. In my own data, more than 80% of the variance is expressed through the first 2 dimensions, so restricting the plot to them seems relevant, even though it is not for the dataset below. It's just for the sake of the example.
    Code:
    clear
    
    *use data from auto16
    sysuse auto16
    
    *Generate dummies based on the modalities of the variable gear_ratio
    tab(gear_ratio), gen(d_gr)
    
    *Create a variable list to use in further MCA
    global  mca_d_gr  d_gr1 d_gr2 d_gr3 d_gr4 d_gr5 d_gr6 d_gr7 d_gr8 d_gr9 d_gr10 d_gr11 d_gr12 d_gr13 d_gr14 d_gr15 d_gr16 d_gr17 d_gr18 d_gr19 d_gr20 d_gr21 d_gr22
    
    
    *Produce the MCA
    mca $mca_d_gr
    From this starting point, I've explored different possibilities.

    This one generates an overlaid plot of all 22 dummies in the same two dimensions. However, the colors are limited to 15, so several dummies end up with the same color.
    Code:
    mcaplot, overlay origin legend(symy(10) symx(10) textw(10) cols(8) forces size(tiny) span)  legend(off) scale(0.6) mlabel()
    Adding a color list to mcolor() that matches the number of variables doesn't seem to work, as it gives the error: "p21(marker(fillcolor(erose) linestyle(color(erose)))): too many arguments"

    Code:
    mcaplot, overlay origin legend(symy(10) symx(10) textw(10) cols(8) forces size(tiny) span)   scale(0.6) mlabel() mcolor(blue red green yellow orange purple pink cyan navy brown gold sienna gray olive lime teal magenta maroon lavender sandb erose mint)
    Trying to connect the points doesn't seem to work either: they are all connected to the origin of the graph, and not pairwise (value 0 to value 1) for each dummy:

    Code:
    mcaplot, overlay origin legend(symy(10) symx(10) textw(10) cols(8) forces size(tiny) span)   scale(0.6) mlabel() connect(l)
    However, it does work when the plots are not overlaid:
    Code:
    mcaplot, origin legend(symy(10) symx(10) textw(10) cols(8) forces size(tiny) span)   scale(0.6) mlabel() connect(l)
    Without having figured out how to modify the plot through the options of mcaplot, I decided to try to redo manually what mcaplot does, allowing me more freedom in the options.

    I rerun the MCA, using the normalization used by default in mcaplot, overlay, and extract the coordinates of the values 0 and 1 of each dummy in the first 2 dimensions:
    Code:
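    *Reload the data; clear is needed because the dummies generated above changed the data in memory
    sysuse auto16, clear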
    
    *Generate dummies based on the modalities of the variable gear_ratio
    tab(gear_ratio), gen(d_gr)
    
    *Create a variable list to use in further MCA
    global  mca_d_gr  d_gr1 d_gr2 d_gr3 d_gr4 d_gr5 d_gr6 d_gr7 d_gr8 d_gr9 d_gr10 d_gr11 d_gr12 d_gr13 d_gr14 d_gr15 d_gr16 d_gr17 d_gr18 d_gr19 d_gr20 d_gr21 d_gr22
    
    
    *Produce the MCA, with normalization
    mca $mca_d_gr, normalize(standard) compact
    
    *Print the normalized coordinates of the values 0 and 1 for each dummy
    mat li e(cGS)
    
    *generate a list of colors to use and an incremental counter
    local colors blue red green yellow orange purple pink cyan navy brown gold sienna gray olive lime teal magenta maroon lavender sandb erose mint
    local i 1
    I then loop over the dummies to obtain individual plots that look as desired:
    Code:
    *Loop over each dummy used in the mca
    foreach var of varlist $mca_d_gr {
        *Extract normalized coordinates for values 0 and 1 for the current dummy variable
        local `var'_coord0_dim1 = e(cGS)["`var':0", "dim1:Coord"]
        local `var'_coord0_dim2 = e(cGS)["`var':0", "dim2:Coord"]
        local `var'_coord1_dim1 = e(cGS)["`var':1", "dim1:Coord"]
        local `var'_coord1_dim2 = e(cGS)["`var':1", "dim2:Coord"]
    
        *Prepare data for scatter plot
        preserve
        clear
        set obs 2
        gen `var'_dim1 = .
        gen `var'_dim2 = .
        replace `var'_dim1 = ``var'_coord0_dim1' in 1
        replace `var'_dim2 = ``var'_coord0_dim2' in 1
        replace `var'_dim1 = ``var'_coord1_dim1' in 2
        replace `var'_dim2 = ``var'_coord1_dim2' in 2
        label define val 0 "0" 1 "1"
        gen value = _n - 1
        label values value val
    
        *Plot the coordinates for values 0 and 1 of the current dummy variable and connect the points
        twoway (scatter `var'_dim2 `var'_dim1, ///
                    mlabel(value) ///
                    mcolor("`: word `i' of `colors''")) ///
               (line `var'_dim2 `var'_dim1, ///
                    lcolor("`: word `i' of `colors''")) ///
               , xline(0, lcolor(gs10)) yline(0, lcolor(gs10)) ///
               title("Coordinates for `var' (0 and 1)") ///
               legend(off) /// the marker labels already identify values 0 and 1
               ytitle("Dimension 2") xtitle("Dimension 1") ///
               name(`var', replace)
            
        restore
        local i = `i' + 1
    }
    However, with this method I can't produce a single twoway graph with all the scatter plots overlaid, because the data for each dummy exist only inside the loop (between preserve and restore).
    When the twoway command is outside the loop, it can't refer back to the variables and their coordinates.
    When it's inside the loop, I assume there is a solution using addplot, but it's not clear to me how to achieve it.
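
    One direction I am considering, as an untested sketch rather than a confirmed solution: instead of creating variables inside the loop, accumulate one scatteri plot per dummy in a local macro and draw everything with a single twoway call after the loop. The sketch assumes the mca command above has just been run (so e(cGS) and $mca_d_gr are available) and that the row and column names of e(cGS) are the ones used in my loop. The macro names plots, col, x0, y0, x1, y1 and the graph name mca_overlay are only illustrative.

```stata
* Assumption: mca $mca_d_gr, normalize(standard) compact has just been run
local colors blue red green yellow orange purple pink cyan navy brown gold sienna gray olive lime teal magenta maroon lavender sandb erose mint
local plots
local i = 1

foreach var of varlist $mca_d_gr {
    * Coordinates of values 0 and 1 in the first 2 dimensions
    local x0 = e(cGS)["`var':0", "dim1:Coord"]
    local y0 = e(cGS)["`var':0", "dim2:Coord"]
    local x1 = e(cGS)["`var':1", "dim1:Coord"]
    local y1 = e(cGS)["`var':1", "dim2:Coord"]

    * One color per dummy, taken from the list above
    local col : word `i' of `colors'

    * scatteri takes immediate y x pairs, so no dataset is needed;
    * connect(l) joins the two points of this dummy only
    local plots `plots' (scatteri `y0' `x0' "`var':0" `y1' `x1' "`var':1", connect(l) mcolor(`col') lcolor(`col') mlabcolor(`col') mlabsize(vsmall))

    local ++i
}

* Draw all accumulated plots in a single overlaid graph
twoway `plots', xline(0, lcolor(gs10)) yline(0, lcolor(gs10)) ///
    legend(off) ytitle("Dimension 2") xtitle("Dimension 1") ///
    name(mca_overlay, replace)
```

    Because every plot is built from immediate values, nothing has to survive preserve/restore, and the number of colors is limited only by the length of the list in colors.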

    I feel like I'm missing an obvious and fundamental solution to this, but I can't manage to find it. I saw quite a lot of posts with issues somewhat similar to this one (https://www.statalist.org/forums/for...oop-line-graph, https://www.statalist.org/forums/for...kdensity-plots, etc.), but I didn't find a solution applicable to my case.
    Any help would be appreciated!

    Thank you in advance and sorry for the long post.

    Best regards,
    François