Implements various data transformation methods: optimal scaling for ordinal or nominal data, and transformations that help relax the assumption of normality (Gaussianity) for continuous data.
transformData(x, method = "npn", ...)
A matrix or data.frame (n x p). Rows correspond to subjects, and columns to graph nodes.
Transformation method. It can be one of the following:
"npn" (default), performs the nonparanormal (npn) transformation, i.e., fits the semiparametric Gaussian copula model (Liu et al., 2009), estimating the Gaussian copula by marginally transforming the variables with a smooth ECDF. The npn distribution corresponds to the latent multivariate normal distribution underlying the observed variables and preserves the conditional independence structure of the original variables.
"spearman", computes a trigonometric transformation of Spearman's rho correlations to estimate the latent Gaussian correlation parameters of a nonparanormal distribution (Harris & Drton, 2013), and generates a data matrix with exactly the same sample covariance matrix as the estimated one.
"kendall", computes a trigonometric transformation of Kendall's tau correlations to estimate the latent Gaussian correlation parameters of a nonparanormal distribution (Harris & Drton, 2013), and generates a data matrix with exactly the same sample covariance matrix as the estimated one.
"polychoric", computes the polychoric correlation matrix and generates a data matrix with exactly the same sample covariance matrix as the estimated one. The polychoric correlation (Olsson, 1979) is a measure of association between two ordinal variables. It assumes that each pair of ordinal scores is generated by two latent, bivariate normally distributed random variables. The tetrachoric correlation (two binary variables) is a special case, and the polyserial correlation (an ordinal and a numeric variable) is the analogous measure for mixed pairs.
"lineals", performs optimal scaling to find transformations that linearize each bivariate regression between pairs of variables (de Leeuw, 1988); the correlation matrix computed on the transformed data can then be used to fit subsequent structural equation models.
"mca", performs optimal scaling of categorical data by Multiple Correspondence Analysis (MCA, a.k.a. homogeneity analysis), maximizing the first eigenvalue of the correlation matrix of the transformed variables. The estimates of the corresponding structural parameters are consistent if the underlying latent space of the observed variables is unidimensional.
Currently ignored.
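The three rank-based options can be sketched in a few lines of base R. The helpers `npn_shrunken()`, `latent_cor()` and `simulate_exact()` below are hypothetical illustrations, not SEMgraph's internal code. Assumptions: the shrunken ECDF scales ranks by 1/(n + 1) before applying the normal quantile function; the bridges are the standard Gaussian-copula identities r = 2 sin(πρ/6) for Spearman's rho and r = sin(πτ/2) for Kendall's tau; and data with an exact sample covariance are obtained by whitening a random normal sample and coloring it with the Cholesky factor of the target matrix.

```r
# Hypothetical helpers sketching the ideas behind method = "npn",
# "spearman" and "kendall"; this is not SEMgraph's implementation.

# "npn": replace each column by qnorm(rank / (n + 1)), a shrunken ECDF that
# keeps qnorm() away from 0 and 1, then standardize the columns.
npn_shrunken <- function(X) {
  X <- as.matrix(X)
  U <- apply(X, 2, rank) / (nrow(X) + 1)  # shrunken ECDF values in (0, 1)
  Z <- qnorm(U)                           # normal scores
  colnames(Z) <- colnames(X)
  scale(Z)                                # zero mean, unit variance
}

# "spearman"/"kendall": bridge rank correlations to latent Gaussian ones,
#   r = 2 * sin(pi * rho / 6)   or   r = sin(pi * tau / 2)
latent_cor <- function(X, method = c("spearman", "kendall")) {
  method <- match.arg(method)
  R <- cor(X, method = method)
  S <- if (method == "spearman") 2 * sin(pi * R / 6) else sin(pi * R / 2)
  diag(S) <- 1
  S
}

# Simulate n rows whose sample covariance equals Sigma exactly: whiten a
# centered normal sample, then color it with chol(Sigma).
# Assumes n > p and a positive definite Sigma.
simulate_exact <- function(n, Sigma) {
  p <- ncol(Sigma)
  Z <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
  W <- Z %*% solve(chol(cov(Z)))  # sample covariance = identity
  Y <- W %*% chol(Sigma)          # sample covariance = Sigma
  colnames(Y) <- colnames(Sigma)
  Y
}

# Usage on skewed, dependent toy data
set.seed(1)
X <- matrix(rexp(200 * 3), 200, 3, dimnames = list(NULL, c("a", "b", "c")))
X[, "b"] <- X[, "a"] + X[, "b"]
Z <- npn_shrunken(X)
S <- latent_cor(X, "kendall")
Y <- simulate_exact(nrow(X), S)
```

A bridged matrix such as `S` is not guaranteed to be positive definite; when it is not, `chol()` fails and the matrix must first be projected to a nearby positive definite one. `MASS::mvrnorm()` with `empirical = TRUE` gives the same exact-covariance guarantee as `simulate_exact()`.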
A list of 2 objects is returned:
"data", the matrix (n x p) of n observations and p transformed variables, or the matrix (n x p) of n observations simulated from the selected correlation matrix.
"catscores", the category weights for "lineals" or "mca" methods or NULL otherwise.
The nonparanormal transformation is computationally very efficient and requires only one ECDF pass over the data matrix. The polychoric correlation matrix is computed with the lavCor() function of the lavaan package. Optimal scaling (lineals and mca) is performed with the lineals() and corAspect() functions of the aspect package (Mair and De Leeuw, 2008). Note that SEM fitting of the generated (fake) data must be done with a covariance-based method and bootstrap standard errors, i.e., with SEMrun(..., algo="ricf", n_rep=1000).
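The "mca" criterion (category scores chosen to maximize the first eigenvalue of the correlation matrix of the transformed variables) corresponds to the first dimension of a homogeneity analysis. The sketch below solves that dimension in base R by alternating averages, which amounts to a power iteration. `mca_scores()` is a hypothetical helper, not part of SEMgraph or aspect (the package delegates this step to corAspect()), and it assumes complete data whose columns are integer codes or factors.

```r
# Hypothetical one-dimensional homogeneity analysis (MCA) by alternating
# averages; a sketch, not the aspect package's corAspect().
mca_scores <- function(D, maxit = 1000, tol = 1e-10) {
  D <- as.data.frame(lapply(D, factor))  # treat every column as nominal
  m <- ncol(D)
  # map each object to the quantification of its category
  expand <- function(f, y) as.vector(y[as.integer(f)])
  x <- as.vector(scale(seq_len(nrow(D))))  # arbitrary centered start
  for (it in seq_len(maxit)) {
    # category quantifications: mean object score within each category
    Y <- lapply(D, function(f) tapply(x, f, mean))
    # object scores: average of the quantified variables, re-standardized
    # to drop the trivial constant solution and fix the scale
    x_new <- as.vector(scale(Reduce(`+`, Map(expand, D, Y)) / m))
    done <- max(abs(x_new - x)) < tol
    x <- x_new
    if (done) break
  }
  Y <- lapply(D, function(f) tapply(x, f, mean))
  list(data = mapply(expand, D, Y), catscores = Y)
}

# Usage on ordinal toy data with one non-monotone relation
set.seed(3)
z <- rnorm(300)
D <- data.frame(
  a = cut(z + rnorm(300), 4, labels = FALSE),
  b = cut(z^2 + rnorm(300, sd = 0.5), 4, labels = FALSE),
  c = cut(-z + rnorm(300), 4, labels = FALSE)
)
res <- mca_scores(D)
first_eig <- function(M) eigen(cor(M), symmetric = TRUE, only.values = TRUE)$values[1]
first_eig(res$data) >= first_eig(data.matrix(D))
```

Each iteration applies the average of the per-variable category-mean operators to the object scores, so the loop is a power iteration for their leading eigenvector; standardizing at every step removes the trivial constant eigenvector and fixes the otherwise arbitrary scale. Category scores are identified only up to sign and scale, so they need not match the values in `mca.data$catscores` numerically.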
Liu H, Lafferty J, and Wasserman L (2009). The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research, 10(80), 2295-2328.
Harris N, and Drton M (2013). PC Algorithm for Nonparanormal Graphical Models. Journal of Machine Learning Research, 14(69), 3365-3383.
Olsson U (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.
Mair P, and De Leeuw J (2008). Scaling variables by optimizing correlational and non-correlational aspects in R. Journal of Statistical Software, 32(9), 1-23.
de Leeuw J (1988). Multivariate analysis with linearizable regressions. Psychometrika, 53, 437-454.
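# SEMgraph provides alsData, transformData() and SEMrun(); igraph provides V()
library(SEMgraph)
library(igraph)
#... with continuous ALS data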
graph<- alsData$graph
data<- alsData$exprs; dim(data)
#> [1] 160 318
X<- data[, colnames(data) %in% V(graph)$name]; dim(X)
#> [1] 160 31
npn.data<- transformData(X, method="npn")
#> Conducting the nonparanormal transformation via shrunkun ECDF...done.
sem0.npn<- SEMrun(graph, npn.data$data)
#> NLMINB solver ended normally after 1 iterations
#>
#> deviance/df: 10.92504 srmr: 0.2858859
#>
mvnS.data<- transformData(X, method="spearman")
#> Simulating gaussian data via Spearman correlations...done.
sem0.mvnS<- SEMrun(graph, mvnS.data$data)
#> NLMINB solver ended normally after 1 iterations
#>
#> deviance/df: 11.71544 srmr: 0.2881886
#>
mvnK.data<- transformData(X, method="kendall")
#> Simulating gaussian data via Kendall correlations...done.
sem0.mvnK<- SEMrun(graph, mvnK.data$data)
#> NLMINB solver ended normally after 6 iterations
#>
#> deviance/df: 14.10983 srmr: 0.395726
#>
#...with ordinal (K=4 categories) ALS data
Xord <- data.frame(X)
Xord <- as.data.frame(lapply(Xord, cut, 4, labels = FALSE))
colnames(Xord) <- sub("X", "", colnames(Xord))
# \dontrun{
mvnP.data<- transformData(Xord, method="polychoric")
#> Simulating gaussian data via polychoric correlations...done.
sem0.mvnP<- SEMrun(graph, mvnP.data$data, algo="ricf", n_rep=1000)
#> RICF solver ended normally after 2 iterations
#>
#> deviance/df: 23.32914 srmr: 0.3188226
#>
#> Model randomization with B = 1000 bootstrap samples ...
# }
lin.data<- transformData(Xord, method="lineals")
#> Conducting the optimal (ordinal) linearizing transformation...
#> Warning: the standard deviation is zero
#> done.
sem0.lin<- SEMrun(graph, lin.data$data)
#> Estimating optimal shrinkage intensity lambda (correlation matrix): 0.1642
#>
#> NLMINB solver ended normally after 1 iterations
#>
#> deviance/df: 34.40295 srmr: 0.2453185
#>
lin.data$catscores; head(lin.data$data)
#> $`317`
#> score
#> 1 -0.13521162
#> 2 -0.07583129
#> 3 0.07355762
#> 4 0.07355762
#>
#> $`572`
#> score
#> 1 -0.191746085
#> 2 0.003537308
#> 3 0.003537308
#> 4 0.123269847
#>
#> $`581`
#> score
#> 1 -0.1216547441
#> 2 0.0007229829
#> 3 0.1071747055
#> 4 0.1071747055
#>
#> $`596`
#> score
#> 1 -0.18630619
#> 2 -0.03480288
#> 3 0.08244315
#> 4 0.08244315
#>
#> $`598`
#> score
#> 1 -0.09628079
#> 2 0.01069357
#> 3 0.14889455
#> 4 0.14889455
#>
#> $`836`
#> score
#> 1 -0.18169318
#> 2 -0.04286281
#> 3 0.07004522
#> 4 0.15463583
#>
#> $`842`
#> score
#> 1 -0.17052695
#> 2 -0.11277710
#> 3 0.05186556
#> 4 0.05186556
#>
#> $`54205`
#> score
#> 1 -0.022997786
#> 2 -0.022997786
#> 3 0.005749447
#> 4 0.005749447
#>
#> $`1616`
#> score
#> 1 -0.15626481
#> 2 0.01940459
#> 3 0.06599235
#> 4 0.11576736
#>
#> $`79139`
#> score
#> 1 -6.505213e-18
#> 2 -6.505213e-18
#> 3 -6.505213e-18
#> 4 -6.505213e-18
#>
#> $`5606`
#> score
#> 1 -0.050817752
#> 2 0.006721351
#> 3 0.006721351
#> 4 0.091466747
#>
#> $`5608`
#> score
#> 1 2.602085e-18
#> 2 2.602085e-18
#> 3 2.602085e-18
#> 4 2.602085e-18
#>
#> $`4217`
#> score
#> 1 -0.15520107
#> 2 -0.02573987
#> 3 0.09240626
#> 4 0.10384974
#>
#> $`5600`
#> score
#> 1 -0.0575267058
#> 2 0.0007281861
#> 3 0.0007281861
#> 4 0.0007281861
#>
#> $`6300`
#> score
#> 1 -0.01692033
#> 2 -0.01692033
#> 3 0.04068079
#> 4 0.04068079
#>
#> $`5603`
#> score
#> 1 -1.105886e-17
#> 2 -1.105886e-17
#> 3 -1.105886e-17
#> 4 -1.105886e-17
#>
#> $`1432`
#> score
#> 1 8.182338e-18
#> 2 8.182338e-18
#> 3 8.182338e-18
#> 4 8.182338e-18
#>
#> $`4744`
#> score
#> 1 -0.0059010283
#> 2 -0.0059010283
#> 3 0.0003517169
#> 4 0.0003517169
#>
#> $`4747`
#> score
#> 1 -0.047142139
#> 2 -0.047142139
#> 3 0.001836707
#> 4 0.001836707
#>
#> $`4741`
#> score
#> 1 -0.0223512546
#> 2 -0.0223512546
#> 3 0.0008708281
#> 4 0.0008708281
#>
#> $`5530`
#> score
#> 1 -0.043859676
#> 2 -0.043859676
#> 3 0.002614153
#> 4 0.002614153
#>
#> $`5532`
#> score
#> 1 -0.091071226
#> 2 -0.091071226
#> 3 0.009421161
#> 4 0.009421161
#>
#> $`5533`
#> score
#> 1 -0.276543764
#> 2 -0.007196858
#> 3 0.033276472
#> 4 0.033276472
#>
#> $`5534`
#> score
#> 1 -0.038447032
#> 2 -0.038447032
#> 3 0.002291545
#> 4 0.002291545
#>
#> $`5535`
#> score
#> 1 -6.288373e-18
#> 2 -6.288373e-18
#> 3 -6.288373e-18
#> 4 -6.288373e-18
#>
#> $`5630`
#> score
#> 1 -0.12292007
#> 2 0.04662486
#> 3 0.04662486
#> 4 0.04662486
#>
#> $`6647`
#> score
#> 1 -0.08251265
#> 2 -0.08251265
#> 3 0.01750268
#> 4 0.01750268
#>
#> $`7132`
#> score
#> 1 -0.150056849
#> 2 0.009326866
#> 3 0.053775689
#> 4 0.053775689
#>
#> $`7133`
#> score
#> 1 -0.005233238
#> 2 -0.005233238
#> 3 -0.005233238
#> 4 0.050587967
#>
#> $`10452`
#> score
#> 1 -1.734723e-18
#> 2 -1.734723e-18
#> 3 -1.734723e-18
#> 4 -1.734723e-18
#>
#> $`84134`
#> score
#> 1 -3.676123e-18
#> 2 -3.676123e-18
#> 3 -3.676123e-18
#> 4 -3.676123e-18
#>
#> 317 572 581 596 598 836
#> 1 -0.9275263 -0.04460376 1.534007954 -1.0395686 -0.1348408 0.5404794
#> 2 -0.9275263 2.41782613 -0.009116467 -1.0395686 1.2140546 -0.8832366
#> 3 -0.9275263 -0.04460376 -0.009116467 -1.0395686 -0.1348408 0.5404794
#> 4 -0.9275263 -0.04460376 1.534007954 -1.0395686 -0.1348408 -0.8832366
#> 5 -0.9275263 -0.04460376 1.534007954 -1.0395686 -0.1348408 0.5404794
#> 6 -0.9275263 2.41782613 -0.009116467 0.4388476 1.2140546 -0.8832366
#> 842 54205 1616 5606 4217 5600
#> 1 1.4220651 -0.07249776 -0.2446826 -0.08475302 0.3245673 -0.009182078
#> 2 -0.6539998 -0.07249776 -0.2446826 0.64078747 0.3245673 -0.009182078
#> 3 -0.6539998 0.28999105 -0.2446826 -0.08475302 -1.1651986 -0.009182078
#> 4 -0.6539998 -0.07249776 -0.2446826 -0.08475302 -1.1651986 -0.009182078
#> 5 -0.6539998 -0.07249776 -0.2446826 -0.08475302 -1.1651986 -0.009182078
#> 6 1.4220651 -0.07249776 1.9704243 -0.08475302 -1.1651986 -0.009182078
#> 6300 4744 4747 4741 5530 5532
#> 1 0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 2 0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 3 -0.5129652 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 4 0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 5 0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 6 0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 5533 5534 5630 6647 7132 7133 79139
#> 1 -0.41960034 -0.02889528 1.5499632 -0.2207004 -0.1176073 0.06598862 3
#> 2 0.09074892 -0.02889528 -0.5879171 -0.2207004 1.8921449 0.06598862 4
#> 3 0.09074892 -0.02889528 -0.5879171 1.0404449 -0.6780856 0.06598862 2
#> 4 -0.41960034 -0.02889528 1.5499632 -0.2207004 -0.6780856 0.06598862 4
#> 5 -0.41960034 -0.02889528 -0.5879171 -0.2207004 -0.6780856 0.06598862 3
#> 6 0.09074892 -0.02889528 -0.5879171 -0.2207004 -0.6780856 0.06598862 3
#> 5608 5603 1432 5535 10452 84134
#> 1 4 4 3 3 2 3
#> 2 4 3 3 4 2 2
#> 3 3 4 4 4 2 3
#> 4 4 4 4 4 1 3
#> 5 3 3 4 3 1 3
#> 6 4 4 3 3 2 3
#...with nominal (K=4 categories) ALS data
mca.data<- transformData(Xord, method="mca")
#> Conducting the first solution of Multiple Correspondence Analysis...done.
sem0.mca<- SEMrun(graph, mca.data$data)
#> NLMINB solver ended normally after 6 iterations
#>
#> deviance/df: 9.120595 srmr: 0.3147939
#>
mca.data$catscores
#> $`317`
#> score
#> 1 -0.1962657
#> 2 -0.2617196
#> 3 -0.1722545
#> 4 4.6708724
#>
#> $`572`
#> score
#> 1 -1.56811059
#> 2 -0.01214174
#> 3 -0.39796571
#> 4 2.17855892
#>
#> $`581`
#> score
#> 1 -0.5830986
#> 2 -0.3508265
#> 3 0.6334362
#> 4 3.9618753
#>
#> $`596`
#> score
#> 1 -0.37120214
#> 2 -0.32350875
#> 3 0.03534731
#> 4 4.98964541
#>
#> $`598`
#> score
#> 1 -0.4275410
#> 2 -0.3000455
#> 3 1.6425919
#> 4 9.0188630
#>
#> $`836`
#> score
#> 1 -0.8246498
#> 2 0.9622589
#> 3 -1.0752097
#> 4 0.9484153
#>
#> $`842`
#> score
#> 1 -0.37629484
#> 2 -0.87079806
#> 3 -0.05751661
#> 4 2.71162665
#>
#> $`54205`
#> score
#> 1 -4.2248124
#> 2 -0.3278173
#> 3 0.2640036
#> 4 0.4608450
#>
#> $`1616`
#> score
#> 1 -0.2933227
#> 2 -0.5602894
#> 3 1.0761805
#> 4 3.6211774
#>
#> $`79139`
#> score
#> 1 -3.8618504
#> 2 0.3795132
#> 3 0.1963661
#> 4 0.2805683
#>
#> $`5606`
#> score
#> 1 -0.7623373
#> 2 -0.4419482
#> 3 1.0495265
#> 4 3.5296781
#>
#> $`5608`
#> score
#> 1 -6.55312157
#> 2 -1.37203915
#> 3 0.06807267
#> 4 0.97343682
#>
#> $`4217`
#> score
#> 1 -2.47120556
#> 2 0.09442145
#> 3 0.50146382
#> 4 2.10891397
#>
#> $`5600`
#> score
#> 1 -7.3048616
#> 2 -1.9849719
#> 3 0.3058562
#> 4 0.1420784
#>
#> $`6300`
#> score
#> 1 1.5586231
#> 2 -0.7581470
#> 3 -0.1197511
#> 4 2.3544993
#>
#> $`5603`
#> score
#> 1 -5.3934562
#> 2 -2.6648240
#> 3 0.2065484
#> 4 0.2501853
#>
#> $`1432`
#> score
#> 1 -4.8281167
#> 2 -0.5646018
#> 3 0.5355894
#> 4 0.4203385
#>
#> $`4744`
#> score
#> 1 -5.9814189
#> 2 -2.7263541
#> 3 0.1956457
#> 4 0.2854548
#>
#> $`4747`
#> score
#> 1 -5.2331098
#> 2 -4.5697151
#> 3 -0.5415581
#> 4 0.2758942
#>
#> $`4741`
#> score
#> 1 -5.31340342
#> 2 -4.63982140
#> 3 -0.06801511
#> 4 0.31611824
#>
#> $`5530`
#> score
#> 1 -4.5208030
#> 2 -3.2836035
#> 3 -0.1797704
#> 4 0.3356702
#>
#> $`5532`
#> score
#> 1 -4.75775317
#> 2 -1.77981428
#> 3 0.05365221
#> 4 0.38187879
#>
#> $`5533`
#> score
#> 1 1.1082106
#> 2 -0.4455077
#> 3 0.9706363
#> 4 -1.5709647
#>
#> $`5534`
#> score
#> 1 -5.22236671
#> 2 -3.28835298
#> 3 -0.04944201
#> 4 0.35114642
#>
#> $`5535`
#> score
#> 1 -5.0434670
#> 2 -1.9436026
#> 3 0.4107327
#> 4 0.3970010
#>
#> $`5630`
#> score
#> 1 -0.5501326
#> 2 -0.1669097
#> 3 1.1752097
#> 4 6.3028266
#>
#> $`6647`
#> score
#> 1 -5.0962196
#> 2 -0.7961947
#> 3 0.3901020
#> 4 0.1976048
#>
#> $`7132`
#> score
#> 1 -0.9298468
#> 2 -0.6719772
#> 3 0.4159697
#> 4 2.3462125
#>
#> $`7133`
#> score
#> 1 -0.7326115
#> 2 -0.6018502
#> 3 0.1549589
#> 4 2.8883909
#>
#> $`10452`
#> score
#> 1 -3.5037678
#> 2 -0.3863463
#> 3 0.5991077
#> 4 0.1937679
#>
#> $`84134`
#> score
#> 1 -5.3007012
#> 2 -0.5510670
#> 3 0.3288807
#> 4 0.2508747
#>
# plot colored graphs
#par(mfrow=c(3,2), mar=rep(1,4))
#gplot(sem0.npn$graph, l="fdp", main="ALS npn")
#gplot(sem0.mvnS$graph, l="fdp", main="ALS mvnS")
#gplot(sem0.mvnK$graph, l="fdp", main="ALS mvnK")
#gplot(sem0.mvnP$graph, l="fdp", main="ALS mvnP")
#gplot(sem0.lin$graph, l="fdp", main="ALS lin")
#gplot(sem0.mca$graph, l="fdp", main="ALS mca")