Implements various data transformation methods, with optimal scaling for ordinal or nominal data, and helps relax the assumption of normality (Gaussianity) for continuous data.

transformData(x, method = "npn", ...)

Arguments

x

A matrix or data.frame (n x p). Rows correspond to subjects, and columns to graph nodes.

method

Data transformation method. It can be one of the following:

  1. "npn" (default), performs nonparanormal(npn) or semiparametric Gaussian copula model (Liu et al, 2009), estimating the Gaussian copula by marginally transforming the variables using smooth ECDF functions. The npn distribution corresponds to the latent underlying multivariate normal distribution, preserving the conditional independence structure of the original variables.

  2. "spearman", computes a trigonometric trasformation of Spearman rho correlation for estimation of latent Gaussian correlations parameter of a nonparanormal distribution (Harris & Dorton (2013), and generates the data matrix with the exact same sample covariance matrix as the estimated one.

  3. "kendall", computes a trigonometric trasformation of Kendall tau correlation for estimation of latent Gaussian correlations parameter of a nonparanormal distribution (Harris & Dorton (2013), and generates the data matrix with the exact same sample covariance matrix as the estimated one.

  4. "polichoric", computes the polychoric correlation matrix and generates the data matrix with the exact same sample covariance matrix as the estimated one. The polychoric correlation (Olsson, 1974) is a measure of association between two ordinal variables. It is based on the assumption that two latent bivariate normally distributed random variables generate couples of ordinal scores. Tetrachoric (two binary variables) and biserial (an ordinal and a numeric variables) correlations are special cases.

  5. "lineals", performs optimal scaling in order to achieve linearizing transformations for each bivariate regression between pairwise variables for subsequent structural equation models using the resulting correlation matrix computed on the transformed data (de Leeuw, 1988).

  6. "mca", performs optimal scaling of categorical data by Multiple Correspondence Analysis (MCA, a.k.a homogeneity analysis) maximizing the first eigenvalues of the trasformed correlation matrix. The estimates of the corresponding structural parameters are consistent if the underlying latent space of the observed variables is unidimensional.

...

Currently ignored.
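
As an illustration of the ideas behind methods 1-3, the following minimal sketch (not the package's internal code) Gaussianizes each column of an (n x p) numeric matrix X with a shrunken ECDF, and estimates latent Gaussian correlations from Spearman's rho and Kendall's tau via the usual sine transformations. The helper name npn_ecdf is illustrative only.

# Minimal sketch (illustrative, not the package internals):
# marginal shrunken-ECDF Gaussianization of an (n x p) numeric matrix X
npn_ecdf <- function(X) {
  n <- nrow(X)
  Z <- apply(X, 2, function(x) qnorm(rank(x) / (n + 1)))  # normal scores
  scale(Z)                                                # standardize columns
}

# Latent Gaussian correlations from rank correlations
# (Liu et al., 2009; Harris & Drton, 2013)
R_spearman <- 2 * sin(pi / 6 * cor(X, method = "spearman"))
R_kendall  <- sin(pi / 2 * cor(X, method = "kendall"))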

Value

A list of 2 objects is returned:

  1. "data", the matrix (n x p) of n observations and p transformed variables or the matrix (n x p) of simulate observations based on the selected correlation matrix.

  2. "catscores", the category weights for "lineals" or "mca" methods or NULL otherwise.

Details

The nonparanormal transformation is computationally very efficient and only requires one ECDF pass over the data matrix. The polychoric correlation matrix is computed with the lavCor() function of the lavaan package. Optimal scaling ("lineals" and "mca") is performed with the lineals() and corAspect() functions of the aspect package (Mair and De Leeuw, 2008). Note that SEM fitting of the generated (fake) data must be done with a covariance-based method and bootstrap SEs, i.e., with SEMrun(..., algo = "ricf", n_rep = 1000).
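
For methods 2-4 the returned data matrix is simulated so that its sample correlation matrix matches the estimated latent correlation matrix exactly. One standard way to achieve this is whitening of Gaussian noise followed by Cholesky re-coloring; the sketch below is illustrative (the helper name rmvn_exact is not a package function) and assumes n > p and a positive definite target matrix R.

# Sketch: simulate an n x p Gaussian matrix whose sample correlation equals R exactly
# (illustrative helper, not part of the package; assumes n > p and R positive definite)
rmvn_exact <- function(n, R) {
  p <- ncol(R)
  Z <- scale(matrix(rnorm(n * p), n, p))  # centered, unit-variance noise
  Z <- Z %*% solve(chol(cov(Z)))          # whiten: sample covariance becomes I
  Z %*% chol(R)                           # re-color: sample covariance becomes R
}

# e.g., fake data reproducing Spearman-based latent correlations of X:
# fake <- rmvn_exact(nrow(X), 2 * sin(pi / 6 * cor(X, method = "spearman")))

Because such fake data are only constrained in their second moments, covariance-based fitting with bootstrap standard errors (algo = "ricf", n_rep = 1000) is recommended, as noted above.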

References

Liu H, Lafferty J, and Wasserman L (2009). The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research, 10(80), 2295-2328.

Harris N, and Drton M (2013). PC Algorithm for Nonparanormal Graphical Models. Journal of Machine Learning Research, 14(69), 3365-3383.

Olsson U (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.

Mair P, and De Leeuw J (2008). Scaling variables by optimizing correlational and non-correlational aspects in R. Journal of Statistical Software, 32(9), 1-23.

de Leeuw J (1988). Multivariate analysis with linearizable regressions. Psychometrika, 53, 437-454.

Author

Mario Grassi mario.grassi@unipv.it

Examples


#... with continuous ALS data
graph<- alsData$graph
data<- alsData$exprs; dim(data)
#> [1] 160 318
X<- data[, colnames(data) %in% V(graph)$name]; dim(X)
#> [1] 160  31

npn.data<- transformData(X, method="npn")
#> Conducting the nonparanormal transformation via shrunkun ECDF...done.
sem0.npn<- SEMrun(graph, npn.data$data)
#> NLMINB solver ended normally after 1 iterations 
#> 
#> deviance/df: 10.92504  srmr: 0.2858859 
#> 

mvnS.data<- transformData(X, method="spearman")
#> Simulating gaussian data via Spearman correlations...done.
sem0.mvnS<- SEMrun(graph, mvnS.data$data)
#> NLMINB solver ended normally after 1 iterations 
#> 
#> deviance/df: 11.71544  srmr: 0.2881886 
#> 

mvnK.data<- transformData(X, method="kendall")
#> Simulating gaussian data via Kendall correlations...done.
sem0.mvnK<- SEMrun(graph, mvnK.data$data)
#> NLMINB solver ended normally after 6 iterations 
#> 
#> deviance/df: 14.10983  srmr: 0.395726 
#> 

#...with ordinal (K=4 categories) ALS data
Xord <- data.frame(X)
Xord <- as.data.frame(lapply(Xord, cut, 4, labels = FALSE))
colnames(Xord) <- sub("X", "", colnames(Xord))

# \dontrun{

mvnP.data<- transformData(Xord, method="polychoric")
#> Simulating gaussian data via polychoric correlations...done.
sem0.mvnP<- SEMrun(graph, mvnP.data$data, algo="ricf", n_rep=1000)
#> RICF solver ended normally after 2 iterations 
#> 
#> deviance/df: 23.32914  srmr: 0.3188226 
#> 
#> Model randomization with B = 1000 bootstrap samples ...

# }

lin.data<- transformData(Xord, method="lineals")
#> Conducting the optimal (ordinal) linearizing transformation...
#> Warning: the standard deviation is zero
#> done.
sem0.lin<- SEMrun(graph, lin.data$data)
#> Estimating optimal shrinkage intensity lambda (correlation matrix): 0.1642 
#> 
#> NLMINB solver ended normally after 1 iterations 
#> 
#> deviance/df: 34.40295  srmr: 0.2453185 
#> 
lin.data$catscores; head(lin.data$data)
#> $`317`
#>         score
#> 1 -0.13521162
#> 2 -0.07583129
#> 3  0.07355762
#> 4  0.07355762
#> 
#> $`572`
#>          score
#> 1 -0.191746085
#> 2  0.003537308
#> 3  0.003537308
#> 4  0.123269847
#> 
#> $`581`
#>           score
#> 1 -0.1216547441
#> 2  0.0007229829
#> 3  0.1071747055
#> 4  0.1071747055
#> 
#> $`596`
#>         score
#> 1 -0.18630619
#> 2 -0.03480288
#> 3  0.08244315
#> 4  0.08244315
#> 
#> $`598`
#>         score
#> 1 -0.09628079
#> 2  0.01069357
#> 3  0.14889455
#> 4  0.14889455
#> 
#> $`836`
#>         score
#> 1 -0.18169318
#> 2 -0.04286281
#> 3  0.07004522
#> 4  0.15463583
#> 
#> $`842`
#>         score
#> 1 -0.17052695
#> 2 -0.11277710
#> 3  0.05186556
#> 4  0.05186556
#> 
#> $`54205`
#>          score
#> 1 -0.022997786
#> 2 -0.022997786
#> 3  0.005749447
#> 4  0.005749447
#> 
#> $`1616`
#>         score
#> 1 -0.15626481
#> 2  0.01940459
#> 3  0.06599235
#> 4  0.11576736
#> 
#> $`79139`
#>           score
#> 1 -6.505213e-18
#> 2 -6.505213e-18
#> 3 -6.505213e-18
#> 4 -6.505213e-18
#> 
#> $`5606`
#>          score
#> 1 -0.050817752
#> 2  0.006721351
#> 3  0.006721351
#> 4  0.091466747
#> 
#> $`5608`
#>          score
#> 1 2.602085e-18
#> 2 2.602085e-18
#> 3 2.602085e-18
#> 4 2.602085e-18
#> 
#> $`4217`
#>         score
#> 1 -0.15520107
#> 2 -0.02573987
#> 3  0.09240626
#> 4  0.10384974
#> 
#> $`5600`
#>           score
#> 1 -0.0575267058
#> 2  0.0007281861
#> 3  0.0007281861
#> 4  0.0007281861
#> 
#> $`6300`
#>         score
#> 1 -0.01692033
#> 2 -0.01692033
#> 3  0.04068079
#> 4  0.04068079
#> 
#> $`5603`
#>           score
#> 1 -1.105886e-17
#> 2 -1.105886e-17
#> 3 -1.105886e-17
#> 4 -1.105886e-17
#> 
#> $`1432`
#>          score
#> 1 8.182338e-18
#> 2 8.182338e-18
#> 3 8.182338e-18
#> 4 8.182338e-18
#> 
#> $`4744`
#>           score
#> 1 -0.0059010283
#> 2 -0.0059010283
#> 3  0.0003517169
#> 4  0.0003517169
#> 
#> $`4747`
#>          score
#> 1 -0.047142139
#> 2 -0.047142139
#> 3  0.001836707
#> 4  0.001836707
#> 
#> $`4741`
#>           score
#> 1 -0.0223512546
#> 2 -0.0223512546
#> 3  0.0008708281
#> 4  0.0008708281
#> 
#> $`5530`
#>          score
#> 1 -0.043859676
#> 2 -0.043859676
#> 3  0.002614153
#> 4  0.002614153
#> 
#> $`5532`
#>          score
#> 1 -0.091071226
#> 2 -0.091071226
#> 3  0.009421161
#> 4  0.009421161
#> 
#> $`5533`
#>          score
#> 1 -0.276543764
#> 2 -0.007196858
#> 3  0.033276472
#> 4  0.033276472
#> 
#> $`5534`
#>          score
#> 1 -0.038447032
#> 2 -0.038447032
#> 3  0.002291545
#> 4  0.002291545
#> 
#> $`5535`
#>           score
#> 1 -6.288373e-18
#> 2 -6.288373e-18
#> 3 -6.288373e-18
#> 4 -6.288373e-18
#> 
#> $`5630`
#>         score
#> 1 -0.12292007
#> 2  0.04662486
#> 3  0.04662486
#> 4  0.04662486
#> 
#> $`6647`
#>         score
#> 1 -0.08251265
#> 2 -0.08251265
#> 3  0.01750268
#> 4  0.01750268
#> 
#> $`7132`
#>          score
#> 1 -0.150056849
#> 2  0.009326866
#> 3  0.053775689
#> 4  0.053775689
#> 
#> $`7133`
#>          score
#> 1 -0.005233238
#> 2 -0.005233238
#> 3 -0.005233238
#> 4  0.050587967
#> 
#> $`10452`
#>           score
#> 1 -1.734723e-18
#> 2 -1.734723e-18
#> 3 -1.734723e-18
#> 4 -1.734723e-18
#> 
#> $`84134`
#>           score
#> 1 -3.676123e-18
#> 2 -3.676123e-18
#> 3 -3.676123e-18
#> 4 -3.676123e-18
#> 
#>          317         572          581        596        598        836
#> 1 -0.9275263 -0.04460376  1.534007954 -1.0395686 -0.1348408  0.5404794
#> 2 -0.9275263  2.41782613 -0.009116467 -1.0395686  1.2140546 -0.8832366
#> 3 -0.9275263 -0.04460376 -0.009116467 -1.0395686 -0.1348408  0.5404794
#> 4 -0.9275263 -0.04460376  1.534007954 -1.0395686 -0.1348408 -0.8832366
#> 5 -0.9275263 -0.04460376  1.534007954 -1.0395686 -0.1348408  0.5404794
#> 6 -0.9275263  2.41782613 -0.009116467  0.4388476  1.2140546 -0.8832366
#>          842       54205       1616        5606       4217         5600
#> 1  1.4220651 -0.07249776 -0.2446826 -0.08475302  0.3245673 -0.009182078
#> 2 -0.6539998 -0.07249776 -0.2446826  0.64078747  0.3245673 -0.009182078
#> 3 -0.6539998  0.28999105 -0.2446826 -0.08475302 -1.1651986 -0.009182078
#> 4 -0.6539998 -0.07249776 -0.2446826 -0.08475302 -1.1651986 -0.009182078
#> 5 -0.6539998 -0.07249776 -0.2446826 -0.08475302 -1.1651986 -0.009182078
#> 6  1.4220651 -0.07249776  1.9704243 -0.08475302 -1.1651986 -0.009182078
#>         6300         4744        4747        4741        5530       5532
#> 1  0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 2  0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 3 -0.5129652 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 4  0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 5  0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#> 6  0.2133572 -0.004434982 -0.02315999 -0.01098072 -0.03296321 -0.1187963
#>          5533        5534       5630       6647       7132       7133 79139
#> 1 -0.41960034 -0.02889528  1.5499632 -0.2207004 -0.1176073 0.06598862     3
#> 2  0.09074892 -0.02889528 -0.5879171 -0.2207004  1.8921449 0.06598862     4
#> 3  0.09074892 -0.02889528 -0.5879171  1.0404449 -0.6780856 0.06598862     2
#> 4 -0.41960034 -0.02889528  1.5499632 -0.2207004 -0.6780856 0.06598862     4
#> 5 -0.41960034 -0.02889528 -0.5879171 -0.2207004 -0.6780856 0.06598862     3
#> 6  0.09074892 -0.02889528 -0.5879171 -0.2207004 -0.6780856 0.06598862     3
#>   5608 5603 1432 5535 10452 84134
#> 1    4    4    3    3     2     3
#> 2    4    3    3    4     2     2
#> 3    3    4    4    4     2     3
#> 4    4    4    4    4     1     3
#> 5    3    3    4    3     1     3
#> 6    4    4    3    3     2     3

#...with nominal (K=4 categories) ALS data
mca.data<- transformData(Xord, method="mca")
#> Conducting the first solution of Multiple Correspondence Analysis...done.
sem0.mca<- SEMrun(graph, mca.data$data)
#> NLMINB solver ended normally after 6 iterations 
#> 
#> deviance/df: 9.120595  srmr: 0.3147939 
#> 
mca.data$catscores
#> $`317`
#>        score
#> 1 -0.1962657
#> 2 -0.2617196
#> 3 -0.1722545
#> 4  4.6708724
#> 
#> $`572`
#>         score
#> 1 -1.56811059
#> 2 -0.01214174
#> 3 -0.39796571
#> 4  2.17855892
#> 
#> $`581`
#>        score
#> 1 -0.5830986
#> 2 -0.3508265
#> 3  0.6334362
#> 4  3.9618753
#> 
#> $`596`
#>         score
#> 1 -0.37120214
#> 2 -0.32350875
#> 3  0.03534731
#> 4  4.98964541
#> 
#> $`598`
#>        score
#> 1 -0.4275410
#> 2 -0.3000455
#> 3  1.6425919
#> 4  9.0188630
#> 
#> $`836`
#>        score
#> 1 -0.8246498
#> 2  0.9622589
#> 3 -1.0752097
#> 4  0.9484153
#> 
#> $`842`
#>         score
#> 1 -0.37629484
#> 2 -0.87079806
#> 3 -0.05751661
#> 4  2.71162665
#> 
#> $`54205`
#>        score
#> 1 -4.2248124
#> 2 -0.3278173
#> 3  0.2640036
#> 4  0.4608450
#> 
#> $`1616`
#>        score
#> 1 -0.2933227
#> 2 -0.5602894
#> 3  1.0761805
#> 4  3.6211774
#> 
#> $`79139`
#>        score
#> 1 -3.8618504
#> 2  0.3795132
#> 3  0.1963661
#> 4  0.2805683
#> 
#> $`5606`
#>        score
#> 1 -0.7623373
#> 2 -0.4419482
#> 3  1.0495265
#> 4  3.5296781
#> 
#> $`5608`
#>         score
#> 1 -6.55312157
#> 2 -1.37203915
#> 3  0.06807267
#> 4  0.97343682
#> 
#> $`4217`
#>         score
#> 1 -2.47120556
#> 2  0.09442145
#> 3  0.50146382
#> 4  2.10891397
#> 
#> $`5600`
#>        score
#> 1 -7.3048616
#> 2 -1.9849719
#> 3  0.3058562
#> 4  0.1420784
#> 
#> $`6300`
#>        score
#> 1  1.5586231
#> 2 -0.7581470
#> 3 -0.1197511
#> 4  2.3544993
#> 
#> $`5603`
#>        score
#> 1 -5.3934562
#> 2 -2.6648240
#> 3  0.2065484
#> 4  0.2501853
#> 
#> $`1432`
#>        score
#> 1 -4.8281167
#> 2 -0.5646018
#> 3  0.5355894
#> 4  0.4203385
#> 
#> $`4744`
#>        score
#> 1 -5.9814189
#> 2 -2.7263541
#> 3  0.1956457
#> 4  0.2854548
#> 
#> $`4747`
#>        score
#> 1 -5.2331098
#> 2 -4.5697151
#> 3 -0.5415581
#> 4  0.2758942
#> 
#> $`4741`
#>         score
#> 1 -5.31340342
#> 2 -4.63982140
#> 3 -0.06801511
#> 4  0.31611824
#> 
#> $`5530`
#>        score
#> 1 -4.5208030
#> 2 -3.2836035
#> 3 -0.1797704
#> 4  0.3356702
#> 
#> $`5532`
#>         score
#> 1 -4.75775317
#> 2 -1.77981428
#> 3  0.05365221
#> 4  0.38187879
#> 
#> $`5533`
#>        score
#> 1  1.1082106
#> 2 -0.4455077
#> 3  0.9706363
#> 4 -1.5709647
#> 
#> $`5534`
#>         score
#> 1 -5.22236671
#> 2 -3.28835298
#> 3 -0.04944201
#> 4  0.35114642
#> 
#> $`5535`
#>        score
#> 1 -5.0434670
#> 2 -1.9436026
#> 3  0.4107327
#> 4  0.3970010
#> 
#> $`5630`
#>        score
#> 1 -0.5501326
#> 2 -0.1669097
#> 3  1.1752097
#> 4  6.3028266
#> 
#> $`6647`
#>        score
#> 1 -5.0962196
#> 2 -0.7961947
#> 3  0.3901020
#> 4  0.1976048
#> 
#> $`7132`
#>        score
#> 1 -0.9298468
#> 2 -0.6719772
#> 3  0.4159697
#> 4  2.3462125
#> 
#> $`7133`
#>        score
#> 1 -0.7326115
#> 2 -0.6018502
#> 3  0.1549589
#> 4  2.8883909
#> 
#> $`10452`
#>        score
#> 1 -3.5037678
#> 2 -0.3863463
#> 3  0.5991077
#> 4  0.1937679
#> 
#> $`84134`
#>        score
#> 1 -5.3007012
#> 2 -0.5510670
#> 3  0.3288807
#> 4  0.2508747
#> 

# plot colored graphs
#par(mfrow=c(3,2), mar=rep(1,4))
#gplot(sem0.npn$graph, l="fdp", main="ALS npm")
#gplot(sem0.mvnS$graph, l="fdp", main="ALS mvnS")
#gplot(sem0.mvnK$graph, l="fdp", main="ALS mvnK")
#gplot(sem0.mvnP$graph, l="fdp", main="ALS mvnP")
#gplot(sem0.lin$graph, l="fdp", main="ALS lin")
#gplot(sem0.mca$graph, l="fdp", main="ALS mca")