Creates a sub-network of perturbed edges obtained either from the output of SEMace, comparable to the procedure in Jablonski et al. (2022), or from the output of SEMrun with two groups and the CGGM solver, comparable to Algorithm 2 in Belyaeva et al. (2021).
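The idea of a differential causal effect can be illustrated with a simplified linear sketch. This is not SEMace's implementation. In each group, the ACE of x on y is estimated by regressing y on x plus the parents of x (the back-door adjustment set), and the two group estimates are compared with a z-test. The simulated data, the helper names sim_group() and ace_fit(), and the z-test are assumptions for illustration only.

```r
# Simplified linear sketch (not SEMace): estimate the ACE of x on y in each
# group via back-door adjustment for the parents of x, then compare groups.
set.seed(1)

# Hypothetical data generator: z -> x, z -> y, x -> y with effect b_xy
sim_group <- function(n, b_xy) {
  z <- rnorm(n)
  x <- 0.8 * z + rnorm(n)
  y <- b_xy * x + 0.5 * z + rnorm(n)
  data.frame(z = z, x = x, y = y)
}

# Regress y on x plus the parent set of x ({z}); the x coefficient is the ACE
ace_fit <- function(d) {
  cf <- summary(lm(y ~ x + z, data = d))$coefficients
  c(est = cf["x", "Estimate"], se = cf["x", "Std. Error"])
}

controls <- ace_fit(sim_group(2000, b_xy = 0.2))
cases    <- ace_fit(sim_group(2000, b_xy = 0.7))

# Differential causal effect and a two-sample z-test (independent groups assumed)
d_ace  <- unname(cases["est"] - controls["est"])
z_stat <- d_ace / sqrt(cases[["se"]]^2 + controls[["se"]]^2)
p_val  <- 2 * pnorm(-abs(z_stat))
cat("ACE difference:", round(d_ace, 3), " p-value:", signif(p_val, 3), "\n")
```

Because z is the only parent of x in this toy graph, adjusting for it removes the confounding path x <- z -> y, so the estimated difference should be close to the true value 0.7 - 0.2 = 0.5.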
To speed up computation on large graphs, users can split the network into clusters and choose the topological clustering method (see clusterGraph).
SEMrun is then applied iteratively to each cluster with more than 10 and fewer than 500 nodes, and the results are combined into a graph with the full list of perturbed edges.
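The clustering step can be sketched with igraph. This is a minimal illustration under stated assumptions, not SEMgraph's code: the mapping from type codes to igraph functions follows the descriptions of the type argument, cluster size is assumed to be counted in nodes, and the helper names cluster_by_type() and select_clusters() are hypothetical. The tree-based "tahc" option is not covered.

```r
library(igraph)

# Hypothetical mapping from SEMdci `type` codes to igraph community-detection
# functions (assumption: SEMgraph's internal calls and arguments may differ).
cluster_by_type <- function(g, type = "fgc") {
  # Most of these methods expect an undirected graph; mutual edges are merged
  ug <- as.undirected(g, mode = "collapse")
  switch(type,
    wtc = cluster_walktrap(ug),
    ebc = cluster_edge_betweenness(ug),
    fgc = cluster_fast_greedy(ug),
    lbc = cluster_label_prop(ug),
    lec = cluster_leading_eigen(ug),
    loc = cluster_louvain(ug),
    opc = cluster_optimal(ug),
    sgc = cluster_spinglass(ug),
    stop("unsupported type: ", type)  # e.g. "tahc" or "none"
  )
}

# Keep only clusters with more than `min_size` and fewer than `max_size` nodes,
# returning each one as an induced subgraph of the original (directed) graph.
select_clusters <- function(g, type = "fgc", min_size = 10, max_size = 500) {
  memb <- as.vector(membership(cluster_by_type(g, type)))
  sizes <- table(memb)
  keep <- as.integer(names(sizes)[sizes > min_size & sizes < max_size])
  lapply(keep, function(k) induced_subgraph(g, which(memb == k)))
}
```

For example, on a graph made of two 20-node cliques and one 5-node clique, select_clusters() should return the two 20-node subgraphs and drop the small one. Each retained subgraph would then go to the model-fitting step. Recent igraph releases may suggest as_undirected() in place of as.undirected().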
SEMdci(graph, data, group, type = "ace", method = "BH", alpha = 0.05, ...)
graph: Input network as an igraph object.
data: A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables).
group: A binary vector as long as the number of subjects (rows of data). Each element must be 1 for cases and 0 for control subjects.
type: Either Average Causal Effect (ACE) estimation with two groups, using the "parents" (back-door) adjustment set and "direct" effects (type = "ace", default), or the two-group CGGM solver combined with a clustering method.
If type = "tahc", network modules are generated with the tree agglomerative hierarchical clustering method. Non-tree clustering methods from the igraph package are selected with type = "wtc" (walktrap community structure with short random walks), type = "ebc" (edge betweenness clustering), type = "fgc" (fast greedy method), type = "lbc" (label propagation method), type = "lec" (leading eigenvector method), type = "loc" (multi-level optimization), type = "opc" (optimal community structure), or type = "sgc" (spinglass statistical mechanics).
With type = "none", the network structure is not broken into clusters.
method: Multiple testing correction method, one of the values available in p.adjust. By default, method = "BH" (i.e., FDR multiple testing correction).
alpha: Significance level (default = 0.05) for edge set selection.
...: Currently ignored.
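The interaction between method and alpha can be sketched with base R's p.adjust() on a vector of per-edge p-values. This assumes edges are retained when their adjusted p-value falls below alpha; the exact selection rule inside SEMdci is an assumption here, and the edge names and p-values are made up for illustration.

```r
# Illustrative (made-up) p-values for five candidate edges
pvals <- c("A->B" = 0.001, "B->C" = 0.008, "C->D" = 0.039,
           "D->E" = 0.041, "E->F" = 0.200)
alpha <- 0.05

# Benjamini-Hochberg (FDR) adjustment, the default `method` of SEMdci
padj <- p.adjust(pvals, method = "BH")

# Assumed selection rule: keep edges whose adjusted p-value is below alpha
selected <- names(padj)[padj < alpha]
print(round(padj, 5))
print(selected)
```

Here only A->B and B->C are kept: C->D and D->E have raw p-values below 0.05, but both adjust to 0.05125 under BH.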
Returns an igraph object: the sub-network of perturbed edges.
Belyaeva A, Squires C, Uhler C (2021). DCI: learning causal differences between gene regulatory networks. Bioinformatics, 37(18): 3067–3069. <https://doi.org/10.1093/bioinformatics/btab167>
Jablonski K, Pirkl M, Ćevid D, Bühlmann P, Beerenwinkel N (2022). Identifying cancer pathway dysregulations using differential causal effects. Bioinformatics, 38(6): 1550–1559. <https://doi.org/10.1093/bioinformatics/btab847>
# \dontrun{
# Load the SEMdata package for ALS data with 17K genes:
# devtools::install_github("fernandoPalluzzi/SEMdata")
# library(SEMdata)
# Nonparanormal (npn) transformation
library(SEMgraph)  # provides alsData, kegg, properties(), SEMdci(), gplot()
library(huge)
data.npn<- huge.npn(alsData$exprs)
#> Conducting the nonparanormal (npn) transformation via shrunkun ECDF....done.
dim(data.npn) # 160 x 318 here; 160 x 17695 with the SEMdata ALS set
#> [1] 160 318
# Extract KEGG interactome (max component)
KEGG<- properties(kegg)[[1]]
#> Frequency distribution of graph components
#>
#> n.nodes n.graphs
#> 1 4910 1
#>
#> Percent of vertices in the giant component: 100 %
#>
#> is.simple is.dag is.directed is.weighted
#> TRUE FALSE TRUE FALSE
#>
#> which.mutual.FALSE which.mutual.TRUE
#> 41824 3376
summary(KEGG)
#> IGRAPH b806e5d DN-- 4910 45200 --
#> + attr: name (v/c)
# KEGG modules with ALS perturbed edges using fast greedy clustering
gD<- SEMdci(KEGG, data.npn, alsData$group, type="fgc")
#> modularity = 0.556916
#>
#> Community sizes
#> 43 44 45 46 47 36 37 40 41 42 28 29 32 34 35 38
#> 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4
#> 39 18 24 25 26 30 33 17 20 27 31 16 22 19 21 9
#> 4 6 6 6 6 7 7 8 8 10 11 12 12 16 17 20
#> 23 12 13 15 10 14 11 7 8 2 1 5 4 6 3
#> 21 22 24 25 31 33 39 82 97 245 387 546 763 829 1561
#>
#> fit cluster = 1
#> fit cluster = 2
#> fit cluster = 3
#> fit cluster = 4
#> fit cluster = 5
#> fit cluster = 6
#> fit cluster = 7
#> fit cluster = 8
#> fit cluster = 9
#> fit cluster = 10
#> fit cluster = 11
#> fit cluster = 12
#> fit cluster = 13
#> fit cluster = 14
#> fit cluster = 15
#> fit cluster = 16
#> fit cluster = 19
#> fit cluster = 21
#> fit cluster = 22
#> fit cluster = 23
#> fit cluster = 27
#> fit cluster = 31
#> Done.
summary(gD)
#> IGRAPH bd187f2 DN-- 2 1 --
#> + attr: name (v/c)
gcD<- properties(gD)
#> Frequency distribution of graph components
#>
#> n.nodes n.graphs
#> 1 2 1
#>
#> Percent of vertices in the giant component: 100 %
#>
#> is.simple is.dag is.directed is.weighted
#> TRUE TRUE TRUE FALSE
#>
#> which.mutual.FALSE
#> 1
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(2,4))
# gD may have fewer than four components (here it has only one),
# so plot only the components that actually exist
titles <- c("max component", "2nd component", "3rd component", "4th component")
for (i in seq_len(min(length(gcD), 4))) {
  gplot(gcD[[i]], l="fdp", main=titles[i])
}
par(old.par)
# }