Creates an lmpDataList from a SummarizedExperiment or by manually defining the design, the outcomes and the model formula.
The lmpDataList serves as input to the lmpModelMatrix function to start the limpca modeling.
Usage
data2LmpDataList(
se = NULL,
assay_name = NULL,
outcomes = NULL,
design = NULL,
formula = NULL,
verbose = TRUE
)
Arguments
- se
A SummarizedExperiment object.
- assay_name
If not NULL (the default), a character string naming the assay to take from the SummarizedExperiment object se. If NULL, the first assay is selected.
- outcomes
If not NULL (the default), a numerical matrix with n observations and m response variables. The rownames must be non-NULL and match those of the design.
- design
If not NULL (the default), a data.frame with the experimental design of n observations and q explanatory variables. The rownames of design must match the rownames of outcomes.
- formula
If not NULL (the default), a character string with the formula that will be used to analyze the data. Only the right-hand side of the formula is necessary, e.g. "~ A + B". The variable names in the formula should match the column names of the design.
- verbose
If TRUE, prints useful information about the output list.
Value
A list with the following 3 named elements:
outcomes
An n x m matrix with the m response variables.
design
An n x q data.frame with the experimental design.
formula
A character string with the model formula.
Details
Data can be provided as a SummarizedExperiment (SE) object or by manually defining one or more of the elements outcomes, design and formula. If an SE is provided, outcomes corresponds to a transposed assay of the SE (by default the first one), design corresponds to the colData of the SE, and formula can be provided as a formula element in the S4Vectors::metadata of the SE (metadata(se)$formula).
In the output list, the outcomes are structured in the standard statistical fashion, i.e. with observations in rows and variables (features) in columns. If the outcomes argument is not NULL, it must be formatted that way (see Arguments).
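As a rough illustration of the orientation described above, the following base R sketch builds a small features-by-samples matrix (the layout an SE assay uses) and transposes it into the observations-by-variables layout expected for outcomes. The object names and values are made up for illustration, and this is not the package's internal code.

```r
# Toy assay laid out as in a SummarizedExperiment:
# features in rows, samples (observations) in columns.
# All names and values here are hypothetical.
assay_mat <- matrix(
    seq_len(6),
    nrow = 3, ncol = 2,
    dimnames = list(c("feat1", "feat2", "feat3"), c("obs1", "obs2"))
)

# Transposing gives observations in rows and variables (features)
# in columns, which is the layout described for `outcomes`.
outcomes <- t(assay_mat)

dim(outcomes)
rownames(outcomes)
```

A matrix passed directly through the outcomes argument would need to be in this second layout already.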
Note that the outcomes, design and formula arguments take priority when they are not NULL (e.g. if both the se and outcomes arguments are provided, the resulting outcomes matrix comes from the outcomes argument). The outcomes and design elements are mandatory.
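The priority rule can be pictured with a small base R sketch. The helper pick_element is hypothetical and only mimics the documented behaviour (an explicit non-NULL argument wins over the value derived from the SE). It is not taken from the limpca source.

```r
# Hypothetical helper: prefer the explicit argument when it is not NULL,
# otherwise fall back to the value derived from the SE.
pick_element <- function(explicit, from_se) {
    if (!is.null(explicit)) explicit else from_se
}

# Stand-in for a formula stored in metadata(se)$formula.
formula_from_se <- "~ Hippurate + Citrate"

# No explicit formula: the SE value is used.
f1 <- pick_element(NULL, formula_from_se)

# Explicit formula: it overrides the SE value.
f2 <- pick_element("~ Hippurate + Citrate + Time", formula_from_se)
```

The same logic would apply to outcomes and design under this reading of the rule.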
Multiple checks are performed to ensure that the data are correctly formatted:
- the rownames of design and outcomes should match
- the names of the model terms in the formula should match column names from the design
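For readers who want to pre-validate their inputs, the checks listed above can be approximated in base R. The function check_inputs below is a hypothetical stand-in, not limpca code. It assumes that matching rownames means identical names in identical order, which may be stricter than what the package actually requires.

```r
# Hypothetical stand-in for the kind of checks listed above (not limpca code).
check_inputs <- function(outcomes, design, formula) {
    # Row names must be present and agree between outcomes and design
    # (assumption: same names in the same order).
    rows_ok <- !is.null(rownames(outcomes)) &&
        identical(rownames(outcomes), rownames(design))
    # Every variable used in the formula must be a column of the design.
    vars <- all.vars(stats::as.formula(formula))
    terms_ok <- all(vars %in% colnames(design))
    c(rows_ok = rows_ok, terms_ok = terms_ok)
}

# Toy data with made-up names.
outcomes <- matrix(
    rnorm(8),
    nrow = 4,
    dimnames = list(paste0("s", 1:4), c("y1", "y2"))
)
design <- data.frame(
    A = factor(c("a", "a", "b", "b")),
    B = factor(c("x", "y", "x", "y")),
    row.names = paste0("s", 1:4)
)

ok <- check_inputs(outcomes, design, "~ A + B")   # both checks pass
bad <- check_inputs(outcomes, design, "~ A + C")  # C is not in the design
```

Using all.vars() means interaction terms such as A:B are checked through their component variables.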
Examples
library(limpca)
data(UCH)
### create the dataset manually
res <- data2LmpDataList(
    outcomes = UCH$outcomes,
    design = UCH$design[, 1, drop = FALSE],
    formula = "~ Hippurate"
)
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate
#> | design variables (1):
#> * Hippurate (factor)
### create the dataset from a SummarizedExperiment
library(SummarizedExperiment)
se <- SummarizedExperiment(
    assays = list(
        counts = t(UCH$outcomes),
        counts2 = t(UCH$outcomes * 2)
    ),
    colData = UCH$design,
    metadata = list(formula = "~ Hippurate + Citrate")
)
res <- data2LmpDataList(se, assay_name = "counts2")
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate + Citrate
#> | design variables (5):
#> * Hippurate (factor)
#> * Citrate (factor)
#> * Dilution (factor)
#> * Day (factor)
#> * Time (factor)
# changing the formula:
res <- data2LmpDataList(
    se,
    assay_name = "counts2",
    formula = "~ Hippurate + Citrate + Time"
)
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate + Citrate + Time
#> | design variables (5):
#> * Hippurate (factor)
#> * Citrate (factor)
#> * Dilution (factor)
#> * Day (factor)
#> * Time (factor)