Creates an lmpDataList from a SummarizedExperiment or by manually defining the design, the outcomes and the model formula.
The lmpDataList serves as input to the lmpModelMatrix function, which starts the limpca modeling.
Usage
data2LmpDataList(
se = NULL,
assay_name = NULL,
outcomes = NULL,
design = NULL,
formula = NULL,
verbose = TRUE
)
Arguments
- se
A SummarizedExperiment object.
- assay_name
If not NULL (default), a character string naming the assay from the SummarizedExperiment object se. If NULL, the first assay is selected.
- outcomes
If not NULL (default), a numerical matrix with n observations and m response variables. The rownames need to be non-NULL and match those of the design matrix.
- design
If not NULL (default), a data.frame with the experimental design of n observations and q explanatory variables. The rownames of design have to match the rownames of outcomes.
- formula
If not NULL (default), a character string with the formula that will be used to analyze the data. Only the right part of the formula is necessary, e.g. "~ A + B". The variable names in the formula should match the column names of the design.
- verbose
If TRUE, prints useful information about the output list.
Value
A list with the 3 following named elements:
- outcomes
An n x m matrix with the m response variables.
- design
An n x q data.frame with the experimental design.
- formula
A character string with the model formula.
Details
Data can be included as a SummarizedExperiment (SE) object or by manually defining one or multiple
elements of outcomes, design and formula. If a SE is provided,
the outcomes correspond to a transposed assay of the SE (by default the first one),
the design corresponds to the colData of the SE and the formula can be provided as a
formula element in the S4Vectors::metadata of SE (metadata(se)$formula).
In the outputted list, the outcomes are structured in a standard statistical fashion,
i.e. with observations in rows and the variables (features) in columns.
If the outcomes argument is not NULL, it has to be formatted that way (see Arguments).
Note that the outcomes, design and formula arguments take priority over se
when they are not NULL (e.g. if both the se and outcomes arguments are provided,
the resulting outcomes matrix is taken from the outcomes argument). The outcomes and design elements are mandatory.
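The orientation convention and the argument priority described above can be illustrated with a small base R sketch. It does not call limpca: the object names (assay_like, outcomes_manual) and the helper pick() are hypothetical, and the logic only mimics the documented behaviour under simplified assumptions.

```r
# Toy assay stored the SummarizedExperiment way:
# features in rows, samples (observations) in columns.
assay_like <- matrix(
    seq_len(12),
    nrow = 4, # 4 features
    dimnames = list(paste0("feat", 1:4), paste0("obs", 1:3))
)

# Transposing gives the "standard statistical" layout:
# observations in rows, features in columns.
outcomes_from_se <- t(assay_like)
dim(outcomes_from_se) # 3 observations x 4 features

# Hypothetical helper mimicking the documented priority rule:
# an explicitly supplied argument wins over the value derived from `se`.
pick <- function(explicit, from_se) {
    if (!is.null(explicit)) explicit else from_se
}

outcomes_manual <- outcomes_from_se * 2

# The explicit argument is used when it is not NULL ...
used_explicit <- identical(pick(outcomes_manual, outcomes_from_se), outcomes_manual)
# ... otherwise the SE-derived matrix is the fallback.
used_fallback <- identical(pick(NULL, outcomes_from_se), outcomes_from_se)
```

A manually supplied outcomes matrix is expected to follow the same observations-by-features layout as outcomes_from_se here.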
Multiple checks are performed to ensure that the data are correctly formatted:
- the rownames of design and outcomes should match
- the names of the model terms in the formula should match column names from the design
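As a rough illustration of what these two checks amount to, the following base R sketch tests a toy dataset by hand. It is not the package's implementation: the toy objects (outcomes_toy, design_toy, formula_toy) are hypothetical, and the strict identical() comparison of rownames is an assumption about how "match" could be interpreted.

```r
# Toy data; all names are illustrative only.
outcomes_toy <- matrix(
    rnorm(6),
    nrow = 3,
    dimnames = list(c("s1", "s2", "s3"), c("v1", "v2"))
)
design_toy <- data.frame(
    A = factor(c("a", "b", "a")),
    B = factor(c("x", "x", "y")),
    row.names = c("s1", "s2", "s3")
)
formula_toy <- "~ A + B"

# Check 1: rownames of outcomes are non-NULL and match those of the design
# (assumed here to mean same names in the same order).
rownames_ok <- !is.null(rownames(outcomes_toy)) &&
    identical(rownames(outcomes_toy), rownames(design_toy))

# Check 2: every variable named in the formula is a column of the design.
formula_vars <- all.vars(stats::as.formula(formula_toy))
terms_ok <- all(formula_vars %in% colnames(design_toy))

rownames_ok
terms_ok
```

Running the same kind of checks on real data before calling data2LmpDataList can help locate a mismatch quickly.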
Examples
data(UCH)
### create the dataset manually
res <- data2LmpDataList(
outcomes = UCH$outcomes,
design = UCH$design[, 1, drop = FALSE], formula = "~ Hippurate"
)
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate
#> | design variables (1):
#> * Hippurate (factor)
### create the dataset from a SummarizedExperiment
library(SummarizedExperiment)
se <- SummarizedExperiment(
assays = list(
counts = t(UCH$outcomes),
counts2 = t(UCH$outcomes * 2)
), colData = UCH$design,
metadata = list(formula = "~ Hippurate + Citrate")
)
res <- data2LmpDataList(se, assay_name = "counts2")
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate + Citrate
#> | design variables (5):
#> * Hippurate (factor)
#> * Citrate (factor)
#> * Dilution (factor)
#> * Day (factor)
#> * Time (factor)
# changing the formula:
res <- data2LmpDataList(se,
assay_name = "counts2",
formula = "~ Hippurate + Citrate + Time"
)
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate + Citrate + Time
#> | design variables (5):
#> * Hippurate (factor)
#> * Citrate (factor)
#> * Dilution (factor)
#> * Day (factor)
#> * Time (factor)