Creates the lmpDataList from a SummarizedExperiment object or from a manually defined design, outcomes matrix and model formula. The lmpDataList is the input of the lmpModelMatrix function, which starts the limpca modeling.

Usage

data2LmpDataList(
  se = NULL,
  assay_name = NULL,
  outcomes = NULL,
  design = NULL,
  formula = NULL,
  verbose = TRUE
)

Arguments

se

A SummarizedExperiment object.

assay_name

If not NULL (default), a character string naming the assay from the SummarizedExperiment object se. If NULL, the first assay is selected.

outcomes

If not NULL (default), a numerical matrix with n observations and m response variables. The row names must be non-NULL and match those of the design.

design

If not NULL (default), a data.frame with the experimental design of n observations and q explanatory variables. The row names of design have to match the row names of outcomes.

formula

If not NULL (default), a character string with the formula that will be used to analyze the data. Only the right-hand side of the formula is necessary, e.g. "~ A + B". The variable names in the formula should match the column names of the design.

verbose

If TRUE, prints a summary of the returned list (dimensions of the outcomes, formula and design variables).

Value

A list with the 3 following named elements:

outcomes

An n x m matrix with the m response variables.

design

An n x q data.frame with the experimental design.

formula

A character string with the model formula.

Details

Data can be supplied as a SummarizedExperiment (SE) object or by manually defining one or more of the outcomes, design and formula elements. If an SE is provided, the outcomes correspond to a transposed assay of the SE (by default the first one), the design corresponds to the colData of the SE, and the formula can be provided as a formula element in the S4Vectors::metadata of the SE (metadata(se)$formula).

In the returned list, the outcomes are structured in the standard statistical fashion, i.e. with observations in rows and variables (features) in columns. If the outcomes argument is not NULL, it has to be formatted that way (see Arguments).
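The correspondence between an SE and the three list elements can be sketched by hand. The toy data and the names outcomes_from_se, design_from_se and formula_from_se below are illustrative assumptions, not part of limpca; the sketch only uses the SummarizedExperiment accessors assay(), colData() and metadata(), and is not meant to reproduce the function's internal code.

```r
library(SummarizedExperiment)

# Toy data (assumption): 4 observations x 3 response variables
outcomes <- matrix(
  seq_len(12),
  nrow = 4,
  dimnames = list(paste0("obs", 1:4), paste0("var", 1:3))
)
design <- data.frame(
  A = factor(c("a1", "a1", "a2", "a2")),
  row.names = rownames(outcomes)
)

# A SummarizedExperiment stores features in rows and samples in columns,
# hence the transpose when building the assay
se <- SummarizedExperiment(
  assays = list(counts = t(outcomes)),
  colData = design,
  metadata = list(formula = "~ A")
)

# Manual equivalents of the three elements described above
outcomes_from_se <- t(assay(se, "counts")) # observations back in rows
design_from_se <- as.data.frame(colData(se)) # one row per observation
formula_from_se <- metadata(se)$formula # character string
```

Transposing the assay puts the observations back in rows, so the row names of outcomes_from_se and design_from_se line up.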

Non-NULL outcomes, design and formula arguments take priority over the corresponding elements of se (e.g. if both the se and outcomes arguments are provided, the resulting outcomes matrix comes from the outcomes argument). The outcomes and design elements are mandatory.

Multiple checks are performed to ensure that the data are correctly formatted:

  • the rownames of design and outcomes should match

  • the names of the model terms in the formula should match column names from the design
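The two checks listed above can be approximated in base R. The helper check_lmp_inputs() below is hypothetical and not part of limpca; it assumes that the row names must be identical and in the same order, which may be stricter or looser than the package's own checks.

```r
# Hypothetical helper, not part of limpca: mimics the two checks listed above
check_lmp_inputs <- function(outcomes, design, formula) {
  # Check 1: row names of outcomes are non-NULL and identical to those of design
  # (assumption: same names in the same order)
  rows_ok <- !is.null(rownames(outcomes)) &&
    identical(rownames(outcomes), rownames(design))
  # Check 2: every variable used in the formula is a column of the design
  # (all.vars() reduces an interaction such as A:B to its variables A and B)
  vars <- all.vars(stats::as.formula(formula))
  terms_ok <- all(vars %in% colnames(design))
  c(rownames_match = rows_ok, formula_terms_in_design = terms_ok)
}

# Toy inputs (assumption): 4 observations, 2 responses, 2 factors
outcomes <- matrix(
  seq_len(8),
  nrow = 4,
  dimnames = list(paste0("obs", 1:4), c("y1", "y2"))
)
design <- data.frame(
  A = factor(c("a1", "a1", "a2", "a2")),
  B = factor(c("b1", "b2", "b1", "b2")),
  row.names = rownames(outcomes)
)

ok <- check_lmp_inputs(outcomes, design, "~ A + B + A:B")
bad <- check_lmp_inputs(outcomes, design, "~ A + C") # C is not in the design
```

Here both checks pass for ok, while bad fails the formula check because the design has no column named C.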

Examples


data(UCH)

### create the dataset manually

res <- data2LmpDataList(
  outcomes = UCH$outcomes,
  design = UCH$design[, 1, drop = FALSE],
  formula = "~ Hippurate"
)
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate
#> | design variables (1): 
#> * Hippurate (factor)

### create the dataset from a SummarizedExperiment

library(SummarizedExperiment)

se <- SummarizedExperiment(
  assays = list(
    counts = t(UCH$outcomes),
    counts2 = t(UCH$outcomes * 2)
  ),
  colData = UCH$design,
  metadata = list(formula = "~ Hippurate + Citrate")
)

res <- data2LmpDataList(se, assay_name = "counts2")
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate + Citrate
#> | design variables (5): 
#> * Hippurate (factor)
#> * Citrate (factor)
#> * Dilution (factor)
#> * Day (factor)
#> * Time (factor)

### override the formula stored in the SummarizedExperiment
res <- data2LmpDataList(
  se,
  assay_name = "counts2",
  formula = "~ Hippurate + Citrate + Time"
)
#> | dim outcomes: 34x600
#> | formula: ~ Hippurate + Citrate + Time
#> | design variables (5): 
#> * Hippurate (factor)
#> * Citrate (factor)
#> * Dilution (factor)
#> * Day (factor)
#> * Time (factor)