Title: | Workflows for Mass-Spectrometry Based Non-Target Analysis |
---|---|
Description: | Provides an easy-to-use interface to a mass spectrometry based non-target analysis workflow. Various (open-source) tools are combined which provide algorithms for extraction and grouping of features, extraction of MS and MS/MS data, automatic formula and compound annotation and grouping related features to components. In addition, various tools are provided for e.g. data preparation and cleanup, plotting results and automatic reporting. |
Authors: | Rick Helmus [aut, cre] , Olaf Brock [ctb] , Vittorio Albergamo [ctb] , Andrea Brunner [ctb] , Emma Schymanski [ctb] , Bas van de Velde [ctb] |
Maintainer: | Rick Helmus <[email protected]> |
License: | GPL-3 |
Version: | 2.3.3 |
Built: | 2024-11-21 08:39:15 UTC |
Source: | https://github.com/rickhelmus/patRoon |
Provides an easy-to-use interface to a mass spectrometry based non-target analysis workflow. Various (open-source) tools are combined which provide algorithms for extraction and grouping of features, extraction of MS and MS/MS data, automatic formula and compound annotation and grouping related features to components. In addition, various tools are provided for e.g. data preparation and cleanup, plotting results and automatic reporting.
The following package options (see options) can be set:
patRoon.checkCentroided: If set to TRUE (the default) then the analysis files are verified to be centroided before loading any MS data. While these checks are optimized and cached, it may be useful to set this option to FALSE when processing very large numbers of analyses.
patRoon.cache.mode: A character setting the current caching mode: "save" and "load" will only save/load results to/from the cache, "both" (default) will do both and "none" completely disables caching. This option can be changed anytime, which might be useful, for instance, to temporarily disable cached results before running a function.
patRoon.cache.fileName: A character specifying the name of the cache file (default is ‘cache.sqlite’).
patRoon.MP.maxProcs: The maximum number of processes that should be initiated in parallel. A good starting point is the number of physical cores, which is the default as detected by detectCores. This option is only used when patRoon.MP.method="classic".
patRoon.MP.method: Either "classic" or "future". The former is the default and uses processx to execute multiple commands in parallel. When "future" the future.apply package is used for parallelization, which is especially useful for e.g. cluster computing.
patRoon.MP.futureSched: Sets the future.scheduling function argument for future_lapply. Only used if patRoon.MP.method="future".
patRoon.MP.logPath: The path used for logging of output from commands executed by multiprocess. Set to FALSE to disable logging.
patRoon.path.pwiz: The path in which the ProteoWizard binaries are installed. If unset an attempt is made to find this directory from the Windows registry and PATH environment variable.
patRoon.path.GenForm: The path to the GenForm executable. If not set (the default) the internal GenForm binary is used. Only set if you want to override the executable.
patRoon.path.MetFragCL: The complete file path to the MetFrag CL ‘jar’ to be used by generateCompoundsMetFrag. Example: "C:/MetFrag2.4.2-CL.jar".
patRoon.path.MetFragCompTox: The complete file path to the CompTox database ‘csv’ file. See generateCompounds for more details.
patRoon.path.MetFragPubChemLite: The complete file path to the PubChemLite database ‘csv’ file. See generateCompounds for more details.
patRoon.path.SIRIUS: The directory in which the SIRIUS binaries are installed. Used by all functions that interface with SIRIUS, such as generateFormulasSIRIUS and generateCompoundsSIRIUS. Example: "C:/sirius-win64-3.5.1". Note that the location of the binaries differs for each operating system.
patRoon.path.OpenMS: The path in which the OpenMS binaries are installed.
patRoon.path.pngquant: The path of the pngquant binary that is used when optimizing ‘.png’ plots generated by reportHTML (with optimizePng set to TRUE). If the binary can be located through the PATH environment variable this option can remain empty. Note that some of the functionality of reportHTML only locates the binary through the PATH environment variable, hence, it is recommended to set up PATH instead.
patRoon.path.obabel: The path in which the OpenBabel binaries are installed.
patRoon.path.BiotransFormer: The full file path to the biotransformer ‘.jar’ command line utility. This needs to be set when generateTPsBioTransformer is used. For more details see https://bitbucket.org/djoumbou/biotransformer/src/master.
Most external dependencies are provided by patRoonExt or otherwise found in the system environment PATH variable. However, the patRoon.path.* options should be set if this fails or you want to override the location. The verifyDependencies function can be used to assess if dependencies are found.
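As a hedged illustration of the options listed above, the sketch below sets a few of them with base R's options() and temporarily disables caching around one step. The option names and allowed values are taken from the list above; the chosen values (for instance four processes) are arbitrary assumptions, not recommendations.

```r
# Set a few patRoon options for the current R session.
# Option names come from the list above; the values are illustrative only.
options(
    patRoon.checkCentroided = FALSE, # skip centroid checks for very large batches
    patRoon.MP.method = "classic",   # the documented default method
    patRoon.MP.maxProcs = 4          # assumption: a machine with 4 physical cores
)

# Temporarily disable caching around a single step, then restore the old mode.
# options() invisibly returns the previous values of the options it changes.
oldOpts <- options(patRoon.cache.mode = "none")
# ... run the workflow function whose cached results should be ignored ...
options(oldOpts)
```

Restoring the value returned by options() avoids hard-coding the previous caching mode.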
Maintainer: Rick Helmus [email protected] (ORCID)
Other contributors:
Olaf Brock (ORCID) [contributor]
Vittorio Albergamo (ORCID) [contributor]
Andrea Brunner (ORCID) [contributor]
Emma Schymanski (ORCID) [contributor]
Bas van de Velde (ORCID) [contributor]
Contains data for compound annotations for feature groups.
addFormulaScoring(compounds, formulas, updateScore = FALSE, formulaScoreWeight = 1)

## S4 method for signature 'compounds'
defaultExclNormScores(obj)

## S4 method for signature 'compounds'
show(object)

## S4 method for signature 'compounds'
identifiers(compounds)

## S4 method for signature 'compounds'
filter(
  obj, minExplainedPeaks = NULL, minScore = NULL, minFragScore = NULL,
  minFormulaScore = NULL, scoreLimits = NULL, ...
)

## S4 method for signature 'compounds'
addFormulaScoring(compounds, formulas, updateScore = FALSE, formulaScoreWeight = 1)

## S4 method for signature 'compounds'
getMCS(obj, index, groupName)

## S4 method for signature 'compounds'
plotStructure(obj, index, groupName, width = 500, height = 500)

## S4 method for signature 'compounds'
plotScores(
  obj, index, groupName, normalizeScores = "max",
  excludeNormScores = defaultExclNormScores(obj), onlyUsed = TRUE
)

## S4 method for signature 'compounds'
annotatedPeakList(
  obj, index, groupName, MSPeakLists, formulas = NULL, onlyAnnotated = FALSE
)

## S4 method for signature 'compounds'
plotSpectrum(
  obj, index, groupName, MSPeakLists, formulas = NULL, plotStruct = FALSE,
  title = NULL, specSimParams = getDefSpecSimParams(), mincex = 0.9,
  xlim = NULL, ylim = NULL, maxMolSize = c(0.2, 0.4), molRes = c(100, 100), ...
)

## S4 method for signature 'compounds'
consensus(
  obj, ..., absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL,
  uniqueOuter = FALSE, rankWeights = 1, labels = NULL
)

## S4 method for signature 'compoundsSet'
show(object)

## S4 method for signature 'compoundsSet'
delete(obj, i, j, ...)

## S4 method for signature 'compoundsSet,ANY,missing,missing'
x[i, j, ..., sets = NULL, updateConsensus = FALSE, drop = TRUE]

## S4 method for signature 'compoundsSet'
filter(obj, ..., sets = NULL, updateConsensus = FALSE, negate = FALSE)

## S4 method for signature 'compoundsSet'
plotSpectrum(
  obj, index, groupName, MSPeakLists, formulas = NULL, plotStruct = FALSE,
  title = NULL, specSimParams = getDefSpecSimParams(), mincex = 0.9,
  xlim = NULL, ylim = NULL, maxMolSize = c(0.2, 0.4), molRes = c(100, 100),
  perSet = TRUE, mirror = TRUE, ...
)

## S4 method for signature 'compoundsSet'
addFormulaScoring(compounds, formulas, updateScore = FALSE, formulaScoreWeight = 1)

## S4 method for signature 'compoundsSet'
annotatedPeakList(obj, index, groupName, MSPeakLists, formulas = NULL, ...)

## S4 method for signature 'compoundsSet'
consensus(
  obj, ..., absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL,
  uniqueOuter = FALSE, rankWeights = 1, labels = NULL, filterSets = FALSE,
  setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE
)

## S4 method for signature 'compoundsSet'
unset(obj, set)

## S4 method for signature 'compoundsConsensusSet'
unset(obj, set)

## S4 method for signature 'compoundsSIRIUS'
delete(obj, i = NULL, j = NULL, ...)
formulas |
The |
updateScore , formulaScoreWeight
|
If |
obj , object , compounds , x
|
The |
minExplainedPeaks , scoreLimits
|
Passed to the
|
minScore , minFragScore , minFormulaScore
|
Minimum overall score, in-silico fragmentation score and formula score,
respectively. Set to |
... |
For For for For For sets workflow methods: further arguments passed to the base |
index |
The numeric index of the candidate structure. For For |
groupName |
The name of the feature group (or feature groups when comparing spectra) to which the candidate belongs. |
width , height
|
The dimensions (in pixels) of the raster image that should be plotted. |
normalizeScores |
A |
excludeNormScores |
A
For |
onlyUsed |
If |
MSPeakLists |
The |
onlyAnnotated |
Set to |
plotStruct |
If |
title |
The title of the plot. If |
specSimParams |
A named |
mincex |
The formula annotation labels are automatically scaled. The |
xlim , ylim
|
Sets the plot size limits used by
|
maxMolSize |
Numeric vector of size two with the maximum width/height of the candidate structure (relative to the plot size). |
molRes |
Numeric vector of size two with the resolution of the candidate structure (in pixels). |
absMinAbundance , relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain compounds that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
rankWeights |
A numeric vector with weights used to calculate the mean ranking score for each candidate. The value will be recycled if necessary, hence, the default value of ‘1’ means equal weights for all considered objects. |
labels |
A |
i , j , drop
|
Passed to the |
sets |
(sets workflow) A |
updateConsensus |
(sets workflow) If |
negate |
Passed to the |
perSet , mirror
|
(sets workflow) If |
filterSets |
(sets workflow) Controls how algorithm consensus abundance filters are applied. See the |
setThreshold , setThresholdAnn
|
(sets workflow) Thresholds used to create the annotation set consensus. See
|
setAvgSpecificScores |
(sets workflow) If |
set |
(sets workflow) The name of the set. |
compounds objects are obtained from compound generators. This class is derived from the featureAnnotations class, please see its documentation for more methods and other details.
addFormulaScoring returns a compounds object updated with formula scoring.
getMCS returns an rcdk molecule object (IAtomContainer).
consensus returns a compounds object that is produced by merging multiple specified compounds objects.
defaultExclNormScores(compounds): Returns default scorings that are excluded from normalization.
show(compounds): Show summary information for this object.
identifiers(compounds): Returns a list containing for each feature group a character vector with database identifiers for all candidate compounds. The list is named by feature group names, and is typically used with the identifiers option of generateCompoundsMetFrag.
filter(compounds): Provides rule based filtering for generated compounds. Useful to eliminate unlikely candidates and speed up further processing. Also see the featureAnnotations method.
addFormulaScoring(compounds): Adds formula ranking data from a formulas object as an extra compound candidate scoring (formulaScore column). The formula score for each compound candidate is between ‘0-1’, where zero means no match with any formula candidates, and one means that the compound candidate's formula is the highest ranked.
getMCS(compounds): Calculates the maximum common substructure (MCS) for two or more candidate structures for a feature group. This method uses the get.mcs function from rcdk.
plotStructure(compounds): Plots a structure of a candidate compound using the rcdk package. If multiple candidates are specified (i.e. by specifying a vector for index) then the maximum common substructure (MCS) of the selected candidates is drawn.
plotScores(compounds): Plots a barplot with scoring of a candidate compound.
annotatedPeakList(compounds): Returns an MS/MS peak list annotated with data from a given candidate compound for a feature group.
plotSpectrum(compounds): Plots an annotated spectrum for a given candidate compound for a feature group. Two spectra can be compared by specifying a two-sized vector for the index and groupName arguments.
consensus(compounds): Generates a consensus of results from multiple objects. In order to rank the consensus candidates, first each candidate is scored based on its original ranking (the scores are normalized and the highest ranked candidate gets value ‘1’). The (weighted) mean is then calculated for all scorings of each candidate to derive the final ranking (if an object lacks the candidate its score will be ‘0’). The original rankings for each object are stored in the rank columns.
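The ranking procedure described for consensus can be sketched in base R. This is only an illustration under stated assumptions: the candidate names are hypothetical, and the conversion of a rank into a normalized score, here (n - rank + 1) / n, is an assumed formula chosen so that the top candidate scores ‘1’; the normalization actually used by patRoon may differ.

```r
# Hypothetical ranked candidate lists from two algorithms (best candidate first).
ranksA <- c("cand1", "cand2", "cand3")
ranksB <- c("cand2", "cand1")

# Assumption: a rank is converted to a score as (n - rank + 1) / n, so the top
# candidate scores 1. patRoon's actual normalization may differ.
rankScore <- function(ranked, candidates) {
    n <- length(ranked)
    pos <- match(candidates, ranked)          # NA if the candidate is absent
    ifelse(is.na(pos), 0, (n - pos + 1) / n)  # absent candidates score 0
}

candidates <- union(ranksA, ranksB)
scores <- cbind(A = rankScore(ranksA, candidates),
                B = rankScore(ranksB, candidates))
rownames(scores) <- candidates

# rankWeights is recycled over the objects, as described for the argument.
rankWeights <- rep_len(1, ncol(scores))
meanScore <- apply(scores, 1, weighted.mean, w = rankWeights)

# Final consensus order: highest weighted mean score first.
consensusOrder <- names(sort(meanScore, decreasing = TRUE))
```

With equal weights a candidate that is missing from one object is penalized by its zero score, which is why cand3 ends up last here.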
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.
setThreshold, setThresholdAnn, setAvgSpecificScores: (sets workflow) A copy of the equally named arguments that were passed when this object was created by generateCompounds.
origFGNames: (sets workflow) The original (order of) names of the featureGroups object that was used to create this object.
Subscripting of formulae for plots generated by plotSpectrum is based on the chemistry2expression function from the ReSOLUTION package.
The compoundsSet class is applicable for sets workflows. This class is derived from compounds and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset: Converts the object data for a specified set into a 'non-set' object (compoundsUnset), which allows it to be used in 'regular' workflows. Only the annotation results that are present in the specified set are kept (based on the set consensus, see below for implications).
The following methods are changed or have new functionality:
filter and the subset operator ([): Can be used to select data that is only present for selected sets. Depending on the updateConsensus argument, both operate either on the set consensus or on the original data (see below for implications).
annotatedPeakList: Returns a combined annotation table with all sets.
plotSpectrum: Is able to highlight set specific mass peaks (perSet and mirror arguments).
consensus: Creates the algorithm consensus based on the original annotation data (see below for implications). Then, like the sets workflow method for generateCompounds, a consensus is made for all sets, which can be controlled with the setThreshold and setThresholdAnn arguments. The candidate coverage among the different algorithms is calculated for each set (e.g. coverage-positive column) and for all sets (coverage column), which is based on the presence of a candidate in all the algorithms from all sets data. The consensus method for sets workflow data supports the filterSets argument. This controls how the algorithm consensus abundance filters (absMinAbundance/relMinAbundance) are applied: if filterSets=TRUE then the minimum of all set specific coverage columns is used to obtain the algorithm abundance. Otherwise the overall coverage column is used. For instance, consider a consensus object to be generated from two objects generated by different algorithms (e.g. SIRIUS and MetFrag), which both have a positive and negative set. Then, if a candidate occurs with both algorithms for the positive mode set, but only with the first algorithm in the negative mode set, relMinAbundance=1 will remove the candidate if filterSets=TRUE (because the minimum relative algorithm abundance is ‘0.5’), while filterSets=FALSE will not remove the candidate (because based on all sets data the candidate occurs in both algorithms).
addFormulaScoring: Adds the formula scorings to the original data and re-creates the annotation set consensus (see below for implications).
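The filterSets example given for consensus can be worked through with a few lines of base R. This is a sketch of the arithmetic only, not of the patRoon internals: the presence matrix is hypothetical, and comparing the coverage with >= against relMinAbundance is an assumption.

```r
# Presence of one candidate per algorithm (rows) and set (columns),
# following the example in the text.
present <- matrix(c(TRUE, TRUE,    # SIRIUS: positive, negative
                    TRUE, FALSE),  # MetFrag: positive, negative
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("SIRIUS", "MetFrag"),
                                  c("positive", "negative")))

# Set specific coverage: fraction of algorithms that found the candidate per set.
coveragePerSet <- colMeans(present)
# Overall coverage: an algorithm counts if it found the candidate in any set.
coverageOverall <- mean(apply(present, 1, any))

relMinAbundance <- 1
# filterSets=TRUE: the minimum set specific coverage (0.5) is tested.
keptIfFilterSets <- min(coveragePerSet) >= relMinAbundance
# filterSets=FALSE: the overall coverage (1) is tested.
keptOtherwise <- coverageOverall >= relMinAbundance
```

Here keptIfFilterSets is FALSE and keptOtherwise is TRUE, matching the outcome described for relMinAbundance=1.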
Two types of annotation data are stored in a compoundsSet object:
Annotations that are produced from a consensus between set results (see generateCompounds).
The 'original' annotation data per set, prior to when the set consensus was made. This includes candidates that were filtered out because of the thresholds set by setThreshold and setThresholdAnn. However, when filter or subsetting ([) operations are performed, the original data is also updated.
In most cases the first type of data is used. However, in a few cases the original annotation data is used (as indicated above), for instance, to re-create the set consensus. It is important to realize that the original annotation data may have additional candidates, and a newly created set consensus may therefore have 'new' candidates. For instance, when the object consists of the sets "positive" and "negative" and setThreshold=1 was used to create it, then compounds[, sets = "positive", updateConsensus = TRUE] may now have additional candidates, i.e. those that were not present in the "negative" set and were previously removed due to the consensus threshold filter.
The value ranges in the scoreLimits slot, which are used for normalization of scores, are based on the original scorings when the compounds were generated (prior to employing the topMost filter to generateCompounds).
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 6(18)
The featureAnnotations base class for more relevant methods and generateCompounds.
Objects from this class are used to specify adduct information in an algorithm independent way.
adduct(...)

## S4 method for signature 'adduct'
show(object)

## S4 method for signature 'adduct'
as.character(x, format = "generic", err = TRUE)
x , object
|
An |
format |
A
|
err |
If |
... |
Any of |
show(adduct): Shows summary information for this object.
as.character(adduct): Converts an adduct object to a specified character format.
add, sub: A character with one or more formulas to add/subtract.
molMult: How many times the original molecule is present in this molecule (e.g. for a dimer this would be ‘2’). Default is ‘1’.
charge: The final charge of the adduct (default ‘1’).
as.adduct for easy creation of adduct objects and adduct utilities for other adduct functionality.
adduct("H") # [M+H]+
adduct(sub = "H", charge = -1) # [M-H]-
adduct(add = "K", sub = "H2", charge = -1) # [M+K-H2]-
adduct(add = "H3", charge = 3) # [M+H3]3+
adduct(add = "H", molMult = 2) # [2M+H]+
as.character(adduct("H")) # returns "[M+H]+"
Several utility functions to work with adducts.
GenFormAdducts()

MetFragAdducts()

as.adduct(x, format = "generic", isPositive = NULL, charge = NULL, err = TRUE)

calculateIonFormula(formula, adduct)

calculateNeutralFormula(formula, adduct)
x |
The object that should be converted. Should be a |
format |
A
|
isPositive |
A logical that specifies whether the adduct should be
positive. Should only be set when |
charge |
The final charge. Only needs to be set when |
err |
If |
formula |
A |
adduct |
An |
GenFormAdducts returns a table with information on adducts supported by GenForm.
MetFragAdducts returns a table with information on adducts supported by MetFrag.
as.adduct converts an object into an adduct object.
calculateIonFormula converts one or more neutral formulae to adduct ions.
calculateNeutralFormula converts one or more adduct ions to neutral formulae.
as.adduct("[M+H]+")
as.adduct("[M+H2]2+")
as.adduct("[2M+H]+")
as.adduct("[M-H]-")
as.adduct("+H", format = "genform")
as.adduct(1, isPositive = TRUE, format = "metfrag") # MetFrag adduct ID 1 --> returns [M+H]+
calculateIonFormula("C2H4O", "[M+H]+") # C2H5O
calculateNeutralFormula("C2H5O", "[M+H]+") # C2H4O
Properties for the sample analyses used in the workflow and utilities to automatically generate this information.
generateAnalysisInfo(
  paths, groups = "", blanks = "", concs = NULL, norm_concs = NULL,
  formats = MSFileFormats()
)

generateAnalysisInfoFromEnviMass(path)
paths |
A character vector containing one or more file paths that should be used for finding the analyses. |
groups , blanks
|
An (optional) character vector containing replicate groups and blanks, respectively (will be
recycled). If |
concs |
An optional numeric vector containing concentration values for each analysis. Can be |
norm_concs |
An optional numeric vector containing concentrations used for feature normalization (see the
|
formats |
A character vector of analyses file types to consider. Analyses not present in these formats will be
ignored. For valid values see |
path |
The path of the enviMass project. |
In patRoon a sample analysis, or simply analysis, refers to a single MS analysis file (sometimes also called sample or file). The analysis information summarizes several properties for the analyses, and is used in various steps throughout the workflow, such as findFeatures, averaging intensities of feature groups and blank subtraction. This information should be in a data.frame, with the following columns:
path: the full path to the directory of the analysis.
analysis: the file name without extension. Must be unique, even if the path is different.
group: name of the replicate group. A replicate group is used to group analyses together that are replicates of each other. Thus, the group column for all analyses that belong to the same replicate group should have the same value, which must differ from that of other replicate groups. Used for e.g. averaging and filter.
blank: the name of the replicate group whose analyses are used by the featureGroups method of filter for blank subtraction. Multiple entries can be entered by separation with a comma.
conc: a numeric value specifying the 'concentration' for the analysis. This can actually be any kind of numeric value such as exposure time, dilution factor or anything else which may be used to form a linear relationship.
norm_conc: a numeric value specifying the normalization concentration for the analysis. See the Feature intensity normalization section in the featureGroups documentation for more details.
Most workflow steps work with ‘mzXML’ and ‘mzML’ file formats. However, some algorithms only support one format (e.g. findFeaturesOpenMS, findFeaturesEnviPick) or a proprietary format (findFeaturesBruker). To mix such algorithms in the same workflow, the analyses should be present in all required formats within the same directory as specified by the path column.
Each analysis should only be specified once in the analysis information, even if multiple file formats are available. The path and analysis columns are internally used by patRoon to automatically find the path of analysis files with the required format.
The group column is mandatory and needs to be non-empty for each analysis. The blank column should also be present, however, this may be empty ("") for analyses where no blank subtraction should occur.
The conc column is only required when obtaining regression information is desired with the as.data.table method. Similarly, the norm_conc column is only necessary for the normInts method.
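A minimal analysis information table with only the mandatory columns can be built by hand in base R, as sketched below. The directory and analysis names are hypothetical placeholders; the two checks simply restate the requirements given above and are not part of patRoon.

```r
# Hypothetical analyses; the path and analysis names are placeholders.
anaInfo <- data.frame(
    path = c("C:/data", "C:/data", "C:/data"),
    analysis = c("blank-1", "sample-1", "sample-2"), # file names without extension
    group = c("blank", "sample", "sample"),          # replicate group per analysis
    blank = c("", "blank", "blank"),                 # "" means no blank subtraction
    stringsAsFactors = FALSE
)

# Checks that mirror the requirements described above.
stopifnot(!anyDuplicated(anaInfo$analysis))  # analysis names must be unique
stopifnot(all(nzchar(anaInfo$group)))        # group must be non-empty
```

The conc and norm_conc columns are left out here because they are only needed for regression information and intensity normalization, respectively.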
generateAnalysisInfo is a utility function that automatically generates a data.frame with analysis information. It scans the directories specified by the paths argument for analyses, and uses this to automatically fill in the analysis and path columns. Furthermore, this function also correctly handles analyses which are available in multiple formats.
generateAnalysisInfoFromEnviMass loads analysis information from an enviMass project. Note: this functionality has only been tested with older versions of enviMass.
Various parsing and plotting functions for the analysisInfo data.frame.
## S4 method for signature 'data.frame'
getTICs(obj, retentionRange = NULL, MSLevel = 1)

## S4 method for signature 'data.frame'
getBPCs(obj, retentionRange = NULL, MSLevel = 1)

## S4 method for signature 'data.frame'
plotTICs(
  obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL,
  colourBy = c("none", "analyses", "rGroups"), showLegend = TRUE,
  xlim = NULL, ylim = NULL, ...
)

## S4 method for signature 'data.frame'
plotBPCs(
  obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL,
  colourBy = c("none", "analyses", "rGroups"), showLegend = TRUE,
  xlim = NULL, ylim = NULL, ...
)
obj |
An |
retentionRange |
Range of retention time (in seconds). Should be a numeric vector of length two containing the min/max values. The maximum can be Inf to specify no maximum. Set to NULL to skip this step. |
MSLevel |
Integer vector with the ms levels (i.e., 1 for MS1 and 2 for MS2) to obtain traces. |
retMin |
Plot retention time in minutes (instead of seconds). |
title |
Character string used for title of the plot. If |
colourBy |
Sets the automatic colour selection: "none" for a single colour or "analyses"/"rGroups" for a distinct colour per analysis or analysis replicate group. |
showLegend |
Plot a legend if TRUE. |
xlim , ylim
|
Sets the plot size limits used by
|
... |
Further arguments passed to |
getTICs(data.frame): Obtains the total ion chromatograms (TICs) of the analyses.
getBPCs(data.frame): Obtains the base peak chromatograms (BPCs) of the analyses.
plotTICs(data.frame): Plots the TICs of the analyses.
plotBPCs(data.frame): Plots the BPCs of the analyses.
Ricardo Cunha, [email protected]
Miscellaneous utility functions which interface with Bruker DataAnalysis
showDataAnalysis()

setDAMethod(anaInfo, method, close = TRUE)

revertDAAnalyses(anaInfo, close = TRUE, save = close)

recalibrarateDAFiles(anaInfo, close = TRUE, save = close)

getDACalibrationError(anaInfo)

addDAEIC(
  analysis, path, mz, mzWindow = 0.005, ctype = "EIC", mtype = "MS",
  polarity = "both", bgsubtr = FALSE, fragpath = "", name = NULL,
  hideDA = TRUE, close = FALSE, save = close
)

addAllDAEICs(
  fGroups, mzWindow = 0.005, ctype = "EIC", bgsubtr = FALSE, name = TRUE,
  onlyPresent = TRUE, hideDA = TRUE, close = FALSE, save = close
)
anaInfo |
|
method |
The full path of the DataAnalysis method. |
close , save
|
If |
analysis |
Analysis name (without file extension). |
path |
path of the analysis. |
mz |
m/z (Da) value used for the chromatographic trace (if applicable). |
mzWindow |
m/z window (in Da) used for the chromatographic trace (if applicable). |
ctype |
Type of the chromatographic trace. Valid options are:
|
mtype |
MS filter for chromatographic trace. Valid values are:
|
polarity |
Polarity filter for chromatographic trace. Valid values:
|
bgsubtr |
If |
fragpath |
Precursor m/z used for MS/MS traces ( |
name |
For |
hideDA |
Hides DataAnalysis while adding the chromatographic trace (faster). |
fGroups |
The |
onlyPresent |
If |
These functions communicate directly with Bruker DataAnalysis to provide various functionality, such as calibrating and exporting data and adding chromatographic traces. For this the RDCOMClient package is required to be installed.
showDataAnalysis
makes a hidden DataAnalysis window visible
again. Most functions using DataAnalysis will hide the window during
processing for efficiency reasons. If the window remains hidden
(e.g. because there was an error) this function can be used to make
it visible again. This function can also be used to start DataAnalysis if
it is not running yet.
setDAMethod
Sets a given DataAnalysis method (‘.m’ file)
to a set of analyses. NOTE: as a workaround for a bug in
DataAnalysis, this function will save(!), close and re-open any analyses
that are already open prior to setting the new method. The close
argument only controls whether the file should be closed after setting the
method (files are always saved).
revertDAAnalyses
Reverts a given set of analyses to their
unprocessed raw state.
recalibrarateDAFiles
Performs automatic mass recalibration of
a given set of analyses. The current method settings for each analysis will
be used.
getDACalibrationError
is used to obtain the standard
deviation of the current mass calibration (in ppm).
addDAEIC
adds an Extracted Ion Chromatogram (EIC) or other
chromatographic trace to a given analysis which can be used directly with
DataAnalysis.
addAllDAEICs
adds Extracted Ion Chromatograms (EICs) for all
features within a featureGroups
object.
getDACalibrationError
returns a data.frame
with a
column of all analyses (named analysis
) and their mass error (named
error
).
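A recalibration run followed by a check of the resulting mass errors could be sketched as below. This is only an illustration: the helper names (flagPoorCalibration, recalibrateAndCheck) and the 5 ppm threshold are hypothetical, and the patRoon calls need Windows, Bruker DataAnalysis and the RDCOMClient package to actually run. The only assumption about getDACalibrationError is the documented return format (a data.frame with analysis and error columns).

```r
# Keep only the analyses whose calibration error exceeds a threshold (in ppm).
# 'calErrors' is expected to have the columns 'analysis' and 'error'.
flagPoorCalibration <- function(calErrors, maxErrorPpm = 5) {
    calErrors[calErrors$error > maxErrorPpm, , drop = FALSE]
}

# Hypothetical wrapper: set a method, recalibrate and report poorly calibrated files.
recalibrateAndCheck <- function(anaInfo, method, maxErrorPpm = 5) {
    patRoon::setDAMethod(anaInfo, method)    # note: files are always saved
    patRoon::recalibrarateDAFiles(anaInfo)   # function name as spelled in the package
    flagPoorCalibration(patRoon::getDACalibrationError(anaInfo), maxErrorPpm)
}

# The filtering step can be tried without DataAnalysis on mock data:
mockErrors <- data.frame(analysis = c("sample-1", "sample-2"), error = c(1.2, 9.8))
poor <- flagPoorCalibration(mockErrors, maxErrorPpm = 5)
```

Separating the filtering step from the DataAnalysis calls keeps the part that needs no vendor software easy to test.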
Several utility functions for caching workflow data. The most important function is clearCache
; other
functions are primarily for internal use.
makeHash(..., checkDT = TRUE) makeFileHash(..., length = Inf) loadCacheData(category, hashes, dbArg = NULL, simplify = TRUE, fixDTs = TRUE) saveCacheData(category, data, hash, dbArg = NULL) clearCache(what = NULL, file = NULL, vacuum = TRUE)
... |
Arguments/objects to be used for hashing. |
checkDT |
|
length |
Maximum file length to hash. Passed to |
category |
The category of the object to be cached. |
hashes |
A |
dbArg |
Alternative connection to database. Default is |
simplify |
If |
fixDTs |
Should be |
data |
The object to be cached. |
hash |
The hash string of the object to be cached (e.g. obtained with |
what |
This argument describes what should be done. When |
file |
The cache file. If |
vacuum |
If |
makeHash
Make a hash string of given arguments.
makeFileHash
Generates a hash from the contents of one or more files.
loadCacheData
Loads cached data from a database.
saveCacheData
caches data in a database.
clearCache
will either remove one or more tables within the cache SQLite
database or simply
wipe the whole cache file. Removing tables will VACUUM
the database (unless vacuum=FALSE
), which may
take some time for large cache files.
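The following sketch shows one way to temporarily disable caching around a single workflow step by using the patRoon.cache.mode package option. The withoutCache helper is hypothetical and not part of patRoon. The final call assumes that clearCache() with its default what = NULL only reports the cache contents and removes nothing; it is skipped when patRoon is not installed.

```r
# Hypothetical helper: evaluate an expression while caching is disabled and
# restore the previous option value afterwards, even if an error occurs.
withoutCache <- function(expr) {
    old <- options(patRoon.cache.mode = "none")
    on.exit(options(old), add = TRUE)
    force(expr)  # 'expr' is evaluated lazily, i.e. only after the option is set
}

# The option is "none" only while the expression is evaluated.
modeInside <- withoutCache(getOption("patRoon.cache.mode"))

if (requireNamespace("patRoon", quietly = TRUE)) {
    # Assumption: with what = NULL nothing is removed.
    patRoon::clearCache()
}
```

Any workflow function call can be passed as the expression, for instance withoutCache(someWorkflowStep()), where someWorkflowStep is a placeholder.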
These functions provide interactive utilities to explore and review workflow data using a shiny graphical user interface (GUI). In addition, unsatisfactory data (e.g. noise identified as a feature and unrelated feature groups in a component) can easily be selected for removal.
checkFeatures( fGroups, session = "checked-features.yml", EICParams = getDefEICParams(), clearSession = FALSE ) checkComponents( components, fGroups, session = "checked-components.yml", EICParams = getDefEICParams(), clearSession = FALSE ) ## S4 method for signature 'components' checkComponents( components, fGroups, session = "checked-components.yml", EICParams = getDefEICParams(), clearSession = FALSE ) importCheckFeaturesSession( sessionIn, sessionOut, fGroups, rtWindow = 6, mzWindow = 0.002, overWrite = FALSE ) ## S4 method for signature 'featureGroups' checkFeatures( fGroups, session = "checked-features.yml", EICParams = getDefEICParams(), clearSession = FALSE ) getMCTrainData(fGroups, session) predictCheckFeaturesSession(fGroups, session, model = NULL, overWrite = FALSE)
fGroups |
A This should be the 'new' object for |
session |
The session file name. |
EICParams |
A named |
clearSession |
If |
components |
The |
sessionIn , sessionOut
|
The file names for the input and output sessions. |
rtWindow |
The retention time window (seconds) used to relate 'old' with 'new' feature groups. |
mzWindow |
The m/z window (in Da) used to relate 'old' with 'new' feature groups. |
overWrite |
Set to |
model |
The model that was created with MetaClean and that should be used to predict pass/fail data. If
|
The data selected for removal is stored in sessions. These are ‘YAML’ files to allow easy external
manipulation. The sessions can be used to restore the selections that were made for data removal when the GUI tool is
executed again. Furthermore, functionality is provided to import and export sessions. To actually remove the data the
filter
method should be used with the session file as input.
checkComponents
is used to review components and their feature groups contained within. A typical use
case is to verify that peaks from features that were annotated as related adducts and/or isotopes are correctly
aligned.
importCheckFeaturesSession
is used to import a session file that was generated from a different
featureGroups
object. This is useful to avoid re-doing manual interpretation of chromatographic peaks
when, for instance, feature group data is re-created with different parameters.
checkFeatures
is used to review chromatographic information for feature groups. Its main purpose is
to assist in reviewing the quality of detected feature (groups) and easily select unwanted data such as features
with poor peak shapes or noise.
getMCTrainData
converts a session created by checkFeatures
to a data.frame
that can be
used by MetaClean to train a new model. The output format is comparable to that from
getPeakQualityMetrics
.
predictCheckFeaturesSession
Uses ML data from MetaClean to predict the quality (Pass/Fail) of
feature group data, and converts this to a session which can be reviewed with checkFeatures
and used to
remove unwanted feature groups by filter
.
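The review and removal steps described above could be combined as in the sketch below. The two wrapper functions are hypothetical; they only use the signatures listed in the usage section and assume that components and fGroups objects from an existing workflow are available. The GUI is only started in interactive sessions.

```r
# Hypothetical wrapper: review components in the GUI, then remove what was
# selected for removal in the session file.
reviewComponents <- function(components, fGroups,
                             session = "checked-components.yml") {
    if (interactive())
        patRoon::checkComponents(components, fGroups, session = session)
    patRoon::filter(components, checkComponentsSession = session)
}

# Hypothetical wrapper: carry over an earlier feature review to feature groups
# that were re-created with different parameters, then continue the review.
migrateFeatureSession <- function(sessionOld, sessionNew, fGroupsNew) {
    patRoon::importCheckFeaturesSession(sessionOld, sessionNew, fGroupsNew,
                                        rtWindow = 6, mzWindow = 0.002)
    if (interactive())
        patRoon::checkFeatures(fGroupsNew, session = sessionNew)
    invisible(sessionNew)
}
```

The rtWindow and mzWindow values shown are simply the documented defaults; wider windows may be needed if retention times shifted between the old and new feature groups.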
The topMost
and topMostByRGroup
EIC parameters (EICParams
) are ignored.
checkComponents
: Some componentization algorithms (e.g. generateComponentsNontarget
and generateComponentsTPs
) may output components where the same feature group in a component is
present multiple times, for instance, when multiple TPs are matched to the same feature group. If such a feature
group is selected for removal, then all of its results in the component will be marked for removal.
getMCTrainData
only uses session data for selected feature groups. Selected features for removal are
ignored, as this is not supported by MetaClean.
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
Functionality to compare feature groups and make a consensus.
comparison(..., groupAlgo, groupArgs = list(rtalign = FALSE)) ## S4 method for signature 'featureGroups' comparison(..., groupAlgo, groupArgs = list(rtalign = FALSE)) ## S4 method for signature 'featureGroupsComparison,missing' plot(x, retMin = FALSE, ...) ## S4 method for signature 'featureGroupsComparison' plotVenn(obj, which = NULL, ...) ## S4 method for signature 'featureGroupsComparison' plotUpSet(obj, which = NULL, ...) ## S4 method for signature 'featureGroupsComparison' plotChord(obj, addSelfLinks = FALSE, addRetMzPlots = TRUE, ...) ## S4 method for signature 'featureGroupsComparison' consensus( obj, absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, verifyAnaInfo = TRUE ) ## S4 method for signature 'featureGroupsSet' comparison(..., groupAlgo, groupArgs = list(rtalign = FALSE)) ## S4 method for signature 'featureGroupsComparisonSet' consensus(obj, ...)
... |
For For For |
groupAlgo |
The |
groupArgs |
A |
x , obj
|
The |
retMin |
If |
which |
A character vector specifying one or more labels of compared
feature groups. For |
addSelfLinks |
If |
addRetMzPlots |
Set to |
absMinAbundance , relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain feature groups that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
verifyAnaInfo |
If |
Feature groups objects originating from differing feature finding and/or grouping algorithms (or their parameters) may be compared to assess their output and generate a consensus.
The comparison
method generates a
featureGroupsComparison
object from given feature groups
objects, which in turn may be used for (visually) comparing presence of
feature groups and generating a consensus. Internally, this function will
collapse each feature groups object to pseudo features objects by
averaging their retention times, m/z values and intensities, where
each original feature groups object becomes an 'analysis'. All
pseudo features are then grouped using
regular feature grouping algorithms so that a
comparison can be made.
plot
generates an m/z vs retention time plot.
plotVenn
plots a Venn diagram outlining unique and shared
feature groups between up to five compared feature groups.
plotUpSet
plots an UpSet diagram outlining unique and shared
feature groups.
plotChord
plots a chord diagram to visualize the distribution
of feature groups.
consensus
combines all compared feature groups and averages their retention times, m/z values and intensity
data. Not yet supported for sets workflows.
comparison
returns a featureGroupsComparison
object.
plotVenn
(invisibly) returns a list with the following fields:
gList
the gList
object that was returned by
the utilized VennDiagram plotting function.
areas
The total area for each plotted group.
intersectionCounts
The number of intersections between groups.
The order for the areas
and intersectionCounts
fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn
and
draw.triple.venn
).
consensus
returns a featureGroups
object with a consensus from the compared feature
groups.
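A comparison of two feature groups objects followed by a consensus might look like the sketch below. The wrapper function and its argument names are hypothetical. It also assumes that named arguments are used as labels in the comparison and that "openms" is an accepted value for groupAlgo; both should be checked against the argument documentation.

```r
# Hypothetical wrapper: compare feature groups from two algorithms and keep
# only the feature groups that are present in both.
compareFeatureGroups <- function(fGroupsA, fGroupsB, groupAlgo = "openms") {
    cmp <- patRoon::comparison(A = fGroupsA, B = fGroupsB, groupAlgo = groupAlgo)
    patRoon::plotVenn(cmp)                       # unique and shared feature groups
    patRoon::consensus(cmp, relMinAbundance = 1) # 1 = present in all compared objects
}
```

Lowering relMinAbundance, or using uniqueFrom instead, would instead retain feature groups that are not shared by all compared objects.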
This base class is derived from components
and is used to store components resulting from hierarchical
clustering information, for instance, generated by generateComponentsIntClust
and
generateComponentsSpecClust
.
## S4 method for signature 'componentsClust' delete(obj, ...) ## S4 method for signature 'componentsClust' clusters(obj) ## S4 method for signature 'componentsClust' cutClusters(obj) ## S4 method for signature 'componentsClust' clusterProperties(obj) ## S4 method for signature 'componentsClust' treeCut(obj, k = NULL, h = NULL) ## S4 method for signature 'componentsClust' treeCutDynamic(obj, maxTreeHeight, deepSplit, minModuleSize) ## S4 method for signature 'componentsClust,missing' plot( x, pal = "Paired", numericLabels = TRUE, colourBranches = length(x) < 50, showLegend = length(x) < 20, ... ) ## S4 method for signature 'componentsClust' plotSilhouettes(obj, kSeq, pch = 16, type = "b", ...)
... |
Further options passed to |
k , h
|
Desired number of clusters or tree height to be used for cutting the dendrogram, respectively. One or the
other must be specified. Analogous to |
maxTreeHeight , deepSplit , minModuleSize
|
Arguments used by
|
x , obj
|
A |
pal |
Colour palette to be used from RColorBrewer. |
numericLabels |
Set to |
colourBranches |
Whether branches from cut clusters (and their labels)
should be coloured. Might be slow with large numbers of clusters, hence,
the default is only |
showLegend |
If |
kSeq |
An integer vector containing the sequence that should be used for average silhouette width calculation. |
pch , type
|
Passed to |
clusters(componentsClust)
: Accessor method to the clust
slot, which was generated by hclust
.
cutClusters(componentsClust)
: Accessor method to the cutClusters
slot. Returns a vector with cluster membership
for each candidate (format as cutree
).
clusterProperties(componentsClust)
: Returns a list with properties on how the
clustering was performed.
treeCut(componentsClust)
: Manually (re-)cut the dendrogram.
treeCutDynamic(componentsClust)
: Automatically (re-)cut the dendrogram using the cutreeDynamicTree
function
from dynamicTreeCut.
plot(x = componentsClust, y = missing)
: generates a dendrogram from a given cluster object and optionally highlights resulting
branches when the cluster is cut.
plotSilhouettes(componentsClust)
: Plots the average silhouette width when the
clusters are cut at each k value of a given sequence. The k value with the highest
average silhouette width (marked in the plot) may be considered the optimal number of
clusters.
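The difference between cutting by number of clusters (k) and by tree height (h) can be illustrated with base R alone. The sketch below does not use patRoon: it clusters a small made-up matrix with hclust and cuts the tree with cutree, whose output format is the one referred to for the cutClusters slot. The data and row names are invented for illustration.

```r
# Six made-up 'feature groups' forming two tight groups and one outlier.
m <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(5, 5), c(5.1, 5), c(10, 0))
rownames(m) <- paste0("fg", 1:6)

clust <- hclust(dist(m), method = "complete")

# Cut by the desired number of clusters ...
byK <- cutree(clust, k = 3)
# ... or by tree height; both give the same membership for this data.
byH <- cutree(clust, h = 1)

# Named integer vector: cluster membership per row
print(byK)
```

With real data the two approaches usually differ, which is why only one of k and h is specified at a time.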
distm
Distance matrix that was used for clustering (obtained with daisy
).
clust
Object returned by hclust
.
cutClusters
A list
with assigned clusters (same format as what cutree
returns).
gInfo
The groupInfo
of the feature groups object that was used.
properties
A list containing general properties and parameters used for clustering.
altered
Set to TRUE
if the object was altered (e.g. filtered) after its creation.
The intensity values for components (used by plotSpectrum
) are set
to a dummy value (1) as no single intensity value exists for this kind of
components.
When the object is altered (e.g. by filtering or subsetting it), methods that need the original clustered data such as plotting methods do not work anymore and stop with an error.
Schollee JE, Bourgin M, von Gunten U, McArdell CS, Hollender J (2018). “Non-target screening to trace ozonation transformation products in a wastewater treatment train including different post-treatments.” Water Research, 142, 267–278. doi:10.1016/j.watres.2018.05.045.
components
and generateComponents
This class is derived from components
and is used to store
results from unsupervised homolog detection with the nontarget
package.
## S4 method for signature 'componentsNT' plotGraph(obj, onlyLinked = TRUE, width = NULL, height = NULL) ## S4 method for signature 'componentsNTSet' plotGraph(obj, onlyLinked = TRUE, set, ...) ## S4 method for signature 'componentsNTSet' unset(obj, set)
obj |
The |
onlyLinked |
If |
width , height
|
Passed to |
set |
(sets workflow) The name of the set. |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
Objects from this class are generated by
generateComponentsNontarget
plotGraph
returns the result of visNetwork
.
plotGraph(componentsNT)
: Plots an interactive network graph for linked
homologous series (i.e. series with (partial) overlap which could
not be merged). The resulting graph can be browsed interactively and allows
quick inspection of series which may be related. The graph is constructed
with the igraph package and rendered with
visNetwork.
homol
A list
with homol
objects for each replicate group
as returned by homol.search
The componentsNTSet
class is applicable for sets workflows. This class is derived from componentsNT
and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet
.
unset
Converts the object data for a specified set into a 'non-set' object (componentsNTUnset
), which allows it to be used in 'regular' workflows. Only the components in the specified set are kept. Furthermore, the
component names are restored to non-set specific names (see generateComponents
for more details).
The following methods are changed or have new functionality:
plotGraph
Currently can only create graph networks from one set (specified by the set
argument).
Note that the componentsNTSet
class does not have a homol
slot. Instead, the setObjects
method can be used to access this data for a specific set.
Loos M, Singer H (2017).
“Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data.”
Journal of Cheminformatics, 9(1).
doi:10.1186/s13321-017-0197-z.
Loos M, Gerber C, Corona F, Hollender J, Singer H (2015).
“Accelerated isotope fine structure calculation using pruned transition trees.”
Analytical Chemistry, 87(11), 5738–5744.
Csárdi G, Nepusz T, Traag V, Horvát Sz, Zanini F, Noom D, Müller K (2024). igraph: Network Analysis and Visualization in R. R package version 2.1.1. doi:10.5281/zenodo.7682609, https://CRAN.R-project.org/package=igraph.
components
and generateComponents
This class is derived from componentsClust
and is used to store components from feature groups that
were clustered based on their MS/MS similarities.
Objects from this class are generated by generateComponentsSpecClust
When the object is altered (e.g. by filtering or subsetting it), methods that need the original clustered data such as plotting methods do not work anymore and stop with an error.
componentsClust
for other relevant methods and generateComponents
This class is derived from components
and is used to store components that result from linking feature
groups that are (predicted to be) parents with feature groups that (are predicted to be) transformation products. For
more details, see generateComponentsTPs
.
## S4 method for signature 'componentsTPs' as.data.table(x) ## S4 method for signature 'componentsTPs' filter( obj, ..., retDirMatch = FALSE, minSpecSim = NULL, minSpecSimPrec = NULL, minSpecSimBoth = NULL, minFragMatches = NULL, minNLMatches = NULL, formulas = NULL, verbose = TRUE, negate = FALSE ) ## S4 method for signature 'componentsTPs' plotGraph(obj, onlyLinked = TRUE, width = NULL, height = NULL)
x , obj
|
A |
... , verbose
|
Further arguments passed to the base |
retDirMatch |
If set to |
minSpecSim , minSpecSimPrec , minSpecSimBoth
|
The minimum spectral similarity of a TP compared to its parent
(‘0-1’). The |
minFragMatches , minNLMatches
|
Minimum number of parent/TP fragment and neutral loss matches, respectively. Set
to |
formulas |
A |
negate |
If |
onlyLinked |
If |
width , height
|
Passed to |
filter
returns a filtered componentsTPs
object.
plotGraph
returns the result of visNetwork
.
as.data.table(componentsTPs)
: Returns all component data as a data.table
.
filter(componentsTPs)
: Provides various rule based filtering options to clean and prioritize TP data.
plotGraph(componentsTPs)
: Plots an interactive network graph for linked components. Components are linked with each
other if one or more transformation products overlap. The graph is constructed with the igraph package
and rendered with visNetwork.
fromTPs
A logical
that is TRUE
when the componentization was performed with
transformationProducts
data.
The intensity values for components (used by plotSpectrum
) are set
to a dummy value (1) as no single intensity value exists for this kind of
components.
Csárdi G, Nepusz T, Traag V, Horvát Sz, Zanini F, Noom D, Müller K (2024). igraph: Network Analysis and Visualization in R. R package version 2.1.1. doi:10.5281/zenodo.7682609, https://CRAN.R-project.org/package=igraph.
components
for other relevant methods and generateComponents
Contains data for feature groups that are related in some way. These components commonly include adducts, isotopes and homologues.
componentTable(obj) componentInfo(obj) findFGroup(obj, fGroup) ## S4 method for signature 'components' componentTable(obj) ## S4 method for signature 'components' componentInfo(obj) ## S4 method for signature 'components' groupNames(obj) ## S4 method for signature 'components' length(x) ## S4 method for signature 'components' names(x) ## S4 method for signature 'components' show(object) ## S4 method for signature 'components,ANY,ANY,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'components,ANY,ANY' x[[i, j]] ## S4 method for signature 'components' x$name ## S4 method for signature 'components' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'components' as.data.table(x) ## S4 method for signature 'components' filter( obj, size = NULL, adducts = NULL, isotopes = NULL, rtIncrement = NULL, mzIncrement = NULL, checkComponentsSession = NULL, negate = FALSE, verbose = TRUE ) ## S4 method for signature 'components' findFGroup(obj, fGroup) ## S4 method for signature 'components' plotSpectrum(obj, index, markFGroup = NULL, xlim = NULL, ylim = NULL, ...) ## S4 method for signature 'components' plotChroms(obj, index, fGroups, EICParams = getDefEICParams(rtWindow = 5), ...) ## S4 method for signature 'components' consensus(obj, ...) ## S4 method for signature 'componentsFeatures' show(object) ## S4 method for signature 'componentsSet' show(object) ## S4 method for signature 'componentsSet,ANY,ANY,missing' x[i, j, ..., sets = NULL, drop = TRUE] ## S4 method for signature 'componentsSet' filter(obj, ..., negate = FALSE, sets = NULL) ## S4 method for signature 'componentsSet' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'componentsSet' consensus(obj, ...) ## S4 method for signature 'componentsSet' unset(obj, set)
obj , object , x
|
The |
fGroup |
The name (thus a character) of the feature group that should be searched for. |
i , j
|
For |
... |
For For For For For sets workflow methods: further arguments passed to the base |
drop |
ignored. |
name |
The component name (partially matched). |
size |
Should be a vector of length two with the minimum/maximum size of a component. Set to |
adducts |
Remove any feature groups within components that do not match given adduct rules. If |
isotopes |
Only keep results that match a given isotope rule. If |
rtIncrement , mzIncrement
|
Should be a vector of length two with the minimum/maximum retention time or m/z increment of a
homologous series. Set to |
checkComponentsSession |
If set then components and/or feature groups are removed that were selected for removal
(see check-GUI and the |
negate |
If |
verbose |
If set to |
index |
The index of the component. Can be a numeric index or a character with its name. |
markFGroup |
If specified (i.e. not |
xlim , ylim
|
Sets the plot size limits used by
|
fGroups |
The |
EICParams |
A named |
sets |
(sets workflow) A |
set |
(sets workflow) The name of the set. |
components
objects are obtained from generateComponents
.
delete
returns the object for which the specified data was removed.
consensus
returns a components
object that is produced
by merging multiple specified components
objects.
componentTable(components)
: Accessor method for the components
slot of a
components
class. Each component is stored as a
data.table
.
componentInfo(components)
: Accessor method for the componentInfo
slot of a
components
class.
groupNames(components)
: returns a character
vector with the names of the
feature groups for which data is present in this object.
length(components)
: Obtain total number of components.
names(components)
: Obtain the names of all components.
show(components)
: Show summary information for this object.
x[i
: Subset on components/feature groups.
x[[i
: Extracts a component table, optionally filtered by a feature group.
$
: Extracts a component table by component name.
delete(components)
: Completely deletes specified (parts of) components.
as.data.table(components)
: Returns all component data in a table.
filter(components)
: Provides rule based filtering for components.
findFGroup(components)
: Returns the component id(s) to which a feature group
belongs.
plotSpectrum(components)
: Plot a pseudo mass spectrum for a single
component.
plotChroms(components)
: Plot an extracted ion chromatogram (EIC) for all feature groups within a single component.
consensus(components)
: Generates a consensus from multiple components
objects. At this point results are simply combined and no attempt is made to
merge similar components.
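The accessor and subsetting methods listed above could be combined as in the following sketch. The summariseComponents function is hypothetical and assumes a components object from an existing workflow; it only uses methods from the usage section.

```r
# Hypothetical helper: collect some basic information from a components object.
summariseComponents <- function(components, fGroup) {
    list(
        total     = length(components),                        # number of components
        names     = names(components),                         # component names
        withGroup = patRoon::findFGroup(components, fGroup),   # component id(s) of a feature group
        info      = patRoon::componentInfo(components),        # general information table
        first     = if (length(components) > 0) components[[1]] else NULL
    )
}
```

The fGroup argument is the name of a feature group, as returned for example by groupNames.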
components
List of all components in this object. Use the
componentTable
method for access.
componentInfo
A data.table
containing general information
for each component. Use the componentInfo
method for access.
The componentsSet
class is applicable for sets workflows. This class is derived from components
and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet
.
unset
Converts the object data for a specified set into a 'non-set' object (componentsUnset
), which allows it to be used in 'regular' workflows. Only the components in the specified set are kept.
The following methods are changed or have new functionality:
filter
and the subset operator ([
) Can be used to select components that are only present for
selected sets.
filter
Applies only those filters for which a component has data available. For instance, filtering by
adduct will only filter any results within a component if that component contains adduct information.
For plotChroms
: The topMost
and topMostByRGroup
EIC parameters are ignored unless the
components are from homologous series.
Objects from this class are used to store hierarchical clustering data of
candidate structures within compounds
objects.
## S4 method for signature 'compoundsCluster' clusters(obj) ## S4 method for signature 'compoundsCluster' cutClusters(obj) ## S4 method for signature 'compoundsCluster' clusterProperties(obj) ## S4 method for signature 'compoundsCluster' groupNames(obj) ## S4 method for signature 'compoundsCluster' length(x) ## S4 method for signature 'compoundsCluster' lengths(x, use.names = TRUE) ## S4 method for signature 'compoundsCluster' show(object) ## S4 method for signature 'compoundsCluster,ANY,missing,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'compoundsCluster' treeCut(obj, k = NULL, h = NULL, groupName) ## S4 method for signature 'compoundsCluster' treeCutDynamic(obj, maxTreeHeight, deepSplit, minModuleSize, groupName) ## S4 method for signature 'compoundsCluster,missing' plot( x, ..., groupName, pal = "Paired", colourBranches = lengths(x)[groupName] < 50, showLegend = lengths(x)[groupName] < 20 ) ## S4 method for signature 'compoundsCluster' getMCS(obj, groupName, cluster) ## S4 method for signature 'compoundsCluster' plotStructure( obj, groupName, cluster, width = 500, height = 500, withTitle = TRUE ) ## S4 method for signature 'compoundsCluster' plotSilhouettes(obj, kSeq, groupName, pch = 16, type = "b", ...)
obj , x , object
|
A |
use.names |
A logical value specifying whether the returned vector should be named with the feature group names. |
i |
For |
... |
Further arguments passed directly to the plotting function
( |
drop , j
|
ignored. |
k , h
|
Desired number of clusters or tree height to be used for cutting
the dendrogram, respectively. One or the other must be specified.
Analogous to |
groupName |
A character specifying the feature group name. |
maxTreeHeight , deepSplit , minModuleSize
|
Arguments used by
|
pal |
Colour palette to be used from RColorBrewer. |
colourBranches |
Whether branches from cut clusters (and their labels)
should be coloured. Might be slow with large numbers of clusters, hence,
the default is only |
showLegend |
If |
cluster |
A numeric value specifying the cluster. |
width , height
|
The dimensions (in pixels) of the raster image that should be plotted. |
withTitle |
A logical value specifying whether a title should be added. |
kSeq |
An integer vector containing the sequence that should be used for average silhouette width calculation. |
pch , type
|
Passed to |
Objects of this type are returned by the compounds
method for
makeHCluster
.
treeCut
and treeCutDynamic
return the modified
compoundsCluster
object.
getMCS
returns an rcdk molecule object
(IAtomContainer
).
clusters(compoundsCluster)
: Accessor method to the clusters
slot.
Returns a list that contains for each feature group an object as returned
by hclust
.
cutClusters(compoundsCluster)
: Accessor method to the cutClusters
slot.
Returns a list that contains for each feature group a vector with cluster
membership for each candidate (format as cutree
).
clusterProperties(compoundsCluster)
: Returns a list with properties on how the
clustering was performed.
groupNames(compoundsCluster)
: returns a character
vector with the names of the
feature groups for which data is present in this object.
length(compoundsCluster)
: Returns the total number of clusters.
lengths(compoundsCluster)
: Returns a vector
with the number of
clusters per feature group.
show(compoundsCluster)
: Show summary information for this object.
x[i]
: Subset on feature groups.
treeCut(compoundsCluster)
: Manually (re-)cut a dendrogram that was
generated for a feature group.
treeCutDynamic(compoundsCluster)
: Automatically (re-)cut a dendrogram that was
generated for a feature group using the cutreeDynamicTree
function from dynamicTreeCut.
plot(x = compoundsCluster, y = missing)
: Plot the dendrogram for clustered compounds of a
feature group. Clusters are highlighted using dendextend.
getMCS(compoundsCluster)
: Calculates the maximum common substructure (MCS)
for all candidate structures within a specified cluster. This method uses
the get.mcs
function from rcdk.
plotStructure(compoundsCluster)
: Plots the maximum common substructure (MCS) for
all candidate structures within a specified cluster.
plotSilhouettes(compoundsCluster)
: Plots the average silhouette width when the
clusters are cut by a sequence of k numbers. The k value with the highest
value (marked in the plot) may be considered as the optimal number of
clusters.
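The cutting and silhouette steps described above can be sketched with base R and the cluster package. This is only an illustration of the general technique under stated assumptions: the numeric vector `x` is made-up data standing in for candidate structures, and the code does not reproduce how patRoon computes distances or stores results.

```r
library(cluster) # provides silhouette(); a recommended package shipped with R

# Made-up one-dimensional data standing in for candidate structures
x <- c(1.0, 1.2, 1.1, 5.0, 5.3, 5.1, 9.0, 9.2, 9.4)
d <- dist(x)
hc <- hclust(d, method = "complete")

# Cut the dendrogram by a desired number of clusters (k) or by tree height (h)
membK <- cutree(hc, k = 3)
membH <- cutree(hc, h = 2)

# Average silhouette width for a sequence of k values (compare the kSeq argument)
kSeq <- 2:6
avgSil <- sapply(kSeq, function(k) {
    sil <- silhouette(cutree(hc, k = k), d)
    summary(sil)$avg.width
})

# The k with the highest average width may be considered the optimal number of clusters
bestK <- kSeq[which.max(avgSil)]
```

Plotting `avgSil` against `kSeq` gives the kind of curve that `plotSilhouettes` is described to produce.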
clusters
A list
with hclust
objects for each
feature group.
dists
A list
with distance matrices for each feature group.
SMILES
A list
containing a vector with SMILES
for all
candidate structures per feature group.
cutClusters
A list
with assigned clusters for all candidates per
feature group (same format as what cutree
returns).
properties
A list containing general properties and parameters used for clustering.
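As a rough picture of how these slots relate to each other, the sketch below builds comparable per-feature-group lists with base R. The feature group names and distances are invented for illustration, and the fixed `k = 2` is an arbitrary choice rather than what the package does.

```r
# Made-up distance matrices for two hypothetical feature groups
dists <- list(
    M120_R268_30 = dist(c(0.1, 0.2, 0.9, 1.0)),
    M137_R249_53 = dist(c(0.1, 0.5, 0.55, 2.0, 2.1))
)

# One hclust object per feature group (compare the clusters slot)
clusters <- lapply(dists, hclust)

# Cluster membership per candidate, in cutree format (compare the cutClusters slot)
cutClusters <- lapply(clusters, cutree, k = 2)

# Number of clusters per feature group and in total (compare lengths() and length())
clustersPerGroup <- sapply(cutClusters, function(memb) length(unique(memb)))
totalClusters <- sum(clustersPerGroup)
```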
Returns an overview of scorings that may be applied to rank candidate compounds.
compoundScorings(
  algorithm = NULL,
  database = NULL,
  includeSuspectLists = TRUE,
  onlyDefault = FALSE,
  includeNoDB = TRUE
)
algorithm |
The algorithm: |
database |
The database for which results should be returned (e.g. |
includeSuspectLists , onlyDefault , includeNoDB
|
A logical specifying whether scoring terms related to suspect lists, default scoring terms and non-database specific scoring terms should be included in the output, respectively. |
A data.frame
with information on which scoring terms are used, what their algorithm specific name is
and other information such as to which database they apply and short remarks.
generateCompounds
This class is derived from compounds
and contains additional specific SIRIUS
data.
Objects from this class are generated by generateCompoundsSIRIUS
fingerprints
A list
with for each feature group result a data.table
containing fingerprints
obtained with CSI:FingerID
.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
compounds
and generateCompoundsSIRIUS
Conversion of MS analysis files between several open and closed data formats.
MSFileFormats(algorithm = "pwiz", vendor = FALSE)

convertMSFiles(
  files = NULL,
  outPath = NULL,
  dirs = TRUE,
  anaInfo = NULL,
  from = NULL,
  to = "mzML",
  overWrite = FALSE,
  algorithm = "pwiz",
  centroid = algorithm != "openms",
  filters = NULL,
  extraOpts = NULL,
  PWizBatchSize = 1
)
algorithm |
Either |
vendor |
If |
files , dirs
|
The |
outPath |
A character vector specifying directories that should be used
for the output. Will be re-cycled if necessary. If |
anaInfo |
An analysis info table used to
retrieve input files. Either this argument or |
from |
Input format (see below). These are used to find analyses when
|
to |
Output format: |
overWrite |
Should existing destination file be overwritten
( |
centroid |
Set to |
filters |
When |
extraOpts |
A |
PWizBatchSize |
When |
MSFileFormats
returns a character
with all supported
input formats (see below).
convertMSFiles
converts the data format of an analysis to
another. It uses tools from
ProteoWizard
(msConvert
command), OpenMS
(FileConverter
command) or Bruker DataAnalysis to perform the
conversion. Supported input and output formats include ‘.mzXML’,
‘.mzML’ and several vendor formats, depending on which algorithm is
used.
convertMSFiles
(except if algorithm="bruker"
) uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Possible output formats (to
argument) are
mzXML
and mzML
.
Possible input formats (from
argument) depend on the algorithm that
was chosen and may include:
thermo
: Thermo ‘.RAW’ files (only
algorithm="pwiz"
).
bruker
: Bruker ‘.d’, ‘.yep’, ‘.baf’ and
‘.fid’ files (only algorithm="pwiz"
or
algorithm="bruker"
).
agilent
: Agilent ‘.d’ files (only
algorithm="pwiz"
).
ab
: AB Sciex ‘.wiff’ files (only
algorithm="pwiz"
).
waters
: Waters ‘.RAW’ files (only
algorithm="pwiz"
).
mzXML
/mzML
: Open format ‘.mzXML’/‘.mzML’
files (only algorithm="pwiz"
or algorithm="openms"
).
Note that the actual supported file formats of ProteoWizard depend on how it was installed (see here).
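The format list above can be summarised as a small lookup table. The sketch below is plain R and does not call patRoon; `algorithmsForFormat` and `formatsForAlgorithm` are hypothetical helpers introduced for illustration only (in the package itself, `MSFileFormats` reports the supported input formats).

```r
# Input formats and the algorithms that accept them, as listed above
formatSupport <- list(
    thermo = "pwiz",
    bruker = c("pwiz", "bruker"),
    agilent = "pwiz",
    ab = "pwiz",
    waters = "pwiz",
    mzXML = c("pwiz", "openms"),
    mzML = c("pwiz", "openms")
)

# Hypothetical helper: which algorithms can read a given input format?
algorithmsForFormat <- function(from) {
    stopifnot(from %in% names(formatSupport))
    formatSupport[[from]]
}

# Hypothetical helper: which input formats does a given algorithm accept?
formatsForAlgorithm <- function(algorithm) {
    names(Filter(function(algos) algorithm %in% algos, formatSupport))
}
```

For instance, `formatsForAlgorithm("openms")` only yields the open formats, which matches the note that vendor formats require ProteoWizard or Bruker DataAnalysis.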
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman N, Shulman N, Frewen B, Baker TA, Brusniak M, Paulse C, Creasy D, Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre B, Kuhlmann F, Roark J, Rainer P, Detlev S, Hemenway T, Huhmer A, Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW, Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, Mallick P (2012).
“A cross-platform toolkit for mass spectrometry and proteomics.”
Nature Biotechnology, 30(10), 918–920.
doi:10.1038/nbt.2377.
## Not run:
# Use FileConverter of OpenMS to convert between open mzXML/mzML format
convertMSFiles("standard-1.mzXML", to = "mzML", algorithm = "openms")

# Convert all Thermo .RAW files in the analyses/raw directory to mzML and
# store the files in analyses/mzml. During conversion files are centroided by
# the peakPicking filter and only MS 1 data is kept.
convertMSFiles("analyses/raw", "analyses/mzml", dirs = TRUE, from = "thermo",
               centroid = "vendor", filters = "msLevel 1")

## End(Not run)
Returns the default adducts and their probabilities when the OpenMS algorithm is used for componentization.
defaultOpenMSAdducts(ionization)
ionization |
The ionization polarity: either |
See the potentialAdducts
argument of generateComponentsOpenMS
for more details.
Parameters for creation of extracted ion chromatograms.
getDefEICParams(...)
... |
optional named arguments that override defaults. |
To configure the creation of extracted ion chromatograms (EICs) several parameters exist:
rtWindow
Retention time (in seconds) that will be subtracted from and added to the minimum and
maximum retention time of the feature, respectively. Thus, setting this value to ‘>0’ will 'zoom out' on the retention time
axis.
topMost
Only create EICs for this number of top most intense features. If NULL
then EICs are
created for all features.
topMostByRGroup
If set to TRUE
and topMost
is set: only create EICs for the top most
features in each replicate group. For instance, when topMost=1
and topMostByRGroup=TRUE
, then EICs will
be plotted for the most intense feature of each replicate group.
onlyPresent
If TRUE
then EICs are created only for analyses in which a feature was detected. If
onlyPresent=FALSE
then EICs are generated for all analyses. The latter is handy to evaluate if a peak
was 'missed' during feature detection or removed during e.g. filtering.
If onlyPresent=FALSE
then the following parameters are also relevant:
mzExpWindow
To create EICs for analyses in which no feature was found, the m/z value is derived
from the min/max values of all features in the feature group. The value of mzExpWindow
further expands this
window.
setsAdductPos
,setsAdductNeg
(sets workflow) In sets workflows the adduct must be known to calculate the
ionized m/z. If a feature is completely absent in a particular set then no adduct annotations are
available and the value of setsAdductPos
(positive ionization data) or setsAdductNeg
(negative
ionization data) will be used instead.
These parameters are passed as a named list
as the EICParams
argument to functions that use EICs. The
getDefEICParams
function can be used to generate such a parameter list with defaults.
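To make the effect of `rtWindow` and `mzExpWindow` concrete, here is a minimal base R sketch of how EIC ranges could be derived from them. The feature values are invented, the parameter values are illustrative rather than package defaults, and the assumption that `mzExpWindow` is an absolute m/z value applied symmetrically is mine, not something stated above. `rtRange` and `mzRangeAbsent` are hypothetical helpers.

```r
# Made-up features belonging to one feature group (retention times in seconds)
feats <- data.frame(
    analysis = c("sample-1", "sample-2"),
    retmin = c(300, 304), retmax = c(318, 322),
    mzmin = c(254.0590, 254.0592), mzmax = c(254.0598, 254.0601)
)

# Illustrative values only; with patRoon loaded a complete list could be obtained with
# EICParams <- getDefEICParams(rtWindow = 30)
EICParams <- list(rtWindow = 30, mzExpWindow = 0.001)

# Retention time range of a detected feature, widened by rtWindow on both sides
rtRange <- function(feat, params) {
    c(feat$retmin - params$rtWindow, feat$retmax + params$rtWindow)
}

# m/z range for an analysis in which the feature is absent: min/max over all
# features in the group, expanded by mzExpWindow (assumed absolute and symmetric)
mzRangeAbsent <- function(feats, params) {
    c(min(feats$mzmin) - params$mzExpWindow, max(feats$mzmax) + params$mzExpWindow)
}

rt1 <- rtRange(feats[1, ], EICParams)
mzAbsent <- mzRangeAbsent(feats, EICParams)
```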
Automatic optimization of feature finding and grouping parameters through Design of Experiments (DoE).
optimizeFeatureGrouping(
  features,
  algorithm,
  ...,
  templateParams = list(),
  paramRanges = list(),
  maxIterations = 50,
  maxModelDeviation = 0.1,
  parallel = TRUE
)

generateFGroupsOptPSet(algorithm, ...)

getDefFGroupsOptParamRanges(algorithm)

optimizeFeatureFinding(
  anaInfo,
  algorithm,
  ...,
  templateParams = list(),
  paramRanges = list(),
  isoIdent = if (algorithm == "openms") "OpenMS" else "IPO",
  checkPeakShape = "none",
  CAMERAOpts = list(),
  maxIterations = 50,
  maxModelDeviation = 0.1,
  parallel = TRUE
)

generateFeatureOptPSet(algorithm, ...)

getDefFeaturesOptParamRanges(algorithm, method = "centWave")
features |
A |
algorithm |
The algorithm used for finding or grouping features (see |
... |
One or more lists with parameter sets (see below) (for |
templateParams |
Template parameter set (see below). |
paramRanges |
A list with vectors containing absolute parameter ranges (minimum/maximum) that constrain numeric
parameters chosen during experiments. See the |
maxIterations |
Maximum number of iterations that may be performed to find optimum values. Used to restrict needlessly long optimization procedures. In IPO this was fixed to ‘50’. |
maxModelDeviation |
See the |
parallel |
If set to |
anaInfo |
Analysis info table (passed to |
isoIdent |
Sets the algorithm used to identify isotopes. Valid values
are: |
checkPeakShape |
Additional peak shape checking of isotopes. Only used
if |
CAMERAOpts |
A |
method |
Method used by XCMS to find features (only if |
Many different parameters exist that may affect the output quality of feature finding and grouping. To avoid time-consuming manual experimentation, functionality is provided to largely automate the optimization process. The methodology, which uses design of experiments (DoE), is based on the excellent Isotopologue Parameter Optimization (IPO) R package. The functionality of this package is directly integrated in patRoon. Some functionality was added or changed; however, the principal workings of the algorithm are nearly identical.
Compared to IPO, the following functionality was added or changed:
The code was made more generic in order to include support for other feature finding/grouping algorithms (e.g. OpenMS, enviPick, XCMS3).
The methodology of FeatureFinderMetabo
(OpenMS) may be used to
find isotopes.
The maxModelDeviation
parameter was added to potentially avoid suboptimal results
(issue discussed here).
The use of multiple 'parameter sets' (discussed below) which, for instance, allow optimizing qualitative
parameters more easily (see examples
).
More consistent optimization code for feature finding/grouping.
More consistent output using S4 classes (i.e. optimizationResult
class).
Parallelization is performed via the future package instead of BiocParallel. If this is enabled
(parallel=TRUE
) then any parallelization supported by the feature finding or grouping algorithm is disabled.
The optimizeFeatureFinding
and optimizeFeatureGrouping
functions return their results in an
optimizationResult
object.
Which parameters should be optimized is determined by a parameter set. A set is
defined by a named list
containing the minimum and maximum starting range for each parameter that should be
tested. For instance, the set list(chromFWHM = c(5, 10), mzPPM = c(5, 15))
specifies that the
chromFWHM
and mzPPM
parameters (used by OpenMS feature finding) should be optimized within a range of
‘5’-‘10’ and ‘5’-‘15’, respectively. Note that this range may be increased or decreased after a
DoE iteration in order to find a better optimum. The absolute limits are controlled by the paramRanges
function argument.
Multiple parameter sets may be specified (i.e. through the ... function argument). In this situation, the
optimization algorithm is repeated for each set, and the final optimum is determined from the parameter set with
the best response. The templateParams
function argument may be useful in this case to define a template for
each parameter set. Actual parameter sets are then constructed by joining each parameter set with the set specified
for templateParams
. When a parameter is defined in both a regular and template set, the parameter in the
regular set takes precedence.
Parameters that should not be optimized but still need to be set for the feature finding/grouping functions should
also be defined in a (template) parameter set. Whether a parameter is optimized is determined by whether its value
is specified as a vector range or a single fixed value. For instance, when a set is defined as list(chromFWHM
= c(5, 10), mzPPM = 5)
, only the chromFWHM
parameter is optimized, whereas mzPPM
is kept constant at
‘5’.
Using multiple parameter sets with differing fixed values allows optimization of qualitative values (see examples below).
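The joining of regular and template parameter sets can be mimicked in plain R. The sketch below is an approximation for illustration, not the package's internal code: it uses `modifyList` for the join and assumes that a numeric vector of length two denotes a range to optimize, while anything else is a fixed value. `isRange` is a hypothetical helper.

```r
# Template shared by all parameter sets
templateParams <- list(chromFWHM = c(4, 8), mzPPM = 5)

# Two regular sets that differ in a qualitative parameter; the second one also
# overrides mzPPM from the template with a range
paramSets <- list(
    list(isotopeFilteringModel = "metabolites (5% RMS)"),
    list(isotopeFilteringModel = "metabolites (2% RMS)", mzPPM = c(5, 15))
)

# Join each set with the template; values in the regular set take precedence
fullSets <- lapply(paramSets, function(ps) modifyList(templateParams, ps))

# Assumption: a numeric vector of length two is a min/max range to optimize
isRange <- function(p) is.numeric(p) && length(p) == 2
toOptimize <- lapply(fullSets, function(fs) names(Filter(isRange, fs)))
```

In the first joined set only `chromFWHM` is a range, whereas in the second both `chromFWHM` and `mzPPM` are, so the two sets would be optimized over different parameters.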
The parameters specified in parameter sets are directly passed through the findFeatures
or
groupFeatures
functions. Hence, grouping and retention time alignment parameters used by XCMS should
(still) be set through the groupArgs
and retcorArgs
parameters.
NOTE: For XCMS3, which normally uses parameter classes for setting its options, the parameters must be
defined in a named list like any other algorithm. The set parameters are then passed to the constructor of the
right parameter class object (e.g. CentWaveParam
, ObiwarpParam
). For grouping/alignment
sets, these parameters need to be specified in nested lists called groupParams
and retAlignParams
,
respectively (similar to groupArgs
/retcorArgs
for algorithm="xcms"
). Finally, the underlying
XCMS method to be used should be defined in the parameter set (i.e. by setting the method
field for
feature parameter sets and the groupMethod
and retAlignMethod
for grouping/aligning parameter sets).
See the examples below for more details.
NOTE: Similar to IPO, the peakwidth
and prefilter
parameters for XCMS feature finding should
be split in two different values:
The minimum and maximum ranges for peakwidth
are optimized by setting min_peakwidth
and
max_peakwidth
, respectively.
The k
and I
parameters contained in prefilter
are split in prefilter
and
value_of_prefilter
, respectively.
Similarly, for KPIC2, the following parameters should be split:
the width
parameter (feature optimization) is optimized by specifying the min_width
and
max_width
parameters.
the tolerance
and weight
parameters (feature grouping optimization) are optimized by setting
mz_tolerance
/rt_tolerance
and mz_weight
/rt_weight
parameters, respectively.
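As a sketch of what the split XCMS parameters mean, the base R snippet below writes a parameter set in split form and reassembles `peakwidth` and `prefilter` for one concrete choice of values. All numbers are invented, and `assembleXCMSArgs` is a hypothetical helper that only illustrates the mapping described above; it is not how patRoon or IPO implement it.

```r
# Parameter set in split form: ranges for the split fields, a fixed value for ppm
pSet <- list(
    min_peakwidth = c(4, 12), max_peakwidth = c(35, 65),
    prefilter = c(2, 5), value_of_prefilter = c(100, 1000),
    ppm = 10
)

# Hypothetical helper: rebuild XCMS-style arguments from one concrete pick of values
assembleXCMSArgs <- function(vals) {
    list(
        peakwidth = c(vals$min_peakwidth, vals$max_peakwidth),
        prefilter = c(vals$prefilter, vals$value_of_prefilter), # k and I
        ppm = vals$ppm
    )
}

# Use the lower bound of every range as an example pick
pick <- lapply(pSet, min)
xcmsArgs <- assembleXCMSArgs(pick)
```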
The optimizeFeatureFinding
and optimizeFeatureGrouping
functions are used
to optimize parameters for feature finding and grouping, respectively. These functions are analogous to
optimizeXcmsSet
and optimizeRetGroup
from IPO.
The generateFeatureOptPSet
and generateFGroupsOptPSet
functions may be used to generate a parameter
set for feature finding and grouping, respectively. Some algorithm dependent default parameter optimization ranges
will be returned. These functions are analogous to getDefaultXcmsSetStartingParams
and
getDefaultRetGroupStartingParams
from IPO. However, unlike their IPO counterparts, these
functions will not output default fixed values. The generateFGroupsOptPSet
will only generate defaults for
density grouping if algorithm="xcms"
.
The getDefFeaturesOptParamRanges
and getDefFGroupsOptParamRanges
functions return the default absolute
optimization parameter ranges for feature finding and grouping, respectively. These functions are useful if you
want to set the paramRanges
function argument.
After each experiment iteration an optimum parameter
set is found by generating a model containing the tested parameters and their responses. Sometimes the actual
response from the parameters derived from the model is significantly lower than expected. When the response
is lower than the maximum response found during the experiment, the parameters belonging to this experimental
maximum may be chosen instead. The maxModelDeviation
argument sets the maximum deviation in response
between the modelled and experimental maxima. The value is relative: ‘0’ means that experimental values will
always be favored when leading to improved responses, whereas 1
will effectively disable this procedure (and
return to 'regular' IPO behaviour).
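One way to read the description of `maxModelDeviation` is as a relative threshold on the experimental maximum. The sketch below encodes that reading; the exact formula is an assumption made for illustration and is not taken from the package source, and `chooseOptimum` is a hypothetical helper. It assumes responses are positive.

```r
# Hypothetical decision rule: prefer the experimental maximum when the response
# of the modelled optimum falls too far below it (relative deviation)
chooseOptimum <- function(modelResponse, expMaxResponse, maxModelDeviation) {
    threshold <- expMaxResponse * (1 - maxModelDeviation)
    if (modelResponse < threshold) "experimental" else "model"
}

chooseOptimum(85, 100, maxModelDeviation = 0.1) # model is >10% worse
chooseOptimum(95, 100, maxModelDeviation = 0.1) # model is within 10%
```

Under this reading a value of 0 selects the experimental parameters whenever they gave a better response, and a value of 1 never does, which agrees with the two extremes described above.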
The code and methodology is a direct adaptation from the IPO R package.
Libiseller G, Dvorzak M, Kleb U, Gander E, Eisenberg T, Madeo F, Neumann S, Trausinger G, Sinner F, Pieber T, Magnes C (2015). “IPO: a tool for automated optimization of XCMS parameters.” BMC Bioinformatics, 16(1). doi:10.1186/s12859-015-0562-8.
# example data from patRoonData package
dataDir <- patRoonData::exampleDataPath()

anaInfo <- generateAnalysisInfo(dataDir)
anaInfo <- anaInfo[1:2, ] # only focus on first two analyses (e.g. training set)

# optimize mzPPM and chromFWHM parameters
ftOpt <- optimizeFeatureFinding(anaInfo, "openms", list(mzPPM = c(5, 10), chromFWHM = c(4, 8)))

# optimize chromFWHM and isotopeFilteringModel (a qualitative parameter)
ftOpt2 <- optimizeFeatureFinding(anaInfo, "openms",
                                 list(isotopeFilteringModel = "metabolites (5% RMS)"),
                                 list(isotopeFilteringModel = "metabolites (2% RMS)"),
                                 templateParams = list(chromFWHM = c(4, 8)))

# perform grouping optimization with optimized features object
fgOpt <- optimizeFeatureGrouping(optimizedObject(ftOpt), "xcms",
                                 list(groupArgs = list(bw = c(22, 28)),
                                      retcorArgs = list(method = "obiwarp")))

# same, but using the XCMS3 interface
fgOpt2 <- optimizeFeatureGrouping(optimizedObject(ftOpt), "xcms3",
                                  list(groupMethod = "density", groupParams = list(bw = c(22, 28)),
                                       retAlignMethod = "obiwarp"))

# plot contour of first parameter set/DoE iteration
plot(ftOpt, paramSet = 1, DoEIteration = 1, type = "contour")

# generate parameter set with some predefined and custom parameters to be
# optimized.
pSet <- generateFeatureOptPSet("openms", chromSNR = c(3, 9), useSmoothedInts = FALSE)
Various plotting functions for feature group data.
## S4 method for signature 'featureGroups,missing'
plot(
  x,
  colourBy = c("none", "rGroups", "fGroups"),
  onlyUnique = FALSE,
  retMin = FALSE,
  showLegend = TRUE,
  col = NULL,
  pch = NULL,
  ...
)

## S4 method for signature 'featureGroups'
plotInt(
  obj,
  average = FALSE,
  normalized = FALSE,
  xnames = TRUE,
  showLegend = FALSE,
  pch = 20,
  type = "b",
  lty = 3,
  col = NULL,
  plotArgs = NULL,
  linesArgs = NULL
)

## S4 method for signature 'featureGroupsSet'
plotInt(
  obj,
  average = FALSE,
  normalized = FALSE,
  xnames = !sets,
  showLegend = sets,
  pch = 20,
  type = "b",
  lty = 3,
  col = NULL,
  plotArgs = NULL,
  linesArgs = NULL,
  sets = FALSE
)

## S4 method for signature 'featureGroups'
plotChord(
  obj,
  addSelfLinks = FALSE,
  addRetMzPlots = TRUE,
  average = FALSE,
  outerGroups = NULL,
  addIntraOuterGroupLinks = FALSE,
  ...
)

## S4 method for signature 'featureGroups'
plotChroms(
  obj,
  analysis = analyses(obj),
  groupName = names(obj),
  retMin = FALSE,
  showPeakArea = FALSE,
  showFGroupRect = TRUE,
  title = NULL,
  colourBy = c("none", "rGroups", "fGroups"),
  showLegend = TRUE,
  annotate = c("none", "ret", "mz"),
  intMax = "eic",
  EICParams = getDefEICParams(),
  showProgress = FALSE,
  xlim = NULL,
  ylim = NULL,
  EICs = NULL,
  ...
)

## S4 method for signature 'featureGroups'
plotVenn(obj, which = NULL, ...)

## S4 method for signature 'featureGroupsSet'
plotVenn(obj, which = NULL, ..., sets = FALSE)

## S4 method for signature 'featureGroups'
plotUpSet(obj, which = NULL, nsets = length(which), nintersects = NA, ...)

## S4 method for signature 'featureGroups'
plotVolcano(
  obj,
  FCParams,
  showLegend = TRUE,
  averageFunc = mean,
  normalized = FALSE,
  col = NULL,
  pch = 19,
  ...
)

## S4 method for signature 'featureGroups'
plotGraph(obj, onlyPresent = TRUE, width = NULL, height = NULL)

## S4 method for signature 'featureGroupsSet'
plotGraph(obj, onlyPresent = TRUE, set, ...)
colourBy |
Sets the automatic colour selection: |
onlyUnique |
If |
retMin |
Plot retention time in minutes (instead of seconds). |
showLegend |
Plot a legend if |
col |
Colour(s) used. If |
pch , type , lty
|
Common plotting parameters passed to e.g. |
... |
passed to |
obj , x
|
|
average |
If For |
xnames |
Plot analysis (or replicate group if |
plotArgs , linesArgs
|
A |
sets |
(sets workflow) For For |
addSelfLinks |
If |
addRetMzPlots |
Set to |
outerGroups |
Character vector of names to be used as outer groups. The
values in the specified vector should be named by analysis names
( |
addIntraOuterGroupLinks |
If |
analysis , groupName
|
|
showPeakArea |
Set to |
showFGroupRect |
Set to |
title |
Character string used for title of the plot. If |
annotate |
If set to |
intMax |
Method used to determine the maximum intensity plot limit. Should be |
EICParams |
A named |
showProgress |
if set to |
xlim , ylim
|
Sets the plot size limits used by
|
EICs |
Internal parameter for now and should be kept at |
which |
A character vector with replicate groups used for comparison. Set to For |
nsets , nintersects
|
See |
FCParams |
A parameter list to calculate Fold change data. See |
averageFunc , normalized
|
Used for intensity data treatment, see the documentation for the
|
onlyPresent |
Only plot feature groups of internal standards that are still present in the |
width , height
|
Passed to |
set |
(sets workflow) The set for which data must be plotted. |
plot
Generates an m/z vs retention time
plot for all feature groups. Optionally highlights unique/overlapping
presence amongst replicate groups.
plotInt
Generates a line plot for the (averaged) intensity of feature groups within all analyses.
plotChord
Generates a chord diagram which can be used to
visualize shared presence of feature groups between analyses or replicate
groups. In addition, analyses/replicates sharing similar properties
(e.g. location, age, type) may be grouped to enhance visualization
between these 'outer groups'.
plotChroms
Plots extracted ion chromatograms (EICs) of feature groups.
plotVenn
plots a Venn diagram (using VennDiagram) outlining unique and shared feature
groups between up to five replicate groups.
plotUpSet
plots an UpSet diagram (using the upset
function) outlining unique
and shared feature groups between given replicate groups.
plotVolcano
Plots fold change data in a 'Volcano plot'.
plotGraph
generates an interactive network plot which is used to explore internal standard (IS)
assignments to each feature group. This requires the availability of IS assignments, see the documentation for
normInts
for details. The graph is rendered with visNetwork.
plotVenn
(invisibly) returns a list with the following fields:
gList
the gList
object that was returned by
the utilized VennDiagram plotting function.
areas
The total area for each plotted group.
intersectionCounts
The number of intersections between groups.
The order for the areas
and intersectionCounts
fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn
and
draw.triple.venn
).
plotGraph
returns the result of visNetwork
.
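The quantities behind a Venn or UpSet comparison are plain set operations on feature group names. The base R sketch below uses invented replicate group and feature group names; treating the group sizes as the counterpart of the `areas` field and the overlap size as the counterpart of `intersectionCounts` is an interpretation, not a statement about the package internals.

```r
# Made-up presence data: feature groups detected in each replicate group
present <- list(
    influent = c("fg1", "fg2", "fg3", "fg4"),
    effluent = c("fg3", "fg4", "fg5")
)

# Size of each group
groupSizes <- lengths(present)

# Feature groups shared by both replicate groups and unique to each
shared <- intersect(present$influent, present$effluent)
uniqueInfluent <- setdiff(present$influent, present$effluent)
uniqueEffluent <- setdiff(present$effluent, present$influent)
```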
The following methods are changed or have new functionality:
plotVenn
and plotInt
allow handling data per set. See the sets
argument description.
plotGraph
only plots data per set, and requires the set
argument to be set.
Rick Helmus and Ricardo Cunha (plotTICs
and
plotBPCs
functions)
Gu Z (2014). “circlize implements and enhances circular visualization in R.” Bioinformatics.
Conway JR, Lex A, Gehlenborg N (2017).
“UpSetR: an R package for the visualization of intersecting sets and their properties.”
Bioinformatics, 33(18), 2938-2940.
doi:10.1093/bioinformatics/btx364.
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H (2014).
“UpSet: Visualization of Intersecting Sets.”
IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
doi:10.1109/tvcg.2014.2346248.
featureGroups-class
, groupFeatures
Holds information for all feature group annotations.
## S4 method for signature 'featureAnnotations'
annotations(obj)

## S4 method for signature 'featureAnnotations'
groupNames(obj)

## S4 method for signature 'featureAnnotations'
length(x)

## S4 method for signature 'featureAnnotations,ANY,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'featureAnnotations,ANY,missing'
x[[i, j]]

## S4 method for signature 'featureAnnotations'
x$name

## S4 method for signature 'featureAnnotations'
as.data.table(
  x,
  fGroups = NULL,
  fragments = FALSE,
  countElements = NULL,
  countFragElements = NULL,
  OM = FALSE,
  normalizeScores = "none",
  excludeNormScores = defaultExclNormScores(x)
)

## S4 method for signature 'featureAnnotations'
delete(obj, i = NULL, j = NULL, ...)

## S4 method for signature 'featureAnnotations'
filter(
  obj,
  minExplainedPeaks = NULL,
  scoreLimits = NULL,
  elements = NULL,
  fragElements = NULL,
  lossElements = NULL,
  topMost = NULL,
  OM = FALSE,
  negate = FALSE
)

## S4 method for signature 'featureAnnotations'
plotVenn(obj, ..., labels = NULL, vennArgs = NULL)

## S4 method for signature 'featureAnnotations'
plotUpSet(
  obj,
  ...,
  labels = NULL,
  nsets = length(list(...)) + 1,
  nintersects = NA,
  upsetArgs = NULL
)
obj , x
|
|
i , j
|
For |
... |
For the For Others: Any further (and unique) |
drop |
ignored. |
name |
The feature group name (partially matched). |
fGroups |
The |
fragments |
If |
countElements , countFragElements
|
A |
OM |
For For |
normalizeScores |
A |
excludeNormScores |
A
For |
minExplainedPeaks |
Minimum number of explained peaks. Set to |
scoreLimits |
Filter results by their scores. Should be a named |
elements |
Only retain candidate formulae (neutral form) that match a
given elemental restriction. The format of |
fragElements , lossElements
|
Specifies elemental restrictions for
fragment or neutral loss formulae (charged form). Candidates are retained
if at least one of the fragment formulae follow (or not follow if
|
topMost |
Only keep a maximum of |
negate |
If |
labels |
A |
vennArgs |
A |
nsets , nintersects
|
See |
upsetArgs |
A list with any further arguments to be passed to
|
This class stores annotation data for feature groups, such as molecular formulae, SMILES identifiers, compound names
etc. The class of objects that are generated by formula and compound annotation (generateFormulas
and
generateCompounds
) are based on this class.
as.data.table
returns a data.table
.
delete
returns the object for which the specified data was removed.
filter
returns a filtered featureAnnotations
object.
plotVenn
(invisibly) returns a list with the following fields:
gList
the gList
object that was returned by
the utilized VennDiagram plotting function.
areas
The total area for each plotted group.
intersectionCounts
The number of intersections between groups.
The order for the areas
and intersectionCounts
fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn
and
draw.triple.venn
).
annotations(featureAnnotations)
: Accessor for the groupAnnotations
slot.
groupNames(featureAnnotations)
: returns a character
vector with the names of the
feature groups for which data is present in this object.
length(featureAnnotations)
: Obtain total number of candidates.
x[i
: Subset on feature groups.
x[[i
: Extracts annotation data for a feature group.
$
: Extracts annotation data for a feature group.
as.data.table(featureAnnotations)
: Generates a table with all annotation data for each feature group and other
information such as element counts.
delete(featureAnnotations)
: Completely deletes specified annotations.
filter(featureAnnotations)
: Provides rule based filtering for feature group annotations. Useful to eliminate
unlikely candidates and speed up further processing.
plotVenn(featureAnnotations)
: plots a Venn diagram (using VennDiagram) outlining unique and shared
candidates of up to five different featureAnnotations
objects.
plotUpSet(featureAnnotations)
: plots an UpSet diagram (using the upset
function) outlining
unique and shared candidates between different featureAnnotations
objects.
groupAnnotations
A list
with for each annotated feature group a data.table
with annotation data.
Use the annotations
method for access.
scoreTypes
A character
with all the score types present in this object.
scoreRanges
The minimum and maximum score values of all candidates for each feature group. Used for normalization.
Calculation of the aromaticity index (AI) and related double bond equivalents (DBE_AI) is performed as described in Koch 2015. Formula classification is performed by the rules described in Abdulla 2013. Filtering of OM related molecules is performed as described in Koch 2006 and Kujawinski 2006 (see references).
Koch BP, Dittmar T (2015).
“From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter.”
Rapid Communications in Mass Spectrometry, 30(1), 250–250.
doi:10.1002/rcm.7433.
Abdulla HA, Sleighter RL, Hatcher PG (2013).
“Two Dimensional Correlation Analysis of Fourier Transform Ion Cyclotron Resonance Mass Spectra of Dissolved Organic Matter: A New Graphical Analysis of Trends.”
Analytical Chemistry, 85(8), 3895–3902.
doi:10.1021/ac303221j.
Koch BP, Dittmar T (2006).
“From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter.”
Rapid Communications in Mass Spectrometry, 20(5), 926–932.
doi:10.1002/rcm.2386.
Kujawinski EB, Behn MD (2006).
“Automated Analysis of Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectra of Natural Organic Matter.”
Analytical Chemistry, 78(13), 4363–4373.
doi:10.1021/ac0600306.
Conway JR, Lex A, Gehlenborg N (2017).
“UpSetR: an R package for the visualization of intersecting sets and their properties.”
Bioinformatics, 33(18), 2938-2940.
doi:10.1093/bioinformatics/btx364, http://dx.doi.org/10.1093/bioinformatics/btx364.
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H (2014).
“UpSet: Visualization of Intersecting Sets.”
IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
doi:10.1109/tvcg.2014.2346248.
formulas-class
and compounds-class
The derived formulas
and compounds
classes.
This class is used for comparing different featureGroups
objects.
## S4 method for signature 'featureGroupsComparison' names(x) ## S4 method for signature 'featureGroupsComparison' length(x) ## S4 method for signature 'featureGroupsComparison,ANY,missing,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'featureGroupsComparison,ANY,missing' x[[i, j]] ## S4 method for signature 'featureGroupsComparison' x$name
x |
A |
i |
For |
... |
Ignored. |
drop , j
|
ignored. |
name |
The label name (partially matched). |
Objects from this class are returned by comparison
.
names(featureGroupsComparison)
: Obtain the labels that were given to each compared feature group.
length(featureGroupsComparison)
: Number of feature groups objects that were compared.
x[i
: Subset on labels that were assigned to compared feature groups.
x[[i
: Extract a featureGroups
object by its label.
$
: Extract a featureGroups object by its label.
fGroupsList
A list of featureGroups objects that were compared.
comparedFGroups
A pseudo featureGroups object containing grouped feature groups.
Returns chromatographic peak quality and score names for features and/or feature groups.
featureQualityNames(feat = TRUE, group = TRUE, scores = FALSE, totScore = TRUE)
feat |
If |
group |
If |
scores |
If |
totScore |
If |
Holds information for all features present within a set of analyses.
## S4 method for signature 'features' length(x) ## S4 method for signature 'features' show(object) ## S4 method for signature 'features' featureTable(obj) ## S4 method for signature 'features' analysisInfo(obj) ## S4 method for signature 'features' analyses(obj) ## S4 method for signature 'features' replicateGroups(obj) ## S4 method for signature 'features' as.data.table(x) ## S4 method for signature 'features' filter( obj, absMinIntensity = NULL, relMinIntensity = NULL, retentionRange = NULL, mzRange = NULL, mzDefectRange = NULL, chromWidthRange = NULL, qualityRange = NULL, negate = FALSE ) ## S4 method for signature 'features,ANY,missing,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'features,ANY,missing' x[[i]] ## S4 method for signature 'features' x$name ## S4 method for signature 'features' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'features' calculatePeakQualities(obj, weights, flatnessFactor, parallel = TRUE) ## S4 method for signature 'features' getTICs(obj, retentionRange = NULL, MSLevel = 1) ## S4 method for signature 'features' getBPCs(obj, retentionRange = NULL, MSLevel = 1) ## S4 method for signature 'features' plotTICs( obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, colourBy = c("none", "analyses", "rGroups"), showLegend = TRUE, xlim = NULL, ylim = NULL, ... ) ## S4 method for signature 'features' plotBPCs( obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, colourBy = c("none", "analyses", "rGroups"), showLegend = TRUE, xlim = NULL, ylim = NULL, ... 
) ## S4 method for signature 'featuresSet' sets(obj) ## S4 method for signature 'featuresSet' show(object) ## S4 method for signature 'featuresSet' as.data.table(x) ## S4 method for signature 'featuresSet,ANY,missing,missing' x[i, ..., sets = NULL, drop = TRUE] ## S4 method for signature 'featuresSet' filter(obj, ..., negate = FALSE, sets = NULL) ## S4 method for signature 'featuresSet' unset(obj, set) ## S4 method for signature 'featuresKPIC2' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featuresXCMS' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featuresXCMS3' delete(obj, i = NULL, j = NULL, ...)
obj , x , object
|
|
absMinIntensity , relMinIntensity
|
Minimum absolute/relative intensity for features to be kept. The relative
intensity is determined from the feature with highest intensity (within the same analysis). Set to ‘0’ or |
retentionRange , mzRange , mzDefectRange , chromWidthRange
|
Range of retention time (in seconds), m/z, mass
defect (defined as the decimal part of m/z values) or chromatographic peak width (in seconds), respectively.
Features outside this range will be removed. Should be a numeric vector with length of two containing the min/max
values. The maximum can be |
qualityRange |
Used to filter features by their peak qualities/scores
(see |
negate |
If set to |
i , j
|
For |
... |
For For For sets workflow methods: further arguments passed to the base |
drop |
ignored. |
name |
The analysis name (partially matched). |
weights |
A named |
flatnessFactor |
Passed to MetaClean as the |
parallel |
If set to |
MSLevel |
Integer vector with the MS levels (i.e., 1 for MS1 and 2 for MS2) to obtain TIC traces. |
retMin |
Plot retention time in minutes (instead of seconds). |
title |
Character string used for title of the plot. If |
colourBy |
Sets the automatic colour selection: "none" for a single colour or "analyses"/"rGroups" for a distinct colour per analysis or analysis replicate group. |
showLegend |
Plot a legend if TRUE. |
xlim , ylim
|
Sets the plot size limits used by
|
sets |
(sets workflow) For |
set |
(sets workflow) The name of the set. |
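As a small illustration of the mzDefectRange definition above, the following base R sketch computes the mass defect as the decimal part of m/z and keeps only the values inside a min/max range. The m/z values and the range are made up for illustration, and this is not patRoon's internal implementation.

```r
# Hypothetical m/z values (illustration only)
mz <- c(301.1412, 445.0321, 198.9876, 512.2503)

# Mass defect as defined above: the decimal part of each m/z value
mzDefect <- mz - floor(mz)

# Keep values whose mass defect lies within a min/max range
# (a numeric vector with a length of two)
mzDefectRange <- c(0.1, 0.3)
keep <- mzDefect >= mzDefectRange[1] & mzDefect <= mzDefectRange[2]

kept <- mz[keep]
print(kept)
```

With a features object, the same kind of range would be passed as the mzDefectRange argument of filter.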
This class provides a way to store intensity, retention times, m/z and other data for all features in a set of analyses. The class is virtual and derived objects are created by 'feature finders' such as findFeaturesOpenMS, findFeaturesXCMS and findFeaturesBruker.
featureTable: A list containing a data.table for each analysis with feature data.
analysisInfo: A data.frame containing a column with analysis name (analysis), its path (path), and other columns such as replicate group name (group) and blank reference (blank).
delete
returns the object for which the specified data was removed.
calculatePeakQualities
returns a modified object amended with peak qualities and scores.
length(features)
: Obtain total number of features.
show(features)
: Shows summary information for this object.
featureTable(features)
: Get table with feature information
analysisInfo(features)
: Get analysis information
analyses(features)
: returns a character
vector with the names of the
analyses for which data is present in this object.
replicateGroups(features)
: returns a character
vector with the names of the
replicate groups for which data is present in this object.
as.data.table(features)
: Returns all feature data in a table.
filter(features)
: Performs common rule based filtering of features. Note
that this (and much more) functionality is also provided by the
filter
method defined for featureGroups
. However,
filtering a features
object may be useful to avoid grouping large
amounts of features.
x[i
: Subset on analyses.
x[[i
: Extract a feature table for an analysis.
$
: Extract a feature table for an analysis.
delete(features)
: Completely deletes specified features.
calculatePeakQualities(features)
: Calculates peak qualities for each feature. This uses the MetaClean R package to calculate the following metrics: Apex-Boundary Ratio, FWHM2Base, Jaggedness, Modality, Symmetry, Gaussian Similarity, Sharpness, Triangle Peak Area Similarity Ratio and Zig-Zag index. Please see the MetaClean publication (referenced below) for more details. For each metric, an additional score is calculated by normalizing all feature values (unless the quality metric definition has a fixed range) and scaling them from ‘0’ (worst) to ‘1’ (best). Then, a totalScore for each feature is calculated as the (weighted) sum of all score values.
getTICs(features)
: Obtain the total ion chromatogram/s (TICs) of the analyses.
getBPCs(features)
: Obtain the base peak chromatogram/s (BPCs) of the analyses.
plotTICs(features)
: Plots the TICs of the analyses.
plotBPCs(features)
: Plots the BPCs of the analyses.
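The scoring idea described for calculatePeakQualities (normalize each metric to a 0 to 1 score, then take a weighted sum) can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not patRoon's or MetaClean's actual code: the metric values and weights are hypothetical, plain min-max normalization is assumed, and higher raw values are assumed to be better for both metrics.

```r
# Hypothetical metric values for four features (illustration only)
qualities <- data.frame(
    Sharpness = c(2, 10, 6, 4),
    Symmetry  = c(0.2, 0.8, 0.5, 0.9)
)

# Min-max normalization so that each score runs from 0 (worst) to 1 (best).
# Assumption: higher raw values are better for both metrics shown here.
normScore <- function(x) (x - min(x)) / (max(x) - min(x))
scores <- as.data.frame(lapply(qualities, normScore))

# Hypothetical weights per score; an unweighted sum would use all ones
weights <- c(Sharpness = 1, Symmetry = 2)

# Weighted sum of all scores per feature
totalScore <- as.vector(as.matrix(scores) %*% weights[names(scores)])
print(round(totalScore, 3))
```

In patRoon itself the weights are supplied through the weights argument of calculatePeakQualities.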
features
List of features per analysis file. Use the featureTable
method for access.
analysisInfo
Analysis group information. Use the analysisInfo
method for access.
The featuresSet
class is applicable for sets workflows. This class is derived from features
and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
sets
Returns the set names for this object.
unset
Converts the object data for a specified set into a 'non-set' object (featuresUnset
), which allows it to be used in 'regular' workflows. The adduct annotations for the selected set (e.g. as passed to
makeSet
) are used to convert all feature masses to ionic m/z values.
The following methods are changed or have new functionality:
filter
and the subset operator ([
) have specific arguments to choose/filter by (feature
presence in) sets. See the sets
argument description.
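The conversion of feature masses to ionic m/z values mentioned for unset can be illustrated with simple arithmetic. The sketch below assumes singly charged [M+H]+ and [M-H]- adducts and uses hypothetical neutral masses; it only shows the principle and is not the code patRoon uses.

```r
# Mass of a proton in Da (hydrogen atom minus one electron)
protonMass <- 1.007276

# Hypothetical neutral monoisotopic masses of three features
neutralMass <- c(180.0634, 253.1103, 314.1518)

# Singly charged ions: [M+H]+ gains a proton, [M-H]- loses one
mzPos <- neutralMass + protonMass
mzNeg <- neutralMass - protonMass

print(round(mzPos, 4))
print(round(mzNeg, 4))
```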
For calculatePeakQualities: sometimes MetaClean may return NA for the Gaussian Similarity metric, in which case it will be set to ‘0’.
Rick Helmus <[email protected]> and Ricardo Cunha <[email protected]> (getTICs, getBPCs, plotTICs and plotBPCs functions)
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
Automatically find features.
findFeatures(analysisInfo, algorithm, ..., verbose = TRUE)
analysisInfo |
A |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Further parameters passed to the selected feature finding algorithms. |
verbose |
If set to |
Several functions exist to collect features (i.e. retention and MS information that represent potential
compounds) from a set of analyses. All 'feature finders' return an object derived from the features
base class. The next step in a general workflow is to group and align these features across analyses with
groupFeatures
. Note that some feature finders have a plethora of options which sometimes may have a
large effect on the quality of results. Fine-tuning parameters is therefore important, and the optimum is largely
dependent upon applied analysis methodology and instrumentation.
findFeatures is a generic function that will find features by one of the supported algorithms. The actual functionality is provided by algorithm specific functions such as findFeaturesOpenMS and findFeaturesXCMS. While these functions may be called directly, findFeatures provides a generic interface and is therefore usually preferred.
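A minimal sketch of the generic interface is shown below. The analysis information table is built by hand with the columns named in the analysisInfo description (path, analysis, group, blank); the directory, analysis names and group names are hypothetical. The call to findFeatures is guarded so that it only runs when patRoon is installed and the assumed mzML files actually exist.

```r
# Build an analysis information table by hand. The column names follow the
# analysisInfo description; all values below are hypothetical.
anaInfo <- data.frame(
    path     = "~/data/mzML",
    analysis = c("blank-1", "sample-1", "sample-2"),
    group    = c("blank", "sample", "sample"),
    blank    = "blank",
    stringsAsFactors = FALSE
)

# Only run the feature finder if patRoon is installed and the
# (hypothetical) centroided mzML files are present
msFiles <- file.path(anaInfo$path, paste0(anaInfo$analysis, ".mzML"))
if (requireNamespace("patRoon", quietly = TRUE) && all(file.exists(msFiles))) {
    # Generic interface: the algorithm is selected by name
    fList <- patRoon::findFeatures(anaInfo, "openms")
    print(length(fList))  # total number of features
}
```

Switching to another feature finder then only requires changing the algorithm string (e.g. "xcms3"), with any algorithm specific parameters passed through the ... argument.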
An object of a class which is derived from features
.
In most cases it will be necessary to centroid your MS input files. The only exception is Bruker; however, you will still need centroided ‘mzXML’/‘mzML’ files for e.g. plotting chromatograms. In this case the centroided MS files should be stored in the same directory as the raw Bruker ‘.d’ files. The convertMSFiles function can be used to centroid data.
The features output class and its methods and the algorithm specific functions: findFeaturesBruker, findFeaturesOpenMS, findFeaturesXCMS, findFeaturesXCMS3, findFeaturesEnviPick, findFeaturesSIRIUS, findFeaturesKPIC2, findFeaturesSAFD
Uses the 'Find Molecular Features' (FMF) algorithm of Bruker DataAnalysis vendor software to find features.
findFeaturesBruker( analysisInfo, doFMF = "auto", startRange = 0, endRange = 0, save = TRUE, close = save, verbose = TRUE )
analysisInfo |
A |
doFMF |
Run the 'Find Molecular Features' algorithm before loading compounds. Valid options are: |
startRange , endRange
|
Start/End retention range (seconds) from which to collect features. A 0 (zero) for
|
close , save
|
If |
verbose |
If set to |
This function uses Bruker to automatically find features. This function is called when calling findFeatures
with
algorithm="bruker"
.
The resulting 'compounds' are transferred from DataAnalysis and stored as features.
This algorithm only works with Bruker data files (.d
extension) and requires Bruker DataAnalysis
and the RDCOMClient package to be installed. Furthermore, DataAnalysis combines multiple related masses in a
feature (e.g. isotopes, adducts) but does not report the actual (monoisotopic) mass of the feature.
Therefore, it is simply assumed that the feature mass equals that of the highest intensity mass peak.
An object of a class which is derived from features
.
If any errors related to DCOM appear it might be necessary to terminate DataAnalysis (note that DataAnalysis might still be running as a background process). The ProcessCleaner application installed with DataAnalysis can be used for this.
findFeatures
for more details and other algorithms.
Uses the enviPickwrap
function from the enviPick R package to extract features.
findFeaturesEnviPick(analysisInfo, ..., parallel = TRUE, verbose = TRUE)
analysisInfo |
A |
... |
Further parameters passed to |
parallel |
If set to |
verbose |
If set to |
This function uses enviPick to automatically find features. This function is called when calling findFeatures
with
algorithm="envipick"
.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from features
.
The analysis files must be in the mzXML
format.
findFeatures
for more details and other algorithms.
Uses the KPIC2 R package to extract features.
findFeaturesKPIC2( analysisInfo, kmeans = TRUE, level = 1000, ..., parallel = TRUE, verbose = TRUE )
analysisInfo |
A |
kmeans |
If |
level |
Passed to |
... |
Further parameters passed to |
parallel |
If set to |
verbose |
If set to |
This function uses KPIC2 to automatically find features. This function is called when calling findFeatures
with
algorithm="kpic2"
.
The MS files should be in the mzML
or mzXML
format.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from features
.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
findFeatures
for more details and other algorithms.
Uses the FeatureFinderMetabo TOPP tool (see http://www.openms.de) to find features.
findFeaturesOpenMS( analysisInfo, noiseThrInt = 1000, chromSNR = 3, chromFWHM = 5, mzPPM = 10, reEstimateMTSD = TRUE, traceTermCriterion = "sample_rate", traceTermOutliers = 5, minSampleRate = 0.5, minTraceLength = 3, maxTraceLength = -1, widthFiltering = "fixed", minFWHM = 1, maxFWHM = 30, traceSNRFiltering = FALSE, localRTRange = 10, localMZRange = 6.5, isotopeFilteringModel = "metabolites (5% RMS)", MZScoring13C = FALSE, useSmoothedInts = TRUE, extraOpts = NULL, intSearchRTWindow = 3, useFFMIntensities = FALSE, verbose = TRUE )
analysisInfo |
A |
noiseThrInt |
Noise intensity threshold. Sets |
chromSNR |
Minimum S/N of a mass trace. Sets |
chromFWHM |
Expected chromatographic peak width (in seconds). Sets |
mzPPM |
Allowed mass deviation (ppm) for trace detection. Sets |
reEstimateMTSD |
If |
traceTermCriterion , traceTermOutliers , minSampleRate
|
Termination criterion for the extension of mass traces. See
FeatureFinderMetabo.
Sets the |
minTraceLength , maxTraceLength
|
Minimum/maximum length of a mass trace (seconds). Set a negative value for maxTraceLength
to disable maximum. Sets |
widthFiltering , minFWHM , maxFWHM
|
Enable filtering of unlikely peak widths. See
FeatureFinderMetabo.
Sets |
traceSNRFiltering |
If |
localRTRange , localMZRange
|
Retention/MZ range where to look for coeluting/isotopic mass traces. Sets the
|
isotopeFilteringModel |
Remove/score candidate assemblies based on isotope intensities. See
FeatureFinderMetabo.
Sets the |
MZScoring13C |
Use the 13C isotope as the expected shift for isotope mass traces. See
FeatureFinderMetabo.
Sets |
useSmoothedInts |
If |
extraOpts |
Named |
intSearchRTWindow |
Retention time window (in seconds, +/- feature retention time) that is used to find the closest data point to the retention time to obtain the intensity of a feature (this is needed since OpenMS does not provide this data). |
useFFMIntensities |
If |
verbose |
If set to |
This function uses OpenMS to automatically find features. This function is called when calling findFeatures
with
algorithm="openms"
.
This functionality has been tested with OpenMS version >= 2.0. Please make sure it is installed and configured, e.g. by installing patRoonExt or configuring the path of the binaries with the patRoon.path.OpenMS option or the system PATH variable.
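For instance, the patRoon.path.OpenMS option could be set with the base R options function, as sketched below. The directory is a hypothetical example and must be adjusted to wherever the OpenMS binaries are installed locally.

```r
# Hypothetical location of the OpenMS binaries; adjust to the local installation
openMSBin <- "/opt/OpenMS/bin"

# Tell patRoon where to find the OpenMS command line tools
options(patRoon.path.OpenMS = openMSBin)

# Verify the option that is currently set
print(getOption("patRoon.path.OpenMS"))
```

Placing such a call at the top of a processing script keeps the configuration next to the workflow that depends on it.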
The file format of analyses must be ‘mzML’.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from features
.
findFeaturesOpenMS
uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Note that for caching purposes, the analysis files must always exist on the local host computer, even if it is not participating in computations.
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
pugixml (via Rcpp) is used to process OpenMS XML output.
findFeatures
for more details and other algorithms.
Uses SAFD to obtain features. This functionality is still experimental. Please see the details below.
findFeaturesSAFD( analysisInfo, profPath = NULL, mzRange = c(0, 400), maxNumbIter = 1000, maxTPeakW = 300, resolution = 30000, minMSW = 0.02, RThreshold = 0.75, minInt = 2000, sigIncThreshold = 5, S2N = 2, minPeakWS = 3, verbose = TRUE )
analysisInfo |
A |
profPath |
A |
mzRange |
The m/z window to be imported (passed to the |
maxNumbIter , maxTPeakW , resolution , minMSW , RThreshold , minInt , sigIncThreshold , S2N , minPeakWS
|
Parameters directly
passed to the |
verbose |
If set to |
This function uses SAFD to automatically find features. This function is called when calling findFeatures
with
algorithm="safd"
.
The support for SAFD is still experimental, and its interface might change in the future.
In order to use SAFD, please make sure that its Julia packages are installed and you have verified that everything works, e.g. by running the test data.
This algorithm supports profile and centroided MS data. If the use of profile data is desired, centroided data must still be available for other functionality of patRoon. The centroided data is specified through the 'regular' analysis info mechanism. The location of any profile data is specified through the profPath argument (NULL for no profile data). The base file names (i.e. the file name without path and extension) of both centroid and profile data must be the same. Furthermore, the format of the profile data must be ‘mzXML’.
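The requirement that centroided and profile files share the same base file name can be checked up front with base R. The sketch below uses hypothetical file paths and a hypothetical helper (baseName); it is merely a convenience check and not part of patRoon.

```r
# Hypothetical centroided and profile files (illustration only)
centroidFiles <- c("~/data/centroid/sample-1.mzML", "~/data/centroid/sample-2.mzML")
profileFiles  <- c("~/data/profile/sample-1.mzXML", "~/data/profile/sample-2.mzXML")

# Base file name: the file name without path and extension
baseName <- function(f) tools::file_path_sans_ext(basename(f))

# Every centroided analysis should have a profile file with the same base name
missingProfile <- setdiff(baseName(centroidFiles), baseName(profileFiles))
allMatched <- length(missingProfile) == 0
print(allMatched)
```

If allMatched is FALSE, missingProfile lists the analyses for which no matching profile file was found.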
An object of a class which is derived from features
.
findFeaturesSAFD
uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Note that for caching purposes, the analysis files must always exist on the local host computer, even if it is not participating in computations.
Samanipour S, OBrien JW, Reid MJ, Thomas KV (2019). “Self Adjusting Algorithm for the Nontargeted Feature Detection of High Resolution Mass Spectrometry Coupled with Liquid Chromatography Profile Data.” Analytical Chemistry, 91(16), 10800–10807. doi:10.1021/acs.analchem.9b02422.
findFeatures
for more details and other algorithms.
Uses SIRIUS to find features.
findFeaturesSIRIUS(analysisInfo, verbose = TRUE)
analysisInfo |
A |
verbose |
If set to |
This function uses SIRIUS to automatically find features. This function is called when calling findFeatures
with
algorithm="sirius"
.
The features are collected by running the lcms-align
SIRIUS
command for every analysis.
The MS files should be in the ‘mzML’ or ‘mzXML’ format. Furthermore, this algorithm requires the presence of (data-dependent) MS/MS data.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from features
.
findFeaturesSIRIUS
uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Note that for caching purposes, the analysis files must always exist on the local host computer, even if it is not participating in computations.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019). “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods, 16(4), 299–302. doi:10.1038/s41592-019-0344-8.
findFeatures
for more details and other algorithms.
Uses the legacy xcmsSet
function from the xcms package to find features.
findFeaturesXCMS(analysisInfo, method = "centWave", ..., verbose = TRUE)
analysisInfo |
A |
method |
The method setting used by XCMS peak finding, see |
... |
Further parameters passed to |
verbose |
If set to |
This function uses XCMS to automatically find features. This function is called when calling findFeatures
with
algorithm="xcms"
.
This function uses the legacy interface of xcms. It is recommended to use
findFeaturesXCMS3
instead.
The file format of analyses must be mzML
or mzXML
.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from features
.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan, R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification. Analytical Chemistry, 78:779-787 (2006).
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics, 9:504 (2008).
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels: Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data. Bioinformatics, 26:2488 (2010).
findFeatures
for more details and other algorithms.
Uses the new xcms3
interface from the xcms package to find features.
findFeaturesXCMS3( analysisInfo, param = xcms::CentWaveParam(), ..., verbose = TRUE )
analysisInfo |
A |
param |
The method parameters used by XCMS peak finding, see
|
... |
Further parameters passed to |
verbose |
If set to |
This function uses XCMS3 to automatically find features. This function is called when calling findFeatures
with
algorithm="xcms3"
.
The file format of analyses must be mzML
or mzXML
.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from features
.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
findFeatures
for more details and other algorithms.
Contains data of generated chemical formulae for given feature groups.
## S4 method for signature 'formulas' annotations(obj, features = FALSE) ## S4 method for signature 'formulas' analyses(obj) ## S4 method for signature 'formulas' defaultExclNormScores(obj) ## S4 method for signature 'formulas' show(object) ## S4 method for signature 'formulas,ANY,ANY' x[[i, j]] ## S4 method for signature 'formulas' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'formulas' as.data.table( x, fGroups = NULL, fragments = FALSE, countElements = NULL, countFragElements = NULL, OM = FALSE, normalizeScores = "none", excludeNormScores = defaultExclNormScores(x), average = FALSE ) ## S4 method for signature 'formulas' annotatedPeakList( obj, index, groupName, analysis = NULL, MSPeakLists, onlyAnnotated = FALSE ) ## S4 method for signature 'formulas' plotSpectrum( obj, index, groupName, analysis = NULL, MSPeakLists, title = NULL, specSimParams = getDefSpecSimParams(), mincex = 0.9, xlim = NULL, ylim = NULL, ... ) ## S4 method for signature 'formulas' plotScores( obj, index, groupName, analysis = NULL, normalizeScores = "max", excludeNormScores = defaultExclNormScores(obj) ) ## S4 method for signature 'formulas' consensus( obj, ..., absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, rankWeights = 1, labels = NULL ) ## S4 method for signature 'formulasSet' show(object) ## S4 method for signature 'formulasSet' delete(obj, i, j, ...) ## S4 method for signature 'formulasSet,ANY,missing,missing' x[i, j, ..., sets = NULL, updateConsensus = FALSE, drop = TRUE] ## S4 method for signature 'formulasSet' filter(obj, ..., sets = NULL, updateConsensus = FALSE, negate = FALSE) ## S4 method for signature 'formulasSet' plotSpectrum( obj, index, groupName, analysis = NULL, MSPeakLists, title = NULL, specSimParams = getDefSpecSimParams(), mincex = 0.9, xlim = NULL, ylim = NULL, perSet = TRUE, mirror = TRUE, ... 
) ## S4 method for signature 'formulasSet' annotatedPeakList(obj, index, groupName, analysis = NULL, MSPeakLists, ...) ## S4 method for signature 'formulasSet' consensus( obj, ..., absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, rankWeights = 1, labels = NULL, filterSets = FALSE, setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE ) ## S4 method for signature 'formulasSet' unset(obj, set) ## S4 method for signature 'formulasConsensusSet' unset(obj, set) ## S4 method for signature 'formulasSIRIUS' delete(obj, i = NULL, j = NULL, ...)
obj , x , object
|
The |
features |
If |
i , j
|
For Otherwise passed to the |
... |
For For For For sets workflow methods: further arguments passed to the base |
fGroups , fragments , countElements , countFragElements , OM
|
Passed to the
|
normalizeScores |
A |
excludeNormScores |
A
For |
average |
If set to |
index |
The candidate index (row). For |
groupName |
The name of the feature group (or feature groups when comparing spectra) to which the candidate belongs. |
analysis |
A |
MSPeakLists |
The |
onlyAnnotated |
Set to |
title |
The title of the plot. Set to |
specSimParams |
A named |
mincex |
The formula annotation labels are automatically scaled. The |
xlim , ylim
|
Sets the plot size limits used by
|
absMinAbundance , relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain formulas that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
rankWeights |
A numeric vector with weights used to calculate the mean ranking score for each candidate. The value will be recycled if necessary, hence, the default value of ‘1’ means equal weights for all considered objects. |
labels |
A |
sets |
(sets workflow) A |
updateConsensus |
(sets workflow) If |
drop |
Passed to the |
negate |
Passed to the |
perSet , mirror
|
(sets workflow) If |
filterSets |
(sets workflow) Controls how the algorithm consensus abundance filters are applied. See the |
setThreshold , setThresholdAnn
|
(sets workflow) Thresholds used to create the annotation set consensus. See
|
setAvgSpecificScores |
(sets workflow) If |
set |
(sets workflow) The name of the set. |
formulas
objects are obtained with generateFormulas
. This class is derived from the
featureAnnotations
class, please see its documentation for more methods and other details.
annotations
returns a list
containing for each feature
group (or feature if features=TRUE
) a data.table
with an overview of all generated formulae and other data such as candidate
scoring and MS/MS fragments.
consensus
returns a formulas
object that is produced by
merging results from multiple formulas
objects.
annotations(formulas)
: Accessor method to obtain generated formulae.
analyses(formulas)
: returns a character
vector with the names of the
analyses for which data is present in this object.
defaultExclNormScores(formulas)
: Returns default scorings that are excluded from normalization.
show(formulas)
: Show summary information for this object.
x[[i, j]]
: Extracts a formula table, either for a feature group or for features in an analysis.
as.data.table(formulas)
: Generates a table with all candidate formulae for each feature group and other information such
as element counts.
annotatedPeakList(formulas)
: Returns an MS/MS peak list annotated with data from a
given candidate formula.
plotSpectrum(formulas)
: Plots an annotated spectrum for a given candidate formula of a feature or feature group. Two
spectra can be compared by specifying a two-sized vector for the index
, groupName
and (if desired)
analysis
arguments.
plotScores(formulas)
: Plots a barplot with scoring of a candidate formula.
consensus(formulas)
: Generates a consensus of results from multiple
objects. In order to rank the consensus candidates, first
each candidate is scored based on its original ranking
(the scores are normalized and the highest ranked candidate gets value
‘1’). The (weighted) mean is then calculated for all scorings of each
candidate to derive the final ranking (if an object lacks the candidate its
score will be ‘0’). The original rankings for each object are stored in
the rank
columns.
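The ranking scheme described for consensus can be sketched in base R. This is only an illustration and not the patRoon implementation: the linear rank normalization (best rank scores 1, last rank scores 1/n), the helper names rankScore and consensusRank, and the example formulae are all assumptions made for this sketch.

```r
# Hypothetical ranked candidate lists from two objects (best candidate first)
rankedA <- c("C6H6O", "C5H2O2", "C7H10")
rankedB <- c("C5H2O2", "C6H6O")

# Assumed normalization: the best candidate scores 1, the last one 1/n
rankScore <- function(ranked) setNames(rev(seq_along(ranked)) / length(ranked), ranked)

consensusRank <- function(rankedLists, rankWeights = 1) {
    # recycle the weights over all objects, as described for rankWeights
    weights <- rep(rankWeights, length.out = length(rankedLists))
    candidates <- unique(unlist(rankedLists))
    # one column of scores per object; candidates absent from an object score 0
    scores <- do.call(cbind, lapply(rankedLists, function(r) {
        s <- unname(rankScore(r)[candidates])
        s[is.na(s)] <- 0
        s
    }))
    meanScore <- as.vector(scores %*% weights) / sum(weights)
    names(meanScore) <- candidates
    sort(meanScore, decreasing = TRUE)
}

res <- consensusRank(list(rankedA, rankedB))
resWeighted <- consensusRank(list(rankedA, rankedB), rankWeights = c(3, 1))
print(res)
```

With equal weights the candidate that ranks well in both lists ends on top, whereas giving the first object a higher weight lets its top candidate win.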
featureFormulas
A list
with all generated formulae for each analysis/feature group. Use the
annotations
method for access.
setThreshold,setThresholdAnn,setAvgSpecificScores
(sets workflow) A copy of the equally named arguments that were
passed when this object was created by generateFormulas
.
origFGNames
(sets workflow) The original (order of) names of the featureGroups
object that was used to
create this object.
Subscripting of formulae for plots generated by
plotSpectrum
is based on the chemistry2expression
function
from the ReSOLUTION package.
The formulasSet
class is applicable for sets workflows. This class is derived from formulas
and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet
.
unset
Converts the object data for a specified set into a 'non-set' object (formulasUnset
), which allows it to be used in 'regular' workflows. Only the annotation results that are present in the specified set are kept
(based on the set consensus, see below for implications).
The following methods are changed or have new functionality:
filter
and the subset operator ([
) Can be used to select data that is only present for selected
sets. Depending on the updateConsensus
argument, both operate on either the set consensus or the original data (see below for
implications).
annotatedPeakList
Returns a combined annotation table with all sets.
plotSpectrum
Is able to highlight set specific mass peaks (perSet
and mirror
arguments).
consensus
Creates the algorithm consensus based on the original annotation data (see below for
implications). Then, like the sets workflow method for generateFormulas
, a consensus is made for all
sets, which can be controlled with the setThreshold
and setThresholdAnn
arguments. The candidate
coverage among the different algorithms is calculated for each set (e.g. coverage-positive
column)
and for all sets (coverage
column), which is based on the presence of a candidate in all the algorithms from
all sets data. The consensus
method for sets workflow data supports the filterSets
argument. This
controls how the algorithm consensus abundance filters (absMinAbundance
/relMinAbundance
) are applied:
if filterSets=TRUE
then the minimum of all coverage
set specific columns is used to obtain the
algorithm abundance. Otherwise the overall coverage
column is used. For instance, consider a consensus
object to be generated from two objects generated by different algorithms (e.g. SIRIUS
and
GenForm
), which both have a positive and negative set. Then, if a candidate occurs with both
algorithms for the positive mode set, but only with the first algorithm in the negative mode set,
relMinAbundance=1
will remove the candidate if filterSets=TRUE
(because the minimum relative
algorithm abundance is ‘0.5’), while filterSets=FALSE
will not remove the candidate (because based on
all sets data the candidate occurs in both algorithms).
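The filterSets example above can be reproduced with a few lines of base R. This is a sketch of the described logic only, not code from patRoon; the presence matrix and the helper names algorithmAbundance and keepCandidate are hypothetical.

```r
# Hypothetical presence of one candidate: rows are algorithms, columns are sets.
# The matrix is filled column-wise: both algorithms report the candidate in the
# positive set, only the first algorithm reports it in the negative set.
presence <- matrix(c(TRUE, TRUE,
                     TRUE, FALSE),
                   nrow = 2,
                   dimnames = list(c("SIRIUS", "GenForm"), c("positive", "negative")))

# per-set coverage: fraction of algorithms that found the candidate in each set
coverageSets <- colMeans(presence)
# overall coverage: fraction of algorithms that found the candidate in any set
coverageAll <- mean(rowSums(presence) > 0)

algorithmAbundance <- function(filterSets) {
    if (filterSets) min(coverageSets) else coverageAll
}
keepCandidate <- function(relMinAbundance, filterSets) {
    algorithmAbundance(filterSets) >= relMinAbundance
}

print(keepCandidate(relMinAbundance = 1, filterSets = TRUE))
print(keepCandidate(relMinAbundance = 1, filterSets = FALSE))
```

The first call drops the candidate because the minimum set specific coverage is 0.5, while the second call keeps it because both algorithms report the candidate somewhere in the sets data.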
Two types of annotation data are stored in a formulasSet
object:
Annotations that are produced from a consensus between set results (see generateFormulas
).
The 'original' annotation data per set, prior to when the set consensus was made. This includes candidates
that were filtered out because of the thresholds set by setThreshold
and setThresholdAnn
. However,
when filter
or subsetting ([
) operations are performed, the original data is also updated.
In most cases the first data is used. However, in a few cases the original annotation data is used (as indicated
above), for instance, to re-create the set consensus. It is important to realize that the original annotation data
may have additional candidates, and a newly created set consensus may therefore have 'new' candidates. For
instance, when the object consists of the sets "positive"
and "negative"
and setThreshold=1
was used to create it, then formulas[, sets = "positive", updateConsensus = TRUE]
may now have additional
candidates, i.e. those that were not present in the "negative"
set and were previously removed due to
the consensus threshold filter.
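Why re-creating the set consensus can bring back candidates may be easier to see with a small base R sketch. It assumes that setThreshold is the minimum fraction of sets in which a candidate must occur; the setConsensus helper and the example data are hypothetical and only mimic the behaviour described here.

```r
# Hypothetical 'original' annotation data: candidate formulae per set
origAnn <- list(
    positive = c("C6H6O", "C7H8"),
    negative = c("C6H6O")
)

# Assumed consensus rule: keep candidates present in at least
# setThreshold (0-1) of the sets under consideration
setConsensus <- function(ann, setThreshold) {
    cands <- unique(unlist(ann))
    abundance <- sapply(cands, function(cd) mean(sapply(ann, function(a) cd %in% a)))
    cands[abundance >= setThreshold]
}

# consensus over both sets: C7H8 is missing from the negative set and is removed
full <- setConsensus(origAnn, setThreshold = 1)

# consensus re-created from the positive set only: C7H8 reappears
posOnly <- setConsensus(origAnn["positive"], setThreshold = 1)

print(full)
print(posOnly)
```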
The featureAnnotations
base class for more relevant methods and
generateFormulas
.
Returns a data.frame
with information on which scoring terms are used and what their algorithm specific name
is.
formulaScorings()
generateFormulas
This class is derived from formulas
and contains additional specific SIRIUS
data.
Objects from this class are generated by generateFormulasSIRIUS
fingerprints
A list
with for each feature group result a data.table
containing fingerprints
obtained with CSI:FingerID
. Will be empty unless the getFingerprints
argument to
generateFormulasSIRIUS
was set to TRUE
.
MS2QuantMeta
Metadata from MS2Quant filled in by predictRespFactors
.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
formulas
and generateFormulasSIRIUS
Functionality to automatically group related feature groups (e.g. isotopes, adducts and homologues) to assist and simplify annotation.
generateComponents(fGroups, algorithm, ...) ## S4 method for signature 'featureGroups' generateComponents(fGroups, algorithm, ...)
fGroups |
|
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected component generation algorithm. |
Several algorithms are provided to group feature groups that are related in some (chemical) way to each other. How feature groups are related depends on the algorithm: examples include adducts, statistics and parents/transformation products. The linking of this data is generally useful for annotation purposes and reducing data complexity.
generateComponents
is a generic function that will generate components by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateComponentsRAMClustR
and generateComponentsNontarget
. While these
functions may be called directly, generateComponents
provides a generic interface and is therefore usually preferred.
A components
(derived) object containing all generated components.
In a sets workflow the componentization data is generated differently
depending on the used algorithm. Please see the details in the algorithm specific functions linked in the See Also
section.
The components
output class and its methods and the algorithm specific functions:
generateComponentsRAMClustR
, generateComponentsCAMERA
, generateComponentsNontarget
, generateComponentsIntClust
, generateComponentsOpenMS
, generateComponentsCliqueMS
, generateComponentsSpecClust
, generateComponentsTPs
Interfaces with CAMERA to generate components from known adducts, isotopes and in-source fragments.
generateComponentsCAMERA(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsCAMERA( fGroups, ionization = NULL, onlyIsotopes = FALSE, minSize = 2, relMinReplicates = 0.5, extraOpts = NULL ) ## S4 method for signature 'featureGroupsSet' generateComponentsCAMERA(fGroups, ionization = NULL, ...)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
onlyIsotopes |
Logical value. If |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. |
relMinReplicates |
Feature groups within a component are only kept when they contain data for at least this (relative) amount of replicate analyses. For instance, ‘0.5’ means that at least half of the replicates should contain data for a particular feature group in a component. In this calculation replicates that are fully absent within a component are not taken into account. See note below. |
extraOpts |
Named character vector with extra arguments directly passed to
|
This function uses CAMERA to generate components. This function is called when calling generateComponents
with
algorithm="camera"
.
The specified featureGroups
object is automatically converted to an xcmsSet
object
using getXCMSSet
.
A components
(derived) object containing all generated components.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet
object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1"
becomes "CMP1-positive"
).
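The combination of per-set results can be pictured with plain lists. The sketch below is not the componentsSet implementation; the combineSets helper and the example feature group names are hypothetical and only show the renaming, with the components themselves left unmerged.

```r
# Hypothetical per-set results: each component lists its feature groups
perSet <- list(
    positive = list(CMP1 = c("FG1", "FG2"), CMP2 = c("FG3", "FG4")),
    negative = list(CMP1 = c("FG5", "FG6"))
)

combineSets <- function(perSet) {
    # append the set name to every component name, e.g. "CMP1" -> "CMP1-positive"
    renamed <- lapply(names(perSet), function(set) {
        cmps <- perSet[[set]]
        setNames(cmps, paste0(names(cmps), "-", set))
    })
    # concatenate the per-set lists without merging any components
    unlist(renamed, recursive = FALSE)
}

combined <- combineSets(perSet)
print(names(combined))
```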
The default value for minSize
and relMinReplicates
results in
extra filtering, hence, the final results may be different than what the algorithm normally would return.
Kuhl, C., Tautenhahn, R., Boettcher, C., Larson, T. R. and Neumann, S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Analytical Chemistry, 84:283-289 (2012)
generateComponents
for more details and other algorithms.
Uses cliqueMS to generate components using the
cliqueMS::getCliques
function.
generateComponentsCliqueMS(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsCliqueMS( fGroups, ionization = NULL, maxCharge = 1, maxGrade = 2, ppm = 10, adductInfo = NULL, absMzDev = 0.005, minSize = 2, relMinAdductAbundance = 0.75, adductConflictsUsePref = TRUE, NMConflicts = c("preferential", "mostAbundant", "mostIntense"), prefAdducts = c("[M+H]+", "[M-H]-"), extraOptsCli = NULL, extraOptsIso = NULL, extraOptsAnn = NULL, parallel = TRUE ) ## S4 method for signature 'featureGroupsSet' generateComponentsCliqueMS(fGroups, ionization = NULL, ...)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
maxCharge , maxGrade , ppm
|
Arguments passed to |
adductInfo |
Sets the |
absMzDev |
Maximum absolute m/z deviation. |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. |
relMinAdductAbundance |
The minimum relative abundance (‘0-1’) that an adduct should be assigned to
features within the same feature group. See the |
adductConflictsUsePref |
If set to |
NMConflicts |
The strategies to employ when not all neutral masses within a component are equal. Valid options
are: |
prefAdducts |
A |
extraOptsCli , extraOptsIso , extraOptsAnn
|
Named |
parallel |
If set to |
This function uses cliqueMS to generate components. This function is called when calling generateComponents
with
algorithm="cliquems"
.
The grouping of features in each component ('clique') is based on high similarity of chromatographic elution
profiles. All features in each component are then annotated with the
cliqueMS::getIsotopes
and
cliqueMS::getAnnotation
functions.
A componentsFeatures
derived object.
The returned components are based on so-called feature components. Unlike other algorithms, components are first made on a feature level (per analysis), instead of for complete feature groups. In the final step the feature components are converted to 'regular' components by employing a consensus approach with the following steps:
If an adduct assigned to a feature only occurs as a minority compared to other adduct assignments within the
same feature group, it is considered as an outlier and removed accordingly (controlled by the
relMinAdductAbundance
argument).
For features within a feature group, only keep their adduct assignment if it occurs as the most frequent or
is preferential (controlled by adductConflictsUsePref
and prefAdducts
arguments).
Components are made by combining the feature groups for which at least one of their features are jointly present in the same feature component.
Conflicts of neutral mass assignments within a component (i.e. not all are the same) are dealt with.
Firstly, all feature groups with an unknown neutral mass are split in another component. Then, if conflicts still
occur, the feature groups with similar neutral mass (determined by absMzDev
argument) are grouped. Depending
on the NMConflicts
argument, the group with one or more preferential adduct(s) or that is the largest or
most intense is selected, whereas others are removed from the component. In case multiple groups contain
preferential adducts, and ‘>1’ preferential adducts are available, the group with the adduct that matches
first in prefAdducts
'wins'. In case of ties, one of the next strategies in NMConflicts
is tried.
If a feature group occurs in multiple components it will be removed completely.
The minSize
filter is applied.
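The first two consensus steps (removal of minority adduct assignments and selection of the most frequent or preferential adduct) can be sketched for a single feature group as follows. This is an interpretation in base R and not the patRoon code: the resolveAdduct helper is hypothetical, and it assumes that the relative abundance is calculated over the features that received an adduct assignment.

```r
resolveAdduct <- function(adducts, relMinAdductAbundance = 0.75,
                          adductConflictsUsePref = TRUE,
                          prefAdducts = c("[M+H]+", "[M-H]-")) {
    adducts <- adducts[!is.na(adducts)] # ignore features without assignment
    if (length(adducts) == 0)
        return(NA_character_)

    # step 1: remove adducts that are only assigned to a minority of the features
    abundance <- table(adducts) / length(adducts)
    kept <- abundance[abundance >= relMinAdductAbundance]
    if (length(kept) == 0)
        return(NA_character_)

    # step 2: prefer adducts from prefAdducts (in their given order) ...
    if (adductConflictsUsePref) {
        pref <- intersect(prefAdducts, names(kept))
        if (length(pref) > 0)
            return(pref[1])
    }
    # ... otherwise take the most frequently assigned adduct
    names(kept)[which.max(kept)]
}

# Hypothetical adduct assignments of the features (one per analysis) in a feature group
a1 <- resolveAdduct(c("[M+H]+", "[M+H]+", "[M+H]+", "[M+Na]+", "[M+H]+", NA))
a2 <- resolveAdduct(c("[M+Na]+", "[M+Na]+", "[M+H]+"), relMinAdductAbundance = 0.3)
a3 <- resolveAdduct(c("[M+Na]+", "[M+Na]+", "[M+H]+"), relMinAdductAbundance = 0.3,
                    adductConflictsUsePref = FALSE)
print(c(a1, a2, a3))
```

In the second call both adducts pass the abundance filter and the preferential adduct is selected, while the third call falls back to the most frequent assignment.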
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet
object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1"
becomes "CMP1-positive"
).
Senan O, Aguilar-Mogas A, Navarro M, Capellades J, Noon L, Burks D, Yanes O, Guimera R, Sales-Pardo M (2019). “CliqueMS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network.” Bioinformatics, 35(20), 4089–4097. doi:10.1093/bioinformatics/btz207.
generateComponents
for more details and other algorithms.
Generates components based on intensity profiles of feature groups.
generateComponentsIntClust(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsIntClust( fGroups, method = "complete", metric = "euclidean", normalized = TRUE, average = TRUE, maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1 )
fGroups |
|
... |
Any parameters to be passed to the selected component generation algorithm. |
method |
Clustering method that should be applied (passed to
|
metric |
Distance metric used to calculate the distance matrix (passed to |
normalized , average
|
Passed to |
maxTreeHeight , deepSplit , minModuleSize
|
Arguments used by
|
This function uses hierarchical clustering of intensity profiles to generate components. This function is called when calling generateComponents
with
algorithm="intclust"
.
Hierarchical clustering is performed on normalized (and optionally replicate averaged) intensity data and
the resulting dendrogram is automatically cut with cutreeDynamicTree
. The distance matrix is
calculated with daisy
and clustering is performed with
fastcluster::hclust
. The clustering of the resulting components can be further
visualized and modified using the methods defined for componentsIntClust
.
The components are stored in objects derived from componentsIntClust
.
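The clustering idea can be tried out with base R alone. The sketch below deliberately swaps in stand-ins: stats::dist instead of daisy, stats::hclust instead of fastcluster::hclust, and a fixed-height cutree instead of cutreeDynamicTree. The intensity matrix and the normalization to the maximum intensity of each feature group are assumptions for illustration.

```r
# Hypothetical replicate-averaged intensities: rows are feature groups,
# columns are replicate groups
intensities <- rbind(
    FG1 = c(100, 200, 400),
    FG2 = c(10, 20, 40),    # same trend as FG1 at lower intensity
    FG3 = c(400, 200, 100),
    FG4 = c(80, 40, 20)     # same trend as FG3 at lower intensity
)

# normalize each feature group to its maximum (assumed normalization)
normInt <- intensities / apply(intensities, 1, max)

# distance matrix and hierarchical clustering, mirroring the metric = "euclidean"
# and method = "complete" defaults
distMat <- dist(normInt, method = "euclidean")
hc <- hclust(distMat, method = "complete")

# simple stand-in for the dynamic tree cut: cut the dendrogram at a fixed height
clusters <- cutree(hc, h = 0.5)
print(clusters)
```

Feature groups with the same intensity trend end up in the same cluster regardless of their absolute intensities, which is the purpose of the normalization step.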
In a sets workflow normalization of feature intensities occurs per set.
Schollee JE, Bourgin M, von Gunten U, McArdell CS, Hollender J (2018). “Non-target screening to trace ozonation transformation products in a wastewater treatment train including different post-treatments.” Water Research, 142, 267–278. doi:10.1016/j.watres.2018.05.045.
generateComponents
for more details and other algorithms.
Uses the nontarget R package to generate components by unsupervised detection of homologous series.
generateComponentsNontarget(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsNontarget( fGroups, ionization = NULL, rtRange = c(-120, 120), mzRange = c(5, 120), elements = c("C", "H", "O"), rtDev = 30, absMzDev = 0.002, absMzDevLink = absMzDev * 2, traceHack = all(R.Version()[c("major", "minor")] >= c(3, 4)), ... ) ## S4 method for signature 'featureGroupsSet' generateComponentsNontarget(fGroups, ionization = NULL, ...)
fGroups |
|
... |
Any further arguments passed to |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
rtRange |
A numeric vector containing the minimum and maximum retention time (in seconds) between homologues.
Series are always considered from low to high m/z, thus, a negative minimum retention time allows detection
of homologous series with increasing m/z and decreasing retention times. These values set the |
mzRange |
A numeric vector specifying the minimum and maximum m/z increment of a homologous series. Sets
the |
elements |
A character vector with elements to be considered for detection of repeating units. Sets the
|
rtDev |
Maximum retention time deviation. Sets the |
absMzDev |
Maximum absolute m/z deviation. Sets the |
absMzDevLink |
Maximum absolute m/z deviation when linking series. This should usually be a bit higher
than |
traceHack |
Currently |
This function uses nontarget to generate components. This function is called when calling generateComponents
with
algorithm="nontarget"
.
In the first step the homol.search
function is used to detect all homologous series
within each replicate group (analyses within each replicate group are averaged prior to detection). Then,
homologous series across replicate groups are merged in case of full overlap or when merging of partially overlapping
series causes no conflicts.
The generated components are returned as an object from the componentsNT
class.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsNTSet
object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1"
becomes "CMP1-positive"
).
The output class supports additional methods such as plotGraph
.
Loos M, Singer H (2017).
“Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data.”
Journal of Cheminformatics, 9(1).
doi:10.1186/s13321-017-0197-z.
Loos, M., Gerber, C., Corona, F., Hollender, J., Singer, H. (2015).
Accelerated isotope fine structure calculation using pruned transition trees,
Analytical Chemistry 87(11), 5738-5744.
generateComponents
for more details and other algorithms.
Uses the MetaboliteAdductDecharger utility (see http://www.openms.de) to generate components.
    generateComponentsOpenMS(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateComponentsOpenMS(
      fGroups,
      ionization = NULL,
      chargeMin = 1,
      chargeMax = 1,
      chargeSpan = 3,
      qTry = "heuristic",
      potentialAdducts = NULL,
      minRTOverlap = 0.66,
      retWindow = 1,
      absMzDev = 0.005,
      minSize = 2,
      relMinAdductAbundance = 0.75,
      adductConflictsUsePref = TRUE,
      NMConflicts = c("preferential", "mostAbundant", "mostIntense"),
      prefAdducts = c("[M+H]+", "[M-H]-"),
      extraOpts = NULL
    )

    ## S4 method for signature 'featureGroupsSet'
    generateComponentsOpenMS(
      fGroups,
      ionization = NULL,
      chargeMin = 1,
      chargeMax = 1,
      chargeSpan = 3,
      qTry = "heuristic",
      potentialAdducts = NULL,
      ...
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
chargeMin , chargeMax
|
The minimum/maximum charge to consider. Corresponds to the
|
chargeSpan |
The maximum charge span for a single analyte. Corresponds to
|
qTry |
Sets how charges are determined. Corresponds to |
potentialAdducts |
The adducts to consider. Should be a (sets workflow) Should be a |
minRTOverlap , retWindow
|
Sets feature retention tolerances when grouping features. Sets the
|
absMzDev |
Maximum absolute m/z deviation. Sets the |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. |
relMinAdductAbundance |
The minimum relative abundance (‘0-1’) that an adduct should be assigned to
features within the same feature group. See the |
adductConflictsUsePref |
If set to |
NMConflicts |
The strategies to employ when not all neutral masses within a component are equal. Valid options
are: |
prefAdducts |
A |
extraOpts |
Named character vector with extra command line parameters directly passed to
|
This function uses OpenMS to generate components. This function is called when calling generateComponents
with
algorithm="openms"
.
Features that show highly similar chromatographic elution profiles are grouped, and subsequently annotated with their adducts.
A componentsFeatures
derived object.
The returned components are based on so called feature components. Unlike other algorithms, components are first made on a feature level (per analysis), instead of for complete feature groups. In the final step the feature components are converted to 'regular' components by employing a consensus approach with the following steps:
1. If an adduct assigned to a feature only occurs as a minority compared to other adduct assignments within the same feature group, it is considered an outlier and removed accordingly (controlled by the relMinAdductAbundance argument).
2. For features within a feature group, only keep their adduct assignment if it occurs as the most frequent or is preferential (controlled by the adductConflictsUsePref and prefAdducts arguments).
3. Components are made by combining the feature groups for which at least one of their features is jointly present in the same feature component.
4. Conflicts of neutral mass assignments within a component (i.e. not all are the same) are dealt with. Firstly, all feature groups with an unknown neutral mass are split off into another component. Then, if conflicts still occur, the feature groups with similar neutral mass (determined by the absMzDev argument) are grouped. Depending on the NMConflicts argument, the group with one or more preferential adduct(s) or that is the largest or most intense is selected, whereas others are removed from the component. In case multiple groups contain preferential adducts, and ‘>1’ preferential adducts are available, the group with the adduct that matches first in prefAdducts 'wins'. In case of ties, one of the next strategies in NMConflicts is tried.
5. If a feature group occurs in multiple components it will be removed completely.
6. The minSize filter is applied.
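The first two consensus steps can be illustrated with a minimal base R sketch. The helper adductConsensus() is hypothetical and not part of patRoon, and it assumes that the relative abundance of an adduct is the fraction of features in the feature group that carry it; the actual implementation may differ.

```r
# Hypothetical helper (not part of patRoon): picks one adduct for a feature
# group from the adducts assigned to its individual features.
adductConsensus <- function(adducts, relMinAdductAbundance = 0.75,
                            adductConflictsUsePref = TRUE,
                            prefAdducts = c("[M+H]+", "[M-H]-")) {
    # Step 1: drop adducts that only occur as a minority (assumed definition:
    # fraction of features in the group carrying the adduct)
    counts <- table(adducts)
    relAb <- setNames(as.numeric(counts) / length(adducts), names(counts))
    kept <- names(relAb)[relAb >= relMinAdductAbundance]
    if (length(kept) == 0)
        return(NA_character_)

    # Step 2: a preferential adduct wins (in the order of prefAdducts) ...
    if (adductConflictsUsePref) {
        pref <- prefAdducts[prefAdducts %in% kept]
        if (length(pref) > 0)
            return(pref[1])
    }
    # ... otherwise the most frequent remaining adduct is kept
    kept[which.max(relAb[kept])]
}

# Invented feature-level assignments for one feature group (one per analysis)
adducts <- c("[M+Na]+", "[M+Na]+", "[M+Na]+", "[M+H]+", "[M+H]+")
adductConsensus(adducts, relMinAdductAbundance = 0.3)
adductConsensus(adducts, relMinAdductAbundance = 0.3, adductConflictsUsePref = FALSE)
```

With the lowered threshold both adducts survive the first step, so the outcome depends on whether preferential adducts are allowed to override the most frequent one.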
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet
object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1"
becomes "CMP1-positive"
).
generateComponentsOpenMS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
Bielow C, Ruzek S, Huber CG, Reinert K (2010). “Optimal Decharging and Clustering of Charge Ladders Generated in ESI-MS.” Journal of Proteome Research, 9(5), 2688–2695. doi:10.1021/pr100177k.
generateComponents
for more details and other algorithms.
Uses RAMClustR to generate components from feature groups which follow similar chromatographic retention profiles and annotate their relationships (e.g. adducts and isotopes).
    generateComponentsRAMClustR(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateComponentsRAMClustR(
      fGroups,
      ionization = NULL,
      st = NULL,
      sr = NULL,
      maxt = 12,
      hmax = 0.3,
      normalize = "TIC",
      absMzDev = 0.002,
      relMzDev = 5,
      minSize = 2,
      relMinReplicates = 0.5,
      RCExperimentVals = list(design = list(platform = "LC-MS"), instrument = list(ionization = ionization, MSlevs = 1)),
      extraOptsRC = NULL,
      extraOptsFM = NULL
    )

    ## S4 method for signature 'featureGroupsSet'
    generateComponentsRAMClustR(fGroups, ionization = NULL, ...)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
st , sr , maxt , hmax , normalize
|
Arguments to tune the behaviour of feature group clustering. See their documentation
from |
absMzDev |
Maximum absolute m/z deviation. Sets the |
relMzDev |
Maximum relative mass deviation (ppm). Sets the |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. Sets the |
relMinReplicates |
Feature groups within a component are only kept when they contain data for at least this (relative) amount of replicate analyses. For instance, ‘0.5’ means that at least half of the replicates should contain data for a particular feature group in a component. In this calculation replicates that are fully absent within a component are not taken into account. See note below. |
RCExperimentVals |
A named |
extraOptsRC , extraOptsFM
|
Named |
This function uses RAMClustR to generate components. This function is called when calling generateComponents
with
algorithm="ramclustr"
.
This method uses the ramclustR
functions for generating the components, whereas
do.findmain
is used for annotation.
A components
(derived) object containing all generated components.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet
object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1"
becomes "CMP1-positive"
).
The default value for relMinReplicates
results in
extra filtering, hence, the final results may be different than what the algorithm normally would return.
Broeckling CD, Heuberger AL, Prince JA, Ingelsson E, Prenni JE (2013).
“Assigning precursor-product ion relationships in indiscriminant MS/MS data from non-targeted metabolite profiling studies.”
Metabolomics, 9(1), 33-43.
Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, Prenni JE (2014).
“RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data.”
Analytical Chemistry, 86 (14), 6812–6817.
generateComponents
for more details and other algorithms.
Generates components based on MS/MS similarity between feature groups.
    generateComponentsSpecClust(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateComponentsSpecClust(
      fGroups,
      MSPeakLists,
      method = "complete",
      specSimParams = getDefSpecSimParams(),
      maxTreeHeight = 1,
      deepSplit = TRUE,
      minModuleSize = 1
    )
fGroups |
|
... |
Any parameters to be passed to the selected component generation algorithm. |
MSPeakLists |
The |
method |
Clustering method that should be applied (passed to
|
specSimParams |
A named |
maxTreeHeight , deepSplit , minModuleSize
|
Arguments used by
|
This function uses hierarchical clustering of MS/MS spectra to generate components. This function is called when calling generateComponents
with
algorithm="specclust"
.
The similarities are converted to a distance matrix and used as input for hierarchical clustering, and the
resulting dendrogram is automatically cut with cutreeDynamicTree
. The clustering is performed with
fastcluster::hclust
.
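The clustering idea can be sketched with base R only. This is an illustration under stated assumptions rather than the package code: the similarity values are invented, the distance is assumed to be one minus the similarity, stats::hclust stands in for fastcluster::hclust, and a fixed-height stats::cutree replaces the dynamic tree cut.

```r
# Invented pairwise MS/MS similarities (0-1) for four feature groups
fgNames <- paste0("FG", 1:4)
sims <- matrix(c(1.0, 0.9, 0.2, 0.1,
                 0.9, 1.0, 0.3, 0.2,
                 0.2, 0.3, 1.0, 0.8,
                 0.1, 0.2, 0.8, 1.0),
               nrow = 4, byrow = TRUE, dimnames = list(fgNames, fgNames))

# Convert similarities to distances (assumption: distance = 1 - similarity)
distm <- as.dist(1 - sims)

# Hierarchical clustering, here with complete linkage
hc <- stats::hclust(distm, method = "complete")

# Simplification: cut at a fixed height instead of a dynamic tree cut
clusters <- stats::cutree(hc, h = 0.5)

# Each cluster corresponds to one component
split(names(clusters), clusters)
```

Feature groups with similar spectra (FG1/FG2 and FG3/FG4 in this toy data) end up in the same component.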
The components are stored in objects derived from componentsSpecClust
.
In a sets workflow the spectral similarities for each set are
combined as is described for the spectrumSimilarity
method
for sets workflows.
Rick Helmus <[email protected]> and Bas van de Velde (major contributions to spectral binning and similarity calculation).
generateComponents
for more details and other algorithms.
Generates components by linking feature groups of transformation products and their parents.
    generateComponentsTPs(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateComponentsTPs(
      fGroups,
      fGroupsTPs = fGroups,
      ignoreParents = FALSE,
      TPs = NULL,
      MSPeakLists = NULL,
      formulas = NULL,
      compounds = NULL,
      minRTDiff = 20,
      specSimParams = getDefSpecSimParams()
    )

    ## S4 method for signature 'featureGroupsSet'
    generateComponentsTPs(
      fGroups,
      fGroupsTPs = fGroups,
      ignoreParents = FALSE,
      TPs = NULL,
      MSPeakLists = NULL,
      formulas = NULL,
      compounds = NULL,
      minRTDiff = 20,
      specSimParams = getDefSpecSimParams()
    )
fGroups |
The input |
... |
Further arguments specified to the methods. |
fGroupsTPs |
A |
ignoreParents |
If |
TPs |
A |
MSPeakLists , formulas , compounds
|
A |
minRTDiff |
Minimum retention time (in seconds) difference between the parent and a TP to determine whether a TP
elutes prior/after the parent (to calculate |
specSimParams |
A named |
This function uses transformation product screening to generate components. This function is called when calling generateComponents
with
algorithm="tp"
.
This method typically employs data from generated transformation products to find
parents and their TPs. However, this data is not necessary, and components can also be made based on MS/MS
similarity and/or other annotation similarities between the parent and its TPs. For more details see the
Linking parents and transformation products
section below.
The components are stored in objects derived from componentsTPs
.
Each component consists of feature groups that are considered
to be transformation products for one parent (the parent that 'belongs' to the component can be retrieved with the
componentInfo
method). The parent feature groups are taken from the fGroups
parameter, while
the feature groups for TPs are taken from fGroupsTPs
. If a feature group occurs in both variables, it may
therefore be considered as both a parent and a TP.
If transformation product data is given, i.e. the TPs
argument is set, then a suspect screening of
the TPs must be performed in advance (see screenSuspects
and convertToSuspects
to
create the suspect list). Furthermore, if TPs were generated with generateTPsBioTransformer
or
generateTPsLibrary
then the suspect screening must also include the parents (e.g. by setting
includeParents=TRUE
when calling convertToSuspects
or by amending results by setting
amend=TRUE
to screenSuspects
). The suspect screening is necessary for the componentization algorithm
to map the feature groups of the parent or TP. If the suspect screening yields multiple TP hits, all will be
reported. Similarly, if the suspect screening contains multiple hits for a parent, a component is made for each of
the parent hits.
In case no transformation product data is provided (TPs=NULL
), the componentization algorithm simply assumes
that each feature group from fGroupsTPs
is a potential TP for every parent feature group in fGroups
.
For this reason, it is highly recommended to specify which feature groups are parents/TPs (see the
fGroupsTPs
argument description above) and crucial that the data is post-processed, for instance by
only retaining TPs that have high annotation similarity with their parents (see the
filter
method for componentsTPs
).
A typical way to distinguish which feature groups are parents or TPs from two different (groups of) samples is by
calculating Fold Changes (see the as.data.table
method for
feature groups and plotVolcano
). Of course, other statistical techniques from R are also suitable.
During componentization, several characteristics are calculated which may be useful for post-processing:
- specSimilarity: the MS/MS spectral similarity between the feature groups of the TP and its parent (‘0-1’).
- specSimilarityPrec, specSimilarityBoth: as specSimilarity, but calculated with binned data using the "precursor" and "both" method, respectively (see MS spectral similarity parameters for more details).
- fragmentMatches: the number of MS/MS fragment formula annotations that overlap between the TP and parent. If both the formulas and compounds arguments are specified then the annotation data is pooled prior to calculation. Note that only unique matches are counted. Furthermore, note that annotations from all candidates are considered, even if the formula/structure of the parent/TP is known. Hence, fragmentMatches is mainly useful when little or no chemical information is known on the parents/TPs, i.e., when TPs=NULL or when the TPs originate from generateTPsLogic. Since annotations for all candidates are used, it is highly recommended that the annotation objects are first processed with the filter method, for instance, to select only the top ranked candidates.
- neutralLossMatches: as fragmentMatches, but counting overlapping neutral loss formulae.
- retDir: the retention time direction of the TP relative to its parent. See Details in componentsTPs. If TP data was specified, the expected direction is stored in TP_retDir.
- retDiff, mzDiff, formulaDiff: the retention time, m/z and formula difference between the parent and TP (the latter is only available if TP formula data is available).
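How minRTDiff could translate a retention time difference into a direction is sketched below. The helper calcRetDir() is hypothetical, and both the sign convention (TP minus parent) and the -1/0/1 encoding are assumptions made for illustration; the Details of componentsTPs describe the values that are actually used.

```r
# Hypothetical helper (not part of patRoon). Assumed encoding:
# -1 = TP elutes before the parent, 1 = after, 0 = within minRTDiff
calcRetDir <- function(retParent, retTP, minRTDiff = 20) {
    retDiff <- retTP - retParent  # assumed sign convention, in seconds
    ifelse(abs(retDiff) < minRTDiff, 0, sign(retDiff))
}

# A parent eluting at 300 s and three candidate TPs
calcRetDir(retParent = 300, retTP = c(250, 310, 420))
```

The second TP elutes only 10 seconds after the parent, which is below the default minRTDiff, so no direction is assigned to it in this sketch.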
In a sets workflow the component tables are amended with extra information such as overall/specific set spectrum similarities. As sets data is mixed, transformation products are able to be linked with a parent, even if they were not measured in the same set.
The shift
parameter of specSimParams
is ignored by generateComponentsTPs
, since it always
calculates similarities with all supported options.
generateComponents
for more details and other algorithms.
Automatically perform chemical compound annotation for feature groups.
    generateCompounds(fGroups, MSPeakLists, algorithm, ...)

    ## S4 method for signature 'featureGroups'
    generateCompounds(fGroups, MSPeakLists, algorithm, ...)
fGroups |
|
MSPeakLists |
A |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected compound generation algorithm. |
Several algorithms are provided to automatically perform compound annotation for feature groups. To this end, measured masses for all feature groups are searched within online database(s) (e.g. PubChem) to retrieve a list of potential candidate chemical compounds. Depending on the algorithm and its parameters, further scoring of candidates is then performed using, for instance, matching of measured and theoretical isotopic patterns, presence within other data sources such as patent databases and similarity of measured and in-silico predicted MS/MS fragments. Note that this process is often quite time consuming, especially for large feature group sets. Therefore, this is often one of the last steps within the workflow and not performed before feature groups have been prioritized.
generateCompounds
is a generic function that will generate compounds by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateCompoundsMetFrag
and generateCompoundsSIRIUS
. While these
functions may be called directly, generateCompounds
provides a generic interface and is therefore usually preferred.
A compounds
derived object containing all compound annotations.
Each algorithm implements its own scoring system. The scoring names have been simplified and
harmonized where possible. The compoundScorings
function can be used to get an overview of both the
algorithm specific and generic scoring names.
With a sets workflow, annotation is first performed for each set. This is important, since the annotation algorithms typically cannot work with data from mixed ionization modes. The annotation results are then combined to generate a sets consensus:
- The annotation tables for each feature group from the set specific data are combined. Rows with overlapping candidates (determined by the first-block InChIKey) are merged.
- Set specific data (e.g. the ionic formula) is retained by renaming their columns with set specific names.
- The MS/MS fragment annotations (fragInfo column) from each set are combined.
- The scorings for each set are averaged to calculate overall scores. If setAvgSpecificScores=FALSE then scorings that are considered set specific (e.g. MS/MS and isotopic pattern match) are not averaged.
- The candidates are re-ranked based on their average ranking among the set data (if a candidate is absent in a set it is assigned the poorest rank in that set).
- The coverage of each candidate among sets is calculated. Depending on the setThreshold and setThresholdAnn arguments, candidates with low abundance are removed.
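The consensus steps can be mimicked on toy data with base R. The sketch below is not the package implementation: the candidate keys, scores and column names are invented, the poorest rank of a set is assumed to be the highest rank present in that set, and set specific scorings (setAvgSpecificScores) are not modelled.

```r
# Invented candidate tables for two sets (key = first-block InChIKey)
pos <- data.frame(key = c("AAA", "BBB", "CCC"), score = c(0.9, 0.6, 0.4))
neg <- data.frame(key = c("AAA", "BBB"), score = c(0.8, 0.5))

# Rank the candidates within each set (1 = best)
pos$rank <- rank(-pos$score)
neg$rank <- rank(-neg$score)

# Merge rows with the same key and keep set specific columns
cons <- merge(pos, neg, by = "key", all = TRUE,
              suffixes = c(".positive", ".negative"))

# Coverage: fraction of sets in which the candidate was found
cons$coverage <- rowMeans(!is.na(cons[, c("score.positive", "score.negative")]))

# Overall score: average over the sets in which the candidate occurs
cons$score <- rowMeans(cons[, c("score.positive", "score.negative")], na.rm = TRUE)

# Absent candidates get the poorest rank of that set (assumption: the
# highest rank present in the set), then ranks are averaged
for (col in c("rank.positive", "rank.negative")) {
    r <- cons[[col]]
    r[is.na(r)] <- max(r, na.rm = TRUE)
    cons[[col]] <- r
}
cons$avgRank <- rowMeans(cons[, c("rank.positive", "rank.negative")])

# Remove candidates with low coverage and re-rank
setThreshold <- 0.75
cons <- cons[cons$coverage >= setThreshold, ]
cons <- cons[order(cons$avgRank), ]
cons[, c("key", "score", "avgRank", "coverage")]
```

Candidate CCC is only found in the positive set, so its coverage of 0.5 falls below the threshold chosen here and it is removed.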
The compounds
output class and its methods and the algorithm specific functions:
generateCompoundsMetFrag
, generateCompoundsSIRIUS
, generateCompoundsLibrary
Uses an MS library loaded by loadMSLibrary
for compound annotation.
    generateCompoundsLibrary(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateCompoundsLibrary(
      fGroups,
      MSPeakLists,
      MSLibrary,
      minSim = 0.75,
      minAnnSim = minSim,
      absMzDev = 0.002,
      adduct = NULL,
      checkIons = "adduct",
      spectrumType = "MS2",
      specSimParams = getDefSpecSimParams(),
      specSimParamsLib = getDefSpecSimParams()
    )

    ## S4 method for signature 'featureGroupsSet'
    generateCompoundsLibrary(
      fGroups,
      MSPeakLists,
      MSLibrary,
      minSim = 0.75,
      minAnnSim = minSim,
      absMzDev = 0.002,
      adduct = NULL,
      ...,
      setThreshold = 0,
      setThresholdAnn = 0,
      setAvgSpecificScores = FALSE
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
MSLibrary |
The |
minSim |
The minimum spectral similarity for candidate records. |
minAnnSim |
The minimum spectral similarity of a record for it to be used to find annotations (see the
|
absMzDev |
The maximum absolute m/z deviation between the feature group and library record m/z values for candidate selection. |
adduct |
An (sets workflow) The |
checkIons |
A |
spectrumType |
A |
specSimParams |
A named |
specSimParamsLib |
Like |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses MS library spectra to generate compound candidates. This function is called when calling generateCompounds
with
algorithm="library"
.
This method matches measured MS/MS data (peak lists) with those from an MS library to find candidate structures. Hence, only feature groups with MS/MS peak list data are annotated.
The library is searched for candidates with the following criteria:
1. Only records with ion m/z (PrecursorMZ), SMILES, InChI, InChIKey and formula data are considered.
2. Depending on the value of the checkIons argument, records with different adduct (Precursor_type) or polarity (Ion_mode) may be ignored.
3. The m/z values of the candidate and feature group should match (tolerance set by the absMzDev argument).
4. The spectral similarity should not be lower than the value defined for the minSim argument.
5. If multiple candidates with the same first-block InChIKey are found then only the candidate with the best spectral match is kept.
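The m/z, similarity and InChIKey criteria can be illustrated with a small base R sketch. The records, the InChIKeys and the precomputed similarity column sim are invented for this example; in the real workflow the similarities are calculated from the peak lists and library spectra.

```r
# Invented library records with a precomputed spectral similarity
lib <- data.frame(
    name = c("recA1", "recA2", "recB", "recC"),
    PrecursorMZ = c(180.0655, 180.0657, 180.0661, 181.0700),
    InChIKey = c("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "AAAAAAAAAAAAAA-UHFFFAOYSA-M",
                 "BBBBBBBBBBBBBB-UHFFFAOYSA-N", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
    sim = c(0.82, 0.91, 0.60, 0.95)
)
featMZ <- 180.0656  # m/z of the feature group (invented)
absMzDev <- 0.002
minSim <- 0.75

# The m/z should match within absMzDev and the similarity should reach minSim
cand <- lib[abs(lib$PrecursorMZ - featMZ) <= absMzDev & lib$sim >= minSim, ]

# Keep only the best spectral match per first-block InChIKey
cand$IK1 <- sub("-.*", "", cand$InChIKey)
cand <- cand[order(-cand$sim), ]
cand <- cand[!duplicated(cand$IK1), ]
cand[, c("name", "IK1", "sim")]
```

Here recB fails the similarity threshold, recC fails the m/z tolerance, and recA1 is dropped because recA2 shares its first-block InChIKey with a better match.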
If the library contains annotations these will be added to the matched MS/MS peaks. However, since the candidate
selected from criterion #5 above may not contain all the annotation data available from the MS library, annotations
from other records are also considered (controlled by the minAnnSim
argument). If this leads to different
annotations for the same mass peak then only the most abundant annotation is kept.
generateCompounds
for more details and other algorithms.
loadMSLibrary
to obtain MS library data and the methods for MSLibrary
to treat
the data before using it for annotation.
Uses the metfRag package or MetFrag CL
for compound identification (see
http://ipb-halle.github.io/MetFrag/).
    generateCompoundsMetFrag(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateCompoundsMetFrag(
      fGroups,
      MSPeakLists,
      method = "CL",
      timeout = 300,
      timeoutRetries = 2,
      errorRetries = 2,
      topMost = 100,
      dbRelMzDev = 5,
      fragRelMzDev = 5,
      fragAbsMzDev = 0.002,
      adduct = NULL,
      database = "pubchem",
      extendedPubChem = "auto",
      chemSpiderToken = "",
      scoreTypes = compoundScorings("metfrag", database, onlyDefault = TRUE)$name,
      scoreWeights = 1,
      preProcessingFilters = c("UnconnectedCompoundFilter", "IsotopeFilter"),
      postProcessingFilters = c("InChIKeyFilter"),
      maxCandidatesToStop = 2500,
      identifiers = NULL,
      extraOpts = NULL
    )

    ## S4 method for signature 'featureGroupsSet'
    generateCompoundsMetFrag(
      fGroups,
      MSPeakLists,
      method = "CL",
      timeout = 300,
      timeoutRetries = 2,
      errorRetries = 2,
      topMost = 100,
      dbRelMzDev = 5,
      fragRelMzDev = 5,
      fragAbsMzDev = 0.002,
      adduct = NULL,
      ...,
      setThreshold = 0,
      setThresholdAnn = 0,
      setAvgSpecificScores = FALSE
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
method |
Which method should be used for MetFrag execution: |
timeout |
Maximum time (in seconds) before a MetFrag query for a feature group is stopped. Also see
|
timeoutRetries |
Maximum number of retries after reaching a timeout before completely skipping the MetFrag query
for a feature group. Also see |
errorRetries |
Maximum number of retries after an error occurred. This may be useful to handle e.g. connection errors. |
topMost |
Only keep this number of candidates (per feature group) with highest score. Set to |
dbRelMzDev |
Relative mass deviation (in ppm) for database search. Sets the DatabaseSearchRelativeMassDeviation option. |
fragRelMzDev |
Relative mass deviation (in ppm) for fragment matching. Sets the FragmentPeakMatchRelativeMassDeviation option. |
fragAbsMzDev |
Absolute mass deviation (in Da) for fragment matching. Sets the FragmentPeakMatchAbsoluteMassDeviation option. |
adduct |
An (sets workflow) The |
database |
Compound database to use. Valid values are: |
extendedPubChem |
If |
chemSpiderToken |
A character string with the ChemSpider security token that should be set when the ChemSpider database is used. Sets the ChemSpiderToken option. |
scoreTypes |
A character vector defining the scoring types. See the |
scoreWeights |
Numeric vector containing weights of the used scoring types. Order is the same as set in
|
preProcessingFilters , postProcessingFilters
|
A character vector defining pre/post filters applied before/after
fragmentation and scoring (e.g. |
maxCandidatesToStop |
If more than this number of candidate structures are found then processing will be aborted and no results for this feature group will be reported. Low values increase the chance of missing data, whereas too high values will use too much computer resources and significantly slow down the process. Sets the MaxCandidateLimitToStop option. |
identifiers |
A |
extraOpts |
A named |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses MetFrag to generate compound candidates. This function is called when calling generateCompounds
with
algorithm="metfrag"
.
Several online compound databases such as PubChem and
ChemSpider may be chosen for retrieval of candidate structures. This method
requires the availability of MS/MS data, and feature groups without it will be ignored. Many options exist to score
and filter resulting data, and it is highly suggested to optimize these to improve results. The MetFrag
options PeakList
, IonizedPrecursorMass
and ExperimentalRetentionTimeValue
(in minutes)
are automatically set from feature data.
generateCompoundsMetFrag
returns a compoundsMF
object.
MetFrag
supports many different scorings to rank candidates. The
compoundScorings
function can be used to get an overview: (some columns are omitted)
name | metfrag | database |
---|---|---|
score | Score | |
fragScore | FragmenterScore | |
metFusionScore | OfflineMetFusionScore | |
individualMoNAScore | OfflineIndividualMoNAScore | |
numberPatents | PubChemNumberPatents | pubchem |
numberPatents | Patent_Count | pubchemlite |
pubMedReferences | PubChemNumberPubMedReferences | pubchem |
pubMedReferences | ChemSpiderNumberPubMedReferences | chemspider |
pubMedReferences | NUMBER_OF_PUBMED_ARTICLES | comptox |
pubMedReferences | PubMed_Count | pubchemlite |
extReferenceCount | ChemSpiderNumberExternalReferences | chemspider |
dataSourceCount | ChemSpiderDataSourceCount | chemspider |
referenceCount | ChemSpiderReferenceCount | chemspider |
RSCCount | ChemSpiderRSCCount | chemspider |
smartsInclusionScore | SmartsSubstructureInclusionScore | |
smartsExclusionScore | SmartsSubstructureExclusionScore | |
suspectListScore | SuspectListScore | |
retentionTimeScore | RetentionTimeScore | |
CPDATCount | CPDAT_COUNT | comptox |
TOXCASTActive | TOXCAST_PERCENT_ACTIVE | comptox |
dataSources | DATA_SOURCES | comptox |
pubChemDataSources | PUBCHEM_DATA_SOURCES | comptox |
EXPOCASTPredExpo | EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY | comptox |
ECOTOX | ECOTOX | comptox |
NORMANSUSDAT | NORMANSUSDAT | comptox |
MASSBANKEU | MASSBANKEU | comptox |
TOX21SL | TOX21SL | comptox |
TOXCAST | TOXCAST | comptox |
KEMIMARKET | KEMIMARKET | comptox |
MZCLOUD | MZCLOUD | comptox |
pubMedNeuro | PubMedNeuro | comptox |
CIGARETTES | CIGARETTES | comptox |
INDOORCT16 | INDOORCT16 | comptox |
SRM2585DUST | SRM2585DUST | comptox |
SLTCHEMDB | SLTCHEMDB | comptox |
THSMOKE | THSMOKE | comptox |
ITNANTIBIOTIC | ITNANTIBIOTIC | comptox |
STOFFIDENT | STOFFIDENT | comptox |
KEMIMARKET_EXPO | KEMIMARKET_EXPO | comptox |
KEMIMARKET_HAZ | KEMIMARKET_HAZ | comptox |
REACH2017 | REACH2017 | comptox |
KEMIWW_WDUIndex | KEMIWW_WDUIndex | comptox |
KEMIWW_StpSE | KEMIWW_StpSE | comptox |
KEMIWW_SEHitsOverDL | KEMIWW_SEHitsOverDL | comptox |
ZINC15PHARMA | ZINC15PHARMA | comptox |
PFASMASTER | PFASMASTER | comptox |
peakFingerprintScore | AutomatedPeakFingerprintAnnotationScore | |
lossFingerprintScore | AutomatedLossFingerprintAnnotationScore | |
agroChemInfo | AgroChemInfo | pubchemlite |
bioPathway | BioPathway | pubchemlite |
drugMedicInfo | DrugMedicInfo | pubchemlite |
foodRelated | FoodRelated | pubchemlite |
pharmacoInfo | PharmacoInfo | pubchemlite |
safetyInfo | SafetyInfo | pubchemlite |
toxicityInfo | ToxicityInfo | pubchemlite |
knownUse | KnownUse | pubchemlite |
disorderDisease | DisorderDisease | pubchemlite |
identification | Identification | pubchemlite |
annoTypeCount | FPSum | pubchemlite |
annoTypeCount | AnnoTypeCount | pubchemlite |
annotHitCount | AnnotHitCount | pubchemlite |
In addition, the compoundScorings
function is also useful to programmatically
generate a set of scorings to be used for ranking with MetFrag
. For instance, the following can be given
to the scoreTypes
argument to use all default scorings for PubChem: compoundScorings("metfrag",
"pubchem", onlyDefault=TRUE)$name
.
For all MetFrag
scoring types refer to the Candidate Scores
section on the
MetFragR homepage.
When database="chemspider"
setting the chemSpiderToken
argument is
mandatory.
If a local database is chosen via sdf
, psv
, or csv
then its file location should be set with
the LocalDatabasePath
value via the extraOpts
argument. For example: extraOpts =
list(LocalDatabasePath = "C:/myDB.csv")
.
If database="pubchemlite"
or database="comptox"
and patRoonExt is not installed then the
file location must be specified as above or by setting the
patRoon.path.MetFragPubChemLite
/patRoon.path.MetFragCompTox
option. See the installation section in
the handbook for more details.
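The package option mentioned above can be set with base R's options function. The sketch below uses a placeholder path inside the temporary directory; replace it with the actual location of the database file on your system.

```r
# Placeholder path (assumption): point this to the real PubChemLite file
pclPath <- file.path(tempdir(), "PubChemLite.csv")

# Set the package option so that the database file can be located
options(patRoon.path.MetFragPubChemLite = pclPath)

# Inspect the value that is currently configured
getOption("patRoon.path.MetFragPubChemLite")
```

The patRoon.path.MetFragCompTox option can be set in the same way.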
generateCompoundsMetFrag uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
When local database files are used with generateCompoundsMetFrag
(e.g. when
database
is set to "pubchemlite"
, "csv"
etc.) and patRoon.MP.method="future", then
the database file must be present on all the nodes. When pubchemlite
or comptox
is used, the location
for these databases can be configured on the host with the respective package options
(patRoon.path.MetFragPubChemLite and patRoon.path.MetFragCompTox) or made available by installing
the patRoonExt package. Note that these files must also be present on the local host computer, even if
it is not participating in computations.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016). “MetFrag relaunched: incorporating strategies beyond in silico fragmentation.” Journal of Cheminformatics, 8(1). doi:10.1186/s13321-016-0115-9.
generateCompounds
for more details and other algorithms.
Uses SIRIUS in combination with CSI:FingerID for compound annotation.
generateCompoundsSIRIUS(fGroups, ...) ## S4 method for signature 'featureGroups' generateCompoundsSIRIUS( fGroups, MSPeakLists, relMzDev = 5, adduct = NULL, projectPath = NULL, elements = "CHNOP", profile = "qtof", formulaDatabase = NULL, fingerIDDatabase = "pubchem", noise = NULL, cores = NULL, topMost = 100, topMostFormulas = 5, login = "check", alwaysLogin = FALSE, extraOptsGeneral = NULL, extraOptsFormula = NULL, verbose = TRUE, splitBatches = FALSE, dryRun = FALSE ) ## S4 method for signature 'featureGroupsSet' generateCompoundsSIRIUS( fGroups, MSPeakLists, relMzDev = 5, adduct = NULL, projectPath = NULL, ..., setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
relMzDev |
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option. |
adduct |
An (sets workflow) The |
projectPath , dryRun
|
These are mainly for internal purposes. (sets workflow) |
elements |
Elements to be considered for formulae calculation. This will heavily affect the number of
candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum
number of elements can also be specified, for example: a value of |
profile |
Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile commandline option. |
formulaDatabase |
If not |
fingerIDDatabase |
Database specifically used for |
noise |
Median intensity of the noise ( |
cores |
The number of cores |
topMost |
Only keep this number of candidates (per feature group) with highest score. Set to |
topMostFormulas |
Do not return more than this number of candidate formulae. Note that only compounds for these formulae will be searched. Sets the --candidates commandline option. |
login , alwaysLogin
|
Specifies if and how logging in to a SIRIUS account should be handled:
if See the SIRIUS website and patRoon handbook for more information. |
extraOptsGeneral , extraOptsFormula
|
a |
verbose |
If |
splitBatches |
If |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses SIRIUS to generate compound candidates. This function is called when calling generateCompounds
with
algorithm="sirius"
.
Similar to generateFormulasSIRIUS
, candidate formulae are generated with SIRIUS. These results
are then fed to CSI:FingerID to acquire candidate structures. Candidate formulae without any assigned structure
will be removed (unlike generateFormulasSIRIUS
). This method requires the availability of MS/MS data,
and feature groups without it will be ignored.
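A possible way to call this functionality through the generic generateCompounds interface is sketched below. The wrapper name annotateWithSIRIUS is hypothetical, and fGroups and mslists are assumed to be existing feature group and MS peak list objects from earlier workflow steps. The adduct and element set are illustrative assumptions; the other values are the defaults from the usage section.

```r
# Hypothetical wrapper (not part of patRoon). Requires the patRoon package
# and a working SIRIUS installation when it is actually called.
annotateWithSIRIUS <- function(fGroups, mslists) {
    patRoon::generateCompounds(
        fGroups, mslists, algorithm = "sirius",
        relMzDev = 5,                  # maximum m/z deviation in ppm
        adduct = "[M+H]+",             # assumption: positive mode protonation
        elements = "CHNOPS",           # assumption: keep this set minimal
        profile = "qtof",
        fingerIDDatabase = "pubchem",
        topMost = 100,                 # candidates kept per feature group
        topMostFormulas = 5            # formulae for which structures are searched
    )
}
```

Feature groups without MS/MS data will simply be absent from the result, as noted above.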
A compoundsSIRIUS
object.
generateCompoundsSIRIUS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
For annotations performed with SIRIUS
it is often fastest to keep the default
splitBatches=FALSE
. In this case, all SIRIUS
output will be printed to the terminal (unless
verbose=FALSE
or patRoon.MP.method="future"). Furthermore, please note that only annotations to be
performed for the same adduct are grouped in a single batch execution.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
generateCompounds
for more details and other algorithms.
Automatically calculate chemical formulae for all feature groups.
generateFormulas(fGroups, MSPeakLists, algorithm, ...) ## S4 method for signature 'featureGroups' generateFormulas(fGroups, MSPeakLists, algorithm, ...)
fGroups |
|
MSPeakLists |
An |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected formula generation algorithm. |
Several algorithms are provided to automatically generate formulae for given feature groups. All algorithms use the accurate mass of a feature to back-calculate candidate formulae. Depending on the algorithm and data availability, other data such as isotopic pattern and MS/MS fragments may be used to further improve formula assignment and ranking.
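To illustrate how accurate mass limits the candidates, the sketch below computes the relative deviation (in ppm) between a measured m/z and theoretical candidate m/z values, and applies a 5 ppm tolerance (the default of the relMzDev argument used by several algorithms). The function name ppmDev and all m/z values are made up for this illustration.

```r
# Relative deviation (in ppm) between measured and theoretical m/z values
ppmDev <- function(measured, theoretical) {
    (measured - theoretical) / theoretical * 1E6
}

# Illustrative (made-up) theoretical m/z values of two candidates
theoretical <- c(A = 200.0000, B = 200.0030)
measured <- 200.0008

dev <- ppmDev(measured, theoretical)

# Only candidates within the tolerance remain (here: 5 ppm)
withinTol <- abs(dev) <= 5
```

Here candidate A deviates by about 4 ppm and is kept, whereas candidate B deviates by about 11 ppm and is rejected.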
generateFormulas
is a generic function that will generate formulae using one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateFormulasDA
and generateFormulasGenForm
. While these
functions may be called directly, generateFormulas
provides a generic interface and is therefore usually preferred.
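A hedged sketch of using the generic interface is given below. The wrapper name runFormulas is hypothetical, fGroups and mslists are assumed to exist from previous workflow steps, and the adduct and element set are illustrative assumptions. Only arguments shared by the GenForm and SIRIUS methods are passed, so the "bruker" algorithm is deliberately left out here.

```r
# Hypothetical wrapper (not part of patRoon) around the generic interface
runFormulas <- function(fGroups, mslists, algorithm = c("genform", "sirius")) {
    # Reject unsupported algorithm names before any work is done
    algorithm <- match.arg(algorithm)
    patRoon::generateFormulas(
        fGroups, mslists, algorithm = algorithm,
        relMzDev = 5,              # maximum m/z deviation in ppm
        adduct = "[M+H]+",         # assumption: positive mode protonation
        elements = "CHNOPS",       # assumption: keep this set minimal
        calculateFeatures = TRUE,  # generate candidates per feature first
        featThresholdAnn = 0.75
    )
}
```

Switching algorithms then only requires changing the algorithm argument, which is the main benefit of the generic function.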
A formulas
object containing all generated formulae.
Formula candidate assignment occurs in one of the following ways:
1. Candidates are first generated for each feature and then pooled to form consensus candidates for the feature group.
2. Candidates are directly generated for each feature group by group averaged MS peak list data.
With approach (1), scorings and mass errors are averaged and outliers are removed (controlled by
featThreshold
and featThresholdAnn
arguments). Other candidate properties that cannot be averaged are
taken from the feature of the analysis specified in the "analysis"
column of the results. The second approach only generates candidate formulae once for every feature group, and is therefore generally much
faster. However, this inherently prevents removal of outliers.
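The pooling of feature candidates can be pictured with the following simplified sketch. It is not the actual implementation: the function name, column names and values are made up, and it assumes that featThreshold acts as the minimum fraction of features in which a candidate must occur. The featThresholdAnn argument is not modelled.

```r
# Made-up per-feature candidates for a single feature group
featCands <- data.frame(
    analysis  = c("a1", "a1", "a2", "a2", "a3"),
    formula   = c("C6H6O", "C5H6N2", "C6H6O", "C5H6N2", "C6H6O"),
    score     = c(0.90, 0.40, 0.80, 0.50, 0.85),
    error_ppm = c(1.0, 3.0, 2.0, 4.0, 1.5),
    stringsAsFactors = FALSE
)

consensusFromFeatures <- function(cands, featThreshold = 0) {
    nFeat <- length(unique(cands$analysis))
    # Average scorings and mass errors per candidate formula
    cons <- aggregate(cbind(score, error_ppm) ~ formula, data = cands, FUN = mean)
    # Fraction of features in which each candidate was found
    presence <- table(cands$formula) / nFeat
    cons$coverage <- as.numeric(presence[as.character(cons$formula)])
    # Drop candidates that occur in too few features, best score first
    cons <- cons[cons$coverage >= featThreshold, ]
    cons[order(-cons$score), ]
}

cons <- consensusFromFeatures(featCands, featThreshold = 0.75)
```

With these example data only C6H6O remains, since C5H6N2 was found in just two of the three features.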
Note that with either approach subsequent workflow steps that use formula data (e.g.
addFormulaScoring
and reporting functions) only use formula data that was eventually assigned
to feature groups.
Each algorithm implements their own scoring system. Their names have been harmonized where
possible. An overview is obtained with the formulaScorings
function:
name | genform | sirius | bruker | description |
combMatch | comb_match | - | - | MS and MS/MS combined match value |
isoScore | MS_match | isoScore | - | How well the isotopic pattern matches |
mSigma | - | - | mSigma | Deviation of the isotopic pattern |
MSMSScore | MSMS_match | treeScore | - | How well MS/MS data matches |
score | - | score | Score | Overall MS formula score |
With a sets workflow, annotation is first performed for each set. This is important, since the annotation algorithms typically cannot work with data from mixed ionization modes. The annotation results are then combined to generate a sets consensus:
The annotation tables for each feature group from the set specific data are combined. Rows with overlapping candidates (determined by the neutral formula) are merged.
Set specific data (e.g. the ionic formula) is retained by renaming their columns with set specific names.
The MS/MS fragment annotations (fragInfo
column) from each set are combined.
The scorings for each set are averaged to calculate overall scores. If setAvgSpecificScores=FALSE
then
scorings that are considered set specific (e.g. MS/MS and isotopic pattern match) are not averaged.
The candidates are re-ranked based on their average ranking among the set data (if a candidate is absent in a set it is assigned the poorest rank in that set).
The coverage of each candidate among sets is calculated. Depending on the setThreshold
and
setThresholdAnn
arguments, candidates with low abundance are removed.
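The consensus steps above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the function and column names are made up, the poorest rank of a set is assumed to equal its number of candidates, all scores are averaged (so the setAvgSpecificScores distinction is not modelled), and the merging of set specific columns and fragInfo is omitted.

```r
# Made-up candidate tables per set for one feature group (best candidate first)
setCands <- list(
    positive = data.frame(neutral_formula = c("C6H6O", "C5H6N2", "C4H2N4"),
                          score = c(0.9, 0.6, 0.3), stringsAsFactors = FALSE),
    negative = data.frame(neutral_formula = c("C6H6O", "C5H6N2"),
                          score = c(0.8, 0.7), stringsAsFactors = FALSE)
)

setsConsensus <- function(setCands, setThreshold = 0) {
    # Candidates are matched between sets by their neutral formula
    allForms <- unique(unlist(lapply(setCands, function(tab) tab$neutral_formula)))
    # Rank per set; absent candidates get the poorest rank of that set
    ranks <- do.call(cbind, lapply(setCands, function(tab) {
        r <- match(allForms, tab$neutral_formula)
        r[is.na(r)] <- nrow(tab)
        r
    }))
    # Score per set (NA if the candidate is absent in that set)
    scores <- do.call(cbind, lapply(setCands, function(tab) {
        tab$score[match(allForms, tab$neutral_formula)]
    }))
    res <- data.frame(
        neutral_formula = allForms,
        score = rowMeans(scores, na.rm = TRUE),  # averaged over sets with data
        avgRank = rowMeans(ranks),
        coverage = rowMeans(!is.na(scores)),     # fraction of sets with candidate
        stringsAsFactors = FALSE
    )
    # Remove low abundance candidates and re-rank by average rank
    res <- res[res$coverage >= setThreshold, ]
    res[order(res$avgRank), ]
}

cons <- setsConsensus(setCands, setThreshold = 1)
```

With setThreshold = 1 the candidate C4H2N4 is removed, because it only occurs in one of the two sets.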
The formulas
output class and its methods and the algorithm specific functions:
generateFormulasDA
, generateFormulasGenForm
, generateFormulasSIRIUS
The GenForm manual (also known as MOLGEN-MSMS).
Uses Bruker DataAnalysis to generate chemical formulae.
generateFormulasDA(fGroups, ...) ## S4 method for signature 'featureGroups' generateFormulasDA( fGroups, MSPeakLists, precursorMzSearchWindow = 0.002, MSMode = "both", adduct = NULL, featThreshold = 0, featThresholdAnn = 0.75, absAlignMzDev = 0.002, save = TRUE, close = save ) ## S4 method for signature 'featureGroupsSet' generateFormulasDA( fGroups, MSPeakLists, precursorMzSearchWindow = 0.002, MSMode = "both", adduct = NULL, ..., setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
An |
precursorMzSearchWindow |
Search window for m/z values (+/- the feature m/z) used to find back
feature data of precursor/parent ions from MS/MS spectra (this data is not readily available from
|
MSMode |
Whether formulae should be generated only from MS data ( |
adduct |
An (sets workflow) The |
featThreshold |
If |
featThresholdAnn |
As |
absAlignMzDev |
When the group formula annotation consensus is made from feature annotations, the m/z
values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The
|
close , save
|
If |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses Bruker DataAnalysis to generate formula candidates. This function is called when calling generateFormulas
with
algorithm="bruker"
.
This method supports scoring based on overlap between measured and theoretical isotopic patterns (both MS
and MS/MS data) and the presence of 'fitting' MS/MS fragments. The method will iterate through all features (or
"Compounds" in DataAnalysis terms) and call SmartFormula
(and SmartFormula3D
if MS/MS data is
available) to generate all formulae. Parameters affecting formula calculation have to be set in advance within the
DataAnalysis method for each analysis (e.g. by setDAMethod
).
This method requires that features were obtained with findFeaturesBruker
. It is recommended, but not
mandatory, that the MSPeakLists
are also generated by DataAnalysis.
Calculation of formulae with DataAnalysis always occurs with the 'feature approach' (see Candidate
assignment
in generateFormulas
).
A formulas
object containing all generated formulae.
If any errors related to DCOM
appear it might be necessary to
terminate DataAnalysis (note that DataAnalysis might still be running as a
background process). The ProcessCleaner
application installed
with DataAnalysis can be used for this.
generateFormulas
for more details and other algorithms.
Uses GenForm to generate chemical formula candidates.
generateFormulasGenForm(fGroups, ...) ## S4 method for signature 'featureGroups' generateFormulasGenForm( fGroups, MSPeakLists, relMzDev = 5, adduct = NULL, elements = "CHNOP", hetero = TRUE, oc = FALSE, thrMS = NULL, thrMSMS = NULL, thrComb = NULL, maxCandidates = Inf, extraOpts = NULL, calculateFeatures = TRUE, featThreshold = 0, featThresholdAnn = 0.75, absAlignMzDev = 0.002, MSMode = "both", isolatePrec = TRUE, timeout = 120, topMost = 50, batchSize = 8 ) ## S4 method for signature 'featureGroupsSet' generateFormulasGenForm( fGroups, MSPeakLists, relMzDev = 5, adduct = NULL, ..., setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
An |
relMzDev |
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the ppm command line option. |
adduct |
An (sets workflow) The |
elements |
Elements to be considered for formulae calculation. This will heavily affect the number of candidates! Always try to work with a minimal set by excluding elements you don't expect. Sets the el command line option. |
hetero |
Only consider formulae with at least one hetero atom. Sets the het commandline option. |
oc |
Only consider organic formulae (i.e. with at least one carbon atom). Sets the oc commandline option. |
thrMS , thrMSMS , thrComb
|
Sets the thresholds for the |
maxCandidates |
If this number of candidates are found then |
extraOpts |
An optional character vector with any other command line options that will be passed to
|
calculateFeatures |
If |
featThreshold |
If |
featThresholdAnn |
As |
absAlignMzDev |
When the group formula annotation consensus is made from feature annotations, the m/z
values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The
|
MSMode |
Whether formulae should be generated only from MS data ( |
isolatePrec |
Settings used for isolation of precursor mass peaks and their isotopes. This isolation is highly
important for accurate isotope scoring of candidates, as non-relevant mass peaks will dramatically decrease the
score. The value of |
timeout |
Maximum time (in seconds) that a |
topMost |
Only keep this number of candidates (per feature group) with highest score. |
batchSize |
Maximum number of |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses GenForm to generate formula candidates. This function is called when calling generateFormulas
with
algorithm="genform"
.
When MS/MS data is available it will be used to score candidate formulae by presence of 'fitting' fragments.
A formulas
object containing all generated formulae.
Below is a list of options (generated by running GenForm
without commandline
options) which can be set by the extraOpts
parameter.
Formula calculation from MS and MS/MS data as described in
Meringer et al (2011) MATCH Commun Math Comput Chem 65: 259-290
Usage: GenForm ms=<filename> [msms=<filename>] [out=<filename>]
    [exist[=mv]] [m=<number>] [ion=-e|+e|-H|+H|+Na] [cha=<number>]
    [ppm=<number>] [msmv=ndp|nsse|nsae] [acc=<number>] [rej=<number>]
    [thms=<number>] [thmsms=<number>] [thcomb=<number>]
    [sort[=ppm|msmv|msmsmv|combmv]] [el=<elements> [oc]] [ff=<fuzzy formula>]
    [vsp[=<even|odd>]] [vsm2mv[=<value>]] [vsm2ap2[=<value>]] [hcf] [kfer[=ex]]
    [wm[=lin|sqrt|log]] [wi[=lin|sqrt|log]] [exp=<number>] [oei]
    [dbeexc=<number>] [ivsm2mv=<number>] [vsm2ap2=<number>]
    [oms[=<filename>]] [omsms[=<filename>]] [oclean[=<filename>]]
    [analyze [loss] [intens]] [dbe] [cm] [pc] [sc] [max]
Explanation:
    ms : filename of MS data (*.txt)
    msms : filename of MS/MS data (*.txt)
    out : output generated formulas
    exist : allow only molecular formulas for that at least one structural formula exists;overrides vsp, vsm2mv, vsm2ap2; argument mv enables multiple valencies for P and S
    m : experimental molecular mass (default: mass of MS basepeak)
    ion : type of ion measured (default: M+H)
    ppm : accuracy of measurement in parts per million (default: 5)
    msmv : MS match value based on normalized dot product, normalized sum of squared or absolute errors (default: nsae)
    acc : allowed deviation for full acceptance of MS/MS peak in ppm (default: 2)
    rej : allowed deviation for total rejection of MS/MS peak in ppm (default: 4)
    thms : threshold for the MS match value
    thmsms : threshold for the MS/MS match value
    thcomb : threshold for the combined match value
    sort : sort generated formulas according to mass deviation in ppm, MS match value, MS/MS match value or combined match value
    el : used chemical elements (default: CHBrClFINOPSSi)
    oc : only organic compounds, i.e. with at least one C atom
    ff : overwrites el and oc and uses fuzzy formula for limits of element multiplicities
    het : formulas must have at least one hetero atom
    vsp : valency sum parity (even for graphical formulas)
    vsm2mv : lower bound for valency sum - 2 * maximum valency (>=0 for graphical formulas)
    vsm2ap2 : lower bound for valency sum - 2 * number of atoms + 2 (>=0 for graphical connected formulas)
    hcf : apply Heuerding-Clerc filter
    kfer : apply Kind-Fiehn element ratio (extended) ranges
    wm : m/z weighting for MS/MS match value
    wi : intensity weighting for MS/MS match value
    exp : exponent used, when wi is set to log
    oei : allow odd electron ions for explaining MS/MS peaks
    dbeexc : excess of double bond equivalent for ions
    ivsm2mv : lower bound for valency sum - 2 * maximum valency for fragment ions
    ivsm2ap2: lower bound for valency sum - 2 * number of atoms + 2 for fragment ions
    oms : write scaled MS peaks to output
    omsms : write weighted MS/MS peaks to output
    oclean : write explained MS/MS peaks to output
    analyze : write explanations for MS/MS peaks to output
    loss : for analyzing MS/MS peaks write losses instead of fragments
    intens : write intensities of MS/MS peaks to output
    dbe : write double bond equivalents to output
    cm : write calculated ion masses to output
    pc : output match values in percent
    sc : strip calculated isotope distributions
    noref : hide the reference information
    max : maximum number of final candidates (0 is no limit)
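Since extraOpts is a character vector of command line options, it may be convenient to assemble it from a named list. The helper below is hypothetical (not part of patRoon) and assumes that options are written in the same name=value or bare flag form as shown in the usage text above; the chosen options are for illustration only.

```r
# Hypothetical helper: converts a named list into GenForm style options.
# Entries set to TRUE become bare flags, all others become "name=value".
makeGenFormOpts <- function(opts) {
    vapply(names(opts), function(n) {
        v <- opts[[n]]
        if (isTRUE(v)) n else paste0(n, "=", v)
    }, character(1), USE.NAMES = FALSE)
}

# Example: tighter MS/MS peak acceptance and double bond equivalents in output
extraOpts <- makeGenFormOpts(list(acc = 2, rej = 4, dbe = TRUE))
# extraOpts is now c("acc=2", "rej=4", "dbe")
```

Options that are already controlled by dedicated arguments (e.g. ppm via relMzDev or el via elements) are best left to those arguments.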
generateFormulasGenForm uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
When futures
are used for parallel processing (patRoon.MP.method="future"
),
calculations with GenForm
are done with batch mode disabled (see batchSize
argument), which
generally limits overall performance.
This function always sets the exist and oei GenForm
command line options.
Formula calculation with GenForm
may produce an excessive number of candidates for high m/z values
(e.g. above 600) and/or many elemental combinations (set by elements
). In this scenario formula
calculation may take a very long time. Timeouts are used to avoid excessive computation times by terminating
long-running commands (set by the timeout
argument).
Meringer M, Reinker S, Zhang J, Muller A (2011). “MS/MS Data Improves Automated Determination of Molecular Formulas by Mass Spectrometry.” MATCH Commun. Math. Comput. Chem., 65(2), 259–290.
generateFormulas
for more details and other algorithms.
Uses SIRIUS to generate chemical formulae candidates.
generateFormulasSIRIUS(fGroups, ...) ## S4 method for signature 'featureGroups' generateFormulasSIRIUS( fGroups, MSPeakLists, relMzDev = 5, adduct = NULL, projectPath = NULL, elements = "CHNOP", profile = "qtof", database = NULL, noise = NULL, cores = NULL, getFingerprints = FALSE, topMost = 100, login = FALSE, alwaysLogin = FALSE, extraOptsGeneral = NULL, extraOptsFormula = NULL, calculateFeatures = TRUE, featThreshold = 0, featThresholdAnn = 0.75, absAlignMzDev = 0.002, verbose = TRUE, splitBatches = FALSE, dryRun = FALSE ) ## S4 method for signature 'featureGroupsSet' generateFormulasSIRIUS( fGroups, MSPeakLists, relMzDev = 5, adduct = NULL, projectPath = NULL, ..., setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
An |
relMzDev |
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option. |
adduct |
An (sets workflow) The |
projectPath , dryRun
|
These are mainly for internal purposes. (sets workflow) |
elements |
Elements to be considered for formulae calculation. This will heavily affect the number of
candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum
number of elements can also be specified, for example: a value of |
profile |
Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile commandline option. |
database |
If not |
noise |
Median intensity of the noise ( |
cores |
The number of cores |
getFingerprints |
Set to |
topMost |
Only keep this number of candidates (per feature group) with highest score. Sets the --candidates command line option. |
login , alwaysLogin
|
Specifies if and how logging in to a SIRIUS account should be handled:
if See the SIRIUS website and patRoon handbook for more information. |
extraOptsGeneral , extraOptsFormula
|
a |
calculateFeatures |
If |
featThreshold |
If |
featThresholdAnn |
As |
absAlignMzDev |
When the group formula annotation consensus is made from feature annotations, the m/z
values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The
|
verbose |
If |
splitBatches |
If |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses SIRIUS to generate formula candidates. This function is called when calling generateFormulas
with
algorithm="sirius"
.
Similarity of measured and theoretical isotopic patterns will be used for scoring candidates. Note that
SIRIUS
requires availability of MS/MS data.
A formulasSIRIUS
object.
generateFormulasSIRIUS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
For annotations performed with SIRIUS
it is often fastest to keep the default
splitBatches=FALSE
. In this case, all SIRIUS
output will be printed to the terminal (unless
verbose=FALSE
or patRoon.MP.method="future"). Furthermore, please note that only annotations to be
performed for the same adduct are grouped in a single batch execution.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
generateFormulas
for more details and other algorithms.
Functionality to convert MS and MS/MS data into MS peak lists.
generateMSPeakLists(fGroups, algorithm, ...) ## S4 method for signature 'featureGroups' generateMSPeakLists(fGroups, algorithm, ...)
fGroups |
The |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected MS peak lists generation algorithm. |
Formula calculation and identification tools rely on mass spectra that belong to features of interest. For processing, MS (and MS/MS) spectra are typically reduced to a table with a column containing measured m/z values and a column containing their intensities. These 'MS peak lists' can then be used for formula generation and compound generation.
MS and MS/MS peak lists are first generated for all features (or a subset, if the topMost
argument is set).
During this step multiple spectra over the feature elution profile are averaged. Subsequently, peak lists will be
generated for each feature group by averaging peak lists of the features within the group. Functionality that uses
peak lists will either use data from individual features or from group averaged peak lists. For instance, the former
may be used by formulae calculation, while compound identification and plotting functionality typically uses group
averaged peak lists.
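The idea of averaging feature peak lists into a group peak list can be illustrated with the simplified base R sketch below. It is not the algorithm used by the package: peaks are merged whenever neighbouring m/z values lie within a fixed window, and the function name, window size and peak data are made up.

```r
# Two made-up feature peak lists with m/z and intensity columns
pl1 <- data.frame(mz = c(100.0500, 150.1000, 200.2000),
                  intensity = c(1000, 5000, 200))
pl2 <- data.frame(mz = c(100.0502, 150.1003),
                  intensity = c(1200, 4000))

averagePeakLists <- function(peakLists, mzWindow = 0.005) {
    peaks <- do.call(rbind, peakLists)
    peaks <- peaks[order(peaks$mz), ]
    # Start a new cluster when the gap to the previous peak exceeds the window
    peaks$cluster <- cumsum(c(TRUE, diff(peaks$mz) > mzWindow))
    # Average m/z and intensity within each cluster
    avg <- aggregate(cbind(mz, intensity) ~ cluster, data = peaks, FUN = mean)
    avg[, c("mz", "intensity")]
}

avg <- averagePeakLists(list(pl1, pl2))
```

In this example the two peaks near m/z 100.05 and the two near m/z 150.10 are each merged, while the peak at m/z 200.2 is only present in one list and is kept unchanged.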
generateMSPeakLists
is a generic function that will generate MS peak lists using one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateMSPeakListsMzR
and generateMSPeakListsDA
. While these
functions may be called directly, generateMSPeakLists
provides a generic interface and is therefore usually preferred.
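A small sketch of calling the generic interface is shown below. The wrapper name makePeakLists is hypothetical and fGroups is assumed to be an existing feature groups object. The algorithm names "bruker" and "brukerfmf" appear in this documentation; "mzr" is assumed here to be the name that selects generateMSPeakListsMzR.

```r
# Hypothetical wrapper (not part of patRoon) around the generic interface
makePeakLists <- function(fGroups, algorithm = c("mzr", "bruker", "brukerfmf"), ...) {
    # Reject unsupported algorithm names before any data is processed
    algorithm <- match.arg(algorithm)
    # Further arguments (...) are passed on to the algorithm specific function
    patRoon::generateMSPeakLists(fGroups, algorithm = algorithm, ...)
}
```

For instance, makePeakLists(fGroups, "bruker", minMSIntensity = 500) would forward the intensity threshold to generateMSPeakListsDA.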
A MSPeakLists
object.
With a sets workflow, the feature group averaged peak lists are made per set. This is important because peak lists cannot be mixed for averaging, for instance, when different ionization modes were used to generate the sets. The group averaged peak lists are then simply combined and labelled in the final peak lists. However, please note that annotation and other functionality typically uses only the set specific peak lists, as this functionality cannot work with mixed peak lists.
In most cases it will be necessary to centroid your MS input files. The only exception is Bruker
; however, you will still need centroided ‘mzXML’/‘mzML’ files for e.g. plotting chromatograms. In
this case the centroided MS files should be stored in the same directory as the raw Bruker
‘.d’
files. The convertMSFiles
function can be used to centroid data.
The MSPeakLists
output class and its methods and the algorithm specific functions:
generateMSPeakListsDA
, generateMSPeakListsDAFMF
, generateMSPeakListsMzR
Uses Bruker DataAnalysis to read the data needed to generate MS peak lists.
generateMSPeakListsDA(fGroups, ...) ## S4 method for signature 'featureGroups' generateMSPeakListsDA( fGroups, bgsubtr = TRUE, maxMSRtWindow = 5, minMSIntensity = 500, minMSMSIntensity = 500, clear = TRUE, close = TRUE, save = close, MSMSType = "MSMS", avgFGroupParams = getDefAvgPListParams() ) ## S4 method for signature 'featureGroupsSet' generateMSPeakListsDA(fGroups, ...)
fGroups |
The |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
bgsubtr |
If |
maxMSRtWindow |
Maximum chromatographic peak window used for spectrum averaging (in seconds, +/- retention
time). If |
minMSIntensity , minMSMSIntensity
|
Minimum intensity for peak lists obtained with DataAnalysis. Highly recommended to set ‘>0’ as DA tends to report many very low intensity peaks. |
clear |
Remove any existing chromatogram traces/mass spectra prior to making new ones. |
close , save
|
If |
MSMSType |
The type of MS/MS experiment performed: |
avgFGroupParams |
A |
This function uses Bruker DataAnalysis to generate MS peak lists. This function is called when calling generateMSPeakLists
with
algorithm="bruker"
.
The MS data should be in the Bruker data format (‘.d’). This function leverages DataAnalysis functionality to support averaging of spectra, background subtraction and identification of isotopes. To obtain mass spectra, TICs of the MS and relevant MS/MS signals will be added in DataAnalysis.
A MSPeakLists
object.
The Component column should be active (Method–>Parameters–>Layouts–>Mass List Layout) in order to add isotopologue information.
If any errors related to DCOM appear it might be necessary to terminate DataAnalysis (note that DataAnalysis might still be running as a background process). The ProcessCleaner application installed with DataAnalysis can be used for this.
generateMSPeakLists for more details and other algorithms.
Uses 'compounds' that were generated by the Find Molecular Features (FMF) algorithm of Bruker DataAnalysis to extract MS peak lists.
generateMSPeakListsDAFMF(fGroups, ...)

## S4 method for signature 'featureGroups'
generateMSPeakListsDAFMF(
  fGroups,
  minMSIntensity = 500,
  minMSMSIntensity = 500,
  close = TRUE,
  save = close,
  avgFGroupParams = getDefAvgPListParams()
)

## S4 method for signature 'featureGroupsSet'
generateMSPeakListsDAFMF(fGroups, ...)
fGroups |
The |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
minMSIntensity, minMSMSIntensity |
Minimum intensity for peak lists obtained with DataAnalysis. Highly recommended to set ‘>0’ as DA tends to report many very low intensity peaks. |
close, save |
If |
avgFGroupParams |
A |
This function uses Bruker DataAnalysis with FMF to generate MS peak lists. This function is called when calling generateMSPeakLists with algorithm="brukerfmf".
This function is similar to generateMSPeakListsDA, but uses 'compounds' that were generated by the Find Molecular Features (FMF) algorithm to extract MS peak lists. This is generally much faster. However, it only works when features were obtained with the findFeaturesBruker function. Since all MS spectra are generated in advance by Bruker DataAnalysis, only a few parameters exist to customize its operation.
A MSPeakLists object.
If any errors related to DCOM appear it might be necessary to terminate DataAnalysis (note that DataAnalysis might still be running as a background process). The ProcessCleaner application installed with DataAnalysis can be used for this.
generateMSPeakLists for more details and other algorithms.
Uses the mzR package to read the MS data needed for MS peak lists.
generateMSPeakListsMzR(fGroups, ...)

## S4 method for signature 'featureGroups'
generateMSPeakListsMzR(
  fGroups,
  maxMSRtWindow = 5,
  precursorMzWindow = 4,
  topMost = NULL,
  avgFeatParams = getDefAvgPListParams(),
  avgFGroupParams = getDefAvgPListParams()
)

## S4 method for signature 'featureGroupsSet'
generateMSPeakListsMzR(fGroups, ...)
fGroups |
The |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
maxMSRtWindow |
Maximum chromatographic peak window used for spectrum averaging (in seconds, +/- retention
time). If |
precursorMzWindow |
The m/z window (in Da) to find MS/MS spectra of a precursor. This is typically used for data-dependent like MS/MS data and should correspond to the isolation m/z window (i.e. +/- the precursor m/z) that was used to collect the data. For data-independent MS/MS experiments, where precursor ions are not isolated prior to fragmentation (e.g. bbCID, MSe, all-ion, ...) the value should be |
topMost |
Only extract MS peak lists from a maximum of |
avgFeatParams |
Parameters used for averaging MS peak lists of individual features. Analogous to
|
avgFGroupParams |
A |
This function uses mzR to generate MS peak lists. This function is called when calling generateMSPeakLists with algorithm="mzr".
The MS data files should be either in ‘.mzXML’ or ‘.mzML’ format.
The input MS data files need to be centroided. The convertMSFiles function can be used to centroid data.
A MSPeakLists object.
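The roles of maxMSRtWindow and precursorMzWindow can be pictured with a small base R sketch. It selects, for one hypothetical feature, the MS scans inside the retention time window and the MS/MS scans that additionally fall inside the precursor m/z window. The spectrum header table, the feature values and the +/- interpretation of both windows are assumptions made for illustration; this is not the code used by patRoon or mzR.

```r
# Hypothetical spectrum header table (retention times in seconds)
spectra <- data.frame(
    scan = 1:6,
    msLevel = c(1, 2, 1, 2, 1, 2),
    rt = c(296, 297, 300, 301, 304, 309),
    precursorMz = c(NA, 254.06, NA, 260.10, NA, 254.05)
)

# Hypothetical feature and the documented default window sizes
featRt <- 300
featMz <- 254.0594
maxMSRtWindow <- 5
precursorMzWindow <- 4

# Scans within +/- maxMSRtWindow of the feature retention time
inRtWindow <- abs(spectra$rt - featRt) <= maxMSRtWindow

# MS scans only need to fall inside the retention time window
msScans <- spectra$scan[spectra$msLevel == 1 & inRtWindow]

# MS/MS scans must also have a precursor within +/- precursorMzWindow
inMzWindow <- !is.na(spectra$precursorMz) &
    abs(spectra$precursorMz - featMz) <= precursorMzWindow
msmsScans <- spectra$scan[spectra$msLevel == 2 & inRtWindow & inMzWindow]

print(msScans)
print(msmsScans)
```

The selected scans would then be averaged into a single MS and a single MS/MS peak list for the feature.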
A cross-platform toolkit for mass spectrometry and proteomics. Chambers, Matthew C. and Maclean, Brendan and Burke, Robert and Amodei, Dario and Ruderman, Daniel L. and Neumann, Steffen and Gatto, Laurent and Fischer, Bernd and Pratt, Brian and Egertson, Jarrett and Hoff, Katherine and Kessner, Darren and Tasman, Natalie and Shulman, Nicholas and Frewen, Barbara and Baker, Tahmina A. and Brusniak, Mi-Youn and Paulse, Christopher and Creasy, David and Flashner, Lisa and Kani, Kian and Moulding, Chris and Seymour, Sean L. and Nuwaysir, Lydia M. and Lefebvre, Brent and Kuhlmann, Frank and Roark, Joe and Rainer, Paape and Detlev, Suckau and Hemenway, Tina and Huhmer, Andreas and Langridge, James and Connolly, Brian and Chadick, Trey and Holly, Krisztina and Eckels, Josh and Deutsch, Eric W. and Moritz, Robert L. and Katz, Jonathan E. and Agus, David B. and MacCoss, Michael and Tabb, David L. and Mallick, Parag. Nat Biotechnol. 2012 Oct;30(10):918-920.
Mol Cell Proteomics. 2010 Aug 17. mzML - a Community Standard for Mass Spectrometry Data. Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Rompp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz PA, Deutsch EW.
Nat Biotechnol. 2004 Nov;22(11):1459-66. A common open representation of mass spectrometry data and its application to proteomics research. Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R.
Mol Syst Biol. 2005;1:2005.0017. Epub 2005 Aug 2. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Keller A, Eng J, Zhang N, Li XJ, Aebersold R.
Bioinformatics. 2008 Nov 1;24(21):2534-6. Epub 2008 Jul 7. ProteoWizard: open source software for rapid proteomics tools development. Kessner D, Chambers M, Burke R, Agus D, Mallick P.
generateMSPeakLists for more details and other algorithms.
Functionality to automatically obtain transformation products for a given set of parent compounds.
generateTPs(algorithm, ...)
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected TP generation algorithm. |
generateTPs is a generic function that will generate transformation products by one of the supported algorithms. The actual functionality is provided by algorithm specific functions such as generateTPsBioTransformer and generateTPsLogic. While these functions may be called directly, generateTPs provides a generic interface and is therefore usually preferred.
A transformationProducts (derived) object containing all generated TPs.
The transformationProducts output class and its methods and the algorithm specific functions: generateTPsBioTransformer, generateTPsLogic, generateTPsLibrary, generateTPsLibraryFormula, generateTPsCTS
The derived class transformationProductsStructure for more specific methods to post-process TP data.
Uses BioTransformer to predict TPs
generateTPsBioTransformer(
  parents,
  type = "env",
  generations = 2,
  maxExpGenerations = generations + 2,
  extraOpts = NULL,
  skipInvalid = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  neutralizeTPs = TRUE,
  calcSims = FALSE,
  fpType = "extended",
  fpSimMethod = "tanimoto",
  MP = FALSE
)
parents |
The parents for which transformation products should be obtained. This can be (1) a suspect list (see
suspect screening for more information), (2) the resulting output of
|
type |
The type of prediction. Valid values are: |
generations |
The number of generations (steps) for the predictions. Sets the |
maxExpGenerations |
The maximum number of generations during hierarchy expansion, see below. |
extraOpts |
A |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
calcSims |
If set to |
fpType |
The type of structural fingerprint that should be calculated. See the |
fpSimMethod |
The method for calculating similarities (i.e. not dissimilarity!). See the |
MP |
If |
This function uses BioTransformer to obtain transformation products. This function is called when calling generateTPs with algorithm="biotransformer".
In order to use this function the ‘.jar’ command line utility should be installed and specified in the patRoon.path.BioTransformer option. The ‘.jar’ file can be obtained via https://bitbucket.org/djoumbou/biotransformer/src/master. Alternatively, the patRoonExt package can be installed to automatically install/configure the necessary files.
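Setting the option could look like the sketch below. The file path is a placeholder, and treating the option value as the full path to the ‘.jar’ file is an assumption of this example; adjust it to the actual installation.

```r
# Hypothetical location of the BioTransformer jar file; adjust to your system
btJar <- "C:/tools/biotransformer/biotransformer.jar"

# Register the path as a regular R option for the current session
options(patRoon.path.BioTransformer = btJar)

# Verify what is currently configured
print(getOption("patRoon.path.BioTransformer"))
```

Placing the options() call in a project script or ‘.Rprofile’ avoids having to repeat it in every session.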
An important advantage of this algorithm is that it provides structural information for generated TPs. However, this also means that if the input is from a parent suspect list or screening then either SMILES or InChI information must be available for the parents.
The TPs are stored in an object derived from the transformationProductsStructure class.
BioTransformer only reports the direct parent for a TP, not the complete pathway. For instance, consider the following results:
parent –> TP1
parent –> TP2
TP1 –> TP2
TP2 –> TP3
In this case, TP3 may be formed either as:
parent –> TP1 –> TP2 –> TP3
parent –> TP2 –> TP3
For this reason, patRoon simply expands the hierarchy and assumes that all routes are possible. For instance,
        Parent
       /      \
     TP1      TP2
      |        |
     TP2      TP3
      |
     TP3
Note that this may result in pathways with more generations than defined by the generations argument. Thus, the maxExpGenerations argument is used to avoid excessive expansions.
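The expansion described above can be sketched in base R as a depth-limited walk over the reported parent/TP pairs. The edge table mirrors the example results, while the function name expandPathways and its maxGen argument are hypothetical names for this illustration; patRoon's internal implementation may differ.

```r
# Direct parent -> TP relations, as in the example results above
edges <- data.frame(
    from = c("parent", "parent", "TP1", "TP2"),
    to = c("TP1", "TP2", "TP2", "TP3"),
    stringsAsFactors = FALSE
)

# Enumerate all routes starting at 'root', up to 'maxGen' generations deep
expandPathways <- function(edges, root, maxGen) {
    paths <- list()
    walk <- function(path) {
        gen <- length(path) - 1L
        if (gen >= 1L)
            paths[[length(paths) + 1L]] <<- path # store every route that ends in a TP
        if (gen >= maxGen)
            return(invisible(NULL)) # stop expanding: generation limit reached
        children <- edges$to[edges$from == path[length(path)]]
        for (ch in setdiff(children, path)) # setdiff() guards against cycles
            walk(c(path, ch))
        invisible(NULL)
    }
    walk(root)
    vapply(paths, paste, character(1), collapse = " -> ")
}

allRoutes <- expandPathways(edges, "parent", maxGen = 4)
limitedRoutes <- expandPathways(edges, "parent", maxGen = 2)
print(allRoutes)
print(limitedRoutes)
```

With the larger limit the route via TP1 and TP2 to TP3 spans three generations, which shows how expansion can exceed the number of generations that was requested for the prediction itself.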
Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
1. Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
2. If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
3. The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
4. The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
5. Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
generateTPsBioTransformer uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
When the parents argument is a compounds object, the candidate library identifier is used in case the candidate has no defined compoundName.
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 18(5)
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Djoumbou-Feunang Y, Fiamoncini J, Gil-de-la-Fuente A, Greiner R, Manach C, Wishart DS (2019). “BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification.” Journal of Cheminformatics, 11(1). doi:10.1186/s13321-018-0324-5.
Wicker J, Lorsbach T, Gutlein M, Schmid E, Latino D, Kramer S, Fenner K (2015). “enviPath - The environmental contaminant biotransformation pathway resource.” Nucleic Acids Research, 44(D1), D502–D508. doi:10.1093/nar/gkv1229.
generateTPs for more details and other algorithms.
Uses Chemical Transformation Simulator (CTS) to predict TPs.
generateTPsCTS(
  parents,
  transLibrary,
  generations = 1,
  errorRetries = 3,
  skipInvalid = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  neutralizeTPs = TRUE,
  calcLogP = "rcdk",
  calcSims = FALSE,
  fpType = "extended",
  fpSimMethod = "tanimoto",
  parallel = TRUE
)
parents |
The parents for which transformation products should be obtained. This can be (1) a suspect list (see
suspect screening for more information), (2) the resulting output of
|
transLibrary |
A |
generations |
An |
errorRetries |
The maximum number of connection retries. Sets the |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
calcLogP |
A |
calcSims |
If set to |
fpType |
The type of structural fingerprint that should be calculated. See the |
fpSimMethod |
The method for calculating similarities (i.e. not dissimilarity!). See the |
parallel |
If set to |
This function uses CTS to obtain transformation products. This function is called when calling generateTPs with algorithm="cts".
This function uses the httr package to access the Web API of CTS for automatic TP prediction. Hence, an Internet connection is mandatory. Please take care to not 'abuse' the CTS servers, e.g. by running very large batch calculations in parallel, as this may result in rejected connections.
An important advantage of this algorithm is that it provides structural information for generated TPs. However, this also means that if the input is from a parent suspect list or screening then either SMILES or InChI information must be available for the parents.
The TPs are stored in an object derived from the transformationProductsStructure class.
Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
1. Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
2. If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
3. The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
4. The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
5. Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
When the parents argument is a compounds object, the candidate library identifier is used in case the candidate has no defined compoundName.
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 18(5)
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Wolfe K, Pope N, Parmar R, Galvin M, Stevens C, Weber E, Flaishans J, Purucker T (2016). “Chemical transformation system: Cloud based cheminformatic services to support integrated environmental modeling.” Proceedings of the 8th International Congress on Environmental Modelling and Software.
Tebes-Stevens C, Patel JM, Jones WJ, Weber EJ (2017). “Prediction of Hydrolysis Products of Organic Chemicals under Environmental pH Conditions.” Environmental Science & Technology, 51(9), 5008–5016. doi:10.1021/acs.est.6b05412.
Yuan C, Tebes-Stevens C, Weber EJ (2020). “Reaction Library to Predict Direct Photochemical Transformation Products of Environmental Organic Contaminants in Sunlit Aquatic Systems.” Environmental Science & Technology, 54(12), 7271–7279. doi:10.1021/acs.est.0c00484.
Yuan C, Tebes-Stevens C, Weber EJ (2021). “Prioritizing Direct Photolysis Products Predicted by the Chemical Transformation Simulator: Relative Reasoning and Absolute Ranking.” Environmental Science & Technology, 55(9), 5950–5958. doi:10.1021/acs.est.0c08745, PMID: 33881833.
generateTPs for more details and other algorithms.
The website: https://qed.epa.gov/cts/ and the CTS User guide.
Automatically obtains transformation products from a library.
generateTPsLibrary(
  parents = NULL,
  TPLibrary = NULL,
  generations = 1,
  skipInvalid = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  neutralizeTPs = FALSE,
  matchParentsBy = "InChIKey",
  matchGenerationsBy = "InChIKey",
  calcSims = FALSE,
  fpType = "extended",
  fpSimMethod = "tanimoto"
)
parents |
The parents for which transformation products should be obtained. This can be (1) a suspect list (see
suspect screening for more information), (2) the resulting output of
|
TPLibrary |
If |
generations |
An |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
matchParentsBy |
A |
matchGenerationsBy |
Similar to |
calcSims |
If set to |
fpType |
The type of structural fingerprint that should be calculated. See the |
fpSimMethod |
The method for calculating similarities (i.e. not dissimilarity!). See the |
This function uses a library to obtain transformation products. This function is called when calling generateTPs with algorithm="library".
By default, a library is used that is based on data from PubChem. However, it is also possible to use your own library.
An important advantage of this algorithm is that it provides structural information for generated TPs. However, this also means that if the input is from a parent suspect list or screening then either SMILES or InChI information must be available for the parents.
The TPs are stored in an object derived from the transformationProductsStructure class.
The TPLibrary argument is used to specify a custom TP library. This should be a data.frame where each row specifies a TP for a parent, with the following columns:
parent_name and TP_name: The name of the parent/TP.
parent_SMILES and TP_SMILES: The SMILES of the parent/TP structure.
retDir: The retention direction of the TP compared to its parent: ‘-1’ (elutes before), ‘1’ (elutes after) or ‘0’ (elutes similarly or unknown). If not specified then the log P values below may be used to calculate retention time directions. (optional)
parent_LogP and TP_LogP: The log P values for the parent/TP. (optional)
LogPDiff: The difference between parent and TP Log P values. Ignored if both parent_LogP and TP_LogP are specified. (optional)
Other columns are allowed, and will be included in the final object. Multiple TPs for a single parent are specified by repeating the value within parent_ columns.
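A custom library following the column layout above could be assembled as in this sketch. The compounds (phenol with two hydroxylated products) and the retDir values are purely illustrative and not taken from any curated library; the commented call at the end assumes patRoon is loaded and that mySuspects is a hypothetical suspect list.

```r
# Illustrative TP library: one parent with two TPs, so the parent_ values repeat
tpLib <- data.frame(
    parent_name = c("phenol", "phenol"),
    parent_SMILES = c("Oc1ccccc1", "Oc1ccccc1"),
    TP_name = c("catechol", "hydroquinone"),
    TP_SMILES = c("Oc1ccccc1O", "Oc1ccc(O)cc1"),
    retDir = c(-1, -1), # assumed: more polar TPs elute before the parent
    stringsAsFactors = FALSE
)

# Check that the mandatory columns are present before using the library
requiredCols <- c("parent_name", "parent_SMILES", "TP_name", "TP_SMILES")
missingCols <- setdiff(requiredCols, names(tpLib))
if (length(missingCols) > 0)
    stop("Missing columns: ", paste(missingCols, collapse = ", "))

# Number of TPs per parent
tpsPerParent <- table(tpLib$parent_name)
print(tpsPerParent)

# With patRoon loaded, the library could then be used as (not run here):
# TPs <- generateTPs("library", parents = mySuspects, TPLibrary = tpLib)
```

Checking the column names up front gives a clearer error than a failure deep inside the TP generation step.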
Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
1. Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
2. If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
3. The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
4. The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
5. Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
When the parents argument is a compounds object, the candidate library identifier is used in case the candidate has no defined compoundName.
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 18(5)
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
generateTPs for more details and other algorithms.
Automatically obtains transformation products from a library with formula data.
generateTPsLibraryFormula(
  parents = NULL,
  TPLibrary,
  generations = 1,
  skipInvalid = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  matchParentsBy = "name",
  matchGenerationsBy = "name"
)
parents |
The parents for which transformation products should be obtained. This should be either a suspect list
(see suspect screening for more information) or the resulting output of
|
TPLibrary |
A |
generations |
An |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
matchParentsBy |
A |
matchGenerationsBy |
Similar to |
This function uses a library to obtain transformation products. This function is called when calling generateTPs with algorithm="library_formula".
This function is similar to generateTPsLibrary, however, it only requires formula information of the parent and TPs.
The TPs are stored in an object derived from the transformationProductsFormula class.
The TPLibrary argument is used to specify a custom TP library. This should be a data.frame where each row specifies a TP for a parent, with the following columns:
parent_name and TP_name: The name of the parent/TP.
parent_formula and TP_formula: The formula of the parent/TP structure.
retDir: The retention direction of the TP compared to its parent: ‘-1’ (elutes before), ‘1’ (elutes after) or ‘0’ (elutes similarly or unknown). If not specified then the log P values below may be used to calculate retention time directions. (optional)
parent_LogP and TP_LogP: The log P values for the parent/TP. (optional)
LogPDiff: The difference between parent and TP Log P values. Ignored if both parent_LogP and TP_LogP are specified. (optional)
Other columns are allowed, and will be included in the final object. Multiple TPs for a single parent are specified by repeating the value within parent_ columns.
Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
1. Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
2. If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
3. The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
4. The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
5. Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
Unlike generateTPsLibrary, this function defaults the matchParentsBy and matchGenerationsBy arguments to "name". While matching by formula is also possible, it is likely that duplicate parent formulae (i.e. isomers) are present in parents and/or TPLibrary, making matching by formula unsuitable. However, if you are sure that no duplicate formulae are present, it may be better to set the matching method to "formula".
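Whether formula matching is safe can be checked before calling the function, for example with the base R sketch below. The library content is invented for illustration (catechol and hydroquinone are isomers and therefore share a formula), and the variable names are hypothetical.

```r
# Illustrative formula TP library; two of the parents are isomers
tpLibForm <- data.frame(
    parent_name = c("phenol", "catechol", "hydroquinone"),
    parent_formula = c("C6H6O", "C6H6O2", "C6H6O2"),
    TP_name = c("phenol-TP1", "catechol-TP1", "hydroquinone-TP1"),
    TP_formula = c("C6H6O2", "C6H6O3", "C6H6O3"),
    stringsAsFactors = FALSE
)

# Reduce to unique parents, then look for formulae shared by different parents
libParents <- unique(tpLibForm[, c("parent_name", "parent_formula")])
hasIsomers <- anyDuplicated(libParents$parent_formula) > 0

# Fall back to name matching whenever formulae are ambiguous
matchBy <- if (hasIsomers) "name" else "formula"
print(matchBy)
```

The same check can be applied to the parent list, since duplicates on either side make formula matching ambiguous.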
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
generateTPs for more details and other algorithms.
generateTPsLibrary to generate TPs from a library that contains structural information.
genFormulaTPLibrary to automatically generate formula TP libraries.
Automatically calculate potential transformation products with metabolic logic.
generateTPsLogic(fGroups, minMass = 40, ...)

## S4 method for signature 'featureGroups'
generateTPsLogic(fGroups, minMass = 40, adduct = NULL, transformations = NULL)

## S4 method for signature 'featureGroupsSet'
generateTPsLogic(fGroups, minMass = 40, transformations = NULL)
fGroups |
A |
minMass |
A |
... |
Further arguments specified to the methods. |
adduct |
An (sets workflow) The |
transformations |
A |
This function uses metabolic logic to obtain transformation products. This function is called when calling generateTPs with algorithm="logic".
With this algorithm TPs are predicted from common (environmental) chemical reactions, such as hydroxylation, demethylation etc. The generated TPs result from calculating the mass of a parent feature after it underwent the reaction, i.e. by applying the mass difference of the reaction to the parent. While this yields only little information on chemical properties of the TP, an advantage of this method is that it does not rely on structural information of the parent, which may be unknown in a full non-target analysis.
A transformationProducts (derived) object containing all generated TPs.
The transformations argument specifies custom rules to calculate transformation products. This should be a data.frame with the following columns:
transformation: The name of the chemical transformation.
add: The elements that are added by this reaction (e.g. "O").
sub: The elements that are removed by this reaction (e.g. "H2O").
retDir: The expected retention time direction relative to the parent (assuming a reversed phase like LC separation). Valid values are: ‘-1’ (elutes before the parent), ‘1’ (elutes after the parent) or ‘0’ (no significant change or unknown).
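A custom rule table and the resulting mass shifts can be sketched in base R as follows. The three reactions, their retDir values, the use of empty strings for "nothing added/removed", the helper formulaMass and the parent mass are all assumptions made for illustration; the sketch only handles simple formulae with C, H, N and O and does not reproduce patRoon's own calculation.

```r
# Monoisotopic masses for the few elements used in this sketch
monoMass <- c(H = 1.00782503207, C = 12, N = 14.0030740048, O = 15.9949146196)

# Hypothetical helper: monoisotopic mass of a simple formula such as "H2O"
formulaMass <- function(formula) {
    if (!nzchar(formula))
        return(0) # empty string: no elements added/removed
    tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
    elements <- sub("[0-9]+$", "", tokens)
    counts <- sub("^[A-Za-z]+", "", tokens)
    counts[counts == ""] <- "1" # no number means a single atom
    sum(monoMass[elements] * as.numeric(counts))
}

# Custom transformation rules in the documented column layout
transformations <- data.frame(
    transformation = c("hydroxylation", "demethylation", "dehydration"),
    add = c("O", "", ""),
    sub = c("", "CH2", "H2O"),
    retDir = c(-1, -1, 1), # illustrative expectations for reversed phase LC
    stringsAsFactors = FALSE
)

# Mass shift of each reaction: mass added minus mass removed
massShift <- vapply(transformations$add, formulaMass, numeric(1), USE.NAMES = FALSE) -
    vapply(transformations$sub, formulaMass, numeric(1), USE.NAMES = FALSE)

# Apply the shifts to a hypothetical parent and drop TPs below minMass
parentMass <- 236.095
minMass <- 40
tpMass <- parentMass + massShift
candidates <- cbind(transformations, massShift, tpMass)[tpMass >= minMass, ]
print(candidates)
```

Because only masses are calculated, every candidate still needs to be confirmed against measured features, for instance with the retention direction as an additional plausibility check.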
The algorithms using transformation reactions are directly based on the work done by Schollee et al. (see references).
Schollee JE, Schymanski EL, Avak SE, Loos M, Hollender J (2015). “Prioritizing Unknown Transformation Products from Biologically-Treated Wastewater Using High-Resolution Mass Spectrometry, Multivariate Statistics, and Metabolic Logic.” Analytical Chemistry, 87(24), 12121–12129. doi:10.1021/acs.analchem.5b02905.
generateTPs for more details and other algorithms.
Various (S4) generic functions providing a common interface for common tasks such as plotting and filtering data. The actual functionality and function arguments are often specific for the implemented methods, for this reason, please refer to the linked method documentation for each generic.
adducts(obj, ...)
adducts(obj, ...) <- value
algorithm(obj)
analysisInfo(obj)
analyses(obj)
annotatedPeakList(obj, ...)
annotations(obj, ...)
calculatePeakQualities(obj, weights = NULL, flatnessFactor = 0.05, ...)
clusterProperties(obj)
clusters(obj)
consensus(obj, ...)
convertToMFDB(TPs, out, ...)
convertToSuspects(obj, ...)
cutClusters(obj)
defaultExclNormScores(obj)
export(obj, type, out, ...)
featureTable(obj, ...)
filter(obj, ...)
getBPCs(obj, ...)
getFeatures(obj)
getMCS(obj, ...)
getTICs(obj, ...)
groupNames(obj)
plotBPCs(obj, ...)
plotChord(obj, addSelfLinks = FALSE, addRetMzPlots = TRUE, ...)
plotChroms(obj, ...)
plotGraph(obj, ...)
plotInt(obj, ...)
plotScores(obj, ...)
plotSilhouettes(obj, kSeq, ...)
plotSpectrum(obj, ...)
plotStructure(obj, ...)
plotTICs(obj, ...)
plotVenn(obj, ...)
plotUpSet(obj, ...)
predictRespFactors(obj, ...)
predictTox(obj, ...)
delete(obj, ...)
plotVolcano(obj, ...)
replicateGroups(obj)
setObjects(obj)
sets(obj)
treeCut(obj, k = NULL, h = NULL, ...)
treeCutDynamic(obj, maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1, ...)
unset(obj, set)
obj |
The object the generic should be applied to. |
... |
Any further method specific arguments. See method documentation for details. |
value |
The replacement value. |
weights, flatnessFactor |
See method documentation. |
TPs |
The |
out |
Output file. |
type |
The export type. |
addSelfLinks |
If |
addRetMzPlots |
Set to |
kSeq |
An integer vector containing the sequence that should be used for average silhouette width calculation. |
k , h
|
Desired numbers of clusters. See |
maxTreeHeight , deepSplit , minModuleSize
|
Arguments used by
|
set |
The name of the set. |
adducts
returns assigned adducts of the object.
Methods are defined for: featureGroups
; featureGroupsSet
.
adducts<-
sets adducts of the object.
Methods are defined for: featureGroups
; featureGroupsSet
.
algorithm
returns the algorithm that was used to generate the object.
Methods are defined for: optimizationResult
; workflowStep
.
analysisInfo
returns the analysis information from an object.
Methods are defined for: featureGroups
; features
; MSPeakListsSet
.
analyses
returns a character
vector with the analyses for which data is present in this object.
Methods are defined for: featureGroups
; features
; formulas
; MSPeakLists
.
annotatedPeakList
returns an annotated MS peak list.
Methods are defined for: compounds
; compoundsSet
; formulas
; formulasSet
.
annotations
returns annotations.
Methods are defined for: featureAnnotations
; featureGroups
; formulas
.
calculatePeakQualities
calculates chromatographic peak qualities and scores.
Methods are defined for: featureGroups
; features
.
clusterProperties
Obtain a list with properties of the generated cluster(s).
Methods are defined for: componentsClust
; compoundsCluster
.
clusters
Obtain clustering object(s).
Methods are defined for: componentsClust
; compoundsCluster
.
consensus
combines and merges data from various algorithms to generate a consensus.
Methods are defined for: components
; componentsSet
; compounds
; compoundsSet
; featureGroupsComparison
; featureGroupsComparisonSet
; formulas
; formulasSet
; transformationProductsStructure
.
convertToMFDB
Exports the object to a local database that can be used with MetFrag
.
Methods are defined for: .
convertToSuspects
Converts an object to a suspect list.
Methods are defined for: MSLibrary
; transformationProducts
.
cutClusters
Returns assigned cluster indices of a cut cluster.
Methods are defined for: componentsClust
; compoundsCluster
.
defaultExclNormScores
Returns default scorings that are excluded from normalization.
export
exports workflow data to a given format.
Methods are defined for: featureGroups
; featureGroupsSet
; MSLibrary
.
featureTable
returns feature information.
Methods are defined for: featureGroups
; featureGroupsSet
; features
.
filter
provides various functionality to do post-filtering of data.
Methods are defined for: components
; componentsSet
; componentsTPs
; compounds
; compoundsSet
; featureAnnotations
; featureGroups
; featureGroupsScreening
; featureGroupsScreeningSet
; featureGroupsSet
; features
; featuresSet
; formulasSet
; MSLibrary
; MSPeakLists
; MSPeakListsSet
; transformationProducts
; transformationProductsStructure
.
getBPCs
gets base peak chromatogram(s).
Methods are defined for: featureGroups
; features
.
getFeatures
returns the object's features
object.
Methods are defined for: featureGroups
.
getMCS
Calculates the maximum common substructure.
Methods are defined for: compounds
; compoundsCluster
.
getTICs
gets total ion chromatogram(s).
Methods are defined for: featureGroups
; features
.
groupNames
returns a character
vector with the names of the feature groups for which data is present in this object.
Methods are defined for: components
; compoundsCluster
; featureAnnotations
; featureGroups
; MSPeakLists
.
plotBPCs
plots base peak chromatogram(s).
Methods are defined for: featureGroups
; features
.
plotChord
plots a Chord diagram to assess overlapping data.
Methods are defined for: featureGroups
; featureGroupsComparison
.
plotChroms
plots extracted ion chromatogram(s).
Methods are defined for: components
; featureGroups
.
plotGraph
Plots an interactive network graph.
Methods are defined for: componentsNT
; componentsNTSet
; componentsTPs
; featureGroups
; featureGroupsSet
; transformationProductsFormula
; transformationProductsStructure
.
plotInt
plots the intensity of all contained features.
Methods are defined for: componentsIntClust
; featureGroups
; featureGroupsSet
.
plotScores
plots candidate scorings.
plotSilhouettes
plots silhouette widths to evaluate the desired cluster size.
Methods are defined for: componentsClust
; compoundsCluster
.
plotSpectrum
plots an (annotated) spectrum.
Methods are defined for: components
; compounds
; compoundsSet
; formulas
; formulasSet
; MSPeakLists
; MSPeakListsSet
.
plotStructure
plots a chemical structure.
Methods are defined for: compounds
; compoundsCluster
.
plotTICs
plots total ion chromatogram(s).
Methods are defined for: featureGroups
; features
.
plotVenn
plots a Venn diagram to assess unique and overlapping data.
Methods are defined for: featureAnnotations
; featureGroups
; featureGroupsComparison
; featureGroupsSet
; transformationProductsStructure
.
plotUpSet
plots an UpSet diagram to assess unique and overlapping data.
Methods are defined for: featureAnnotations
; featureGroups
; featureGroupsComparison
; transformationProductsStructure
.
predictRespFactors
Prediction of response factors.
Methods are defined for: compounds
; compoundsSet
; compoundsSIRIUS
; featureGroupsScreening
; featureGroupsScreeningSet
; formulasSet
; formulasSIRIUS
.
predictTox
Prediction of toxicity values.
Methods are defined for: compounds
; compoundsSet
; compoundsSIRIUS
; featureGroupsScreening
; featureGroupsScreeningSet
; formulasSet
; formulasSIRIUS
.
delete
Deletes results.
Methods are defined for: components
; componentsClust
; componentsSet
; compoundsSet
; compoundsSIRIUS
; featureAnnotations
; featureGroups
; featureGroupsKPIC2
; featureGroupsScreening
; featureGroupsScreeningSet
; featureGroupsSet
; featureGroupsXCMS
; featureGroupsXCMS3
; features
; featuresKPIC2
; featuresXCMS
; featuresXCMS3
; formulas
; formulasSet
; formulasSIRIUS
; MSLibrary
; MSPeakLists
; MSPeakListsSet
; transformationProducts
.
plotVolcano
plots a volcano plot.
Methods are defined for: featureGroups
.
replicateGroups
returns a character
vector with the replicate groups for which data is present in this object.
Methods are defined for: featureGroups
; features
.
setObjects
returns the set objects of this object. See the documentation of workflowStepSet
.
Methods are defined for: workflowStepSet
.
sets
returns the names of the sets inside this object. See the documentation for sets workflows.
Methods are defined for: featureGroupsSet
; featuresSet
; workflowStepSet
.
treeCut
Manually cut a cluster.
Methods are defined for: componentsClust
; compoundsCluster
.
treeCutDynamic
Automatically cut a cluster.
Methods are defined for: componentsClust
; compoundsCluster
.
unset
Converts this object to a regular non-set object. See the documentation for sets workflows.
Methods are defined for: componentsNTSet
; componentsSet
; compoundsConsensusSet
; compoundsSet
; featureGroupsScreeningSet
; featureGroupsSet
; featuresSet
; formulasConsensusSet
; formulasSet
; MSPeakListsSet
.
Below are methods that are defined for existing
generics (e.g. defined in base
). Please see method specific
documentation for more details.
[
Subsets data within an object.
Methods are defined for: components,ANY,ANY,missing
; componentsSet,ANY,ANY,missing
; compoundsCluster,ANY,missing,missing
; compoundsSet,ANY,missing,missing
; featureAnnotations,ANY,missing,missing
; featureGroups,ANY,ANY,missing
; featureGroupsComparison,ANY,missing,missing
; featureGroupsScreening,ANY,ANY,missing
; featureGroupsScreeningSet,ANY,ANY,missing
; featureGroupsSet,ANY,ANY,missing
; features,ANY,missing,missing
; featuresSet,ANY,missing,missing
; formulasSet,ANY,missing,missing
; MSLibrary,ANY,missing,missing
; MSPeakLists,ANY,ANY,missing
; MSPeakListsSet,ANY,ANY,missing
; transformationProducts,ANY,missing,missing
.
[[
Extract data from an object.
Methods are defined for: components,ANY,ANY
; featureAnnotations,ANY,missing
; featureGroups,ANY,ANY
; featureGroupsComparison,ANY,missing
; features,ANY,missing
; formulas,ANY,ANY
; MSLibrary,ANY,missing
; MSPeakLists,ANY,ANY
; transformationProducts,ANY,missing
.
$
Extract data from an object.
Methods are defined for: components
; featureAnnotations
; featureGroups
; featureGroupsComparison
; features
; MSLibrary
; MSPeakLists
; transformationProducts
.
as.data.table
Converts an object to a table (data.table
).
Methods are defined for: components
; componentsTPs
; featureAnnotations
; featureGroups
; featureGroupsScreening
; featureGroupsScreeningSet
; features
; featuresSet
; formulas
; MSLibrary
; MSPeakLists
; MSPeakListsSet
; transformationProducts
; workflowStep
.
as.data.frame
Converts an object to a table (data.frame
).
Methods are defined for: workflowStep
.
length
Returns the length of an object.
Methods are defined for: components
; compoundsCluster
; featureAnnotations
; featureGroups
; featureGroupsComparison
; features
; MSLibrary
; MSPeakLists
; optimizationResult
; transformationProducts
.
lengths
Returns the lengths of elements within this object.
Methods are defined for: compoundsCluster
; optimizationResult
.
names
Return names for this object.
Methods are defined for: components
; featureGroups
; featureGroupsComparison
; MSLibrary
; transformationProducts
.
plot
Generates a plot for an object.
Methods are defined for: componentsClust,missing
; compoundsCluster,missing
; featureGroups,missing
; featureGroupsComparison,missing
; optimizationResult,missing
.
show
Prints information about this object.
Methods are defined for: adduct
; components
; componentsFeatures
; componentsSet
; compounds
; compoundsCluster
; compoundsSet
; featureGroups
; featureGroupsScreening
; featureGroupsScreeningSet
; featureGroupsSet
; features
; featuresSet
; formulas
; formulasSet
; MSLibrary
; MSPeakLists
; MSPeakListsSet
; optimizationResult
; transformationProducts
; workflowStep
; workflowStepSet
.
Functionality to automatically generate a TP library with formula data from a set of transformation rules, which can
be used with generateTPsLibraryFormula
. TP calculation will be skipped if the transformation involves
subtraction of elements not present in the parent.
genFormulaTPLibrary(
  parents,
  transformations = NULL,
  minMass = 40,
  generations = 1,
  skipInvalid = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE
)
parents |
The parents to which the given transformation rules should be applied to generate the TP library. Should
be either a suspect list (see suspect screening for more information) or the resulting
output of |
transformations |
A |
minMass |
The minimum mass for a TP to be kept. |
generations |
An |
skipInvalid |
Set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
A data.table
that is suitable for the TPLibrary
argument to
generateTPsLibraryFormula
.
The transformations
argument specifies custom rules to calculate
transformation products. This should be a data.frame
with the following columns:
transformation
The name of the chemical transformation
add
The elements that are added by this reaction (e.g. "O"
).
sub
The elements that are removed by this reaction (e.g. "H2O"
).
retDir
The expected retention time direction relative to the parent (assuming a reversed-phase-like LC
separation). Valid values are: ‘-1’ (elutes before the parent), ‘1’ (elutes after the parent) or ‘0’
(no significant change or unknown).
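A minimal sketch of a custom rule table with the four columns described above. The two rules and their `retDir` values are illustrative assumptions, not rules shipped with the package, and whether an empty string is accepted for an unused `add` or `sub` field is also an assumption. The commented call only reuses arguments from the usage shown earlier; `parents` stands for a hypothetical suspect list.

```r
# Illustrative transformation rules (assumed values, for demonstration only)
transformations <- data.frame(
    transformation = c("hydroxylation", "dehydration"),
    add = c("O", ""),     # elements added by the reaction
    sub = c("", "H2O"),   # elements removed by the reaction
    retDir = c(-1, 0),    # -1: before parent, 1: after parent, 0: no change/unknown
    stringsAsFactors = FALSE
)

# Basic sanity checks before handing the table to the library generator
stopifnot(all(transformations$retDir %in% c(-1, 0, 1)))
stopifnot(!anyDuplicated(transformations$transformation))

# With patRoon installed and a suspect list in 'parents' (hypothetical object):
# TPLib <- genFormulaTPLibrary(parents, transformations = transformations,
#                              minMass = 40, generations = 1)
```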
The algorithms using transformation reactions are directly based on the work done by Schollee et al. (see references).
Chemical properties such as SMILES, InChIKey and formula in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA
.
If neutralChemProps=TRUE
then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized
option of OpenBabel
). An additional column
molNeutralized
is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE
then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be
time-consuming and can largely be avoided by setting prefCalcChemProps=TRUE
.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE
) or whenever possible
(prefCalcChemProps=TRUE
).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
Schollee JE, Schymanski EL, Avak SE, Loos M, Hollender J (2015). “Prioritizing Unknown Transformation Products from Biologically-Treated Wastewater Using High-Resolution Mass Spectrometry, Multivariate Statistics, and Metabolic Logic.” Analytical Chemistry, 87(24), 12121–12129. doi:10.1021/acs.analchem.5b02905.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
generateTPsLibraryFormula
and generateTPsLogic
Create parameter lists for averaging MS peak list data.
getDefAvgPListParams(...)
... |
Optional named arguments that override defaults. |
The parameter set used for averaging peak lists is specified by the avgFeatParams
and avgFGroupParams
arguments to generateMSPeakLists
and its related algorithm specific functions. The parameters are
specified as a named list
with the following values:
clusterMzWindow
m/z window (in Da) used for clustering m/z values when spectra are
averaged. For method="hclust"
this corresponds to the cluster height, while for method="distance"
this
value is used to find nearby masses (+/- window). Too small windows will prevent clustering m/z values (thus
erroneously treating equal masses along spectra as different), whereas too big windows may cluster unrelated
m/z values from different or even the same spectrum together.
topMost
Only retain this maximum number of MS peaks when generating averaged spectra. Lowering this
number may exclude more irrelevant (noisy) MS peaks and decrease processing time, whereas higher values may avoid
excluding lower intense MS peaks that may still be of interest.
minIntensityPre
MS peaks with intensities below this value will be removed (applied prior to selection
by topMost
) before averaging.
minIntensityPost
MS peaks with intensities below this value will be removed after averaging.
avgFun
Function that is used to calculate average m/z values.
method
Method used for producing averaged MS spectra. Valid values are "hclust"
, used for
hierarchical clustering (using the fastcluster package), and "distance"
, to use the between peak
distance. The latter method may reduce processing time and memory requirements, at the potential cost of reduced
accuracy.
pruneMissingPrecursorMS
For MS data only: if TRUE
then peak lists without a precursor peak are
removed. Note that even when this is set to FALSE
, functionality that relies on MS (not MS/MS) peak lists
(e.g. formulae calculation) will still skip calculation if a precursor is not found.
retainPrecursorMSMS
For MS/MS data only: if TRUE
then always retain the precursor mass peak even
if it is not amongst the topMost
peaks. Note that MS precursor mass peaks are always kept. Furthermore, note that
precursor peaks in both MS and MS/MS data may still be removed by intensity thresholds (this is unlike the
filter
method function).
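The relation between `clusterMzWindow` and the cluster height for `method="hclust"` can be illustrated with base R. This is only a sketch under stated assumptions: it uses `stats::hclust` with its default complete linkage instead of the fastcluster package, the m/z and intensity values are made up, and it is not the package's actual averaging code.

```r
# Hypothetical m/z values of the same ions observed across several spectra
mz <- c(100.0000, 100.0020, 100.0040, 150.0000, 150.0030)
intensity <- c(1000, 1200, 900, 400, 600)
clusterMzWindow <- 0.005

# Hierarchical clustering on m/z distances; cut the tree at the window height
hc <- hclust(dist(mz))  # complete linkage (assumption)
cl <- cutree(hc, h = clusterMzWindow)

# Average m/z and intensity per cluster (corresponds to avgFun = mean)
avgMz <- as.numeric(tapply(mz, cl, mean))
avgInt <- as.numeric(tapply(intensity, cl, mean))
print(data.frame(cluster = sort(unique(cl)), mz = avgMz, intensity = avgInt))
```

Increasing `clusterMzWindow` to a value above the distance between the two ion groups would merge them into one cluster, which is the over-clustering risk mentioned above.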
The getDefAvgPListParams
function can be used to generate a default parameter list. The current defaults are:
clusterMzWindow=0.005
; topMost=50
; minIntensityPre=500
; minIntensityPost=500
; avgFun=mean
; method="hclust"
; pruneMissingPrecursorMS=TRUE
; retainPrecursorMSMS=TRUE
getDefAvgPListParams
returns a list
with the peak list averaging parameters.
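The way named arguments override defaults can be sketched without the package. The list below copies the documented defaults; `overrideAvgParams` is a hypothetical stand-in written for this illustration and is not part of patRoon.

```r
# Defaults as documented above
defAvgParams <- list(
    clusterMzWindow = 0.005, topMost = 50,
    minIntensityPre = 500, minIntensityPost = 500,
    avgFun = mean, method = "hclust",
    pruneMissingPrecursorMS = TRUE, retainPrecursorMSMS = TRUE
)

# Hypothetical helper that mimics overriding defaults by named arguments
overrideAvgParams <- function(...) utils::modifyList(defAvgParams, list(...))

avgParams <- overrideAvgParams(topMost = 25, method = "distance")
utils::str(avgParams[c("topMost", "method", "clusterMzWindow")])

# With patRoon loaded, the equivalent call would be:
# avgParams <- getDefAvgPListParams(topMost = 25, method = "distance")
```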
The algorithms used for averaging mass spectra are based on the msProcess R package (now archived on CRAN).
With Bruker algorithms these parameters only control generation of feature groups averaged peak lists: how peak lists for features are generated is controlled by DataAnalysis.
This function generates one or more EIC(s) for given retention time and m/z ranges.
getEICs(file, ranges)
file |
The file path to the sample analysis data file (‘.mzXML’ or ‘.mzML’). |
ranges |
A |
A list
with EIC data for each of the rows in ranges
.
Fold change calculation
getFCParams(rGroups, ...)
rGroups |
A |
... |
Optional named arguments that override defaults. |
Fold change calculation can be used to easily identify significant changes between replicate groups. The
calculation process is configured through a parameter list, which can be constructed with the getFCParams
function. The parameter list has the following entries:
rGroups
: the names of the two replicate groups to compare (taken from the rGroups
argument to
getFCParams
).
thresholdFC
: the threshold log FC for a feature group to be classified as increasing/decreasing.
thresholdPV
: the threshold log P for a feature group to be significantly different.
zeroMethod
,zeroValue
: how to handle zero values when calculating the FC: "add"
adds an
offset to zero values, "fixed"
sets zero values to a fixed number and "omit"
removes zero data. The
number that is added/set by the former two options is defined by zeroValue
.
PVTestFunc
: a function that is used to calculate P values (usually using t.test
).
PVAdjFunc
: a function that is used to adjust P values (usually using p.adjust
)
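The roles of these entries can be illustrated with a base R sketch. It is not the package's implementation: the intensity matrix is made up, the thresholds are illustrative values, and reading `thresholdFC` as a log2 fold change and `thresholdPV` as a -log10 P value is an assumption. The class label `"smallFC"` is likewise hypothetical.

```r
# Hypothetical intensities: rows = feature groups, columns = analyses
ints <- rbind(
    fg1 = c(100, 120, 110, 800, 820, 790),
    fg2 = c(500, 510, 495, 505, 490, 515),
    fg3 = c(0, 5, 3, 300, 320, 310)
)
rGroupOfAnalysis <- rep(c("before", "after"), each = 3)
isBefore <- rGroupOfAnalysis == "before"

# zeroMethod = "fixed": replace zero intensities by zeroValue
zeroValue <- 0.01
ints[ints == 0] <- zeroValue

# Fold change of the second replicate group relative to the first
log2FC <- log2(rowMeans(ints[, !isBefore]) / rowMeans(ints[, isBefore]))

# PVTestFunc / PVAdjFunc analogues: t.test per feature group, then p.adjust
pv <- apply(ints, 1, function(x) t.test(x[isBefore], x[!isBefore])$p.value)
pvAdj <- p.adjust(pv, method = "BH")

thresholdFC <- 1    # illustrative: at least a two-fold change
thresholdPV <- 1.3  # illustrative: roughly P <= 0.05 on a -log10 scale
significant <- -log10(pvAdj) >= thresholdPV
classification <- ifelse(!significant, "insignificant",
    ifelse(log2FC >= thresholdFC, "increase",
        ifelse(log2FC <= -thresholdFC, "decrease", "smallFC")))
classification <- setNames(classification, rownames(ints))
print(data.frame(log2FC, pvAdj, classification))
```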
The code to calculate and plot Fold change data was created by Bas van de Velde.
featureGroups-class
and feature-plotting
Converts a features
object to a KPIC object.
getPICSet(obj, ...)

## S4 method for signature 'features'
getPICSet(obj, loadRawData = TRUE)

## S4 method for signature 'featuresKPIC2'
getPICSet(obj, ...)
obj |
The |
... |
Ignored |
loadRawData |
Set to |
Converts a features
or featureGroups
object to an xcmsSet
or
XCMSnExp
object.
getXCMSSet(obj, verbose = TRUE, ...)

getXCMSnExp(obj, verbose = TRUE, ...)

## S4 method for signature 'features'
getXCMSSet(obj, verbose, loadRawData)

## S4 method for signature 'featuresXCMS'
getXCMSSet(obj, verbose = TRUE, ...)

## S4 method for signature 'featureGroups'
getXCMSSet(obj, verbose, loadRawData)

## S4 method for signature 'featureGroupsXCMS'
getXCMSSet(obj, verbose, loadRawData)

## S4 method for signature 'featuresSet'
getXCMSSet(obj, ..., set)

## S4 method for signature 'featureGroupsSet'
getXCMSSet(obj, ..., set)

## S4 method for signature 'features'
getXCMSnExp(obj, verbose, loadRawData)

## S4 method for signature 'featuresXCMS3'
getXCMSnExp(obj, verbose = TRUE, ...)

## S4 method for signature 'featureGroups'
getXCMSnExp(obj, verbose, loadRawData)

## S4 method for signature 'featureGroupsXCMS3'
getXCMSnExp(obj, verbose, loadRawData)

## S4 method for signature 'featuresSet'
getXCMSnExp(obj, ..., set)

## S4 method for signature 'featureGroupsSet'
getXCMSnExp(obj, ..., set)
obj |
The object that should be converted. |
verbose |
If |
... |
(sets workflow) Further arguments passed to non-sets method. Otherwise ignored. |
loadRawData |
Set to |
set |
(sets workflow) The name of the set to be exported. |
In a sets workflow, unset
is used to convert the
feature (group) data before the object is exported.
Group equal features across analyses.
groupFeatures(obj, algorithm, ...)

## S4 method for signature 'features'
groupFeatures(obj, algorithm, ..., verbose = TRUE)

## S4 method for signature 'data.frame'
groupFeatures(obj, algorithm, ..., verbose = TRUE)

## S4 method for signature 'featuresSet'
groupFeatures(obj, algorithm, ..., verbose = TRUE)
obj |
Either a |
algorithm |
A |
... |
Further parameters passed to the selected grouping algorithm. |
verbose |
if |
After features have been found, the next step is to align and group them across analyses. This process is necessary to allow comparison of features between multiple analyses, which would otherwise be difficult due to small deviations in retention and mass data. Thus, algorithms of 'feature groupers' are used to collect features with similar retention and mass data. In addition, advanced retention time alignment algorithms exist to enhance grouping of features even with relatively large retention time deviations (e.g. as may be observed for analyses collected over a long period). Like findFeatures, various algorithms are supported which may have many parameters that can be fine-tuned. This fine-tuning is likely to be necessary, since optimal settings often depend on the applied methodology and instrumentation.
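The idea of collecting features with similar retention and mass data can be shown with a deliberately naive sketch. The greedy rule below, the feature table and the window sizes are all assumptions made for illustration; none of the supported grouping algorithms works this simply, and no retention time alignment is performed here.

```r
# Hypothetical features from three analyses (retention time in s, m/z)
feats <- data.frame(
    analysis = c("a1", "a2", "a3", "a1", "a2"),
    ret = c(120.0, 123.5, 118.2, 300.1, 302.0),
    mz = c(200.1000, 200.1012, 200.0995, 350.2000, 350.2011)
)
rtWindow <- 12     # maximum retention time difference within a group
mzWindow <- 0.005  # maximum m/z difference within a group

# Greedy assignment: a feature joins the first group whose reference
# (the first member) lies within both windows, else it starts a new group
group <- integer(nrow(feats))
refs <- list()
for (i in seq_len(nrow(feats))) {
    hit <- 0L
    for (g in seq_along(refs)) {
        if (abs(feats$ret[i] - refs[[g]]$ret) <= rtWindow &&
            abs(feats$mz[i] - refs[[g]]$mz) <= mzWindow) {
            hit <- g
            break
        }
    }
    if (hit == 0L) {
        refs[[length(refs) + 1L]] <- feats[i, c("ret", "mz")]
        hit <- length(refs)
    }
    group[i] <- hit
}
feats$group <- group
print(feats)
```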
groupFeatures
is a generic function that will group features using one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as groupFeaturesOpenMS
and groupFeaturesXCMS3
. While these
functions may be called directly, groupFeatures
provides a generic interface and is therefore usually preferred.
The data.frame
method for groupFeatures
is a special case that currently only supports the
"sirius"
algorithm.
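A small sketch of calling the generic interface with a validated algorithm name. The wrapper `groupWith` and the `fList` argument (a features object obtained earlier in a workflow) are hypothetical; the algorithm names are the ones used in this manual.

```r
# Algorithm names as used in this manual
supportedAlgos <- c("openms", "xcms", "xcms3", "kpic2", "sirius")

# Hypothetical wrapper: validate the algorithm name, then dispatch to the
# generic. patRoon is only needed once the function is actually called
# with a valid algorithm.
groupWith <- function(fList, algorithm = "openms", ...) {
    algorithm <- match.arg(algorithm, supportedAlgos)
    patRoon::groupFeatures(fList, algorithm, ...)
}

# Example (requires patRoon and a features object 'fList'):
# fGroups <- groupWith(fList, "openms", rtalign = TRUE)
```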
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
The featureGroups
output class and its methods and the algorithm specific functions:
groupFeaturesOpenMS
, groupFeaturesXCMS
, groupFeaturesXCMS3
, groupFeaturesKPIC2
, groupFeaturesSIRIUS
Uses the KPIC2 R package for grouping of features.
groupFeaturesKPIC2(feat, ...)

## S4 method for signature 'features'
groupFeaturesKPIC2(
  feat,
  rtalign = TRUE,
  loadRawData = TRUE,
  groupArgs = list(tolerance = c(0.005, 12)),
  alignArgs = list(),
  verbose = TRUE
)

## S4 method for signature 'featuresSet'
groupFeaturesKPIC2(
  feat,
  groupArgs = list(tolerance = c(0.005, 12)),
  verbose = TRUE
)
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
loadRawData |
Set to |
groupArgs , alignArgs
|
Named |
verbose |
if |
This function uses KPIC2 to group features. This function is called when calling groupFeatures
with
algorithm="kpic2"
.
Grouping of features and alignment of their retention times are performed with the
KPIC::PICset.group
and KPIC::PICset.align
functions, respectively.
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
loadRawData
and arguments related to retention time alignment are currently not
supported for sets workflows.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
groupFeatures
for more details and other algorithms.
Group and align features with OpenMS tools
groupFeaturesOpenMS(feat, ...)

## S4 method for signature 'features'
groupFeaturesOpenMS(
  feat,
  rtalign = TRUE,
  QT = FALSE,
  maxAlignRT = 30,
  maxAlignMZ = 0.005,
  maxGroupRT = 12,
  maxGroupMZ = 0.005,
  extraOptsRT = NULL,
  extraOptsGroup = NULL,
  verbose = TRUE
)
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
QT |
If enabled, use |
maxAlignRT , maxAlignMZ
|
Used for retention alignment. Maximum retention time or m/z difference (seconds/Dalton)
for feature pairing. Sets |
maxGroupRT , maxGroupMZ
|
as |
extraOptsRT , extraOptsGroup
|
Named |
verbose |
if |
This function uses OpenMS to group features. This function is called when calling groupFeatures
with
algorithm="openms"
.
Retention times may be aligned by the MapAlignerPoseClustering TOPP tool. Grouping is achieved by either the FeatureLinkerUnlabeled or FeatureLinkerUnlabeledQT TOPP tools.
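A sketch of how these settings might be passed through the generic interface. The wrapper name and the `fList` argument are hypothetical, the argument values simply repeat the defaults from the usage above, and the comment on `QT` is an inference from the tool names rather than a documented statement.

```r
# Hypothetical wrapper around the documented interface; patRoon and OpenMS
# are only needed once the function is called with a features object.
groupWithOpenMS <- function(fList) {
    patRoon::groupFeatures(
        fList, "openms",
        rtalign = TRUE,                       # align retention times first
        QT = FALSE,                           # presumably TRUE selects the QT linker
        maxAlignRT = 30, maxAlignMZ = 0.005,  # windows for alignment (s / Da)
        maxGroupRT = 12, maxGroupMZ = 0.005   # windows for grouping (s / Da)
    )
}

# Example (requires patRoon, OpenMS and a features object 'fList'):
# fGroups <- groupWithOpenMS(fList)
```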
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
pugixml (via Rcpp) is used to process OpenMS XML output.
groupFeatures
for more details and other algorithms.
Uses SIRIUS to find and group features.
groupFeaturesSIRIUS(analysisInfo, verbose = TRUE)
analysisInfo |
A |
verbose |
if |
This function uses SIRIUS to group features. This function is called when calling groupFeatures
with
algorithm="sirius"
.
Finding and grouping features is done by running the lcms-align
command on all analyses at once.
For this reason, grouping feature data from other algorithms than SIRIUS
is not supported.
The MS files should be in the ‘mzML’ or ‘mzXML’ format. Furthermore, this algorithm requires the presence of (data-dependent) MS/MS data.
The input MS data files need to be centroided. The convertMSFiles
function can be used to
centroid data.
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019). “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods, 16(4), 299–302. doi:10.1038/s41592-019-0344-8.
groupFeatures
for more details and other algorithms.
Group and align features with the legacy xcmsSet
function from the xcms package.
groupFeaturesXCMS(feat, ...)

## S4 method for signature 'features'
groupFeaturesXCMS(
  feat,
  rtalign = TRUE,
  loadRawData = TRUE,
  groupArgs = list(mzwid = 0.015),
  retcorArgs = list(method = "obiwarp"),
  verbose = TRUE
)

## S4 method for signature 'featuresSet'
groupFeaturesXCMS(feat, groupArgs = list(mzwid = 0.015), verbose = TRUE)
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
loadRawData |
Set to |
groupArgs |
named |
retcorArgs |
named |
verbose |
if |
This function uses XCMS to group features. This function is called when calling groupFeatures
with
algorithm="xcms"
.
Grouping of features and
alignment of their retention times are performed with the xcms::group
and
xcms::retcor
functions, respectively. Both functions have an extensive list of
parameters to modify their behavior and may therefore be used to potentially optimize results.
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
loadRawData
and arguments related to retention time alignment are currently not
supported for sets workflows.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
groupFeatures
for more details and other algorithms.
Uses the new xcms3
interface from the xcms package to group features.
groupFeaturesXCMS3(feat, ...)

## S4 method for signature 'features'
groupFeaturesXCMS3(
  feat,
  rtalign = TRUE,
  loadRawData = TRUE,
  groupParam = xcms::PeakDensityParam(sampleGroups = analysisInfo(feat)$group),
  preGroupParam = groupParam,
  retAlignParam = xcms::ObiwarpParam(),
  verbose = TRUE
)

## S4 method for signature 'featuresSet'
groupFeaturesXCMS3(
  feat,
  groupParam = xcms::PeakDensityParam(sampleGroups = analysisInfo(feat)$group),
  verbose = TRUE
)
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
loadRawData |
Set to |
groupParam , retAlignParam
|
parameter object that is directly passed to
|
preGroupParam |
grouping parameters applied when features are grouped prior to alignment (only with peak groups alignment). |
verbose |
if |
This function uses XCMS3 to group features. This function is called when calling groupFeatures
with
algorithm="xcms3"
.
Grouping of features and alignment of their retention times are performed with the
xcms::groupChromPeaks
and xcms::adjustRtime
functions, respectively. Both of these functions support a large number of parameters that modify their
behavior and may therefore require optimization.
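A sketch of supplying customized parameter objects. The wrapper name, the `fList` argument and the chosen `bw` and `binSize` values are assumptions for illustration; `PeakDensityParam` and `ObiwarpParam` are the xcms constructors already used as defaults in the usage above.

```r
# Hypothetical wrapper; patRoon and xcms are only needed once the function
# is called with a features object.
groupWithXCMS3 <- function(fList, bw = 30, binSize = 0.25) {
    # Peak density grouping; replicate groups are taken from the analysis
    # information, as in the default shown in the usage
    groupParam <- xcms::PeakDensityParam(
        sampleGroups = patRoon::analysisInfo(fList)$group,
        bw = bw, binSize = binSize
    )
    patRoon::groupFeatures(
        fList, "xcms3",
        rtalign = TRUE,
        groupParam = groupParam,
        retAlignParam = xcms::ObiwarpParam()
    )
}

# Example (requires patRoon, xcms and a features object 'fList'):
# fGroups <- groupWithXCMS3(fList, bw = 10)
```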
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
The loadRawData argument and arguments related to retention time alignment are currently not supported for sets workflows.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
groupFeatures
for more details and other algorithms.
This class holds all the information for grouped features.
groupTable(object, ...) groupFeatIndex(fGroups) groupInfo(fGroups) unique(x, incomparables = FALSE, ...) overlap(fGroups, which, exclusive = FALSE, ...) selectIons(fGroups, components, prefAdduct, ...) groupQualities(fGroups) groupScores(fGroups) internalStandards(fGroups) internalStandardAssignments(fGroups, ...) normInts( fGroups, featNorm = "none", groupNorm = FALSE, normFunc = max, standards = NULL, ISTDRTWindow = 120, ISTDMZWindow = 300, minISTDs = 3, ... ) ## S4 method for signature 'featureGroups' names(x) ## S4 method for signature 'featureGroups' analyses(obj) ## S4 method for signature 'featureGroups' replicateGroups(obj) ## S4 method for signature 'featureGroups' groupNames(obj) ## S4 method for signature 'featureGroups' length(x) ## S4 method for signature 'featureGroups' show(object) ## S4 method for signature 'featureGroups' groupTable(object, areas = FALSE, normalized = FALSE) ## S4 method for signature 'featureGroups' analysisInfo(obj) ## S4 method for signature 'featureGroups' groupInfo(fGroups) ## S4 method for signature 'featureGroups' featureTable(obj) ## S4 method for signature 'featureGroups' getFeatures(obj) ## S4 method for signature 'featureGroups' groupFeatIndex(fGroups) ## S4 method for signature 'featureGroups' groupQualities(fGroups) ## S4 method for signature 'featureGroups' groupScores(fGroups) ## S4 method for signature 'featureGroups' annotations(obj) ## S4 method for signature 'featureGroups' internalStandards(fGroups) ## S4 method for signature 'featureGroups' internalStandardAssignments(fGroups) ## S4 method for signature 'featureGroups' adducts(obj) ## S4 replacement method for signature 'featureGroups' adducts(obj) <- value ## S4 method for signature 'featureGroups' concentrations(fGroups) ## S4 method for signature 'featureGroups' toxicities(fGroups) ## S4 method for signature 'featureGroups,ANY,ANY,missing' x[i, j, ..., rGroups, results, drop = TRUE] ## S4 method for signature 'featureGroups,ANY,ANY' x[[i, j]] ## S4 method 
for signature 'featureGroups' x$name ## S4 method for signature 'featureGroups' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featureGroups' export(obj, type, out) ## S4 method for signature 'featureGroups' as.data.table( x, average = FALSE, areas = FALSE, features = FALSE, qualities = FALSE, regression = FALSE, averageFunc = mean, normalized = FALSE, FCParams = NULL, concAggrParams = getDefPredAggrParams(), toxAggrParams = getDefPredAggrParams(), normConcToTox = FALSE ) ## S4 method for signature 'featureGroups' unique(x, which, relativeTo = NULL, outer = FALSE) ## S4 method for signature 'featureGroups' overlap(fGroups, which, exclusive) ## S4 method for signature 'featureGroups' calculatePeakQualities( obj, weights, flatnessFactor, avgFunc = mean, parallel = TRUE ) ## S4 method for signature 'featureGroups' selectIons( fGroups, components, prefAdduct, onlyMonoIso = TRUE, chargeMismatch = "adduct" ) ## S4 method for signature 'featureGroups' normInts( fGroups, featNorm = "none", groupNorm = FALSE, normFunc = max, standards = NULL, ISTDRTWindow = 120, ISTDMZWindow = 300, minISTDs = 3, ... ) ## S4 method for signature 'featureGroups' getTICs(obj, retentionRange = NULL, MSLevel = c(1, 2)) ## S4 method for signature 'featureGroups' getBPCs(obj, retentionRange = NULL, MSLevel = c(1, 2)) ## S4 method for signature 'featureGroupsSet' sets(obj) ## S4 method for signature 'featureGroupsSet' internalStandardAssignments(fGroups, set = NULL) ## S4 method for signature 'featureGroupsSet' adducts(obj, set, ...) ## S4 replacement method for signature 'featureGroupsSet' adducts(obj, set, reGroup = TRUE) <- value ## S4 method for signature 'featureGroupsSet' delete(obj, i = NULL, j = NULL, ...) 
## S4 method for signature 'featureGroupsSet' show(object) ## S4 method for signature 'featureGroupsSet' featureTable(obj) ## S4 method for signature 'featureGroupsSet,ANY,ANY,missing' x[i, j, ..., rGroups, sets = NULL, drop = TRUE] ## S4 method for signature 'featureGroupsSet' export(obj, type, out, set) ## S4 method for signature 'featureGroupsSet' unique(x, which, ..., sets = FALSE) ## S4 method for signature 'featureGroupsSet' overlap(fGroups, which, exclusive, sets = FALSE) ## S4 method for signature 'featureGroupsSet' selectIons(fGroups, components, prefAdduct, ...) ## S4 method for signature 'featureGroupsSet' normInts( fGroups, featNorm = "none", groupNorm = FALSE, normFunc = max, standards = NULL, ISTDRTWindow = 120, ISTDMZWindow = 300, minISTDs = 3, ... ) ## S4 method for signature 'featureGroupsSet' unset(obj, set) ## S4 method for signature 'featureGroupsKPIC2' delete(obj, ...) ## S4 method for signature 'featureGroups' plotTICs( obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, colourBy = c("none", "analyses", "rGroups"), showLegend = TRUE, xlim = NULL, ylim = NULL, ... ) ## S4 method for signature 'featureGroups' plotBPCs( obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, colourBy = c("none", "analyses", "rGroups"), showLegend = TRUE, xlim = NULL, ylim = NULL, ... ) ## S4 method for signature 'featureGroupsXCMS' delete(obj, ...) ## S4 method for signature 'featureGroupsXCMS3' delete(obj, ...)
... |
For the For For For sets workflow methods: further arguments passed to the base |
fGroups , obj , x , object
|
|
incomparables |
Ignored. |
which |
A character vector with replicate groups used for comparison. For |
exclusive |
If |
components |
The |
prefAdduct |
The 'preferred adduct' (see method description). This is often |
featNorm |
The method applied for feature normalization: |
groupNorm |
If |
normFunc |
A |
standards |
A (sets workflow) Can also be a See the |
ISTDRTWindow, ISTDMZWindow |
The retention time and m/z windows for IS selection. Only used if |
minISTDs |
The minimum number of IS that should be assigned to each feature (if possible). Only used if
|
areas |
If set to For |
normalized |
If For |
value |
For |
i, j |
For |
rGroups |
For |
results |
Optional argument. If specified only feature groups with results in the specified object are kept. The
class of |
drop |
ignored. |
name |
The feature group name (partially matched). |
type |
The export type: |
out |
The destination file for the exported data. |
average |
If For |
features |
If |
qualities |
Adds feature (group) qualities ( |
regression |
Set to |
averageFunc |
Function used for averaging. Only used when |
FCParams |
A parameter list to calculate Fold change data. See |
concAggrParams, toxAggrParams |
Parameters to aggregate calculated concentrations/toxicities (obtained with |
normConcToTox |
Set to |
relativeTo |
A character vector with replicate groups that should be
used for unique comparison. If |
outer |
If |
weights |
A named |
flatnessFactor |
Passed to MetaClean as the |
avgFunc |
The function used to average the peak qualities and scores for each feature group. |
parallel |
If set to |
onlyMonoIso |
Set to |
chargeMismatch |
Specifies how to deal with a mismatch in charge between adduct and isotope annotations. Valid
values are: |
retentionRange |
Range of retention time (in seconds) to collect TIC traces. Should be a numeric vector with length of two containing the min/max values. Set to NULL to ignore. |
MSLevel |
Integer vector with the MS levels (i.e., 1 for MS1 and 2 for MS2) to obtain TIC traces. |
set |
(sets workflow) The name of the set. |
reGroup |
(sets workflow) Set to |
sets |
(sets workflow) For For |
retMin |
Plot retention time in minutes (instead of seconds). |
title |
Character string used for title of the plot. If |
colourBy |
Sets the automatic colour selection: "none" for a single colour or "analyses"/"rGroups" for a distinct colour per analysis or analysis replicate group. |
showLegend |
Plot a legend if TRUE. |
xlim, ylim |
Sets the plot size limits used by |
The featureGroups class is the workhorse of patRoon: almost all functionality operates on its instantiated objects. The class holds all information from grouped features (obtained from features). This class itself is virtual, hence, objects are not created directly from it. Instead, 'feature groupers' such as groupFeaturesXCMS return a featureGroups derived object after performing the actual grouping of features across analyses.
delete
returns the object for which the specified data was removed.
calculatePeakQualities
returns a modified object amended with peak qualities and scores.
selectIons
returns a featureGroups
object with only the selected feature groups and amended
with adduct annotations.
normInts
returns a featureGroups
object, amended with data in the ISTDs
and
ISTDAssignments
slots if featNorm="istd"
.
names(featureGroups)
: Obtain feature group names.
analyses(featureGroups)
: returns a character
vector with the names of the
analyses for which data is present in this object.
replicateGroups(featureGroups)
: returns a character
vector with the names of the
replicate groups for which data is present in this object.
groupNames(featureGroups)
: Same as names. Provided for consistency with other classes.
length(featureGroups)
: Obtain number of feature groups.
show(featureGroups)
: Shows summary information for this object.
groupTable(featureGroups)
: Accessor for groups
slot.
analysisInfo(featureGroups)
: Obtain analysisInfo (see analysisInfo slot in features
).
groupInfo(featureGroups)
: Accessor for groupInfo
slot.
featureTable(featureGroups)
: Obtain feature information (see features
).
getFeatures(featureGroups)
: Accessor for features
slot.
groupFeatIndex(featureGroups)
: Accessor for ftindex
slot.
groupQualities(featureGroups)
: Accessor for groupQualities
slot.
groupScores(featureGroups)
: Accessor for groupScores
slot.
annotations(featureGroups)
: Accessor for annotations
slot.
internalStandards(featureGroups)
: Accessor for ISTDs
slot.
internalStandardAssignments(featureGroups)
: Accessor for ISTDAssignments
slot.
adducts(featureGroups)
: Returns a named character
with adduct annotations assigned to each feature group (if
available).
adducts(featureGroups) <- value
: Sets adduct annotations for feature groups.
concentrations(featureGroups)
: Accessor for concentrations
slot.
toxicities(featureGroups)
: Accessor for toxicities
slot.
x[i
: Subset on analyses/feature groups.
x[[i
: Extract intensity values.
$
: Extract intensity values for a feature group.
delete(featureGroups)
: Completely deletes specified feature groups.
export(featureGroups)
: Exports feature groups to a ‘.csv’ file that is readable by Bruker ProfileAnalysis (a 'bucket table'), Bruker TASQ (an analyte database) or that is suitable as input for the Targeted peak detection functionality of MZmine.
as.data.table(featureGroups)
: Obtain a summary table (a data.table
) with retention, m/z, intensity
and optionally other feature data.
unique(featureGroups)
: Obtain a subset with unique feature groups
present in one or more specified replicate group(s).
overlap(featureGroups)
: Obtain a subset with feature groups that overlap
between a set of specified replicate group(s).
calculatePeakQualities(featureGroups)
: Calculates peak and group qualities for all features and feature groups. The peak qualities
(and scores) are calculated with the features method of this
function, and subsequently averaged per feature group. Then, MetaClean is used to calculate the
Elution Shift
and Retention Time Consistency
group quality metrics (see the MetaClean
publication cited below for more details). Similarly to the features
method, these metrics are scored
by normalizing qualities among all groups and scaling them from ‘0’ (worst) to ‘1’ (best). The
totalScore
for each group is then calculated as the weighted sum from all feature (group) scores. The
getMCTrainData
and predictCheckFeaturesSession
functions can be used to train and apply
Pass/Fail ML models from MetaClean.
selectIons(featureGroups)
: Uses componentization results to select feature groups with a
preferred adduct ion and/or isotope annotation. Typically, this means that feature groups are only kept if they are
(de-)protonated adducts and are monoisotopic. The adduct annotation assignments for the selected feature groups are
copied from the components to the annotations
slot. If the adduct for a feature group is unknown, its
annotation is defaulted to the 'preferred' adduct, and hence, the feature group will never be removed. Furthermore,
if a component does not contain an annotation with the preferred adduct, the most intense feature group is selected
instead. Similarly, if no isotope annotation is available, the feature group is assumed to be monoisotopic and thus
not removed. An important advantage of selectIons
is that it may considerably simplify your dataset.
Furthermore, the adduct assignments allow formula/compound annotation steps later in the workflow to improve their
annotation accuracy. On the other hand, it is important that the componentization results are reliable. Hence, it is
highly recommended that, prior to calling selectIons
, the settings to generateComponents
are
optimized and its results are reviewed with checkComponents
. Finally, the adducts<-
method can
be used to manually correct adduct assignments afterwards if necessary.
normInts(featureGroups)
: Provides various methods to normalize feature intensities for each sample analysis or of all features within a feature group. See the Feature intensity normalization section below.
getTICs(featureGroups)
: Obtain the total ion chromatogram(s) (TICs) of the analyses.
getBPCs(featureGroups)
: Obtain the base peak chromatogram(s) (BPCs) of the analyses.
plotTICs(featureGroups)
: Plots the total ion chromatogram(s) (TICs) of the analyses.
plotBPCs(featureGroups)
: Plots the base peak chromatogram(s) (BPCs) of the analyses.
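The scoring step described for calculatePeakQualities above (scaling qualities between ‘0’ and ‘1’ and combining them into a weighted totalScore) can be illustrated with a small base R sketch. This is not the package implementation: the quality values are made up, min-max scaling is assumed as the normalization, and both metrics are assumed to be better when lower. The helper scaleScore and the weight values are hypothetical.

```r
# Hypothetical group quality values for three feature groups (made-up numbers)
qualities <- data.frame(
  group = c("grpA", "grpB", "grpC"),
  ElutionShift = c(0.5, 2.0, 1.0),             # assumption: lower is better
  RetentionTimeConsistency = c(0.1, 0.4, 0.2)  # assumption: lower is better
)

# Min-max scale a metric to 0 (worst) .. 1 (best); flip when lower values are better
scaleScore <- function(x, lowerIsBetter = FALSE) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(1, length(x)))  # no spread among groups: treat all as equally good
  }
  s <- (x - rng[1]) / (rng[2] - rng[1])
  if (lowerIsBetter) 1 - s else s
}

scores <- data.frame(
  group = qualities$group,
  ElutionShift = scaleScore(qualities$ElutionShift, lowerIsBetter = TRUE),
  RetentionTimeConsistency = scaleScore(qualities$RetentionTimeConsistency,
                                        lowerIsBetter = TRUE)
)

# Hypothetical weights: a named numeric vector with one weight per score
weights <- c(ElutionShift = 1, RetentionTimeConsistency = 2)

# Weighted sum of the scores gives the total score per group
scores$totalScore <- as.numeric(as.matrix(scores[, names(weights)]) %*% weights)
print(scores)
```

In this toy data set the group with the smallest elution shift and best retention time consistency receives the highest total score, and the worst group receives zero.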
groups
Matrix (data.table
) with intensities for each feature group (columns) per analysis (rows).
Access with the groupTable method.
analysisInfo,features
Analysis info and features
class associated
with this object. Access with analysisInfo
and featureTable
methods, respectively.
groupInfo
data.frame
with retention time (rts
column, in seconds) and m/z (mzs
column) for each feature group. Access with groupInfo
method.
ftindex
Matrix (data.table
) with feature indices for each feature group (columns) per analysis
(rows). Each index corresponds to the row within the feature table of the analysis (see
featureTable
).
groupQualities,groupScores
A data.table
with qualities/scores for each feature group (see the
calculatePeakQualities
method).
annotations
A data.table
with adduct annotations for each group (see the selectIons
method).
ISTDs
A data.table
with screening results for internal standards (filled in by the normInts
method).
ISTDAssignments
A list
, where each item is named by a feature group and consists of a vector with
feature group names of the internal standards assigned to it (filled in by the normInts
method).
concentrations,toxicities
A data.table
with predicted concentrations/toxicities for each feature group.
Assigned by the calculateConcs
/calculateTox
methods. Use the
concentrations
/toxicities
methods for access.
groupAlgo,groupArgs,groupVerbose
(sets workflow) Grouping parameters that were used when this object was created. Used
by adducts<-
and selectIons
when these methods perform a re-grouping of features.
annotations,ISTDAssignments
(sets workflow) As the featureGroups
slots, but contains the data per set.
annotationsChanged
Set internally by adducts<-
and applied as soon as reGroup=TRUE
.
The normInts
method performs normalization of feature intensities
(and areas). These values are amended in the features
slot, while the original intensities/areas are kept.
To use the normalized intensities, pass normalized=TRUE
to methods such as plotInt
,
generateComponentsIntClust
and as.data.table
. Please see the normalized
argument
documentation for these methods for more details.
The normInts
method supports several methods to normalize intensities/areas of features within the same
analysis. Most methods are influenced by the normalization concentration (norm_conc
in the
analysis information) set for each sample analysis. For NA
or zero values the
output will be zero. If the norm_conc
is completely absent from the analysis information, the normalization
concentration is defaulted to one.
The different normalization methods are:
featNorm="istd"
Uses internal standards (IS) for normalization. The IS are screened internally
by the screenSuspects
function. Hence, the IS specified by the standards
argument should
follow the format of a suspect list. Note that labelled elements in IS formulae should
be specified with the rcdk format, e.g. "[13]C"
for 13C, "[2]H"
for deuterium, etc.
Example IS lists are provided with the patRoonData package.
The assignment of IS to features is automatically performed, using the following criteria:
Only analyses with a defined normalization concentration are considered.
The IS must be detected in all of the analyses in which the feature was detected.
The retention time and m/z are reasonably close (ISTDRTWindow
/ISTDMZWindow
arguments).
However, additional IS candidates outside these windows will be chosen if the number of candidates is less than the
minISTDs
argument. In this case the next close(st) candidate(s) will be chosen.
Normalization of features within the same feature group always occurs with the same IS. If multiple IS are assigned
to a feature then normalization occurs with the combined intensity (area), which is calculated with the function
defined by the normFunc
argument. The (combined) IS intensity is then normalized by the normalization
concentration, and finally used for feature normalization.
featNorm="tic"
Uses the Total Ion Current (TIC) to normalize intensities. The TIC is calculated by
combining all intensities with the function defined by the normFunc
argument. For this reason, you may need
to take care to perform normalization before e.g. suspect screening or other prioritization techniques. The
TIC normalized intensities are finally divided by the normalization concentration.
featNorm="conc"
Simply divides all intensities (areas) by the normalization concentration defined
for the sample.
featNorm="none"
Performs no normalization. The raw intensity values are simply copied. This is mainly
useful if you only want to do group normalization (described below).
The meaning of the normalization concentration differs for each method: for "istd"
it represents the IS
concentration of a sample analysis, whereas for "tic"
and "conc"
it is used to normalize different
sample amounts (e.g. injection volume).
If groupNorm=TRUE
then feature intensities (areas) will be normalized by the combined values for its feature
group (again, combination occurs with normFunc
). This group normalization always occurs after
the aforementioned normalization methods. Group normalization was the only method available in patRoon versions before 2.1, and
still occurs automatically if normInts
was not called when a method is executed that requests normalized
data.
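The arithmetic behind the "conc", "tic" and group normalization methods described above can be sketched in base R as follows. This is an illustration under stated assumptions and not the normInts implementation: the intensity matrix and concentrations are made up, the helper divByConc is hypothetical, and max is used for normFunc because it is the default shown in the usage section.

```r
# Hypothetical intensities: analyses in rows, feature groups in columns
ints <- matrix(c(100, 200, 400,
                  50, 300, 200),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ana1", "ana2"), c("grpA", "grpB", "grpC")))
normConc <- c(ana1 = 1, ana2 = 0.5)  # stands in for norm_conc of each analysis
normFunc <- max                      # default of the normFunc argument

# Divide each analysis (row) by its normalization concentration.
# NA or zero concentrations yield zero output, as described in the text.
divByConc <- function(m, conc) {
  out <- m / conc  # conc is recycled down the columns, i.e. applied per row
  out[is.na(conc) | conc == 0, ] <- 0
  out
}

# featNorm = "conc": divide by the normalization concentration only
byConc <- divByConc(ints, normConc)

# featNorm = "tic": divide by the combined intensity of the analysis, then by concentration
tic <- apply(ints, 1, normFunc)
byTIC <- divByConc(ints / tic, normConc)

# groupNorm = TRUE: afterwards divide by the combined value of each feature group
byGroup <- sweep(byConc, 2, apply(byConc, 2, normFunc), "/")

print(byConc)
print(byTIC)
print(byGroup)
```

Note that with normFunc = max each feature group ends up with a maximum normalized intensity of one after group normalization.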
The featureGroupsSet
class is applicable for sets workflows. This class is derived from featureGroups
and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
sets
Returns the set names for this object.
unset
Converts the object data for a specified set into a 'non-set' object (featureGroupsUnset
), which allows it to be used in 'regular' workflows. The adduct annotations for the selected set are used to convert all
feature (group) masses to ionic m/z values. The annotations persist in the converted object.
The following methods are changed or have new functionality:
adducts
, adducts<-
require the set
argument. The order of the data that is
returned/changed follows that of the annotations
slot. Furthermore, adducts<-
will perform a
re-grouping of features when its reGroup
parameter is set to TRUE
. The implications of this are
discussed below. Note that no adducts are changed until reGroup=TRUE
.
the subset operator ([
) has specific arguments to choose (feature presence in) sets. See the argument
descriptions.
as.data.table
: normalization of intensities is performed per set.
export
Only allows exporting data from one set. The unset
method is used prior to exporting the
data.
overlap
and unique
allow handling data per set. See the sets
argument description.
selectIons
Will perform a re-grouping of features. The implications of this are discussed below.
normInts
Performs normalization for each set independently.
A re-grouping of features occurs if selectIons
is called or adducts<-
is used with
reGroup=TRUE
. Afterwards, it is very likely that feature group names have changed. Since data generated later in the workflow (e.g. annotation steps) relies on feature group names, such objects are no longer valid and must be re-generated.
Rick Helmus and Ricardo Cunha (getTICs and getBPCs functions)
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
groupFeatures
for generating feature groups, feature-filtering and
feature-plotting for more advanced featureGroups
methods.
Generic function to import feature groups produced by other software from files.
importFeatureGroups(path, type, ...)
path |
The path that should be used for importing. See the algorithm specific functions for more details. |
type |
Which file type should be imported: |
... |
Further arguments passed to the selected import algorithm function. |
importFeatureGroups
is a generic function that will import feature groups from files by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as importFeatureGroupsBrukerTASQ
and importFeatureGroupsBrukerPA
. While these
functions may be called directly, importFeatureGroups
provides a generic interface and is therefore usually preferred.
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
The featureGroups
output class and its methods and the algorithm specific functions:
importFeatureGroupsBrukerPA
, importFeatureGroupsBrukerTASQ
, importFeatureGroupsEnviMass
groupFeatures
to group features. Other import functions:
importFeatureGroupsXCMS
, importFeatureGroupsXCMS3
and
importFeatureGroupsKPIC2
.
Imports a 'bucket table' produced by Bruker ProfileAnalysis (PA)
importFeatureGroupsBrukerPA( path, feat, rtWindow = 12, mzWindow = 0.005, intWindow = 5, warn = TRUE )
path |
The file path to an exported 'bucket table' ‘.txt’ file from PA. |
feat |
The |
rtWindow, mzWindow, intWindow |
Search window values for retention time (seconds), m/z (Da) and intensity used to find back features within feature groups from PA (+/- the retention/mass/intensity value of a feature). |
warn |
Warn about missing or duplicate features when relating them back from grouped features. |
This function imports data from Bruker ProfileAnalysis. This function is called when calling importFeatureGroups
with
type="brukerpa"
.
The 'bucket table' should be exported as ‘.txt’ file. Please note that this function only supports
features generated by findFeaturesBruker
and it is crucial that DataAnalysis files remain
unchanged when features are collected and the bucket table is generated. Furthermore, please note that PA does not
retain information about originating features for generated buckets. For this reason, this function tries to find
back the original features and care must be taken to correctly specify search parameters (rtWindow
,
mzWindow
, intWindow
).
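To make the role of the search windows more concrete, the following base R sketch shows one way features could be matched back to a bucket. It is not the code used by importFeatureGroupsBrukerPA: the feature table, the bucket values and the helper findBack are hypothetical, and only the default window values are taken from the usage section.

```r
# Hypothetical feature table for one analysis (made-up values)
features <- data.frame(
  ID = 1:4,
  ret = c(120, 125, 300, 121),                  # retention time (s)
  mz = c(200.1000, 200.1100, 350.2000, 200.2000),
  intensity = c(1000, 1003, 5000, 1001)
)

# Hypothetical values reported for a bucket in the same analysis
bucket <- list(ret = 122, mz = 200.101, intensity = 1002)

# Return all features within +/- the given windows around the bucket values
findBack <- function(features, bucket, rtWindow = 12, mzWindow = 0.005,
                     intWindow = 5, warn = TRUE) {
  hits <- features[abs(features$ret - bucket$ret) <= rtWindow &
                     abs(features$mz - bucket$mz) <= mzWindow &
                     abs(features$intensity - bucket$intensity) <= intWindow, ]
  if (warn) {
    if (nrow(hits) == 0) {
      warning("No feature found for bucket")
    } else if (nrow(hits) > 1) {
      warning("Multiple features match bucket")
    }
  }
  hits
}

hit <- findBack(features, bucket)
print(hit)
```

With the default windows only the first feature matches. Widening mzWindow to 0.01 would also match the second feature and trigger the duplicate warning, which illustrates why the windows should be chosen with care.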
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
importFeatureGroups
for more details and other algorithms.
Imports screening results from Bruker TASQ as feature groups.
importFeatureGroupsBrukerTASQ(path, analysisInfo, clusterRTWindow = 12)
path |
The file path to an Excel export of the Global results table from TASQ, converted to ‘.csv’ format. |
analysisInfo |
A |
clusterRTWindow |
This retention time window (in seconds) is used to group hits across analyses together. See also the details section. |
This function imports data from Bruker TASQ. This function is called when calling importFeatureGroups
with
type="brukertasq"
.
The feature groups across analyses are formed based on the name of suspects and their closeness in retention
time. The latter is necessary because TASQ does not necessarily perform checks on retention times and may therefore
assign a suspect to peaks with different retention times across analyses (or within a single analysis). Hence,
suspects with equal names are hierarchically clustered on their retention times (using fastcluster) to
form the feature groups. The cut-off value for this is specified by the clusterRTWindow
argument. The input
for this function is obtained by generating an Excel export of the 'global' results and subsequently converting the
file to ‘.csv’ format.
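The retention time clustering described above can be sketched with base R. This is only an illustration: stats::hclust is used here in place of fastcluster, the linkage method is an assumption (it is not stated in this documentation), and the hits table is made up.

```r
# Hypothetical hits for one suspect name across analyses (made-up values)
hits <- data.frame(
  analysis = c("ana1", "ana2", "ana3", "ana1", "ana2"),
  ret = c(100, 104, 109, 250, 255)  # retention times (s)
)
clusterRTWindow <- 12  # default of the clusterRTWindow argument

# Hierarchical clustering on retention times; "complete" linkage is an assumption
hc <- hclust(dist(hits$ret), method = "complete")

# Cut the tree at the retention time window: each cluster would form one feature group
hits$cluster <- cutree(hc, h = clusterRTWindow)
print(hits)
```

Here the hits around 100-109 s and those around 250-255 s end up in two separate clusters, even though they share the same suspect name. A suspect with only a single hit cannot be clustered this way and would need to be handled separately.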
A new featureGroups
object containing converted screening results from Bruker TASQ.
This function uses estimated min/max values for retention times and dummy min/max m/z values for
conversion to features, since this information is not (readily) available. Hence, when plotting, for instance,
extracted ion chromatograms (with plotChroms
) the integrated chromatographic peak range shown is
incorrect.
This function may use suspect names as the basis for file names used for reporting, logging, etc. Therefore, it is important that these names are valid in file names.
importFeatureGroups
for more details and other algorithms.
Imports 'profiles' produced by enviMass.
importFeatureGroupsEnviMass(path, feat, positive)
path |
The path of the enviMass project. |
feat |
The |
positive |
Whether data from positive ( |
This function imports data from enviMass. This function is called when calling importFeatureGroups
with
type="envimass"
.
This function only imports 'raw' profiles, not any results from further componentization steps
performed in enviMass. Furthermore, this functionality has only been tested with older versions of
enviMass. Finally, please note that this function only supports features imported by
importFeaturesEnviMass
(obviously, the same project should be used for both importing functions).
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
importFeatureGroups
for more details and other algorithms.
Imports grouped features from a KPIC2 object.
importFeatureGroupsKPIC2(picsSetGrouped, analysisInfo)
picsSetGrouped |
A grouped |
analysisInfo |
A |
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
Imports grouped features from a legacy xcmsSet
object from the xcms package.
importFeatureGroupsXCMS(xs, analysisInfo)
xs |
An |
analysisInfo |
A |
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
importFeaturesXCMS3
and groupFeatures
Imports grouped features from an XCMSnExp
object from the xcms package.
importFeatureGroupsXCMS3(xdata, analysisInfo)
xdata |
An |
analysisInfo |
A |
An object of a class which is derived from featureGroups
.
The featuresSet
method (for sets workflows) returns a
featureGroupsSet
object.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
Generic function to import features produced by other software.
importFeatures(analysisInfo, type, ...)
analysisInfo |
A |
type |
What type of data should be imported: |
... |
Further arguments passed to the selected import algorithm function. |
importFeatures
is a generic function that will import features by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as importFeaturesXCMS3
and importFeaturesKPIC2
. While these
functions may be called directly, importFeatures
provides a generic interface and is therefore usually preferred.
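The relation between the generic and an algorithm specific function can be sketched as follows. The two wrapper functions are hypothetical helpers written for this illustration; `xdata` is assumed to be an existing XCMSnExp object and `anaInfo` the matching analysis information. It is also an assumption of this sketch that `xdata` is forwarded through `...` by name.

```r
# Sketch only: both wrappers are expected to yield the same features object.
importViaGeneric <- function(xdata, anaInfo) {
    stopifnot(requireNamespace("patRoon", quietly = TRUE))
    # Assumption: extra arguments are passed on to importFeaturesXCMS3()
    patRoon::importFeatures(anaInfo, "xcms3", xdata = xdata)
}

importDirectly <- function(xdata, anaInfo) {
    stopifnot(requireNamespace("patRoon", quietly = TRUE))
    patRoon::importFeaturesXCMS3(xdata, anaInfo)
}
```

Using the generic keeps a script unchanged apart from the `type` value when switching to another import algorithm.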
An object of a class which is derived from features
.
The features
output class and its methods and the algorithm specific functions:
importFeaturesXCMS
, importFeaturesXCMS3
, importFeaturesKPIC2
, importFeaturesEnviMass
findFeatures
to find new features.
Imports features from a project generated by the enviMass package.
importFeaturesEnviMass(analysisInfo, enviProjPath)
analysisInfo |
A |
enviProjPath |
The path of the enviMass project. |
This function imports data from enviMass. This function is called when calling importFeatures
with
type="envimass"
.
An object of a class which is derived from features
.
This functionality has only been tested with older versions of enviMass.
importFeatures
for more details and other algorithms.
Imports feature data generated by the KPIC2 package.
importFeaturesKPIC2(picsList, analysisInfo)
picsList |
A |
analysisInfo |
A |
This function imports data from KPIC2. This function is called when calling importFeatures
with
type="kpic2"
.
An object of a class which is derived from features
.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
importFeatures
for more details and other algorithms.
Imports feature data generated with the legacy xcmsSet
function from the xcms package.
importFeaturesXCMS(xs, analysisInfo)
xs |
An |
analysisInfo |
A |
This function imports data from XCMS. This function is called when calling importFeatures
with
type="xcms"
.
An object of a class which is derived from features
.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
importFeatures
for more details and other algorithms.
Imports feature data from an existing XCMSnExp
object generated by the xcms package.
importFeaturesXCMS3(xdata, analysisInfo)
xdata |
An |
analysisInfo |
A |
This function imports data from XCMS3. This function is called when calling importFeatures
with
type="xcms3"
.
An object of a class which is derived from features
.
Smith, C.A. and Want, E.J. and O'Maille, G. and Abagyan,R. and Siuzdak, G.: XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification, Analytical Chemistry, 78:779-787 (2006)
Ralf Tautenhahn, Christoph Boettcher, Steffen Neumann: Highly sensitive feature detection for high resolution LC/MS BMC Bioinformatics, 9:504 (2008)
H. Paul Benton, Elizabeth J. Want and Timothy M. D. Ebbels Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data Bioinformatics, 26:2488 (2010)
importFeatures
for more details and other algorithms.
Loads, parses, verifies and curates MS library data, e.g. obtained from MassBank.
loadMSLibrary(file, algorithm, ...)
file |
A |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected MS library loading algorithm. |
loadMSLibrary
is a generic function that will load MS library data by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as loadMSLibraryMSP
and loadMSLibraryMoNAJSON
. While these
functions may be called directly, loadMSLibrary
provides a generic interface and is therefore usually preferred.
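A possible way to call the generic is sketched below. The helper names `libraryAlgorithm` and `loadLibrary` are hypothetical, and selecting the algorithm from the file extension is a convention of this sketch only, not something the package does by itself.

```r
# Map a library file name to the algorithm name expected by loadMSLibrary().
libraryAlgorithm <- function(file) {
    switch(tolower(tools::file_ext(file)),
        msp = "msp",
        json = "json",
        stop("unsupported library file: ", file)
    )
}

# Load a library through the generic interface (requires patRoon and a real file).
loadLibrary <- function(file) {
    stopifnot(requireNamespace("patRoon", quietly = TRUE), file.exists(file))
    patRoon::loadMSLibrary(file, libraryAlgorithm(file))
}

libraryAlgorithm("MassBank_NIST.msp")
```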
A MSLibrary
object containing the loaded library data.
The MSLibrary
output class and its methods and the algorithm specific functions:
loadMSLibraryMSP
, loadMSLibraryMoNAJSON
This function loads, verifies and curates MS library data from MoNA ‘.json’ files.
loadMSLibraryMoNAJSON(
  file,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  potAdducts = TRUE,
  potAdductsLib = TRUE,
  absMzDev = 0.002,
  calcSPLASH = TRUE
)
file |
A |
prefCalcChemProps |
If |
neutralChemProps |
If |
potAdducts , potAdductsLib
|
If and how missing adducts (
|
absMzDev |
The maximum absolute m/z deviation when guessing missing adducts. |
calcSPLASH |
If set to |
This function uses an efficient C++
JSON loader to load MS library data. This function is called when calling loadMSLibrary
with
algorithm="json"
.
This function uses C++
with Rcpp and rapidjsonr to efficiently load and parse
JSON files from MoNA. An advantage compared to
loadMSLibraryMSP
is that this function supports loading spectral annotations.
The record field names are converted to those used in ‘.msp’ files.
The loaded data is returned in an MSLibrary
object.
Several strategies are applied to automatically verify and improve
library data. This is important, since library records may have inconsistent or erroneous data, which makes them
unsuitable in automated workflows such as compounds annotation with generateCompoundsLibrary
.
The loaded library data is post-treated as follows:
The DB#
field is renamed to DB_ID
to improve compatibility with R column names.
Synonyms (Synon
fields) are merged together, mainly to save memory usage.
Inconsistently formatted NA
data (e.g. "n/a"
, "N/A"
or empty strings) are set to
regular R NA
values.
The case of record field names is made consistent.
The Formula
and ExactMass
fields are renamed to formula
and neutralMass
,
respectively. This is for consistency with other data generated with patRoon.
character
field data is trimmed from leading/trailing whitespace.
Mass data is verified to be properly numeric, and set to NA
otherwise.
The format of formulae data is made consistent: ionic species (with or without square brackets) are converted to a regular formula format.
Chemical identifiers such as SMILES and formulae are verified and missing values are calculated if possible. See below for more details.
Shortened data in the Ion_mode
field (P/N) is converted to the long format
(POSITIVE
/NEGATIVE
).
Many different adduct flavors typically found as Precursor_type
data are converted and normalized to
the generic textual format used by patRoon (see as.adduct
).
If potAdducts!=FALSE
then missing or invalid adduct data in Precursor_type
is guessed based on
the difference between the neutral and ionic mass. If multiple adducts explain the mass difference the result is
NA
.
Missing ion m/z data (PrecursorMZ
field) is calculated from adduct data, if possible.
Missing SPLASH data is calculated with the splashR package
if calcSPLASH=TRUE
.
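A few of the post-treatment steps listed above can be illustrated with base R. This is a simplified sketch on an invented table of records; it does not reproduce the C++ loader or the actual curation code of the package.

```r
# Hypothetical raw records, loosely modelled on '.msp' style fields.
records <- data.frame(
    Name = c("  Caffeine ", "Atrazine"),
    Formula = c("C8H10N4O2", "n/a"),
    ExactMass = c("194.0804", "N/A"),
    Ion_mode = c("P", "N"),
    stringsAsFactors = FALSE
)

# Trim whitespace and turn inconsistently formatted NA markers into regular NA values.
records[] <- lapply(records, function(x) {
    x <- trimws(x)
    x[x %in% c("n/a", "N/A", "")] <- NA_character_
    x
})

# Mass data must be numeric; anything that cannot be parsed becomes NA.
records$ExactMass <- suppressWarnings(as.numeric(records$ExactMass))

# Expand the shortened polarity codes to the long format.
records$Ion_mode <- unname(c(P = "POSITIVE", N = "NEGATIVE")[records$Ion_mode])

# Rename fields for consistency with other data generated with patRoon.
names(records)[names(records) == "Formula"] <- "formula"
names(records)[names(records) == "ExactMass"] <- "neutralMass"

print(records)
```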
Chemical properties such as SMILES, InChIKey and formula in the MS library are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA
.
If neutralChemProps=TRUE
then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized
option of OpenBabel
). An additional column
molNeutralized
is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE
then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE
.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE
) or whenever possible
(prefCalcChemProps=TRUE
).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
Guessing adducts from neutral/ionic mass differences was inspired by MetFrag.
Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O (2016).
“SPLASH, a hashed identifier for mass spectra.”
Nature Biotechnology, 34(11), 1099–1101.
doi:10.1038/nbt.3689.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016).
“MetFrag relaunched: incorporating strategies beyond in silico fragmentation.”
Journal of Cheminformatics, 8(1).
doi:10.1186/s13321-016-0115-9.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
loadMSLibrary
for more details and other algorithms.
The MSLibrary
documentation for various methods to post-process the data and
generateCompoundsLibrary
for annotation of features with the library data.
This function loads, verifies and curates MS library data from MSP files.
loadMSLibraryMSP(
  file,
  parseComments = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  potAdducts = TRUE,
  potAdductsLib = TRUE,
  absMzDev = 0.002,
  calcSPLASH = TRUE
)
file |
A |
parseComments |
If |
prefCalcChemProps |
If |
neutralChemProps |
If |
potAdducts , potAdductsLib
|
If and how missing adducts (
|
absMzDev |
The maximum absolute m/z deviation when guessing missing adducts. |
calcSPLASH |
If set to |
This function uses an efficient C++
MSP loader to load MS library data. This function is called when calling loadMSLibrary
with
algorithm="msp"
.
This function uses C++
with Rcpp to efficiently load and parse MSP files, and is mainly
optimized for loading the ‘.msp’ files from MassBank EU and
MoNA. Files from other sources may also work, any feedback on this is
welcome!
The loaded data is returned in an MSLibrary
object.
Several strategies are applied to automatically verify and improve
library data. This is important, since library records may have inconsistent or erroneous data, which makes them
unsuitable in automated workflows such as compounds annotation with generateCompoundsLibrary
.
The loaded library data is post-treated as follows:
The DB#
field is renamed to DB_ID
to improve compatibility with R column names.
Synonyms (Synon
fields) are merged together, mainly to save memory usage.
Inconsistently formatted NA
data (e.g. "n/a"
, "N/A"
or empty strings) are set to
regular R NA
values.
The case of record field names is made consistent.
The Formula
and ExactMass
fields are renamed to formula
and neutralMass
,
respectively. This is for consistency with other data generated with patRoon.
character
field data is trimmed from leading/trailing whitespace.
Mass data is verified to be properly numeric, and set to NA
otherwise.
The format of formulae data is made consistent: ionic species (with or without square brackets) are converted to a regular formula format.
Chemical identifiers such as SMILES and formulae are verified and missing values are calculated if possible. See below for more details.
Shortened data in the Ion_mode
field (P/N) is converted to the long format
(POSITIVE
/NEGATIVE
).
Many different adduct flavors typically found as Precursor_type
data are converted and normalized to
the generic textual format used by patRoon (see as.adduct
).
If potAdducts!=FALSE
then missing or invalid adduct data in Precursor_type
is guessed based on
the difference between the neutral and ionic mass. If multiple adducts explain the mass difference the result is
NA
.
Missing ion m/z data (PrecursorMZ
field) is calculated from adduct data, if possible.
Missing SPLASH data is calculated with the splashR package
if calcSPLASH=TRUE
.
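The adduct guessing step in the list above (missing `Precursor_type` data derived from the difference between ionic and neutral mass) can be sketched as follows. The adduct table is a small illustrative subset with commonly used mass shifts, and `guessAdduct` is a hypothetical helper; neither reflects the actual candidate list or code used by the package.

```r
# Mass shifts (Da) of a few common adducts relative to the neutral mass.
# Illustrative subset only.
adductShifts <- c(
    "[M+H]+" = 1.007276,
    "[M+NH4]+" = 18.033823,
    "[M+Na]+" = 22.989218,
    "[M-H]-" = -1.007276
)

guessAdduct <- function(precursorMZ, neutralMass, absMzDev = 0.002) {
    massDiff <- precursorMZ - neutralMass
    hits <- names(adductShifts)[abs(adductShifts - massDiff) <= absMzDev]
    # Unexplained or ambiguous mass differences yield NA
    if (length(hits) == 1) hits else NA_character_
}

guessAdduct(195.087652, 194.080376) # protonated species
guessAdduct(193.073100, 194.080376) # deprotonated species
guessAdduct(200.000000, 194.080376) # no candidate within absMzDev
```

The `absMzDev` argument plays the same role as in the loader functions: a wider tolerance finds more matches but increases the chance that several adducts explain the same mass difference, in which case the result is `NA`.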
Chemical properties such as SMILES, InChIKey and formula in the MS library are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA
.
If neutralChemProps=TRUE
then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized
option of OpenBabel
). An additional column
molNeutralized
is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE
then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE
.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE
) or whenever possible
(prefCalcChemProps=TRUE
).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
The mass spectrum parser currently only supports space separated entries (MSP formally also allows other formats).
Guessing adducts from neutral/ionic mass differences was inspired by MetFrag.
Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O (2016).
“SPLASH, a hashed identifier for mass spectra.”
Nature Biotechnology, 34(11), 1099–1101.
doi:10.1038/nbt.3689.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016).
“MetFrag relaunched: incorporating strategies beyond in silico fragmentation.”
Journal of Cheminformatics, 8(1).
doi:10.1186/s13321-016-0115-9.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
loadMSLibrary
for more details and other algorithms.
The MSLibrary
documentation for various methods to post-process the data and
generateCompoundsLibrary
for annotation of features with the library data.
Perform hierarchical clustering of structure candidates based on chemical similarity and obtain overall structural information based on the maximum common structure (MCS).
makeHCluster(obj, method = "complete", ...)

## S4 method for signature 'compounds'
makeHCluster(
  obj,
  method,
  fpType = "extended",
  fpSimMethod = "tanimoto",
  maxTreeHeight = 1,
  deepSplit = TRUE,
  minModuleSize = 1
)
obj |
The |
method |
The clustering method passed to |
... |
further arguments specified to methods. |
fpType |
The type of structural fingerprint that should be calculated. See the |
fpSimMethod |
The method for calculating similarities (i.e. not dissimilarity!). See the |
maxTreeHeight , deepSplit , minModuleSize
|
Arguments used by
|
Often many possible chemical structure candidates are found for each feature group when performing compound annotation. Therefore, it may be useful to obtain an overview of their general structural properties. One strategy is to perform hierarchical clustering based on their chemical (dis)similarity, for instance, using the Tanimoto score. The resulting clusters can then be characterized by evaluating their maximum common substructure (MCS).
makeHCluster
performs hierarchical clustering of all
structure candidates for each feature group within a
compounds
object. The resulting dendrograms are automatically
cut using the cutreeDynamicTree
function from the
dynamicTreeCut package. The returned
compoundsCluster
object can then be used, for instance, for
plotting dendrograms and MCS structures and manually re-cutting specific
clusters.
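The principle behind this clustering can be sketched with base R on invented binary fingerprints. This is a simplification: real fingerprints are calculated from the candidate structures, and the sketch cuts the tree at a fixed height with `cutree` instead of using `cutreeDynamicTree`.

```r
# Hypothetical 6-bit structural fingerprints for four candidates.
fps <- rbind(
    A = c(1, 1, 1, 0, 0, 0),
    B = c(1, 1, 0, 0, 0, 0),
    C = c(0, 0, 0, 1, 1, 1),
    D = c(0, 0, 0, 0, 1, 1)
)

# Tanimoto similarity: shared bits divided by the bits set in either fingerprint.
shared <- fps %*% t(fps)
bits <- rowSums(fps)
tanimoto <- shared / (outer(bits, bits, "+") - shared)

# Cluster on dissimilarity (1 - similarity) and cut the dendrogram.
hc <- hclust(as.dist(1 - tanimoto), method = "complete")
clusters <- cutree(hc, h = 0.5) # fixed cut height chosen for this sketch

print(clusters)
```

Candidates A and B share most of their bits and end up in one cluster, as do C and D; in a real workflow each cluster would then be summarized by its maximum common substructure.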
makeHCluster
returns a compoundsCluster
object.
The methodology applied here has been largely derived from ‘chemclust.R’ from the metfRag package and the package vignette of rcdk.
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 6(18)
compoundsCluster
Initiate sets workflows from specified feature data.
makeSet(obj, ...)

## S4 method for signature 'features'
makeSet(obj, ..., adducts, labels = NULL)

## S4 method for signature 'featuresSet'
makeSet(obj, ...)

## S4 method for signature 'featureGroups'
makeSet(
  obj,
  ...,
  groupAlgo,
  groupArgs = NULL,
  verbose = TRUE,
  adducts = NULL,
  labels = NULL
)

## S4 method for signature 'featureGroupsSet'
makeSet(obj, ...)
obj , ...
|
|
adducts |
The adduct assignments to each set. Should either be a For the |
labels |
The labels, or set names, for each set to be created. The order should follow that of the
objects given to the |
groupAlgo |
The name of the feature grouping algorithm. See the |
groupArgs |
A |
verbose |
If set to |
The makeSet
method function is used to initiate a sets workflow. The features from
input objects are combined and then neutralized by replacing their m/z values by neutral monoisotopic
masses. After neutralization features measured with e.g. different ionization polarities can be grouped since
their neutral mass will be the same.
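The neutralization idea can be illustrated with a small base R sketch. The `neutralize` helper and the m/z values are hypothetical, and only two adducts are handled; the package itself derives the mass shift from the full adduct definition.

```r
protonMass <- 1.007276 # mass of a proton in Da

# Convert an ion m/z to a neutral monoisotopic mass for two simple adducts.
neutralize <- function(mz, adduct) {
    shift <- c("[M+H]+" = protonMass, "[M-H]-" = -protonMass)[[adduct]]
    mz - shift
}

# The same hypothetical compound measured in both polarities.
massPos <- neutralize(195.087652, "[M+H]+")
massNeg <- neutralize(193.073100, "[M-H]-")

# After neutralization both features have the same mass and can be grouped.
abs(massPos - massNeg) < 0.001
```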
The analysis information for this object is updated with all analyses, and a set
column is added to designate the set of each analysis. Note that currently, all analysis names must be
unique across different sets.
makeSet
supports two types of input:
features
objects: makeSet
combines the input objects into a featuresSet
object,
which is then grouped in the 'usual way' with groupFeatures
.
featureGroups
objects: In this case the features from the input objects are first neutralized and
feature groups between sets are then combined with groupFeatures
.
The advantage of the featureGroups
method is that it preserves any adduct annotations already present
(e.g. as set by selectIons
or adducts<-
). Furthermore, this approach allows more advanced
workflows where the input featureGroups
are first pre-treated with e.g. filter before the sets object
is made. On the other hand, the features
method is easier, as it doesn't require intermediate feature grouping
steps and is often sufficient since adduct annotations can be made afterwards with selectIons
/adducts<-
and most filter
operations do not need to be done per individual set.
The adduct information used for feature neutralization is specified through the adducts
argument.
Alternatively, when the featureGroups
method of makeSet
is used, then the adduct annotations already
present in the input objects can also be used by setting adducts=NULL
. The adduct information is also used to
add adduct annotations to the output of makeSet
.
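A typical call of the features method might look like the sketch below. The wrapper `makePosNegSet` is a hypothetical helper, `fListPos` and `fListNeg` are assumed to be features objects obtained separately for each polarity, and the adducts, labels and grouping algorithm are example choices.

```r
# Sketch only: combine positive and negative mode features into a sets workflow.
makePosNegSet <- function(fListPos, fListNeg) {
    stopifnot(requireNamespace("patRoon", quietly = TRUE))
    # One adduct and one label per input object, in the order of the inputs
    fList <- patRoon::makeSet(fListPos, fListNeg,
        adducts = c("[M+H]+", "[M-H]-"),
        labels = c("positive", "negative")
    )
    # Group the neutralized features; "openms" is just one possible algorithm
    patRoon::groupFeatures(fList, "openms")
}
```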
Either a featuresSet
object (features
method) or featureGroupsSet
object
(featureGroups
method).
Initiating a sets workflow recursively, i.e. with featuresSet
or featureGroupsSet
objects
as input, is currently not supported.
The newProject
function is used to quickly generate a processing R script. This tool allows the user to
quickly select the targeted analyses and workflow steps and to configure some of their common parameters. This function
must be run within an RStudio session. The resulting script is either added to
the current open file or to a new file. The analysis information will be written to a
‘.csv’ file so that it can easily be modified afterwards.
newProject(destPath = NULL)
destPath |
Set destination path value to this value (useful for debugging). Set to |
Objects from this class contain optimization results resulting from design of experiment (DoE).
optimizedParameters(object, paramSet = NULL, DoEIteration = NULL)

optimizedObject(object, paramSet = NULL)

scores(object, paramSet = NULL, DoEIteration = NULL)

experimentInfo(object, paramSet, DoEIteration)

## S4 method for signature 'optimizationResult'
algorithm(obj)

## S4 method for signature 'optimizationResult'
length(x)

## S4 method for signature 'optimizationResult'
lengths(x, use.names = FALSE)

## S4 method for signature 'optimizationResult'
show(object)

## S4 method for signature 'optimizationResult,missing'
plot(
  x,
  paramSet,
  DoEIteration,
  paramsToPlot = NULL,
  maxCols = NULL,
  type = "contour",
  image = TRUE,
  contours = "colors",
  ...
)

## S4 method for signature 'optimizationResult'
optimizedParameters(object, paramSet = NULL, DoEIteration = NULL)

## S4 method for signature 'optimizationResult'
optimizedObject(object, paramSet = NULL)

## S4 method for signature 'optimizationResult'
scores(object, paramSet = NULL, DoEIteration = NULL)

## S4 method for signature 'optimizationResult'
experimentInfo(object, paramSet, DoEIteration)
paramSet |
Numeric index of the parameter set (i.e. the first
parameter set gets index ‘1’). For some methods optional: if
|
DoEIteration |
Numeric index specifying the DoE iteration within the
specified |
obj , x , object
|
An |
use.names |
Ignored. |
paramsToPlot |
Which parameter relations should be plotted. If |
maxCols |
Multiple parameter pairs are plotted in a grid. The maximum
number of columns can be set with this argument. Set to |
type |
The type of plots to be generated: |
image |
Passed to |
contours |
Passed to |
... |
Further arguments passed to |
Objects from this class are returned by optimizeFeatureFinding
and
optimizeFeatureGrouping
.
algorithm(optimizationResult)
: Returns the algorithm that was used for finding features.
length(optimizationResult)
: Obtain total number of experimental design iterations performed.
lengths(optimizationResult)
: Obtain number of experimental design iterations performed for each parameter set.
show(optimizationResult)
: Shows summary information for this object.
plot(x = optimizationResult, y = missing)
: Generates response plots for all or a selected
set of parameters.
optimizedParameters(optimizationResult)
: Returns parameter set yielding optimal
results. The paramSet
and DoEIteration
arguments can be
NULL
.
optimizedObject(optimizationResult)
: Returns the object (i.e. a
features
or featureGroups
object) that was
generated with optimized parameters. The paramSet
argument can be
NULL
.
scores(optimizationResult)
: Returns optimization scores. The
paramSet
and DoEIteration
arguments can be NULL
.
experimentInfo(optimizationResult)
: Returns a list
with optimization
information from a DoE iteration.
algorithm
A character specifying the algorithm that was optimized.
paramSets
A list
with detailed results from each parameter set
that was tested.
bestParamSet
Numeric index of the parameter set yielding the best response.
## Not run:
# ftOpt is an optimization object.

# plot contour of all parameter pairs from the first parameter set/iteration.
plot(ftOpt, paramSet = 1, DoEIteration = 1)

# as above, but only plot two parameter pairs
plot(ftOpt, paramSet = 1, DoEIteration = 1,
     paramsToPlot = list(c("mzPPM", "chromFWHM"), c("chromFWHM", "chromSNR")))

# plot 3d perspective plots
plot(ftOpt, paramSet = 1, DoEIteration = 1, type = "persp")

## End(Not run)
Holds information for all TPs for a set of parents.
parents(TPs)

products(TPs)

## S4 method for signature 'transformationProducts'
parents(TPs)

## S4 method for signature 'transformationProducts'
products(TPs)

## S4 method for signature 'transformationProducts'
length(x)

## S4 method for signature 'transformationProducts'
names(x)

## S4 method for signature 'transformationProducts'
show(object)

## S4 method for signature 'transformationProducts,ANY,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'transformationProducts,ANY,missing'
x[[i, j]]

## S4 method for signature 'transformationProducts'
x$name

## S4 method for signature 'transformationProducts'
as.data.table(x)

## S4 method for signature 'transformationProducts'
convertToSuspects(obj, includeParents = FALSE)

## S4 method for signature 'transformationProducts'
delete(obj, i = NULL, j = NULL, ...)

## S4 method for signature 'transformationProducts'
filter(obj, properties = NULL, verbose = TRUE, negate = FALSE)
TPs , x , obj , object
|
|
i , j
|
For |
... |
For |
drop |
ignored. |
name |
The parent name (partially matched). |
includeParents |
If |
properties |
A named |
verbose |
If set to |
negate |
If |
This class holds all generated data for transformation products for a set of parents. The class is virtual
and
derived objects are created by TP generators.
The TP data in objects from this class include a retDir
column. These are numeric
values that hint what
the chromatographic retention order of a TP might be compared to its parent: a value of ‘-1’ means it will
elute earlier, ‘1’ it will elute later and ‘0’ that there is no significant difference or the direction is
unknown. These values are based on a typical reversed phase separation. When structural information is available
(e.g. when generateTPsBioTransformer
or generateTPsLibrary
was used to generate
the data), the retDir
values are based on calculated log P
values of the parent and its TPs.
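The relation between log P values and the retDir hint can be sketched in a few lines of base R. This is only an illustration of the idea described above, not patRoon's implementation: the helper name retDirFromLogP and its tolerance argument are hypothetical, and the sketch assumes that a higher log P implies later elution on a reversed phase column.

```r
# Hypothetical helper: derive a retention direction hint from log P values.
# The 'tolerance' argument is an assumption and not a patRoon parameter.
retDirFromLogP <- function(parentLogP, tpLogP, tolerance = 0.5) {
    diff <- tpLogP - parentLogP
    # Differences within the tolerance count as "no significant difference";
    # unknown (NA) log P values also map to 0.
    ifelse(is.na(diff) | abs(diff) <= tolerance, 0, sign(diff))
}

# One parent (log P 2.5) and four TPs: more polar, similar, less polar, unknown
retDirFromLogP(parentLogP = 2.5, tpLogP = c(1.0, 2.6, 4.0, NA))
```

The four TPs in this example yield the hints -1, 0, 1 and 0, respectively.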
delete
returns the object for which the specified data was removed.
filter
returns a filtered transformationProducts
object.
parents(transformationProducts)
: Accessor method for the parents
slot of a
transformationProducts
class.
products(transformationProducts)
: Accessor method for the products
slot.
length(transformationProducts)
: Obtain total number of transformation products.
names(transformationProducts)
: Obtain the names of all parents in this object.
show(transformationProducts)
: Show summary information for this object.
x[i
: Subset on parents.
x[[i
: Extracts a table with TPs for a parent.
$
: Extracts a table with TPs for a parent.
as.data.table(transformationProducts)
: Returns all TP data in a table.
convertToSuspects(transformationProducts)
: Converts this object to a suspect list that can be used as input for
screenSuspects
.
delete(transformationProducts)
: Completely deletes specified transformation product data.
filter(transformationProducts)
: Performs rule-based filtering. Useful to simplify and clean-up the data.
parents
A data.table
with metadata for all parents that have TPs in this object. Use the
parents
method for access.
products
A list
with data.table
entries with TP information for each parent. Use the
products
method for access.
The derived transformationProductsStructure
class for more methods and
generateTPs
Contains all MS (and MS/MS where available) peak lists for a featureGroups
object.
peakLists(obj, ...)

averagedPeakLists(obj, ...)

spectrumSimilarity(obj, ...)

## S4 method for signature 'MSPeakLists'
peakLists(obj)

## S4 method for signature 'MSPeakLists'
averagedPeakLists(obj)

## S4 method for signature 'MSPeakLists'
analyses(obj)

## S4 method for signature 'MSPeakLists'
groupNames(obj)

## S4 method for signature 'MSPeakLists'
length(x)

## S4 method for signature 'MSPeakLists'
show(object)

## S4 method for signature 'MSPeakLists,ANY,ANY,missing'
x[i, j, ..., reAverage = FALSE, drop = TRUE]

## S4 method for signature 'MSPeakLists,ANY,ANY'
x[[i, j]]

## S4 method for signature 'MSPeakLists'
x$name

## S4 method for signature 'MSPeakLists'
as.data.table(x, fGroups = NULL, averaged = TRUE)

## S4 method for signature 'MSPeakLists'
delete(obj, i = NULL, j = NULL, k = NULL, reAverage = FALSE, ...)

## S4 method for signature 'MSPeakLists'
filter(
  obj,
  absMSIntThr = NULL,
  absMSMSIntThr = NULL,
  relMSIntThr = NULL,
  relMSMSIntThr = NULL,
  topMSPeaks = NULL,
  topMSMSPeaks = NULL,
  minMSMSPeaks = NULL,
  isolatePrec = NULL,
  deIsotopeMS = FALSE,
  deIsotopeMSMS = FALSE,
  withMSMS = FALSE,
  annotatedBy = NULL,
  retainPrecursorMSMS = TRUE,
  reAverage = FALSE,
  negate = FALSE
)

## S4 method for signature 'MSPeakLists'
plotSpectrum(
  obj,
  groupName,
  analysis = NULL,
  MSLevel = 1,
  title = NULL,
  specSimParams = getDefSpecSimParams(),
  xlim = NULL,
  ylim = NULL,
  ...
)

## S4 method for signature 'MSPeakLists'
spectrumSimilarity(
  obj,
  groupName1,
  groupName2 = NULL,
  analysis1 = NULL,
  analysis2 = NULL,
  MSLevel = 1,
  specSimParams = getDefSpecSimParams(),
  NAToZero = FALSE,
  drop = TRUE
)

## S4 method for signature 'MSPeakListsSet'
analysisInfo(obj)

## S4 method for signature 'MSPeakListsSet'
show(object)

## S4 method for signature 'MSPeakListsSet,ANY,ANY,missing'
x[i, j, ..., reAverage = FALSE, sets = NULL, drop = TRUE]

## S4 method for signature 'MSPeakListsSet'
as.data.table(x, fGroups = NULL, averaged = TRUE)

## S4 method for signature 'MSPeakListsSet'
delete(obj, ...)

## S4 method for signature 'MSPeakListsSet'
filter(
  obj,
  ...,
  annotatedBy = NULL,
  retainPrecursorMSMS = TRUE,
  reAverage = FALSE,
  negate = FALSE,
  sets = NULL
)

## S4 method for signature 'MSPeakListsSet'
plotSpectrum(
  obj,
  groupName,
  analysis = NULL,
  MSLevel = 1,
  title = NULL,
  specSimParams = getDefSpecSimParams(),
  xlim = NULL,
  ylim = NULL,
  perSet = TRUE,
  mirror = TRUE,
  ...
)

## S4 method for signature 'MSPeakListsSet'
spectrumSimilarity(
  obj,
  groupName1,
  groupName2 = NULL,
  analysis1 = NULL,
  analysis2 = NULL,
  MSLevel = 1,
  specSimParams = getDefSpecSimParams(),
  NAToZero = FALSE,
  drop = TRUE
)

## S4 method for signature 'MSPeakListsSet'
unset(obj, set)

getDefIsolatePrecParams(...)
obj , x , object
|
The |
... |
Further arguments passed to For sets workflow methods: further arguments passed to the base |
i , j
|
For |
reAverage |
Set to |
drop |
If set to |
name |
The feature group name (partially matched). |
fGroups |
The |
averaged |
If |
k |
A vector with analyses ( |
absMSIntThr , absMSMSIntThr , relMSIntThr , relMSMSIntThr
|
Absolute/relative intensity threshold for MS or MS/MS peak
lists. |
topMSPeaks , topMSMSPeaks
|
Only consider this amount of MS or MS/MS peaks with highest intensity. |
minMSMSPeaks |
If the number of peaks in an MS/MS peak list (excluding the precursor peak) is lower
than this it will be completely removed. Set to |
isolatePrec |
If not |
deIsotopeMS , deIsotopeMSMS
|
Remove any isotopic peaks in MS or MS/MS peak lists. This may improve data
processing steps which do not assume the presence of isotopic peaks (e.g. MetFrag for MS/MS). Note that
|
withMSMS |
If set to |
annotatedBy |
Either a |
retainPrecursorMSMS |
If |
negate |
If |
groupName |
The name of the feature group for which a plot should be made. To compare spectra, two group names can be specified. |
analysis |
The name of the analysis for which a plot should be made. If |
MSLevel |
The MS level: ‘1’ for regular MS, ‘2’ for MSMS. |
title |
The title of the plot. If |
specSimParams |
A named |
xlim , ylim
|
Sets the plot size limits used by
|
groupName1 , groupName2
|
The names of the feature groups for which the comparison should be made. If both
arguments are specified then a comparison is made with the spectra specified by |
analysis1 , analysis2
|
The name of the analysis (analyses) for the comparison. If |
NAToZero |
Set to |
sets |
(sets workflow) A |
perSet , mirror
|
(sets workflow) If |
set |
(sets workflow) The name of the set. |
Objects for this class are returned by generateMSPeakLists
.
The getDefIsolatePrecParams
is used to create a parameter
list for isolating the precursor and its isotopes (see Isolating precursor data
).
peakLists
returns a nested list containing MS (and MS/MS where
available) peak lists per feature group and per analysis. The format is:
[[analysis]][[featureGroupName]][[MSType]][[PeakLists]]
where
MSType
is either "MS"
or "MSMS"
and PeakLists
a
data.table
containing all m/z values (mz
column) and their intensities (intensity
column). In addition, the
peak list tables may contain a cmp
column which contains a unique
alphabetical identifier to which isotopic cluster (or "compound") a mass
belongs (only supported by MS peak lists generated by Bruker tools at the
moment).
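To make the nesting order concrete, the following base R sketch builds a small list with the same layout. The analysis name, feature group name and all peak values are made up for illustration, and plain data.frame objects stand in for the data.table objects that the real peakLists output contains.

```r
# Illustrative peak list tables (made-up values). patRoon stores data.table
# objects; plain data.frames keep this sketch free of dependencies.
msPL <- data.frame(mz = c(100.0505, 101.0538), intensity = c(5e5, 3e4))
msmsPL <- data.frame(mz = c(55.0180, 72.0444, 100.0505),
                     intensity = c(2e4, 8e4, 1e4))

# Nesting order: [[analysis]][[featureGroupName]][[MSType]]
pl <- list(
    "sample-1" = list(
        "M100_R120_1" = list(MS = msPL, MSMS = msmsPL)
    )
)

# m/z values of the MS/MS peak list of one feature group in one analysis
pl[["sample-1"]][["M100_R120_1"]][["MSMS"]]$mz
```

With a real MSPeakLists object the same indexing pattern would be applied to the result of peakLists(obj).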
averagedPeakLists
returns a nested list of feature group
averaged peak lists in a similar format as peakLists
.
delete
returns the object for which the specified data was removed.
peakLists(MSPeakLists)
: Accessor method to obtain the MS peak lists.
averagedPeakLists(MSPeakLists)
: Accessor method to obtain the feature group averaged
MS peak lists.
analyses(MSPeakLists)
: returns a character
vector with the names of the
analyses for which data is present in this object.
groupNames(MSPeakLists)
: returns a character
vector with the names of the
feature groups for which data is present in this object.
length(MSPeakLists)
: Obtain total number of m/z values.
show(MSPeakLists)
: Shows summary information for this object.
x[i
: Subset on analyses/feature groups.
x[[i
: Extract a list with MS and MS/MS (if available) peak
lists. If the second argument (j
) is not specified the averaged peak
lists for the group specified by the first argument (i
) will be
returned.
$
: Extract group averaged MS peak lists for a feature group.
as.data.table(MSPeakLists)
: Returns all MS peak list data in a table.
delete(MSPeakLists)
: Completely deletes specified peaks from MS peak lists.
filter(MSPeakLists)
: provides post filtering of generated MS peak lists, which may further enhance quality of
subsequent workflow steps (e.g. formulae calculation and compounds identification) and/or speed up these
processes. The filters are applied to all peak lists for each analysis. These peak lists are subsequently averaged
to update group averaged peak lists. However, since version ‘1.1’, the resulting feature group lists are
not filtered afterwards.
plotSpectrum(MSPeakLists)
: Plots a spectrum using MS or MS/MS peak lists for a given feature group. Two spectra can be
compared when two feature groups are specified.
spectrumSimilarity(MSPeakLists)
: Calculates the spectral similarity between two or more spectra.
peakLists
Contains a list of all MS (and MS/MS) peak lists. Use the peakLists
method for access.
metadata
Metadata for all spectra used to generate peak lists. Follows the format of the peakLists
slot.
averagedPeakLists
A list
with averaged MS (and MS/MS) peak lists for each feature group.
avgPeakListArgs
A list
with arguments used to generate feature group averaged MS(/MS) peak lists.
origFGNames
A character
with the original input feature group names.
analysisInfo
(sets workflow) Analysis information. Use the analysisInfo
method
for access.
Formula calculation typically relies on evaluating the measured isotopic pattern
from the precursor to score candidates. Some algorithms (currently only GenForm
) penalize candidates if
mass peaks are present in MS1 spectra that do not contribute to the isotopic pattern. Since these spectra are
typically very 'noisy' due to background and co-eluting ions, an additional filtering step may be recommended prior
to formula calculation. During this precursor isolation step all mass peaks are removed that are (1) not the
precursor and (2) not likely to be an isotopologue of the precursor. To determine potential isotopic peaks the
following parameters are used:
maxIsotopes
The maximum number of isotopes to consider. For instance, a value of ‘5’ means that
M+0
(i.e. the monoisotopic peak) till M+5
is considered. All mass peaks outside this range are
removed.
mzDefectRange
A two-sized vector
specifying the minimum (can be negative) and maximum
m/z defect deviation compared to the precursor m/z defect. When chlorinated, brominated or other
compounds with strong m/z defect in their isotopologues are to be considered a higher range may be desired.
On the other hand, for natural compounds this range may be tightened. Note that the search range is propagated with
increasing distance from the precursor, e.g. the search range is doubled for M+2
, tripled for
M+3
etc.
intRange
A two-sized vector
specifying the minimum and maximum relative intensity range
compared to the precursor. For instance, c(0.001, 2)
removes all peaks that have an intensity below 0.1% or
above 200% of that of the precursor.
z
The z
value (i.e. absolute charge) to be considered. For instance, a value of 2
would look for M+0.5
, M+1
etc. Note that the mzDefectRange
is adjusted accordingly
(e.g. halved if z=2
).
maxGap
The maximum number of missing adjacent isotopic peaks ('gaps'). If the (rounded) m/z
difference to the previous peak exceeds this value then this and all next peaks will be removed. Similar to
z
, the maximum gap is automatically adjusted for charge
.
These parameters should be in a list
that is passed to the isolatePrec
argument to filter
. The
default values can be obtained with the getDefIsolatePrecParams
function:
maxIsotopes=5
; mzDefectRange=c(-0.01, 0.01)
; intRange=c(0.001, 2)
; z=1
; maxGap=2
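The effect of these parameters can be illustrated with a simplified base R sketch. It is not patRoon's algorithm: the function isolatePrecSketch is hypothetical, the maxGap check is left out for brevity, the way the isotope index and m/z defect deviation are derived is an assumption, and the peak list is made up. The default values are copied from the list above.

```r
# Default values as listed above, stored in a plain list
defParams <- list(maxIsotopes = 5, mzDefectRange = c(-0.01, 0.01),
                  intRange = c(0.001, 2), z = 1, maxGap = 2)
# Override one default with base R (maxGap is not used in this sketch)
params <- modifyList(defParams, list(maxIsotopes = 3))

# Hypothetical, simplified precursor isolation for a single peak list
isolatePrecSketch <- function(peaks, precMZ, params) {
    precInt <- peaks$intensity[which.min(abs(peaks$mz - precMZ))]
    relInt <- peaks$intensity / precInt
    # Isotope index: rounded m/z distance to the precursor, charge corrected
    isoInd <- round((peaks$mz - precMZ) * params$z)
    # Deviation of the m/z defect from that of the precursor
    defDev <- (peaks$mz - precMZ) - isoInd / params$z
    # Allowed deviation scales with the isotope index and is divided by z
    scale <- pmax(isoInd, 1) / params$z
    keep <- isoInd >= 0 & isoInd <= params$maxIsotopes &
        relInt >= params$intRange[1] & relInt <= params$intRange[2] &
        defDev >= params$mzDefectRange[1] * scale &
        defDev <= params$mzDefectRange[2] * scale
    peaks[keep, , drop = FALSE]
}

peaks <- data.frame(
    mz = c(150.0500, 300.1000, 301.1034, 302.2000, 303.1050, 310.1000),
    intensity = c(400, 1000, 150, 50, 0.5, 100)
)
isolated <- isolatePrecSketch(peaks, precMZ = 300.1, params = params)
isolated
```

In this example only the precursor (m/z 300.1000) and its M+1 peak (m/z 301.1034) remain: the other peaks lie outside the isotope range, deviate too much in m/z defect or fall below the relative intensity limit.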
spectrumSimilarity
: The principles of spectral binning and cosine similarity calculations
were loosely based on the code from the SpectrumSimilarity()
function of OrgMassSpecR.
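The general principle of binning followed by a cosine similarity calculation can be sketched in base R as follows. This is a minimal illustration under simplifying assumptions (fixed-width bins, summed intensities per bin, no intensity or m/z weighting) and is neither the patRoon nor the OrgMassSpecR implementation. The function name cosineSimSketch, the binWidth argument and the spectra are hypothetical.

```r
# Hypothetical sketch: cosine similarity of two binned spectra
cosineSimSketch <- function(spec1, spec2, binWidth = 0.01) {
    # Assign each peak to a bin by rounding its m/z to the bin width
    b1 <- round(spec1$mz / binWidth)
    b2 <- round(spec2$mz / binWidth)
    bins <- sort(union(b1, b2))
    # Summed intensity per bin; bins without peaks get zero intensity
    v1 <- vapply(bins, function(b) sum(spec1$intensity[b1 == b]), numeric(1))
    v2 <- vapply(bins, function(b) sum(spec2$intensity[b2 == b]), numeric(1))
    sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
}

spec1 <- data.frame(mz = c(100, 150, 200), intensity = c(10, 50, 100))
spec2 <- data.frame(mz = c(100.004, 150, 250), intensity = c(10, 50, 100))

cosineSimSketch(spec1, spec1) # identical spectra
cosineSimSketch(spec1, spec2) # two shared bins, one unique peak each
```

Identical spectra give a similarity of one, while spectra without any shared bins give zero. A wider bin width tolerates larger m/z deviations at the cost of merging unrelated peaks.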
The MSPeakListsSet
class is applicable for sets workflows. This class is derived from MSPeakLists
and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet
.
unset
Converts the object data for a specified set into a 'non-set' object (MSPeakListsUnset
), which allows it to be used in 'regular' workflows. Only the MS peaks that are present in the specified set are kept.
analysisInfo
Returns the analysis info for this object.
The following methods are changed or have new functionality:
filter
and the subset operator ([
) Can be used to select data that is only present for selected
sets. The filter
method is applied for each set individually, and afterwards the results are combined again
(see generateMSPeakLists
). Note that this has important implications for e.g. relative
intensity filters (relMSIntThr
/relMSMSIntThr
), topMSPeaks
/topMSMSPeaks
and
minMSMSPeaks
. Similarly, when the annotatedBy
filter is applied, each set specific MS peak list is
filtered by the annotation results from only that set.
plotSpectrum
Is able to highlight set specific mass peaks (perSet
and mirror
arguments).
spectrumSimilarity
First calculates similarities for each spectral pair per set (e.g. all
positive mode spectra are compared and then all negative mode spectra are compared). This data is then combined
into an overall similarity value. How this combination is performed depends on the setCombineMethod
field of
the specSimParams
argument.
For spectrumSimilarity
: major contributions by Bas van de Velde for spectral binning and similarity
calculation.
This class is derived from componentsClust
and is used to store hierarchical clustering information
from intensity profiles of feature groups.
plotHeatMap(obj, ...)

## S4 method for signature 'componentsIntClust'
plotHeatMap(
  obj,
  interactive = FALSE,
  col = NULL,
  margins = c(6, 2),
  cexCol = 1,
  ...
)

## S4 method for signature 'componentsIntClust'
plotInt(
  obj,
  index,
  pch = 20,
  type = "b",
  lty = 3,
  col = NULL,
  plotArgs = NULL,
  linesArgs = NULL
)
obj |
A |
... |
Further options passed to |
interactive |
If |
col |
The colour used for plotting. Set to |
margins , cexCol
|
Passed to |
index |
Numeric component/cluster index or component name. |
pch , type , lty
|
Passed to |
plotArgs , linesArgs
|
A |
Objects from this class are generated by generateComponentsIntClust
plotHeatMap
returns the same as heatmap.2
or
heatmaply
.
plotHeatMap(componentsIntClust)
: draws a heatmap using the
heatmap.2
or heatmaply
function.
plotInt(componentsIntClust)
: makes a plot for all (normalized) intensity
profiles of the feature groups within a given cluster.
clusterm
Numeric matrix with normalized feature group intensities that was used for clustering.
When the object is altered (e.g. by filtering or subsetting it), methods that need the original clustered data such as plotting methods do not work anymore and stop with an error.
componentsClust
for other relevant methods and generateComponents
Parameters that are used by method functions such as as.data.table
to
aggregate predicted concentrations or toxicities.
getDefPredAggrParams(all = mean, ...)
all |
The default aggregation function for all types, e.g. |
... |
optional named arguments that override defaults. |
Multiple concentration or toxicity values may be assigned to a single feature group. To ease interpretation and data handling, several functions aggregate these values prior to their use. Aggregation is performed by the following data:
The candidate (i.e. suspect or annotation candidate). This is mainly relevant for sets workflows, where calculations among sets may yield different results for the same candidate.
The prediction type, e.g. all values that were obtained from suspect or compound annotation data.
The feature group.
The aggregation of all data first occurs by the same candidate/type/feature group, then the same type/feature group and finally for each feature group. This ensures that e.g. large numbers of data points for a prediction type do not bias results.
The candidateFunc
, typeFunc
and groupFunc
parameters specify the function that should be used to
aggregate data. Commonly, functions such as mean
, min
or max
can be used here.
Note that the function does not need to handle NA
values, as these are removed in advance.
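The staged aggregation can be illustrated with base R. The sketch below is not the patRoon implementation: the function aggregateSketch, the column names of the preds table and all values are hypothetical, and the preferType selection is not covered.

```r
# Made-up predicted concentrations: one row per predicted value
preds <- data.frame(
    group = c("M120", "M120", "M120", "M120", "M120", "M250", "M250"),
    type = c("suspect", "suspect", "suspect", "compound", "compound",
             "compound", "compound"),
    candidate = c("A", "A", "A", "B", "C", "D", "D"),
    conc = c(1, 2, 3, 10, 30, 5, NA)
)

# Hypothetical sketch of the three aggregation stages
aggregateSketch <- function(preds, candidateFunc = mean, typeFunc = mean,
                            groupFunc = mean) {
    preds <- preds[!is.na(preds$conc), ] # NA values are removed in advance
    # 1. same candidate/type/feature group
    byCand <- aggregate(conc ~ group + type + candidate, data = preds,
                        FUN = candidateFunc)
    # 2. same type/feature group
    byType <- aggregate(conc ~ group + type, data = byCand, FUN = typeFunc)
    # 3. per feature group
    aggregate(conc ~ group, data = byType, FUN = groupFunc)
}

res <- aggregateSketch(preds)
res
```

For group M120 the staged result is 11 (candidate A averages to 2, the compound type to 20, and the mean of 2 and 20 is 11), whereas a plain mean over all five values would give 9.2. This shows how the staging prevents the three values of candidate A from dominating the result.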
The preferType
parameter specifies the preferred prediction type. Any values from other prediction
types will be ignored unless the preferred type is not available for a feature group. Valid values are
"suspect"
(the default), "compound"
(results from compound annotation by SMILES),
"SIRIUS_FP"
(results from formula/compound annotation with SIRIUS+CSI:FingerID
) or "none"
.
These parameters should be stored inside a list
. The getDefPredAggrParams
function can be used to
generate such a parameter list with defaults.
Functions to predict response factors and feature concentrations from SMILES and/or
SIRIUS+CSI:FingerID
fingerprints using the MS2Quant package.
## S4 method for signature 'featureGroups'
calculateConcs(fGroups, featureAnn, areas = FALSE)

## S4 method for signature 'featureGroupsSet'
calculateConcs(fGroups, featureAnn, areas = FALSE)

## S4 method for signature 'compounds'
predictRespFactors(
  obj,
  fGroups,
  calibrants,
  eluent,
  organicModifier,
  pHAq,
  concUnit = "ugL",
  calibConcUnit = concUnit,
  updateScore = FALSE,
  scoreWeight = 1,
  parallel = TRUE
)

## S4 method for signature 'featureGroupsScreening'
predictRespFactors(
  obj,
  calibrants,
  eluent,
  organicModifier,
  pHAq,
  concUnit = "ugL",
  calibConcUnit = concUnit
)

## S4 method for signature 'featureGroupsScreening'
calculateConcs(fGroups, featureAnn = NULL, areas = FALSE)

## S4 method for signature 'featureGroupsScreeningSet'
predictRespFactors(obj, calibrants, ...)

## S4 method for signature 'featureGroupsScreeningSet'
calculateConcs(fGroups, featureAnn = NULL, areas = FALSE)

## S4 method for signature 'compoundsSet'
predictRespFactors(obj, fGroups, calibrants, ...)

## S4 method for signature 'compoundsSIRIUS'
predictRespFactors(
  obj,
  fGroups,
  calibrants,
  eluent,
  organicModifier,
  pHAq,
  concUnit = "ugL",
  calibConcUnit = concUnit,
  type = "FP"
)

## S4 method for signature 'formulasSet'
predictRespFactors(obj, fGroups, calibrants, ...)

## S4 method for signature 'formulasSIRIUS'
predictRespFactors(
  obj,
  fGroups,
  calibrants,
  eluent,
  organicModifier,
  pHAq,
  concUnit = "ugL",
  calibConcUnit = concUnit
)

getQuantCalibFromScreening(fGroups, concs, areas = FALSE, average = FALSE)
fGroups |
For For For |
featureAnn |
A |
areas |
Set to |
obj |
The workflow object for which predictions should be performed, e.g. feature groups with screening
results ( |
calibrants |
A (sets workflow) Should be a |
eluent |
A |
organicModifier |
The organic modifier of the mobile phase: either |
pHAq |
The pH of the aqueous part of the mobile phase. |
concUnit |
The concentration unit for calculated concentrations. Can be molar based ( |
calibConcUnit |
The concentration unit used in the calibrants table. For possible values see the |
updateScore , scoreWeight
|
If |
parallel |
If set to |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
type |
Which types of predictions should be performed: should be |
concs |
A |
average |
Set to |
The MS2Quant R package predicts concentrations from SMILES
and/or MS/MS fingerprints obtained with SIRIUS+CSI:FingerID
. The predictRespFactors
method functions
interface with this package to calculate response factors, which can then be used to calculate feature concentrations
with the calculateConcs
method function.
predictRespFactors
returns an object amended with response factors (RF_SMILES
/LRF_SIRFP
columns).
calculateConcs
returns a featureGroups
based object amended with concentrations for each
feature group (accessed with the concentrations
method).
The MS2Quant package requires calibration to convert predicted ionization efficiencies to
instrument/method specific response factors. The calibration data should be specified with the calibrants
argument to predictRespFactors
. This should be a data.frame
with intensity observations at different
concentrations for a set of calibrants. Each row specifies one intensity observation at one concentration. The
table should have the following columns:
name
The name of the calibrant. Can be freely chosen.
SMILES
The SMILES of the calibrant.
rt
The retention time of the calibrant (in seconds).
intensity
The peak intensity (or area, see the areas
argument) of the calibrant.
conc
The concentration of the calibrant (see the calibConcUnit
argument for specifying the unit).
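A calibrants table with these columns could be assembled as in the following base R sketch. All retention times, intensities and concentrations are made-up placeholders, and only two calibrants are included to keep the example short.

```r
# Made-up calibration series; every number here is a placeholder
concLevels <- c(1, 5, 10, 50, 100) # in the unit given by calibConcUnit

calibrants <- rbind(
    data.frame(name = "caffeine",
               SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
               rt = 312, # seconds
               intensity = c(1.1e4, 5.6e4, 1.2e5, 5.8e5, 1.1e6),
               conc = concLevels),
    data.frame(name = "atrazine",
               SMILES = "CCNC1=NC(=NC(=N1)Cl)NC(C)C",
               rt = 655, # seconds
               intensity = c(2.3e4, 1.2e5, 2.5e5, 1.2e6, 2.4e6),
               conc = concLevels)
)

# One row per intensity observation: 2 calibrants x 5 concentrations
nrow(calibrants)
head(calibrants, 3)
```

The name, SMILES and rt values are repeated on every row of a calibrant, so that each row describes exactly one intensity observation at one concentration.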
It is recommended to include multiple calibrants (e.g. ‘>=10’) at multiple concentrations (e.g.
‘>=5’). The latter is achieved by adding multiple rows for the same calibrant (keeping the
name
/SMILES
/rt
columns constant). It is also possible to follow the column naming used by
MS2Quant (however retention times should still be in seconds!). For more details and tips see
https://github.com/kruvelab/MS2Quant.
The getQuantCalibFromScreening
function can be used to automatically generate a calibrants table from a
feature groups object with suspect screening results. Here, the idea is to perform a screening with
screenSuspects
with a suspect list that contains the calibrants, which is then used to construct the
calibrant table. It is highly recommended to add retention times for the calibrants in the suspect list to ensure
the calibrant is assigned to the correct feature. Furthermore, it is possible to simply add the calibrants to the
'regular' suspect list in case a suspect screening was already part of the workflow. The
getQuantCalibFromScreening
function still requires you to specify concentration data, which is achieved via
the concs
argument. This should be a data.frame
with a column name
corresponding to the
calibrant name (i.e. same as used by screenSuspects
above) and columns with concentration data. The
latter columns specify the concentrations of a calibrant in different replicate groups (as defined in the
analysis information). The concentration columns should be named after the
corresponding replicate group. Only those replicate groups that should be used for calibration need to be included.
Furthermore, NA
values can be used if a replicate group should be ignored for a specific calibrant.
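As a sketch of the expected layout, the concs table below uses hypothetical replicate group names ("cal-1", "cal-10" and "cal-100") and made-up concentrations. In a real workflow the names must match the replicate groups in the analysis information and the calibrant names used in the suspect list.

```r
# Hypothetical concentration table: one row per calibrant, one column per
# replicate group used for calibration (names and values are made up)
concs <- data.frame(
    name = c("caffeine", "atrazine"),
    "cal-1" = c(1, 1),
    "cal-10" = c(10, 10),
    "cal-100" = c(100, NA), # NA: ignore this replicate group for atrazine
    check.names = FALSE     # keep the replicate group names unchanged
)
concs

# With screening results in 'fGroups' (not created in this sketch), the
# calibrants table would then be obtained with:
# calibrants <- getQuantCalibFromScreening(fGroups, concs)
```

Setting check.names = FALSE matters here because data.frame would otherwise convert names such as "cal-1" to "cal.1", which would no longer match the replicate group names.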
The response factors are predicted with the predictRespFactors
generic function,
which accepts the following input:
Suspect screening results. The SMILES data is used to predict response factors for suspect hits.
Formula annotation data obtained with the "sirius"
algorithm (generateFormulasSIRIUS
). The
predictions are performed for each formula candidate using SIRIUS+CSI:FingerID
fingerprints. For this
reason, the getFingerprint
argument must be set to TRUE
when generating the formula data.
Compound annotation data obtained with the "sirius"
algorithm (generateCompoundsSIRIUS
).
The predictions are performed for each annotation candidate using its SMILES and/or
SIRIUS+CSI:FingerID
fingerprints. The predictions are performed on a per formula basis, hence,
response factors for isomers will be equal.
Compound annotation data obtained with algorithms other than "sirius"
. The response factors are predicted
from SMILES data.
When SMILES data is used then predictions of response factors are generally more accurate. However,
calculations with SIRIUS+CSI:FingerID
fingerprints are faster and only require the formula and MS/MS
spectrum, i.e. not the full structure. Hence, calculations with SMILES are mostly useful in
suspect screening workflows, or with high confidence compound annotation data, whereas MS/MS fingerprints are
suitable with unknowns.
For annotation data the calculations are performed for all candidates. This can especially lead to long
running calculations when SMILES data is used. Hence, it is strongly recommended to first
prioritize the annotation results, e.g. with the topMost
argument to the
filter method.
When response factors are predicted from SIRIUS+CSI:FingerID
fingerprints then only formula and MS/MS
spectra are used, even if compound annotations are used for input. The major difference is that with formula
annotation input all formula candidates for which a fingerprint could be generated are considered, whereas
with compound annotations only candidate formulae are considered for which also a structure could be assigned.
Hence, the formula annotation input could be more comprehensive, whereas predictions from structure annotations
could lead to more representative results as only formulae are considered for which at least one structure could be
assigned.
The calculateConcs
generic function is used to assign concentrations for each
feature using the response factors discussed in the previous section. The function takes response factors from suspect
screening results and/or feature annotation data. If multiple response factors were predicted for the same feature
group, for instance when multiple annotation candidates or suspect hits for this feature group are present, then a
concentration is assigned for each response factor. These values can later be easily aggregated with e.g. the
as.data.table function.
The rcdk package and OpenBabel tool are used
internally to calculate molecular weights. Please make sure that OpenBabel
is installed.
MS2Quant currently only supports ‘M+H’ and ‘M+’ adducts when performing predictions with
SIRIUS+CSI:FingerID
fingerprints. Predictions for candidates with other adducts, including ‘M-H’, are
skipped with a warning.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011).
“Open Babel: An open chemical toolbox.”
Journal of Cheminformatics, 3(1).
doi:10.1186/1758-2946-3-33.
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 6(18)
Sepman H, Malm L, Peets P, MacLeod M, Martin J, Breitholtz M, Kruve A (2023). “Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data.” Analytical Chemistry, 95(33), 12329–12338. doi:10.1021/acs.analchem.3c01744, https://doi.org/10.1021/acs.analchem.3c01744.
Functions to predict toxicities from SMILES and/or SIRIUS+CSI:FingerID
fingerprints using the
MS2Tox package.
## S4 method for signature 'featureGroups'
calculateTox(fGroups, featureAnn)

## S4 method for signature 'featureGroupsSet'
calculateTox(fGroups, featureAnn)

## S4 method for signature 'compounds'
predictTox(
  obj,
  LC50Mode = "static",
  concUnit = "ugL",
  updateScore = FALSE,
  scoreWeight = 1,
  parallel = TRUE
)

## S4 method for signature 'featureGroupsScreening'
predictTox(obj, LC50Mode = "static", concUnit = "ugL")

## S4 method for signature 'featureGroupsScreening'
calculateTox(fGroups, featureAnn = NULL)

## S4 method for signature 'featureGroupsScreeningSet'
predictTox(obj, ...)

## S4 method for signature 'featureGroupsScreeningSet'
calculateTox(fGroups, featureAnn = NULL)

## S4 method for signature 'compoundsSet'
predictTox(obj, ...)

## S4 method for signature 'compoundsSIRIUS'
predictTox(obj, type = "FP", LC50Mode = "static", concUnit = "ugL")

## S4 method for signature 'formulasSet'
predictTox(obj, ...)

## S4 method for signature 'formulasSIRIUS'
predictTox(obj, LC50Mode = "static", concUnit = "ugL")
## S4 method for signature 'featureGroups' calculateTox(fGroups, featureAnn) ## S4 method for signature 'featureGroupsSet' calculateTox(fGroups, featureAnn) ## S4 method for signature 'compounds' predictTox( obj, LC50Mode = "static", concUnit = "ugL", updateScore = FALSE, scoreWeight = 1, parallel = TRUE ) ## S4 method for signature 'featureGroupsScreening' predictTox(obj, LC50Mode = "static", concUnit = "ugL") ## S4 method for signature 'featureGroupsScreening' calculateTox(fGroups, featureAnn = NULL) ## S4 method for signature 'featureGroupsScreeningSet' predictTox(obj, ...) ## S4 method for signature 'featureGroupsScreeningSet' calculateTox(fGroups, featureAnn = NULL) ## S4 method for signature 'compoundsSet' predictTox(obj, ...) ## S4 method for signature 'compoundsSIRIUS' predictTox(obj, type = "FP", LC50Mode = "static", concUnit = "ugL") ## S4 method for signature 'formulasSet' predictTox(obj, ...) ## S4 method for signature 'formulasSIRIUS' predictTox(obj, LC50Mode = "static", concUnit = "ugL")
fGroups |
For For |
featureAnn |
A |
obj |
The workflow object for which predictions should be performed, e.g. feature groups with screening
results ( |
LC50Mode |
The mode used for predictions: should be |
concUnit |
The concentration unit for calculated toxicities. Can be molar based ( |
updateScore , scoreWeight
|
If |
parallel |
If set to |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
type |
Which types of predictions should be performed: should be |
The MS2Tox R package predicts toxicities from SMILES and/or MS/MS fingerprints obtained with SIRIUS+CSI:FingerID. The predictTox method functions interface with this package to predict toxicities, which can then be assigned to feature groups with the calculateTox method function.
predictTox returns an object amended with LC50 values (LC50_SMILES/LC50_SIRFP columns).
calculateTox returns a featureGroups based object amended with toxicity values for each feature group (accessed with the toxicities method).
The toxicities are predicted with the predictTox generic function, which accepts the following input:
Suspect screening results. The SMILES data is used to predict toxicities for suspect hits.
Formula annotation data obtained with the "sirius" algorithm (generateFormulasSIRIUS). The predictions are performed for each formula candidate using SIRIUS+CSI:FingerID fingerprints. For this reason, the getFingerprint argument must be set to TRUE when generating the formula data.
Compound annotation data obtained with the "sirius" algorithm (generateCompoundsSIRIUS). The predictions are performed for each annotation candidate using its SMILES and/or SIRIUS+CSI:FingerID fingerprints. Fingerprint based predictions are performed on a per formula basis; hence, the resulting toxicities for isomers will be equal.
Compound annotation data obtained with algorithms other than "sirius". The toxicities are predicted from SMILES data.
When SMILES data is used then predictions of toxicities are generally more accurate. However, calculations with SIRIUS+CSI:FingerID fingerprints are faster and only require the formula and MS/MS spectrum, i.e. not the full structure. Hence, calculations with SMILES are mostly useful in suspect screening workflows, or with high confidence compound annotation data, whereas MS/MS fingerprints are suitable with unknowns.
For annotation data the calculations are performed for all candidates. This can lead to long running calculations, especially when SMILES data is used. Hence, it is strongly recommended to first prioritize the annotation results, e.g. with the topMost argument to the filter method.
When toxicities are predicted from SIRIUS+CSI:FingerID fingerprints then only formula and MS/MS spectra are used, even if compound annotations are used for input. The major difference is that with formula annotation input all formula candidates for which a fingerprint could be generated are considered, whereas with compound annotations only those candidate formulae are considered for which a structure could also be assigned. Hence, the formula annotation input could be more comprehensive, whereas predictions from structure annotations could lead to more representative results, as only formulae are considered for which at least one structure could be assigned.
The calculateTox generic function is used to assign toxicities to each feature group, using the toxicities predicted as discussed in the previous section. The function takes toxicities from suspect screening results and/or feature annotation data. If multiple toxicities were predicted for the same feature group, for instance when multiple annotation candidates or suspect hits for this feature group are present, then all of these toxicities are assigned to the feature group. These values can later be easily aggregated with e.g. the as.data.table function.
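The prediction and assignment steps described above could be chained roughly as follows. This is only a minimal sketch: the wrapper name predictAndAssignTox and the value topMost = 5 are illustrative assumptions, and fGroups and compounds are assumed to be existing feature group and compound annotation objects from an earlier workflow step.

```r
# Hypothetical helper (not part of patRoon); the topMost default is an assumption.
predictAndAssignTox <- function(fGroups, compounds, topMost = 5) {
    # Prioritize the candidates first to limit the calculation time
    compounds <- patRoon::filter(compounds, topMost = topMost)
    # Predict LC50 values for the remaining candidates (defaults as in the usage)
    compounds <- patRoon::predictTox(compounds, LC50Mode = "static", concUnit = "ugL")
    # Assign the predicted toxicities to the feature groups
    patRoon::calculateTox(fGroups, featureAnn = compounds)
}
```

The toxicities of the returned object can then be inspected with the toxicities method.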
The rcdk package and OpenBabel tool are used internally to calculate molecular weights. Please make sure that OpenBabel is installed.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Guha, R. (2007). 'Chemical Informatics Functionality in R'. Journal of Statistical Software 6(18)
Peets P, Wang W, MacLeod M, Breitholtz M, Martin JW, Kruve A (2022). “MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS.” Environmental Science & Technology, 56(22), 15508-15517. doi:10.1021/acs.est.2c02536, PMID: 36269851, https://doi.org/10.1021/acs.est.2c02536.
Prints all the package options of patRoon and their currently set values.
printPackageOpts()
Stores the spectra and metadata from the records of an MS library.
records(obj)

spectra(obj)

## S4 method for signature 'MSLibrary'
records(obj)

## S4 method for signature 'MSLibrary'
spectra(obj)

## S4 method for signature 'MSLibrary'
length(x)

## S4 method for signature 'MSLibrary'
names(x)

## S4 method for signature 'MSLibrary'
show(object)

## S4 method for signature 'MSLibrary,ANY,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'MSLibrary,ANY,missing'
x[[i, j]]

## S4 method for signature 'MSLibrary'
x$name

## S4 method for signature 'MSLibrary'
as.data.table(x)

## S4 method for signature 'MSLibrary'
delete(obj, i = NULL, j = NULL, ...)

## S4 method for signature 'MSLibrary'
filter(obj, properties = NULL, massRange = NULL, mzRangeSpec = NULL,
  relMinIntensity = NULL, topMost = NULL, onlyAnnotated = FALSE, negate = FALSE)

## S4 method for signature 'MSLibrary'
convertToSuspects(obj, adduct, spectrumType = "MS2",
  avgSpecParams = getDefAvgPListParams(minIntensityPre = 0, minIntensityPost = 2, topMost = 10),
  collapse = TRUE, suspects = NULL, prefCalcChemProps = TRUE, neutralChemProps = FALSE)

## S4 method for signature 'MSLibrary'
export(obj, type = "msp", out)

## S4 method for signature 'MSLibrary,MSLibrary'
merge(x, y, ...)
x , obj , object
|
|
i |
For |
... |
Unused. |
drop , j
|
ignored. |
name |
The record name (partially matched). |
properties |
A named |
massRange |
Records with a neutral mass outside this range will be removed. Should be a two-sized |
mzRangeSpec |
Similar to the |
relMinIntensity |
The minimum relative intensity (‘0-1’) of a mass peak to be kept. Set to |
topMost |
Only keep |
onlyAnnotated |
If |
negate |
If |
adduct |
An |
spectrumType |
A |
avgSpecParams |
A |
collapse |
Whether records with the same first-block InChIKey should be collapsed. See the
|
suspects |
If not |
prefCalcChemProps |
If |
neutralChemProps |
If |
type |
The export type. Currently just |
out |
The file path to the output library file. |
y |
The |
This class is used by loadMSLibrary to store the loaded MS library data.
delete returns the object for which the specified data was removed.
filter returns a filtered MSLibrary object.
convertToSuspects returns a suspect list (data.table), which can be used with screenSuspects.
merge returns a merged MSLibrary object.
records(MSLibrary): Accessor method for the records slot of an MSLibrary class.
spectra(MSLibrary): Accessor method for the spectra slot of an MSLibrary class.
length(MSLibrary): Obtains the total number of records stored.
names(MSLibrary): Obtains the names of the stored records (DB_ID field).
show(MSLibrary): Shows summary information for this object.
x[i]: Subset on records.
x[[i]]: Extracts a spectrum table for a record.
$: Extracts a spectrum table for a record.
as.data.table(MSLibrary): Converts all the data (spectra and metadata) to a single data.table.
delete(MSLibrary): Completely deletes specified full records or spectra.
filter(MSLibrary): Performs rule-based filtering of records and spectra. This may be especially useful to improve annotation with generateCompoundsLibrary.
convertToSuspects(MSLibrary): Converts the MS library data to a suspect list, which can be used with screenSuspects. See the Suspect conversion section for details.
export(MSLibrary): Exports the library data to a ‘.msp’ file. The export is accelerated by a C++ interface with Rcpp.
merge(x = MSLibrary, y = MSLibrary): Merges two MSLibrary objects (x and y). The records from y that are unique are added to x. Records that were already in x are simply ignored. The SPLASH values are used to test equality between records, hence, the calcSPLASH argument to loadMSLibrary should be TRUE.
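The merge rule described above (keep all of x and add only those records of y whose SPLASH is not yet present) can be illustrated with plain data frames. This is a simplified base R sketch and not the package implementation: the helper mergeBySPLASH and the SPLASH strings are made up for illustration.

```r
# Toy records tables; the SPLASH strings are made up for illustration
recordsX <- data.frame(DB_ID = c("A1", "A2"), SPLASH = c("splash-1", "splash-2"),
                       stringsAsFactors = FALSE)
recordsY <- data.frame(DB_ID = c("B1", "B2"), SPLASH = c("splash-2", "splash-3"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: keep all of x, append the records of y with a SPLASH unknown to x
mergeBySPLASH <- function(x, y) {
    rbind(x, y[!y$SPLASH %in% x$SPLASH, , drop = FALSE])
}

merged <- mergeBySPLASH(recordsX, recordsY)
print(merged$DB_ID)
```

Record B1 is ignored here because its SPLASH equals that of A2, which is why SPLASH values need to be available for both libraries.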
records: A data.table with metadata for all records. Use the records method for access.
spectra: A list with all (annotated) spectra. Each spectrum is stored in a data.table. Use the spectra method for access.
The convertToSuspects method converts MS library data to a suspect list, which can be used with e.g. screenSuspects. Furthermore, this function can also amend existing suspect lists with spectral data.
Conversion occurs with one of the following three methods:
Direct (collapse=FALSE and suspects=NULL): each record is considered a suspect, and the resulting suspect list is generated directly by converting the records metadata. The fragments_mz column for each suspect is constructed from the mass peaks of the corresponding record.
Collapse (collapse=TRUE and suspects=NULL): all records with the same first-block InChIKey are first merged, and their spectra are averaged using the parameters from the avgSpecParams argument (see getDefAvgPListParams). The suspect list is based on the merged records, where the fragments_mz column is constructed from the averaged spectra. This is generally a good default, especially with large MS libraries.
Amend (suspects is not NULL): only those records are considered whose first-block InChIKey is present in the suspect list. The remaining records and their spectra are then collapsed as described for the Collapse method, and the fragments_mz column for each suspect is set from the averaged spectra. If a suspect is not present in the library, its fragments_mz value will be empty. Note that any existing fragments_mz data will be overwritten.
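The grouping that underlies the Collapse and Amend methods can be sketched in base R as follows. This is only an illustration of grouping records by the first block of their InChIKey (the part before the first hyphen), not the package implementation; the record names and the suspect key are made-up example data, and spectrum averaging is left out.

```r
# Toy records; rec1 and rec2 share the same first InChIKey block (example keys)
records <- data.frame(
    name = c("rec1", "rec2", "rec3"),
    InChIKey = c("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "RYYVLZVUVIJVGH-UHFFFAOYSA-O",
                 "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    stringsAsFactors = FALSE
)

# The first block is everything before the first hyphen
firstBlock <- sub("-.*$", "", records$InChIKey)

# Collapse: records with the same first block end up in the same group
groups <- split(records$name, firstBlock)

# Amend: only groups whose first block occurs in the suspect list are considered
suspectBlocks <- c("BSYNRYMUTXBXSQ")
amendGroups <- groups[names(groups) %in% suspectBlocks]
```

In the real method, the spectra within each group would subsequently be averaged to obtain the fragments_mz data.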
Chemical properties such as SMILES, InChIKey and formula in the input suspect list to convertToSuspects are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
export does not split any Synon data that was merged when the library was loaded.
Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O (2016). “SPLASH, a hashed identifier for mass spectra.” Nature Biotechnology, 34(11), 1099–1101. doi:10.1038/nbt.3689.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Basic rule based filtering of feature groups.
replicateGroupSubtract(fGroups, rGroups, threshold = 0)

## S4 method for signature 'featureGroups'
filter(obj, absMinIntensity = NULL, relMinIntensity = NULL, preAbsMinIntensity = NULL,
  preRelMinIntensity = NULL, absMinAnalyses = NULL, relMinAnalyses = NULL,
  absMinReplicates = NULL, relMinReplicates = NULL, absMinFeatures = NULL,
  relMinFeatures = NULL, absMinReplicateAbundance = NULL, relMinReplicateAbundance = NULL,
  absMinConc = NULL, relMinConc = NULL, absMaxTox = NULL, relMaxTox = NULL,
  absMinConcTox = NULL, relMinConcTox = NULL, maxReplicateIntRSD = NULL,
  blankThreshold = NULL, retentionRange = NULL, mzRange = NULL, mzDefectRange = NULL,
  chromWidthRange = NULL, featQualityRange = NULL, groupQualityRange = NULL, rGroups = NULL,
  results = NULL, removeBlanks = FALSE, removeISTDs = FALSE, checkFeaturesSession = NULL,
  predAggrParams = getDefPredAggrParams(), removeNA = FALSE, negate = FALSE)

## S4 method for signature 'featureGroupsSet'
filter(obj, ..., negate = FALSE, sets = NULL, absMinSets = NULL, relMinSets = NULL)

## S4 method for signature 'featureGroups'
replicateGroupSubtract(fGroups, rGroups, threshold = 0)
fGroups , obj
|
|
rGroups |
A character vector of replicate groups that should be kept ( |
threshold |
Minimum relative threshold (compared to mean intensity of replicate group being subtracted) for a feature group to be not removed. When ‘0’ a feature group is always removed when present in the given replicate groups. |
absMinIntensity , relMinIntensity
|
Minimum absolute/relative intensity for features to be kept. The relative
intensity is determined from the feature with highest intensity (of
all features from all groups). Set to ‘0’ or |
preAbsMinIntensity , preRelMinIntensity
|
As |
absMinAnalyses , relMinAnalyses
|
Feature groups are only kept when they contain data for at least this (absolute
or relative) amount of analyses. Set to |
absMinReplicates , relMinReplicates
|
Feature groups are only kept when they contain data for at least this
(absolute or relative) amount of replicates. Set to |
absMinFeatures , relMinFeatures
|
Analyses are only kept when they contain at least this (absolute or relative)
amount of features. Set to |
absMinReplicateAbundance , relMinReplicateAbundance
|
Minimum absolute/relative abundance that a grouped feature
should be present within a replicate group. If this minimum is not met all features within the replicate group are
removed. Set to |
absMinConc , relMinConc
|
The minimum absolute/relative predicted concentration (calculated by
|
absMaxTox , relMaxTox
|
The maximum absolute/relative predicted toxicity (LC50) (calculated by
|
absMinConcTox , relMinConcTox
|
Like |
maxReplicateIntRSD |
Maximum relative standard deviation (RSD) of intensity values for features within a
replicate group. If the RSD is above this value all features within the replicate group are removed. Set to
|
blankThreshold |
Feature groups that are also present in blank analyses (see
analysis info) are filtered out unless their relative intensity is above this
threshold. For instance, a value of ‘5’ means that only features with an intensity five times higher than that
of the blank are kept. The relative intensity values between blanks and non-blanks are determined from the mean of
all non-zero blank intensities. Set to |
retentionRange , mzRange , mzDefectRange , chromWidthRange
|
Range of retention time (in seconds), m/z, mass
defect (defined as the decimal part of m/z values) or chromatographic peak width (in seconds), respectively.
Features outside this range will be removed. Should be a numeric vector with length of two containing the min/max
values. The maximum can be |
featQualityRange |
Used to filter features by their peak qualities/scores
(see |
groupQualityRange |
Like |
results |
Only keep feature groups that have results in the object specified by |
removeBlanks |
Set to |
removeISTDs |
If |
checkFeaturesSession |
If set then features and/or feature groups are removed that were selected for removal
(see check-GUI). The session files are typically generated with the |
predAggrParams |
Parameters to aggregate calculated concentrations/toxicities (obtained with
|
removeNA |
Set to |
negate |
If set to |
... |
For sets workflow methods: further arguments passed to the base |
sets |
(sets workflow) A |
absMinSets , relMinSets
|
(sets workflow) Feature groups are only kept when they contain data for at least this (absolute
or relative) amount of sets. Set to |
filter performs common rule based filtering of feature groups such as blank subtraction, minimum intensity and minimum replicate abundance. Removal of features occurs by zeroing their intensity values. Furthermore, feature groups that are left completely empty (i.e. all intensities are zero) will be automatically removed.
replicateGroupSubtract removes feature groups present in a given set of replicate groups (unless intensities are above a given threshold). The replicate groups that are subtracted will be removed.
A filtered featureGroups object. Feature groups that are filtered away have their intensity set to zero. In case a feature group is not present in any of the analyses anymore it will be removed completely.
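The principle of removing features by zeroing intensities, followed by dropping feature groups that are left empty, can be illustrated with a toy blank filter in base R. This is a simplified sketch under stated assumptions and not the package implementation: the intensity values are made up, the comparison is only applied to the non-blank analyses, and the blank analyses are dropped afterwards (comparable to removeBlanks=TRUE).

```r
# Toy intensity matrix: rows are analyses, columns are feature groups (made-up values)
ints <- rbind(
    blank1  = c(FG1 = 100,  FG2 = 0,    FG3 = 50,  FG4 = 200),
    blank2  = c(FG1 = 300,  FG2 = 0,    FG3 = 0,   FG4 = 0),
    sample1 = c(FG1 = 900,  FG2 = 4000, FG3 = 600, FG4 = 500),
    sample2 = c(FG1 = 1200, FG2 = 0,    FG3 = 100, FG4 = 300)
)
isBlank <- c(TRUE, TRUE, FALSE, FALSE)
blankThreshold <- 5

# Mean of the non-zero blank intensities per feature group (0 if absent from all blanks)
blankMeans <- apply(ints[isBlank, , drop = FALSE], 2, function(x) {
    nz <- x[x > 0]
    if (length(nz) > 0) mean(nz) else 0
})

# Zero all sample intensities that are not above threshold * blank mean
samples <- ints[!isBlank, , drop = FALSE]
filtered <- samples
filtered[sweep(samples, 2, blankThreshold * blankMeans, "<=")] <- 0

# Feature groups without any remaining non-zero intensity are removed completely
filtered <- filtered[, colSums(filtered) > 0, drop = FALSE]
print(filtered)
```

In this toy data FG4 never exceeds five times its mean blank intensity, so all of its intensities are zeroed and the group disappears from the result.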
The following methods are changed or have new functionality:
filter has specific arguments to filter by (feature presence in) sets. See the argument descriptions.
When multiple arguments are specified to filter, multiple filters are applied in sequence. Since some of these filters may affect each other, choosing their order correctly may be important for effective data filtering. For instance, when an intensity filter removes features from blank analyses, a subsequent blank filter may not adequately perform blank subtraction. Similarly, when intensity and blank filters are executed after the replicate abundance filter it may be necessary to ensure minimum replicate abundance again as the intensity and blank filters may have removed some features within a replicate group.
With this in mind, filters (if specified) occur in the following order:
Features/feature groups selected for removal by the session specified by checkFeaturesSession.
Pre-Intensity filters (i.e. preAbsMinIntensity and preRelMinIntensity).
Chromatography and mass filters (i.e. retentionRange, mzRange, mzDefectRange, chromWidthRange, featQualityRange and groupQualityRange).
Replicate abundance filters (i.e. absMinReplicateAbundance, relMinReplicateAbundance and maxReplicateIntRSD).
Blank filter (i.e. blankThreshold).
Intensity filters (i.e. absMinIntensity and relMinIntensity).
Replicate abundance filters (2nd time, only if previous filters affected results).
General abundance filters (i.e. absMinAnalyses, relMinAnalyses, absMinReplicates, relMinReplicates, absMinFeatures, relMinFeatures, absMinConc, relMinConc, absMaxTox and relMaxTox).
Replicate group filter (i.e. rGroups), results filter (i.e. results) and blank analyses / internal standard removal (i.e. removeBlanks=TRUE / removeISTDs=TRUE).
If another filtering order is desired then filter should be called multiple times with only one filter argument at a time.
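A custom order can thus be obtained by chaining separate calls. The following is a minimal sketch in which the blank filter is applied before a retention time filter, i.e. the reverse of the default order; the helper name and the threshold values are illustrative assumptions only.

```r
# Hypothetical helper (not part of patRoon); default values are assumptions.
filterBlankThenRetention <- function(fGroups, blankThreshold = 5,
                                     retentionRange = c(120, 900)) {
    # First call: only the blank filter
    fGroups <- patRoon::filter(fGroups, blankThreshold = blankThreshold)
    # Second call: only the retention time filter, which now runs after the blank filter
    patRoon::filter(fGroups, retentionRange = retentionRange)
}
```

Each call only specifies a single filter argument, so the order of the calls fully determines the order of filtering.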
featureGroups-class and groupFeatures
Functionality to report data produced by most workflow steps such as features, feature groups, formula and compound annotations, and TPs.
report(fGroups, MSPeakLists = NULL, formulas = NULL, compounds = NULL, compsCluster = NULL,
  components = NULL, TPs = NULL,
  settingsFile = system.file("report", "settings.yml", package = "patRoon"), path = NULL,
  EICParams = getDefEICParams(topMost = 1, topMostByRGroup = TRUE),
  specSimParams = getDefSpecSimParams(), clearPath = FALSE, openReport = TRUE,
  parallel = TRUE, overrideSettings = list())

## S4 method for signature 'featureGroups'
report(fGroups, MSPeakLists = NULL, formulas = NULL, compounds = NULL, compsCluster = NULL,
  components = NULL, TPs = NULL,
  settingsFile = system.file("report", "settings.yml", package = "patRoon"), path = NULL,
  EICParams = getDefEICParams(topMost = 1, topMostByRGroup = TRUE),
  specSimParams = getDefSpecSimParams(), clearPath = FALSE, openReport = TRUE,
  parallel = TRUE, overrideSettings = list())

genReportSettingsFile(out = "report.yml", baseFrom = NULL)
fGroups |
The |
MSPeakLists , formulas , compounds , compsCluster , components , TPs
|
Further objects ( |
settingsFile |
The path to the report settings file used for report configuration (see |
path |
The destination file path for files generated during reporting. Will be generated if needed. If
|
EICParams |
A named |
specSimParams |
A named |
clearPath |
If |
openReport |
If set to |
parallel |
If set to |
overrideSettings |
A |
out |
The output file path. |
baseFrom |
An existing report settings file on which the new report settings should be based. This is primarily used to update old settings files: the output settings file will be based on the old settings and amended with any missing settings. |
The reporting functionality is typically used at the very end of the workflow. It is used to overview the data generated during the workflow, such as features, their annotations and TP screening results.
report reports all workflow data in an interactive HTML file. The reports include both tabular data (e.g. retention times, annotation properties, screening results) and various plots (e.g. chromatograms, (annotated) mass spectra and many more). This function uses functionality from other R packages, such as rmarkdown, flexdashboard, knitr and bslib.
The genReportSettingsFile function generates a new template ‘YAML’ file to configure report settings (see the next section).
The report generation can be customized with a variety of settings that are read from a ‘YAML’ file. This is especially useful if you want to change more advanced settings or want to add or remove the parts that are reported. The report settings file is specified through the settingsFile argument. If not specified then default settings will be used. To ease creation of a new template settings file, the genReportSettingsFile function can be used.
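Creating a template settings file and using it for reporting could look roughly as follows. This is a minimal sketch: the helper name, the file name report.yml and the destination path are illustrative assumptions, and fGroups is assumed to be an existing feature groups object.

```r
# Hypothetical helper (not part of patRoon); file name and path are assumptions.
reportWithCustomSettings <- function(fGroups, settingsFile = "report.yml",
                                     path = "report") {
    # Write a template settings file once, so that it can be edited by hand
    if (!file.exists(settingsFile))
        patRoon::genReportSettingsFile(out = settingsFile)
    # Generate the report with the (possibly edited) settings
    patRoon::report(fGroups, settingsFile = settingsFile, path = path,
                    openReport = FALSE)
}
```

The template is only generated if the file does not exist yet, so manual changes to the settings file are preserved between runs.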
The following settings are currently available:
General
format: the report format. Currently this can only be "html".
path: the destination path (ignored if the path argument is specified).
keepUnusedPlots: the number of days that unused plot files are kept (see Plot file caching).
selfContained: if true then the output ‘report.html’ embeds all graphics and script dependencies. Otherwise these files are read from the report_files/ directory. Self-contained reports are easily shared, since only the ‘report.html’ needs to be copied. However, they may be slower to generate and render, especially when the report contains a lot of data.
noDate: set to true to omit the date from the report. Mainly used for internal purposes.
summary: defines the plots on the summary page: chord, venn and/or upset.
features
retMin: if true then retention times are reported in minutes.
chromatograms
large: inclusion of large chromatograms (used in feature group table and TP parent chromatogram view).
small: inclusion of small chromatograms (feature group table).
features: inclusion of chromatograms for individual features (features view). Set to all to also include plots for analyses in which a feature was not found (or removed afterwards).
intMax: method to determine the maximum intensity plot range: eic or feature. Sets the intMax argument to plotChroms.
intensityPlots: inclusion of intensity trend plots.
MSPeakLists
spectra: inclusion of MS and MS/MS spectra (not annotated).
formulas
include: whether formula results are reported (formula view). If false then the input formulas object is still used to amend e.g. compound annotated spectra.
normalizeScores, exclNormScores: controls score normalization, sets the equally named arguments to e.g. plotScores.
topMost: only report this number of top ranked candidates. This number can be lowered to speed up report generation.
compounds
normalizeScores, exclNormScores, topMost: same as formulas, see above.
TPs
internalStandards
graph: inclusion of internal standard network plot (plotGraph).
When a new report is generated the plot files are stored inside the report_files sub-directory inside the destination path of the report. The plot files are kept so they can be reused to speed up re-creation of reports (e.g. with different report settings). After the report is generated, any unused plot files are removed unless they were recently created (controlled by the keepUnusedPlots setting, see previous section). The clearPath argument can be used to completely remove any old files.
No data will be reported for feature groups in any of the reported objects (formulas, compounds etc) which are not present in the input featureGroups object (fGroups).
The topMost, topMostByRGroup and onlyPresent EIC parameters may be ignored, e.g., when generating overview plots.
Creating MetFrag landing page URLs based on code from MetFamily R package.
Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963
Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
Functionality to report data produced by most workflow steps such as features, feature groups, calculated chemical formulae and tentatively identified compounds. This is the legacy interface, for the updated interface see reporting.
reportCSV( fGroups, path = "report", reportFeatures = FALSE, formulas = NULL, formulasNormalizeScores = "max", formulasExclNormScores = NULL, compounds = NULL, compoundsNormalizeScores = "max", compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"), compsCluster = NULL, components = NULL, retMin = TRUE, clearPath = FALSE ) reportPDF( fGroups, path = "report", reportFGroups = TRUE, formulas = NULL, formulasTopMost = 5, formulasNormalizeScores = "max", formulasExclNormScores = NULL, reportFormulaSpectra = TRUE, compounds = NULL, compoundsNormalizeScores = "max", compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"), compoundsOnlyUsedScorings = TRUE, compoundsTopMost = 5, compsCluster = NULL, components = NULL, MSPeakLists = NULL, retMin = TRUE, EICGrid = c(2, 1), EICParams = getDefEICParams(rRtWindow = 20, topMost = 1, topMostByRGroup = TRUE), clearPath = FALSE ) reportHTML( fGroups, path = "report", reportPlots = c("chord", "venn", "upset", "eics", "formulas"), formulas = NULL, formulasTopMost = 5, formulasNormalizeScores = "max", formulasExclNormScores = NULL, compounds = NULL, compoundsNormalizeScores = "max", compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"), compoundsOnlyUsedScorings = TRUE, compoundsTopMost = 5, compsCluster = NULL, includeMFWebLinks = "compounds", components = NULL, interactiveHeat = FALSE, MSPeakLists = NULL, specSimParams = getDefSpecSimParams(), TPs = NULL, retMin = TRUE, EICParams = getDefEICParams(rtWindow = 20, topMost = 1, topMostByRGroup = TRUE), TPGraphStructuresMax = 25, selfContained = TRUE, optimizePng = FALSE, clearPath = FALSE, openReport = TRUE, noDate = FALSE ) ## S4 method for signature 'featureGroups' reportCSV( fGroups, path = "report", reportFeatures = FALSE, formulas = NULL, formulasNormalizeScores = "max", formulasExclNormScores = NULL, compounds = NULL, 
compoundsNormalizeScores = "max", compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"), compsCluster = NULL, components = NULL, retMin = TRUE, clearPath = FALSE ) ## S4 method for signature 'featureGroups' reportPDF( fGroups, path = "report", reportFGroups = TRUE, formulas = NULL, formulasTopMost = 5, formulasNormalizeScores = "max", formulasExclNormScores = NULL, reportFormulaSpectra = TRUE, compounds = NULL, compoundsNormalizeScores = "max", compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"), compoundsOnlyUsedScorings = TRUE, compoundsTopMost = 5, compsCluster = NULL, components = NULL, MSPeakLists = NULL, retMin = TRUE, EICGrid = c(2, 1), EICParams = getDefEICParams(), clearPath = FALSE ) ## S4 method for signature 'featureGroups' reportHTML( fGroups, path = "report", reportPlots = c("chord", "venn", "upset", "eics", "formulas"), formulas = NULL, formulasTopMost = 5, formulasNormalizeScores = "max", formulasExclNormScores = NULL, compounds = NULL, compoundsNormalizeScores = "max", compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"), compoundsOnlyUsedScorings = TRUE, compoundsTopMost = 5, compsCluster = NULL, includeMFWebLinks = "compounds", components = NULL, interactiveHeat = FALSE, MSPeakLists = NULL, specSimParams = getDefSpecSimParams(), TPs = NULL, retMin = TRUE, EICParams = getDefEICParams(rtWindow = 20, topMost = 1, topMostByRGroup = TRUE), TPGraphStructuresMax = 25, selfContained = TRUE, optimizePng = FALSE, clearPath = FALSE, openReport = TRUE, noDate = FALSE )
fGroups |
The |
path |
The destination file path for files generated during reporting. Will be generated if needed. |
reportFeatures |
If set to |
formulas , compounds , compsCluster , components
|
Further objects ( |
compoundsNormalizeScores , formulasNormalizeScores
|
A |
compoundsExclNormScores , formulasExclNormScores
|
A
For |
retMin |
If |
clearPath |
If |
reportFGroups |
If |
formulasTopMost , compoundsTopMost
|
Only this amount of top ranked candidate formulae/compounds are reported.
Lower values may significantly speed up reporting. Set to |
reportFormulaSpectra |
If |
compoundsOnlyUsedScorings |
If |
MSPeakLists |
A |
EICGrid |
An integer vector in the form |
EICParams |
A named |
reportPlots |
A character vector specifying what should be plotted. Valid options are: |
includeMFWebLinks |
A |
interactiveHeat |
If |
specSimParams |
A named |
TPs |
A |
TPGraphStructuresMax |
Maximum number of TP structures to plot in TP hierarchies, see the |
selfContained |
If |
optimizePng |
If |
openReport |
If set to |
noDate |
If |
These functions are usually called at the very end of the workflow. They are used to report various data on features and feature groups. In addition, these functions may be used for reporting formulae and/or compounds that were generated for the specified feature groups. Data can be reported in tabular form (i.e. ‘.csv’ files) by reportCSV or graphically by reportPDF and reportHTML. The latter functions will plot, for instance, chromatograms and annotated mass spectra, which are useful to get a graphical overview of results.
All functions have a wide variety of arguments that influence the reporting process. Nevertheless, most parameters are optional and only required to be given for fine tuning. In addition, only those objects (e.g. formulae, compounds, clustering) that are desired to be reported need to be specified.
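As a rough illustration of the point above, the sketch below wraps two of the reporting functions so that only the objects that are actually available are passed on. The wrapper name `reportAll` and the reduced `*TopMost` values are assumptions made for this example; the arguments themselves are taken from the usage shown above.

```r
# Hypothetical convenience wrapper (not part of patRoon). It assumes that
# patRoon is installed and that the inputs come from earlier workflow steps.
reportAll <- function(fGroups, formulas = NULL, compounds = NULL,
                      MSPeakLists = NULL, path = "report")
{
    # Tabular output written below 'path'
    patRoon::reportCSV(fGroups, path = path, formulas = formulas,
                       compounds = compounds)

    # Graphical output; lower *TopMost values may speed up reporting
    patRoon::reportHTML(fGroups, path = path, formulas = formulas,
                        compounds = compounds, MSPeakLists = MSPeakLists,
                        formulasTopMost = 3, compoundsTopMost = 3,
                        openReport = FALSE)
    invisible(path)
}
```

Objects that are left at `NULL` are simply not reported, so the same wrapper can be used with or without annotation data.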
reportCSV generates tabular data (i.e. ‘.csv’ files) for the given data to be reported. This may also be useful to allow import by other tools for post-processing.
reportPDF will report graphical data (e.g. chromatograms and mass spectra) within PDF files. Compared to reportHTML this function may be faster and yield smaller report files. However, its functionality is a bit more basic and the generated data is more 'scattered' around.
reportHTML will report graphical data (e.g. chromatograms and mass spectra) and summary information in an easily browsable HTML file using rmarkdown, flexdashboard and knitr.
reportHTML uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
Currently, reportHTML only uses "classic" multiprocessing, regardless of the patRoon.MP.method option.
Any formulae and compounds for feature groups which are not present within fGroups (i.e. because it has been subset afterwards) will not be reported.
The topMost, topMostByRGroup and onlyPresent EIC parameters may be ignored, e.g., when generating overview plots.
Creating MetFrag landing page URLs is based on code from the MetFamily R package.
Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963
Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
reporting
This class derives from featureGroups and adds suspect screening information.
screenInfo(obj) annotateSuspects( fGroups, MSPeakLists = NULL, formulas = NULL, compounds = NULL, ... ) ## S4 method for signature 'featureGroupsScreening' screenInfo(obj) ## S4 method for signature 'featureGroupsScreening' show(object) ## S4 method for signature 'featureGroupsScreening,ANY,ANY,missing' x[i, j, ..., rGroups, suspects = NULL, drop = TRUE] ## S4 method for signature 'featureGroupsScreening' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featureGroupsScreening' as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE) ## S4 method for signature 'featureGroupsScreening' annotateSuspects( fGroups, MSPeakLists, formulas, compounds, absMzDev = 0.005, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), checkFragments = c("mz", "formula", "compound"), formulasNormalizeScores = "max", compoundsNormalizeScores = "max", IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"), logPath = file.path("log", "ident") ) ## S4 method for signature 'featureGroupsScreening' filter( obj, ..., onlyHits = NULL, selectHitsBy = NULL, selectBestFGroups = FALSE, maxLevel = NULL, maxFormRank = NULL, maxCompRank = NULL, minAnnSimForm = NULL, minAnnSimComp = NULL, minAnnSimBoth = NULL, absMinFragMatches = NULL, relMinFragMatches = NULL, minRF = NULL, maxLC50 = NULL, negate = FALSE ) ## S4 method for signature 'featureGroupsScreeningSet' screenInfo(obj) ## S4 method for signature 'featureGroupsScreeningSet' show(object) ## S4 method for signature 'featureGroupsScreeningSet,ANY,ANY,missing' x[i, j, ..., rGroups, suspects = NULL, sets = NULL, drop = TRUE] ## S4 method for signature 'featureGroupsScreeningSet' delete(obj, i = NULL, j = NULL, ...) 
## S4 method for signature 'featureGroupsScreeningSet' as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE) ## S4 method for signature 'featureGroupsScreeningSet' annotateSuspects( fGroups, MSPeakLists, formulas, compounds, absMzDev = 0.005, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), checkFragments = c("mz", "formula", "compound"), formulasNormalizeScores = "max", compoundsNormalizeScores = "max", IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"), logPath = file.path("log", "ident") ) ## S4 method for signature 'featureGroupsScreeningSet' filter( obj, ..., onlyHits = NULL, selectHitsBy = NULL, selectBestFGroups = FALSE, maxLevel = NULL, maxFormRank = NULL, maxCompRank = NULL, minAnnSimForm = NULL, minAnnSimComp = NULL, minAnnSimBoth = NULL, absMinFragMatches = NULL, relMinFragMatches = NULL, minRF = NULL, maxLC50 = NULL, negate = FALSE ) ## S4 method for signature 'featureGroupsScreeningSet' unset(obj, set)
obj , object , x , fGroups
|
The |
MSPeakLists , formulas , compounds
|
Annotation data ( |
... |
Further arguments passed to the base method. |
i , j , rGroups
|
Used for subsetting data analyses, feature groups and
replicate groups, see |
suspects |
An optional |
drop |
Ignored. |
collapseSuspects |
If a |
onlyHits |
For For
|
absMzDev |
Maximum absolute m/z deviation. |
specSimParams |
A named |
checkFragments |
Which type(s) of MS/MS fragments from workflow data should be checked to evaluate the number of
suspect fragment matches (i.e. from the |
compoundsNormalizeScores , formulasNormalizeScores
|
A
|
IDFile |
A file path to a YAML file with rules used for estimation of identification levels. See the
|
logPath |
A directory path to store logging information. If |
selectHitsBy |
Should be |
selectBestFGroups |
If |
maxLevel , maxFormRank , maxCompRank , minAnnSimForm , minAnnSimComp , minAnnSimBoth
|
Filter suspects by maximum
identification level (e.g. |
absMinFragMatches , relMinFragMatches
|
Only retain suspects with this minimum number of MS/MS matches with the
fragments specified in the suspect list (i.e. |
minRF |
Filter suspect hits by the given minimum predicted response factor (as calculated by
|
maxLC50 |
Filter suspect hits by the given maximum toxicity (LC50) (as calculated by
|
negate |
If set to |
sets |
(sets workflow) A |
set |
(sets workflow) The name of the set. |
annotateSuspects returns a featureGroupsScreening object, which is a featureGroups object amended with annotation data.
filter returns a filtered featureGroupsScreening object.
screenInfo(featureGroupsScreening): Returns a table with screening information (see screenInfo slot).
show(featureGroupsScreening): Shows summary information for this object.
x[i: Subset on analyses, feature groups and/or suspects.
as.data.table(featureGroupsScreening): Obtain a summary table (a data.table) with retention, m/z, intensity and optionally other feature data. Furthermore, the output table will be merged with information from screenInfo, such as suspect names and other properties and annotation data.
annotateSuspects(featureGroupsScreening): Incorporates annotation data obtained during the workflow to annotate suspects with matched known MS/MS fragments, formula/candidate ranks and automatic estimation of identification levels. See the Suspect annotation section for more details. The estimation of identification levels for each suspect is logged in the log/ident directory.
filter(featureGroupsScreening): Performs rule-based filtering. This method builds on the comprehensive filter functionality from the base filter,featureGroups-method. It adds several filters to select e.g. the best ranked suspects or those with a minimum estimated identification level. NOTE: most filters only affect suspect hits, not feature groups. Set onlyHits=TRUE to subsequently remove any feature groups that lost any suspect matches due to other filter steps.
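The methods listed above are typically chained. The following is a minimal sketch of such a chain, assuming that `library(patRoon)` has been called and that the feature groups, MS peak lists, formulas and compounds were generated in earlier workflow steps. The helper name `screenAndAnnotate` and the chosen `maxLevel` value are made up for this illustration.

```r
# Hypothetical helper (not part of patRoon); assumes library(patRoon) was
# called beforehand so that the generics used below are available.
screenAndAnnotate <- function(fGroups, suspects, MSPeakLists, formulas, compounds)
{
    # 1. screen and only keep feature groups with suspect hits
    fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = TRUE)

    # 2. add ranks, fragment matches and estimated identification levels
    fGroupsScr <- annotateSuspects(fGroupsScr, MSPeakLists = MSPeakLists,
                                   formulas = formulas, compounds = compounds)

    # 3. filter hits by estimated identification level, then drop feature
    #    groups that are left without any hit
    fGroupsScr <- filter(fGroupsScr, maxLevel = 3, onlyHits = TRUE)

    # 4. summary table with one row per suspect hit
    as.data.table(fGroupsScr, collapseSuspects = NULL)
}
```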
screenInfo: A data.table with results from suspect screening. This table will be amended with annotation data when annotateSuspects is run.
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.
The annotateSuspects method is used to annotate suspects after screenSuspects was used to collect suspect screening results and other workflow steps such as formula and compound annotation steps have been completed. The annotation results, which can be acquired with the as.data.table and screenInfo methods, amend the current screening data with the following columns:
formRank, compRank: The rank of the suspect within the formula/compound annotation results.
annSimForm, annSimComp, annSimBoth: A similarity measure between measured and annotated MS/MS peaks from annotation of formulae, compounds or both. The similarity is calculated as the spectral similarity between a peak list with (a) all MS/MS peaks and (b) only annotated peaks. Thus, a value of one means that all MS/MS peaks were annotated. If both formula and compound annotations are available then annSimBoth is calculated after combining all the annotated peaks, otherwise annSimBoth equals the available value for annSimForm or annSimComp. The similarity calculation can be configured with the specSimParams argument to annotateSuspects. Note for annotation with generateCompoundsLibrary results: the method and default parameters for the annSimComp calculation differ slightly from those of the spectral similarity calculated with compound annotation (libMatch score), hence small differences in results are typically observed.
maxFrags: The maximum number of MS/MS fragments that can be matched for this suspect (based on the fragments_* columns from the suspect list).
maxFragMatches, maxFragMatchesRel: The absolute and relative number of experimental MS/MS peaks that were matched from the fragments specified in the suspect list. The value for maxFragMatchesRel is relative to the value for maxFrags. The calculation of these columns is influenced by the checkFragments argument to annotateSuspects.
estIDLevel: Provides an estimation of the identification level, roughly following that of Schymanski et al. (2014). However, please note that this value is only an estimation, and manual interpretation is still necessary to assign final identification levels. The estimation is done through a set of rules, see the Identification level rules section below.
Note that columns are only present if sufficient data is available for their calculation.
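To make the relation between maxFrags, maxFragMatches and maxFragMatchesRel more concrete, here is a small base R sketch. It is not the patRoon implementation: the function name `countFragMatches`, the simple absolute m/z tolerance and all m/z values are assumptions chosen for illustration.

```r
# Count how many suspect-list fragments are found among experimental MS/MS peaks.
# Hypothetical stand-in; patRoon's own matching may work differently.
countFragMatches <- function(suspFragMz, peakMz, absMzDev = 0.005)
{
    matched <- vapply(suspFragMz,
                      function(f) any(abs(peakMz - f) <= absMzDev),
                      logical(1))
    maxFrags <- length(suspFragMz)
    list(maxFrags = maxFrags,
         maxFragMatches = sum(matched),
         maxFragMatchesRel = sum(matched) / maxFrags)
}

# made-up suspect fragments and measured MS/MS peaks
res <- countFragMatches(suspFragMz = c(194.0964, 192.0808, 179.0730),
                        peakMz = c(100.0500, 192.0810, 194.0961, 237.1022))
res$maxFragMatches     # 2 of the 3 listed fragments are found
res$maxFragMatchesRel  # 2/3
```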
The estimation of identification levels is configured through a YAML file which specifies the rules for each level. The default file is shown below.
1:
    suspectFragments: 3
    retention: 12
2a:
    or:
        - individualMoNAScore:
            min: 0.9
            higherThanNext: .inf
        - libMatch:
            min: 0.9
            higherThanNext: .inf
    rank:
        max: 1
        type: compound
3a:
    or:
        - individualMoNAScore: 0.4
        - libMatch: 0.4
3b:
    suspectFragments: 3
3c:
    annMSMSSim:
        type: compound
        min: 0.7
4a:
    annMSMSSim:
        type: formula
        min: 0.7
    isoScore:
        min: 0.5
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
4b:
    isoScore:
        min: 0.9
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
5:
    all: yes
Most of the file should be self-explanatory. Some notes:
Each rule is either a field of suspectFragments (minimum number of MS/MS fragments matched from suspect list), retention (maximum retention deviation from suspect list), rank (the maximum annotation rank from formula or compound annotations), all (this level is always matched) or any of the scorings available from the formula or compound annotations.
In case any of the rules could be applied to either formula or compound annotations, the annotation type must be specified with the type field (formula or compound).
Identification levels should start with a number and may optionally be followed by an alphabetic character. The lowest levels are checked first.
If relative=yes then the relative scoring will be used for testing.
For suspectFragments: if the number of fragments from the suspect list (maxFrags column) is less than the minimum rule value, the minimum is adjusted to the number of available fragments.
The or and and keywords can be used to combine multiple conditions.
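Two of the notes above, namely that the lowest levels are checked first and that the suspectFragments minimum is lowered when fewer fragments are available, can be sketched in base R as follows. This is a deliberately simplified stand-in and not the patRoon rule engine: the function name `estimateLevel` is hypothetical, only the suspectFragments and all rules are handled, and the guard for suspects without any listed fragments is an assumption.

```r
# Simplified stand-in for the rule evaluation (hypothetical, not patRoon code).
# 'rules' is a named list that is assumed to be ordered from the lowest to the
# highest identification level.
estimateLevel <- function(rules, fragMatches, maxFrags)
{
    for (level in names(rules))
    {
        rule <- rules[[level]]
        if (isTRUE(rule$all))
            return(level) # this level always matches
        if (!is.null(rule$suspectFragments))
        {
            # lower the minimum if the suspect list has fewer fragments
            minFrags <- min(rule$suspectFragments, maxFrags)
            # assumption: suspects without listed fragments never match
            if (maxFrags > 0 && fragMatches >= minFrags)
                return(level)
        }
    }
    NA_character_
}

rules <- list("3b" = list(suspectFragments = 3), "5" = list(all = TRUE))
lvlA <- estimateLevel(rules, fragMatches = 2, maxFrags = 2) # "3b": minimum lowered to 2
lvlB <- estimateLevel(rules, fragMatches = 1, maxFrags = 2) # "5": falls through to 'all'
```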
A template rules file can be generated with the genIDLevelRulesFile function, and this file can subsequently be passed to annotateSuspects. The file format is highly flexible and (sub)levels can be added or removed if desired. Note that the default file is currently only suitable when annotation is performed with GenForm and MetFrag; for other algorithms it is crucial to modify the rules.
The as.data.table method for featureGroupsScreening supports an additional format where each suspect hit is reported on a separate row (enabled by setting collapseSuspects=NULL). In this format the suspect properties from the screenInfo method are merged with each suspect row. Alternatively, if suspect collapsing is enabled (the default) then the regular as.data.table format is used, and amended with the names of all suspects matched to a feature group (separated by the value of the collapseSuspects argument).
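The difference between the two formats can be mimicked with a toy table in base R. The data frame below is made up and its column names are not those of the real as.data.table output; it only shows how several hits for one feature group collapse into a single separated string.

```r
# Toy screening hits: one row per feature group/suspect pair (made-up data)
hits <- data.frame(group = c("M120_R268", "M120_R268", "M237_R612"),
                   name = c("benzotriazole", "suspectX", "carbamazepine"),
                   stringsAsFactors = FALSE)

# Collapsed format: one row per feature group, suspect names pasted together
# with the separator that collapseSuspects would specify
collapsed <- aggregate(name ~ group, data = hits, FUN = paste, collapse = ",")
collapsed
```

The uncollapsed `hits` table corresponds to the one-row-per-hit layout, while `collapsed` corresponds to the default layout.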
Suspect collapsing also influences how calculated feature concentrations/toxicities are reported (i.e. obtained with calculateConcs/calculateTox). If these values were directly predicted for suspects, i.e. by using predictRespFactors/predictTox on the feature groups object, and suspects are not collapsed, then the calculated concentration/toxicity reported for each suspect row is not aggregated and is specific for that suspect (unless not available). Hence, this allows you to obtain specific concentration/toxicity values for each suspect/feature group pair.
The featureGroupsScreeningSet class is applicable for sets workflows. This class is derived from featureGroupsScreening and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset: Converts the object data for a specified set into a 'non-set' object (featureGroupsScreeningUnset), which allows it to be used in 'regular' workflows. Only the screening results present in the specified set are kept.
The following methods are changed or have new functionality:
annotateSuspects: Suspect annotation is performed per set. Thus, formula/compound ranks, estimated identification levels etc. are calculated for each set. Subsequently, these results are merged in the final screenInfo. In addition, an overall formRank and compRank column is created based on the rankings of the suspect candidate in the set consensus data. Furthermore, an overall estIDLevel is generated that is based on the 'best' estimated identification level among the sets data (i.e. the lowest). In case there is a tie between sub-levels (e.g. ‘3a’ and ‘3b’), then the sub-level is stripped (e.g. ‘3’).
filter: All filters related to estimated identification levels and formula/compound rankings are applied to the overall set data (see above). All others are applied to set specific data: in this case candidates are only removed if none of the set data conforms to the filter.
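The way annotateSuspects derives an overall estIDLevel from the per-set levels, as described above, can be illustrated with a short base R sketch. The helper name `overallIDLevel` is hypothetical and the code only mirrors the described behaviour (lowest level wins, tied sub-levels are stripped); it is not taken from patRoon.

```r
# Hypothetical helper: combine the estimated identification levels that were
# obtained for one suspect hit in the different sets.
overallIDLevel <- function(levels)
{
    levels <- levels[!is.na(levels)]
    if (length(levels) == 0)
        return(NA_character_)
    num <- as.integer(sub("[[:alpha:]]+$", "", levels)) # "3a" -> 3
    best <- unique(levels[num == min(num)])
    # a single best level is kept; tied sub-levels are reduced to the number
    if (length(best) == 1) best else as.character(min(num))
}

overallIDLevel(c("3a", "4b")) # "3a"
overallIDLevel(c("3a", "3b")) # "3"
```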
This class also derives from featureGroupsSet. Please see its documentation for more relevant details on sets workflows.
Note that the formRank and compRank columns are not updated when the data is subset.
filter removes suspect hits with NA values when any of the filters related to minimum or maximum values are applied (unless negate=TRUE).
Rick Helmus <[email protected]>, Emma Schymanski <[email protected]> (contributions to identification level rules), Bas van de Velde (contributions to spectral similarity calculation).
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, Hollender J (2014).
“Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.”
Environmental Science and Technology, 48(4), 2097–2098.
doi:10.1021/es5002105.
Stein SE, Scott DR (1994).
“Optimization and testing of mass spectral library search algorithms for compound identification.”
Journal of the American Society for Mass Spectrometry, 5(9), 859–866.
doi:10.1016/1044-0305(94)87009-8.
Utilities to screen for analytes with known or suspected identity.
screenSuspects( fGroups, suspects, rtWindow = 12, mzWindow = 0.005, adduct = NULL, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, onlyHits = FALSE, ... ) ## S4 method for signature 'featureGroups' screenSuspects( fGroups, suspects, rtWindow, mzWindow, adduct, skipInvalid, prefCalcChemProps, neutralChemProps, onlyHits ) ## S4 method for signature 'featureGroupsScreening' screenSuspects( fGroups, suspects, rtWindow, mzWindow, adduct, skipInvalid, onlyHits, amend = FALSE ) numericIDLevel(level) genIDLevelRulesFile(out, inLevels = NULL, exLevels = NULL) ## S4 method for signature 'featureGroupsSet' screenSuspects( fGroups, suspects, rtWindow, mzWindow, adduct, skipInvalid, prefCalcChemProps, neutralChemProps, onlyHits ) ## S4 method for signature 'featureGroupsScreeningSet' screenSuspects( fGroups, suspects, rtWindow, mzWindow, adduct, skipInvalid, prefCalcChemProps, neutralChemProps, onlyHits, amend = FALSE )
fGroups |
The |
suspects |
A (sets workflow) Can also be a |
rtWindow , mzWindow
|
The retention time window (in seconds) and m/z window that will be used for matching a suspect (+/- feature data). |
adduct |
An |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
onlyHits |
If |
... |
Further arguments specified to the methods. |
amend |
If |
level |
The identification level to be converted. |
out |
The file path to the target file. |
inLevels , exLevels
|
A regular expression for the
identification levels to include or exclude, respectively. For instance,
|
Besides 'full non-target analysis', where compounds may be identified with little to no prior knowledge, a common strategy is to screen for compounds with known or suspected identity. This may be a generally favorable approach if possible, as it can significantly reduce the load on data interpretation.
screenSuspects is used to perform suspect screening. The input featureGroups object will be screened for suspects by m/z values and optionally retention times. Afterwards, any feature groups not matched may be kept or removed, depending on whether a full non-target analysis is desired.
numericIDLevel: Extracts the numeric part of a given identification level (e.g. "3a" becomes ‘3’).
genIDLevelRulesFile: Generates a template YAML file that is used to configure the rules for automatic estimation of identification levels. This file can then be used as input for annotateSuspects.
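A possible way to use these two utilities is sketched below. The file name is hypothetical, and the calls are only executed when patRoon is installed. The `exLevels` pattern is meant to leave out the rules for levels 4 and 5, which assumes that the regular expression is matched against the level names as described for the inLevels and exLevels arguments.

```r
# hypothetical target file for the customized rules
rulesFile <- file.path(tempdir(), "myIDLevelRules.yml")

if (requireNamespace("patRoon", quietly = TRUE))
{
    # write a template that excludes levels matching the pattern
    patRoon::genIDLevelRulesFile(rulesFile, exLevels = "4|5")

    # numeric part of a (sub-)level, e.g. useful for sorting or comparing
    patRoon::numericIDLevel("3a")
}
```

The generated file could then be edited by hand and passed to annotateSuspects through its IDFile argument.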
screenSuspects returns a featureGroupsScreening object, which is a copy of the input fGroups object amended with additional screening information.
In a sets workflow, screenSuspects performs suspect screening for each set separately, and the screening results are combined afterwards. The sets column in the screenInfo data marks in which sets the suspect hit was found.
The suspects argument for screenSuspects should be a data.frame with the following mandatory and optional columns:
name: The suspect name. Must be file-compatible. (mandatory)
rt: The retention time (in seconds) for the suspect. If specified, the suspect will only be matched if its retention matches the experimental value (tolerance defined by the rtWindow argument). (optional)
neutralMass, formula, SMILES, InChI: The neutral monoisotopic mass, chemical formula, SMILES or InChI for the suspect. (data from one of these columns is mandatory in case no value from the mz column is available for a suspect)
mz: The ionized m/z of the suspect. (mandatory unless it can be calculated from one of the aforementioned columns)
adduct: A character that can be converted with as.adduct. Can be used to automatically calculate values for the mz column. (mandatory unless data from the mz column is available, the adduct argument is set or fGroups has adduct annotations)
fragments_mz, fragments_formula: One or more MS/MS fragments (specified as m/z or formulae, respectively). Multiple values can be specified by separating them with a semicolon (;). This data is used by annotateSuspects to report detected MS/MS fragments and calculate identification levels. (optional)
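A small suspect list following the column layout above might be built as shown below. All retention times, the third suspect and the fragment masses are illustrative values rather than reference data. The final check only verifies the requirement that every suspect has either an mz value or one of the structure/mass columns.

```r
# Illustrative suspect list; numeric values are examples, not reference data
suspects <- data.frame(
    name = c("carbamazepine", "benzotriazole", "unknownAcid"),
    rt = c(612, NA, NA), # seconds; optional
    SMILES = c("C1=CC=C2C(=C1)C=CC3=CC=CC=C3N2C(=O)N", "c1ccc2[nH]nnc2c1", NA),
    mz = c(NA, NA, 213.0921), # only given where no structure is known
    adduct = c("[M+H]+", "[M+H]+", NA),
    fragments_mz = c("194.0964;192.0808", NA, NA), # semicolon separated
    stringsAsFactors = FALSE
)

# every suspect needs mz or one of neutralMass/formula/SMILES/InChI
idCols <- intersect(c("mz", "neutralMass", "formula", "SMILES", "InChI"),
                    names(suspects))
hasMassInfo <- rowSums(!is.na(suspects[, idCols, drop = FALSE])) > 0
stopifnot(all(hasMassInfo))
```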
How the mass of a suspect is matched with the mass of a feature depends on the available data:
If the suspect has data from the mz column of the suspect list, then this data is matched with the detected feature m/z.
Otherwise, if the suspect has data in the adduct column of the suspect list, this data is used to calculate its mz value, which is then used like above.
In the last case, the neutral mass of the suspect is matched with the neutral mass of the feature. Hence, either the adduct argument needs to be specified, or the featureGroups input object must have adduct annotations.
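The first two cases can be sketched in base R as follows. This is not the patRoon matching code: the function name `matchSuspect` and the feature values are made up, the window defaults mirror those of screenSuspects, and the m/z calculation only covers a protonated ion ([M+H]+), for which the proton mass of about 1.007276 is added to the neutral mass.

```r
# Hypothetical matcher: returns the indices of features within the m/z window
# and, if a suspect retention time is given, within the retention time window.
matchSuspect <- function(suspMz, suspRt, featMz, featRt,
                         rtWindow = 12, mzWindow = 0.005)
{
    ok <- abs(featMz - suspMz) <= mzWindow
    if (!is.na(suspRt)) # retention is only checked when specified
        ok <- ok & abs(featRt - suspRt) <= rtWindow
    which(ok)
}

# m/z calculated from a neutral mass for [M+H]+ (second case above)
suspMz <- 236.0950 + 1.007276

# made-up features
featMz <- c(237.1020, 237.1024, 120.0556)
featRt <- c(610, 700, 268)

hitsWithRt <- matchSuspect(suspMz, 612, featMz, featRt) # feature 1
hitsNoRt <- matchSuspect(suspMz, NA, featMz, featRt)    # features 1 and 2
```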
Chemical properties such as SMILES, InChIKey and formula in the suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
screenSuspects may use the suspect names as a basis for file names used for reporting, logging etc. Therefore, it is important that these are file-compatible names. For this purpose, screenSuspects will automatically try to convert long, non-unique and/or otherwise incompatible suspect names.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
featureGroupsScreening
With sets workflows in patRoon a complete non-target (or suspect) screening workflow is performed with sample analyses that were measured with different MS methods (typically positive and negative ionization).
The analysis files that were measured with a different method are grouped in sets. In the most typical case, there is a "positive" and "negative" set, for the positively/negatively ionized data, respectively. However, other distinctions than polarity are also possible (although currently the chromatographic method should be the same between sets). A sets workflow is typically initiated with the makeSet method. The handbook contains many more details about sets workflows.
makeSet to initiate sets workflows, workflowStepSet, the Sets workflows sections in other documentation pages and the patRoon handbook.
This class is derived from compounds and contains additional specific MetFrag data.
settings(compoundsMF)

## S4 method for signature 'compoundsMF'
settings(compoundsMF)
compoundsMF |
A |
Objects from this class are generated by
generateCompoundsMetFrag
settings(compoundsMF)
: Accessor method for the settings
slot.
settings
A list with all general configuration settings passed to MetFrag. Feature specific items (e.g. spectra and precursor masses) are not contained in this list.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016). “MetFrag relaunched: incorporating strategies beyond in silico fragmentation.” Journal of Cheminformatics, 8(1). doi:10.1186/s13321-016-0115-9.
compounds
and generateCompoundsMetFrag
Parameters relevant for calculation of similarities between mass spectra.
getDefSpecSimParams(...)
... |
optional named arguments that override defaults. |
For the calculation of spectral similarities the following parameters exist:
method
The similarity method: either "cosine"
or "jaccard"
.
removePrecursor
If TRUE
then precursor peaks (i.e. the mass peak corresponding to the
feature) are removed prior to similarity calculation.
mzWeight
,intWeight
Mass and intensity weights used for cosine calculation.
absMzDev
Maximum absolute m/z deviation between mass peaks, used for binning spectra.
relMinIntensity
The minimum relative intensity for mass peaks (‘0-1’). Peaks with lower intensities
are not considered for similarity calculation. The relative intensities are calculated after the precursor peak is
removed when removePrecursor=TRUE
.
minPeaks
Only consider spectra that have at least this number of peaks (after the spectrum is
filtered).
shift
If and how shifting is applied prior to similarity calculation. Valid options are: "none"
(no shifting), "precursor"
(all mass peaks of the second spectrum are shifted by the mass difference between
the precursors of both spectra) or "both"
(the spectra are first binned without shifting, and peaks still
unaligned are then shifted as is done when shift="precursor"
).
setCombinedMethod
(sets workflow) Determines how spectral similarities from different sets are combined.
Possible values are "mean"
, "min"
or "max"
, which calculate the combined value as the mean,
minimum or maximum value, respectively. NA
values (e.g. if a set does not have peak list data to
combine) are removed in advance.
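The effect of setCombinedMethod can be illustrated with a few lines of base R. The function combineSetSims is a hypothetical helper written for this example and only mirrors the behaviour described above (NA values are dropped before the values are combined); returning NA when no values remain is an assumption.

```r
# Hypothetical helper: combine per-set similarities into one value
combineSetSims <- function(sims, setCombinedMethod = c("mean", "min", "max"))
{
    setCombinedMethod <- match.arg(setCombinedMethod)
    sims <- sims[!is.na(sims)] # sets without peak list data are ignored
    if (length(sims) == 0)
        return(NA_real_)       # assumption: nothing to combine
    switch(setCombinedMethod,
           mean = mean(sims),
           min = min(sims),
           max = max(sims))
}

combineSetSims(c(positive = 0.8, negative = 0.4), "min")
combineSetSims(c(positive = 0.8, negative = NA), "mean")
```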
These parameters are typically passed as a named list
as the specSimParams
argument to functions that
do spectral similarity calculations. The getDefSpecSimParams
function can be used to generate such a parameter
list with defaults.
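To give an impression of how parameters such as mzWeight, intWeight, absMzDev and relMinIntensity interact, the following base R sketch computes a weighted cosine similarity between two peak lists. It is a simplified illustration and not the algorithm used by the package: the function name specCosine, the default values, the greedy peak matching and the weighting formula (m/z to the power mzWeight times intensity to the power intWeight) are all assumptions made for this example.

```r
# Hypothetical helper: weighted cosine similarity between two spectra.
# Each spectrum is a data.frame with the columns mz and intensity.
specCosine <- function(sp1, sp2, mzWeight = 0, intWeight = 1, absMzDev = 0.005,
                       relMinIntensity = 0.05)
{
    prep <- function(sp)
    {
        sp$intensity <- sp$intensity / max(sp$intensity) # relative intensities (0-1)
        sp[sp$intensity >= relMinIntensity, , drop = FALSE]
    }
    sp1 <- prep(sp1)
    sp2 <- prep(sp2)

    w1 <- sp1$mz^mzWeight * sp1$intensity^intWeight
    w2 <- sp2$mz^mzWeight * sp2$intensity^intWeight

    # Greedy alignment: pair each peak of sp1 with the closest unused peak of sp2,
    # provided that the m/z difference does not exceed absMzDev
    matched <- rep(NA_integer_, nrow(sp1))
    for (i in seq_len(nrow(sp1)))
    {
        d <- abs(sp2$mz - sp1$mz[i])
        d[matched[!is.na(matched)]] <- Inf # peaks of sp2 that are already paired
        j <- which.min(d)
        if (length(j) == 1 && d[j] <= absMzDev)
            matched[i] <- j
    }

    hit <- !is.na(matched)
    sum(w1[hit] * w2[matched[hit]]) / (sqrt(sum(w1^2)) * sqrt(sum(w2^2)))
}

specA <- data.frame(mz = c(100.05, 150.07, 200.10), intensity = c(10, 100, 50))
specB <- data.frame(mz = c(100.051, 150.069, 220.12), intensity = c(20, 90, 40))
specCosine(specA, specB)
specCosine(specA, specB, mzWeight = 1) # give heavier mass peaks more weight
```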
Holds information for all TPs for a set of parents, including chemical formulae.
## S4 method for signature 'transformationProductsFormula'
plotGraph(
  obj,
  which,
  components = NULL,
  prune = TRUE,
  onlyCompletePaths = FALSE,
  width = NULL,
  height = NULL
)
obj |
|
which |
Either a |
components |
If specified (i.e. not |
prune |
If |
onlyCompletePaths |
If |
width , height
|
Passed to |
This (virtual) class is derived from the transformationProducts
base class; please see its
documentation for more details. Objects from this class are returned by TP generators. More
specifically, algorithms that work with chemical formulae (e.g. library_formula
), use this class to
store their results. The methods defined for this class extend the functionality for the base
transformationProducts
class.
plotGraph
returns the result of visNetwork
.
plotGraph(transformationProductsFormula)
: Plots an interactive hierarchy graph of the transformation products. The
resulting graph can be browsed interactively and allows exploration of the different TP formation pathways.
Furthermore, results from TP componentization can be used to match the hierarchy
with screening results. The graph is rendered with visNetwork.
The base class transformationProducts
for more relevant methods and generateTPs
Holds information for all TPs for a set of parents, including structural information.
## S4 method for signature 'transformationProductsStructure'
convertToMFDB(TPs, out, includeParents = FALSE)

## S4 method for signature 'transformationProductsStructure'
filter(
  obj,
  ...,
  removeParentIsomers = FALSE,
  removeTPIsomers = FALSE,
  removeDuplicates = FALSE,
  minSimilarity = NULL,
  verbose = TRUE,
  negate = FALSE
)

## S4 method for signature 'transformationProductsStructure'
plotGraph(
  obj,
  which,
  components = NULL,
  structuresMax = 25,
  prune = TRUE,
  onlyCompletePaths = FALSE,
  width = NULL,
  height = NULL
)

## S4 method for signature 'transformationProductsStructure'
plotVenn(obj, ..., commonParents = FALSE, labels = NULL, vennArgs = NULL)

## S4 method for signature 'transformationProductsStructure'
plotUpSet(
  obj,
  ...,
  commonParents = FALSE,
  labels = NULL,
  nsets = length(list(...)) + 1,
  nintersects = NA,
  upsetArgs = NULL
)

## S4 method for signature 'transformationProductsStructure'
consensus(
  obj,
  ...,
  absMinAbundance = NULL,
  relMinAbundance = NULL,
  uniqueFrom = NULL,
  uniqueOuter = FALSE,
  labels = NULL
)
out |
The file name of the output |
includeParents |
Set to |
obj , TPs
|
|
... |
For For |
removeParentIsomers |
If |
removeTPIsomers |
If |
removeDuplicates |
If |
minSimilarity |
Minimum structure similarity (‘0-1’) that a TP should have relative to its parent. This
data is only available if the |
verbose |
If set to |
negate |
If |
which |
Either a |
components |
If specified (i.e. not |
structuresMax |
An |
prune |
If |
onlyCompletePaths |
If |
width , height
|
Passed to |
commonParents |
Only consider TPs from parents that are common to all compared objects. |
labels |
A |
vennArgs |
A |
nsets , nintersects
|
See |
upsetArgs |
A list with any further arguments to be passed to
|
absMinAbundance , relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain TPs that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
This (virtual) class is derived from the transformationProducts
base class; please see its
documentation for more details. Objects from this class are returned by TP generators. More
specifically, algorithms that work with chemical structures (e.g. biotransformer
), use this class to
store their results. The methods defined for this class extend the functionality for the base
transformationProducts
class.
filter
returns a filtered transformationProductsStructure
object.
plotGraph
returns the result of visNetwork
.
plotVenn
(invisibly) returns a list with the following fields:
gList
the gList
object that was returned by
the utilized VennDiagram plotting function.
areas
The total area for each plotted group.
intersectionCounts
The number of intersections between groups.
The order for the areas
and intersectionCounts
fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn
and
draw.triple.venn
).
consensus
returns a transformationProductsStructure
object that is produced by merging results
from multiple transformationProductsStructure
objects.
convertToMFDB(transformationProductsStructure)
: Exports this object as a ‘.csv’ file that can be used as a MetFrag
local
database. Any duplicate TPs (formed by different pathways or parents) will be merged based on their
InChIKey.
filter(transformationProductsStructure)
: Performs rule-based filtering. Useful to simplify and clean up the data.
plotGraph(transformationProductsStructure)
: Plots an interactive hierarchy graph of the transformation products. The
resulting graph can be browsed interactively and allows exploration of the different TP formation pathways.
Furthermore, results from TP componentization can be used to match the hierarchy
with screening results. The graph is rendered with visNetwork.
plotVenn(transformationProductsStructure)
: Plots a Venn diagram (using VennDiagram) outlining unique and shared
TPs of up to five different transformationProductsStructure
objects.
plotUpSet(transformationProductsStructure)
: Plots an UpSet diagram (using the upset
function)
outlining unique and shared TPs between different transformationProductsStructure
objects.
consensus(transformationProductsStructure)
: Generates a consensus from different
transformationProductsStructure
objects. Currently this removes any hierarchical data, and all TPs are
considered to originate from the same (original) parent.
The methods that compare different objects (e.g. plotVenn
and
consensus
) use the InChIKey to match TPs between objects. Moreover, the parents between objects
are matched by their name. Hence, it is crucial that the input parents to generateTPs
(i.e. the parents
argument) are named equally.
consensus
: If the retDir
values differ between matched TPs, it will be set to ‘0’. If
structure similarity data is available (i.e. calcSims=TRUE
to generateTPs
) then the mean
similarity is calculated.
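The matching and merging rules described above can be sketched with plain data frames. This is a simplified illustration rather than the package's implementation: the function name mergeTPs, the column names (InChIKey, retDir, similarity) and the placeholder keys are assumptions for the example, and each key is assumed to occur at most once per table.

```r
# Hypothetical helper: merge TP tables from several objects by InChIKey
mergeTPs <- function(tpTables, relMinAbundance = NULL)
{
    combined <- do.call(rbind, tpTables)
    keys <- unique(as.character(combined$InChIKey))
    res <- do.call(rbind, lapply(keys, function(k)
    {
        rows <- combined[as.character(combined$InChIKey) == k, , drop = FALSE]
        data.frame(InChIKey = k,
                   # fraction of objects in which the TP was found
                   abundance = nrow(rows) / length(tpTables),
                   # conflicting retention directions are set to 0
                   retDir = if (length(unique(rows$retDir)) == 1) rows$retDir[1] else 0,
                   # structure similarities are averaged
                   similarity = mean(rows$similarity))
    }))
    if (!is.null(relMinAbundance))
        res <- res[res$abundance >= relMinAbundance, , drop = FALSE]
    res
}

# Placeholder keys, not real InChIKeys
tp1 <- data.frame(InChIKey = c("KEY-A", "KEY-B"), retDir = c(-1, 1), similarity = c(0.8, 0.5))
tp2 <- data.frame(InChIKey = "KEY-A", retDir = 1, similarity = 0.6)
mergeTPs(list(tp1, tp2))
mergeTPs(list(tp1, tp2), relMinAbundance = 1) # only TPs present in all objects
```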
Conway JR, Lex A, Gehlenborg N (2017).
“UpSetR: an R package for the visualization of intersecting sets and their properties.”
Bioinformatics, 33(18), 2938-2940.
doi:10.1093/bioinformatics/btx364.
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H (2014).
“UpSet: Visualization of Intersecting Sets.”
IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
doi:10.1109/tvcg.2014.2346248.
The base class transformationProducts
for more relevant methods and generateTPs
Verifies if all dependencies are installed properly and instructs the user if this is not the case.
verifyDependencies()
This function is inspired by
withr::with_options
: it can be used to
execute some code where package options are temporarily changed. This
function uses a shortened syntax, especially when changing options for
patRoon
.
withOpt(code, ..., prefix = "patRoon.")
code |
The code to be executed. |
... |
Named arguments with options to change. |
prefix |
A |
## Not run:
# Set max parallel processes to five while performing formula calculations
withOpt(MP.maxProcs = 5, {
    formulas <- generateFormulas(fGroups, "genform", ...)
})

## End(Not run)
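The general idea of temporarily changing prefixed options can be sketched in base R as shown below. This is not the implementation of withOpt: the function withOptSketch and the demo. prefix are hypothetical and only illustrate the pattern of setting options, evaluating the code and restoring the previous values afterwards.

```r
# Hypothetical re-implementation of the pattern (not the package's code)
withOptSketch <- function(code, ..., prefix = "demo.")
{
    opts <- list(...)
    if (length(opts) > 0)
    {
        names(opts) <- paste0(prefix, names(opts)) # e.g. x becomes demo.x
        old <- options(opts)                       # set new values, remember the old ones
        on.exit(options(old), add = TRUE)          # restore, also if 'code' fails
    }
    force(code) # 'code' is evaluated lazily, so it sees the temporary options
}

options(demo.x = 1)
withOptSketch(x = 2, {
    getOption("demo.x") # temporarily changed value
})
getOption("demo.x")     # original value is back
```

Because R evaluates function arguments lazily, the code block is only executed at the force call, that is, after the options were changed.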
All workflow objects (e.g. featureGroups
,
compounds
, etc) are derived from this class. Objects from this
class are never created directly.
## S4 method for signature 'workflowStep'
algorithm(obj)

## S4 method for signature 'workflowStep'
as.data.table(x, keep.rownames = FALSE, ...)

## S4 method for signature 'workflowStep'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

## S4 method for signature 'workflowStep'
show(object)
obj , x , object
|
An object (derived from) this class. |
keep.rownames |
Ignored. |
... |
Method specific arguments. Please see the documentation of the derived classes. |
row.names , optional
|
Ignored. |
algorithm(workflowStep)
: Returns the algorithm that was used to generate an
object.
as.data.table(workflowStep)
: Summarizes the data in this object and returns this
as a data.table
.
as.data.frame(workflowStep)
: This method simply calls as.data.table
and
converts the result to a classic data.frame
.
show(workflowStep)
: Shows summary information for this object.
algorithm
The algorithm that was used to generate this object. Use the
algorithm
method for access.
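The relation between a virtual base class with an algorithm slot and its derived classes can be illustrated with a small, self-contained S4 sketch. The class names demoWorkflowStep and demoCompounds, the generic defined here and the results slot are stand-ins invented for this example; they are not the package's classes.

```r
library(methods)

# Virtual base class: cannot be instantiated, only derived from
setClass("demoWorkflowStep", slots = c(algorithm = "character"), contains = "VIRTUAL")

# A concrete class that inherits the algorithm slot and adds its own data
setClass("demoCompounds", contains = "demoWorkflowStep",
         slots = c(results = "data.frame"))

# Accessor and show method defined once on the base class; derived classes inherit them
setGeneric("algorithm", function(obj) standardGeneric("algorithm"))
setMethod("algorithm", "demoWorkflowStep", function(obj) obj@algorithm)
setMethod("show", "demoWorkflowStep", function(object)
{
    cat("A", class(object), "object generated with algorithm:", algorithm(object), "\n")
})

comps <- new("demoCompounds", algorithm = "metfrag",
             results = data.frame(candidate = c("a", "b"), score = c(0.9, 0.4)))
algorithm(comps)
comps # calls the inherited show method
```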
This class is the base for many sets workflows related classes. This class is virtual, and therefore never created directly.
## S4 method for signature 'workflowStepSet'
setObjects(obj)

## S4 method for signature 'workflowStepSet'
sets(obj)

## S4 method for signature 'workflowStepSet'
show(object)
obj , object
|
An object that is derived from |
The most important purpose of this class is to hold data that is specific for a set. These set objects are
typically objects with classes from a regular non-sets workflow (e.g. components
,
compounds
), and are used by the sets workflow object to e.g. form a consensus. Since the set
objects may contain additional data, such as algorithm specific slots, it may in some cases be of interest to access
them directly with the setObjects
method (described below).
setObjects(workflowStepSet)
: Accessor for the setObjects
slot.
sets(workflowStepSet)
: Returns the names for each set in this object.
show(workflowStepSet)
: Shows summary information for this object.
setObjects
A list
with the set objects (see the Details
section). The list
is named
with the set names.
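How a sets object can hold one regular object per set in a named list is sketched below with S4 stand-ins. The class names demoStepSet and demoCompoundsSet and the generics defined here are hypothetical and only mimic the described design; plain data frames take the place of the real set objects.

```r
library(methods)

# Virtual base class holding the per-set objects in a named list
setClass("demoStepSet", slots = c(setObjects = "list"), contains = "VIRTUAL")
setClass("demoCompoundsSet", contains = "demoStepSet")

setGeneric("setObjects", function(obj) standardGeneric("setObjects"))
setMethod("setObjects", "demoStepSet", function(obj) obj@setObjects)

setGeneric("sets", function(obj) standardGeneric("sets"))
setMethod("sets", "demoStepSet", function(obj) names(obj@setObjects))

compsSet <- new("demoCompoundsSet", setObjects = list(
    positive = data.frame(candidate = c("a", "b"), score = c(0.9, 0.4)),
    negative = data.frame(candidate = "a", score = 0.7)
))

sets(compsSet)                  # names of the sets
setObjects(compsSet)$negative   # direct access to the data of a single set
```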