| Title: | Workflows for Mass-Spectrometry Based Non-Target Analysis |
|---|---|
| Description: | Provides an easy-to-use interface to a mass spectrometry based non-target analysis workflow. Various (open-source) tools are combined which provide algorithms for extraction and grouping of features, extraction of MS and MS/MS data, automatic formula and compound annotation and grouping related features to components. In addition, various tools are provided for e.g. data preparation and cleanup, plotting results and automatic reporting. |
| Authors: | Rick Helmus [aut, cre] (ORCID: <https://orcid.org/0000-0001-9401-3133>), Olaf Brock [ctb] (ORCID: <https://orcid.org/0000-0003-4727-8459>), Vittorio Albergamo [ctb] (ORCID: <https://orcid.org/0000-0002-5347-1362>), Andrea Brunner [ctb] (ORCID: <https://orcid.org/0000-0002-2801-1751>), Emma Schymanski [ctb] (ORCID: <https://orcid.org/0000-0001-6868-8145>), Bas van de Velde [ctb] (ORCID: <https://orcid.org/0000-0003-1292-3251>), Leon Saal [ctb] (ORCID: <https://orcid.org/0000-0002-3522-7729>) |
| Maintainer: | Rick Helmus <[email protected]> |
| License: | GPL-3 |
| Version: | 3.0.0 |
| Built: | 2026-05-22 18:45:04 UTC |
| Source: | https://github.com/rickhelmus/patRoon |
The following package options (see options) can be set:
patRoon.cache.mode: A character setting the current caching mode: "save" and
"load" will only save/load results to/from the cache, "both" (default) will do both and "none"
completely disables caching. This option can be changed at any time, which might be useful, for instance, to
temporarily disable cached results before running a function.
patRoon.cache.fileName: a character specifying the name of the cache file (default is
‘cache.sqlite’).
patRoon.cache.maxEntries: a numeric specifying the maximum number of entries per cache
category (default is 100000). When this limit is exceeded, the oldest entries are automatically removed.
patRoon.MS.backends, patRoon.MS.preferIMS, patRoon.path.TDFSDK:
Options related to the raw data interface.
patRoon.threads: The number of threads to be used for parallelization. This is currently only used by
the raw data interface and when the piek algorithm is used for peak
detection.
patRoon.MP.maxProcs: The maximum number of processes that should be initiated in parallel. A good
starting point is the number of physical cores, which is the default as detected by
detectCores. This option is only used when patRoon.MP.method="classic".
patRoon.MP.method: Either "classic" or "future". The former is the default and uses
processx to execute multiple commands in parallel. When set to "future", the future.apply
package is used for parallelization, which is especially useful for e.g. cluster computing.
patRoon.MP.futureSched: Sets the future.scheduling function argument for
future_lapply. Only used if patRoon.MP.method="future".
patRoon.MP.logPath: The path used for logging of output from commands executed by multiprocess. Set to
FALSE to disable logging.
patRoon.path.pwiz: The path in which the ProteoWizard binaries are installed. If unset an
attempt is made to find this directory from the Windows registry and PATH environment variable.
patRoon.path.GenForm: The path to the GenForm executable. If not set (the default) the
internal GenForm binary is used. Only set if you want to override the executable.
patRoon.path.MetFragCL: The complete file path to the MetFrag CL ‘jar’ to be used by
generateCompoundsMetFrag. Example: "C:/MetFrag2.4.2-CL.jar".
patRoon.path.MetFragCompTox: The complete file path to the CompTox database ‘csv’ file. See
generateCompounds for more details.
patRoon.path.MetFragPubChemLite: The complete file path to the PubChemLite database ‘csv’ file.
See generateCompounds for more details.
patRoon.path.SIRIUS: The directory in which the SIRIUS binaries are installed. Used by all functions that interface with SIRIUS, such as generateFormulasSIRIUS
and generateCompoundsSIRIUS. Example: "C:/sirius-win64-3.5.1". Note that the location of the
binaries differs for each operating system.
patRoon.path.OpenMS: The path in which the OpenMS binaries are installed.
patRoon.path.obabel: The path in which the OpenBabel binaries are installed.
patRoon.path.BiotransFormer: The full file path to the biotransformer ‘.jar’ command
line utility. This needs to be set when generateTPsBioTransformer is used. For more details see
https://bitbucket.org/djoumbou/biotransformer/src/master.
patRoon.path.limits: A path to a customized limits YAML file.
Most external dependencies are provided by patRoonExt or otherwise found in the system environment
PATH variable. However, the patRoon.path.* options should be set if this fails or you want to
override the location. The verifyDependencies function can be used to assess if dependencies are
found.
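The package options listed above are regular R options, so they can be set and restored with base R's options() and getOption(). The following is a minimal sketch; the chosen values (in particular the number of processes) are illustrative assumptions, not recommendations.

```r
# Set a few patRoon options and keep the previous values for later restoration.
# The values below are examples only (hypothetical choices).
old <- options(
    patRoon.cache.mode = "none",    # temporarily disable caching
    patRoon.MP.method = "classic",  # the documented default
    patRoon.MP.maxProcs = 2         # hypothetical: limit parallel processes
)

# Query an option; this works whether or not patRoon is loaded
cacheMode <- getOption("patRoon.cache.mode")
print(cacheMode)

# ... run the workflow step(s) that should not use the cache here ...

# Restore the previous values (options that were unset become unset again)
options(old)
```

Storing the return value of options() makes it straightforward to switch caching off for a single function call and switch it back afterwards.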
Objects from this class are used to specify adduct information in an algorithm independent way.
adduct(...)

## S4 method for signature 'adduct'
show(object)

## S4 method for signature 'adduct'
as.character(x, format = "generic", err = TRUE)
| Argument | Description |
|---|---|
| x, object | An |
| format | A |
| err | If |
| ... | Any of |
show(adduct): Shows summary information for this object.
as.character(adduct): Converts an adduct object to a specified
character format.
add, sub: A character with one or more formulas to add/subtract.
molMult: How many times the original molecule is present in this molecule (e.g. for a dimer this would be ‘2’). Default is ‘1’.
charge: The final charge of the adduct (default ‘1’).
as.adduct for easy creation of adduct objects
and adduct utilities for other adduct functionality.
adduct("H") # [M+H]+
adduct(sub = "H", charge = -1) # [M-H]-
adduct(add = "K", sub = "H2", charge = -1) # [M+K-H2]-
adduct(add = "H3", charge = 3) # [M+H3]3+
adduct(add = "H", molMult = 2) # [2M+H]+

as.character(adduct("H")) # returns "[M+H]+"
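To show how the add, sub, molMult and charge properties relate to an ion's m/z, here is a small base R sketch. The helper adductMz() is hypothetical and not part of patRoon. It assumes the masses of the added and subtracted formulas are given as neutral monoisotopic masses, and that the charge is obtained by removing or adding electrons.

```r
# Hypothetical helper (not a patRoon function): m/z of an adduct ion
ELECTRON_MASS <- 0.00054858  # Da

adductMz <- function(neutralMass, addMass = 0, subMass = 0, molMult = 1, charge = 1) {
    stopifnot(charge != 0)
    # neutral atoms are added/subtracted; the charge is set by electron loss/gain
    ionMass <- molMult * neutralMass + addMass - subMass - charge * ELECTRON_MASS
    ionMass / abs(charge)
}

H <- 1.00782503   # monoisotopic mass of hydrogen (Da)
M <- 44.02621474  # monoisotopic mass of C2H4O (Da)

mzPos <- adductMz(M, addMass = H, charge = 1)     # [M+H]+
mzNeg <- adductMz(M, subMass = H, charge = -1)    # [M-H]-
mzDimer <- adductMz(M, addMass = H, molMult = 2)  # [2M+H]+
print(c(mzPos, mzNeg, mzDimer))
```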
Several utility functions to work with adducts.
GenFormAdducts()

MetFragAdducts()

as.adduct(x, format = "generic", isPositive = NULL, charge = NULL, err = TRUE)

calculateIonFormula(formula, adduct)

calculateNeutralFormula(formula, adduct)
| Argument | Description |
|---|---|
| x | The object that should be converted. Should be a |
| format | A |
| isPositive | A logical that specifies whether the adduct should be positive. Should only be set when |
| charge | The final charge. Only needs to be set when |
| err | If |
| formula | A |
| adduct | An |
GenFormAdducts returns a table with information on adducts
supported by GenForm.
MetFragAdducts returns a table with information on adducts
supported by MetFrag.
as.adduct Converts an object into an adduct
object.
calculateIonFormula Converts one or more neutral formulae to
adduct ions.
calculateNeutralFormula Converts one or more adduct ions to
neutral formulae.
as.adduct("[M+H]+")
as.adduct("[M+H2]2+")
as.adduct("[2M+H]+")
as.adduct("[M-H]-")
as.adduct("+H", format = "genform")
as.adduct(1, isPositive = TRUE, format = "metfrag") # MetFrag adduct ID 1 --> returns [M+H]+

calculateIonFormula("C2H4O", "[M+H]+") # C2H5O
calculateNeutralFormula("C2H5O", "[M+H]+") # C2H4O
Properties for the sample analyses used in the workflow and utilities to automatically generate this information.
generateAnalysisInfo(
  fromRaw = NULL,
  fromCentroid = NULL,
  fromProfile = NULL,
  fromIMS = NULL,
  convCentroid = NULL,
  convProfile = NULL,
  convIMS = NULL,
  ...
)

generateAnalysisInfoFromEnviMass(path)
| Argument | Description |
|---|---|
| fromRaw, fromCentroid, fromProfile, fromIMS | One or more file paths that should be used for finding analyses that are stored as raw, centroided, profile or IMS data, respectively (see details below). Set to |
| convCentroid, convProfile, convIMS | These arguments specify the MS file conversion destination paths for centroided, profile and IMS data, respectively. These paths are used for those analyses for which no file with a particular file type could be found in the directories specified by the respective |
| ... | Any other columns that should be added to the analysis information table, such as |
| path | The path of the enviMass project. |
In patRoon a sample analysis, or simply analysis, refers to a single MS analysis file (sometimes
also called sample or file). The analysis information summarizes several properties for the
analyses, and is used in various steps throughout the workflow, such as findFeatures, averaging
intensities of feature groups and blank subtraction. The analysis information should be a data.frame or
data.table with a set of mandatory and optional columns (described below).
generateAnalysisInfo is a utility function that automatically generates analysis information. It
scans the given directories for analysis files, and uses these to automatically fill in the analysis and
path_* columns. This function automatically groups together analyses that are stored with different file
types and formats (see further details below).
generateAnalysisInfoFromEnviMass loads analysis information
from an enviMass project. Note: this functionality has only been
tested with older versions of enviMass.
generateAnalysisInfo returns a data.frame with automatically generated analysis
information.
The following columns should be present in the analysis information:
path_raw, path_centroid, path_profile, path_ims: Specifies the directory path for
the raw, centroided, profile and IMS data, respectively. See below for more details. At least one of these columns should not
be empty for each row.
analysis: The file name without extension and without directory path. Must be unique
across all table rows.
replicate: Name of the replicate. Used to group together analyses that are
replicates of each other. Thus, the replicate column for all analyses considered to belong to the same
replicate should have an equal (but unique) value. Used for e.g. averaging and
filter.
blank: All analyses within this replicate are used by the featureGroups method of
filter for blank subtraction. Multiple entries can be entered by
separation with a comma. May be empty ("") if no blank subtraction is desired.
Depending on the workflow step, different file types for the same analysis may be required.
raw: Specifies the directory containing raw HRMS files (e.g. ‘.raw’, ‘.d’). This is used by
e.g. conversion of raw MS data and the OpenTIMS backend.
centroid: Specifies the directory containing centroided and exported HRMS files (‘.mzML’, ‘.mzXML’).
These files are required by most feature finding algorithms.
profile: Specifies the directory containing exported but not centroided (i.e. profile) HRMS data files
(‘.mzML’, ‘.mzXML’). This is currently only used by findFeaturesSAFD.
ims: Specifies the directory containing exported IMS-HRMS data (‘.mzML’). This is required in IMS workflows,
unless raw IMS-HRMS data is directly loaded with the OpenTIMS backend. See e.g.
assignMobilities for more details.
Some workflows may require multiple file formats for the same file type. In this case, the files in the different formats
should be stored within the same directory specified by the respective path_* column. For instance, if
feature finding algorithms from OpenMS and enviPick are
mixed then centroided ‘.mzML’ and ‘.mzXML’ files are needed, and files with both file formats must be
stored in the directory specified by path_centroid.
If non-raw data files are not yet present and should be exported by MS file conversion, then
path_centroid, path_profile and path_ims should specify the desired destination paths of the
converted files.
The following columns may need to be present:
conc: A numeric value specifying the 'concentration' for the analysis. This can actually be any kind of
numeric value such as exposure time, dilution factor or anything else which may be used to form a linear
relationship. This is used by the as.data.table method if
regression=TRUE. As of patRoon version 3.0, any column other than "conc" can be used by setting
its name with the regression argument.
norm_conc: A numeric value specifying the normalization concentration for the analysis. See the
Feature intensity normalization section in the featureGroups documentation for
more details.
Any other columns that are present will be added to the features and featureGroups objects as
metadata. This metadata can be used e.g. in various plotting and data subsetting functions.
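As an illustration of the column layout described above, the following base R sketch builds a small analysis information table by hand and performs two sanity checks. The directory and analysis names are hypothetical, and the checks are only a rough approximation of the stated requirements (unique analysis names; blank entries referring to existing replicates), not patRoon's own validation.

```r
# Manually constructed analysis information (hypothetical paths and names)
anaInfo <- data.frame(
    path_centroid = "data/mzML",
    analysis = c("blank-1", "blank-2", "sample-1", "sample-2"),
    replicate = c("blank", "blank", "sample", "sample"),
    blank = c("", "", "blank", "blank"),  # "" means no blank subtraction
    conc = c(NA, NA, 1, 2),               # optional, e.g. for regression
    stringsAsFactors = FALSE
)

# analysis names must be unique across all rows
stopifnot(!anyDuplicated(anaInfo$analysis))

# the blank column may hold several comma separated replicate names
blanks <- trimws(unlist(strsplit(anaInfo$blank, ",", fixed = TRUE)))
blanks <- unique(blanks[nzchar(blanks)])
stopifnot(all(blanks %in% anaInfo$replicate))

print(anaInfo)
```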
Various parsing and plotting functions for the analysisInfo data.frame.
## S4 method for signature 'data.frame'
getTICs(obj, retentionRange = NULL, MSLevel = 1)

## S4 method for signature 'data.frame'
getBPCs(obj, retentionRange = NULL, MSLevel = 1)

## S4 method for signature 'data.frame'
plotTICs(
  obj,
  retentionRange = NULL,
  MSLevel = 1,
  retMin = FALSE,
  title = NULL,
  groupBy = NULL,
  showLegend = TRUE,
  xlim = NULL,
  ylim = NULL,
  ...
)

## S4 method for signature 'data.frame'
plotBPCs(
  obj,
  retentionRange = NULL,
  MSLevel = 1,
  retMin = FALSE,
  title = NULL,
  groupBy = NULL,
  showLegend = TRUE,
  xlim = NULL,
  ylim = NULL,
  ...
)

## S4 method for signature 'data.table'
plotTICs(
  obj,
  retentionRange = NULL,
  MSLevel = 1,
  retMin = FALSE,
  title = NULL,
  groupBy = NULL,
  showLegend = TRUE,
  xlim = NULL,
  ylim = NULL,
  ...
)

## S4 method for signature 'data.table'
plotBPCs(
  obj,
  retentionRange = NULL,
  MSLevel = 1,
  retMin = FALSE,
  title = NULL,
  groupBy = NULL,
  showLegend = TRUE,
  xlim = NULL,
  ylim = NULL,
  ...
)
| Argument | Description |
|---|---|
| obj | An |
| retentionRange | Range of retention times (in seconds). Should be a numeric vector of length two containing the min/max values. The maximum can be Inf to specify no maximum. Set to NULL to skip this step. |
| MSLevel | Integer vector with the MS levels (i.e., 1 for MS1 and 2 for MS2) for which to obtain traces. |
| retMin | Plot retention time in minutes (instead of seconds). |
| title | Character string used for title of the plot. If |
| groupBy | Specifies how results are grouped in the plot. Should be a name of a column in the analysis information table which is used to make analysis groups (e.g. |
| showLegend | Plot a legend if TRUE. |
| xlim, ylim | Sets the plot size limits used by |
| ... | Further arguments passed to |
getTICs(data.frame): Obtain the total ion chromatogram/s (TICs) of the analyses.
getBPCs(data.frame): Obtain the base peak chromatogram/s (BPCs) of the analyses.
plotTICs(data.frame): Plots the TICs of the analyses.
plotBPCs(data.frame): Plots the BPCs of the analyses.
plotBPCs(data.table): Plots the BPCs of the analyses.
The raw data interface of patRoon is used by these functions to process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported formats and available configuration options.
Ricardo Cunha ([email protected]) and Rick Helmus ([email protected])
Assigns ion mobility and CCS values to the candidates in a compounds object.
## S4 method for signature 'compounds'
assignMobilities(
  obj,
  fGroups,
  IMS = TRUE,
  from = NULL,
  matchFromBy = "InChIKey1",
  overwrite = FALSE,
  adduct = NULL,
  CCSParams = NULL,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  virtualenv = "patRoon-C3SDB"
)

## S4 method for signature 'compoundsSet'
assignMobilities(obj, fGroups, IMS = TRUE, from = NULL, ...)
| Argument | Description |
|---|---|
| obj | The |
| fGroups | The |
| IMS | (IMS workflow) Specifies which feature groups are considered for IMS assignments in IMS workflows. The following options are valid: |
| from, matchFromBy, overwrite, CCSParams, prefCalcChemProps, neutralChemProps, virtualenv | Passed to the method for suspects |
| adduct | An (sets workflow) The |
| ... | (sets workflow) Further arguments passed to the non-sets workflow method. |
The assignMobilities method for compounds is used to (1) add predicted or known IMS data to
annotation candidates and (2) convert (previously added) mobility <–> CCS values. Internally, the
assignMobilities method for suspects is used to perform these operations, please see
its documentation for more details.
If both adduct specific and non-adduct specific mobility and CCS data is available (and not NA),
then the non-adduct specific data is assigned to the compound candidate. Otherwise, data corresponding to the
adduct argument or the adduct assigned to the feature group is taken. The
filter method can be used to filter out candidates with deviating IMS data.
In a sets workflow the calculations are performed and stored independently for each set, as adducts, charges and m/z values typically differ.
The from argument may be a list that specifies the from values for each set. This is primarily
intended when tables are used to set from.
SIRIUS currently does not report InChIKey values; hence, matchFromBy="InChIKey" is
not supported in this case.
If compound annotation is performed with generateCompoundsMetFrag with database="pubchemlite"
then CCS data is already added, provided the local database has it. If you want to overwrite this data,
set overwrite=TRUE.
The assignMobilities method for suspects.
Various approaches to assign mobilities to features and perform CCS conversions.
## S4 method for signature 'featureGroups'
assignMobilities(
  obj,
  mobPeakParams = NULL,
  chromPeakParams = NULL,
  EIMParams = getDefEIMParams(),
  EICParams = getDefEICParams(),
  peakRTWindow = defaultLim("retention", "narrow"),
  fallbackEIC = TRUE,
  calcArea = "integrate",
  mobWindow = defaultLim("mobility", "medium"),
  scoreWeights = c(mobility = 1, intensity = 1),
  CCSParams = NULL,
  parallel = "maybe"
)

## S4 method for signature 'featureGroupsSet'
assignMobilities(
  obj,
  mobPeakParams = NULL,
  chromPeakParams = NULL,
  EIMParams = getDefEIMParams(),
  EICParams = getDefEICParams(),
  peakRTWindow = defaultLim("retention", "narrow"),
  fallbackEIC = TRUE,
  calcArea = "integrate",
  mobWindow = defaultLim("mobility", "medium"),
  scoreWeights = c(mobility = 1, intensity = 1),
  CCSParams = NULL,
  parallel = "maybe"
)

## S4 method for signature 'featureGroupsScreening'
assignMobilities(
  obj,
  mobPeakParams = NULL,
  chromPeakParams = NULL,
  EIMParams = getDefEIMParams(),
  EICParams = getDefEICParams(),
  peakRTWindow = defaultLim("retention", "narrow"),
  fallbackEIC = TRUE,
  calcArea = "integrate",
  mobWindow = defaultLim("mobility", "medium"),
  scoreWeights = c(mobility = 1, intensity = 1),
  CCSParams = NULL,
  parallel = "maybe",
  fromSuspects = FALSE,
  IMSMatchParams = NULL
)

## S4 method for signature 'featureGroupsScreeningSet'
assignMobilities(
  obj,
  mobPeakParams = NULL,
  chromPeakParams = NULL,
  EIMParams = getDefEIMParams(),
  EICParams = getDefEICParams(),
  peakRTWindow = defaultLim("retention", "narrow"),
  fallbackEIC = TRUE,
  calcArea = "integrate",
  mobWindow = defaultLim("mobility", "medium"),
  scoreWeights = c(mobility = 1, intensity = 1),
  CCSParams = NULL,
  parallel = "maybe",
  fromSuspects = FALSE,
  IMSMatchParams = NULL
)
| Argument | Description |
|---|---|
| obj | A |
| mobPeakParams | A |
| chromPeakParams | A |
| EIMParams, EICParams | Parameters to be used for the generation of EIMs (for mobility peak detection) and EICs (for IMS feature re-integration), generated by |
| peakRTWindow | The retention time tolerance (in seconds) for detected peaks to be used for re-integration (Step 3, see |
| fallbackEIC | Set to |
| calcArea | Controls how the area is calculated when updating from raw EIC data (see |
| mobWindow | The mobility tolerance window. |
| scoreWeights | A |
| CCSParams | A |
| parallel | If set to Alternatively, set |
| fromSuspects | If |
| IMSMatchParams | (IMS workflow) A |
The assignMobilities method for features is used to (1) assign ion mobility values and (2) calculate
CCS values from these mobilities.
In patRoon, two approaches are supported to assign mobilities to features: direct and post
mobility assignment (DMA and PMA). With DMA, the mobility values are directly assigned during
feature detection. This is currently only supported by the piek and
greedy algorithms or by importing feature data. With
PMA, the mobility values are assigned after feature detection and grouping (and possibly other steps such
as filtering). Thus, the PMA approach is supported by all available feature detection algorithms in
patRoon. The PMA approach is further described below. Only the CCS conversion functionality
of assignMobilities should be used in DMA workflows, since the mobilities are then already assigned during feature detection.
The assignment of CCS values is controlled by the CCSParams argument (see above).
In suspect screening workflows assignMobilities also assigns reference mobility and CCS
values to suspect hits, and can filter hits if IMSMatchParams is set. This is performed similarly to
screenSuspects; please see its documentation for more details.
The post assignment of mobilities occurs in the following steps:
1. Extracted ion mobilograms (EIMs) are generated for all features and subjected to automatic peak detection to obtain mobility peaks.
2. The detected mobility peaks in an EIM are then used to form IMS features. These features inherit their LC-MS properties (RT, m/z, etc) from the corresponding IMS precursor, i.e. the feature for which the EIM was created. The mobility peak centroids and ranges are used to assign IMS data to the IMS features. Multiple mobility peaks within the same EIM result in multiple IMS features, each of which is linked to the same IMS precursor. The linkage is especially useful to keep a relation between e.g. protomers.
3. LC-MS properties such as the area, intensity and RTs are (optionally) updated by re-integration of detected
peaks from mobility filtered extracted ion chromatograms. If peak detection is disabled or fails, then the
intensity and areas can be estimated directly from raw EIC data (fallbackEIC argument).
4. Any IMS features that could not be re-integrated (either by peak detection or EIC fallback) are removed.
5. The feature grouping is updated: the IMS features with close mobilities (defined by mobWindow)
within a feature group are split off into new feature groups and linked to the original IMS precursor
feature group. This is performed by the greedy grouping algorithm. LC-MS properties
and most other data such as feature group qualities and scores (calculatePeakQualities), adduct
annotations (e.g. selectIons), predicted concentrations and toxicities
(calculateConcs and calculateTox) and internal standards for intensity normalization
(normInts) are copied from the IMS precursors to the IMS feature groups.
Note that re-running assignMobilities will first remove any existing IMS features.
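The first two steps of the post mobility assignment can be pictured with a toy example in base R. This is only a conceptual sketch under strong assumptions: the mobilogram is synthetic, the peak detection is a simple local-maximum search (not the peak detection algorithms used by patRoon), and the column names of the resulting table are hypothetical.

```r
# Synthetic EIM with two mobility peaks (hypothetical data, e.g. two protomers)
mob <- seq(0.5, 1.5, by = 0.01)
int <- 1000 * exp(-(mob - 0.8)^2 / (2 * 0.02^2)) +
    600 * exp(-(mob - 1.1)^2 / (2 * 0.02^2))

# Step 1 (toy version): mobility peaks = local maxima above an intensity threshold
isApex <- c(FALSE, diff(sign(diff(int))) == -2, FALSE) & int > 100
peakMob <- mob[isApex]

# Step 2: every mobility peak becomes an IMS feature that inherits the LC-MS
# properties of its IMS precursor and keeps a link to it
precursor <- data.frame(ID = "feat-1", ret = 312.4, mz = 301.141)
imsFeatures <- data.frame(
    ID = paste0(precursor$ID, "_ims", seq_along(peakMob)),
    ret = precursor$ret,
    mz = precursor$mz,
    mobility = peakMob,
    ims_precursor = precursor$ID
)
print(imsFeatures)
```

Both IMS features share the retention time and m/z of the precursor and differ only in mobility, which is the relation the precursor link preserves.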
In suspect screening workflows the
fromSuspects argument can be set to alternatively perform mobility assignment directly from the suspect
list data (replacing Steps 1-2). The feature mobility is simply assigned from the suspect data and the mobility
range is derived from the mobWindow argument. Relationships with IMS precursors (Step 2) are similarly formed.
An advantage of this approach is that no mobility peak detection is needed, which may be useful for low intensity
features where this could be difficult. Setting fromSuspects=TRUE is primarily intended for workflows where
(1) the mobility of a suspect is accurately known upfront or (2) IMS data should only be used as a rough filtering
step for feature data. In the latter case accurate feature mobility assignment is not of interest and the suspect
IMS data is typically not accurately known (e.g. predicted); hence, for these workflows the tolerance
specified by mobWindow should be increased.
With fromSuspects=TRUE no mobility peak detection is performed, hence, the actual presence of the feature is
only verified in Step 3. For this reason, falling back to EIC data (fallbackEIC argument) is never performed
for IMS features from suspects, and chromPeakParams must always be defined to allow chromatographic
peak detection.
If both fromSuspects and mobPeakParams are set, regular mobility assignment (Steps 1-2) is performed
for features without a suspect hit. If multiple suspects were assigned to a feature group then suspect data is
never used to form IMS features.
The features (and feature groups) with IMS properties are referred to as IMS features (and IMS feature groups). These are referred to as orphans if their link to the original IMS precursor is removed (in post workflows, see previous section) or non-existent (direct workflows). The formation of orphans in post workflows typically occurs by removal of IMS precursors by subsetting or filtering operations.
Most data-processing functionality, such as subsetting, plotting, filtering, etc., can selectively operate
on the IMS features, on their IMS precursors, on both, or on either the precursors or orphans (controlled by the
IMS argument to the corresponding functions).
The raw data interface of patRoon is used by assignMobilities to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
Mobility assignment in a sets workflow is equivalent to non-sets workflows. However, currently there is no (known) way to relate mobilities across MS polarities (+/- mode). Thus, IMS features will always be formed for each set separately. However, IMS precursors will still be grouped across polarities (like a non-IMS workflow) and their links to IMS features can therefore be used to relate IMS features across MS polarities.
Adds calculated mobility and/or CCS data to a suspect list.
## S4 method for signature 'data.table'
assignMobilities(
  obj,
  from = NULL,
  matchFromBy = "InChIKey1",
  overwrite = FALSE,
  adducts = c("[M+H]+", "[M-H]-", NA),
  predictAdductOnly = TRUE,
  CCSParams = NULL,
  prepareChemProps = TRUE,
  prefCalcChemProps = TRUE,
  neutralChemProps = FALSE,
  virtualenv = "patRoon-C3SDB"
)

## S4 method for signature 'data.frame'
assignMobilities(obj, ...)
| Argument | Description |
|---|---|
| obj | The suspect list to which the mobility and/or CCS data should be added. Should be a |
| from | Specifies from where IMS data is added to the suspect list. This can be the following: Any |
| matchFromBy | Which column should be used to match the IMS data from Matching by |
| overwrite | Set to |
| adducts | A The value for |
| predictAdductOnly | If |
| CCSParams | A |
| prepareChemProps | Set to |
| prefCalcChemProps | If |
| neutralChemProps | If |
| virtualenv | The virtual |
| ... | Arguments passed to |
The assignMobilities method for suspect lists is used to (1) add IMS data to suspects from predictions or
library data and (2) convert (previously added) mobility <–> CCS values. These steps are controlled by the
from and CCSParams arguments, respectively.
Mobility and CCS values assigned in the suspect list are either adduct specific or not. Adduct specific
values are preferred, as the 'correct' value can be automatically selected during suspect screening based on the
adduct assigned to the feature (or passed as the adduct argument to screenSuspects). The
non-adduct specific values are typically used when the corresponding adduct for the mobility/CCS value is
unknown (or not of interest). These values get precedence over adduct specific values. The adduct specific values are
stored in mobility_<adduct> and CCS_<adduct> columns, where <adduct> is the adduct name
(e.g. [M+H]+, [M-H]-). The mobility and CCS columns store any non-adduct
specific values. The adducts argument ultimately defines the use of adduct and non-adduct specific values.
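The column layout and precedence described above can be illustrated with a small base R sketch. The suspect table, its values and the `pickMobility()` helper are hypothetical; they only mimic the documented naming scheme and are not part of patRoon.

```r
# Hypothetical suspect list: one non-adduct specific column ("mobility") and
# two adduct specific columns following the mobility_<adduct> naming scheme.
suspects <- data.frame(
    name = c("A", "B"),
    mobility = c(NA, 0.95),
    `mobility_[M+H]+` = c(0.80, 0.90),
    `mobility_[M-H]-` = c(0.78, NA),
    check.names = FALSE  # keep the brackets and signs in the column names
)

# Illustrative helper (an assumption, not a patRoon function): use the
# non-adduct specific value when present, else the adduct specific one.
pickMobility <- function(susp, adduct) {
    specific <- susp[[paste0("mobility_", adduct)]]
    if (is.null(specific)) {
        specific <- rep(NA_real_, nrow(susp))  # no column for this adduct
    }
    ifelse(is.na(susp$mobility), specific, susp$mobility)
}

posValues <- pickMobility(suspects, "[M+H]+")
negValues <- pickMobility(suspects, "[M-H]-")
otherValues <- pickMobility(suspects, "[M+Na]+")
```

Suspect B always resolves to its non-adduct specific value, whereas suspect A falls back to the column of the requested adduct.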
The mobility <–> CCS conversions occur both ways, i.e. missing CCS values will be
converted from mobility values and vice versa. If adduct specific values are converted then the charge value
used for these calculations is taken from the corresponding adduct. For non-adduct specific values the charge is
taken from the adduct specified in the suspect list if present, or from the default charge specified in CCSParams
otherwise.
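How the conversion is actually performed depends on CCSParams, which is not detailed in this section. Purely as an illustration of a two-way conversion that depends on the charge, the sketch below uses the Mason-Schamp equation, a commonly used relation between inverse reduced mobility (1/K0) and CCS. The function names, the default temperature (305 K), the drift gas mass (nitrogen, 28.0134 Da) and the approximation of the ion mass as m/z times charge are all assumptions of this sketch, not patRoon settings.

```r
# Mason-Schamp constant for CCS in square Angstrom, 1/K0 in Vs/cm2,
# masses in Da and temperature in K.
MS_CONST <- 18509.8632163405

# Reduced mass of the ion/drift gas pair (Da).
reducedMass <- function(mz, charge, gasMass) {
    ionMass <- mz * abs(charge)  # approximation: ignores adduct/electron mass
    ionMass * gasMass / (ionMass + gasMass)
}

# Inverse reduced mobility -> CCS
invK0ToCCS <- function(invK0, mz, charge = 1, temp = 305, gasMass = 28.0134) {
    MS_CONST * abs(charge) * invK0 / sqrt(reducedMass(mz, charge, gasMass) * temp)
}

# CCS -> inverse reduced mobility (algebraic inverse of the function above)
ccsToInvK0 <- function(ccs, mz, charge = 1, temp = 305, gasMass = 28.0134) {
    ccs * sqrt(reducedMass(mz, charge, gasMass) * temp) / (MS_CONST * abs(charge))
}

invK0 <- c(0.8, 1.0, 1.2)
mz <- c(300, 500, 700)
ccs <- invK0ToCCS(invK0, mz)
back <- ccsToInvK0(ccs, mz)  # round trip should reproduce invK0
```

Because the charge enters both directions, a wrongly assumed charge scales the converted value proportionally, which is why the charge source (adduct or default) matters.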
Chemical properties such as SMILES,
InChIKey and formulae in the suspect data (if prepareChemProps=TRUE) and from data (if a table) are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
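The effect of prefCalcChemProps in the steps above can be sketched in base R. The `mergeChemProp()` helper and the example formulae are hypothetical and only mimic the documented rule: calculated values fill gaps when the option is FALSE and replace existing values whenever available when it is TRUE.

```r
# Illustrative helper (not a patRoon function): combine existing and
# calculated property values according to the prefCalcChemProps rule.
mergeChemProp <- function(existing, calculated, prefCalc) {
    if (prefCalc) {
        # prefer calculated values whenever they could be calculated
        ifelse(!is.na(calculated), calculated, existing)
    } else {
        # only fill in values that are missing
        ifelse(is.na(existing), calculated, existing)
    }
}

existing <- c("C6H5", NA, "C2H6O")    # first entry deliberately deviates
calculated <- c("C6H6", "CH4", NA)    # third entry could not be calculated

preferCalculated <- mergeChemProp(existing, calculated, prefCalc = TRUE)
fillMissingOnly <- mergeChemProp(existing, calculated, prefCalc = FALSE)
```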
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Schymanski EL, Kondić T, Neumann S, Thiessen PA, Zhang J, Bolton EE (2021).
“Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag.”
Journal of Cheminformatics, 13(1).
ISSN 1758-2946.
doi:10.1186/s13321-021-00489-0.
http://dx.doi.org/10.1186/s13321-021-00489-0.
Elapavalore A, Ross DH, Grouès V, Aurich D, Krinsky AM, Kim S, Thiessen PA, Zhang J, Dodds JN, Baker ES, Bolton EE, Xu L, Schymanski EL (2025).
“PubChemLite Plus Collision Cross Section (CCS) Values for Enhanced Interpretation of Nontarget Environmental Data.”
Environmental Science & Technology Letters, 12(2), 166–174.
ISSN 2328-8930.
doi:10.1021/acs.estlett.4c01003.
http://dx.doi.org/10.1021/acs.estlett.4c01003.
Ross DH, Cho JH, Xu L (2020).
“Breaking Down Structural Diversity for Comprehensive Prediction of Ion-Neutral Collision Cross Sections.”
Analytical Chemistry, 92(6), 4548–4557.
ISSN 1520-6882.
doi:10.1021/acs.analchem.9b05772.
http://dx.doi.org/10.1021/acs.analchem.9b05772.
Miscellaneous utility functions which interface with Bruker DataAnalysis
showDataAnalysis()

setDAMethod(anaInfo, method, close = TRUE)

revertDAAnalyses(anaInfo, close = TRUE, save = close)

recalibrarateDAFiles(anaInfo, close = TRUE, save = close)

getDACalibrationError(anaInfo)

addDAEIC(
  analysis,
  path,
  mz,
  mzWindow = defaultLim("mz", "medium"),
  ctype = "EIC",
  mtype = "MS",
  polarity = "both",
  bgsubtr = FALSE,
  fragpath = "",
  name = NULL,
  hideDA = TRUE,
  close = FALSE,
  save = close
)

addAllDAEICs(
  fGroups,
  mzWindow = defaultLim("mz", "medium"),
  ctype = "EIC",
  bgsubtr = FALSE,
  name = TRUE,
  onlyPresent = TRUE,
  hideDA = TRUE,
  close = FALSE,
  save = close
)
anaInfo |
|
method |
The full path of the DataAnalysis method. |
close, save
|
If |
analysis |
Analysis name (without file extension). |
path |
path of the analysis. |
mz |
m/z (Da) value used for the chromatographic trace (if applicable). |
mzWindow |
m/z window (in Da) used for the chromatographic trace (if applicable). |
ctype |
Type of the chromatographic trace. Valid options are:
|
mtype |
MS filter for chromatographic trace. Valid values are:
|
polarity |
Polarity filter for chromatographic trace. Valid values:
|
bgsubtr |
If |
fragpath |
Precursor m/z used for MS/MS traces ( |
name |
For |
hideDA |
Hides DataAnalysis while adding the chromatographic trace (faster). |
fGroups |
The |
onlyPresent |
If |
These functions communicate directly with Bruker DataAnalysis to provide various functionality, such as calibrating and exporting data and adding chromatographic traces. This requires the RDCOMClient package to be installed.
showDataAnalysis makes a hidden DataAnalysis window visible
again. Most functions using DataAnalysis will hide the window during
processing for efficiency reasons. If the window remains hidden
(e.g. because there was an error) this function can be used to make
it visible again. This function can also be used to start DataAnalysis if
it is not running yet.
setDAMethod Sets a given DataAnalysis method (‘.m’ file)
to a set of analyses. NOTE: as a workaround for a bug in
DataAnalysis, this function will save(!), close and re-open any analyses
that are already open prior to setting the new method. The close
argument only controls whether the file should be closed after setting the
method (files are always saved).
revertDAAnalyses Reverts a given set of analyses to their
unprocessed raw state.
recalibrarateDAFiles Performs automatic mass recalibration of
a given set of analyses. The current method settings for each analysis will
be used.
getDACalibrationError is used to obtain the standard
deviation of the current mass calibration (in ppm).
addDAEIC adds an Extracted Ion Chromatogram (EIC) or other
chromatographic trace to a given analysis which can be used directly with
DataAnalysis.
addAllDAEICs adds Extracted Ion Chromatograms (EICs) for all
features within a featureGroups object.
getDACalibrationError returns a data.frame with a
column of all analyses (named analysis) and their mass error (named
error).
Returns the adducts supported by the C3SDB Python package.
C3SDBAdducts()
Several utility functions for caching workflow data. The most important function is clearCache; other
functions are primarily for internal use.
makeHash(..., checkDT = TRUE)

makeFileHash(..., length = Inf)

loadCacheData(category, hashes, dbArg = NULL, simplify = TRUE, fixDTs = TRUE)

saveCacheData(category, data, hash, dbArg = NULL)

clearCache(what = NULL, file = NULL, vacuum = TRUE)
... |
Arguments/objects to be used for hashing. |
checkDT |
|
length |
Maximum file length to hash. Passed to |
category |
The category of the object to be cached. |
hashes |
A |
dbArg |
Alternative connection to database. Default is |
simplify |
If |
fixDTs |
Should be |
data |
The object to be cached. |
hash |
The hash string of the object to be cached (e.g. obtained with |
what |
This argument describes what should be done. When |
file |
The cache file. If |
vacuum |
If |
makeHash makes a hash string of the given arguments.
makeFileHash generates a hash from the contents of one or more files.
loadCacheData loads cached data from a database.
saveCacheData caches data in a database.
clearCache will either remove one or more tables within the cache sqlite database or simply
wipe the whole cache file. Removing tables will VACUUM the database (unless vacuum=FALSE), which may
take some time for large cache files.
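The general pattern behind these functions (hash the inputs, look up the hash, compute and store on a miss) can be sketched in base R. This is not patRoon's implementation: the in-memory environment stands in for the sqlite database, and `hashObject()`, `loadCached()`, `saveCached()` and `slowSquare()` are hypothetical names used only for this illustration.

```r
cacheEnv <- new.env()  # stand-in for the cache database

# Hash arbitrary R objects by serializing them to a temporary file.
hashObject <- function(...) {
    tmp <- tempfile()
    on.exit(unlink(tmp))
    saveRDS(list(...), tmp, compress = FALSE)
    unname(tools::md5sum(tmp))
}

# Look up / store data by category and hash; a miss yields NULL.
loadCached <- function(category, hash) {
    cacheEnv[[paste(category, hash, sep = "/")]]
}
saveCached <- function(category, data, hash) {
    assign(paste(category, hash, sep = "/"), data, envir = cacheEnv)
}

nCalls <- 0  # counts how often the 'expensive' calculation really runs
slowSquare <- function(x) {
    hash <- hashObject("slowSquare", x)
    cached <- loadCached("demo", hash)
    if (!is.null(cached)) {
        return(cached)  # cache hit: skip the calculation
    }
    nCalls <<- nCalls + 1
    result <- x^2
    saveCached("demo", result, hash)
    result
}

r1 <- slowSquare(1:5)  # computed
r2 <- slowSquare(1:5)  # loaded from the cache
r3 <- slowSquare(1:6)  # different input, so a different hash: computed
```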
Utility functions to convert between mobility and CCS data.
convertMobilityToCCS(mobility, mz, CCSParams, charge = NULL)

convertCCSToMobility(ccs, mz, CCSParams, charge = NULL)
mobility, ccs
|
A |
mz |
A |
CCSParams |
A |
charge |
A |
These functions provide interactive utilities to explore and review workflow data using a shiny graphical user interface (GUI). In addition, unsatisfactory data (e.g. noise identified as a feature and unrelated feature groups in a component) can easily be selected for removal.
checkFeatures(
  fGroups,
  session = "checked-features.yml",
  EICParams = getDefEICParams(),
  EIMParams = getDefEIMParams(),
  clearSession = FALSE
)

checkComponents(
  components,
  fGroups,
  session = "checked-components.yml",
  EICParams = getDefEICParams(),
  clearSession = FALSE
)

## S4 method for signature 'components'
checkComponents(
  components,
  fGroups,
  session = "checked-components.yml",
  EICParams = getDefEICParams(),
  clearSession = FALSE
)

importCheckFeaturesSession(
  sessionIn,
  sessionOut,
  fGroups,
  rtWindow = defaultLim("retention", "narrow"),
  mzWindow = defaultLim("mz", "narrow"),
  mobWindow = defaultLim("mobility", "narrow"),
  overwrite = FALSE
)

## S4 method for signature 'featureGroups'
checkFeatures(
  fGroups,
  session = "checked-features.yml",
  EICParams = getDefEICParams(),
  EIMParams = getDefEIMParams(),
  clearSession = FALSE
)

getMCTrainData(fGroups, session)

predictCheckFeaturesSession(fGroups, session, model = NULL, overwrite = FALSE)
fGroups |
A This should be the 'new' object for |
session |
The session file name. |
EICParams |
A named |
EIMParams |
A named |
clearSession |
If |
components |
The |
sessionIn, sessionOut
|
The file names for the input and output sessions. |
rtWindow, mzWindow, mobWindow
|
The retention time (seconds), m/z and mobility window (if present) used to relate 'old' with 'new' feature groups. |
overwrite |
Set to |
model |
The model that was created with MetaClean and that should be used to predict pass/fail data. If
|
The data selected for removal is stored in sessions. These are ‘YAML’ files to allow easy external
manipulation. The sessions can be used to restore the selections that were made for data removal when the GUI tool is
executed again. Furthermore, functionality is provided to import and export sessions. To actually remove the data the
filter method should be used with the session file as input.
checkComponents is used to review components and their feature groups contained within. A typical use
case is to verify that peaks from features that were annotated as related adducts and/or isotopes are correctly
aligned.
importCheckFeaturesSession is used to import a session file that was generated from a different
featureGroups object. This is useful to avoid re-doing manual interpretation of chromatographic peaks
when, for instance, feature group data is re-created with different parameters.
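Relating 'old' to 'new' feature groups by tolerance windows can be sketched as follows. The data, the window values (6 seconds and 0.002 Da) and the `matchGroups()` helper are hypothetical, and picking the candidate closest in retention time when several fall inside the windows is an assumption of this sketch rather than documented behaviour.

```r
# Hypothetical feature group summaries from two workflow runs.
oldGroups <- data.frame(
    group = c("old1", "old2", "old3"),
    ret = c(120, 300, 455),               # retention time (s)
    mz = c(150.0510, 275.1203, 401.2000),
    stringsAsFactors = FALSE
)
newGroups <- data.frame(
    group = c("new1", "new2", "new3"),
    ret = c(121.5, 302, 600),
    mz = c(150.0512, 275.1201, 401.2001),
    stringsAsFactors = FALSE
)

# For each old group, find a new group within both windows; NA if none.
matchGroups <- function(old, new, rtWindow, mzWindow) {
    vapply(seq_len(nrow(old)), function(i) {
        rtDiff <- abs(new$ret - old$ret[i])
        hits <- which(rtDiff <= rtWindow & abs(new$mz - old$mz[i]) <= mzWindow)
        if (length(hits) == 0L) {
            return(NA_character_)
        }
        new$group[hits[which.min(rtDiff[hits])]]  # closest in retention time
    }, character(1))
}

matched <- matchGroups(oldGroups, newGroups, rtWindow = 6, mzWindow = 0.002)
```

Here `old3` stays unmatched: its m/z agrees with `new3`, but the retention times differ by far more than the window.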
checkFeatures is used to review chromatographic information for feature groups. Its main purpose is
to assist in reviewing the quality of detected feature (groups) and easily select unwanted data such as features
with poor peak shapes or noise.
getMCTrainData converts a session created by checkFeatures to a data.frame that can be
used by MetaClean to train a new model. The output format is comparable to that from
getPeakQualityMetrics.
predictCheckFeaturesSession uses ML data from MetaClean to predict the quality (Pass/Fail) of
feature group data, and converts this to a session which can be reviewed with checkFeatures and used to
remove unwanted feature groups by filter.
A data.frame with the class predictions and the associated probabilities for each EIC, as returned by the MetaClean::getPredicitons function.
The data.frame has four columns: EIC, Pred_Class, Pred_Prob_Pass, Pred_Prob_Fail.
The raw data interface of patRoon is used by these functions to process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported formats and available configuration options.
The topMost and topMostByReplicate EIC parameters are ignored.
checkComponents: Some componentization algorithms (e.g. generateComponentsNontarget
and generateComponentsTPs) may output components where the same feature group in a component is
present multiple times, for instance, when multiple TPs are matched to the same feature group. If such a feature
group is selected for removal, then all of its results in the component will be marked for removal.
getMCTrainData only uses session data for selected feature groups. Selected features for removal are
ignored, as this is not supported by MetaClean.
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
Parameters for clustering data such as mass spectra and mobilograms.
Different functionality within patRoon uses clustering to group similar data together, for instance, to average
mass spectra. A fast C++ backend based on Rcpp is used to perform the clustering.
The clustering can be configured by the method and window parameters. The following clustering methods
are available:
"hclust": uses hierarchical clustering to find similar data points (using
hclust-cpp, which is based on the fastcluster package).
"distance_point": uses a maximum distance between adjacent sorted data points to form clusters.
"distance_mean": uses a maximum distance between the mean of the current cluster and the next sorted
data point to form clusters.
"bin": uses a simple binning approach to cluster data points.
The hclust method may give more accurate results and was the default prior to patRoon 3.0, but is more
computationally demanding and generally unsuitable for IMS workflows due to excessive use of RAM. The
distance_* methods are now default and suit most cases.
The window parameter defines the clustering tolerance. For method="hclust" this corresponds to the
cluster height, for the "distance_*" methods this value sets the maximum distance between compared data and
for method="bin" it corresponds to the bin width. Too small windows will prevent clustering close data points
(e.g. resulting in split mass peaks in averaged spectra), whereas too big windows may cluster unrelated data
points together (e.g. resulting in mass inaccuracies).
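The three non-hierarchical methods can be sketched in base R to show how the same window leads to different clusters. These functions are simplified illustrations written for this text (patRoon uses its own C++ backend), and the function names and example m/z values are hypothetical.

```r
# "distance_point": start a new cluster when the gap between adjacent
# sorted points exceeds the window.
clusterDistancePoint <- function(x, window) {
    ord <- order(x)
    ids <- cumsum(c(TRUE, diff(x[ord]) > window))
    out <- integer(length(x))
    out[ord] <- ids
    out
}

# "distance_mean": start a new cluster when the next sorted point is
# further than the window from the mean of the current cluster.
clusterDistanceMean <- function(x, window) {
    ord <- order(x)
    xs <- x[ord]
    ids <- integer(length(xs))
    cl <- 0L; clSum <- 0; clN <- 0L
    for (i in seq_along(xs)) {
        if (clN == 0L || abs(xs[i] - clSum / clN) > window) {
            cl <- cl + 1L; clSum <- 0; clN <- 0L
        }
        clSum <- clSum + xs[i]
        clN <- clN + 1L
        ids[i] <- cl
    }
    out <- integer(length(x))
    out[ord] <- ids
    out
}

# "bin": assign points to fixed-width bins starting at the minimum value.
clusterBin <- function(x, window) {
    bins <- floor((x - min(x)) / window)
    match(bins, sort(unique(bins)))
}

mz <- c(100.000, 100.002, 100.004, 100.006, 200.000)
byPoint <- clusterDistancePoint(mz, window = 0.0025)
byMean <- clusterDistanceMean(mz, window = 0.0025)
byBin <- clusterBin(mz, window = 0.0025)
```

With these values the point-based method chains the four closely spaced masses into one cluster, the mean-based method splits them in two, and binning splits them in three, which illustrates why the window should be tuned per method.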
Averaging of mass spectra was originally based on algorithms from the msProcess R package (now archived on CRAN).
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
Müllner D (2013).
“fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python.”
Journal of Statistical Software, 53(9), 1–18.
doi:10.18637/jss.v053.i09.
Contains data for feature groups that are related in some way. These components commonly include adducts, isotopes and homologues.
componentTable(obj)

componentInfo(obj)

findFGroup(obj, fGroup)

expandForIMS(obj, ...)

## S4 method for signature 'components'
componentTable(obj)

## S4 method for signature 'components'
componentInfo(obj)

## S4 method for signature 'components'
groupNames(obj)

## S4 method for signature 'components'
length(x)

## S4 method for signature 'components'
names(x)

## S4 method for signature 'components'
show(object)

## S4 method for signature 'components,ANY,ANY,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'components,ANY,ANY'
x[[i, j]]

## S4 method for signature 'components'
x$name

## S4 method for signature 'components'
delete(obj, i = NULL, j = NULL, ...)

## S4 method for signature 'components'
as.data.table(x)

## S4 method for signature 'components'
expandForIMS(obj, fGroups)

## S4 method for signature 'components'
filter(
  obj,
  size = NULL,
  adducts = NULL,
  isotopes = NULL,
  rtIncrement = NULL,
  mzIncrement = NULL,
  checkComponentsSession = NULL,
  negate = FALSE,
  verbose = TRUE
)

## S4 method for signature 'components'
findFGroup(obj, fGroup)

## S4 method for signature 'components'
plotSpectrum(obj, index, markFGroup = NULL, xlim = NULL, ylim = NULL, ...)

## S4 method for signature 'components'
plotChroms(obj, index, fGroups, EICParams = getDefEICParams(window = 5), ...)

## S4 method for signature 'components'
consensus(obj, ...)

## S4 method for signature 'componentsCamera'
expandForIMS(obj, ...)

## S4 method for signature 'componentsFeatures'
show(object)

## S4 method for signature 'componentsCliqueMS'
expandForIMS(obj, ...)

## S4 method for signature 'componentsSet'
show(object)

## S4 method for signature 'componentsSet,ANY,ANY,missing'
x[i, j, ..., sets = NULL, drop = TRUE]

## S4 method for signature 'componentsSet'
filter(obj, ..., negate = FALSE, sets = NULL)

## S4 method for signature 'componentsSet'
delete(obj, i = NULL, j = NULL, ...)

## S4 method for signature 'componentsSet'
consensus(obj, ...)

## S4 method for signature 'componentsSet'
expandForIMS(obj, fGroups)

## S4 method for signature 'componentsSet'
unset(obj, set)

## S4 method for signature 'componentsNT'
expandForIMS(obj, ...)

## S4 method for signature 'componentsNTSet'
expandForIMS(obj, ...)

## S4 method for signature 'componentsOpenMS'
expandForIMS(obj, ...)

## S4 method for signature 'componentsRC'
expandForIMS(obj, ...)
obj, object, x
|
The |
fGroup |
The name (thus a character) of the feature group that should be searched for. |
... |
For For For For For sets workflow methods: further arguments passed to the base |
i, j
|
For |
drop |
ignored. |
name |
The component name (partially matched). |
fGroups |
The For |
size |
Should be a two sized vector with the minimum/maximum size of a component. Set to |
adducts |
Remove any feature groups within components that do not match given adduct rules. If |
isotopes |
Only keep results that match a given isotope rule. If |
rtIncrement, mzIncrement
|
Should be a two sized vector with the minimum/maximum retention or mz increment of a
homologous series. Set to |
checkComponentsSession |
If set then components and/or feature groups are removed that were selected for removal
(see check-GUI and the |
negate |
If |
verbose |
If set to |
index |
The index of the component. Can be a numeric index or a character with its name. |
markFGroup |
If specified (i.e. not |
xlim, ylim
|
Sets the plot size limits used by
|
EICParams |
A named |
sets |
(sets workflow) A |
set |
(sets workflow) The name of the set. |
components objects are obtained from generateComponents.
delete returns the object for which the specified data was removed.
consensus returns a components object that is produced
by merging multiple specified components objects.
componentTable(components): Accessor method for the components slot of a
components class. Each component is stored as a
data.table.
componentInfo(components): Accessor method for the componentInfo slot of a
components class.
groupNames(components): returns a character vector with the names of the
feature groups for which data is present in this object.
length(components): Obtain total number of components.
names(components): Obtain the names of all components.
show(components): Show summary information for this object.
[: Subset on components/feature groups.
[[: Extracts a component table, optionally filtered by a feature group.
$: Extracts a component table by component name.
delete(components): Completely deletes specified (parts of) components.
as.data.table(components): Returns all component data in a table.
expandForIMS(components): Expands the components data for IMS feature groups. See the IMS expansion section
below.
filter(components): Provides rule based filtering for components.
findFGroup(components): Returns the component id(s) to which a feature group
belongs.
plotSpectrum(components): Plot a pseudo mass spectrum for a single
component.
plotChroms(components): Plot an extracted ion chromatogram (EIC) for all feature groups within a single component.
consensus(components): Generates a consensus from multiple components
objects. At this point results are simply combined and no attempt is made to
merge similar components.
components: List of all components in this object. Use the componentTable method for access.
componentInfo: A data.table containing general information for each component. Use the
componentInfo method for access.
In IMS workflows with post mobility assignment (see
assignMobilities), specifically when the assignment occurs after
generation of the components, it may be desired to copy the results of the IMS precursors to the IMS feature
groups. For instance, for components from intensity clusters or
TPs one could assume that results for IMS feature groups will be largely the
same as their IMS precursors. The expandForIMS method function is used to expand the original
components object by adding in IMS feature groups with data copied from their IMS precursors.
Currently, only components generated with generateComponentsIntClust,
generateComponentsSpecClust and generateComponentsTPs support this operation.
The componentsSet class is applicable for sets workflows. This class is derived from components and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset Converts the object data for a specified set into a 'non-set' object (componentsUnset), which allows it to be used in 'regular' workflows. Only the components in the specified set are kept.
The following methods are changed or have new functionality:
filter and the subset operator ([) Can be used to select components that are only present for
selected sets.
filter Applies only those filters for which a component has data available. For instance, filtering by
adduct will only filter any results within a component if that component contains adduct information.
For plotChroms: The topMost and topMostByReplicate EIC parameters are ignored unless the
components are from homologous series.
This base class is derived from components and is used to store components resulting from hierarchical
clustering information, for instance, generated by generateComponentsIntClust and
generateComponentsSpecClust.
## S4 method for signature 'componentsClust'
delete(obj, ...)

## S4 method for signature 'componentsClust'
clusters(obj)

## S4 method for signature 'componentsClust'
cutClusters(obj)

## S4 method for signature 'componentsClust'
clusterProperties(obj)

## S4 method for signature 'componentsClust'
treeCut(obj, k = NULL, h = NULL)

## S4 method for signature 'componentsClust'
treeCutDynamic(obj, maxTreeHeight, deepSplit, minModuleSize)

## S4 method for signature 'componentsClust,missing'
plot(
  x,
  pal = "Paired",
  numericLabels = TRUE,
  colourBranches = length(x) < 50,
  showLegend = length(x) < 20,
  ...
)

## S4 method for signature 'componentsClust'
plotSilhouettes(obj, kSeq, pch = 16, type = "b", ...)
... |
Further options passed to |
k, h
|
Desired number of clusters or tree height to be used for cutting the dendrogram, respectively. One or the
other must be specified. Analogous to |
maxTreeHeight, deepSplit, minModuleSize
|
Arguments used by
|
x, obj
|
A |
pal |
Colour palette to be used from RColorBrewer. |
numericLabels |
Set to |
colourBranches |
Whether branches from cut clusters (and their labels)
should be coloured. Might be slow with large numbers of clusters, hence,
the default is only |
showLegend |
If |
kSeq |
An integer vector containing the sequence that should be used for average silhouette width calculation. |
pch, type
|
Passed to |
clusters(componentsClust): Accessor method to the clust slot, which was generated by hclust.
cutClusters(componentsClust): Accessor method to the cutClusters slot. Returns a vector with cluster membership
for each candidate (format as cutree).
clusterProperties(componentsClust): Returns a list with properties on how the
clustering was performed.
treeCut(componentsClust): Manually (re-)cut the dendrogram.
treeCutDynamic(componentsClust): Automatically (re-)cut the dendrogram using the cutreeDynamicTree function
from dynamicTreeCut.
plot(x = componentsClust, y = missing): generates a dendrogram from a given cluster object and optionally highlights resulting
branches when the cluster is cut.
plotSilhouettes(componentsClust): Plots the average silhouette width when the
clusters are cut by a sequence of k numbers. The k value with the highest
value (marked in the plot) may be considered as the optimal number of
clusters.
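The idea behind treeCut and plotSilhouettes can be sketched with base R only: cut a dendrogram at several k values and compare the average silhouette width. The toy data and the `avgSilhouette()` helper are hypothetical and written for this illustration; they are not the code used by patRoon.

```r
# Average silhouette width for a given cluster membership vector.
avgSilhouette <- function(d, membership) {
    dm <- as.matrix(d)
    s <- vapply(seq_along(membership), function(i) {
        own <- membership == membership[i]
        if (sum(own) == 1L) {
            return(0)  # convention: singletons get a silhouette of 0
        }
        # mean distance to the other members of the own cluster
        a <- sum(dm[i, own]) / (sum(own) - 1)
        # smallest mean distance to the members of any other cluster
        others <- setdiff(unique(membership), membership[i])
        b <- min(vapply(others, function(cl) mean(dm[i, membership == cl]),
                        numeric(1)))
        (b - a) / max(a, b)
    }, numeric(1))
    mean(s)
}

# Toy data with three well separated groups.
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 10, 10.1, 10.2)
d <- dist(x)
hc <- hclust(d)

kSeq <- 2:5
widths <- vapply(kSeq, function(k) avgSilhouette(d, cutree(hc, k = k)),
                 numeric(1))
bestK <- kSeq[which.max(widths)]  # k with the highest average width
```

The selected k could then be used to re-cut the dendrogram, analogous to calling treeCut with the k argument.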
distm: Distance matrix that was used for clustering (obtained with daisy).
clust: Object returned by hclust.
cutClusters: A list with assigned clusters (same format as what cutree returns).
gInfo: The groupInfo of the feature groups object that was used.
properties: A list containing general properties and parameters used for clustering.
altered: Set to TRUE if the object was altered (e.g. filtered) after its creation.
When components are re-made by treeCut or treeCutDynamic any
expanded data should be re-added by calling expandForIMS.
The intensity values for components (used by plotSpectrum) are set
to a dummy value (1) as no single intensity value exists for this kind of
components.
When the object is altered (e.g. by filtering or subsetting it), methods that need the original clustered data such as plotting methods do not work anymore and stop with an error.
Schollee JE, Bourgin M, von Gunten U, McArdell CS, Hollender J (2018). “Non-target screening to trace ozonation transformation products in a wastewater treatment train including different post-treatments.” Water Research, 142, 267–278. doi:10.1016/j.watres.2018.05.045.
components and generateComponents
This class is derived from componentsClust and is used to store hierarchical clustering information
from intensity profiles of feature groups.
plotHeatMap(obj, ...)

## S4 method for signature 'componentsIntClust'
plotHeatMap(
  obj,
  interactive = FALSE,
  col = NULL,
  margins = c(6, 2),
  cexCol = 1,
  ...
)

## S4 method for signature 'componentsIntClust'
plotInt(
  obj,
  index,
  pch = 20,
  type = "b",
  lty = 3,
  col = NULL,
  plotArgs = NULL,
  linesArgs = NULL
)
obj |
A |
... |
Further options passed to |
interactive |
If |
col |
The colour used for plotting. Set to |
margins, cexCol
|
Passed to |
index |
Numeric component/cluster index or component name. |
pch, type, lty
|
Passed to |
plotArgs, linesArgs
|
A |
Objects from this class are generated by generateComponentsIntClust.
plotHeatMap returns the same as heatmap.2 or
heatmaply.
plotHeatMap(componentsIntClust): draws a heatmap using the
heatmap.2 or heatmaply function.
plotInt(componentsIntClust): makes a plot for all (normalized) intensity
profiles of the feature groups within a given cluster.
clusterm: Numeric matrix with normalized feature group intensities that was used for clustering.
When the object is altered (e.g. by filtering or subsetting it), methods that need the original clustered data such as plotting methods do not work anymore and stop with an error.
componentsClust for other relevant methods and generateComponents
This class is derived from components and is used to store
results from unsupervised homolog detection with the nontarget
package.
## S4 method for signature 'componentsNT'
plotGraph(obj, onlyLinked = TRUE, width = NULL, height = NULL)
## S4 method for signature 'componentsNTSet'
plotGraph(obj, onlyLinked = TRUE, set, ...)
## S4 method for signature 'componentsNTSet'
unset(obj, set)
obj |
The |
onlyLinked |
If |
width, height
|
Passed to |
set |
(sets workflow) The name of the set. |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
Objects from this class are generated by
generateComponentsNontarget
plotGraph returns the result of visNetwork.
plotGraph(componentsNT): Plots an interactive network graph for linked
homologous series (i.e. series with (partial) overlap which could
not be merged). The resulting graph can be browsed interactively and allows
quick inspection of series which may be related. The graph is constructed
with the igraph package and rendered with
visNetwork.
homol: A list with homol objects for each replicate
as returned by homol.search
The componentsNTSet class is applicable for sets workflows. This class is derived from componentsNT and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset Converts the object data for a specified set into a 'non-set' object (componentsNTUnset), which allows it to be used in 'regular' workflows. Only the components in the specified set are kept. Furthermore, the
component names are restored to non-set specific names (see generateComponents for more details).
The following methods are changed or have new functionality:
plotGraph Currently can only create graph networks from one set (specified by the set
argument).
Note that the componentsNTSet class does not have a homol slot. Instead, the setObjects
method can be used to access this data for a specific set.
Loos M, Singer H (2017).
“Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data.”
Journal of Cheminformatics, 9(1).
doi:10.1186/s13321-017-0197-z.
Loos M, Gerber C, Corona F, Hollender J, Singer H (2015).
“Accelerated Isotope Fine Structure Calculation Using Pruned Transition Trees.”
Analytical Chemistry, 87(11), 5738-5744.
https://pubs.acs.org/doi/abs/10.1021/acs.analchem.5b00941.
Antonov M, Csárdi G, Horvát S, Müller K, Nepusz T, Noom D, Salmon M, Traag V, Welles BF, Zanini F (2023).
“igraph enables fast and robust network analysis across programming languages.”
arXiv preprint arXiv:2311.10260.
doi:10.48550/arXiv.2311.10260.
Csárdi G, Nepusz T (2006).
“The igraph software package for complex network research.”
InterJournal, Complex Systems, 1695.
https://igraph.org.
Csárdi G, Nepusz T, Traag V, Horvát S, Zanini F, Noom D, Müller K, Schoch D, Salmon M (2026).
igraph: Network Analysis and Visualization in R.
doi:10.5281/zenodo.7682609.
R package version 2.3.1, https://CRAN.R-project.org/package=igraph.
components and generateComponents
This class is derived from componentsClust and is used to store components from feature groups that
were clustered based on their MS/MS similarities.
Objects from this class are generated by generateComponentsSpecClust
When the object is altered (e.g. by filtering or subsetting it), methods that need the original clustered data such as plotting methods do not work anymore and stop with an error.
componentsClust for other relevant methods and generateComponents
This class is derived from components and is used to store components that result from linking feature
groups that are (predicted to be) parents with feature groups that (are predicted to be) transformation products. For
more details, see generateComponentsTPs.
## S4 method for signature 'componentsTPs'
as.data.table(x, candidates = FALSE)
## S4 method for signature 'componentsTPs'
filter(obj, ..., retDirMatch = FALSE, minSpecSim = NULL, minSpecSimPrec = NULL, minSpecSimBoth = NULL, minTotFragMatches = NULL, minTotNLMatches = NULL, minFragMatches = NULL, minNLMatches = NULL, formulas = NULL, verbose = TRUE, negate = FALSE)
## S4 method for signature 'componentsTPs'
plotGraph(obj, onlyLinked = TRUE, width = NULL, height = NULL)
x, obj
|
A |
candidates |
If |
..., verbose
|
Further arguments passed to the base |
retDirMatch |
If set to |
minSpecSim, minSpecSimPrec, minSpecSimBoth
|
The minimum spectral similarity of a TP compared to its parent
(‘0-1’). The |
minTotFragMatches, minTotNLMatches, minFragMatches, minNLMatches
|
Minimum number of (total) parent/TP fragment and
neutral loss matches. Set to |
formulas |
A |
negate |
If |
onlyLinked |
If |
width, height
|
Passed to |
filter returns a filtered componentsTPs object.
plotGraph returns the result of visNetwork.
as.data.table(componentsTPs): Returns all component data as a data.table.
filter(componentsTPs): Provides various rule based filtering options to clean and prioritize TP data.
plotGraph(componentsTPs): Plots an interactive network graph for linked components. Components are linked with each
other if one or more transformation products overlap. The graph is constructed with the igraph package
and rendered with visNetwork.
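The linking rule used for the graph (components are connected when they share one or more transformation products) can be sketched in base R. The component and TP names below are hypothetical, and the edge list is only an illustration of the rule, not the igraph/visNetwork code used by the package.

```r
# Hypothetical components: each one lists the TP feature groups it contains.
compTPs <- list(
    CMP1 = c("TP_A", "TP_B"),
    CMP2 = c("TP_B", "TP_C"),
    CMP3 = c("TP_D")
)

# All unique pairs of components (one pair per row)
pairs <- t(combn(names(compTPs), 2))

# Number of TPs shared by each pair
shared <- apply(pairs, 1, function(p) {
    length(intersect(compTPs[[p[1]]], compTPs[[p[2]]]))
})

# Edge list: keep only pairs that overlap in one or more TPs
edges <- data.frame(from = pairs[, 1], to = pairs[, 2], shared = shared,
                    stringsAsFactors = FALSE)
edges <- edges[edges$shared > 0, ]

# Components that take part in at least one link
linked <- union(edges$from, edges$to)
```

In this example only CMP1 and CMP2 are linked (through TP_B), while CMP3 remains isolated.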
fromTPs: A logical that is TRUE when the componentization was performed with
transformationProducts data (i.e. the TPs argument was not NULL).
parentsFromScreening: A logical that is TRUE when the parents were obtained from screening data.
TPsFromScreening: A logical that is TRUE when the TPs were obtained from screening data.
The intensity values for components (used by plotSpectrum) are set
to a dummy value (1) as no single intensity value exists for this kind of
component.
Antonov M, Csárdi G, Horvát S, Müller K, Nepusz T, Noom D, Salmon M, Traag V, Welles BF, Zanini F (2023).
“igraph enables fast and robust network analysis across programming languages.”
arXiv preprint arXiv:2311.10260.
doi:10.48550/arXiv.2311.10260.
Csárdi G, Nepusz T (2006).
“The igraph software package for complex network research.”
InterJournal, Complex Systems, 1695.
https://igraph.org.
Csárdi G, Nepusz T, Traag V, Horvát S, Zanini F, Noom D, Müller K, Schoch D, Salmon M (2026).
igraph: Network Analysis and Visualization in R.
doi:10.5281/zenodo.7682609.
R package version 2.3.1, https://CRAN.R-project.org/package=igraph.
components for other relevant methods and generateComponents
Contains data for compound annotations for feature groups.
addFormulaScoring(compounds, formulas, updateScore = FALSE, formulaScoreWeight = 1)
## S4 method for signature 'compounds'
defaultExclNormScores(obj)
## S4 method for signature 'compounds'
show(object)
## S4 method for signature 'compounds'
identifiers(compounds)
## S4 method for signature 'compounds'
filter(obj, minExplainedPeaks = NULL, minScore = NULL, minFragScore = NULL, minFormulaScore = NULL, scoreLimits = NULL, IMSRangeParams = NULL, IMSMatchParams = NULL, ...)
## S4 method for signature 'compounds'
addFormulaScoring(compounds, formulas, updateScore = FALSE, formulaScoreWeight = 1)
## S4 method for signature 'compounds'
getMCS(obj, index, groupName)
## S4 method for signature 'compounds'
plotStructure(obj, index, groupName, width = 500, height = 500)
## S4 method for signature 'compounds'
plotScores(obj, index, groupName, normalizeScores = "max", excludeNormScores = defaultExclNormScores(obj), onlyUsed = TRUE)
## S4 method for signature 'compounds'
annotatedPeakList(obj, index, groupName, MSPeakLists, formulas = NULL, onlyAnnotated = FALSE)
## S4 method for signature 'compounds'
plotSpectrum(obj, index, groupName, MSPeakLists, formulas = NULL, plotStruct = FALSE, title = NULL, normalized = "multiple", specSimParams = getDefSpecSimParams(), mincex = 0.9, xlim = NULL, ylim = NULL, showLegend = TRUE, maxMolSize = c(0.2, 0.4), molRes = c(100, 100), ...)
## S4 method for signature 'compounds'
consensus(obj, ..., MSPeakLists, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, rankWeights = 1, labels = NULL)
## S4 method for signature 'compoundsSet'
show(object)
## S4 method for signature 'compoundsSet'
delete(obj, i, j, ...)
## S4 method for signature 'compoundsSet,ANY,missing,missing'
x[i, j, ..., sets = NULL, updateConsensus = FALSE, drop = TRUE]
## S4 method for signature 'compoundsSet'
filter(obj, ..., sets = NULL, updateConsensus = FALSE, negate = FALSE)
## S4 method for signature 'compoundsSet'
plotSpectrum(obj, index, groupName, MSPeakLists, formulas = NULL, plotStruct = FALSE, title = NULL, normalized = "multiple", specSimParams = getDefSpecSimParams(), mincex = 0.9, xlim = NULL, ylim = NULL, showLegend = TRUE, maxMolSize = c(0.2, 0.4), molRes = c(100, 100), perSet = TRUE, mirror = TRUE, ...)
## S4 method for signature 'compoundsSet'
addFormulaScoring(compounds, formulas, updateScore = FALSE, formulaScoreWeight = 1)
## S4 method for signature 'compoundsSet'
annotatedPeakList(obj, index, groupName, MSPeakLists, formulas = NULL, ...)
## S4 method for signature 'compoundsSet'
consensus(obj, ..., MSPeakLists, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, rankWeights = 1, labels = NULL, filterSets = FALSE, setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE)
## S4 method for signature 'compoundsSet'
unset(obj, set)
## S4 method for signature 'compoundsConsensusSet'
unset(obj, set)
## S4 method for signature 'compoundsSIRIUS'
delete(obj, i = NULL, j = NULL, ...)
formulas |
The |
updateScore, formulaScoreWeight
|
If |
obj, object, compounds, x
|
The |
minExplainedPeaks, scoreLimits
|
Passed to the
|
minScore, minFragScore, minFormulaScore
|
Minimum overall score, in-silico fragmentation score and formula score,
respectively. Set to |
IMSRangeParams |
(IMS workflow) A |
IMSMatchParams |
(IMS workflow) A |
... |
For For for For For sets workflow methods: further arguments passed to the base |
index |
The numeric index of the candidate structure. For For |
groupName |
The name of the feature group for which a plot should be made. To compare spectra, two group names can be specified. |
width, height
|
The dimensions (in pixels) of the raster image that should be plotted. |
normalizeScores |
A |
excludeNormScores |
A
For |
onlyUsed |
If |
MSPeakLists |
The |
onlyAnnotated |
Set to |
plotStruct |
If |
title |
The title of the plot. If |
normalized |
Controls intensity normalization. Should be |
specSimParams |
A named |
mincex |
The formula annotation labels are automatically scaled. The
|
xlim, ylim
|
Sets the plot size limits used by
|
showLegend |
Set to |
maxMolSize |
Numeric vector of size two with the maximum width/height of the candidate structure (relative to the plot size). |
molRes |
Numeric vector of size two with the resolution of the candidate structure (in pixels). |
absMinAbundance, relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain compounds that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
rankWeights |
A numeric vector with weights used to calculate the mean ranking score for each candidate. The value will be recycled if necessary, hence, the default value of ‘1’ means equal weights for all considered objects. |
labels |
A |
i, j, drop
|
Passed to the |
sets |
(sets workflow) A |
updateConsensus |
(sets workflow) If |
negate |
Passed to the |
perSet, mirror
|
(sets workflow) If |
filterSets |
(sets workflow) Controls how algorithm consensus abundance filters are applied. See the |
setThreshold, setThresholdAnn
|
(sets workflow) Thresholds used to create the annotation set consensus. See
|
setAvgSpecificScores |
(sets workflow) If |
set |
(sets workflow) The name of the set. |
compounds objects are obtained from compound generators. This class is derived from
the featureAnnotations class, please see its documentation for more methods and other details.
addFormulaScoring returns a compounds object updated
with formula scoring.
getMCS returns an rcdk molecule object
(IAtomContainer).
consensus returns a compounds object that is produced by merging multiple specified
compounds objects.
defaultExclNormScores(compounds): Returns default scorings that are excluded from normalization.
show(compounds): Show summary information for this object.
identifiers(compounds): Returns a list containing for each feature group a
character vector with database identifiers for all candidate compounds. The
list is named by feature group names, and is typically used with the
identifiers option of generateCompoundsMetFrag.
filter(compounds): Provides rule based filtering for generated compounds. Useful to eliminate unlikely candidates
and speed up further processing. Also see the featureAnnotations
method.
addFormulaScoring(compounds): Adds formula ranking data from a formulas
object as an extra compound candidate scoring (formulaScore column).
The formula score for each compound candidate is between ‘0-1’, where
zero means no match with any formula candidates, and one
means that the compound candidate's formula is the highest ranked.
getMCS(compounds): Calculates the maximum common substructure (MCS)
for two or more candidate structures for a feature group. This method uses
the get.mcs function from rcdk.
plotStructure(compounds): Plots a structure of a candidate compound using the
rcdk package. If multiple candidates are specified (i.e.
by specifying a vector for index) then the maximum common
substructure (MCS) of the selected candidates is drawn.
plotScores(compounds): Plots a barplot with scoring of a candidate compound.
annotatedPeakList(compounds): Returns an MS/MS peak list annotated with data from a
given candidate compound for a feature group.
plotSpectrum(compounds): Plots an annotated spectrum for a given candidate compound for a feature group. Two spectra can
be compared by specifying a two-sized vector for the index and groupName arguments.
consensus(compounds): Generates a consensus of results from multiple
objects. In order to rank the consensus candidates, each candidate is
first scored based on its original ranking in every object
(the scores are normalized and the highest ranked candidate gets value
‘1’). The (weighted) mean is then calculated for all scorings of each
candidate to derive the final ranking (if an object lacks the candidate its
score will be ‘0’). The original rankings for each object are stored in
the rank columns.
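The rank-based consensus scoring can be illustrated with a small base R sketch. This is not the package's code: the candidate names are made up, and both the linear scaling of rank scores and the division by the sum of weights are assumptions, since the text only states that scores are normalized with the top candidate at ‘1’ and that a (weighted) mean is taken.

```r
# Hypothetical ranked candidate lists from two annotation objects (best first)
rankings <- list(
    algoA = c("cand1", "cand2", "cand3"),
    algoB = c("cand2", "cand1", "cand4", "cand5")
)
rankWeights <- 1  # recycled below, i.e. equal weights for all objects

# Rank score: linear scaling so that the top candidate gets 1 (assumed form)
rankScore <- function(cands) {
    n <- length(cands)
    setNames((n - seq_len(n) + 1) / n, cands)
}

allCands <- unique(unlist(rankings))

# One column per object; candidates absent from an object score 0
scoreMat <- sapply(rankings, function(cands) {
    sc <- rankScore(cands)
    out <- setNames(numeric(length(allCands)), allCands)
    out[names(sc)] <- sc
    out
})

# Weighted mean over the objects, then sort to get the consensus ranking
weights <- rep(rankWeights, length.out = ncol(scoreMat))
meanScore <- as.vector(scoreMat %*% weights) / sum(weights)
names(meanScore) <- rownames(scoreMat)
consensusRank <- sort(meanScore, decreasing = TRUE)
```

In this example cand1 ends up first: it is ranked first in one object and second in the other, whereas cand4 and cand5 are penalized for being absent from algoA.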
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.
(sets workflow) A named list with the metadata stored for each set.
setThreshold, setThresholdAnn, setAvgSpecificScores: (sets workflow) A copy of the equally named arguments that were
passed when this object was created by generateCompounds.
origFGNames: (sets workflow) The original (order of) names of the featureGroups object that was used to
create this object.
In IMS workflows, reference IMS data can be assigned to candidates with the
assignMobilities method. Furthermore, CCS values may be
assigned directly to candidates with generateCompounds if database="pubchemlite".
This data can be used to prioritize candidates with the IMSMatchParams and IMSRangeParams filters.
Subscripting of formulae for plots generated by
plotSpectrum is based on the chemistry2expression function
from the ReSOLUTION package.
The compoundsSet class is applicable for sets workflows. This class is derived from compounds and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset Converts the object data for a specified set into a 'non-set' object (compoundsUnset), which allows it to be used in 'regular' workflows. Only the annotation results that are present in the specified set are kept
(based on the set consensus, see below for implications).
The following methods are changed or have new functionality:
filter and the subset operator ([) Can be used to select data that is only present for selected
sets. Depending on the updateConsensus argument, both operate on either the set consensus or the original data (see below for
implications).
annotatedPeakList Returns a combined annotation table with all sets.
plotSpectrum Is able to highlight set specific mass peaks (perSet and mirror arguments).
consensus Creates the algorithm consensus based on the original annotation data (see below for
implications). Then, like the sets workflow method for generateCompounds, a consensus is made for all
sets, which can be controlled with the setThreshold and setThresholdAnn arguments. The candidate
coverage among the different algorithms is calculated for each set (e.g. coverage-positive column)
and for all sets (coverage column), which is based on the presence of a candidate in all the algorithms from
all sets data. The consensus method for sets workflow data supports the filterSets argument. This
controls how the algorithm consensus abundance filters (absMinAbundance/relMinAbundance) are applied:
if filterSets=TRUE then the minimum of all coverage set specific columns is used to obtain the
algorithm abundance. Otherwise the overall coverage column is used. For instance, consider a consensus
object to be generated from two objects generated by different algorithms (e.g. SIRIUS and
MetFrag), which both have a positive and negative set. Then, if a candidate occurs with both
algorithms for the positive mode set, but only with the first algorithm in the negative mode set,
relMinAbundance=1 will remove the candidate if filterSets=TRUE (because the minimum relative
algorithm abundance is ‘0.5’), while filterSets=FALSE will not remove the candidate (because based on
all sets data the candidate occurs in both algorithms).
addFormulaScoring Adds the formula scorings to the original data and re-creates the annotation set consensus (see below for implications).
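The filterSets example given for consensus above can be reproduced with a few lines of base R. The sketch only mimics the described arithmetic for a single hypothetical candidate; the helper `keepCandidate` and the presence matrix are illustrative names, not part of the package.

```r
# Presence of one hypothetical candidate per algorithm (rows) and set (columns)
present <- matrix(
    c(TRUE, TRUE,    # first algorithm: found in the positive and negative set
      TRUE, FALSE),  # second algorithm: found in the positive set only
    nrow = 2, byrow = TRUE,
    dimnames = list(c("algo1", "algo2"), c("positive", "negative"))
)

# Set specific coverage: fraction of algorithms with the candidate in that set
setCoverage <- colMeans(present)

# Overall coverage: fraction of algorithms with the candidate in any set
overallCoverage <- mean(apply(present, 1, any))

# Decide whether the candidate passes the relative abundance filter
keepCandidate <- function(relMinAbundance, filterSets) {
    abundance <- if (filterSets) min(setCoverage) else overallCoverage
    abundance >= relMinAbundance
}

keepTRUE <- keepCandidate(relMinAbundance = 1, filterSets = TRUE)
keepFALSE <- keepCandidate(relMinAbundance = 1, filterSets = FALSE)
```

With filterSets=TRUE the minimum set specific coverage (‘0.5’, from the negative set) is below the threshold, so the candidate is removed. With filterSets=FALSE the overall coverage is ‘1’ and the candidate is kept.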
Two types of annotation data are stored in a compoundsSet object:
Annotations that are produced from a consensus between set results (see generateCompounds).
The 'original' annotation data per set, prior to when the set consensus was made. This includes candidates
that were filtered out because of the thresholds set by setThreshold and setThresholdAnn. However,
when filter or subsetting ([) operations are performed, the original data is also updated.
In most cases the first data is used. However, in a few cases the original annotation data is used (as indicated
above), for instance, to re-create the set consensus. It is important to realize that the original annotation data
may have additional candidates, and a newly created set consensus may therefore have 'new' candidates. For
instance, when the object consists of the sets "positive" and "negative" and setThreshold=1
was used to create it, then compounds[, sets = "positive", updateConsensus = TRUE] may now have additional
candidates, i.e. those that were not present in the "negative" set and were previously removed due to
the consensus threshold filter.
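Why re-creating the set consensus can yield 'new' candidates is easiest to see with a toy example. The base R sketch below assumes that setThreshold acts as a minimum fraction of sets in which a candidate must occur; the candidate names and the `setConsensus` helper are hypothetical and not the package's implementation.

```r
# Hypothetical original (per set) candidates for one feature group
origData <- list(
    positive = c("candA", "candB", "candC"),
    negative = c("candA", "candB")
)

# Keep candidates that occur in at least a fraction 'setThreshold' of the sets
setConsensus <- function(perSet, setThreshold) {
    allCands <- unique(unlist(perSet))
    abundance <- sapply(allCands, function(cand) {
        mean(sapply(perSet, function(s) cand %in% s))
    })
    allCands[abundance >= setThreshold]
}

# Consensus over both sets: candC is dropped, it is missing in the negative set
consBoth <- setConsensus(origData, setThreshold = 1)

# Consensus re-created from the original data of the positive set only:
# candC is back, as it now occurs in all (one) considered sets
consPosOnly <- setConsensus(origData["positive"], setThreshold = 1)
```

The second result contains candC although it was absent from the first consensus, which mirrors the behaviour described for updateConsensus = TRUE.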
The value ranges in the scoreLimits slot, which are used for normalization of scores, are based on the
original scorings when the compounds were generated (prior to employing the topMost filter to
generateCompounds).
Guha R (2007). “Chemical Informatics Functionality in R.” Journal of Statistical Software, 18(6).
The featureAnnotations base class for more relevant methods and
generateCompounds.
Perform hierarchical clustering of structure candidates based on chemical similarity and obtain overall structural information based on the maximum common structure (MCS).
makeHCluster(obj, method = "complete", ...)
## S4 method for signature 'compounds'
makeHCluster(obj, method, fpType = "extended", fpSimMethod = "tanimoto", maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1)
obj |
The |
method |
The clustering method passed to |
... |
further arguments specified to methods. |
fpType |
The type of structural fingerprint that should be calculated. See the |
fpSimMethod |
The method for calculating similarities (i.e. not dissimilarity!). See the |
maxTreeHeight, deepSplit, minModuleSize
|
Arguments used by
|
Often many possible chemical structure candidates are found for each feature group when performing compound annotation. Therefore, it may be useful to obtain an overview of their general structural properties. One strategy is to perform hierarchical clustering based on their chemical (dis)similarity, for instance, using the Tanimoto score. The resulting clusters can then be characterized by evaluating their maximum common substructure (MCS).
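The strategy can be sketched with base R on made-up binary fingerprints. The fingerprints below are hypothetical (real ones would come from a cheminformatics toolkit), and a fixed-height cut with cutree is used here as a simple stand-in for the dynamic tree cutting that makeHCluster applies.

```r
# Hypothetical binary fingerprints (rows: candidates, columns: fingerprint bits)
fps <- rbind(
    cand1 = c(1, 1, 1, 0, 0, 0, 0, 0),
    cand2 = c(1, 1, 1, 1, 0, 0, 0, 0),
    cand3 = c(0, 0, 0, 0, 1, 1, 1, 0),
    cand4 = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Tanimoto similarity: shared on-bits divided by the union of on-bits
common <- fps %*% t(fps)
bitCounts <- rowSums(fps)
tanimoto <- common / (outer(bitCounts, bitCounts, "+") - common)

# Clustering works on distances, so convert similarity to dissimilarity
hc <- hclust(as.dist(1 - tanimoto), method = "complete")

# Cut the dendrogram at a fixed height (assumption, for illustration only)
clusters <- cutree(hc, h = 0.5)
```

Candidates 1 and 2 share three of four on-bits (Tanimoto ‘0.75’) and end up in the same cluster, as do candidates 3 and 4. The maximum common substructure would then be evaluated per cluster.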
makeHCluster performs hierarchical clustering of all
structure candidates for each feature group within a
compounds object. The resulting dendrograms are automatically
cut using the cutreeDynamicTree function from the
dynamicTreeCut package. The returned
compoundsCluster object can then be used, for instance, for
plotting dendrograms and MCS structures and manually re-cutting specific
clusters.
makeHCluster returns a compoundsCluster object.
The methodology applied here has been largely derived from ‘chemclust.R’ from the metfRag package and the package vignette of rcdk.
Guha R (2007). “Chemical Informatics Functionality in R.” Journal of Statistical Software, 18(6).
compoundsCluster
Objects from this class are used to store hierarchical clustering data of
candidate structures within compounds objects.
## S4 method for signature 'compoundsCluster'
clusters(obj)
## S4 method for signature 'compoundsCluster'
cutClusters(obj)
## S4 method for signature 'compoundsCluster'
clusterProperties(obj)
## S4 method for signature 'compoundsCluster'
groupNames(obj)
## S4 method for signature 'compoundsCluster'
length(x)
## S4 method for signature 'compoundsCluster'
lengths(x, use.names = TRUE)
## S4 method for signature 'compoundsCluster'
show(object)
## S4 method for signature 'compoundsCluster,ANY,missing,missing'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'compoundsCluster'
treeCut(obj, k = NULL, h = NULL, groupName)
## S4 method for signature 'compoundsCluster'
treeCutDynamic(obj, maxTreeHeight, deepSplit, minModuleSize, groupName)
## S4 method for signature 'compoundsCluster,missing'
plot(x, ..., groupName, pal = "Paired", colourBranches = lengths(x)[groupName] < 50, showLegend = lengths(x)[groupName] < 20)
## S4 method for signature 'compoundsCluster'
getMCS(obj, groupName, cluster)
## S4 method for signature 'compoundsCluster'
plotStructure(obj, groupName, cluster, width = 500, height = 500, withTitle = TRUE)
## S4 method for signature 'compoundsCluster'
plotSilhouettes(obj, kSeq, groupName, pch = 16, type = "b", ...)
obj, x, object
|
A |
use.names |
A logical value specifying whether the returned vector should be named with the feature group names. |
i |
For |
... |
Further arguments passed directly to the plotting function
( |
drop, j
|
ignored. |
k, h
|
Desired number of clusters or tree height to be used for cutting
the dendrogram, respectively. One or the other must be specified.
Analogous to |
groupName |
A character specifying the feature group name. |
maxTreeHeight, deepSplit, minModuleSize
|
Arguments used by
|
pal |
Colour palette to be used from RColorBrewer. |
colourBranches |
Whether branches from cut clusters (and their labels)
should be coloured. Might be slow with large numbers of clusters, hence,
the default is only |
showLegend |
If |
cluster |
A numeric value specifying the cluster. |
width, height
|
The dimensions (in pixels) of the raster image that should be plotted. |
withTitle |
A logical value specifying whether a title should be added. |
kSeq |
An integer vector containing the sequence that should be used for average silhouette width calculation. |
pch, type
|
Passed to |
Objects of this type are returned by the compounds method for
makeHCluster.
treeCut and treeCutDynamic return the modified
compoundsCluster object.
getMCS returns an rcdk molecule object
(IAtomContainer).
clusters(compoundsCluster): Accessor method to the clusters slot.
Returns a list that contains for each feature group an object as returned
by hclust.
cutClusters(compoundsCluster): Accessor method to the cutClusters slot.
Returns a list that contains for each feature group a vector with cluster
membership for each candidate (format as cutree).
clusterProperties(compoundsCluster): Returns a list with properties on how the
clustering was performed.
groupNames(compoundsCluster): returns a character vector with the names of the
feature groups for which data is present in this object.
length(compoundsCluster): Returns the total number of clusters.
lengths(compoundsCluster): Returns a vector with the number of
clusters per feature group.
show(compoundsCluster): Show summary information for this object.
x[i]: Subset on feature groups.
treeCut(compoundsCluster): Manually (re-)cut a dendrogram that was
generated for a feature group.
treeCutDynamic(compoundsCluster): Automatically (re-)cut a dendrogram that was
generated for a feature group using the cutreeDynamicTree
function from dynamicTreeCut.
plot(x = compoundsCluster, y = missing): Plot the dendrogram for clustered compounds of a
feature group. Clusters are highlighted using dendextend.
getMCS(compoundsCluster): Calculates the maximum common substructure (MCS)
for all candidate structures within a specified cluster. This method uses
the get.mcs function from rcdk.
plotStructure(compoundsCluster): Plots the maximum common substructure (MCS) for
all candidate structures within a specified cluster.
plotSilhouettes(compoundsCluster): Plots the average silhouette width when the
clusters are cut by a sequence of k numbers. The k value with the highest
value (marked in the plot) may be considered as the optimal number of
clusters.
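How an average silhouette width can point to a suitable number of clusters is shown below with base R only. The data are hypothetical one-dimensional values standing in for candidate distances, and `avgSilhouette` is an illustrative helper written for this sketch rather than the routine used by plotSilhouettes.

```r
# Hypothetical data with three well separated groups
x <- c(1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.3, 9.1)
d <- dist(x)
hc <- hclust(d, method = "complete")

# Average silhouette width for a given cluster assignment
avgSilhouette <- function(d, cl) {
    dm <- as.matrix(d)
    widths <- sapply(seq_along(cl), function(i) {
        own <- cl == cl[i]
        if (sum(own) == 1) return(0)  # singletons get width 0 by convention
        # a: mean distance to the other members of the own cluster
        a <- sum(dm[i, own]) / (sum(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b <- min(sapply(setdiff(unique(cl), cl[i]), function(k) {
            mean(dm[i, cl == k])
        }))
        (b - a) / max(a, b)
    })
    mean(widths)
}

# Evaluate a sequence of k values and pick the one with the highest average
kSeq <- 2:5
avgWidths <- sapply(kSeq, function(k) avgSilhouette(d, cutree(hc, k = k)))
bestK <- kSeq[which.max(avgWidths)]
```

For these data the average silhouette width peaks at three clusters, matching the three groups that were put into `x`.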
clusters: A list with hclust objects for each
feature group.
dists: A list with distance matrices for each feature group.
SMILES: A list containing a vector with SMILES for all
candidate structures per feature group.
cutClusters: A list with assigned clusters for all candidates per
feature group (same format as what cutree returns).
properties: A list containing general properties and parameters used for clustering.
Returns an overview of scorings that may be applied to rank candidate compounds.
compoundScorings(
  algorithm = NULL,
  database = NULL,
  includeSuspectLists = TRUE,
  onlyDefault = FALSE,
  includeNoDB = TRUE
)
algorithm |
The algorithm: |
database |
The database for which results should be returned (e.g. |
includeSuspectLists, onlyDefault, includeNoDB
|
A logical specifying whether scoring terms related to suspect lists, default scoring terms and non-database specific scoring terms should be included in the output, respectively. |
A data.frame with information on which scoring terms are used, what their algorithm specific name is
and other information such as to which database they apply and short remarks.
generateCompounds
This class is derived from compounds and contains additional
specific MetFrag data.
settings(compoundsMF)
## S4 method for signature 'compoundsMF'
settings(compoundsMF)
compoundsMF |
A |
Objects from this class are generated by
generateCompoundsMetFrag
settings(compoundsMF): Accessor method for the settings slot.
settings: A list with all general configuration settings passed to MetFrag. Feature specific items (e.g. spectra and precursor masses) are not contained in this list.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016). “MetFrag relaunched: incorporating strategies beyond in silico fragmentation.” Journal of Cheminformatics, 8(1). doi:10.1186/s13321-016-0115-9.
compounds and generateCompoundsMetFrag
This class is derived from compounds and contains additional specific SIRIUS data.
Objects from this class are generated by generateCompoundsSIRIUS
fingerprints: A list with, for each feature group result, a data.table containing fingerprints
obtained with CSI:FingerID.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
compounds and generateCompoundsSIRIUS
Returns the default adducts and their probabilities when the OpenMS algorithm is used for componentization.
defaultOpenMSAdducts(ionization)
ionization |
The ionization polarity: either |
See the potentialAdducts argument of generateComponentsOpenMS for more details.
Parameters for creation of extracted ion chromatograms and mobilograms.
getDefEICParams(...)
getDefEIMParams(..., IMS = getLimIMS())
... |
optional named arguments that override defaults. |
IMS |
A |
The following parameters exist to configure the creation of extracted ion chromatograms (EICs) and extracted ion mobilograms (EIMs):
window A value that is subtracted or added to the minimum and maximum retention time (EICs) or mobility
(EIMs) of the feature. Thus, setting this value to ‘>0’ will 'zoom out' on the x-axis of a chromatogram or
mobilogram. Defaults to defaultLim("retention", "wide") (EICs) or defaultLim("mobility", "wide") (EIMs)
(see limits).
topMost Only create EICs/EIMs for this number of top most intense features. If NULL then
EICs/EIMs are created for all features.
topMostByReplicate If set to TRUE and topMost is set: only create EICs/EIMs for the top
most features in each replicate. For instance, when topMost=1 and topMostByReplicate=TRUE, then only
the most intense feature of each replicate is considered.
onlyPresent If TRUE then EICs/EIMs are created only for analyses in which a feature was detected,
if onlyPresent=FALSE then data is generated for all analyses. The latter is handy to evaluate if a
peak was 'missed' during peak detection or removed during e.g. filtering.
mzExpMobWindow (IMS workflow) Additional m/z tolerance on top of the feature limits. This is for IMS
workflows where features were detected from centroided LC-MS like data, while EICs/EIMs are generated from raw IMS
data. In this case the feature m/z limits were derived from centroided data, which typically has smaller
m/z deviations across scans compared to IMS data. The mzExpMobWindow parameter sets an additional
m/z tolerance to specifically handle this case. Defaults to defaultLim("mz", "default") (see
limits).
minIntensityIMS (IMS workflow) Raw intensity threshold for IMS data. This is primarily intended to speed up raw
data processing.
If onlyPresent=FALSE then the following parameters are also relevant:
mzExpWindow,mobExpWindow To create EICs or EIMs for analyses in which no feature was found, the
m/z or mobility value is derived from the min/max values of all features in the feature group. The value of
mzExpWindow and mobExpWindow further expands this window to allow a greater tolerance. Defaults to
defaultLim("mz", "very_narrow") and defaultLim("mobility", "very_narrow") (see limits).
setsAdductPos,setsAdductNeg (sets workflow) In sets workflows the adduct must be known to calculate the
ionized m/z. If a feature is completely absent in a particular set then it follows no adduct annotations are
available and the value of setsAdductPos (positive ionization data) or setsAdductNeg (negative
ionization data) will be used instead.
The following additional parameters exist specifically for EICs (EICParams):
gapFactor Bruker TIMS data (and maybe others?) seem to omit zero intensity scans, which will lead to
time gaps between spectra and incorrect EICs. To determine a time gap, the gapFactor is multiplied with the
median of time differences between scans. If a gap is detected, then appropriate zero intensity points are added to
the EIC. Set to 0 to disable this.
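The gap detection described for gapFactor can be sketched as follows. fillEICGaps is a hypothetical helper written for this illustration and is not part of patRoon. Where exactly the zero intensity points are placed is an assumption here (one median scan step inside each side of the gap), since the text does not specify it.

```r
# Hypothetical helper, not part of patRoon: pad time gaps with zero points.
fillEICGaps <- function(time, intensity, gapFactor) {
    # gapFactor = 0 disables gap filling
    if (gapFactor <= 0 || length(time) < 3)
        return(data.frame(time = time, intensity = intensity))

    step <- median(diff(time))
    # A gap is a time difference larger than gapFactor times the median step
    gapStarts <- which(diff(time) > gapFactor * step)

    # Assumption: one zero point is placed one median step after the scan
    # preceding the gap and one median step before the scan following it.
    padTime <- c(time[gapStarts] + step, time[gapStarts + 1] - step)
    out <- data.frame(
        time = c(time, padTime),
        intensity = c(intensity, rep(0, length(padTime)))
    )
    out <- out[order(out$time), ]
    rownames(out) <- NULL
    out
}

# Scans at 4..9 s are missing, e.g. because they had zero intensity
eic <- fillEICGaps(
    time = c(1, 2, 3, 10, 11, 12),
    intensity = c(5, 20, 8, 6, 15, 4),
    gapFactor = 3
)
```

Without the padding, a line drawn through these points would connect the scans at 3 and 10 seconds directly and suggest signal where none was recorded.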
The following additional parameters exist specifically for EIMs (EIMParams):
maxRTWindow Maximum retention time window (seconds, +/- feature retention time) in which mobilograms
are collected and averaged. Defaults to defaultLim("retention", "very_narrow") (see limits).
smooth The smoothing method that is applied to the EIM. Can be "none" for no smoothing,
"sg" for Savitzky-Golay (using signal) or "ma" for centered moving average (same algorithm
as used by findFeaturesPiek).
smLength The smoothing length. If smooth="sg" then this is passed as the n argument to
the signal::sgolayfilt function.
sgOrder The smoothing order for Savitzky-Golay. Passed as the p argument to the
signal::sgolayfilt function.
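For readers unfamiliar with the "ma" option, a generic centered moving average can be written with base R as shown below. smoothMA is a hypothetical helper, and this sketch does not claim to reproduce the exact algorithm used by patRoon (for instance, the edge handling is an arbitrary choice made here).

```r
# Generic centered moving average; smLength is assumed to be odd here.
smoothMA <- function(intensity, smLength = 3) {
    kernel <- rep(1 / smLength, smLength)
    # sides = 2 centers the window around each point
    sm <- as.numeric(stats::filter(intensity, kernel, sides = 2))
    # stats::filter() yields NA at the edges where the window is incomplete;
    # keep the raw values there (an arbitrary choice for this sketch).
    sm[is.na(sm)] <- intensity[is.na(sm)]
    sm
}

eim <- c(0, 0, 3, 9, 3, 0, 0)
smoothed <- smoothMA(eim, smLength = 3)
```

A larger smoothing length flattens the mobilogram more strongly, at the cost of broadening narrow peaks.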
These parameters are passed as a named list as the EICParams or EIMParams argument to functions
that work with EIC or EIM data. The getDefEICParams and getDefEIMParams functions generate such a
parameter list with defaults.
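The pattern of a default parameter list whose entries are overridden by named arguments can be mimicked in base R as below. getParamsSketch and the default values are hypothetical and only illustrate the calling style of getDefEICParams(...); the real defaults come from patRoon and may differ.

```r
# Hypothetical defaults for illustration; the real values come from
# getDefEICParams() and may differ.
defaultParams <- list(window = 30, topMost = NULL, topMostByReplicate = FALSE,
                      onlyPresent = TRUE)

# Hypothetical stand-in mimicking the getDefEICParams(...) calling style
getParamsSketch <- function(...) {
    overrides <- list(...)
    unknown <- setdiff(names(overrides), names(defaultParams))
    if (length(unknown) > 0)
        stop("Unknown parameter(s): ", paste(unknown, collapse = ", "))
    # keep.null = TRUE so that parameters may explicitly be set to NULL
    modifyList(defaultParams, overrides, keep.null = TRUE)
}

# Only the named arguments change; all other entries keep their defaults
p <- getParamsSketch(topMost = 5, onlyPresent = FALSE)
```

The returned list would then be handed to a function that accepts such a parameter list.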
Basic rule based filtering of feature groups.
replicateSubtract(fGroups, replicates, threshold = 0)

## S4 method for signature 'featureGroups'
filter(
  obj,
  absMinIntensity = NULL, relMinIntensity = NULL,
  preAbsMinIntensity = NULL, preRelMinIntensity = NULL,
  absMinMaxIntensity = NULL, relMinMaxIntensity = NULL,
  absMinAnalyses = NULL, relMinAnalyses = NULL,
  absMinReplicates = NULL, relMinReplicates = NULL,
  absMinFeatures = NULL, relMinFeatures = NULL,
  absMinReplicateAbundance = NULL, relMinReplicateAbundance = NULL,
  absMinConc = NULL, relMinConc = NULL,
  absMaxTox = NULL, relMaxTox = NULL,
  absMinConcTox = NULL, relMinConcTox = NULL,
  maxReplicateIntRSD = NULL,
  blankThreshold = NULL,
  retentionRange = NULL, mzRange = NULL, mzDefectRange = NULL, chromWidthRange = NULL,
  featQualityRange = NULL, groupQualityRange = NULL,
  replicates = NULL,
  IMS = NULL, withIMSPrecursor = FALSE, IMSRangeParams = NULL,
  results = NULL,
  removeBlanks = FALSE, removeISTDs = FALSE,
  checkFeaturesSession = NULL,
  predAggrParams = getDefPredAggrParams(),
  removeNA = FALSE,
  negate = FALSE,
  applyIMS = "both"
)

## S4 method for signature 'featureGroupsSet'
filter(
  obj,
  ...,
  negate = FALSE,
  applyIMS = "both",
  sets = NULL,
  absMinSets = NULL,
  relMinSets = NULL
)

## S4 method for signature 'featureGroups'
replicateSubtract(fGroups, replicates, threshold = 0)
fGroups, obj
|
|
replicates |
A character vector of replicates that should be kept ( |
threshold |
Minimum relative threshold (compared to mean intensity of replicate being subtracted) for a feature group not to be removed. When ‘0’, a feature group is always removed when present in the given replicates. |
absMinIntensity, relMinIntensity
|
Minimum absolute/relative intensity for features to be kept. The relative
intensity is determined from the feature with highest intensity (of
all features from all groups). Set to ‘0’ or |
preAbsMinIntensity, preRelMinIntensity
|
As |
absMinMaxIntensity, relMinMaxIntensity
|
Feature groups are only kept if at least one feature in the group has an intensity above this absolute/relative threshold. |
absMinAnalyses, relMinAnalyses
|
Feature groups are only kept when they contain data for at least this (absolute
or relative) amount of analyses. Set to |
absMinReplicates, relMinReplicates
|
Feature groups are only kept when they contain data for at least this
(absolute or relative) amount of replicates. Set to |
absMinFeatures, relMinFeatures
|
Analyses are only kept when they contain at least this (absolute or relative)
amount of features. Set to |
absMinReplicateAbundance, relMinReplicateAbundance
|
Minimum absolute/relative abundance that a grouped feature
should be present within a replicate. If this minimum is not met all features within the replicate are removed. Set
to |
absMinConc, relMinConc
|
The minimum absolute/relative predicted concentration (calculated by
|
absMaxTox, relMaxTox
|
The maximum absolute/relative predicted toxicity (LC50) (calculated by
|
absMinConcTox, relMinConcTox
|
Like |
maxReplicateIntRSD |
Maximum relative standard deviation (RSD) of intensity values for features within a
replicate. If the RSD is above this value all features within the replicate are removed. Set to |
blankThreshold |
Feature groups that are also present in blank analyses (see
analysis info) are filtered out unless their relative intensity is above this
threshold. For instance, a value of ‘5’ means that only features with an intensity five times higher than that
of the blank are kept. The relative intensity values between blanks and non-blanks are determined from the mean of
all non-zero blank intensities. Set to |
retentionRange, mzRange, mzDefectRange, chromWidthRange
|
Range of retention time (in seconds), m/z, mass
defect (defined as the decimal part of m/z values) or chromatographic peak width (in seconds), respectively.
Features outside this range will be removed. Should be a numeric vector with length of two containing the min/max
values. The maximum can be |
featQualityRange |
Used to filter features by their peak qualities/scores
(see |
groupQualityRange |
Like |
IMS |
(IMS workflow) Specifies which feature groups are considered to be kept in IMS workflows. The following options are valid:
Set to |
withIMSPrecursor |
(IMS workflow) only keep IMS feature groups with IMS precursors, i.e. remove all orphans.
Unaffected by |
IMSRangeParams |
(IMS workflow) A |
results |
Only keep feature groups that have results in the object specified by |
removeBlanks |
Set to |
removeISTDs |
If |
checkFeaturesSession |
If set then features and/or feature groups are removed that were selected for removal
(see check-GUI). The session files are typically generated with the |
predAggrParams |
Parameters to aggregate calculated concentrations/toxicities (obtained with
|
removeNA |
Set to |
negate |
If set to |
applyIMS |
(IMS workflow) whether the filters are only applied to IMS precursors ( |
... |
For sets workflow methods: further arguments passed to the base |
sets |
(sets workflow) A |
absMinSets, relMinSets
|
(sets workflow) Feature groups are only kept when they contain data for at least this (absolute
or relative) amount of sets. Set to |
filter performs common rule based filtering of feature groups such as blank subtraction, minimum
intensity and minimum replicate abundance. Removing of features occurs by zeroing their intensity values.
Furthermore, feature groups that are left completely empty (i.e. all intensities are zero) will be
automatically removed.
replicateSubtract removes feature groups present in a
given set of replicates (unless intensities are above a given
threshold). The replicates that are subtracted will be removed.
A filtered featureGroups object. Feature groups that are filtered away have their intensity set
to zero. In case a feature group is not present in any of the analyses anymore it will be removed completely.
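The two-step behaviour described above (zeroing intensities, then dropping feature groups that became empty) can be illustrated on a plain matrix. This is a conceptual sketch with made-up numbers and a hypothetical threshold; it does not operate on featureGroups objects.

```r
# Toy intensity table: rows are analyses, columns are feature groups
ints <- matrix(
    c(1200, 300,  0,
      1500, 250, 80,
       900,   0, 60),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("ana1", "ana2", "ana3"), c("fg1", "fg2", "fg3"))
)

absMinIntensity <- 500  # hypothetical threshold

# Step 1: 'remove' features by zeroing intensities below the threshold
filtered <- ints
filtered[filtered < absMinIntensity] <- 0

# Step 2: drop feature groups without any remaining non-zero intensity
filtered <- filtered[, colSums(filtered != 0) > 0, drop = FALSE]
```

In this toy example only fg1 survives, because every feature of fg2 and fg3 falls below the threshold.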
The following methods are changed or have new functionality:
filter has specific arguments to filter by (feature presence in) sets. See the argument descriptions.
Important: in sets workflows the mzRange, mzDefectRange and IMSRangeParams filters use neutral
feature masses, whereas non-sets workflows use m/z values. Hence, adjust the ranges accordingly to avoid (slightly)
different results!
When multiple arguments are specified to filter, multiple filters are applied in
sequence. Since some of these filters may affect each other, choosing their order correctly may be important for
effective data filtering. For instance, when an intensity filter removes features from blank analyses, a subsequent
blank filter may not adequately perform blank subtraction. Similarly, when intensity and blank filters are executed
after the replicate abundance filter it may be necessary to ensure minimum replicate abundance again as the
intensity and blank filters may have removed some features within a replicate.
With this in mind, filters (if specified) occur in the following order:
Features/feature groups selected for removal by the session specified by checkFeaturesSession.
Pre-Intensity filters (i.e. preAbsMinIntensity and preRelMinIntensity).
Chromatography and mass filters (i.e. retentionRange, mzRange, mzDefectRange,
chromWidthRange, featQualityRange and groupQualityRange).
Replicate abundance filters (i.e. absMinReplicateAbundance, relMinReplicateAbundance and
maxReplicateIntRSD).
Blank filter (i.e. blankThreshold).
Intensity filters (i.e. absMinIntensity and relMinIntensity).
Replicate abundance filters (2nd time, only if previous filters affected results).
Minimum-maximum intensity filters (i.e. absMinMaxIntensity and relMinMaxIntensity).
General abundance filters (i.e. absMinAnalyses, relMinAnalyses, absMinReplicates,
relMinReplicates, absMinFeatures, relMinFeatures) and predicted concentration/toxicity filters
(i.e. absMinConc, relMinConc, absMaxTox and relMaxTox).
Replicate filter (i.e. replicates), results filter (i.e. results) and blank
analyses / internal standard removal (i.e. removeBlanks=TRUE / removeISTDs=TRUE).
If another filtering order is desired then filter should be called multiple times with only one filter
argument at a time.
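Why the filter order matters can be shown with a small toy example. intensityFilter and blankFilter are hypothetical helpers that mimic the two filter types on a plain matrix with one blank and one sample analysis; they are not the patRoon implementation.

```r
ints <- matrix(
    c( 400,    0,
      1000, 3000),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("blank", "sample"), c("fgA", "fgB"))
)

# Hypothetical helpers that mimic the two filter types on a toy matrix
intensityFilter <- function(m, absMin) {
    m[m < absMin] <- 0
    m
}
blankFilter <- function(m, threshold) {
    blank <- m["blank", ]
    # zero sample intensities that are not 'threshold' times above the blank
    remove <- blank > 0 & m["sample", ] < threshold * blank
    m["sample", remove] <- 0
    m
}

# Blank subtraction first: fgA is removed from the sample (1000 < 5 * 400)
blankFirst <- intensityFilter(blankFilter(ints, threshold = 5), absMin = 500)

# Intensity filter first: the blank feature (400) is zeroed, so the blank
# filter no longer sees it and fgA wrongly survives in the sample
intensityFirst <- blankFilter(intensityFilter(ints, absMin = 500), threshold = 5)
```

The same two filters therefore give different results for fgA depending on their order, which is the situation the fixed ordering is meant to avoid.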
featureGroups-class and groupFeatures
Automatic optimization of feature finding and grouping parameters through Design of Experiments (DoE).
optimizeFeatureGrouping(
  features,
  algorithm,
  ...,
  templateParams = list(),
  paramRanges = list(),
  maxIterations = 50,
  maxModelDeviation = 0.1,
  parallel = TRUE
)

generateFGroupsOptPSet(algorithm, ...)

getDefFGroupsOptParamRanges(algorithm)

optimizeFeatureFinding(
  anaInfo,
  algorithm,
  ...,
  templateParams = list(),
  paramRanges = list(),
  isoIdent = if (algorithm == "openms") "OpenMS" else "IPO",
  checkPeakShape = "none",
  CAMERAOpts = list(),
  maxIterations = 50,
  maxModelDeviation = 0.1,
  parallel = TRUE
)

generateFeatureOptPSet(algorithm, ...)

getDefFeaturesOptParamRanges(algorithm, method = "centWave")
features |
A |
algorithm |
The algorithm used for finding or grouping features (see |
... |
One or more lists with parameter sets (see below) (for |
templateParams |
Template parameter set (see below). |
paramRanges |
A list with vectors containing absolute parameter ranges (minimum/maximum) that constrain numeric
parameters chosen during experiments. See the |
maxIterations |
Maximum number of iterations that may be performed to find optimum values. Used to restrict needlessly long optimization procedures. In IPO this was fixed to ‘50’. |
maxModelDeviation |
See the |
parallel |
If set to |
anaInfo |
Analysis info table (passed to |
isoIdent |
Sets the algorithm used to identify isotopes. Valid values
are: |
checkPeakShape |
Additional peak shape checking of isotopes. Only used
if |
CAMERAOpts |
A |
method |
Method used by XCMS to find features (only if |
Many different parameters exist that may affect the output quality of feature finding and grouping. To avoid time-consuming manual experimentation, functionality is provided to largely automate the optimization process. The methodology, which uses design of experiments (DoE), is based on the excellent Isotopologue Parameter Optimization (IPO) R package. The functionality of this package is directly integrated in patRoon. Some functionality was added or changed; however, the principal workings of the algorithm are nearly identical.
Compared to IPO, the following functionality was added or changed:
The code was made more generic in order to include support for other feature finding/grouping algorithms (e.g. OpenMS, enviPick, XCMS3).
The methodology of FeatureFinderMetabo (OpenMS) may be used to
find isotopes.
The maxModelDeviation parameter was added to potentially avoid suboptimal results
(issue discussed here).
The use of multiple 'parameter sets' (discussed below) which, for instance, allow optimizing qualitative
parameters more easily (see examples).
More consistent optimization code for feature finding/grouping.
More consistent output using S4 classes (i.e. optimizationResult class).
Parallelization is performed via the future package instead of BiocParallel. If this is enabled
(parallel=TRUE) then any parallelization supported by the feature finding or grouping algorithm is disabled.
The optimizeFeatureFinding and optimizeFeatureGrouping functions return their results in an
optimizationResult object.
Which parameters should be optimized is determined by a parameter set. A set is
defined by a named list containing the minimum and maximum starting range for each parameter that should be
tested. For instance, the set list(chromFWHM = c(5, 10), mzPPM = c(5, 15)) specifies that the
chromFWHM and mzPPM parameters (used by OpenMS feature finding) should be optimized within a range of
‘5’-‘10’ and ‘5’-‘15’, respectively. Note that this range may be increased or decreased after a
DoE iteration in order to find a better optimum. The absolute limits are controlled by the paramRanges
function argument.
Multiple parameter sets may be specified (i.e. through the ... function argument). In this situation, the
optimization algorithm is repeated for each set, and the final optimum is determined from the parameter set with
the best response. The templateParams function argument may be useful in this case to define a template for
each parameter set. Actual parameter sets are then constructed by joining each parameter set with the set specified
for templateParams. When a parameter is defined in both a regular and template set, the parameter in the
regular set takes precedence.
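The precedence rule for joining a regular parameter set with a template can be reproduced with base R's modifyList. This is only a sketch of the described behaviour; how patRoon joins the sets internally is not stated in the text, and the parameter values below are made up for illustration.

```r
templateParams <- list(chromFWHM = c(4, 8), mzPPM = 5)
paramSets <- list(
    list(isotopeFilteringModel = "metabolites (5% RMS)"),
    list(isotopeFilteringModel = "metabolites (2% RMS)", mzPPM = c(5, 15))
)

# modifyList() lets values from the regular set overwrite template values
fullSets <- lapply(paramSets, function(pSet) modifyList(templateParams, pSet))

# The first set inherits mzPPM = 5 from the template, whereas the second set
# keeps its own mzPPM range because it is defined in both lists.
```

Both resulting sets share the chromFWHM range from the template, so it only has to be written once.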
Parameters that should not be optimized but still need to be set for the feature finding/grouping functions should
also be defined in a (template) parameter set. Whether a parameter is optimized is determined by whether its value
is specified as a vector range or a single fixed value. For instance, when a set is defined as list(chromFWHM
= c(5, 10), mzPPM = 5), only the chromFWHM parameter is optimized, whereas mzPPM is kept constant at
‘5’.
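The distinction between ranges and fixed values can be made explicit with a small check. The rule used here (a numeric vector of length two counts as a range) is an assumption for this sketch and ignores nested lists such as groupArgs.

```r
pSet <- list(chromFWHM = c(5, 10), mzPPM = 5,
             isotopeFilteringModel = "metabolites (5% RMS)")

# Assumption for this sketch: a numeric vector of length two is a range
isRange <- vapply(pSet, function(p) is.numeric(p) && length(p) == 2, logical(1))

toOptimize <- names(pSet)[isRange]   # parameters varied during the DoE
fixed <- names(pSet)[!isRange]       # parameters passed through unchanged
```

Here only chromFWHM would be varied, while mzPPM and the qualitative isotopeFilteringModel value stay constant.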
Using multiple parameter sets with differing fixed values allows optimization of qualitative values (see examples below).
The parameters specified in parameter sets are directly passed through the findFeatures or
groupFeatures functions. Hence, grouping and retention time alignment parameters used by XCMS should
(still) be set through the groupArgs and retcorArgs parameters.
NOTE: For XCMS3, which normally uses parameter classes for setting its options, the parameters must be
defined in a named list like any other algorithm. The set parameters are then passed to the constructor of the
right parameter class object (e.g. CentWaveParam, ObiwarpParam). For grouping/alignment
sets, these parameters need to be specified in nested lists called groupParams and retAlignParams,
respectively (similar to groupArgs/retcorArgs for algorithm="xcms"). Finally, the underlying
XCMS method to be used should be defined in the parameter set (i.e. by setting the method field for
feature parameter sets and the groupMethod and retAlignMethod for grouping/aligning parameter sets).
See the examples below for more details.
NOTE: Similar to IPO, the peakwidth and prefilter parameters for XCMS feature finding should
be split in two different values:
The minimum and maximum ranges for peakwidth are optimized by setting min_peakwidth and
max_peakwidth, respectively.
The k and I parameters contained in prefilter are split in prefilter and
value_of_prefilter, respectively.
Similarly, for KPIC2, the following parameters should be split:
the width parameter (feature optimization) is optimized by specifying the min_width and
max_width parameters.
the tolerance and weight parameters (feature grouping optimization) are optimized by setting
mz_tolerance/rt_tolerance and mz_weight/rt_weight parameters, respectively.
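For the XCMS case, the relation between the split optimization parameters and the two-element vectors expected by centWave can be sketched as follows. joinSplitParams is a hypothetical helper and the numbers are made up; patRoon performs this conversion internally, so this is only meant to clarify the naming.

```r
# Hypothetical helper: collapse the split optimization parameters back into
# the two-element vectors that XCMS centWave expects.
joinSplitParams <- function(p) {
    p$peakwidth <- c(p$min_peakwidth, p$max_peakwidth)
    # 'prefilter' holds k before joining and c(k, I) afterwards
    p$prefilter <- c(p$prefilter, p$value_of_prefilter)
    p[c("min_peakwidth", "max_peakwidth", "value_of_prefilter")] <- NULL
    p
}

optimized <- list(min_peakwidth = 12, max_peakwidth = 45,
                  prefilter = 3, value_of_prefilter = 100, ppm = 10)
xcmsArgs <- joinSplitParams(optimized)
```

Splitting the vectors this way lets each element be treated as an independent numeric factor during the experiments.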
The optimizeFeatureFinding and optimizeFeatureGrouping are the functions to be used
to optimize parameters for feature finding and grouping, respectively. These functions are analogous to
optimizeXcmsSet and optimizeRetGroup from IPO.
The generateFeatureOptPSet and generateFGroupsOptPSet functions may be used to generate a parameter
set for feature finding and grouping, respectively. Some algorithm dependent default parameter optimization ranges
will be returned. These functions are analogous to getDefaultXcmsSetStartingParams and
getDefaultRetGroupStartingParams from IPO. However, unlike their IPO counterparts, these
functions will not output default fixed values. The generateFGroupsOptPSet function will only generate defaults for
density grouping if algorithm="xcms".
The getDefFeaturesOptParamRanges and getDefFGroupsOptParamRanges functions return the default absolute
optimization parameter ranges for feature finding and grouping, respectively. These functions are useful if you
want to set the paramRanges function argument.
After each experiment iteration an optimum parameter
set is found by generating a model containing the tested parameters and their responses. Sometimes the actual
response from the parameters derived from the model is significantly lower than expected. When the response
is lower than the maximum response found during the experiment, the parameters belonging to this experimental
maximum may be chosen instead. The maxModelDeviation argument sets the maximum deviation in response
between the modelled and experimental maxima. The value is relative: ‘0’ means that experimental values will
always be favored when leading to improved responses, whereas 1 will effectively disable this procedure (and
return to 'regular' IPO behaviour).
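One way to read the maxModelDeviation rule is sketched below. chooseOptimum is a hypothetical helper, and the relative deviation formula is an assumption: the text only states that the value is relative and describes the behaviour at ‘0’ and 1, both of which this sketch reproduces for positive responses.

```r
# Hypothetical decision rule; the exact formula used by patRoon is not given
# in the text, so the relative deviation below is an assumption.
chooseOptimum <- function(modelResponse, expMaxResponse, maxModelDeviation) {
    # The model-derived parameters are at least as good: keep them
    if (modelResponse >= expMaxResponse)
        return("model")
    # Relative shortfall of the model response (assumes expMaxResponse > 0)
    deviation <- (expMaxResponse - modelResponse) / expMaxResponse
    if (deviation > maxModelDeviation) "experiment" else "model"
}

chooseOptimum(95, 100, maxModelDeviation = 0.1)  # 5% shortfall is tolerated
chooseOptimum(80, 100, maxModelDeviation = 0.1)  # 20% shortfall is not
```

With maxModelDeviation = 0 any shortfall favours the experimental maximum, and with 1 the shortfall can never exceed the limit, so the model-derived parameters are always kept.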
The code and methodology are a direct adaptation from the IPO R package.
Libiseller G, Dvorzak M, Kleb U, Gander E, Eisenberg T, Madeo F, Neumann S, Trausinger G, Sinner F, Pieber T, Magnes C (2015). “IPO: a tool for automated optimization of XCMS parameters.” BMC Bioinformatics, 16(1). doi:10.1186/s12859-015-0562-8.
# example data from patRoonData package
dataDir <- patRoonData::exampleDataPath()
anaInfo <- generateAnalysisInfo(dataDir)
anaInfo <- anaInfo[1:2, ] # only focus on first two analyses (e.g. training set)

# optimize mzPPM and chromFWHM parameters
ftOpt <- optimizeFeatureFinding(anaInfo, "openms", list(mzPPM = c(5, 10), chromFWHM = c(4, 8)))

# optimize chromFWHM and isotopeFilteringModel (a qualitative parameter)
ftOpt2 <- optimizeFeatureFinding(anaInfo, "openms",
                                 list(isotopeFilteringModel = "metabolites (5% RMS)"),
                                 list(isotopeFilteringModel = "metabolites (2% RMS)"),
                                 templateParams = list(chromFWHM = c(4, 8)))

# perform grouping optimization with optimized features object
fgOpt <- optimizeFeatureGrouping(optimizedObject(ftOpt), "xcms",
                                 list(groupArgs = list(bw = c(22, 28)),
                                      retcorArgs = list(method = "obiwarp")))

# same, but using the XCMS3 interface
fgOpt2 <- optimizeFeatureGrouping(optimizedObject(ftOpt), "xcms3",
                                  list(groupMethod = "density",
                                       groupParams = list(bw = c(22, 28)),
                                       retAlignMethod = "obiwarp"))

# plot contour of first parameter set/DoE iteration
plot(ftOpt, paramSet = 1, DoEIteration = 1, type = "contour")

# generate parameter set with some predefined and custom parameters to be
# optimized.
pSet <- generateFeatureOptPSet("openms", chromSNR = c(3, 9), useSmoothedInts = FALSE)
Various plotting functions for feature group data.
plotMobilograms(obj, ...) ## S4 method for signature 'featureGroups,missing' plot( x, groupBy = NULL, onlyUnique = FALSE, retMin = FALSE, showLegend = TRUE, IMS = "maybe", col = NULL, pch = NULL, ... ) ## S4 method for signature 'featureGroups' plotInt( obj, average = FALSE, averageFunc = mean, areas = FALSE, normalized = FALSE, xBy = NULL, xNames = TRUE, groupBy = "fGroups", regression = FALSE, showLegend = FALSE, IMS = "maybe", pch = 20, type = if (regression) "p" else "b", lty = 3, xlim = NULL, ylim = NULL, col = NULL, plotArgs = NULL, linesArgs = NULL ) ## S4 method for signature 'featureGroups' plotChord( obj, addSelfLinks = FALSE, addRetMzPlots = TRUE, aggregate = FALSE, groupBy = NULL, addIntraOuterGroupLinks = FALSE, IMS = "maybe", ... ) ## S4 method for signature 'featureGroups' plotChroms( obj, analysis = analyses(obj), groupName = names(obj), retMin = FALSE, showPeakArea = FALSE, showFGroupRect = TRUE, title = NULL, groupBy = NULL, showLegend = TRUE, annotate = c("none", "ret", "mz", "mob"), intMax = "eic", EICParams = getDefEICParams(), showProgress = FALSE, IMS = "maybe", xlim = NULL, ylim = NULL, EICs = NULL, ... ) ## S4 method for signature 'featureGroups' plotChroms3D( obj, analysis = analyses(obj), groupName = names(obj), dim3 = "mz", retMin = FALSE, showLimits = TRUE, rtWindow = defaultLim("retention", "medium"), mzWindow = defaultLim("mz", "medium"), mobWindow = defaultLim("mobility", "medium"), gridSize = 50, title = NULL, ... ) ## S4 method for signature 'featureGroupsSet' plotChroms3D(obj, analysis = analyses(obj), ...) ## S4 method for signature 'featureGroups' plotMobilograms( obj, analysis = analyses(obj), groupName = names(obj), showPeakArea = FALSE, showFGroupRect = TRUE, title = NULL, groupBy = NULL, showLegend = TRUE, annotate = c("none", "ret", "mz", "mob"), EIMParams = getDefEIMParams(), showProgress = FALSE, xlim = NULL, ylim = NULL, EIMs = NULL, ... 
) ## S4 method for signature 'featureGroups' plotVenn(obj, which = NULL, aggregate = TRUE, IMS = "maybe", ...) ## S4 method for signature 'featureGroups' plotUpSet( obj, which = NULL, aggregate = TRUE, IMS = "maybe", nsets = NULL, nintersects = NA, ... ) ## S4 method for signature 'featureGroups' plotVolcano( obj, FCParams, showLegend = TRUE, averageFunc = mean, normalized = FALSE, IMS = "maybe", col = NULL, pch = 19, ... ) ## S4 method for signature 'featureGroups' plotGraph(obj, onlyPresent = TRUE, width = NULL, height = NULL) ## S4 method for signature 'featureGroupsSet' plotGraph(obj, onlyPresent = TRUE, set, ...) ## S4 method for signature 'featureGroups' plotTICs( obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, groupBy = NULL, showLegend = TRUE, xlim = NULL, ylim = NULL, ... ) ## S4 method for signature 'featureGroups' plotBPCs( obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, groupBy = NULL, showLegend = TRUE, xlim = NULL, ylim = NULL, ... )
obj, x
|
|
... |
passed to |
groupBy |
Specifies how results are grouped in the plot. Should be a name of a column in the
analysis information table which is used to make analysis groups (e.g.
For For |
onlyUnique |
If |
retMin |
Plot retention time in minutes (instead of seconds). |
showLegend |
Plot a legend if |
IMS |
(IMS workflow) Specifies which feature groups are considered for plotting in IMS workflows. The following options are valid:
For |
col |
Colour(s) used. If |
pch, type, lty
|
Common plotting parameters passed to e.g. For for |
average |
Controls plot data averaging: see the |
averageFunc, normalized
|
Used for intensity data treatment, see the documentation for the
|
areas |
Set to |
xBy |
Controls x-value grouping in the plot: see the |
xNames |
Plot names ( |
regression |
If |
xlim, ylim
|
Sets the plot size limits used by
|
plotArgs, linesArgs
|
A |
addSelfLinks |
If |
addRetMzPlots |
Set to |
aggregate |
Specifies how data should be aggregated prior to comparison. Set to |
addIntraOuterGroupLinks |
If |
analysis, groupName
|
For For |
showPeakArea |
Set to |
showFGroupRect |
Set to |
title |
Character string used for title of the plot. If |
annotate |
Set to |
intMax |
Method used to determine the maximum intensity plot limit. Should be |
EICParams |
A named |
showProgress |
if set to |
EICs, EIMs
|
Internal parameter for now and should be kept at |
dim3 |
The third dimension to plot besides retention time and intensity. Can be either |
showLimits |
If |
rtWindow, mzWindow, mobWindow
|
Numeric values specifying the window size around the feature for retention time,
m/z, and ion mobility respectively. Values |
gridSize |
The size of the grid for interpolation. |
EIMParams |
A named |
which |
A character vector with the selection to compare (e.g. replicates, as set by the |
nsets, nintersects
|
See |
FCParams |
A parameter list to calculate Fold change data. See |
onlyPresent |
Only plot feature groups of internal standards that are still present in the |
width, height
|
Passed to |
set |
(sets workflow) The set for which data must be plotted. |
retentionRange |
Range of retention time (in seconds) to collect TIC traces. Should be a numeric vector with length of two containing the min/max values. Set to NULL to ignore. |
MSLevel |
Integer vector with the ms levels (i.e., 1 for MS1 and 2 for MS2) to obtain TIC traces. |
plot Generates an m/z vs retention time plot for all feature groups. Optionally
highlights unique/overlapping presence amongst replicates.
plotInt Generates a line plot with feature intensities.
plotChord Generates a chord diagram which can be used to
visualize shared presence of feature groups between analyses or replicate
groups. In addition, analyses/replicates sharing similar properties
(e.g. location, age, type) may be grouped to enhance visualization
between these 'outer groups'.
plotChroms Plots extracted ion chromatograms (EICs) of feature groups.
plotChroms3D generates a 3D plot of chromatographic data for a single feature group in a single
analysis. The plot shows retention time on the x-axis, m/z or mobility on the y-axis, and intensity as color in a
contour plot. The plot is made with the filled.contour function.
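As a rough illustration of the kind of plot described above, the following base R sketch draws a filled contour plot of a synthetic peak. The data, axis ranges and peak shape are invented for illustration; only filled.contour and the 50-point grid (cf. the gridSize default) come from the text. It does not reproduce how plotChroms3D obtains or interpolates its data.

```r
# Synthetic data: a Gaussian-shaped peak in retention time (s) and m/z.
# All values are hypothetical and only serve to fill a regular grid.
rt <- seq(290, 310, length.out = 50)          # 50 grid points, cf. gridSize = 50
mz <- seq(200.095, 200.105, length.out = 50)
intensity <- outer(
    dnorm(rt, mean = 300, sd = 3),
    dnorm(mz, mean = 200.100, sd = 0.002)
)

# Retention time on the x-axis, m/z on the y-axis, intensity as colour.
# Only drawn in interactive sessions so the sketch can also run in scripts.
if (interactive()) {
    filled.contour(rt, mz, intensity,
                   xlab = "retention time (s)", ylab = "m/z",
                   main = "synthetic feature")
}
```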
plotMobilograms Plots extracted ion mobilograms (EIMs) of feature groups.
plotVenn plots a Venn diagram (using VennDiagram) outlining unique and shared feature
groups between up to five replicates.
plotUpSet plots an UpSet diagram (using the upset function) outlining unique
and shared feature groups between given replicates.
plotVolcano Plots Fold change data in a 'Volcano plot'.
plotGraph generates an interactive network plot which is used to explore internal standard (IS)
assignments to each feature group. This requires the availability of IS assignments, see the documentation for
normInts for details. The graph is rendered with visNetwork.
plotTICs Plots the total ion chromatogram/s (TICs) of the analyses.
plotBPCs Plots the base peak chromatogram/s (BPCs) of the analyses.
plotChroms3D returns the interpolated grid used for plotting (generated with
mba.surf).
plotVenn (invisibly) returns a list with the following fields:
gList the gList object that was returned by
the utilized VennDiagram plotting function.
areas The total area for each plotted group.
intersectionCounts The number of intersections between groups.
The order for the areas and intersectionCounts fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn and
draw.triple.venn).
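To make the areas and intersectionCounts fields more concrete, the base R sketch below computes comparable numbers from plain character vectors. The replicate names and feature group names are hypothetical, and this is only one plausible reading of the fields (group sizes and pairwise overlaps), not the code used by plotVenn.

```r
# Hypothetical presence of feature groups in three replicates
present <- list(
    repA = c("M120_R268", "M137_R249", "M146_R309"),
    repB = c("M137_R249", "M146_R309", "M192_R355"),
    repC = c("M146_R309", "M192_R355")
)

# "areas": number of feature groups in each plotted group
areas <- lengths(present)

# Pairwise intersection counts, in the order the pairs are generated
pairs <- combn(names(present), 2, simplify = FALSE)
intersectionCounts <- vapply(pairs, function(p) {
    length(intersect(present[[p[1]]], present[[p[2]]]))
}, integer(1))
names(intersectionCounts) <- vapply(pairs, paste, character(1), collapse = "&")
```

The pair order produced by combn is an assumption of this sketch; the order of the real fields follows the VennDiagram plotting function that was used.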
plotGraph returns the result of visNetwork.
The raw data interface of patRoon is used by plotChroms, plotMobilograms, plotTICs and plotBPCs to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
The following methods have been changed or have new functionality:
plotGraph only plots data per set, and requires the set argument to be set.
In sets workflows the analysis information contains an additional "set"
column, which can be used for arguments that involve grouping of analyses. For instance, if groupBy="set"
then plotting data is grouped per set.
The average, xBy and groupBy arguments control how
data is aggregated in intensity plots:
average: controls the averaging of feature intensities prior to plotting.
xBy: can map the x value of individual points to analysis metadata. For example, exposure time or
sample location. Non-numeric values are allowed (unless regression=TRUE).
groupBy: controls the grouping of points in the plot. Equal groups are plotted in sequence so they can
be connected with lines and are coloured equally. Examples include experiment type or feature groups.
The following values are valid:
FALSE (average) or NULL (xBy and groupBy): aggregation is disabled.
TRUE (only average): results are averaged for each replicate.
a name of a column in the analysis information: results are aggregated for analyses with the same table column value.
"fGroups" (only groupBy): plots are grouped by feature groups.
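The averaging options listed above can be pictured with a small base R sketch. The analysis information table, its "time" column and the intensities are hypothetical; mean is used because it is the documented default of averageFunc. This only mimics the aggregation idea and is not how plotInt is implemented.

```r
# Hypothetical analysis information and intensities of one feature group
anaInfo <- data.frame(
    analysis  = c("s1-1", "s1-2", "s2-1", "s2-2"),
    replicate = c("s1", "s1", "s2", "s2"),
    time      = c(0, 0, 24, 24)    # hypothetical metadata column
)
intensity <- c(1000, 1200, 400, 600)

# Comparable to average = TRUE: one averaged value per replicate
avgByReplicate <- tapply(intensity, anaInfo$replicate, mean)

# Comparable to average = "time": analyses sharing a column value are averaged
avgByTime <- tapply(intensity, anaInfo$time, mean)
```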
Rick Helmus <[email protected]> and Ricardo Cunha <[email protected]> (plotTICs and
plotBPCs functions)
Gu Z, Gu L, Eils R, Schlesner M, Brors B (2014). “circlize implements and enhances circular visualization in R.” Bioinformatics, 30, 2811-2812.
Conway JR, Lex A, Gehlenborg N (2017).
“UpSetR: an R package for the visualization of intersecting sets and their properties.”
Bioinformatics, 33(18), 2938-2940.
doi:10.1093/bioinformatics/btx364.
http://dx.doi.org/10.1093/bioinformatics/btx364.
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H (2014).
“UpSet: Visualization of Intersecting Sets.”
IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
doi:10.1109/tvcg.2014.2346248.
featureGroups-class, groupFeatures
Functions to define and work with chromatographic peak quality metrics used for feature and feature group quality calculations.
featureQualities(qualities = NULL) featureGroupQualities(qualities = NULL) featureQualityNames(feat = TRUE, group = TRUE, scores = FALSE, totScore = TRUE)
qualities |
A character vector specifying which qualities to return. If |
feat |
If |
group |
If |
scores |
If |
totScore |
If |
These functions provide access to quality metrics that are used to assess the quality of detected features and
feature groups by the calculatePeakQualities function. The quality metrics are calculated using the
MetaClean package and are useful for filtering out low-quality features before further analysis.
For featureQualities and featureGroupQualities: A named list containing quality definitions.
Each element contains:
func: The MetaClean function to calculate the quality metric
HQ: "HV" (high values good) or "LV" (low values good)
range: Expected range of values (may be Inf for unbounded ranges)
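The structure of such a definition list can be sketched in base R as follows. The metric names, placeholder functions and the isBetter helper are hypothetical and are not part of patRoon or MetaClean; only the field names func, HQ and range and the "HV"/"LV" codes are taken from the description above.

```r
# Hypothetical definitions that mimic the documented fields.
# The functions are placeholders, not MetaClean functions.
qualityDefs <- list(
    exampleHV = list(
        func  = function(int) max(int) / sum(int),
        HQ    = "HV",            # high values are good
        range = c(0, 1)
    ),
    exampleLV = list(
        func  = function(int) sum(abs(diff(int))),
        HQ    = "LV",            # low values are good
        range = c(0, Inf)        # unbounded upper limit
    )
)

# Hypothetical helper: is value a of higher quality than value b?
isBetter <- function(def, a, b) {
    if (def$HQ == "HV") a > b else a < b
}
```

For example, isBetter(qualityDefs$exampleLV, 0.1, 0.4) is TRUE because lower values indicate better quality for an "LV" metric.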
For featureQualityNames: A character vector with quality and/or score names.
The featureQualities function defines quality metrics that are calculated for
individual features. The following quality metrics are available:
ApexBoundaryRatio - Ratio between apex and boundary intensities
FWHM2Base - Full width at half maximum to base ratio
Jaggedness - Measure of peak smoothness
Modality - Measure of peak multiplicity
Symmetry - Measure of peak symmetry (range: -1 to 1)
GaussianSimilarity - Similarity to Gaussian distribution
Sharpness - Measure of peak sharpness
TPASR - Triangle peak area similarity ratio
ZigZag - Zig-zag index measure
The featureGroupQualities function defines quality metrics that are
calculated for feature groups across multiple analyses. The following quality metrics are available:
ElutionShift - Measure of retention time consistency across analyses
RetentionTimeCorrelation - Correlation of retention times across analyses
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
Convert feature group data to a data.table (or data.frame).
## S4 method for signature 'featureGroups' as.data.table( x, average = FALSE, areas = FALSE, features = FALSE, qualities = FALSE, regression = FALSE, regressionBy = NULL, averageFunc = mean, normalized = FALSE, FCParams = NULL, concAggrParams = getDefPredAggrParams(), toxAggrParams = getDefPredAggrParams(), normConcToTox = FALSE, anaInfoCols = NULL, IMS = "both" ) ## S4 method for signature 'featureGroupsScreening' as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE) ## S4 method for signature 'featureGroupsScreeningSet' as.data.table(x, ..., collapseSuspects = ",", onlyHits = FALSE)
x |
The |
average |
Controls the averaging of feature intensities. Averaging also influences the calculation of regression
parameters. Set to If |
areas |
If set to |
features |
If |
qualities |
Adds feature (group) qualities ( |
regression, regressionBy
|
Used for regression calculations. See the |
averageFunc |
Function used for averaging. Only used when data is averaged or |
normalized |
If |
FCParams |
A parameter list to calculate fold change data. See |
concAggrParams, toxAggrParams
|
Parameters to aggregate calculated concentrations/toxicities (obtained with
|
normConcToTox |
Set to |
anaInfoCols |
A |
IMS |
(IMS workflow) Specifies which feature groups are considered to be returned in IMS workflows. The following options are valid:
|
... |
Passed to the parent |
collapseSuspects |
If a |
onlyHits |
If |
The as.data.table generic function converts most feature group data to a highly customizable
data.table. If a classical data.frame is preferred, the as.data.frame generic function
can be used instead and accepts the exact same arguments. The methods defined for suspect
screening workflows will merge the information from screenInfo, such as
suspect names and other properties and annotation data.
The regression argument controls the calculation of regression parameters
from a regression model calculated with feature intensities (or areas if areas=TRUE). Here, simple linear
regression is used, i.e. ‘y=ax+b’ with ‘a’ the slope and ‘b’ the intercept. The value for
regression should be the name of a column in the analysis information table
with numerical data to be used for x-values. Alternatively, if regression=TRUE then the "conc" column
is used. Any NA x-values are ignored, and no regression will be calculated if fewer than two (non-NA)
x-values are available. The output table will contain properties such as the slope and correlation coefficient
(R-squared). Furthermore, if features=TRUE then x-values will be calculated from the model and stored in the
x_reg column.
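The calculation described above can be sketched in base R. The data and the calcRegression helper are hypothetical, and the names of the returned list elements are chosen for this example only; the sketch follows the stated rules (ignore NA x-values, require at least two x-values, fit y=ax+b) but is not patRoon's implementation.

```r
# Hypothetical data: a "conc" column and intensities of one feature group
conc      <- c(1, 2, NA, 4, 8)
intensity <- c(110, 205, 300, 410, 790)

calcRegression <- function(x, y) {
    keep <- !is.na(x)                 # NA x-values are ignored
    x <- x[keep]
    y <- y[keep]
    if (length(x) < 2)                # fewer than two x-values: no regression
        return(NULL)
    fit <- lm(y ~ x)                  # simple linear regression, y = ax + b
    list(
        intercept = unname(coef(fit)[1]),
        slope     = unname(coef(fit)[2]),
        r_squared = summary(fit)$r.squared
    )
}

reg <- calcRegression(conc, intensity)

# x-values calculated back from the model, x = (y - b) / a
x_reg <- (intensity - reg$intercept) / reg$slope
```

Note that x_reg is also obtained for the analysis whose conc value is NA, since the back-calculation only needs the intensity and the fitted model.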
The regressionBy argument can be used to construct separate regression models for different groups of
analyses. It should be set to the name of a column in the analysis information table
which defines the grouping between samples. If features=TRUE then the grouping is stored in the
regression_group column of the output table.
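Grouped regression can be pictured as fitting one linear model per value of the grouping column. In the base R sketch below the table, the "experiment" column and all values are hypothetical; it illustrates the idea behind regressionBy rather than the actual code path.

```r
# Hypothetical analysis information with intensities of one feature group
anaInfo <- data.frame(
    conc       = c(1, 2, 4, 1, 2, 4),
    experiment = c("A", "A", "A", "B", "B", "B"),
    intensity  = c(100, 200, 400, 50, 100, 200)
)

# One linear model per group, comparable to regressionBy = "experiment"
slopes <- vapply(split(anaInfo, anaInfo$experiment), function(d) {
    coef(lm(intensity ~ conc, data = d))[["conc"]]
}, numeric(1))
```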
Please see the handbook for examples on how to use the regression functionality.
The as.data.table method for featureGroupsScreening supports an
additional format where each suspect hit is reported on a separate row (enabled by setting
collapseSuspects=NULL). In this format the suspect
properties from the screenInfo method are merged with each suspect row. Alternatively, if suspect
collapsing is enabled (the default) then the regular as.data.table format is used, and amended with the
names and estimated ID levels (if available) of the suspects matched to a feature group (each separated by the
value of the collapseSuspects argument).
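The difference between the two output formats can be illustrated with a small base R sketch. The table, its column names and the suspect names are hypothetical; the sketch only shows the shape of the result and does not call patRoon.

```r
# Hypothetical suspect hits: feature group M192_R355 matched two suspects
hits <- data.frame(
    group   = c("M120_R268", "M192_R355", "M192_R355"),
    suspect = c("suspectA", "suspectB", "suspectC"),
    stringsAsFactors = FALSE
)

# Comparable to collapseSuspects = NULL: one row per suspect hit
perSuspect <- hits

# Comparable to collapseSuspects = ",": one row per feature group,
# with the suspect names joined by the separator
collapsed <- aggregate(suspect ~ group, data = hits, FUN = paste, collapse = ",")
```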
Suspect collapsing also influences the reporting of predicted feature concentrations and
toxicities. In the case that (1) suspects are not collapsed in the output table and (2)
predictions are available for a specific suspect hit (i.e. if predictRespFactors or
predictTox was called on the feature groups object), then only the suspect specific data is reported
and no aggregation is performed. Hence, this allows you to obtain specific concentration/toxicity values for each
suspect/feature group pair.
If the IMS argument is set to "both" or "maybe" then
"mobility_collapsed" and "CCS_collapsed" columns will be added that summarize all
mobility/CCS values of the IMS features (or IMS feature groups) assigned to this IMS precursor. These
numbers are currently rounded to ‘3’ decimals.
In a sets workflow, normalization of feature intensities occurs per set.
In sets workflows the analysis information contains an additional "set"
column, which can be used for arguments that involve grouping of analyses. For instance, if
regressionBy="set" then regression models will be calculated for each set.
Holds information for all feature group annotations.
## S4 method for signature 'featureAnnotations' annotations(obj) ## S4 method for signature 'featureAnnotations' groupNames(obj) ## S4 method for signature 'featureAnnotations' length(x) ## S4 method for signature 'featureAnnotations,ANY,missing,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'featureAnnotations,ANY,missing' x[[i, j]] ## S4 method for signature 'featureAnnotations' x$name ## S4 method for signature 'featureAnnotations' as.data.table( x, fGroups = NULL, fragments = FALSE, countElements = NULL, countFragElements = NULL, OM = FALSE, normalizeScores = "none", excludeNormScores = defaultExclNormScores(x) ) ## S4 method for signature 'featureAnnotations' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featureAnnotations' filter( obj, minExplainedPeaks = NULL, scoreLimits = NULL, elements = NULL, fragElements = NULL, lossElements = NULL, fragFormulas = NULL, lossFormulas = NULL, topMost = NULL, OM = FALSE, maxLevel = NULL, negate = FALSE ) ## S4 method for signature 'featureAnnotations' plotVenn(obj, ..., labels = NULL, vennArgs = NULL) ## S4 method for signature 'featureAnnotations' plotUpSet( obj, ..., labels = NULL, nsets = NULL, nintersects = NA, upsetArgs = NULL )
obj, x
|
|
i, j
|
For |
... |
For the For Others: Any further (and unique) |
drop |
ignored. |
name |
The feature group name (partially matched). |
fGroups |
The |
fragments |
If |
countElements, countFragElements
|
A |
OM |
For For |
normalizeScores |
A |
excludeNormScores |
A
For |
minExplainedPeaks |
Minimum number of explained peaks. Set to |
scoreLimits |
Filter results by their scores. Should be a named |
elements |
Only retain candidate formulae (neutral form) that match a
given elemental restriction. The format of |
fragElements, lossElements
|
Specifies elemental restrictions for
fragment or neutral loss formulae (charged form). Candidates are retained
if at least one of the fragment formulae follow (or not follow if
|
fragFormulas, lossFormulas
|
A |
topMost |
Only keep a maximum of |
maxLevel |
Filter by maximum identification level (e.g. |
negate |
If |
labels |
A |
vennArgs |
A |
nsets, nintersects
|
See |
upsetArgs |
A list with any further arguments to be passed to
|
This class stores annotation data for feature groups, such as molecular formulae, SMILES identifiers, compound names
etc. The class of objects that are generated by formula and compound annotation (generateFormulas and
generateCompounds) are based on this class.
as.data.table returns a data.table.
delete returns the object for which the specified data was removed.
filter returns a filtered featureAnnotations object.
plotVenn (invisibly) returns a list with the following fields:
gList the gList object that was returned by
the utilized VennDiagram plotting function.
areas The total area for each plotted group.
intersectionCounts The number of intersections between groups.
The order for the areas and intersectionCounts fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn and
draw.triple.venn).
annotations(featureAnnotations): Accessor for the groupAnnotations slot.
groupNames(featureAnnotations): returns a character vector with the names of the
feature groups for which data is present in this object.
length(featureAnnotations): Obtain total number of candidates.
x[i]: Subset on feature groups.
x[[i]]: Extracts annotation data for a feature group.
x$name: Extracts annotation data for a feature group.
as.data.table(featureAnnotations): Generates a table with all annotation data for each feature group and other
information such as element counts.
delete(featureAnnotations): Completely deletes specified annotations.
filter(featureAnnotations): Provides rule based filtering for feature group annotations. Useful to eliminate
unlikely candidates and speed up further processing.
plotVenn(featureAnnotations): plots a Venn diagram (using VennDiagram) outlining unique and shared
candidates of up to five different featureAnnotations objects.
plotUpSet(featureAnnotations): plots an UpSet diagram (using the upset function) outlining
unique and shared candidates between different featureAnnotations objects.
groupAnnotations: A list with, for each annotated feature group, a data.table with annotation data.
Use the annotations method for access.
scoreTypes: A character with all the score types present in this object.
scoreRanges: The minimum and maximum score values of all candidates for each feature group. Used for normalization.
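How a stored minimum and maximum could be used to normalize scores is sketched below in base R. Min-max scaling is an assumption made for this example, and the candidate names, scores and the normalizeMinMax helper are hypothetical; the text above only states that the ranges are used for normalization.

```r
# Hypothetical candidate scores for one feature group
scores <- c(candA = 2.5, candB = 7.0, candC = 4.0)

# Minimum and maximum over all candidates, as stored in a score range
scoreRange <- range(scores)

# Min-max normalization to [0, 1] (assumed method); guards a zero-width range
normalizeMinMax <- function(x, rng) {
    width <- diff(rng)
    if (width == 0)
        return(rep(1, length(x)))
    (x - rng[1]) / width
}

normScores <- normalizeMinMax(scores, scoreRange)
```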
Calculation of the aromaticity index (AI) and related double bond equivalents (DBE_AI) is performed as described in Koch 2015. Formula classification is performed by the rules described in Abdulla 2013. Filtering of OM related molecules is performed as described in Koch 2006 and Kujawinski 2006 (see references).
Koch BP, Dittmar T (2015).
“From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter.”
Rapid Communications in Mass Spectrometry, 30(1), 250–250.
doi:10.1002/rcm.7433.
Abdulla HA, Sleighter RL, Hatcher PG (2013).
“Two Dimensional Correlation Analysis of Fourier Transform Ion Cyclotron Resonance Mass Spectra of Dissolved Organic Matter: A New Graphical Analysis of Trends.”
Analytical Chemistry, 85(8), 3895–3902.
doi:10.1021/ac303221j.
Koch BP, Dittmar T (2006).
“From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter.”
Rapid Communications in Mass Spectrometry, 20(5), 926–932.
doi:10.1002/rcm.2386.
Kujawinski EB, Behn MD (2006).
“Automated Analysis of Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectra of Natural Organic Matter.”
Analytical Chemistry, 78(13), 4363–4373.
doi:10.1021/ac0600306.
Conway JR, Lex A, Gehlenborg N (2017).
“UpSetR: an R package for the visualization of intersecting sets and their properties.”
Bioinformatics, 33(18), 2938-2940.
doi:10.1093/bioinformatics/btx364.
http://dx.doi.org/10.1093/bioinformatics/btx364.
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H (2014).
“UpSet: Visualization of Intersecting Sets.”
IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
doi:10.1109/tvcg.2014.2346248.
formulas-class and compounds-class
The derived formulas and compounds classes.
This class holds all the information for grouped features.
groupTable(object, ...) groupFeatIndex(fGroups) groupInfo(fGroups) unique(x, incomparables = FALSE, ...) overlap(fGroups, which = NULL, aggregate = TRUE, exclusive = FALSE, ...) selectIons(fGroups, components, prefAdduct, ...) groupQualities(fGroups) groupScores(fGroups) internalStandards(fGroups) internalStandardAssignments(fGroups, ...) normInts( fGroups, featNorm = "none", groupNorm = FALSE, normFunc = max, standards = NULL, ISTDRTWindow = 120, ISTDMZWindow = 300, minISTDs = 3, ... ) concentrations(fGroups, ...) toxicities(fGroups, ...) updateGroups(fGroups, ...) ## S4 method for signature 'featureGroups' names(x) ## S4 method for signature 'featureGroups' analyses(obj) ## S4 method for signature 'featureGroups' replicates(obj) ## S4 method for signature 'featureGroups' groupNames(obj) ## S4 method for signature 'featureGroups' length(x) ## S4 method for signature 'featureGroups' hasIMS(obj) ## S4 method for signature 'featureGroups' fromIMS(obj) ## S4 method for signature 'featureGroups' show(object) ## S4 method for signature 'featureGroups' groupTable(object, areas = FALSE, normalized = FALSE) ## S4 method for signature 'featureGroups' analysisInfo(obj, df = FALSE) ## S4 replacement method for signature 'featureGroups' analysisInfo(obj) <- value ## S4 method for signature 'featureGroups' groupInfo(fGroups) ## S4 method for signature 'featureGroups' featureTable(obj) ## S4 method for signature 'featureGroups' getFeatures(obj) ## S4 method for signature 'featureGroups' groupFeatIndex(fGroups) ## S4 method for signature 'featureGroups' groupQualities(fGroups) ## S4 method for signature 'featureGroups' groupScores(fGroups) ## S4 method for signature 'featureGroups' getFeatureQualityNames( obj, feat = TRUE, group = TRUE, scores = FALSE, totScore = TRUE ) ## S4 method for signature 'featureGroups' annotations(obj) ## S4 method for signature 'featureGroups' internalStandards(fGroups) ## S4 method for signature 'featureGroups' internalStandardAssignments(fGroups) ## 
S4 method for signature 'featureGroups' adducts(obj) ## S4 replacement method for signature 'featureGroups' adducts(obj) <- value ## S4 method for signature 'featureGroups' concentrations(fGroups) ## S4 method for signature 'featureGroups' toxicities(fGroups) ## S4 method for signature 'featureGroups,ANY,ANY,missing' x[i, j, ..., ni, replicates, IMS, results, reorder = FALSE, drop = TRUE] ## S4 method for signature 'featureGroups,ANY,ANY' x[[i, j]] ## S4 method for signature 'featureGroups' x$name ## S4 method for signature 'featureGroups' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featureGroups' export(obj, type, out, IMS = FALSE) ## S4 method for signature 'featureGroups' unique( x, incomparables = FALSE, which, aggregate = TRUE, relativeTo = NULL, outer = FALSE ) ## S4 method for signature 'featureGroups' overlap(fGroups, which, aggregate, exclusive) ## S4 method for signature 'featureGroups' calculatePeakQualities( obj, weights, flatnessFactor, featureQualities = NULL, featureGroupQualities = NULL, avgFunc = mean, EICParams = getDefEICParams(window = 0), parallel = TRUE ) ## S4 method for signature 'featureGroups' selectIons( fGroups, components, prefAdduct, onlyMonoIso = TRUE, chargeMismatch = "adduct" ) ## S4 method for signature 'featureGroups' normInts( fGroups, featNorm = "none", groupNorm = FALSE, normFunc = max, standards = NULL, ISTDRTWindow = 120, ISTDMZWindow = 300, minISTDs = 3, ... ) ## S4 method for signature 'featureGroups' getTICs(obj, retentionRange = NULL, MSLevel = 1) ## S4 method for signature 'featureGroups' getBPCs(obj, retentionRange = NULL, MSLevel = 1) ## S4 method for signature 'featureGroups' updateGroups( fGroups, what = c("ret", "mz", "mobility", "CCS"), intWeight = FALSE ) ## S4 method for signature 'featureGroupsSet' sets(obj) ## S4 method for signature 'featureGroupsSet' internalStandardAssignments(fGroups, set = NULL) ## S4 method for signature 'featureGroupsSet' adducts(obj, set, ...) 
## S4 replacement method for signature 'featureGroupsSet' adducts(obj, set, reGroup = TRUE) <- value ## S4 method for signature 'featureGroupsSet' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'featureGroupsSet' show(object) ## S4 method for signature 'featureGroupsSet' featureTable(obj) ## S4 method for signature 'featureGroupsSet,ANY,ANY,missing' x[i, j, ..., sets = NULL, reorder = FALSE, drop = TRUE] ## S4 method for signature 'featureGroupsSet' export(obj, type, out, ..., set) ## S4 method for signature 'featureGroupsSet' selectIons(fGroups, components, prefAdduct, ...) ## S4 method for signature 'featureGroupsSet' normInts( fGroups, featNorm = "none", groupNorm = FALSE, normFunc = max, standards = NULL, ISTDRTWindow = 120, ISTDMZWindow = 300, minISTDs = 3, ... ) ## S4 method for signature 'featureGroupsSet' unset(obj, set) ## S4 method for signature 'featureGroupsKPIC2' delete(obj, ...) ## S4 replacement method for signature 'featureGroupsXCMS' analysisInfo(obj) <- value ## S4 method for signature 'featureGroupsXCMS' delete(obj, ...) ## S4 method for signature 'featureGroupsXCMS3' delete(obj, ...)
| Argument | Description |
|---|---|
| ... | For the For For For sets workflow methods: further arguments passed to the base |
| fGroups, obj, x, object | |
| incomparables | Not used. Included for compatibility with the generic. |
| which | A character vector with the selection to compare (e.g. replicates, as set by the For |
| aggregate | Specifies how data should be aggregated prior to comparison. Set to |
| exclusive | If |
| components | The |
| prefAdduct | The 'preferred adduct' (see method description). This is often |
| featNorm | The method applied for feature normalization: |
| groupNorm | If |
| normFunc | A |
| standards | A (sets workflow) Can also be a See the |
| ISTDRTWindow, ISTDMZWindow | The retention time and m/z windows for IS selection. Only used if |
| minISTDs | The minimum number of IS that should be assigned to each feature (if possible). Only used if |
| areas | If set to |
| normalized | If |
| df | If |
| value | For For |
| feat, group | If |
| scores | If |
| totScore | If |
| i, j | For |
| ni | Optional argument. An expression used for subsetting the analyses. The analysis information is first subset and the remaining rows are used to determine for which analyses the results should be kept. The unevaluated |
| replicates | An optional |
| IMS | (IMS workflow) Specifies which feature groups are considered to be kept ( For |
| results | Optional argument. If specified only feature groups with results in the specified object are kept. The class of |
| reorder | If (sets workflow) If the |
| drop | Ignored. |
| name | The feature group name (partially matched). |
| type | The export type: |
| out | The destination file for the exported data. |
| relativeTo | A character vector of groupings that should be used for unique comparison. The groupings (e.g. replicates) are configured by the |
| outer | If |
| weights | A named |
| flatnessFactor | Passed to MetaClean as the |
| featureQualities | Specifies which feature qualities to calculate. Can be |
| featureGroupQualities | Analogous to |
| avgFunc | The function used to average the peak qualities and scores for each feature group. |
| EICParams | A named |
| parallel | If set to |
| onlyMonoIso | Set to |
| chargeMismatch | Specifies how to deal with a mismatch in charge between adduct and isotope annotations. Valid values are: |
| retentionRange | Range of retention time (in seconds) to collect TIC traces. Should be a numeric vector with length of two containing the min/max values. Set to NULL to ignore. |
| MSLevel | Integer vector with the ms levels (i.e., 1 for MS1 and 2 for MS2) to obtain TIC traces. |
| what | A |
| intWeight | If |
| set | (sets workflow) The name of the set. |
| reGroup | (sets workflow) Set to |
| sets | (sets workflow) A |
The featureGroups class is the workhorse of patRoon: almost all functionality operates on its instantiated
objects. The class holds all information from grouped features (obtained from features). This class
itself is virtual, hence, objects are not created directly from it. Instead, 'feature groupers' such as
groupFeaturesXCMS return a featureGroups derived object after performing the actual grouping of
features across analyses.
delete returns the object from which the specified data was removed.
calculatePeakQualities returns a modified object amended with peak qualities and scores.
selectIons returns a featureGroups object with only the selected feature groups and amended
with adduct annotations.
normInts returns a featureGroups object, amended with data in the ISTDs and
ISTDAssignments slots if featNorm="istd".
names(featureGroups): Obtain feature group names.
analyses(featureGroups): returns a character vector with the names of the
analyses for which data is present in this object.
replicates(featureGroups): returns a character vector with the names of the
replicates for which data is present in this object.
groupNames(featureGroups): Same as names. Provided for consistency to other classes.
length(featureGroups): Obtain number of feature groups.
hasIMS(featureGroups): Returns TRUE if the feature groups object has ion mobility information.
fromIMS(featureGroups): Returns TRUE if the features were directly generated from IMS data.
show(featureGroups): Shows summary information for this object.
groupTable(featureGroups): Accessor for groups slot.
analysisInfo(featureGroups): Obtain analysisInfo (see analysisInfo slot in features).
analysisInfo(featureGroups) <- value: Modifies the analysis information of this object.
This is primarily intended for changing or adding analysis metadata columns, and can also be used to re-order analyses. The
removal or addition of analyses and changes to the "analysis" column are not supported. This function
performs several internal updates after the analysis information is modified. Hence, never attempt to change the
analysisInfo slot directly.
groupInfo(featureGroups): Accessor for groupInfo slot.
featureTable(featureGroups): Obtain feature information (see features).
getFeatures(featureGroups): Accessor for features slot.
groupFeatIndex(featureGroups): Accessor for ftindex slot.
groupQualities(featureGroups): Accessor for groupQualities slot.
groupScores(featureGroups): Accessor for groupScores slot.
getFeatureQualityNames(featureGroups): Returns feature quality names that were calculated for this object.
annotations(featureGroups): Accessor for annotations slot.
internalStandards(featureGroups): Accessor for ISTDs slot.
internalStandardAssignments(featureGroups): Accessor for ISTDAssignments slot.
adducts(featureGroups): Returns a named character with adduct annotations assigned to each feature group (if
available).
adducts(featureGroups) <- value: Sets adduct annotations for feature groups.
concentrations(featureGroups): Accessor for concentrations slot.
toxicities(featureGroups): Accessor for toxicities slot.
x[i, j]: Subset on analyses/feature groups.
x[[i, j]]: Extract intensity values.
x$name: Extract intensity values for a feature group.
delete(featureGroups): Completely deletes specified feature groups.
export(featureGroups): Exports feature groups to a ‘.csv’ file that is readable to Bruker ProfileAnalysis (a
'bucket table'), Bruker TASQ (an analyte database) or that is suitable as input for the Targeted peak
detection functionality of MZmine.
unique(featureGroups): Obtain a subset with unique feature groups present in one or more analyses, replicates etc.
overlap(featureGroups): Obtain a subset with feature groups that overlap between a set of specified replicate(s).
calculatePeakQualities(featureGroups): Calculates peak and group qualities for all features and feature groups. The peak qualities
(and scores) are calculated with the features method of this
function, and subsequently averaged per feature group. Group metrics are then calculated and scored and
scaled by normalizing qualities among all groups and scaling them from ‘0’ (worst) to ‘1’ (best). The
totalScore for each group is then calculated as the weighted sum from all feature (group) scores. The
getMCTrainData and predictCheckFeaturesSession functions can be used to train and apply
Pass/Fail ML models from MetaClean.
selectIons(featureGroups): uses componentization results to select feature groups with
preferred adduct ion and/or isotope annotation. Typically, this means that only feature groups are kept if they are
(de-)protonated adducts and are monoisotopic. The adduct annotation assignments for the selected feature groups are
copied from the components to the annotations slot. If the adduct for a feature group is unknown, its
annotation is defaulted to the 'preferred' adduct, and hence, the feature group will never be removed. Furthermore,
if a component does not contain an annotation with the preferred adduct, the most intense feature group is selected
instead. Similarly, if no isotope annotation is available, the feature group is assumed to be monoisotopic and thus
not removed. An important advantage of selectIons is that it may considerably simplify your dataset.
Furthermore, the adduct assignments allow formula/compound annotation steps later in the workflow to improve their
annotation accuracy. On the other hand, it is important the componentization results are reliable. Hence, it is
highly recommended that, prior to calling selectIons, the settings to generateComponents are
optimized and its results are reviewed with checkComponents. Finally, the adducts<- method can
be used to manually correct adduct assignments afterwards if necessary.
normInts(featureGroups): Provides various methods to normalize feature intensities for each sample analysis or for
all features within a feature group. See the Feature intensity normalization section below.
getTICs(featureGroups): Obtain the total ion chromatogram/s (TICs) of the analyses.
getBPCs(featureGroups): Obtain the base peak chromatogram/s (BPCs) of the analyses.
updateGroups(featureGroups): Recalculate group information from feature data.
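The scaling and weighting described for calculatePeakQualities can be illustrated with a minimal base R sketch. It does not call patRoon or MetaClean: the quality names (metricA, metricB), their values, the weights and the assumption that higher raw values are always better are hypothetical and only serve to show the arithmetic.

```r
# Hypothetical group qualities: rows are feature groups, columns are quality metrics.
qualities <- data.frame(
    row.names = c("fg1", "fg2", "fg3"),
    metricA = c(2, 8, 5),
    metricB = c(10, 30, 20)
)

# Min-max scaling of one metric across all groups: 0 (worst) to 1 (best).
# Assumption: higher raw values are better for every metric.
scale01 <- function(x) {
    rng <- range(x, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(1, length(x))) # all equal: avoid division by zero
    (x - rng[1]) / (rng[2] - rng[1])
}

scores <- as.data.frame(lapply(qualities, scale01), row.names = rownames(qualities))

# Total score: weighted sum of all scores (the weights are named by metric).
weights <- c(metricA = 1, metricB = 2)
scores$totalScore <- as.vector(as.matrix(scores[names(weights)]) %*% weights)
print(scores)
```

In this toy data fg2 obtains the highest total score because it ranks best on both metrics.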
groups: Matrix (data.table) with intensities for each feature group (columns) per analysis (rows).
Access with the groupTable method.
features: The features class associated with this object. Access with the getFeatures or featureTable methods.
groupInfo: data.table with retention time (ret column, in seconds) and m/z (mz
column) for each feature group. Access with the groupInfo method.
ftindex: Matrix (data.table) with feature indices for each feature group (columns) per analysis
(rows). Each index corresponds to the row within the feature table of the analysis (see
featureTable).
groupQualities, groupScores: A data.table with qualities/scores for each feature group (see the
calculatePeakQualities method).
annotations: A data.table with adduct annotations for each group (see the selectIons
method).
ISTDs: A data.table with screening results for internal standards (filled in by the normInts
method).
ISTDAssignments: A list, where each item is named by a feature group and consists of a vector with
feature group names of the internal standards assigned to it (filled in by the normInts method).
concentrations, toxicities: A data.table with predicted concentrations/toxicities for each feature group.
Assigned by the calculateConcs/calculateTox methods. Use the
concentrations/toxicities methods for access.
groupAlgo, groupArgs, groupVerbose: (sets workflow) Grouping parameters that were used when this object was created. Used
by adducts<- and selectIons when these methods perform a re-grouping of features.
annotations, ISTDAssignments: (sets workflow) As the featureGroups slots, but contains the data per set.
annotationsChanged: Set internally by adducts<- and applied as soon as reGroup=TRUE.
The raw data interface of patRoon is used by calculatePeakQualities to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
The normInts method performs normalization of feature intensities
(and areas). These values are amended in the features slot, while the original intensities/areas are kept.
To use the normalized intensities, pass normalized=TRUE to methods such as plotInt,
generateComponentsIntClust and as.data.table. Please see the
normalized argument documentation for these methods for more details.
The normInts method supports several methods to normalize intensities/areas of features within the same
analysis. Most methods are influenced by the normalization concentration (norm_conc in the
analysis information) set for each sample analysis. For NA or zero values the
output will be zero. If norm_conc is completely absent from the analysis information or all values are
NA, the normalization concentration is defaulted to one.
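The handling of the normalization concentration can be sketched in base R for the simplest case, where intensities are divided by the concentration of their analysis. This is an illustration of the rules stated above, not patRoon's implementation; the function name normByConc and all values are hypothetical.

```r
# Hypothetical intensities of one feature group in three analyses.
ints <- c(ana1 = 1000, ana2 = 4000, ana3 = 2500)

# Normalization concentrations as they might appear in a norm_conc column.
normConc <- c(ana1 = 1, ana2 = 2, ana3 = NA)

normByConc <- function(ints, normConc = NULL) {
    # Absent or all-NA concentrations default to one.
    if (is.null(normConc) || all(is.na(normConc)))
        normConc <- rep(1, length(ints))
    usable <- !is.na(normConc) & normConc != 0
    out <- numeric(length(ints)) # NA or zero concentrations yield zero
    out[usable] <- ints[usable] / normConc[usable]
    names(out) <- names(ints)
    out
}

normByConc(ints, normConc) # ana3 becomes zero because its concentration is NA
normByConc(ints)           # no concentrations at all: intensities are unchanged
```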
The different normalization methods are:
featNorm="istd" Uses internal standards (IS) for normalization. The IS are screened internally
by the screenSuspects function. Hence, the IS specified by the standards argument should
follow the format of a suspect list. Note that labelled elements in IS formulae should
be specified with the rcdk format, e.g. "[13]C" for 13C, "[2]H" for a deuterium etc.
Example IS lists are provided with the patRoonData and patRoonDataIMS packages.
The assignment of IS to features is automatically performed, using the following criteria:
Only analyses with a defined normalization concentration are considered.
The IS must be detected in all of the analyses in which the feature was detected.
The retention time and m/z are reasonably close (ISTDRTWindow/ISTDMZWindow arguments).
However, additional IS candidates outside these windows will be chosen if the number of candidates is less than the
minISTDs argument. In this case the next close(st) candidate(s) will be chosen.
Normalization of features within the same feature group always occurs with the same IS. If multiple IS are assigned
to a feature then normalization occurs with the combined intensity (area), which is calculated with the function
defined by the normFunc argument. The (combined) IS intensity is then normalized by the normalization
concentration, and finally used for feature normalization.
featNorm="tic" Uses the Total Ion Current (TIC) to normalize intensities. The TIC is calculated by
combining all intensities with the function defined by the normFunc argument. For this reason, you may need
to take care to perform normalization before e.g. suspect screening or other prioritization techniques. The
TIC normalized intensities are finally divided by the normalization concentration.
featNorm="conc" Simply divides all intensities (areas) by the normalization concentration defined
for the sample.
featNorm="none" Performs no normalization. The raw intensity values are simply copied. This is mainly
useful if you only want to do group normalization (described below).
The meaning of the normalization concentration differs for each method: for "istd" it resembles the IS
concentration of a sample analysis, whereas for "tic" and "conc" it is used to normalize different
sample amounts (e.g. injection volume).
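The window-based selection of internal standards and the subsequent normalization can be sketched as follows. This is a simplified base R illustration under stated assumptions: the IS table, the function selectISTDs and the feature values are hypothetical, the 'next closest' candidates are ranked by retention time difference only, and the requirement that an IS is detected in all analyses of the feature is not modelled. The default windows and minISTDs value are taken from the normInts usage.

```r
# Hypothetical internal standards (IS): retention time (s), m/z and intensity.
istds <- data.frame(
    name = c("IS-A", "IS-B", "IS-C", "IS-D"),
    ret = c(300, 350, 500, 900),
    mz = c(200, 450, 260, 310),
    intensity = c(2e5, 1e5, 4e5, 3e5),
    stringsAsFactors = FALSE
)

selectISTDs <- function(featRet, featMZ, istds, RTWindow = 120, MZWindow = 300, minISTDs = 3) {
    dRet <- abs(istds$ret - featRet)
    dMZ <- abs(istds$mz - featMZ)
    sel <- which(dRet <= RTWindow & dMZ <= MZWindow)
    if (length(sel) < minISTDs) {
        # Too few candidates: add the next closest IS from outside the windows.
        # Assumption: closeness is ranked by retention time difference only.
        rest <- setdiff(order(dRet), sel)
        sel <- c(sel, head(rest, minISTDs - length(sel)))
    }
    istds[sel, ]
}

feat <- list(ret = 320, mz = 250, intensity = 5e4)
sel <- selectISTDs(feat$ret, feat$mz, istds)

# Combine the IS intensities with normFunc, normalize by the IS concentration,
# then use the result to normalize the feature intensity.
normFunc <- max
normConc <- 1
combinedIS <- normFunc(sel$intensity) / normConc
featNormInt <- feat$intensity / combinedIS
```

Only IS-A and IS-B fall inside both windows here, so IS-C is added as the next closest candidate to reach minISTDs.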
If groupNorm=TRUE then feature intensities (areas) will be normalized by the combined values for its feature
group (again, combination occurs with normFunc). This group normalization always occurs after
aforementioned normalization methods. Group normalization was the only method with patRoon ‘<2.1’, and
still occurs automatically if normInts was not called when a method is executed that requests normalized
data.
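Group normalization amounts to dividing each feature group's intensities by a combined value for that group. A minimal base R sketch with a hypothetical intensity matrix and normFunc = max (the default shown in the usage) could look like this:

```r
# Hypothetical intensities: rows are analyses, columns are feature groups.
ints <- matrix(
    c(100, 50, 200,   # fg1
      400, 25, 100),  # fg2
    nrow = 3,
    dimnames = list(c("ana1", "ana2", "ana3"), c("fg1", "fg2"))
)

normFunc <- max

# Divide every column (feature group) by its combined value.
groupNormalized <- sweep(ints, 2, apply(ints, 2, normFunc), "/")
print(groupNormalized)
```

With max as the combining function, the most intense feature of each group ends up at one and all others are expressed relative to it.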
In IMS workflows with post mobility assignment (see assignMobilities), any IMS
features are excluded from the assignment of internal standards (featNorm="istd") or calculation of TICs
(featNorm="tic"). Furthermore, the normalized intensities and areas for IMS features are copied from
their IMS precursors.
The featureGroupsSet class is applicable for sets workflows. This class is derived from featureGroups and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
sets Returns the set names for this object.
unset Converts the object data for a specified set into a 'non-set' object (featureGroupsUnset), which allows it to be used in 'regular' workflows. The adduct annotations for the selected set are used to convert all
feature (group) masses to ionic m/z values. The annotations persist in the converted object.
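The conversion from neutral masses to ionic m/z values can be illustrated for the two most common adducts. This base R sketch is not the patRoon implementation: it assumes singly charged (de)protonated adducts only, uses a proton mass of 1.007276 Da, and the function name neutralToMZ and the example masses are hypothetical.

```r
# Mass of a proton in Da (hydrogen atom mass minus electron mass).
protonMass <- 1.007276

# Hypothetical neutral monoisotopic masses of two feature groups.
neutralMass <- c(fg1 = 180.0634, fg2 = 194.0804)

# Convert neutral masses to ionic m/z for singly charged (de)protonated adducts.
neutralToMZ <- function(mass, adduct) {
    switch(adduct,
        "[M+H]+" = mass + protonMass,
        "[M-H]-" = mass - protonMass,
        stop("adduct not covered by this sketch: ", adduct)
    )
}

neutralToMZ(neutralMass, "[M+H]+")
neutralToMZ(neutralMass, "[M-H]-")
```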
The following methods are changed or have new functionality:
adducts, adducts<- require the set argument. The order of the data that is
returned/changed follows that of the annotations slot. Furthermore, adducts<- will perform a
re-grouping of features when its reGroup parameter is set to TRUE. The implications for this are
discussed below. Note that no adducts are changed until reGroup=TRUE.
the subset operator ([) has specific arguments to choose (feature presence in) sets. See the argument
descriptions.
export Only allows exporting data from one set. The unset method is used prior to exporting the
data.
overlap and unique allow handling data per set. See the sets argument description.
selectIons Will perform a re-grouping of features. The implications of this are discussed below.
normInts Performs normalization for each set independently.
A re-grouping of features occurs if selectIons is called or adducts<- is used with
reGroup=TRUE. Afterwards, it is very likely that feature group names are changed. Since data generated later
in the workflow (e.g. annotation steps) rely on feature group names, these objects are not valid
anymore, and must be re-generated.
Rick Helmus <[email protected]> and Ricardo Cunha <[email protected]> (getTICs and
getBPCs functions)
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
groupFeatures for generating feature groups, feature-filtering, feature-table
and feature-plotting for more advanced featureGroups methods.
Functionality to compare feature groups and make a consensus.
comparison(..., groupAlgo, groupArgs = list(rtalign = FALSE))
## S4 method for signature 'featureGroups'
comparison(..., groupAlgo, groupArgs = list(rtalign = FALSE))
## S4 method for signature 'featureGroupsComparison'
hasIMS(obj)
## S4 method for signature 'featureGroupsComparison,missing'
plot(x, retMin = FALSE, ...)
## S4 method for signature 'featureGroupsComparison'
plotVenn(obj, which = NULL, ...)
## S4 method for signature 'featureGroupsComparison'
plotUpSet(obj, which = NULL, ...)
## S4 method for signature 'featureGroupsComparison'
plotChord(obj, addSelfLinks = FALSE, addRetMzPlots = TRUE, ...)
## S4 method for signature 'featureGroupsComparison'
consensus(
  obj,
  absMinAbundance = NULL,
  relMinAbundance = NULL,
  uniqueFrom = NULL,
  uniqueOuter = FALSE,
  verifyAnaInfo = TRUE
)
## S4 method for signature 'featureGroupsSet'
comparison(..., groupAlgo, groupArgs = list(rtalign = FALSE))
## S4 method for signature 'featureGroupsComparisonSet'
consensus(obj, ...)
| Argument | Description |
|---|---|
| ... | For For For |
| groupAlgo | The (IMS workflow) For IMS workflows the algorithm selection is equally limited to as what is used by |
| groupArgs | A |
| x, obj | The |
| retMin | If |
| which | A character vector specifying one or more labels of compared feature groups. For |
| addSelfLinks | If |
| addRetMzPlots | Set to |
| absMinAbundance, relMinAbundance | Minimum absolute or relative (‘0-1’) abundance across objects for a result to be kept. For instance, |
| uniqueFrom | Set this argument to only retain feature groups that are unique within one or more of the objects for which the consensus is made. Selection is done by setting the value of |
| uniqueOuter | If |
| verifyAnaInfo | If |
Feature groups objects originating from differing feature finding and/or grouping algorithms (or their parameters) may be compared to assess their output and generate a consensus.
The comparison method generates a
featureGroupsComparison object from given feature groups
objects, which in turn may be used for (visually) comparing presence of
feature groups and generating a consensus. Internally, this function will
collapse each feature groups object to pseudo features objects by
averaging their retention times, m/z values and intensities, where
each original feature groups object becomes an 'analysis'. All
pseudo features are then grouped using
regular feature grouping algorithms so that a
comparison can be made.
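The collapsing step can be pictured with a small base R sketch. The list-based objects, the helper names makeFG and collapse and all values are hypothetical stand-ins for real featureGroups objects; the sketch only shows how every object is reduced to one table of pseudo features that then acts as a single 'analysis'.

```r
# Two hypothetical feature group results, e.g. from different algorithms. Each holds
# per-group retention time and m/z plus an intensity matrix (analyses x groups).
makeFG <- function(ret, mz, ints) list(info = data.frame(ret = ret, mz = mz), ints = ints)

fgA <- makeFG(ret = c(120, 300), mz = c(150.05, 301.14),
              ints = matrix(c(1000, 3000, 200, 400), nrow = 2))
fgB <- makeFG(ret = c(121, 450), mz = c(150.05, 222.11),
              ints = matrix(c(900, 1100, 50, 150), nrow = 2))

# Collapse one object into a pseudo feature table: one row per feature group,
# with the intensity averaged over all analyses.
collapse <- function(fg) cbind(fg$info, intensity = colMeans(fg$ints))

# Each original object becomes one 'analysis' of pseudo features. These tables
# would subsequently be grouped with a regular feature grouping algorithm.
pseudoFeatures <- list(algoA = collapse(fgA), algoB = collapse(fgB))
print(pseudoFeatures)
```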
hasIMS returns TRUE if the object has ion mobility information.
plot generates an m/z vs retention time plot.
plotVenn plots a Venn diagram outlining unique and shared
feature groups between up to five compared feature groups.
plotUpSet plots an UpSet diagram outlining unique and shared
feature groups.
plotChord plots a chord diagram to visualize the distribution
of feature groups.
consensus combines all compared feature groups and averages their retention, m/z and intensity
data. Not yet supported for sets workflows.
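The abundance and uniqueness criteria used when making a consensus can be sketched with a presence matrix. This is a base R illustration only: the group and object labels are hypothetical, and the use of an inclusive threshold (>=) is an assumption.

```r
# Presence of four hypothetical grouped pseudo features in three compared objects.
presence <- matrix(
    c(TRUE,  TRUE,  TRUE,
      TRUE,  FALSE, FALSE,
      TRUE,  TRUE,  FALSE,
      FALSE, FALSE, TRUE),
    ncol = 3, byrow = TRUE,
    dimnames = list(paste0("grp", 1:4), c("algoA", "algoB", "algoC"))
)

abundance <- rowSums(presence)             # absolute abundance across objects
relAbundance <- abundance / ncol(presence) # relative abundance (0-1)

keepAbs <- names(which(abundance >= 2))    # analogous to absMinAbundance = 2
keepRel <- names(which(relAbundance >= 1)) # analogous to relMinAbundance = 1

# Groups that are unique to one object (analogous to uniqueFrom).
uniqueToA <- names(which(presence[, "algoA"] & abundance == 1))
```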
comparison returns a featureGroupsComparison
object.
plotVenn (invisibly) returns a list with the following fields:
gList the gList object that was returned by
the utilized VennDiagram plotting function.
areas The total area for each plotted group.
intersectionCounts The number of intersections between groups.
The order for the areas and intersectionCounts fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn and
draw.triple.venn).
consensus returns a featureGroups object with a consensus from the compared feature
groups.
This class is used for comparing different featureGroups
objects.
## S4 method for signature 'featureGroupsComparison'
names(x)
## S4 method for signature 'featureGroupsComparison'
length(x)
## S4 method for signature 'featureGroupsComparison,ANY,missing,missing'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'featureGroupsComparison,ANY,missing'
x[[i, j]]
## S4 method for signature 'featureGroupsComparison'
x$name
| Argument | Description |
|---|---|
| x | A |
| i | For |
| ... | Ignored. |
| drop, j | Ignored. |
| name | The label name (partially matched). |
Objects from this class are returned by comparison.
names(featureGroupsComparison): Obtain the labels that were given to each compared feature group.
length(featureGroupsComparison): Number of feature groups objects that were compared.
x[i]: Subset on labels that were assigned to compared feature groups.
x[[i]]: Extract a featureGroups object by its label.
x$name: Extract a featureGroups object by its label (partially matched).
fGroupsList: A list of featureGroups objects that
were compared.
comparedFGroups: A pseudo featureGroups object containing
grouped feature groups.
This class derives from featureGroups and adds suspect screening information.
screenInfo(obj)
## S4 method for signature 'featureGroupsScreening'
screenInfo(obj)
## S4 method for signature 'featureGroupsScreening'
show(object)
## S4 method for signature 'featureGroupsScreening,ANY,ANY,missing'
x[i, j, ..., suspects = NULL, reorder = FALSE, drop = TRUE]
## S4 method for signature 'featureGroupsScreening'
delete(obj, i = NULL, j = NULL, k = NULL, ...)
## S4 method for signature 'featureGroupsScreening'
filter(
  obj,
  ...,
  onlyHits = NULL,
  IMSMatchParams = NULL,
  selectHitsBy = NULL,
  selectBestFGroups = FALSE,
  maxLevel = NULL,
  maxFormRank = NULL,
  maxCompRank = NULL,
  minAnnSimForm = NULL,
  minAnnSimComp = NULL,
  minAnnSimBoth = NULL,
  absMinFragMatches = NULL,
  relMinFragMatches = NULL,
  minRF = NULL,
  maxLC50 = NULL,
  negate = FALSE,
  applyIMS = "both"
)
## S4 method for signature 'featureGroupsScreeningSet'
screenInfo(obj)
## S4 method for signature 'featureGroupsScreeningSet'
show(object)
## S4 method for signature 'featureGroupsScreeningSet,ANY,ANY,missing'
x[i, j, ..., suspects = NULL, sets = NULL, reorder = FALSE, drop = TRUE]
## S4 method for signature 'featureGroupsScreeningSet'
delete(obj, i = NULL, j = NULL, k = NULL, ...)
## S4 method for signature 'featureGroupsScreeningSet'
filter(
  obj,
  ...,
  onlyHits = NULL,
  IMSMatchParams = NULL,
  selectHitsBy = NULL,
  selectBestFGroups = FALSE,
  maxLevel = NULL,
  maxFormRank = NULL,
  maxCompRank = NULL,
  minAnnSimForm = NULL,
  minAnnSimComp = NULL,
  minAnnSimBoth = NULL,
  absMinFragMatches = NULL,
  relMinFragMatches = NULL,
  minRF = NULL,
  maxLC50 = NULL,
  negate = FALSE,
  applyIMS = "both"
)
## S4 method for signature 'featureGroupsScreeningSet'
unset(obj, set)
| Argument | Description |
|---|---|
| obj, object, x | The |
| i, j, reorder | See |
| ... | Further arguments passed to the base method. |
| suspects | An optional |
| drop | Ignored. |
| k | The Setting both |
| onlyHits | If |
| IMSMatchParams | (IMS workflow) A |
| selectHitsBy | Should be |
| selectBestFGroups | If |
| maxLevel, maxFormRank, maxCompRank, minAnnSimForm, minAnnSimComp, minAnnSimBoth | Filter suspects by maximum identification level (e.g. |
| absMinFragMatches, relMinFragMatches | Only retain suspects with this minimum number of MS/MS matches with the fragments specified in the suspect list (i.e. |
| minRF | Filter suspect hits by the given minimum predicted response factor (as calculated by |
| maxLC50 | Filter suspect hits by the given maximum toxicity (LC50) (as calculated by |
| negate | If set to |
| applyIMS | (IMS workflow) Whether the filters are only applied to IMS precursors ( |
| sets | (sets workflow) A |
| set | (sets workflow) The name of the set. |
delete returns the object from which the specified data was removed.
filter returns a filtered featureGroupsScreening object.
screenInfo(featureGroupsScreening): Returns a table with screening information
(see screenInfo slot).
show(featureGroupsScreening): Shows summary information for this object.
x[i, j]: Subset on analyses, feature groups and/or
suspects.
delete(featureGroupsScreening): Completely deletes specified feature groups or screening results.
filter(featureGroupsScreening): Performs rule based filtering. This method builds on the comprehensive filter
functionality from the base filter,featureGroups-method. It adds several filters to select
e.g. the best ranked suspects or those with a minimum estimated identification level. NOTE: most
filters only affect suspect hits, not feature groups. Set onlyHits=TRUE to subsequently remove any
feature groups that lost any suspect matches due to these filter steps.
screenInfo: A data.table with results from suspect screening. This table will be amended with
ID confidence data when estimateIDConfidence is run.
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.
(sets workflow) A named list with the metadata stored for each set.
The featureGroupsScreeningSet class is applicable for sets workflows. This class is derived from featureGroupsScreening and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset Converts the object data for a specified set into a 'non-set' object (featureGroupsScreeningUnset), which allows it to be used in 'regular' workflows. Only the screening results present in the specified set are kept.
The following methods are changed or have new functionality:
estimateIDConfidence See the Sets workflows section in the documentation for
estimateIDConfidence.
filter All filters related to estimated identification levels and formula/compound rankings are
applied to the overall set data (see above). All others are applied to set specific data: in this case candidates
are only removed if none of the set data conforms to the filter.
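The rule for set specific filters, where a candidate is only removed if none of its set data passes, can be sketched in base R. The set names (positive, negative), the per-set fragment match counts and the threshold are hypothetical; the sketch illustrates the described rule and not the actual patRoon code.

```r
# Hypothetical per-set fragment match counts for three suspect hits
# (NA means that no data is available for that set).
fragMatches <- data.frame(
    suspect = c("susp1", "susp2", "susp3"),
    positive = c(3, 0, NA),
    negative = c(1, 1, 0),
    stringsAsFactors = FALSE
)
absMinFragMatches <- 2

setCols <- c("positive", "negative")
passes <- fragMatches[setCols] >= absMinFragMatches # logical matrix: hits x sets

# A hit is kept if the data of at least one set passes the filter.
keep <- rowSums(passes, na.rm = TRUE) > 0
fragMatches$suspect[keep]
```

Here susp1 is kept because its positive set data passes, even though its negative set data does not.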
This class also derives from featureGroupsSet. Please see its documentation for more relevant details
with sets workflows.
Note that the formRank and compRank columns are not updated when the data is subset.
filter removes suspect hits with NA values when any of the filters related to minimum or maximum
values are applied (unless negate=TRUE).
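The treatment of NA values by minimum/maximum filters can be illustrated with a base R sketch for a minimum response factor. The function filterMinRF and the data are hypothetical, and the behaviour shown for negate=TRUE (failing hits are kept and NA hits are not removed) is an assumption based on the note above.

```r
# Hypothetical predicted response factors for four suspect hits (NA: not predicted).
hits <- data.frame(
    suspect = paste0("susp", 1:4),
    RF = c(5e4, 2e3, NA, 8e5),
    stringsAsFactors = FALSE
)

filterMinRF <- function(hits, minRF, negate = FALSE) {
    pass <- hits$RF >= minRF # NA where the response factor is missing
    if (negate) {
        # Assumption: the filter is inverted and hits with NA values are retained.
        keep <- is.na(pass) | !pass
    } else {
        # Hits with NA values are removed together with those below the minimum.
        keep <- !is.na(pass) & pass
    }
    hits[keep, ]
}

filterMinRF(hits, minRF = 1e4)$suspect
filterMinRF(hits, minRF = 1e4, negate = TRUE)$suspect
```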
Holds information for all features present within a set of analyses.
## S4 method for signature 'features'
length(x)
## S4 method for signature 'features'
show(object)
## S4 method for signature 'features'
featureTable(obj)
## S4 method for signature 'features'
analysisInfo(obj, df = FALSE)
## S4 method for signature 'features'
getFeatureQualityNames(obj, scores = FALSE, totScore = TRUE)
## S4 replacement method for signature 'features'
analysisInfo(obj) <- value
## S4 method for signature 'features'
analyses(obj)
## S4 method for signature 'features'
replicates(obj)
## S4 method for signature 'features'
hasIMS(obj)
## S4 method for signature 'features'
fromIMS(obj)
## S4 method for signature 'features'
as.data.table(x)
## S4 method for signature 'features'
filter(obj, absMinIntensity = NULL, relMinIntensity = NULL, retentionRange = NULL, mzRange = NULL, mzDefectRange = NULL, chromWidthRange = NULL, IMSRangeParams = NULL, qualityRange = NULL, negate = FALSE)
## S4 method for signature 'features,ANY,missing,missing'
x[i, j, ..., ni, reorder = FALSE, drop = TRUE]
## S4 method for signature 'features,ANY,missing'
x[[i]]
## S4 method for signature 'features'
x$name
## S4 method for signature 'features'
delete(obj, i = NULL, j = NULL, ...)
## S4 method for signature 'features'
calculatePeakQualities(obj, weights, flatnessFactor, featureQualities = NULL, EICParams = getDefEICParams(window = 0), parallel = TRUE)
## S4 method for signature 'features'
getTICs(obj, retentionRange = NULL, MSLevel = 1)
## S4 method for signature 'features'
getBPCs(obj, retentionRange = NULL, MSLevel = 1)
## S4 method for signature 'features'
plotTICs(obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, groupBy = NULL, showLegend = TRUE, xlim = NULL, ylim = NULL, ...)
## S4 method for signature 'features'
plotBPCs(obj, retentionRange = NULL, MSLevel = 1, retMin = FALSE, title = NULL, groupBy = NULL, showLegend = TRUE, xlim = NULL, ylim = NULL, ...)
## S4 method for signature 'featuresSet'
sets(obj)
## S4 method for signature 'featuresSet'
show(object)
## S4 method for signature 'featuresSet'
as.data.table(x)
## S4 method for signature 'featuresSet,ANY,missing,missing'
x[i, ..., sets = NULL, reorder = FALSE, drop = TRUE]
## S4 method for signature 'featuresSet'
filter(obj, ..., negate = FALSE, sets = NULL)
## S4 method for signature 'featuresSet'
unset(obj, set)
## S4 method for signature 'featuresKPIC2'
delete(obj, i = NULL, j = NULL, ...)
## S4 method for signature 'featuresPiek'
delete(obj, i = NULL, j = NULL, ...)
## S4 replacement method for signature 'featuresXCMS'
analysisInfo(obj) <- value
## S4 method for signature 'featuresXCMS'
delete(obj, i = NULL, j = NULL, ...)
## S4 method for signature 'featuresXCMS3'
delete(obj, i = NULL, j = NULL, ...)
obj, x, object
|
|
df |
If |
scores |
If |
totScore |
If |
value |
A |
absMinIntensity, relMinIntensity
|
Minimum absolute/relative intensity for features to be kept. The relative
intensity is determined from the feature with highest intensity (within the same analysis). Set to ‘0’ or |
retentionRange, mzRange, mzDefectRange, chromWidthRange
|
Range of retention time (in seconds), m/z, mass
defect (defined as the decimal part of m/z values) or chromatographic peak width (in seconds), respectively.
Features outside this range will be removed. Should be a numeric vector with length of two containing the min/max
values. The maximum can be |
IMSRangeParams |
(IMS workflow) A |
qualityRange |
Used to filter features by their peak qualities/scores
(see |
negate |
If set to |
i, j
|
For |
... |
For For For sets workflow methods: further arguments passed to the base |
ni |
Optional argument. An expression used for subsetting the analyses. The
analysis information is first subset and the remaining rows are used to determine for
which analyses the results should be kept. The unevaluated |
reorder |
If (sets workflow) If the |
drop |
Ignored. |
name |
The analysis name (partially matched). |
weights |
A named |
flatnessFactor |
Passed to MetaClean as the |
featureQualities |
Specifies which feature qualities to calculate. Can be |
EICParams |
A named |
parallel |
If set to |
MSLevel |
Integer vector with the ms levels (i.e., 1 for MS1 and 2 for MS2) to obtain traces. |
retMin |
Plot retention time in minutes (instead of seconds). |
title |
Character string used for title of the plot. If |
groupBy |
Specifies how results are grouped in the plot. Should be a name of a column in the
analysis information table which is used to make analysis groups (e.g.
|
showLegend |
Plot a legend if TRUE. |
xlim, ylim
|
Sets the plot size limits used by
|
sets |
(sets workflow) For |
set |
(sets workflow) The name of the set. |
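The intensity and mass defect filters described in the arguments above can be illustrated with plain R. This is a minimal sketch on a made-up feature table, not patRoon's implementation; the table contents and the `keep`/`kept` names are assumptions for illustration only.

```r
# Toy feature table for one analysis (values are made up for illustration).
feat <- data.frame(
    mz = c(150.0913, 301.1410, 455.9021),
    intensity = c(1e5, 2e4, 5e2)
)

# Mass defect as defined above: the decimal part of the m/z value.
feat$mzDefect <- feat$mz - floor(feat$mz)

# Relative intensity: scaled to the most intense feature of the same analysis.
feat$relIntensity <- feat$intensity / max(feat$intensity)

# Keep features with relative intensity >= 0.01 and mass defect within 0-0.5.
relMinIntensity <- 0.01
mzDefectRange <- c(0, 0.5)
keep <- feat$relIntensity >= relMinIntensity &
    feat$mzDefect >= mzDefectRange[1] & feat$mzDefect <= mzDefectRange[2]
kept <- feat[keep, ]
```

In this toy data the third feature fails both criteria, so two features remain.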
This class provides a way to store intensity, retention times, m/z and other data for all features in a set of
analyses. The class is virtual and derived objects are created by 'feature finders' such as
findFeaturesOpenMS, findFeaturesXCMS and findFeaturesBruker.
featureTable: A list containing a
data.table for each analysis with feature data
analysisInfo: The analysis information of this features object.
delete returns the object for which the specified data was removed.
calculatePeakQualities returns a modified object amended with peak qualities and scores.
length(features): Obtain total number of features.
show(features): Shows summary information for this object.
featureTable(features): Get table with feature information
analysisInfo(features): Get analysis information
getFeatureQualityNames(features): Returns the present chromatographic peak quality and score names for features.
analysisInfo(features) <- value: Modifies analysis information
analyses(features): returns a character vector with the names of the
analyses for which data is present in this object.
replicates(features): returns a character vector with the names of the
replicates for which data is present in this object.
hasIMS(features): Returns TRUE if the features object has mobility information.
fromIMS(features): Returns TRUE if the features object was directly created from IMS data.
as.data.table(features): Returns all feature data in a table.
filter(features): Performs common rule based filtering of features. Note
that this (and much more) functionality is also provided by the
filter method defined for featureGroups. However,
filtering a features object may be useful to avoid grouping large
amounts of features.
x[i]: Subset on analyses.
x[[i]]: Extract a feature table for an analysis.
x$name: Extract a feature table for an analysis.
delete(features): Completely deletes specified features.
calculatePeakQualities(features): Calculates peak qualities for each feature. Please see the
featureQualities function and MetaClean publication (referenced below) for
more details. For each metric, an additional score is calculated by normalizing all feature values (unless the
quality metric definition has a fixed range) and scaling them from ‘0’ (worst) to ‘1’ (best). Then, a
totalScore for each feature is calculated as the (weighted) sum of all score values.
getTICs(features): Obtain the total ion chromatogram/s (TICs) of the analyses.
getBPCs(features): Obtain the base peak chromatogram/s (BPCs) of the analyses.
plotTICs(features): Plots the TICs of the analyses.
plotBPCs(features): Plots the BPCs of the analyses.
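The score calculation described for calculatePeakQualities can be sketched in base R. The example below is an illustration under stated assumptions: the metric names (`metricA`, `metricB`), their direction (higher or lower is better), the min-max normalization and the weights are all hypothetical and not taken from patRoon or MetaClean.

```r
# Made-up quality metric values for three features (rows).
qualities <- data.frame(
    metricA = c(0.2, 0.6, 1.0),   # assumed: higher is better
    metricB = c(10, 30, 20)       # assumed: lower is better
)

# Min-max normalize to 0 (worst) .. 1 (best); invert when lower is better.
toScore <- function(x, higherIsBetter = TRUE) {
    s <- (x - min(x)) / (max(x) - min(x))
    if (higherIsBetter) s else 1 - s
}

scores <- data.frame(
    metricAScore = toScore(qualities$metricA, TRUE),
    metricBScore = toScore(qualities$metricB, FALSE)
)

# Total score: weighted sum of all score values (weights are hypothetical).
weights <- c(metricAScore = 1, metricBScore = 2)
scores$totalScore <- as.vector(as.matrix(scores[names(weights)]) %*% weights)
```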
features: List of features per analysis file. Use the featureTable method for access.
analysisInfo: A data.table with the analysis information. Use the
analysisInfo method for access.
featureQualityNames: Character vector with the names of the chromatographic peak quality metrics that are present.
hasIMS: A logical that is TRUE if the features object contains mobility/CCS information. Use the
hasIMS method for access.
fromIMS: A logical that is TRUE if the features object was directly created from IMS data
(i.e. direct mobility assignment workflow). Use the fromIMS method for access.
The raw data interface of patRoon is used by calculatePeakQualities and TIC/BPC related functions to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
The featuresSet class is applicable for sets workflows. This class is derived from features and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
sets Returns the set names for this object.
unset Converts the object data for a specified set into a 'non-set' object (featuresUnset), which allows it to be used in 'regular' workflows. The adduct annotations for the selected set (e.g. as passed to
makeSet) are used to convert all feature masses to ionic m/z values.
The following methods are changed or have new functionality:
filter and the subset operator ([) have specific arguments to choose/filter by (feature
presence in) sets. See the sets argument description.
Important: the mzRange, mzDefectRange and IMSRangeParams filters use neutral
feature masses, whereas non-sets workflows use m/z values. Hence, adjust accordingly to avoid (slightly)
different results!
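As a rough illustration of this adjustment, the sketch below converts an m/z range into the corresponding neutral mass range. It assumes singly charged [M+H]+ ions, so that the only difference is the proton mass; other adducts need a different offset. The variable names are hypothetical.

```r
# Assumption: singly charged [M+H]+ ions, so m/z = neutral mass + proton mass.
protonMass <- 1.007276

# m/z range as it would be used in a non-sets workflow
mzRange <- c(200, 400)

# Approximately equivalent neutral mass range for a sets workflow
neutralRange <- mzRange - protonMass
```

The same shift also changes the decimal part of the value, which is why mzDefectRange needs to be reconsidered as well.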
For calculatePeakQualities: sometimes MetaClean may return NA for the Gaussian
Similarity and Symmetry metrics, in which case it will be set to ‘0’.
Rick Helmus <[email protected]> and Ricardo Cunha <[email protected]> (getTICs,
getBPCs, plotTICs and plotBPCs functions)
Chetnik K, Petrick L, Pandey G (2020). “MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.” Metabolomics, 16(11). doi:10.1007/s11306-020-01738-3.
Automatically find features.
findFeatures(analysisInfo, algorithm, ..., verbose = TRUE)
analysisInfo |
A |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Further parameters passed to the selected feature finding algorithms. |
verbose |
If set to |
Several functions exist to collect features (i.e. retention and MS information that represent potential
compounds) from a set of analyses. All 'feature finders' return an object derived from the features
base class. The next step in a general workflow is to group and align these features across analyses with
groupFeatures. Note that some feature finders have a plethora of options which sometimes may have a
large effect on the quality of results. Fine-tuning parameters is therefore important, and the optimum is largely
dependent upon applied analysis methodology and instrumentation.
findFeatures is a generic function that will find features by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as findFeaturesOpenMS and findFeaturesXCMS. While these
functions may be called directly, findFeatures provides a generic interface and is therefore usually preferred.
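A typical call through the generic interface could look like the sketch below. It only uses the signatures documented here; the wrapper name `findAndFilter`, its `anaInfo` and `minIntensity` arguments and the chosen threshold are hypothetical, and running it requires patRoon, OpenMS and suitable centroided input files.

```r
# Sketch only: assumes patRoon is installed and 'anaInfo' is a valid analysis
# information table that points to centroided mzML files.
findAndFilter <- function(anaInfo, minIntensity = 1000) {
    # Generic interface; dispatches to the OpenMS specific feature finder
    fList <- patRoon::findFeatures(anaInfo, algorithm = "openms")
    # Optional rule based pre-filtering before grouping the features
    patRoon::filter(fList, absMinIntensity = minIntensity)
}
```

Defining the wrapper does not touch any data; the feature detection only runs once it is called with an actual analysis information table.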
An object of a class which is derived from features.
In most cases it will be necessary to centroid your MS input files. The only exception is Bruker;
however, you will still need centroided ‘mzXML’/‘mzML’ files for e.g. plotting chromatograms. In
this case the centroided MS files should be stored in the same directory as the raw Bruker ‘.d’
files. The convertMSFiles function can be used to centroid data.
The features output class and its methods and the algorithm specific functions:
findFeaturesBruker, findFeaturesOpenMS, findFeaturesXCMS, findFeaturesXCMS3, findFeaturesEnviPick, findFeaturesSIRIUS, findFeaturesKPIC2, findFeaturesSAFD, findFeaturesPiek
Uses the 'Find Molecular Features' (FMF) algorithm of Bruker DataAnalysis vendor software to find features.
findFeaturesBruker(analysisInfo, doFMF = "auto", startRange = 0, endRange = 0, save = TRUE, close = save, verbose = TRUE)
analysisInfo |
A |
doFMF |
Run the 'Find Molecular Features' algorithm before loading compounds. Valid options are: |
startRange, endRange
|
Start/End retention range (seconds) from which to collect features. A 0 (zero) for
|
close, save
|
If |
verbose |
If set to |
This function uses Bruker to automatically find features. This function is called when calling findFeatures with
algorithm="bruker".
The resulting 'compounds' are transferred from DataAnalysis and stored as features.
This algorithm only works with Bruker data files (.d extension) and requires Bruker DataAnalysis
and the RDCOMClient package to be installed. Furthermore, DataAnalysis combines multiple related masses in a
feature (e.g. isotopes, adducts) but does not report the actual (monoisotopic) mass of the feature.
Therefore, it is simply assumed that the feature mass equals that of the highest intensity mass peak.
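The mass assignment assumption can be illustrated with a few lines of base R. The peak list below is made up and the variable names are hypothetical; it only shows the selection rule, not how data is read from DataAnalysis.

```r
# Made-up mass peaks that would be combined into a single 'compound'
peaks <- data.frame(
    mz = c(301.1410, 302.1444, 323.1229),
    intensity = c(8e4, 1.5e4, 2e4)
)

# The feature mass is taken from the mass peak with the highest intensity
featureMZ <- peaks$mz[which.max(peaks$intensity)]
```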
An object of a class which is derived from features.
If any errors related to DCOM appear it might be necessary to
terminate DataAnalysis (note that DataAnalysis might still be running as a
background process). The ProcessCleaner application installed
with DataAnalysis can be used for this.
findFeatures for more details and other algorithms.
Uses the enviPickwrap function from the enviPick R package to extract features.
findFeaturesEnviPick(analysisInfo, ..., parallel = TRUE, verbose = TRUE)
analysisInfo |
A |
... |
Further parameters passed to |
parallel |
If set to |
verbose |
If set to |
This function uses enviPick to automatically find features. This function is called when calling findFeatures with
algorithm="envipick".
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from features.
The analysis files must be in the mzXML format.
findFeatures for more details and other algorithms.
Uses the KPIC2 R package to extract features.
findFeaturesKPIC2(analysisInfo, kmeans = TRUE, level = 1000, ..., parallel = TRUE, verbose = TRUE)
analysisInfo |
A |
kmeans |
If |
level |
Passed to |
... |
Further parameters passed to |
parallel |
If set to |
verbose |
If set to |
This function uses KPIC2 to automatically find features. This function is called when calling findFeatures with
algorithm="kpic2".
The MS files should be in the mzML or mzXML format.
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from features.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
findFeatures for more details and other algorithms.
Uses the FeatureFinderMetabo TOPP tool (see http://www.openms.de) to find features.
findFeaturesOpenMS(analysisInfo, noiseThrInt = 1000, chromSNR = 3, chromFWHM = 5, mzPPM = defaultLim("mz", "medium_rel"), reEstimateMTSD = TRUE, traceTermCriterion = "sample_rate", traceTermOutliers = 5, minSampleRate = 0.5, minTraceLength = 3, maxTraceLength = -1, widthFiltering = "fixed", minFWHM = 1, maxFWHM = 30, traceSNRFiltering = FALSE, localRTRange = 10, localMZRange = 6.5, isotopeFilteringModel = "metabolites (5% RMS)", MZScoring13C = FALSE, useSmoothedInts = TRUE, extraOpts = NULL, useFFMIntensities = FALSE, verbose = TRUE)
analysisInfo |
A |
noiseThrInt |
Noise intensity threshold. Sets |
chromSNR |
Minimum S/N of a mass trace. Sets |
chromFWHM |
Expected chromatographic peak width (in seconds). Sets |
mzPPM |
Allowed mass deviation (ppm) for trace detection. Sets |
reEstimateMTSD |
If |
traceTermCriterion, traceTermOutliers, minSampleRate
|
Termination criterion for the extension of mass traces. See
FeatureFinderMetabo.
Sets the |
minTraceLength, maxTraceLength
|
Minimum/Maximum length of mass trace (seconds). Set negative value for maxlength
to disable maximum. Sets |
widthFiltering, minFWHM, maxFWHM
|
Enable filtering of unlikely peak widths. See
FeatureFinderMetabo.
Sets |
traceSNRFiltering |
If |
localRTRange, localMZRange
|
Retention/MZ range where to look for coeluting/isotopic mass traces. Sets the
|
isotopeFilteringModel |
Remove/score candidate assemblies based on isotope intensities. See
FeatureFinderMetabo.
Sets the |
MZScoring13C |
Use the 13C isotope as the expected shift for isotope mass traces. See
FeatureFinderMetabo.
Sets |
useSmoothedInts |
If |
extraOpts |
Named |
useFFMIntensities |
If |
verbose |
If set to |
This function uses OpenMS to automatically find features. This function is called when calling findFeatures with
algorithm="openms".
This functionality has been tested with OpenMS version >= 2.0. Please make sure it is installed and
configured, e.g. by installing patRoonExt or configuring the path of the binaries with
the patRoon.path.OpenMS option or the system PATH variable.
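Setting the option could look as follows. The directory shown is a hypothetical example and must be replaced by the actual location of the OpenMS binaries on your system.

```r
# Hypothetical installation directory; adjust to your own system.
options(patRoon.path.OpenMS = "C:/Program Files/OpenMS/bin")

# Verify what is currently configured
getOption("patRoon.path.OpenMS")
```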
The file format of analyses must be ‘mzML’.
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from features.
findFeaturesOpenMS with useFFMIntensities=FALSE uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Note that for caching purposes, the analysis files must always exist on the local host computer, even if it is not participating in computations.
The raw data interface of patRoon is used by findFeaturesOpenMS with useFFMIntensities=FALSE to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
pugixml (via
Rcpp) is used to process OpenMS XML output.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
findFeatures for more details and other algorithms.
Uses the piek algorithm to find features.
findFeaturesPiek(analysisInfo, genEICParams = getPiekEICParams(), peakParams = getDefPeakParams("chrom", "piek"), IMS = FALSE, suspects = NULL, adduct = NULL, assignMethod = "basepeak", assignRTWindow = defaultLim("retention", "very_narrow"), rtWindowDup = defaultLim("retention", "narrow"), mzWindowDup = defaultLim("mz", "medium"), mobWindowDup = defaultLim("mobility", "medium"), minPeakOverlapDup = 0.25, minIntensityIMS = 25, EICBatchSize = Inf, keepDups = FALSE, verbose = TRUE)
getPiekEICParams(..., IMS = getLimIMS())
analysisInfo |
A |
genEICParams |
A |
peakParams |
A |
IMS |
A |
suspects |
The suspect list to be used for suspect pre-filtering of EIC bins. See
suspect screening for details on the suspect list format and NOTE: Suspect matching can only be performed by mobilities and not CCS values. The
|
adduct |
An |
assignMethod |
Should be |
assignRTWindow |
The retention time window (+/- seconds) used for aggregating EIC datapoints to assign feature
m/z and mobility data, using an intensity weighted mean. The maximum window is always bound by the feature
retention time range. Increasing this number may improve accuracy by averaging more points. However, decreasing the
window may reduce inaccuracies due to inclusion of data from closely eluting features (with similar m/z and
mobility) or noisy data from the chromatographic peak extremes. If The assignment window is automatically adjusted for the values set for |
rtWindowDup, mzWindowDup, mobWindowDup
|
The retention time (seconds), m/z and mobility windows used to
identify duplicate (redundant) features detected in multiple EIC bins. These values default to
|
minPeakOverlapDup |
The minimum overlap (fraction between 0 and 1) in retention time between two features to be considered a duplicate. |
minIntensityIMS |
(IMS workflow) Raw intensity threshold for IMS data. This is primarily intended to speed up raw data processing. |
EICBatchSize |
The number of EICs to be processed in a single batch. Decreasing this number will reduce memory
usage, at the cost of speed. Set to |
keepDups |
Set to |
verbose |
If set to |
... |
Any additional parameters to be set in the returned parameter list. These will override the defaults.
See the |
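The duplicate criteria from the arguments above (rtWindowDup, mzWindowDup and minPeakOverlapDup) can be sketched as below. This is not the piek implementation: the helper names, the feature fields and in particular the definition of the overlap fraction (overlap relative to the narrower peak) are assumptions made for illustration.

```r
# Fraction of retention time overlap between two peaks, relative to the
# narrower peak (this definition is an assumption of this sketch).
rtOverlap <- function(a, b) {
    ov <- min(a[2], b[2]) - max(a[1], b[1])
    max(0, ov) / min(diff(a), diff(b))
}

isDuplicate <- function(f1, f2, rtWindowDup, mzWindowDup, minPeakOverlapDup) {
    abs(f1$ret - f2$ret) <= rtWindowDup &&
        abs(f1$mz - f2$mz) <= mzWindowDup &&
        rtOverlap(c(f1$retmin, f1$retmax), c(f2$retmin, f2$retmax)) >= minPeakOverlapDup
}

# Two made-up features that were detected in neighbouring EIC bins
f1 <- list(ret = 120, retmin = 112, retmax = 130, mz = 301.1410, intensity = 8e4)
f2 <- list(ret = 122, retmin = 115, retmax = 131, mz = 301.1413, intensity = 5e4)

dup <- isDuplicate(f1, f2, rtWindowDup = 6, mzWindowDup = 0.002,
                   minPeakOverlapDup = 0.25)

# Of two duplicates, only the most intense feature is kept
kept <- if (dup && f2$intensity > f1$intensity) f2 else f1
```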
This function uses piek to automatically find features. This function is called when calling findFeatures with
algorithm="piek".
The piek algorithm extends and improves on the simple and fast feature detection algorithm introduced
by Dietrich C, Wick A, Ternes TA (2021),
“Open‐source feature detection for non‐target LC–MS analytics”,
Rapid Communications in Mass Spectrometry, 36(2),
ISSN 1097-0231,
doi:10.1002/rcm.9206.
This algorithm first forms extracted ion chromatograms (EICs) and
subsequently performs automatic peak detection to generate features. The piek algorithm introduces the
following improvements and changes:
Support for IMS-HRMS workflows.
The msdata interface is used to efficiently form EICs from the raw data. All the file formats and types can be used that are supported by msdata. This includes IMS data, even if not used for feature detection, which allows the use of IMS data directly in non-IMS or post mobility assignment workflows.
The EIC binning approach can be extended with the mobility dimension to support direct mobility assignment workflows.
The EIC bins can be filtered with suspect or MS2 data to speed up feature detection.
Several filters are available to eliminate EICs that are likely devoid of any signal of interest.
The original peak detection algorithm was further optimized or can be exchanged with others: see
getDefPeakParams for details.
Several filters are available to improve the data and reduce redundancy:
The original redundancy detection, which performs a second feature detection with EIC bins that are shifted
by 50% width and eliminates features with m/z values outside the center of any bin, was extended for IMS
support.
Redundant features across bins are eliminated if they have close retention times, m/z and mobility values and
sufficient chromatographic overlap. The most intense feature is kept.
Data from suspects or MS2 precursors that was used to pre-filter EICs, can also be used to filter the final feature list.
Various small bug fixes and improvements for the original code.
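The shifted-bin redundancy detection mentioned above can be illustrated with a small base R sketch. It is a simplified illustration in the m/z dimension only; the helper names and the assumption that the 'center' is the middle half of a bin are not taken from the piek code.

```r
mzStep <- 0.02

# Lower boundary of the bin that contains 'mz'; 'offset' shifts the bin grid
binStart <- function(mz, offset = 0) floor((mz - offset) / mzStep) * mzStep + offset

# Position of an m/z value within its bin, as a fraction of the bin width
posInBin <- function(mz, offset = 0) (mz - binStart(mz, offset)) / mzStep

# Assumption: the 'center' is the middle half of the bin
inCenter <- function(mz, offset = 0) {
    p <- posInBin(mz, offset)
    p >= 0.25 & p < 0.75
}

mz <- c(300.0105, 300.0195)
regular <- inCenter(mz)                       # regular bin grid
shifted <- inCenter(mz, offset = mzStep / 2)  # grid shifted by 50% bin width
```

With these made-up values, the first m/z lies in the center of a regular bin and the second in the center of a shifted bin, so each feature is retained from exactly one of the two detection rounds.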
The output feature tables contain raw intensities/areas and those subtracted by the estimated noise level
(intensity, intensitySub, area and areaSub columns, respectively) and the estimated
signal to noise (signalToNoise column).
getPiekEICParams returns a list of parameters for the EIC generation, which is used to set the
genEICParams argument to findFeaturesPiek.
If IMS data is used to resolve features (IMS=TRUE), a 'pre-check' is performed to
avoid excessive numbers of two-dimensional bins for EIC formation and peak detection. These EICs are formed by only
considering the m/z dimension, and subsequently filtered by the parameters described in the EIC generation
parameters section. The final EICs for feature detection are then only formed if they have m/z data that was not
removed during the pre-check.
The m/z and mobility data from IMS-HRMS data is typically not or partially centroided. The feature
m/z and mobility values are derived from m/z or mobility versus intensity profiles. The
profiles are generated for each EIC timepoint, and the value at the maximum intensity or intensity weighted mean of
the profile is used to derive the intermediate values (configured by assignMethod). Several parameters exist
to improve the profile data (see next section).
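The two ways to derive an intermediate value from a profile can be sketched as follows. The profile data is made up and the variable names are hypothetical; only the two calculations (value at maximum intensity versus intensity weighted mean) follow the description above.

```r
# Made-up m/z versus intensity profile at a single EIC time point
profile <- data.frame(
    mz = c(301.1400, 301.1405, 301.1410, 301.1415, 301.1420),
    intensity = c(10, 40, 100, 60, 10)
)

# Value at the maximum intensity of the profile
mzBasePeak <- profile$mz[which.max(profile$intensity)]

# Intensity weighted mean of the profile
mzWeighted <- weighted.mean(profile$mz, profile$intensity)
```

For asymmetric profiles such as this one, the weighted mean is shifted slightly towards the heavier side of the profile compared to the value at the maximum.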
The genEICParams argument to findFeaturesPiek configures the
generation of EICs. The getPiekEICParams function should be used to generate the parameter list.
The following general parameters exist:
filter Controls the pre-filtering of EIC bins with m/z data. Should be "none" (no
filtering), "suspects" (filter with suspect data) or "ms2" (filter with data from precursors
detected in a data-dependent MS/MS experiment).
mzRange,mzStep Configures the formation of m/z bins. mzRange is a numeric
vector of length two that specifies the min/max m/z range. mzStep specifies the bin widths.
retRange A numeric vector of length two that specifies the retention time range for the EICs.
Data outside this range is excluded. Set to NULL to use the full range.
gapFactor A numeric that configures gap filling for EICs. See getDefEICParams
for further details.
minEICIntensity The minimum intensity of the highest data point in the EIC. Used to filter EICs.
minEICAdjTime,minEICAdjPoints,minEICAdjIntensity The EIC should have at least a
continuous signal of minEICAdjTime seconds and minEICAdjPoints data points, where the continuity is
defined by data points with an intensity of at least minEICAdjIntensity. Set minEICAdjTime or
minEICAdjPoints to zero to disable continuity checks for time or data points, respectively. Set
minEICAdjIntensity to zero to completely disable continuity checks.
topMostEICMZ Only keep this number of top-most intense EICs. The intensity is derived from the data
point with the highest intensity in the EIC. Set to zero to always select all EICs.
For IMS workflows, this parameter is only used to limit the number of EICs resulting from the 'pre-check' in the m/z dimension.
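The continuity check configured by minEICAdjTime, minEICAdjPoints and minEICAdjIntensity can be sketched in base R with run-length encoding. This is an illustration, not the piek implementation; the function name and the way the duration of a run is measured (first to last data point) are assumptions.

```r
hasContinuousSignal <- function(time, intensity, minEICAdjTime,
                                minEICAdjPoints, minEICAdjIntensity) {
    if (minEICAdjIntensity == 0)
        return(TRUE) # continuity checks are completely disabled

    # Runs of adjacent data points that are above/below the threshold
    runs <- rle(intensity >= minEICAdjIntensity)
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1

    # A zero for minEICAdjPoints or minEICAdjTime always passes its test
    ok <- runs$values &
        runs$lengths >= minEICAdjPoints &
        (time[ends] - time[starts]) >= minEICAdjTime
    any(ok)
}

time <- seq(0, 9) # seconds
intensity <- c(0, 50, 600, 900, 700, 550, 20, 800, 0, 0)

# Longest run above 500: four points spanning three seconds
passes <- hasContinuousSignal(time, intensity, 3, 4, 500)
fails <- hasContinuousSignal(time, intensity, 3, 5, 500)
```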
The following parameters are specifically used for IMS workflows:
filterIMS Similar to the filter parameter, but controls how mobility data is used for pre-filtering of EIC bins.
Different values for filter and filterIMS can be specified:
filter="none" and filterIMS="none"
filter="suspects" and filterIMS="suspects"
filter="suspects" and filterIMS="none" (only use m/z filtering)
filter="ms2" and filterIMS="ms2"
filter="ms2" and filterIMS="none"
Currently only Bruker DDA-PASEF experiments provide the data needed for "ms2" filtering.
mobRange,mobStep Equivalent to mzRange and mzStep, but for ion mobility binning.
sumWindowMZ,sumWindowMob The retention time window (+/- s) used to sum adjacent datapoints
for the determination of intermediate EIC m/z and mobility values. This data is aggregated to determine
the final feature values (see also the assignRTWindow argument). Set to ‘0’ to not sum any adjacent
timepoints. Larger values can generally improve accuracy for noisy data (e.g. from TIMS), but care must be
taken to stay below the expected minimum chromatographic peak width to avoid inclusion of data from other
features. Defaults to defaultLim("retention", "very_narrow") (see limits).
smoothWindowMZ,smoothWindowMob The window size used to perform centered moving average
smoothing on intensity data of the m/z and mobility profiles used to determine intermediate EIC values.
Smoothing of noisy data (e.g. TIMS) is highly recommended to improve accuracy and consistency. Set to
0 to disable smoothing.
smoothExtMZ,smoothExtMob The m/z or mobility window to extend the smoothing at the
edges of the EIC bin. This is recommended to improve smoothing, e.g. when the peak profile is only
partially captured in the bin. Defaults to the bin width, i.e. data from an adjacent bin on each side is
additionally included for smoothing. The final smoothed data is only taken from the actual EIC bin. Set to
0 to disable extension.
saveMZProfiles,saveEIMs Set to TRUE to save the m/z and mobility profiles for
each feature. Only the profiles at the feature retention time are saved. This can be useful for debugging or
parameter optimization, but will increase memory usage and processing times.
topMostEICMob Equivalent to topMostEICMZ, used to reduce the final two-dimensional EIC bins
with m/z and mobility information.
minEICsIMSPreCheck Only perform the m/z pre-check if the number of two-dimensional EIC bins
is at least minEICsIMSPreCheck.
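The effect of centered moving average smoothing on a noisy profile can be illustrated with the sketch below. It assumes that the window counts the data points on each side of the center and that the window is truncated at the profile edges; how smoothWindowMZ and smoothWindowMob are interpreted exactly is not specified here, so treat both choices as assumptions.

```r
# Centered moving average over 'window' points on each side; the window is
# truncated at the edges of the profile (an assumption of this sketch).
smoothCentered <- function(x, window) {
    if (window == 0)
        return(x) # smoothing disabled
    n <- length(x)
    vapply(seq_len(n), function(i) {
        mean(x[max(1, i - window):min(n, i + window)])
    }, numeric(1))
}

# Made-up noisy intensity profile
noisy <- c(0, 10, 2, 30, 100, 40, 90, 20, 3, 8, 0)
smoothed <- smoothCentered(noisy, window = 1)
```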
The following parameters are specifically for when suspect data is used to pre-filter EIC bins:
rtWindow,mzWindow,mobWindow: The retention time, m/z and mobility tolerance
windows for suspect data. These are used for:
Pre-filtering of EIC bins with suspect data, i.e. larger tolerances will lead to more EIC bins
being kept. (only applicable for mzWindow and mobWindow).
Matching the final features to suspect data. rtWindow=Inf can be used to disable retention time
matching.
Defaults to defaultLim("retention", "medium"), defaultLim("mz", "medium") and
defaultLim("mobility", "medium"), see limits.
skipInvalid,prefCalcChemProps,neutralChemProps Controls preparing the suspect list
data. See screenSuspects.
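The role of the tolerance windows when matching features to suspect data can be sketched as follows. The data and the `matchesSuspect` helper are hypothetical and the mobility dimension is left out for brevity; the sketch mainly shows why rtWindow=Inf disables retention time matching.

```r
matchesSuspect <- function(feat, susp, rtWindow, mzWindow) {
    # With rtWindow = Inf the retention time test is always TRUE
    abs(feat$mz - susp$mz) <= mzWindow & abs(feat$ret - susp$rt) <= rtWindow
}

# Made-up features and a single suspect
feat <- data.frame(mz = c(301.1410, 455.9021), ret = c(120, 300))
susp <- list(mz = 301.1412, rt = 200)

strict <- matchesSuspect(feat, susp, rtWindow = 12, mzWindow = 0.002)
noRT <- matchesSuspect(feat, susp, rtWindow = Inf, mzWindow = 0.002)
```

Here the first feature matches the suspect m/z but elutes 80 seconds earlier, so it is only matched when retention time matching is disabled.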
The following parameters are specifically for when MS2 data is used to pre-filter EICs:
rtWindow Eliminates any features without an MS/MS spectrum within this retention time window. Set
rtWindow=Inf to disable this filter. Defaults to defaultLim("retention", "very_narrow") (see
limits).
mzIsoWindow The maximum m/z window considered for MS/MS precursors that were isolated by DDA.
These m/z isolation windows are used to pre-filter EICs and match the final features. Setting
mzIsoWindow to a value lower than typical instrument isolation windows will make feature detection more
specific, as features need to be closer to the triggered DDA precursor m/z values. In contrast, larger
values for mzIsoWindow allow the inclusion of features that were not specifically targeted by DDA, but may
still have MS/MS data as their m/z could still fall within the MS/MS isolation window. The effective
window used will never exceed the instrumental isolation window. Setting mzIsoWindow=Inf will always use
instrumental windows.
NOTE: Sometimes the isolation windows are not exported and cannot be deduced automatically (e.g. Agilent
data). In that case, the mzIsoWindow parameter is used as the isolation window and therefore needs to be
set to a finite value.
mobWindow The mobility tolerance window to match DDA MS/MS precursors in IMS workflows. Used for
pre-filtering EICs and the final features. To match DDA precursor data, the measured mobility range of the
corresponding MS/MS data is used as the mobility window. This window is then adjusted to be at least +/-
mobWindow. Defaults to defaultLim("mobility", "medium") (see limits).
minTIC The minimum total ion current (TIC) signal for an MS/MS spectrum to be considered. Can be
increased to eliminate features with low intensity MS/MS data.
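The rules for the effective m/z isolation window and the mobility window described above can be sketched in base R. Both helpers are hypothetical and only mirror the text: the real code may define the windows differently (e.g. as half-widths), and the symmetric widening around the centre of the measured mobility range is an assumption.

```r
# Hypothetical helpers, not patRoon functions.

# Effective m/z isolation window: never wider than the instrumental window.
# instWindow = NA mimics data where the isolation window was not exported.
effectiveIsoWindow <- function(mzIsoWindow, instWindow = NA_real_) {
    if (is.na(instWindow)) {
        if (!is.finite(mzIsoWindow))
            stop("mzIsoWindow must be finite if the instrumental window is unknown")
        return(mzIsoWindow)
    }
    min(mzIsoWindow, instWindow)
}

# Mobility window: start from the measured mobility range of the MS/MS data and
# widen it (assumed: symmetrically around its centre) to at least +/- mobWindow.
effectiveMobRange <- function(mobMin, mobMax, mobWindow) {
    centre <- (mobMin + mobMax) / 2
    c(min(mobMin, centre - mobWindow), max(mobMax, centre + mobWindow))
}

print(effectiveIsoWindow(1, instWindow = 4))   # narrower than the instrument: used as is
print(effectiveIsoWindow(Inf, instWindow = 4)) # Inf: falls back to the instrumental window
print(effectiveIsoWindow(1.3))                 # no instrumental window: mzIsoWindow is used
print(effectiveMobRange(0.90, 0.92, mobWindow = 0.02))
```

Calling effectiveIsoWindow(Inf) without an instrumental window raises an error in this sketch, reflecting the note that mzIsoWindow must then be finite.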
The raw data interface of patRoon is used by findFeaturesPiek to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
The use of profile m/z HRMS data (not IMS-HRMS) is currently not supported.
findFeatures for more details and other algorithms.
Uses SAFD to obtain features. This functionality is still experimental. Please see the details below.
findFeaturesSAFD(analysisInfo, prefCentroid = FALSE, mzRange = c(0, 400), maxNumbIter = 1000, maxTPeakW = 300, resolution = 30000, minMSW = 0.02, RThreshold = 0.75, minInt = 2000, sigIncThreshold = 5, S2N = 2, minPeakWS = 3, centroidMethod = "RFM", centroidDM = 0.005, verbose = TRUE)
analysisInfo |
A |
prefCentroid |
Set to NOTE: if |
mzRange |
The m/z window to be imported. |
maxNumbIter, maxTPeakW, resolution, minMSW, RThreshold, minInt, sigIncThreshold, S2N, minPeakWS
|
Parameters directly
passed to the |
centroidMethod, centroidDM
|
Passed to the |
verbose |
If set to |
This function uses SAFD to automatically find features. This function is called when calling findFeatures with
algorithm="safd".
The support for SAFD is still experimental, and its interface might change in the future.
In order to use SAFD, please make sure that its Julia packages are installed and you have verified that
everything works, e.g. by running the test data with SAFD.
As of patRoon ‘3.0’, findFeaturesSAFD uses the msdata interface instead of the
MS_Import.jl Julia package to read HRMS data. This means that MS_Import.jl does not need to
be installed, and all file formats supported by msdata are also supported for SAFD feature
detection. This includes IMS-HRMS data. However, in that case the IMS resolved spectra are summed and the IMS dimension
is removed to make the data compatible with SAFD.
The SAFD algorithm was primarily developed to detect features in profile m/z data, but centroided
data is also supported. To use profile data, ensure that the paths are correctly set up in the
analysisInfo. Furthermore, when using profile data you probably also need to specify
centroided data in the analysisInfo, as e.g. generateMSPeakLists currently does not
support profile data. If IMS-HRMS data is used, it is treated as profile data, as this data is typically not, or only
partially, centroided (generateMSPeakLists supports IMS-HRMS data directly).
An object of a class which is derived from features.
findFeaturesSAFD uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Note that for caching purposes, the analyses files must always exist on the local host computer, even if it is not participating in computations.
Samanipour S, O'Brien JW, Reid MJ, Thomas KV (2019). “Self Adjusting Algorithm for the Nontargeted Feature Detection of High Resolution Mass Spectrometry Coupled with Liquid Chromatography Profile Data.” Analytical Chemistry, 91(16), 10800–10807. doi:10.1021/acs.analchem.9b02422.
findFeatures for more details and other algorithms.
Uses SIRIUS to find features.
findFeaturesSIRIUS(analysisInfo, verbose = TRUE)
analysisInfo |
A |
verbose |
If set to |
This function uses SIRIUS to automatically find features. This function is called when calling findFeatures with
algorithm="sirius".
The features are collected by running the lcms-align SIRIUS command for every analysis.
The MS files should be in the ‘mzML’ or ‘mzXML’ format. Furthermore, this algorithm requires the presence of (data-dependent) MS/MS data.
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from features.
findFeaturesSIRIUS uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
Note that for caching purposes, the analyses files must always exist on the local host computer, even if it is not participating in computations.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019). “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods, 16(4), 299–302. doi:10.1038/s41592-019-0344-8.
findFeatures for more details and other algorithms.
Uses the legacy xcmsSet function from the xcms package to find features.
findFeaturesXCMS(analysisInfo, method = "centWave", ..., verbose = TRUE)
analysisInfo |
A |
method |
The method setting used by XCMS peak finding, see |
... |
Further parameters passed to |
verbose |
If set to |
This function uses XCMS to automatically find features. This function is called when calling findFeatures with
algorithm="xcms".
This function uses the legacy interface of xcms. It is recommended to use
findFeaturesXCMS3 instead.
The file format of analyses must be mzML or mzXML.
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from features.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
findFeatures for more details and other algorithms.
Uses the new xcms3 interface from the xcms package to find features.
findFeaturesXCMS3(analysisInfo, param = xcms::CentWaveParam(), ..., verbose = TRUE)
analysisInfo |
A |
param |
The method parameters used by XCMS peak finding, see
|
... |
Further parameters passed to |
verbose |
If set to |
This function uses XCMS3 to automatically find features. This function is called when calling findFeatures with
algorithm="xcms3".
The file format of analyses must be mzML or mzXML.
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from features.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
findFeatures for more details and other algorithms.
Contains data of generated chemical formulae for given feature groups.
## S4 method for signature 'formulas'
annotations(obj, features = FALSE)
## S4 method for signature 'formulas'
analyses(obj)
## S4 method for signature 'formulas'
defaultExclNormScores(obj)
## S4 method for signature 'formulas'
show(object)
## S4 method for signature 'formulas,ANY,ANY'
x[[i, j]]
## S4 method for signature 'formulas'
delete(obj, i = NULL, j = NULL, ...)
## S4 method for signature 'formulas'
as.data.table(x, fGroups = NULL, fragments = FALSE, countElements = NULL, countFragElements = NULL, OM = FALSE, normalizeScores = "none", excludeNormScores = defaultExclNormScores(x), average = FALSE)
## S4 method for signature 'formulas'
annotatedPeakList(obj, index, groupName, analysis = NULL, MSPeakLists, onlyAnnotated = FALSE)
## S4 method for signature 'formulas'
plotSpectrum(obj, index, groupName, analysis = NULL, MSPeakLists, title = NULL, normalized = "multiple", specSimParams = getDefSpecSimParams(), mincex = 0.9, xlim = NULL, ylim = NULL, showLegend = TRUE, ...)
## S4 method for signature 'formulas'
plotScores(obj, index, groupName, analysis = NULL, normalizeScores = "max", excludeNormScores = defaultExclNormScores(obj))
## S4 method for signature 'formulas'
consensus(obj, ..., MSPeakLists, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, rankWeights = 1, labels = NULL)
## S4 method for signature 'formulasSet'
show(object)
## S4 method for signature 'formulasSet'
delete(obj, i, j, ...)
## S4 method for signature 'formulasSet,ANY,missing,missing'
x[i, j, ..., sets = NULL, updateConsensus = FALSE, drop = TRUE]
## S4 method for signature 'formulasSet'
filter(obj, ..., sets = NULL, updateConsensus = FALSE, negate = FALSE)
## S4 method for signature 'formulasSet'
plotSpectrum(obj, index, groupName, analysis = NULL, MSPeakLists, title = NULL, normalized = "multiple", specSimParams = getDefSpecSimParams(), mincex = 0.9, xlim = NULL, ylim = NULL, showLegend = TRUE, perSet = TRUE, mirror = TRUE, ...)
## S4 method for signature 'formulasSet'
annotatedPeakList(obj, index, groupName, analysis = NULL, MSPeakLists, ...)
## S4 method for signature 'formulasSet'
consensus(obj, ..., MSPeakLists, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), absMinAbundance = NULL, relMinAbundance = NULL, uniqueFrom = NULL, uniqueOuter = FALSE, rankWeights = 1, labels = NULL, filterSets = FALSE, setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE)
## S4 method for signature 'formulasSet'
unset(obj, set)
## S4 method for signature 'formulasConsensusSet'
unset(obj, set)
## S4 method for signature 'formulasSIRIUS'
delete(obj, i = NULL, j = NULL, ...)
obj, x, object
|
The |
features |
If |
i, j
|
For Otherwise passed to the |
... |
For For For For sets workflow methods: further arguments passed to the base |
fGroups, fragments, countElements, countFragElements, OM
|
Passed to the
|
normalizeScores |
A |
excludeNormScores |
A
For |
average |
If set to |
index |
The candidate index (row). For |
groupName |
The name of the feature group for which a plot should be made. To compare spectra, two group names can be specified. |
analysis |
The name of the analysis for which a plot should be made. If
|
MSPeakLists |
The |
onlyAnnotated |
Set to |
title |
The title of the plot. If |
normalized |
Controls intensity normalization. Should be |
specSimParams |
A named |
mincex |
The formula annotation labels are automatically scaled. The
|
xlim, ylim
|
Sets the plot size limits used by
|
showLegend |
Set to |
absMinAbundance, relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain formulas that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
rankWeights |
A numeric vector with weights to calculate the mean ranking score for each candidate. The value will be recycled if necessary, hence, the default value of ‘1’ means equal weights for all considered objects. |
labels |
A |
sets |
(sets workflow) A |
updateConsensus |
(sets workflow) If |
drop |
Passed to the |
negate |
Passed to the |
perSet, mirror
|
(sets workflow) If |
filterSets |
(sets workflow) Controls how the algorithm consensus abundance filters are applied. See the |
setThreshold, setThresholdAnn
|
(sets workflow) Thresholds used to create the annotation set consensus. See
|
setAvgSpecificScores |
(sets workflow) If |
set |
(sets workflow) The name of the set. |
formulas objects are obtained with generateFormulas. This class is derived from the
featureAnnotations class, please see its documentation for more methods and other details.
annotations returns a list containing for each feature
group (or feature if features=TRUE) a data.table
with an overview of all generated formulae and other data such as candidate
scoring and MS/MS fragments.
consensus returns a formulas object that is produced by
merging results from multiple formulas objects.
annotations(formulas): Accessor method to obtain generated formulae.
analyses(formulas): returns a character vector with the names of the
analyses for which data is present in this object.
defaultExclNormScores(formulas): Returns default scorings that are excluded from normalization.
show(formulas): Show summary information for this object.
x[[i, j]]: Extracts a formula table, either for a feature group or for features in an analysis.
as.data.table(formulas): Generates a table with all candidate formulae for each feature group and other information such
as element counts.
annotatedPeakList(formulas): Returns an MS/MS peak list annotated with data from a
given candidate formula.
plotSpectrum(formulas): Plots an annotated spectrum for a given candidate formula of a feature or feature group. Two
spectra can be compared by specifying a two-sized vector for the index, groupName and (if desired)
analysis arguments.
plotScores(formulas): Plots a barplot with scoring of a candidate formula.
consensus(formulas): Generates a consensus of results from multiple
objects. In order to rank the consensus candidates, first
each of the candidates is scored based on its original ranking
(the scores are normalized and the highest ranked candidate gets value
‘1’). The (weighted) mean is then calculated for all scorings of each
candidate to derive the final ranking (if an object lacks the candidate its
score will be ‘0’). The original rankings for each object are stored in
the rank columns.
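The rank based scoring used by consensus can be illustrated with a minimal base R sketch. It is not the patRoon implementation: consensusRank is a hypothetical helper, the candidate formulae are made up, and the linear normalization (best rank gets 1, worst gets 1/n) is an assumption, since the text only states that scores are normalized and that the top candidate gets 1.

```r
# Hypothetical helper, not the patRoon implementation. 'rankings' is a list with
# one character vector per object, each ordered from best to worst candidate.
consensusRank <- function(rankings, rankWeights = 1) {
    rankWeights <- rep(rankWeights, length.out = length(rankings)) # recycle weights
    candidates <- unique(unlist(rankings))
    scores <- do.call(cbind, lapply(rankings, function(r) {
        n <- length(r)
        s <- (n - match(candidates, r) + 1) / n # assumed normalization: best = 1
        ifelse(is.na(s), 0, s)                  # candidate absent from this object: 0
    }))
    rownames(scores) <- candidates
    final <- drop(scores %*% rankWeights) / sum(rankWeights) # weighted mean
    sort(final, decreasing = TRUE)
}

ranked <- consensusRank(list(algoA = c("C6H6O", "C5H2O2", "C4H6N2O"),
                             algoB = c("C5H2O2", "C6H6O")))
print(ranked)
```

In this example C5H2O2 ends up first because it is ranked high in both objects, while C4H6N2O is penalized with a score of 0 for the object that lacks it.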
featureFormulas: A list with all generated formulae for each analysis/feature group. Use the
annotations method for access.
setThreshold,setThresholdAnn,setAvgSpecificScores: (sets workflow) A copy of the equally named arguments that were
passed when this object was created by generateFormulas.
origFGNames: (sets workflow) The original (order of) names of the featureGroups object that was used to
create this object.
MS2QuantMeta: (sets workflow) A named list with for each set the metadata from MS2Quant filled in by
predictRespFactors.
Subscripting of formulae for plots generated by
plotSpectrum is based on the chemistry2expression function
from the ReSOLUTION package.
The formulasSet class is applicable for sets workflows. This class is derived from formulas and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset Converts the object data for a specified set into a 'non-set' object (formulasUnset), which allows it to be used in 'regular' workflows. Only the annotation results that are present in the specified set are kept
(based on the set consensus, see below for implications).
The following methods are changed or have new functionality:
filter and the subset operator ([) Can be used to select data that is only present for selected
sets. Depending on the updateConsensus argument, both operate on either the set consensus or the original data (see below for
implications).
annotatedPeakList Returns a combined annotation table with all sets.
plotSpectrum Is able to highlight set specific mass peaks (perSet and mirror arguments).
consensus Creates the algorithm consensus based on the original annotation data (see below for
implications). Then, like the sets workflow method for generateFormulas, a consensus is made for all
sets, which can be controlled with the setThreshold and setThresholdAnn arguments. The candidate
coverage among the different algorithms is calculated for each set (e.g. coverage-positive column)
and for all sets (coverage column), which is based on the presence of a candidate in all the algorithms from
all sets data. The consensus method for sets workflow data supports the filterSets argument. This
controls how the algorithm consensus abundance filters (absMinAbundance/relMinAbundance) are applied:
if filterSets=TRUE then the minimum of all coverage set specific columns is used to obtain the
algorithm abundance. Otherwise the overall coverage column is used. For instance, consider a consensus
object to be generated from two objects generated by different algorithms (e.g. SIRIUS and
GenForm), which both have a positive and negative set. Then, if a candidate occurs with both
algorithms for the positive mode set, but only with the first algorithm in the negative mode set,
relMinAbundance=1 will remove the candidate if filterSets=TRUE (because the minimum relative
algorithm abundance is ‘0.5’), while filterSets=FALSE will not remove the candidate (because based on
all sets data the candidate occurs in both algorithms).
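The filterSets example above can be reproduced as a small worked calculation in base R. The presence matrix and variable names are hypothetical and only encode the scenario from the text (two algorithms, a positive and a negative set).

```r
# presence[algorithm, set]: is the candidate reported by that algorithm in that set?
# (matrix() fills column by column)
presence <- matrix(c(TRUE, TRUE,    # positive set: SIRIUS, GenForm
                     TRUE, FALSE),  # negative set: SIRIUS, GenForm
                   nrow = 2,
                   dimnames = list(c("SIRIUS", "GenForm"), c("positive", "negative")))

coverageSets <- colMeans(presence)         # set specific coverage per set
coverageAll <- mean(rowSums(presence) > 0) # overall coverage: present in any set per algorithm

relMinAbundance <- 1
keepIfFilterSets <- min(coverageSets) >= relMinAbundance # filterSets=TRUE behaviour
keepOtherwise <- coverageAll >= relMinAbundance          # filterSets=FALSE behaviour
print(coverageSets)
print(c(filterSetsTRUE = keepIfFilterSets, filterSetsFALSE = keepOtherwise))
```

The minimum set specific coverage is 0.5 (negative set), so the candidate fails the filter with filterSets=TRUE, whereas the overall coverage of 1 lets it pass with filterSets=FALSE.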
Two types of annotation data are stored in a formulasSet object:
Annotations that are produced from a consensus between set results (see generateFormulas).
The 'original' annotation data per set, prior to when the set consensus was made. This includes candidates
that were filtered out because of the thresholds set by setThreshold and setThresholdAnn. However,
when filter or subsetting ([) operations are performed, the original data is also updated.
In most cases the first data is used. However, in a few cases the original annotation data is used (as indicated
above), for instance, to re-create the set consensus. It is important to realize that the original annotation data
may have additional candidates, and a newly created set consensus may therefore have 'new' candidates. For
instance, when the object consists of the sets "positive" and "negative" and setThreshold=1
was used to create it, then formulas[, sets = "positive", updateConsensus = TRUE] may now have additional
candidates, i.e. those that were not present in the "negative" set and were previously removed due to
the consensus threshold filter.
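Why re-creating the set consensus may bring back candidates can be shown with a toy example in base R. The helper setConsensus and the candidate data are hypothetical, and the threshold is simplified to the fraction of sets in which a candidate occurs.

```r
# Original (per set) candidates of one feature group; hypothetical data.
original <- list(positive = c("C6H6O", "C5H2O2"), negative = "C6H6O")

# Hypothetical helper: keep candidates that occur in at least a fraction
# 'setThreshold' of the considered sets.
setConsensus <- function(orig, setThreshold) {
    tab <- table(unlist(lapply(orig, unique)))
    names(tab)[as.vector(tab) / length(orig) >= setThreshold]
}

both <- setConsensus(original, setThreshold = 1)                # consensus over both sets
posOnly <- setConsensus(original["positive"], setThreshold = 1) # re-created for one set
print(both)
print(posOnly)
```

With both sets, only the candidate shared by the positive and negative set survives a threshold of 1. After subsetting to the positive set and re-creating the consensus from the original data, the candidate that was unique to the positive set reappears.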
The featureAnnotations base class for more relevant methods and
generateFormulas.
Returns a data.frame with information on which scoring terms are used and what their algorithm specific name
is.
formulaScorings()
generateFormulas
This class is derived from formulas and contains additional specific SIRIUS data.
Objects from this class are generated by generateFormulasSIRIUS.
fingerprints: A list with for each feature group result a data.table containing fingerprints
obtained with CSI:FingerID. Will be empty unless the getFingerprints argument to
generateFormulasSIRIUS was set to TRUE.
MS2QuantMeta: Metadata from MS2Quant filled in by predictRespFactors.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
formulas and generateFormulasSIRIUS
Functionality to automatically group related feature groups (e.g. isotopes, adducts and homologues) to assist and simplify annotation.
generateComponents(fGroups, algorithm, ...)
## S4 method for signature 'featureGroups'
generateComponents(fGroups, algorithm, ...)
fGroups |
|
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected component generation algorithm. |
Several algorithms are provided to group feature groups that are related in some (chemical) way to each other. How feature groups are related depends on the algorithm: examples include adducts, statistics and parents/transformation products. The linking of this data is generally useful for annotation purposes and reducing data complexity.
generateComponents is a generic function that will generate components by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateComponentsRAMClustR and generateComponentsNontarget. While these
functions may be called directly, generateComponents provides a generic interface and is therefore usually preferred.
A components (derived) object containing all generated components.
In a sets workflow the componentization data is generated differently
depending on the used algorithm. Please see the details in the algorithm specific functions linked in the See Also section.
The components output class and its methods and the algorithm specific functions:
generateComponentsRAMClustR, generateComponentsCAMERA, generateComponentsNontarget, generateComponentsIntClust, generateComponentsOpenMS, generateComponentsCliqueMS, generateComponentsSpecClust, generateComponentsTPs
Interfaces with CAMERA to generate components from known adducts, isotopes and in-source fragments.
generateComponentsCAMERA(fGroups, ...)
## S4 method for signature 'featureGroups'
generateComponentsCAMERA(fGroups, ionization = NULL, onlyIsotopes = FALSE, minSize = 2, relMinReplicates = 0.5, extraOpts = NULL)
## S4 method for signature 'featureGroupsSet'
generateComponentsCAMERA(fGroups, ionization = NULL, ...)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
onlyIsotopes |
Logical value. If |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. |
relMinReplicates |
Feature groups within a component are only kept when they contain data for at least this (relative) amount of replicate analyses. For instance, ‘0.5’ means that at least half of the replicates should contain data for a particular feature group in a component. In this calculation replicates that are fully absent within a component are not taken into account. See note below. |
extraOpts |
Named character vector with extra arguments directly passed to
|
This function uses CAMERA to generate components. This function is called when calling generateComponents with
algorithm="camera".
The specified featureGroups object is automatically converted to an xcmsSet object
using getXCMSSet.
A components (derived) object containing all generated components.
The componentization algorithm is not aware of the IMS dimension. For this reason, no
IMS feature groups will be considered for componentization, and direct IMS workflows (see
assignMobilities) are currently not supported.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1" becomes "CMP1-positive").
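The combination and renaming of per-set components can be pictured with plain lists in base R. The component contents and feature group names are hypothetical, and a real componentsSet object stores more than this sketch shows.

```r
# Hypothetical per-set results: component name -> feature group names
compsPerSet <- list(
    positive = list(CMP1 = c("M120_R268", "M121_R268"), CMP2 = c("M137_R249", "M159_R249")),
    negative = list(CMP1 = c("M118_R268", "M119_R268"))
)

# Append the set name to each component name and combine everything in one list.
# The components themselves are kept separate, i.e. nothing is merged.
combined <- unlist(lapply(names(compsPerSet), function(s) {
    setNames(compsPerSet[[s]], paste0(names(compsPerSet[[s]]), "-", s))
}), recursive = FALSE)
print(names(combined))
```

The set suffix keeps component names unique, so "CMP1" from the positive and negative sets can coexist in the combined result.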
The default values for minSize and relMinReplicates result in
extra filtering, hence, the final results may be different than what the algorithm normally would return.
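A rough idea of how the relMinReplicates and minSize filters might act on a single component is sketched below in base R. This is an interpretation of the argument descriptions, not the actual code: filterComponent is a hypothetical helper, each column is assumed to represent one replicate, and the data are made up.

```r
# Hypothetical helper, not a patRoon function. 'present' is a logical matrix of
# feature groups (rows) x replicates (columns): TRUE if the replicate contains
# data for the feature group within this component.
filterComponent <- function(present, relMinReplicates = 0.5, minSize = 2) {
    considered <- colSums(present) > 0 # skip replicates fully absent in the component
    if (!any(considered))
        return(character(0))
    keep <- rowMeans(present[, considered, drop = FALSE]) >= relMinReplicates
    if (sum(keep) < minSize)
        return(character(0))           # component too small: removed completely
    rownames(present)[keep]
}

present <- rbind(fg1 = c(TRUE, TRUE, TRUE, FALSE),
                 fg2 = c(TRUE, FALSE, FALSE, FALSE),
                 fg3 = c(TRUE, TRUE, FALSE, FALSE))
kept <- filterComponent(present)
print(kept)
```

Here the fourth replicate is ignored because it holds no data for the component, so fg2 is judged on one out of three replicates and dropped, while fg1 and fg3 remain.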
Kuhl C, Tautenhahn R, Boettcher C, Larson TR, Neumann S (2012). “CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets.” Analytical Chemistry, 84, 283–289. http://pubs.acs.org/doi/abs/10.1021/ac202450g.
generateComponents for more details and other algorithms.
Uses cliqueMS to generate components using the
cliqueMS::getCliques function.
generateComponentsCliqueMS(fGroups, ...)
## S4 method for signature 'featureGroups'
generateComponentsCliqueMS(fGroups, ionization = NULL, maxCharge = 1, maxGrade = 2, ppm = 10, adductInfo = NULL, absMzDev = defaultLim("mz", "medium"), minSize = 2, relMinAdductAbundance = 0.75, adductConflictsUsePref = TRUE, NMConflicts = c("preferential", "mostAbundant", "mostIntense"), prefAdducts = c("[M+H]+", "[M-H]-"), extraOptsCli = NULL, extraOptsIso = NULL, extraOptsAnn = NULL, parallel = TRUE)
## S4 method for signature 'featureGroupsSet'
generateComponentsCliqueMS(fGroups, ionization = NULL, ...)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
maxCharge, maxGrade, ppm
|
Arguments passed to |
adductInfo |
Sets the |
absMzDev |
Maximum absolute m/z deviation. |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. |
relMinAdductAbundance |
The minimum relative abundance (‘0-1’) that an adduct should be assigned to
features within the same feature group. See the |
adductConflictsUsePref |
If set to |
NMConflicts |
The strategies to employ when not all neutral masses within a component are equal. Valid options
are: |
prefAdducts |
A |
extraOptsCli, extraOptsIso, extraOptsAnn
|
Named |
parallel |
If set to |
This function uses cliqueMS to generate components. This function is called when calling generateComponents with
algorithm="cliquems".
The grouping of features in each component ('clique') is based on high similarity of chromatographic elution
profiles. All features in each component are then annotated with the
cliqueMS::getIsotopes and
cliqueMS::getAnnotation functions.
A componentsFeatures derived object.
The returned components are based on so called feature components. Unlike other algorithms, components are first made on a feature level (per analysis), instead of for complete feature groups. In the final step the feature components are converted to 'regular' components by employing a consensus approach with the following steps:
1. If an adduct assigned to a feature only occurs as a minority compared to other adduct assignments within the same feature group, it is considered an outlier and removed (controlled by the relMinAdductAbundance argument).
2. For features within a feature group, an adduct assignment is only kept if it is the most frequent or is preferential (controlled by the adductConflictsUsePref and prefAdducts arguments).
3. Components are made by combining the feature groups for which at least one of their features is jointly present in the same feature component.
4. Conflicts of neutral mass assignments within a component (i.e. not all are the same) are resolved. First, all feature groups with an unknown neutral mass are split off into another component. Then, if conflicts still occur, the feature groups with similar neutral mass (determined by the absMzDev argument) are grouped. Depending on the NMConflicts argument, the group with one or more preferential adduct(s), or the group that is the largest or most intense, is selected, whereas the others are removed from the component. If multiple groups contain preferential adducts, and more than one preferential adduct is available, the group with the adduct that matches first in prefAdducts 'wins'. In case of ties, one of the next strategies in NMConflicts is tried.
5. If a feature group occurs in multiple components it is removed completely.
6. The minSize filter is applied.
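The first two consensus steps can be illustrated with a minimal base R sketch. This is not patRoon's implementation: the helper consensusAdduct and the example data are hypothetical, and the sketch assumes that relative abundances are calculated over the features that actually received an adduct assignment.

```r
# Hypothetical adduct assignments for the features of one feature group,
# one entry per analysis (NA = no adduct assigned in that analysis).
featAdducts <- c("[M+H]+", "[M+H]+", "[M+Na]+", "[M+H]+", NA)

# Illustrative helper (not part of patRoon)
consensusAdduct <- function(adducts, relMinAdductAbundance = 0.75,
                            adductConflictsUsePref = TRUE,
                            prefAdducts = c("[M+H]+", "[M-H]-")) {
    adducts <- adducts[!is.na(adducts)]
    if (length(adducts) == 0)
        return(NA_character_)

    # Step 1: drop adducts that only occur as a minority (assumed definition)
    relAbund <- c(table(adducts)) / length(adducts)
    kept <- names(relAbund)[relAbund >= relMinAdductAbundance]
    if (length(kept) == 0)
        return(NA_character_)

    # Step 2: keep a preferential adduct if allowed, else the most frequent one
    if (adductConflictsUsePref) {
        pref <- prefAdducts[prefAdducts %in% kept]
        if (length(pref) > 0)
            return(pref[1])
    }
    kept[which.max(relAbund[kept])]
}

consensusAdduct(featAdducts)
```

With the default threshold of 0.75 at most one adduct can survive the first step, so the preference rules only matter when relMinAdductAbundance is lowered.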
The componentization algorithm is not aware of the IMS dimension. For this reason, no
IMS feature groups will be considered for componentization, and direct IMS workflows (see
assignMobilities) are currently not supported.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1" becomes "CMP1-positive").
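The renaming scheme can be sketched in base R as follows. The list setComponents is a hypothetical stand-in for per-set results and does not reflect the internal structure of a componentsSet object.

```r
# Hypothetical component names obtained for each set
setComponents <- list(
    positive = c("CMP1", "CMP2"),
    negative = c("CMP1")
)

# Append the set name so that names stay unique after combining
combined <- unlist(lapply(names(setComponents), function(set) {
    paste0(setComponents[[set]], "-", set)
}))

print(combined)
```

Because the set name is appended, equally named components from different sets remain distinguishable and are never merged.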
Senan O, Aguilar-Mogas A, Navarro M, Capellades J, Noon L, Burks D, Yanes O, Guimera R, Sales-Pardo M (2019). “CliqueMS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network.” Bioinformatics, 35(20), 4089–4097. doi:10.1093/bioinformatics/btz207.
generateComponents for more details and other algorithms.
Generates components based on intensity profiles of feature groups.
generateComponentsIntClust(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsIntClust( fGroups, method = "complete", metric = "euclidean", normalized = TRUE, average = TRUE, maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1, IMS = "maybe" )
fGroups |
|
... |
Any parameters to be passed to the selected component generation algorithm. |
method |
Clustering method that should be applied (passed to
|
metric |
Distance metric used to calculate the distance matrix (passed to |
normalized, average
|
Passed to |
maxTreeHeight, deepSplit, minModuleSize
|
Arguments used by
|
IMS |
(IMS workflow) Specifies which feature groups are considered for componentization in IMS workflows. The following options are valid:
|
This function uses hierarchical clustering of intensity profiles to generate components. This function is called when calling generateComponents with
algorithm="intclust".
Hierarchical clustering is performed on normalized (and optionally replicate averaged) intensity data and
the resulting dendrogram is automatically cut with cutreeDynamicTree. The distance matrix is
calculated with daisy and clustering is performed with
fastcluster::hclust. The clustering of the resulting components can be further
visualized and modified using the methods defined for componentsIntClust.
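The overall idea can be sketched with base R only. Note the substitutions: stats::dist, stats::hclust and cutree are used here as stand-ins for daisy, fastcluster::hclust and cutreeDynamicTree, and the intensity matrix, replicate labels and normalization by row maximum are assumptions made for illustration.

```r
# Hypothetical intensities: rows are feature groups, columns are analyses
ints <- rbind(
    fg1 = c(100, 120, 800, 820),
    fg2 = c(10, 12, 80, 78),
    fg3 = c(900, 880, 100, 110),
    fg4 = c(90, 95, 12, 10)
)
reps <- c("low", "low", "high", "high") # replicate of each analysis

# Average the analyses of each replicate
avgInts <- sapply(split(seq_len(ncol(ints)), reps), function(i) {
    rowMeans(ints[, i, drop = FALSE])
})

# Normalize each feature group to its maximum intensity (assumed scheme)
normInts <- avgInts / apply(avgInts, 1, max)

# Distance matrix, hierarchical clustering and a fixed cut in two clusters
d <- dist(normInts, method = "euclidean")
hc <- hclust(d, method = "complete")
clusters <- cutree(hc, k = 2)

# Each cluster corresponds to one component
components <- split(names(clusters), clusters)
print(components)
```

Feature groups with a similar trend end up together regardless of their absolute intensity, which is why normalization precedes the distance calculation.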
The components are stored in objects derived from componentsIntClust.
In IMS workflows with post mobility assignment (see
assignMobilities) it may be necessary to call expandForIMS when
componentization was performed prior to mobility assignments, see its documentation for more details.
If mobilities were already assigned prior to componentization, then the IMS argument selects which feature
groups are subjected to componentization. Data for IMS feature groups that were not considered (i.e.
when IMS is FALSE or "maybe"), will be expanded similarly as is done by
expandForIMS.
In a sets workflow normalization of feature intensities occurs per set.
Müllner D (2013). “fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python.” Journal of Statistical Software, 53(9), 1–18. doi:10.18637/jss.v053.i09.
Schollee JE, Bourgin M, von Gunten U, McArdell CS, Hollender J (2018). “Non-target screening to trace ozonation transformation products in a wastewater treatment train including different post-treatments.” Water Research, 142, 267–278. doi:10.1016/j.watres.2018.05.045.
generateComponents for more details and other algorithms.
Uses the nontarget R package to generate components by unsupervised detection of homologous series.
generateComponentsNontarget(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsNontarget( fGroups, ionization = NULL, rtRange = c(-120, 120), mzRange = c(5, 120), elements = c("C", "H", "O"), rtDev = defaultLim("retention", "wide"), absMzDev = defaultLim("mz", "narrow"), absMzDevLink = defaultLim("mz", "medium"), traceHack = all(R.Version()[c("major", "minor")] >= c(3, 4)), ... ) ## S4 method for signature 'featureGroupsSet' generateComponentsNontarget(fGroups, ionization = NULL, ...)
fGroups |
|
... |
Any further arguments passed to |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
rtRange |
A numeric vector containing the minimum and maximum retention time (in seconds) between homologues.
Series are always considered from low to high m/z, thus, a negative minimum retention time allows detection
of homologous series with increasing m/z and decreasing retention times. These values set the |
mzRange |
A numeric vector specifying the minimum and maximum m/z increment of a homologous series. Sets
the |
elements |
A character vector with elements to be considered for detection of repeating units. Sets the
|
rtDev |
Maximum retention time deviation. Sets the |
absMzDev |
Maximum absolute m/z deviation. Sets the |
absMzDevLink |
Maximum absolute m/z deviation when linking series. This should usually be a bit higher
than |
traceHack |
Currently |
This function uses nontarget to generate components. This function is called when calling generateComponents with
algorithm="nontarget".
In the first step the homol.search function is used to detect all homologous series
within each replicate (analyses within each replicate are averaged prior to detection). Then,
homologous series across replicates are merged in case of full overlap or when merging of partial overlapping
series causes no conflicts.
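To give an impression of what a homologous series looks like in the data, the base R sketch below links features that differ by one repeating unit within the retention time limits. It is a strong simplification and not the homol.search algorithm: the repeating unit (CH2), the tolerance value and the feature data are assumptions chosen for illustration.

```r
# Hypothetical features (m/z and retention time in seconds)
mz <- c(200.1000, 214.1157, 228.1313, 250.0000)
rt <- c(300, 340, 385, 200)

unitMz   <- 14.01565     # monoisotopic mass of a CH2 unit
absMzDev <- 0.002        # assumed absolute m/z tolerance
rtRange  <- c(-120, 120) # allowed retention time difference between homologues

# Pairwise differences: element [i, j] is the value of feature j minus feature i
mzDiff <- outer(mz, mz, function(a, b) b - a)
rtDiff <- outer(rt, rt, function(a, b) b - a)

# Two features are linked if they differ by one unit within the tolerances
isLink <- abs(mzDiff - unitMz) <= absMzDev &
    rtDiff >= rtRange[1] & rtDiff <= rtRange[2]
links <- which(isLink, arr.ind = TRUE)
colnames(links) <- c("from", "to")
print(links)
```

Here features 1, 2 and 3 form a chain (1 to 2 and 2 to 3), whereas feature 4 does not fit the m/z increment and stays unlinked.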
The generated components are returned as an object from the componentsNT class.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsNTSet object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1" becomes "CMP1-positive").
The output class supports additional methods such as plotGraph.
The componentization algorithm is not aware of the IMS dimension. For this reason, no
IMS feature groups will be considered for componentization, and direct IMS workflows (see
assignMobilities) are currently not supported.
Loos M, Singer H (2017).
“Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data.”
Journal of Cheminformatics, 9(1).
doi:10.1186/s13321-017-0197-z.
Loos M, Gerber C, Corona F, Hollender J, Singer H (2015).
“Accelerated Isotope Fine Structure Calculation Using Pruned Transition Trees.”
Analytical Chemistry, 87(11), 5738-5744.
https://pubs.acs.org/doi/abs/10.1021/acs.analchem.5b00941.
generateComponents for more details and other algorithms.
Uses the MetaboliteAdductDecharger utility (see http://www.openms.de) to generate components.
generateComponentsOpenMS(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsOpenMS( fGroups, ionization = NULL, chargeMin = 1, chargeMax = 1, chargeSpan = 3, qTry = "heuristic", potentialAdducts = NULL, minRTOverlap = 0.66, retWindow = defaultLim("retention", "very_narrow"), absMzDev = defaultLim("mz", "medium"), minSize = 2, relMinAdductAbundance = 0.75, adductConflictsUsePref = TRUE, NMConflicts = c("preferential", "mostAbundant", "mostIntense"), prefAdducts = c("[M+H]+", "[M-H]-"), extraOpts = NULL ) ## S4 method for signature 'featureGroupsSet' generateComponentsOpenMS( fGroups, ionization = NULL, chargeMin = 1, chargeMax = 1, chargeSpan = 3, qTry = "heuristic", potentialAdducts = NULL, ... )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
chargeMin, chargeMax
|
The minimum/maximum charge to consider. Corresponds to the
|
chargeSpan |
The maximum charge span for a single analyte. Corresponds to
|
qTry |
Sets how charges are determined. Corresponds to |
potentialAdducts |
The adducts to consider. Should be a (sets workflow) Should be a |
minRTOverlap, retWindow
|
Sets feature retention tolerances when grouping features. Sets the
|
absMzDev |
Maximum absolute m/z deviation. Sets the |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. |
relMinAdductAbundance |
The minimum relative abundance (‘0-1’) that an adduct should be assigned to
features within the same feature group. See the |
adductConflictsUsePref |
If set to |
NMConflicts |
The strategies to employ when not all neutral masses within a component are equal. Valid options
are: |
prefAdducts |
A |
extraOpts |
Named character vector with extra command line parameters directly passed to
|
This function uses OpenMS to generate components. This function is called when calling generateComponents with
algorithm="openms".
Features that show highly similar chromatographic elution profiles are grouped, and subsequently annotated with their adducts.
A componentsFeatures derived object.
The returned components are based on so called feature components. Unlike other algorithms, components are first made on a feature level (per analysis), instead of for complete feature groups. In the final step the feature components are converted to 'regular' components by employing a consensus approach with the following steps:
1. If an adduct assigned to a feature only occurs as a minority compared to other adduct assignments within the same feature group, it is considered an outlier and removed (controlled by the relMinAdductAbundance argument).
2. For features within a feature group, an adduct assignment is only kept if it is the most frequent or is preferential (controlled by the adductConflictsUsePref and prefAdducts arguments).
3. Components are made by combining the feature groups for which at least one of their features is jointly present in the same feature component.
4. Conflicts of neutral mass assignments within a component (i.e. not all are the same) are resolved. First, all feature groups with an unknown neutral mass are split off into another component. Then, if conflicts still occur, the feature groups with similar neutral mass (determined by the absMzDev argument) are grouped. Depending on the NMConflicts argument, the group with one or more preferential adduct(s), or the group that is the largest or most intense, is selected, whereas the others are removed from the component. If multiple groups contain preferential adducts, and more than one preferential adduct is available, the group with the adduct that matches first in prefAdducts 'wins'. In case of ties, one of the next strategies in NMConflicts is tried.
5. If a feature group occurs in multiple components it is removed completely.
6. The minSize filter is applied.
The componentization algorithm is not aware of the IMS dimension. For this reason, no
IMS feature groups will be considered for componentization, and direct IMS workflows (see
assignMobilities) are currently not supported.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1" becomes "CMP1-positive").
generateComponentsOpenMS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
Bielow C, Ruzek S, Huber CG, Reinert K (2010). “Optimal Decharging and Clustering of Charge Ladders Generated in ESI-MS.” Journal of Proteome Research, 9(5), 2688–2695. doi:10.1021/pr100177k.
generateComponents for more details and other algorithms.
Uses RAMClustR to generate components from feature groups which follow similar chromatographic retention profiles and annotate their relationships (e.g. adducts and isotopes).
generateComponentsRAMClustR(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsRAMClustR( fGroups, ionization = NULL, st = NULL, sr = NULL, maxt = 12, hmax = 0.3, normalize = "TIC", absMzDev = defaultLim("mz", "narrow"), relMzDev = defaultLim("mz", "narrow_rel"), minSize = 2, relMinReplicates = 0.5, RCExperimentVals = list(design = list(platform = "LC-MS"), instrument = list(ionization = ionization, MSlevs = 1)), extraOptsRC = NULL, extraOptsFM = NULL ) ## S4 method for signature 'featureGroupsSet' generateComponentsRAMClustR(fGroups, ionization = NULL, ...)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
ionization |
Which ionization polarity was used to generate the data: should be (sets workflow) This parameter is not supported for sets workflows, as the ionization will always be detected automatically. |
st, sr, maxt, hmax, normalize
|
Arguments to tune the behaviour of feature group clustering. See their documentation
from |
absMzDev |
Maximum absolute m/z deviation. Sets the |
relMzDev |
Maximum relative mass deviation (ppm). Sets the |
minSize |
The minimum size of a component. Smaller components than this size will be removed. See note below. Sets the |
relMinReplicates |
Feature groups within a component are only kept when they contain data for at least this (relative) amount of replicate analyses. For instance, ‘0.5’ means that at least half of the replicates should contain data for a particular feature group in a component. In this calculation replicates that are fully absent within a component are not taken in to account. See note below. |
RCExperimentVals |
A named |
extraOptsRC, extraOptsFM
|
Named |
This function uses RAMClustR to generate components. This function is called when calling generateComponents with
algorithm="ramclustr".
This method uses the ramclustR functions for generating the components, whereas
do.findmain is used for annotation.
A components (derived) object containing all generated components.
The componentization algorithm is not aware of the IMS dimension. For this reason, no
IMS feature groups will be considered for componentization, and direct IMS workflows (see
assignMobilities) are currently not supported.
In a sets workflow the componentization is first performed for each
set independently. The resulting components are then all combined in a componentsSet object. Note that
the components themselves are never merged. The components are renamed to include the set name from which they were
generated (e.g. "CMP1" becomes "CMP1-positive").
The default value for relMinReplicates results in
extra filtering, hence, the final results may be different than what the algorithm normally would return.
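The effect of relMinReplicates can be illustrated with a small base R sketch. The presence matrix is hypothetical, and the calculation follows the argument description (replicates that are fully absent within the component are ignored) rather than patRoon's actual code.

```r
# Hypothetical presence of three feature groups of one component per replicate
present <- matrix(
    c(TRUE,  TRUE,  FALSE, FALSE,
      TRUE,  FALSE, FALSE, FALSE,
      TRUE,  TRUE,  TRUE,  FALSE),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("fgA", "fgB", "fgC"), c("rep1", "rep2", "rep3", "blank"))
)
relMinReplicates <- 0.5

# Replicates without any data for this component are not taken into account
usedReps <- colSums(present) > 0

# Fraction of the remaining replicates in which each feature group is present
coverage <- rowMeans(present[, usedReps, drop = FALSE])

kept <- names(coverage)[coverage >= relMinReplicates]
print(kept)
```

In this example fgB is present in only one of the three relevant replicates and is therefore removed. Setting relMinReplicates to 0 would disable this extra filtering.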
Broeckling CD, Heuberger AL, Prince JA, Ingelsson E, Prenni JE (2013).
“Assigning precursor-product ion relationships in indiscriminant MS/MS data from non-targeted metabolite profiling studies.”
Metabolomics, 9, 33–43.
Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, Prenni JE (2014).
“RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data.”
Analytical Chemistry, 86(14), 6812–6817.
generateComponents for more details and other algorithms.
Generates components based on MS/MS similarity between feature groups.
generateComponentsSpecClust(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsSpecClust( fGroups, MSPeakLists, method = "complete", specSimParams = getDefSpecSimParams(), maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1, IMS = "maybe" )
fGroups |
|
... |
Any parameters to be passed to the selected component generation algorithm. |
MSPeakLists |
The |
method |
Clustering method that should be applied (passed to
|
specSimParams |
A named |
maxTreeHeight, deepSplit, minModuleSize
|
Arguments used by
|
IMS |
(IMS workflow) Specifies which feature groups are considered for componentization in IMS workflows. The following options are valid:
|
This function uses hierarchical clustering of MS/MS spectra to generate components. This function is called when calling generateComponents with
algorithm="specclust".
The similarities are converted to a distance matrix and used as input for hierarchical clustering, and the
resulting dendrogram is automatically cut with cutreeDynamicTree. The clustering is performed with
fastcluster::hclust.
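A minimal base R sketch of this procedure is given below. It assumes that similarities are converted to distances as one minus the similarity, and it uses stats::hclust with a fixed cut height instead of fastcluster::hclust and cutreeDynamicTree. The similarity values are made up.

```r
# Hypothetical pairwise MS/MS similarities (0-1) between three feature groups
fgs <- c("fg1", "fg2", "fg3")
sims <- matrix(
    c(1.0, 0.9, 0.1,
      0.9, 1.0, 0.2,
      0.1, 0.2, 1.0),
    nrow = 3, byrow = TRUE, dimnames = list(fgs, fgs)
)

# Assumed conversion: high similarity corresponds to a small distance
d <- as.dist(1 - sims)

hc <- hclust(d, method = "complete")

# Cut the dendrogram at a fixed height (stand-in for dynamic tree cutting)
clusters <- cutree(hc, h = 0.5)
print(clusters)
```

The two feature groups with highly similar spectra (fg1 and fg2) end up in the same cluster, whereas fg3 forms its own cluster.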
The components are stored in objects derived from componentsSpecClust.
In IMS workflows with post mobility assignment (see
assignMobilities) it may be necessary to call expandForIMS when
componentization was performed prior to mobility assignments, see its documentation for more details.
If mobilities were already assigned prior to componentization, then the IMS argument selects which feature
groups are subjected to componentization. Data for IMS feature groups that were not considered (i.e.
when IMS is FALSE or "maybe"), will be expanded similarly as is done by
expandForIMS.
In a sets workflow the spectral similarities for each set are
combined as is described for the spectrumSimilarity method
for sets workflows.
Rick Helmus <[email protected]> and Bas van de Velde (major contributions to spectral binning and similarity calculation).
Müllner D (2013). “fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python.” Journal of Statistical Software, 53(9), 1–18. doi:10.18637/jss.v053.i09.
generateComponents for more details and other algorithms.
Generates components by linking feature groups of transformation products and their parents.
generateComponentsTPs(fGroups, ...) ## S4 method for signature 'featureGroups' generateComponentsTPs( fGroups, fGroupsTPs = fGroups, TPs = NULL, MSPeakLists = NULL, formulas = NULL, compounds = NULL, ignoreParents = FALSE, minRTDiff = 20, specSimParams = getDefSpecSimParams(), IMS = "maybe" ) ## S4 method for signature 'featureGroupsSet' generateComponentsTPs( fGroups, fGroupsTPs = fGroups, TPs = NULL, MSPeakLists = NULL, formulas = NULL, compounds = NULL, ignoreParents = FALSE, minRTDiff = 20, specSimParams = getDefSpecSimParams(), IMS = "maybe" )
fGroups |
The input |
... |
Further arguments specified to the methods. |
fGroupsTPs |
A |
TPs |
A |
MSPeakLists, formulas, compounds
|
A |
ignoreParents |
If |
minRTDiff |
Minimum retention time (in seconds) difference between the parent and a TP to calculate the retention order direction. |
specSimParams |
A named |
IMS |
(IMS workflow) Specifies which feature groups are considered for componentization in IMS workflows. The following options are valid:
|
This function uses transformation product screening to generate components. This function is called when calling generateComponents with
algorithm="tp".
This method typically employs data from generated transformation products to find
parents and their TPs. However, this data is not necessary, and components can also be made based on MS/MS
similarity and/or other annotation similarities between the parent and its TPs. For more details see the
Linking parents and transformation products section below.
The components are stored in objects derived from componentsTPs.
Each component consists of feature groups that are considered
to be transformation products for one parent (the parent that 'belongs' to the component can be retrieved with the
componentInfo method). The parent feature groups are taken from the fGroups parameter, while
the feature groups for TPs are taken from fGroupsTPs. If a feature group occurs in both variables, it may
therefore be considered as both a parent and a TP.
If transformation product data is given, i.e. the TPs argument is set, then a suspect screening of
the parents and/or TPs may need to be performed in advance to facilitate linkage. This depends on the algorithm
that was used to generate the TPs:
- the parents need to be screened for all algorithms except logic (generateTPsLogic);
- the TPs need to be screened for all algorithms except ann_form and ann_comp (generateTPsAnnForm and generateTPsAnnComp).
See screenSuspects to perform the screening and convertToSuspects to create the suspect
list. To include parents make sure to set includeParents=TRUE when calling convertToSuspects or first
screen for the parents and then amend the screening object with TP screening results by setting amend=TRUE
to screenSuspects. If the suspect screening yields multiple TP hits, all will be reported. Similarly, if
the suspect screening contains multiple hits for a parent, a component is made for each of the parent hits.
In case no transformation product data is provided (TPs=NULL), the componentization algorithm simply assumes
that each feature group from fGroupsTPs is a potential TP for every parent feature group in fGroups.
For this reason, it is highly recommended to specify which feature groups are parents/TPs (see the
fGroupsTPs argument description above) and crucial that the data is post-processed, for instance by
only retaining TPs that have high annotation similarity with their parents (see the
filter method for componentsTPs).
A typical way to distinguish which feature groups are parents or TPs from two different (groups of) samples is by
calculating Fold Changes (see the as.data.table method for
feature groups and plotVolcano). Of course, other statistical techniques from R are also suitable.
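As an illustration of the fold change idea, the base R sketch below classifies feature groups from hypothetical mean intensities of two sample groups. The column names, intensities and the log2 fold change threshold of 2 are assumptions; in a real workflow the values would come from the feature group data.

```r
# Hypothetical mean intensities before (influent) and after (effluent) treatment
ints <- data.frame(
    group    = c("fgA", "fgB", "fgC"),
    influent = c(1e5, 2e3, 5e4),
    effluent = c(1e3, 8e4, 6e4),
    stringsAsFactors = FALSE
)

# log2 fold change of effluent relative to influent
ints$log2FC <- log2(ints$effluent / ints$influent)

minLog2FC <- 2 # assumed threshold

# Strongly decreasing feature groups are candidate parents,
# strongly increasing feature groups are candidate TPs
parents <- ints$group[ints$log2FC <= -minLog2FC]
TPs     <- ints$group[ints$log2FC >= minLog2FC]

print(list(parents = parents, TPs = TPs))
```

The resulting group names could then be used to subset the feature groups passed as fGroups and fGroupsTPs.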
During componentization, several characteristics are calculated which may be useful for
post-processing. These can be obtained with e.g.
as.data.table or componentTable. The properties
are either reported for each feature group in a component, or for each candidate of a feature group in a component
(only if TPs was set).
The following properties may be reported for each feature group:
- specSimilarity: the MS/MS spectral similarity between the feature groups of the TP and its parent (‘0-1’).
- specSimilarityPrec, specSimilarityBoth: as specSimilarity, but calculated with binned data using the "precursor" and "both" method, respectively (see MS spectral similarity parameters for more details).
- totalFragmentMatches: the total number of MS/MS fragment annotations that overlap between all feature annotation candidates for the TP feature group and the feature annotations specifically for the parent (based on the assigned fragment formula). If both the formulas and compounds arguments are specified then the annotation data is pooled prior to calculation. Each unique match is only counted once.
- totalNeutralLossMatches: as totalFragmentMatches, but counting overlapping neutral loss formulae.
- retDir, TP_retDir: the retention order direction derived from the feature groups (retDir) or the (expected) value from TP data (TP_retDir).
- retDiff, mzDiff: the retention time and m/z difference between the parent and TP.
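How minRTDiff could influence the retention order direction is sketched below in base R. The helper retDirection is hypothetical, and the encoding (-1 for a TP eluting earlier than its parent, 1 for later, 0 when the difference is smaller than minRTDiff) is an assumption made for this illustration.

```r
# Illustrative helper (not part of patRoon)
retDirection <- function(rtParent, rtTP, minRTDiff = 20) {
    rtDiff <- rtTP - rtParent
    # Differences below the threshold are treated as 'no clear direction'
    ifelse(abs(rtDiff) < minRTDiff, 0, sign(rtDiff))
}

# Parent at 300 s and three hypothetical TPs
retDirection(300, c(250, 310, 400))
```

Comparing such a measured direction with the expected one from the TP data can help to flag unlikely parent/TP pairs during post-processing.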
The candidate specific properties are stored inside the candidates column in component tables, and can be
obtained with as.data.table by setting candidates=TRUE. The following properties may be present:
- fragmentMatches, neutralLossMatches: as totalFragmentMatches and totalNeutralLossMatches, but only considering the feature annotations specifically for this candidate.
- formulaDiff: the formula difference between the parent and TP (if formula data is available).
- TPScore, annSim, fitFormula, fitCompound, simSusps: TP scoring properties, see generateTPsAnnForm and generateTPsAnnComp.
In IMS workflows with post mobility assignment (see
assignMobilities) it may be necessary to call expandForIMS when
componentization was performed prior to mobility assignments, see its documentation for more details.
If mobilities were already assigned prior to componentization, then the IMS argument selects which feature
groups are subjected to componentization. Data for IMS feature groups that were not considered (i.e.
when IMS is FALSE or "maybe"), will be expanded similarly as is done by
expandForIMS.
NOTE: IMS expansion by expandForIMS only expands results for TP
candidates, i.e. no new components from parents assigned to IMS feature groups will be added.
In a sets workflow the component tables are amended with extra information such as overall/specific set spectrum similarities. As sets data is mixed, transformation products can be linked with a parent even if they were not measured in the same set.
The shift parameter of specSimParams is ignored by generateComponentsTPs, since it always
calculates similarities with all supported options.
generateComponents for more details and other algorithms.
Automatically perform chemical compound annotation for feature groups.
generateCompounds( fGroups, MSPeakLists, algorithm, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), ... ) ## S4 method for signature 'featureGroups' generateCompounds( fGroups, MSPeakLists, algorithm, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), ... )
fGroups |
|
MSPeakLists |
A |
algorithm |
A character string describing the algorithm that should be
used: |
specSimParams |
A named |
... |
Any parameters to be passed to the selected compound generation algorithm. |
Several algorithms are provided to automatically perform compound annotation for feature groups. To this end, measured masses for all feature groups are searched within online database(s) (e.g. PubChem) to retrieve a list of potential candidate chemical compounds. Depending on the algorithm and its parameters, further scoring of candidates is then performed using, for instance, matching of measured and theoretical isotopic patterns, presence within other data sources such as patent databases and similarity of measured and in-silico predicted MS/MS fragments. Note that this process is often quite time consuming, especially for large feature group sets. Therefore, this is often one of the last steps within the workflow and not performed before feature groups have been prioritized.
generateCompounds is a generic function that generates compound annotations using one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateCompoundsMetFrag and generateCompoundsSIRIUS. While these
functions may be called directly, generateCompounds provides a generic interface and is therefore usually preferred.
A compounds derived object containing all compound annotations.
Each algorithm implements its own scoring system. The scoring names have been simplified and
harmonized where possible. The compoundScorings function can be used to get an overview of both the
algorithm specific and generic scoring names.
With a sets workflow, annotation is first performed for each set. This is important, since the annotation algorithms typically cannot work with data from mixed ionization modes. The annotation results are then combined to generate a sets consensus:
1. The annotation tables for each feature group from the set specific data are combined. Rows with overlapping candidates (determined by the first-block InChIKey) are merged.
2. Set specific data (e.g. the ionic formula) is retained by renaming their columns with set specific names.
3. The MS/MS fragment annotations (fragInfo column) from each set are combined.
4. The scorings for each set are averaged to calculate overall scores. If setAvgSpecificScores=FALSE then scorings that are considered set specific (e.g. MS/MS and isotopic pattern match) are not averaged.
5. The candidates are re-ranked based on their average ranking among the set data (if a candidate is absent in a set it is assigned the poorest rank in that set).
6. The coverage of each candidate among sets is calculated. Depending on the setThreshold and setThresholdAnn arguments, candidates with low abundance are removed.
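The consensus steps above can be illustrated with a small base R sketch. This is not the patRoon implementation: the function setsConsensus, the column name InChIKey1 and the example values are hypothetical, scores are assumed to be averaged only over the sets in which a candidate is present, and the setAvgSpecificScores and setThresholdAnn behaviour is left out.

```r
# Hypothetical per-set candidate tables; keys and scores are illustrative only.
pos <- data.frame(InChIKey1 = c("AAA", "BBB", "CCC"), score = c(2, 3, 1),
                  stringsAsFactors = FALSE)
neg <- data.frame(InChIKey1 = c("BBB", "AAA"), score = c(4, 1),
                  stringsAsFactors = FALSE)

setsConsensus <- function(sets, setThreshold = 0) {
  keys <- unique(unlist(lapply(sets, function(tab) tab$InChIKey1)))

  # Score of every candidate in every set (NA if absent from that set)
  scoreMat <- do.call(cbind, lapply(sets, function(tab) {
    tab$score[match(keys, tab$InChIKey1)]
  }))

  # Rank within each set (1 = best); absent candidates get the poorest rank
  rankMat <- do.call(cbind, lapply(sets, function(tab) {
    setRanks <- rank(-tab$score, ties.method = "min")
    r <- setRanks[match(keys, tab$InChIKey1)]
    r[is.na(r)] <- max(setRanks)
    r
  }))

  res <- data.frame(
    InChIKey1 = keys,
    score = rowMeans(scoreMat, na.rm = TRUE), # averaged over sets with data
    avgRank = rowMeans(rankMat),
    coverage = rowMeans(!is.na(scoreMat)),    # fraction of sets with the candidate
    stringsAsFactors = FALSE
  )

  # Remove candidates with low abundance among sets, then re-rank
  res <- res[res$coverage >= setThreshold, ]
  res[order(res$avgRank), ]
}

setsConsensus(list(positive = pos, negative = neg), setThreshold = 1)
```

With setThreshold = 1 the candidate that only occurs in the positive set is dropped, which mirrors the description of a value of 1 requiring presence in all set data.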
The compounds output class and its methods and the algorithm specific functions:
generateCompoundsMetFrag, generateCompoundsSIRIUS, generateCompoundsLibrary
Uses an MS library loaded by loadMSLibrary for compound annotation.
    generateCompoundsLibrary(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateCompoundsLibrary(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      MSLibrary,
      minSim = 0.75,
      minAnnSim = minSim,
      absMzDev = defaultLim("mz", "narrow"),
      adduct = NULL,
      checkIons = "adduct",
      spectrumType = "MS2",
      specSimParamsLib = specSimParams,
      minIMSSpecSim = 0
    )

    ## S4 method for signature 'featureGroupsSet'
    generateCompoundsLibrary(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      MSLibrary,
      minSim = 0.75,
      minAnnSim = minSim,
      absMzDev = defaultLim("mz", "narrow"),
      adduct = NULL,
      ...,
      setThreshold = 0,
      setThresholdAnn = 0,
      setAvgSpecificScores = FALSE
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
specSimParams |
A named |
MSLibrary |
The |
minSim |
The minimum spectral similarity for candidate records. |
minAnnSim |
The minimum spectral similarity of a record for it to be used to find annotations (see the
|
absMzDev |
The maximum absolute m/z deviation between the feature group and library record m/z values for candidate selection. |
adduct |
An (sets workflow) The |
checkIons |
A |
spectrumType |
A |
specSimParamsLib |
Like |
minIMSSpecSim |
(IMS workflow) If the spectrum similarity of an IMS feature group compared to its IMS precursor (see
|
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses MS library spectra to generate compound candidates. This function is called when calling generateCompounds with
algorithm="library".
This method matches measured MS/MS data (peak lists) with those from an MS library to find candidate structures. Hence, only feature groups with MS/MS peak list data are annotated.
The library is searched for candidates with the following criteria:
1. Only records with ion m/z (PrecursorMZ), SMILES, InChI, InChIKey and formula data are considered.
2. Depending on the value of the checkIons argument, records with different adduct (Precursor_type) or polarity (Ion_mode) may be ignored.
3. The m/z values of the candidate and feature group should match (tolerance set by absMzDev argument).
4. The spectral similarity should not be lower than the value defined for the minSim argument.
5. If multiple candidates with the same first-block InChIKey are found then only the candidate with the best spectral match is kept.
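As a rough illustration of criteria 3 to 5, the base R sketch below filters a toy library table. It is not the patRoon implementation: the function selectCandidates, the precomputed sim column, the InChIKey values and the tolerance of 0.002 are all hypothetical, and the spectral similarity calculation itself is not shown.

```r
# Toy library records; 'sim' stands in for a precomputed spectral similarity.
lib <- data.frame(
  InChIKey = c("AAAAAAAAAAAAAA-BBBBBBBBBB-N", "AAAAAAAAAAAAAA-CCCCCCCCCC-N",
               "DDDDDDDDDDDDDD-EEEEEEEEEE-N", "FFFFFFFFFFFFFF-GGGGGGGGGG-N"),
  PrecursorMZ = c(200.1000, 200.1001, 200.1003, 201.2000),
  sim = c(0.80, 0.95, 0.60, 0.99),
  stringsAsFactors = FALSE
)

selectCandidates <- function(lib, mz, absMzDev = 0.002, minSim = 0.75) {
  # Criteria 3 and 4: m/z within tolerance and sufficient spectral similarity
  cand <- lib[abs(lib$PrecursorMZ - mz) <= absMzDev & lib$sim >= minSim, ]

  # Criterion 5: keep the best match per first-block InChIKey
  cand$InChIKey1 <- sub("-.*$", "", cand$InChIKey)
  cand <- cand[order(-cand$sim), ]
  cand[!duplicated(cand$InChIKey1), ]
}

selectCandidates(lib, mz = 200.1)
```

Sorting by similarity before removing duplicates ensures that the record kept for each first-block InChIKey is the one with the best spectral match.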
If the library contains annotations these will be added to the matched MS/MS peaks. However, since the candidate
selected from criterion #5 above may not contain all the annotation data available from the MS library, annotations
from other records are also considered (controlled by the minAnnSim argument). If this leads to different
annotations for the same mass peak then only the most abundant annotation is kept.
The score, libMatch and annSim output columns are all equal and represent the spectral
similarity between the experimental and library spectra.
generateCompounds for more details and other algorithms.
loadMSLibrary to obtain MS library data and the methods for MSLibrary to treat
the data before using it for annotation.
Uses the metfRag package or MetFrag CL for compound identification (see
http://ipb-halle.github.io/MetFrag/).
    generateCompoundsMetFrag(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateCompoundsMetFrag(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      method = "CL",
      timeout = 300,
      timeoutRetries = 2,
      errorRetries = 2,
      topMost = 100,
      dbRelMzDev = defaultLim("mz", "narrow_rel"),
      fragRelMzDev = defaultLim("mz", "narrow_rel"),
      fragAbsMzDev = defaultLim("mz", "narrow"),
      adduct = NULL,
      database = "pubchemlite",
      extendedPubChem = "auto",
      chemSpiderToken = "",
      scoreTypes = compoundScorings("metfrag", database, onlyDefault = TRUE)$name,
      scoreWeights = 1,
      preProcessingFilters = c("UnconnectedCompoundFilter", "IsotopeFilter"),
      postProcessingFilters = c("InChIKeyFilter"),
      maxCandidatesToStop = 2500,
      identifiers = NULL,
      extraOpts = NULL,
      minIMSSpecSim = 0
    )

    ## S4 method for signature 'featureGroupsSet'
    generateCompoundsMetFrag(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      method = "CL",
      timeout = 300,
      timeoutRetries = 2,
      errorRetries = 2,
      topMost = 100,
      dbRelMzDev = defaultLim("mz", "narrow_rel"),
      fragRelMzDev = defaultLim("mz", "narrow_rel"),
      fragAbsMzDev = defaultLim("mz", "narrow"),
      adduct = NULL,
      ...,
      setThreshold = 0,
      setThresholdAnn = 0,
      setAvgSpecificScores = FALSE
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
specSimParams |
A named |
method |
Which method should be used for MetFrag execution: |
timeout |
Maximum time (in seconds) before a MetFrag query for a feature group is stopped. Also see
|
timeoutRetries |
Maximum number of retries after reaching a timeout before completely skipping the MetFrag query
for a feature group. Also see |
errorRetries |
Maximum number of retries after an error occurred. This may be useful to handle e.g. connection errors. |
topMost |
Only keep this number of candidates (per feature group) with highest score. Set to |
dbRelMzDev |
Relative mass deviation (in ppm) for database search. Sets the DatabaseSearchRelativeMassDeviation option. |
fragRelMzDev |
Relative mass deviation (in ppm) for fragment matching. Sets the FragmentPeakMatchRelativeMassDeviation option. |
fragAbsMzDev |
Absolute mass deviation (in Da) for fragment matching. Sets the FragmentPeakMatchAbsoluteMassDeviation option. |
adduct |
An (sets workflow) The |
database |
Compound database to use. Valid values are: |
extendedPubChem |
If |
chemSpiderToken |
A character string with the ChemSpider security token that should be set when the ChemSpider database is used. Sets the ChemSpiderToken option. |
scoreTypes |
A character vector defining the scoring types. See the |
scoreWeights |
Numeric vector containing weights of the used scoring types. Order is the same as set in
|
preProcessingFilters, postProcessingFilters
|
A character vector defining pre/post filters applied before/after
fragmentation and scoring (e.g. |
maxCandidatesToStop |
If more than this number of candidate structures are found then processing will be aborted and no results for this feature group will be reported. Low values increase the chance of missing data, whereas too high values will use too much computer resources and significantly slow down the process. Sets the MaxCandidateLimitToStop option. |
identifiers |
A |
extraOpts |
A named |
minIMSSpecSim |
(IMS workflow) If the spectrum similarity of an IMS feature group compared to its IMS precursor (see
|
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses MetFrag to generate compound candidates. This function is called when calling generateCompounds with
algorithm="metfrag".
Several online compound databases such as PubChem and
ChemSpider may be chosen for retrieval of candidate structures. This method
requires the availability of MS/MS data, and feature groups without it will be ignored. Many options exist to score
and filter resulting data, and it is highly recommended to optimize these to improve results. The MetFrag
options PeakList, IonizedPrecursorMass and ExperimentalRetentionTimeValue (in minutes)
are automatically set from feature data.
generateCompoundsMetFrag returns a compoundsMF object.
MetFrag supports many different scorings to rank candidates. The
compoundScorings function can be used to get an overview (some columns are omitted):
| name | metfrag | database |
|---|---|---|
| score | Score | |
| fragScore | FragmenterScore | |
| metFusionScore | OfflineMetFusionScore | |
| individualMoNAScore | OfflineIndividualMoNAScore | |
| numberPatents | PubChemNumberPatents | pubchem |
| numberPatents | Patent_Count | pubchemlite |
| pubMedReferences | PubChemNumberPubMedReferences | pubchem |
| pubMedReferences | ChemSpiderNumberPubMedReferences | chemspider |
| pubMedReferences | NUMBER_OF_PUBMED_ARTICLES | comptox |
| pubMedReferences | PubMed_Count | pubchemlite |
| extReferenceCount | ChemSpiderNumberExternalReferences | chemspider |
| dataSourceCount | ChemSpiderDataSourceCount | chemspider |
| referenceCount | ChemSpiderReferenceCount | chemspider |
| RSCCount | ChemSpiderRSCCount | chemspider |
| smartsInclusionScore | SmartsSubstructureInclusionScore | |
| smartsExclusionScore | SmartsSubstructureExclusionScore | |
| suspectListScore | SuspectListScore | |
| retentionTimeScore | RetentionTimeScore | |
| CPDATCount | CPDAT_COUNT | comptox |
| TOXCASTActive | TOXCAST_PERCENT_ACTIVE | comptox |
| dataSources | DATA_SOURCES | comptox |
| pubChemDataSources | PUBCHEM_DATA_SOURCES | comptox |
| EXPOCASTPredExpo | EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY | comptox |
| ECOTOX | ECOTOX | comptox |
| NORMANSUSDAT | NORMANSUSDAT | comptox |
| MASSBANKEU | MASSBANKEU | comptox |
| TOX21SL | TOX21SL | comptox |
| TOXCAST | TOXCAST | comptox |
| KEMIMARKET | KEMIMARKET | comptox |
| MZCLOUD | MZCLOUD | comptox |
| pubMedNeuro | PubMedNeuro | comptox |
| CIGARETTES | CIGARETTES | comptox |
| INDOORCT16 | INDOORCT16 | comptox |
| SRM2585DUST | SRM2585DUST | comptox |
| SLTCHEMDB | SLTCHEMDB | comptox |
| THSMOKE | THSMOKE | comptox |
| ITNANTIBIOTIC | ITNANTIBIOTIC | comptox |
| STOFFIDENT | STOFFIDENT | comptox |
| KEMIMARKET_EXPO | KEMIMARKET_EXPO | comptox |
| KEMIMARKET_HAZ | KEMIMARKET_HAZ | comptox |
| REACH2017 | REACH2017 | comptox |
| KEMIWW_WDUIndex | KEMIWW_WDUIndex | comptox |
| KEMIWW_StpSE | KEMIWW_StpSE | comptox |
| KEMIWW_SEHitsOverDL | KEMIWW_SEHitsOverDL | comptox |
| ZINC15PHARMA | ZINC15PHARMA | comptox |
| PFASMASTER | PFASMASTER | comptox |
| peakFingerprintScore | AutomatedPeakFingerprintAnnotationScore | |
| lossFingerprintScore | AutomatedLossFingerprintAnnotationScore | |
| agroChemInfo | AgroChemInfo | pubchemlite |
| bioPathway | BioPathway | pubchemlite |
| drugMedicInfo | DrugMedicInfo | pubchemlite |
| foodRelated | FoodRelated | pubchemlite |
| pharmacoInfo | PharmacoInfo | pubchemlite |
| safetyInfo | SafetyInfo | pubchemlite |
| toxicityInfo | ToxicityInfo | pubchemlite |
| knownUse | KnownUse | pubchemlite |
| disorderDisease | DisorderDisease | pubchemlite |
| identification | Identification | pubchemlite |
| annoTypeCount | AnnoTypeCount | pubchemlite |
| annotHitCount | AnnotHitCount | pubchemlite |
In addition, the compoundScorings function is also useful to programmatically
generate a set of scorings to be used for ranking with MetFrag. For instance, the following can be given
to the scoreTypes argument to use all default scorings for PubChem: compoundScorings("metfrag",
"pubchem", onlyDefault=TRUE)$name.
For all MetFrag scoring types refer to the Candidate Scores section on the
MetFragR homepage.
When database="chemspider" setting the chemSpiderToken argument is
mandatory.
If a local database is chosen via sdf, psv, or csv then its file location should be set with
the LocalDatabasePath value via the extraOpts argument. For example: extraOpts =
list(LocalDatabasePath = "C:/myDB.csv").
If database="pubchemlite" or database="comptox" and patRoonExt is not installed then the
file location must be specified as above or by setting the
patRoon.path.MetFragPubChemLite/patRoon.path.MetFragCompTox option. See the installation section in
the handbook for more details.
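The database requirements above can be summarized in a short base R sketch. The helper checkMetFragDatabase is hypothetical and not part of patRoon; it only encodes the rules stated in the text. The PubChemLite file path is a placeholder, whereas the option name and the LocalDatabasePath example are taken from the text.

```r
# Hypothetical helper (not part of patRoon) that checks the rules described above
checkMetFragDatabase <- function(database, chemSpiderToken = "", extraOpts = NULL) {
  if (database == "chemspider" && !nzchar(chemSpiderToken))
    stop("chemSpiderToken is mandatory when database = \"chemspider\"")
  if (database %in% c("sdf", "psv", "csv") &&
      is.null(extraOpts[["LocalDatabasePath"]]))
    stop("set LocalDatabasePath via extraOpts for a local database")
  invisible(TRUE)
}

# Local CSV database: the path is the example given in the text
checkMetFragDatabase("csv", extraOpts = list(LocalDatabasePath = "C:/myDB.csv"))

# PubChemLite without patRoonExt: point the package option to a local file
# (the file path below is a placeholder)
options(patRoon.path.MetFragPubChemLite = "/path/to/PubChemLite.csv")
```

The same database and extraOpts values would then be passed on to generateCompoundsMetFrag (or to generateCompounds with algorithm="metfrag").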
If database="pubchemlite" and the local file has CCS predictions (see
https://zenodo.org/records/15311000), then CCS values will be copied from the corresponding adduct
of the feature groups (or as specified by the adduct argument). These can be converted to mobilities with
assignMobilities and used for candidate filtering with
filter.
generateCompoundsMetFrag uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
When local database files are used with generateCompoundsMetFrag (e.g. when
database is set to "pubchemlite", "csv" etc.) and patRoon.MP.method="future", then
the database file must be present on all the nodes. When pubchemlite or comptox is used, the location
for these databases can be configured on the host with the respective package options
(patRoon.path.MetFragPubChemLite and patRoon.path.MetFragCompTox) or made available by installing
the patRoonExt package. Note that these files must also be present on the local host computer, even if
it is not participating in computations.
If the compound database is not local, e.g. database="pubchem", then parallelization is
disabled to avoid connection errors that typically occur otherwise.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016).
“MetFrag relaunched: incorporating strategies beyond in silico fragmentation.”
Journal of Cheminformatics, 8(1).
doi:10.1186/s13321-016-0115-9.
Schymanski EL, Kondić T, Neumann S, Thiessen PA, Zhang J, Bolton EE (2021).
“Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag.”
Journal of Cheminformatics, 13(1).
ISSN 1758-2946.
doi:10.1186/s13321-021-00489-0.
Elapavalore A, Ross DH, Grouès V, Aurich D, Krinsky AM, Kim S, Thiessen PA, Zhang J, Dodds JN, Baker ES, Bolton EE, Xu L, Schymanski EL (2025).
“PubChemLite Plus Collision Cross Section (CCS) Values for Enhanced Interpretation of Nontarget Environmental Data.”
Environmental Science & Technology Letters, 12(2), 166–174.
ISSN 2328-8930.
doi:10.1021/acs.estlett.4c01003.
Ross DH, Cho JH, Xu L (2020).
“Breaking Down Structural Diversity for Comprehensive Prediction of Ion-Neutral Collision Cross Sections.”
Analytical Chemistry, 92(6), 4548–4557.
ISSN 1520-6882.
doi:10.1021/acs.analchem.9b05772.
generateCompounds for more details and other algorithms.
Uses SIRIUS in combination with CSI:FingerID for compound annotation.
    generateCompoundsSIRIUS(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateCompoundsSIRIUS(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      relMzDev = defaultLim("mz", "narrow_rel"),
      adduct = NULL,
      projectPath = NULL,
      elements = "CHNOP",
      profile = "qtof",
      formulaDatabase = NULL,
      fingerIDDatabase = "pubchem",
      noise = NULL,
      cores = NULL,
      topMost = 100,
      topMostFormulas = 5,
      login = "check",
      alwaysLogin = FALSE,
      extraOptsGeneral = NULL,
      extraOptsFormula = NULL,
      minIMSSpecSim = 0,
      verbose = TRUE,
      splitBatches = FALSE,
      dryRun = FALSE
    )

    ## S4 method for signature 'featureGroupsSet'
    generateCompoundsSIRIUS(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      relMzDev = defaultLim("mz", "narrow_rel"),
      adduct = NULL,
      projectPath = NULL,
      ...,
      setThreshold = 0,
      setThresholdAnn = 0,
      setAvgSpecificScores = FALSE
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
specSimParams |
A named |
relMzDev |
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option. |
adduct |
An (sets workflow) The |
projectPath, dryRun
|
These are mainly for internal purposes. (sets workflow) |
elements |
Elements to be considered for formulae calculation. This will heavily affect the number of
candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum
number of elements can also be specified, for example: a value of |
profile |
Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile commandline option. |
formulaDatabase |
If not |
fingerIDDatabase |
Database specifically used for |
noise |
Median intensity of the noise ( |
cores |
The number of cores |
topMost |
Only keep this number of candidates (per feature group) with highest score. Set to |
topMostFormulas |
Do not return more than this number of candidate formulae. Note that only compounds for these formulae will be searched. Sets the --candidates commandline option. |
login, alwaysLogin
|
Specifies if and how logging in to a SIRIUS account should be handled:
if See the SIRIUS website and patRoon handbook for more information. |
extraOptsGeneral, extraOptsFormula
|
a |
minIMSSpecSim |
(IMS workflow) If the spectrum similarity of an IMS feature group compared to its IMS precursor (see
|
verbose |
If |
splitBatches |
If |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses SIRIUS to generate compound candidates. This function is called when calling generateCompounds with
algorithm="sirius".
Similar to generateFormulasSIRIUS, candidate formulae are generated with SIRIUS. These results
are then fed to CSI:FingerID to acquire candidate structures. Candidate formulae without any assigned structure
will be removed (unlike generateFormulasSIRIUS). This method requires the availability of MS/MS data,
and feature groups without it will be ignored.
A compoundsSIRIUS object.
generateCompoundsSIRIUS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
For annotations performed with SIRIUS it is often fastest to keep the default
splitBatches=FALSE. In this case, all SIRIUS output will be printed to the terminal (unless
verbose=FALSE or patRoon.MP.method="future"). Furthermore, please note that only annotations that are
performed for the same adduct are grouped in a single batch execution.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
generateCompounds for more details and other algorithms.
Automatically calculate chemical formulae for all feature groups.
    generateFormulas(
      fGroups,
      MSPeakLists,
      algorithm,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      ...
    )

    ## S4 method for signature 'featureGroups'
    generateFormulas(
      fGroups,
      MSPeakLists,
      algorithm,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      ...
    )
fGroups |
|
MSPeakLists |
An |
algorithm |
A character string describing the algorithm that should be
used: |
specSimParams |
A named |
... |
Any parameters to be passed to the selected formula generation algorithm. |
Several algorithms are provided to automatically generate formulae for given feature groups. All algorithms use the accurate mass of a feature to back-calculate candidate formulae. Depending on the algorithm and data availability, other data such as isotopic pattern and MS/MS fragments may be used to further improve formula assignment and ranking.
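Matching an accurate mass against candidate formulae relies on a relative mass tolerance, such as the relMzDev argument of the algorithm specific functions (in ppm). The base R sketch below shows the arithmetic only. The function names ppmDev and withinPpm are hypothetical, and the 5 ppm tolerance is an illustrative value rather than a statement about the package defaults.

```r
# Relative m/z deviation in ppm between a measured and a theoretical value
ppmDev <- function(measured, theoretical) {
  (measured - theoretical) / theoretical * 1e6
}

# TRUE if a candidate falls within the relative tolerance (in ppm)
withinPpm <- function(measured, theoretical, relMzDev = 5) {
  abs(ppmDev(measured, theoretical)) <= relMzDev
}

ppmDev(200.0008, 200)     # about 4 ppm
withinPpm(200.0008, 200)  # inside a 5 ppm window
withinPpm(200.0012, 200)  # about 6 ppm, outside the window
```

Because the tolerance is relative, the same ppm value corresponds to a wider absolute m/z window for heavier ions.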
generateFormulas is a generic function that generates formula candidates with one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateFormulasDA and generateFormulasGenForm. While these
functions may be called directly, generateFormulas provides a generic interface and is therefore usually preferred.
A formulas object containing all generated formulae.
Formula candidate assignment occurs in one of the following ways:
1. Candidates are first generated for each feature and then pooled to form consensus candidates for the feature group.
2. Candidates are directly generated for each feature group by group averaged MS peak list data.
With approach (1), scorings and mass errors are averaged and outliers are removed (controlled by the
featThreshold and featThresholdAnn arguments). Other candidate properties that cannot be averaged are
taken from the feature of the analysis that is specified in the "analysis" column of the results. The second approach only generates candidate formulae once for every feature group, and is therefore generally much
faster. However, this inherently prevents removal of outliers.
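A minimal base R sketch of approach (1) is given below. It is not the patRoon implementation: the function poolFeatureFormulas and the example values are hypothetical, and featThreshold is assumed here to be the minimum fraction of features in which a formula must occur before it is kept for the feature group.

```r
# Hypothetical formula candidates for one feature group, one table per analysis
featCands <- list(
  ana1 = data.frame(formula = c("C6H6O", "C5H10N2"), score = c(0.9, 0.4),
                    error = c(1.0, 3.0), stringsAsFactors = FALSE),
  ana2 = data.frame(formula = "C6H6O", score = 0.7, error = 2.0,
                    stringsAsFactors = FALSE),
  ana3 = data.frame(formula = c("C6H6O", "C4H2O3"), score = c(0.8, 0.2),
                    error = c(1.5, 4.0), stringsAsFactors = FALSE)
)

poolFeatureFormulas <- function(featCands, featThreshold = 0) {
  pooled <- do.call(rbind, featCands)

  # Average scorings and mass errors per candidate formula
  res <- aggregate(cbind(score, error) ~ formula, data = pooled, FUN = mean)

  # Fraction of features in which each candidate was found
  counts <- table(pooled$formula)
  res$featCoverage <- as.numeric(counts[as.character(res$formula)]) / length(featCands)

  # Remove outliers, i.e. candidates found in too few features
  res <- res[res$featCoverage >= featThreshold, ]
  res[order(-res$score), ]
}

poolFeatureFormulas(featCands, featThreshold = 0.75)
```

In this toy data only C6H6O occurs in all three features, so it is the only candidate that survives a threshold of 0.75.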
Note that with either approach subsequent workflow steps that use formula data (e.g.
addFormulaScoring and reporting functions) only use formula data that was eventually assigned
to feature groups.
Each algorithm implements its own scoring system. The scoring names have been harmonized where
possible. An overview is obtained with the formulaScorings function:
| name | genform | sirius | bruker | description |
|---|---|---|---|---|
| combMatch | comb_match | - | - | MS and MS/MS combined match value |
| isoScore | MS_match | isoScore | - | How well the isotopic pattern matches |
| mSigma | - | - | mSigma | Deviation of the isotopic pattern |
| MSMSScore | MSMS_match | treeScore | - | How well MS/MS data matches |
| score | - | score | Score | Overall MS formula score |
With a sets workflow, annotation is first performed for each set. This is important, since the annotation algorithms typically cannot work with data from mixed ionization modes. The annotation results are then combined to generate a sets consensus:
1. The annotation tables for each feature group from the set specific data are combined. Rows with overlapping candidates (determined by the neutral formula) are merged.
2. Set specific data (e.g. the ionic formula) is retained by renaming their columns with set specific names.
3. The MS/MS fragment annotations (fragInfo column) from each set are combined.
4. The scorings for each set are averaged to calculate overall scores. If setAvgSpecificScores=FALSE then scorings that are considered set specific (e.g. MS/MS and isotopic pattern match) are not averaged.
5. The candidates are re-ranked based on their average ranking among the set data (if a candidate is absent in a set it is assigned the poorest rank in that set).
6. The coverage of each candidate among sets is calculated. Depending on the setThreshold and setThresholdAnn arguments, candidates with low abundance are removed.
The formulas output class and its methods and the algorithm specific functions:
generateFormulasDA, generateFormulasGenForm, generateFormulasSIRIUS
The GenForm manual (also known as MOLGEN-MSMS).
Uses GenForm to generate chemical formula candidates.
    generateFormulasGenForm(fGroups, ...)

    ## S4 method for signature 'featureGroups'
    generateFormulasGenForm(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      relMzDev = defaultLim("mz", "narrow_rel"),
      adduct = NULL,
      elements = "CHNOP",
      hetero = TRUE,
      oc = FALSE,
      thrMS = NULL,
      thrMSMS = NULL,
      thrComb = NULL,
      maxCandidates = Inf,
      extraOpts = NULL,
      calculateFeatures = FALSE,
      featThreshold = 0,
      featThresholdAnn = 0.75,
      absAlignMzDev = defaultLim("mz", "narrow"),
      MSMode = "both",
      isolatePrec = TRUE,
      minIMSSpecSim = 0,
      timeout = 120,
      topMost = 50,
      batchSize = 8
    )

    ## S4 method for signature 'featureGroupsSet'
    generateFormulasGenForm(
      fGroups,
      MSPeakLists,
      specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
      relMzDev = defaultLim("mz", "narrow_rel"),
      adduct = NULL,
      ...,
      setThreshold = 0,
      setThresholdAnn = 0,
      setAvgSpecificScores = FALSE
    )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
An |
specSimParams |
A named |
relMzDev |
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the ppm command line option. |
adduct |
An (sets workflow) The |
elements |
Elements to be considered for formulae calculation. This will heavily affect the number of candidates! Always try to work with a minimal set by excluding elements you don't expect. Sets the el command line option. |
hetero |
Only consider formulae with at least one hetero atom. Sets the het commandline option. |
oc |
Only consider organic formulae (i.e. with at least one carbon atom). Sets the oc commandline option. |
thrMS, thrMSMS, thrComb
|
Sets the thresholds for the |
maxCandidates |
If this number of candidates are found then |
extraOpts |
An optional character vector with any other command line options that will be passed to
|
calculateFeatures |
If |
featThreshold |
If |
featThresholdAnn |
As |
absAlignMzDev |
When the group formula annotation consensus is made from feature annotations, the m/z
values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The
|
MSMode |
Whether formulae should be generated only from MS data ( |
isolatePrec |
Settings used for isolation of precursor mass peaks and their isotopes. This isolation is highly
important for accurate isotope scoring of candidates, as non-relevant mass peaks will dramatically decrease the
score. The value of |
minIMSSpecSim |
(IMS workflow) If the spectrum similarity of an IMS feature group compared to its IMS precursor (see
This argument does not affect the annotation results for MS-only formulae. |
timeout |
Maximum time (in seconds) that a |
topMost |
Only keep this number of candidates (per feature group) with highest score. |
batchSize |
Maximum number of |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses GenForm to generate formula candidates. This function is called when calling generateFormulas with
algorithm="genform".
When MS/MS data is available it will be used to score candidate formulae by presence of 'fitting' fragments.
A formulas object containing all generated formulae.
Below is a list of options (generated by running GenForm without commandline
options) which can be set by the extraOpts parameter.
Formula calculation from MS and MS/MS data as described in
Meringer et al (2011) MATCH Commun Math Comput Chem 65: 259-290
Usage: GenForm ms=<filename> [msms=<filename>] [out=<filename>]
[exist[=mv]] [m=<number>] [ion=-e|+e|-H|+H|+Na] [cha=<number>]
[ppm=<number>] [msmv=ndp|nsse|nsae] [acc=<number>] [rej=<number>]
[thms=<number>] [thmsms=<number>] [thcomb=<number>]
[sort[=ppm|msmv|msmsmv|combmv]] [el=<elements> [oc]] [ff=<fuzzy formula>]
[vsp[=<even|odd>]] [vsm2mv[=<value>]] [vsm2ap2[=<value>]] [hcf] [kfer[=ex]]
[wm[=lin|sqrt|log]] [wi[=lin|sqrt|log]] [exp=<number>] [oei]
[dbeexc=<number>] [ivsm2mv=<number>] [vsm2ap2=<number>]
[oms[=<filename>]] [omsms[=<filename>]] [oclean[=<filename>]]
[analyze [loss] [intens]] [dbe] [cm] [pc] [sc] [max]
Explanation:
ms : filename of MS data (*.txt)
msms : filename of MS/MS data (*.txt)
out : output generated formulas
exist : allow only molecular formulas for that at least one
structural formula exists;overrides vsp, vsm2mv, vsm2ap2;
argument mv enables multiple valencies for P and S
m : experimental molecular mass (default: mass of MS basepeak)
ion : type of ion measured (default: M+H)
ppm : accuracy of measurement in parts per million (default: 5)
msmv : MS match value based on normalized dot product, normalized
sum of squared or absolute errors (default: nsae)
acc : allowed deviation for full acceptance of MS/MS peak in ppm
(default: 2)
rej : allowed deviation for total rejection of MS/MS peak in ppm
(default: 4)
thms : threshold for the MS match value
thmsms : threshold for the MS/MS match value
thcomb : threshold for the combined match value
sort : sort generated formulas according to mass deviation in ppm,
MS match value, MS/MS match value or combined match value
el : used chemical elements (default: CHBrClFINOPSSi)
oc : only organic compounds, i.e. with at least one C atom
ff : overwrites el and oc and uses fuzzy formula for limits of
element multiplicities
het : formulas must have at least one hetero atom
vsp : valency sum parity (even for graphical formulas)
vsm2mv : lower bound for valency sum - 2 * maximum valency
(>=0 for graphical formulas)
vsm2ap2 : lower bound for valency sum - 2 * number of atoms + 2
(>=0 for graphical connected formulas)
hcf : apply Heuerding-Clerc filter
kfer : apply Kind-Fiehn element ratio (extended) ranges
wm : m/z weighting for MS/MS match value
wi : intensity weighting for MS/MS match value
exp : exponent used, when wi is set to log
oei : allow odd electron ions for explaining MS/MS peaks
dbeexc : excess of double bond equivalent for ions
ivsm2mv : lower bound for valency sum - 2 * maximum valency
for fragment ions
ivsm2ap2: lower bound for valency sum - 2 * number of atoms + 2
for fragment ions
oms : write scaled MS peaks to output
omsms : write weighted MS/MS peaks to output
oclean : write explained MS/MS peaks to output
analyze : write explanations for MS/MS peaks to output
loss : for analyzing MS/MS peaks write losses instead of fragments
intens : write intensities of MS/MS peaks to output
dbe : write double bond equivalents to output
cm : write calculated ion masses to output
pc : output match values in percent
sc : strip calculated isotope distributions
noref : hide the reference information
max : maximum number of final candidates (0 is no limit)
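As a rough sketch, the options listed above could be assembled into a character vector before being handed to extraOpts. The helper buildGenFormOpts is hypothetical (it is not part of patRoon), and it is an assumption here that extraOpts accepts options in the same key=value form as the usage text.

```r
# Hypothetical helper (not part of patRoon): turns a named list into
# GenForm-style command line options. Flags without a value are given as TRUE.
buildGenFormOpts <- function(opts) {
  vapply(names(opts), function(key) {
    value <- opts[[key]]
    if (isTRUE(value)) key else paste0(key, "=", value)
  }, character(1), USE.NAMES = FALSE)
}

# oc: only organic compounds; thms: MS match value threshold; wi: intensity weighting
extraOpts <- buildGenFormOpts(list(oc = TRUE, thms = 0.5, wi = "log"))
print(extraOpts)
```

The resulting vector could then be supplied as the extraOpts argument when generating formulae with GenForm.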
generateFormulasGenForm uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
When futures are used for parallel processing (patRoon.MP.method="future"),
calculations with GenForm are done with batch mode disabled (see batchSize argument), which
generally limits overall performance.
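A minimal sketch of switching the multiprocessing method for a session is shown below. It only uses the base R options mechanism and the option name and value mentioned above; the variable names are illustrative.

```r
# Switch patRoon's multiprocessing to futures for the current session and
# remember the previous value so it can be restored afterwards.
oldOpts <- options(patRoon.MP.method = "future")
activeMethod <- getOption("patRoon.MP.method")
print(activeMethod)

# ... run the workflow here ...

# Restore whatever was configured before
options(oldOpts)
```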
This function always sets the exist and oei GenForm command line options.
Formula calculation with GenForm may produce an excessive number of candidates for high m/z values
(e.g. above 600) and/or many elemental combinations (set by elements). In this scenario formula
calculation may need a very long time. Timeouts are used to avoid excessive computational times by terminating long
running commands (set by the timeout argument).
Meringer M, Reinker S, Zhang J, Muller A (2011). “MS/MS Data Improves Automated Determination of Molecular Formulas by Mass Spectrometry.” MATCH Commun. Math. Comput. Chem., 65(2), 259–290.
generateFormulas for more details and other algorithms.
Uses SIRIUS to generate chemical formulae candidates.
generateFormulasSIRIUS(fGroups, ...)
## S4 method for signature 'featureGroups'
generateFormulasSIRIUS( fGroups, MSPeakLists, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), relMzDev = defaultLim("mz", "narrow_rel"), adduct = NULL, projectPath = NULL, elements = "CHNOP", profile = "qtof", database = NULL, noise = NULL, cores = NULL, getFingerprints = FALSE, topMost = 100, login = FALSE, alwaysLogin = FALSE, extraOptsGeneral = NULL, extraOptsFormula = NULL, calculateFeatures = FALSE, featThreshold = 0, featThresholdAnn = 0.75, absAlignMzDev = defaultLim("mz", "narrow"), minIMSSpecSim = 0, verbose = TRUE, splitBatches = FALSE, dryRun = FALSE )
## S4 method for signature 'featureGroupsSet'
generateFormulasSIRIUS( fGroups, MSPeakLists, specSimParams = getDefSpecSimParams(removePrecursor = TRUE), relMzDev = defaultLim("mz", "narrow_rel"), adduct = NULL, projectPath = NULL, ..., setThreshold = 0, setThresholdAnn = 0, setAvgSpecificScores = FALSE )
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
An |
specSimParams |
A named |
relMzDev |
Maximum relative deviation between the measured and candidate formula m/z values (in ppm). Sets the --ppm-max command line option. |
adduct |
An (sets workflow) The |
projectPath, dryRun
|
These are mainly for internal purposes. (sets workflow) |
elements |
Elements to be considered for formulae calculation. This heavily affects the number of
candidates! Always try to work with a minimal set by excluding elements you don't expect. The minimum/maximum
number of elements can also be specified, for example: a value of |
profile |
Name of the configuration profile, for example: "qtof", "orbitrap", "fticr". Sets the --profile commandline option. |
database |
If not |
noise |
Median intensity of the noise ( |
cores |
The number of cores |
getFingerprints |
Set to |
topMost |
Only keep this number of candidates (per feature group) with highest score. Sets the --candidates command line option. |
login, alwaysLogin
|
Specifies if and how account logging of SIRIUS should be handled:
if See the SIRIUS website and patRoon handbook for more information. |
extraOptsGeneral, extraOptsFormula
|
a |
calculateFeatures |
If |
featThreshold |
If |
featThresholdAnn |
As |
absAlignMzDev |
When the group formula annotation consensus is made from feature annotations, the m/z
values of annotated MS/MS fragments may slightly deviate from those of the corresponding group MS/MS peak list. The
|
minIMSSpecSim |
(IMS workflow) If the spectrum similarity of an IMS feature group compared to its IMS precursor (see
|
verbose |
If |
splitBatches |
If |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
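The abundance filter described for setThreshold can be illustrated with a small base R sketch. This is only a conceptual illustration with made-up data and variable names, and is not how patRoon implements the filter internally.

```r
# Toy data: which formula candidates were found in which set (assumed names)
candidates <- data.frame(
  formula = c("C8H10N4O2", "C8H10N4O2", "C7H8N4O2", "C9H12N2O"),
  set     = c("positive",  "negative",  "positive", "negative"),
  stringsAsFactors = FALSE
)
allSets <- c("positive", "negative")

# Abundance = fraction of sets in which a candidate occurs (0-1)
setsPerCandidate <- tapply(candidates$set, candidates$formula,
                           function(s) length(unique(s)))
abundance <- setsPerCandidate / length(allSets)

# Keep candidates that reach the threshold; 1 means "present in all sets"
setThreshold <- 1
kept <- names(abundance)[abundance >= setThreshold]
print(kept)
```

With the default setThreshold = 0 no candidate would be removed by this filter.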
This function uses sirius to generate formula candidates. This function is called when calling generateFormulas with
algorithm="sirius".
Similarity of measured and theoretical isotopic patterns will be used for scoring candidates. Note that
SIRIUS requires availability of MS/MS data.
A formulasSIRIUS object.
generateFormulasSIRIUS uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and patRoon options for configuration options.
For annotations performed with SIRIUS it is often fastest to keep the default
splitBatches=FALSE. In this case, all SIRIUS output will be printed to the terminal (unless
verbose=FALSE or patRoon.MP.method="future"). Furthermore, please note that only annotations to be
performed for the same adduct are grouped in a single batch execution.
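The grouping of annotations by adduct can be pictured with the sketch below. The feature group names and adducts are made up, and the code only illustrates the grouping idea, not the actual batch handling in patRoon.

```r
# Toy mapping of feature groups to adducts (names are made up)
fgAdducts <- c(M120_R50 = "[M+H]+", M150_R80 = "[M+Na]+", M200_R90 = "[M+H]+")

# One batch per adduct: feature groups sharing an adduct are run together
batches <- split(names(fgAdducts), fgAdducts)
print(lengths(batches))
```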
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019).
“SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.”
Nature Methods, 16(4), 299–302.
doi:10.1038/s41592-019-0344-8.
Duhrkop K, Bocker S (2015).
“Fragmentation Trees Reloaded.”
In Przytycka TM (ed.), Research in Computational Molecular Biology, 65–79.
ISBN 978-3-319-16706-0.
Duhrkop K, Shen H, Meusel M, Rousu J, Bocker S (2015).
“Searching molecular structure databases with tandem mass spectra using CSI:FingerID.”
Proceedings of the National Academy of Sciences, 112(41), 12580–12585.
doi:10.1073/pnas.1509788112.
Bocker S, Letzel MC, Liptak Z, Pervukhin A (2008).
“SIRIUS: decomposing isotope patterns for metabolite identification.”
Bioinformatics, 25(2), 218–224.
doi:10.1093/bioinformatics/btn603.
generateFormulas for more details and other algorithms.
Functionality to convert MS and MS/MS spectra into MS peak lists.
generateMSPeakLists(fGroups, ...)
## S4 method for signature 'featureGroups'
generateMSPeakLists( fGroups, maxMSRTWindow = defaultLim("retention", "narrow"), fixedIsolationWidth = FALSE, topMost = NULL, avgFeatParams = getDefAvgPListParams(), avgFGroupParams = getDefAvgPListParams() )
## S4 method for signature 'featureGroupsSet'
generateMSPeakLists(fGroups, ...)
fGroups |
The |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
maxMSRTWindow |
Maximum chromatographic peak window used for spectrum averaging (in seconds, +/- retention
time). If |
fixedIsolationWidth |
Configures how MS/MS spectra are selected for a feature:
If no isolation was applied to record MS/MS data (e.g. data-independent MS/MS), then all MS/MS spectra will always be selected. NOTE: Sometimes the isolation windows are not exported and cannot be deduced automatically (e.g. Agilent
data). In that case, |
topMost |
Only extract MS peak lists from a maximum of |
avgFeatParams |
Parameters used for averaging MS peak lists of individual features. Analogous to
|
avgFGroupParams |
A |
Data processing steps that use mass spectral data (e.g. formula generation and
compound generation) typically use 'MS peak lists', which are tables storing the
m/z, intensity and other data from the raw mass spectra. The generateMSPeakLists function generates MS
and MS/MS peak lists first for all features (or a subset, if the topMost argument is set). During this step
multiple spectra over the feature elution profile are averaged. Subsequently, peak lists will be generated for each
feature group by averaging peak lists of the features within the group. The data processing steps that use peak
lists will either use the data from individual features or from group averaged peak lists. For instance, the former
may be used by formulae calculation, while compound identification and plotting functionality typically uses group
averaged peak lists.
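The group averaging step can be sketched in base R as follows. This is a simplified illustration under stated assumptions: the function averagePeakLists and its mzWindow argument are hypothetical, peaks are clustered by a simple m/z gap rule, and a peak missing from a feature counts as zero intensity. The actual averaging is configured with getDefAvgPListParams and may work differently.

```r
# Hypothetical helper: average several feature peak lists into one group peak list
averagePeakLists <- function(peakLists, mzWindow = 0.005) {
  combined <- do.call(rbind, peakLists)
  combined <- combined[order(combined$mz), ]
  # Start a new cluster whenever the gap to the previous peak exceeds mzWindow
  cluster <- cumsum(c(TRUE, diff(combined$mz) > mzWindow))
  data.frame(
    mz = as.numeric(tapply(combined$mz, cluster, mean)),
    # Sum per cluster divided by the number of peak lists (absent peaks count as 0)
    intensity = as.numeric(tapply(combined$intensity, cluster, sum)) / length(peakLists)
  )
}

# Toy peak lists of two features belonging to the same feature group
feat1 <- data.frame(mz = c(100.000, 150.000), intensity = c(1000, 500))
feat2 <- data.frame(mz = c(100.002, 200.000), intensity = c(3000, 400))
groupPL <- averagePeakLists(list(feat1, feat2))
print(groupPL)
```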
An MSPeakLists object.
Prior to patRoon 3.0 the generateMSPeakLists function was a wrapper to
the now deprecated functions generateMSPeakListsMzR, generateMSPeakListsDA and
generateMSPeakListsDAFMF. These functions are now unmaintained and may no longer (fully) work. The
current interface is much faster and provides most of the previous functionality, as well as some additions. However, please
provide feedback if you feel any functionality is missing.
The raw data interface of patRoon is used by generateMSPeakLists to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
The use of profile m/z HRMS data (not IMS-HRMS) is currently not supported.
With a sets workflow, the feature group averaged peak lists are made per set. This is important because peak lists cannot be mixed during averaging, for instance, when the sets were generated with different ionization modes. The group averaged peak lists are then simply combined and labelled in the final peak lists. However, annotation and most other functionality typically uses only the (uncombined and) set specific peak lists, as most algorithms cannot work with mixed peak lists.
Functionality to automatically obtain transformation products for a given set of parent compounds.
generateTPs(algorithm, ...)
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected TP generation algorithm. |
generateTPs is a generic function that will generate transformation products by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as generateTPsBioTransformer and generateTPsLogic. While these
functions may be called directly, generateTPs provides a generic interface and is therefore usually preferred.
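The relation between the algorithm argument and the algorithm specific functions can be sketched as a simple lookup. The helper resolveTPAlgorithm is hypothetical and only covers the algorithm names that are spelled out in this manual section; it does not reflect how the generic is implemented.

```r
# Algorithm names and functions as they appear in this manual section
algoFunctions <- c(
  biotransformer = "generateTPsBioTransformer",
  cts            = "generateTPsCTS",
  ann_form       = "generateTPsAnnForm",
  ann_comp       = "generateTPsAnnComp"
)

# Hypothetical stand-in for the generic: look up the algorithm specific function name
resolveTPAlgorithm <- function(algorithm) {
  algorithm <- match.arg(algorithm, names(algoFunctions))
  unname(algoFunctions[algorithm])
}

print(resolveTPAlgorithm("cts"))
```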
A transformationProducts (derived) object containing all generated TPs.
The transformationProducts output class and its methods and the algorithm specific functions:
generateTPsBioTransformer, generateTPsLogic, generateTPsLibrary, generateTPsLibraryFormula, generateTPsCTS, generateTPsAnnForm, generateTPsAnnComp
The derived classes transformationProductsFormula,
transformationProductsStructure, transformationProductsAnnForm and
transformationProductsAnnComp for more specific methods to post-process TP data.
Transforms and prioritizes compound annotation candidates to obtain TPs.
generateTPsAnnComp( parents, compounds, TPsRef = NULL, fGroupsComps = NULL, minRTDiff = 20, minFitFormula = 0.94, minFitCompound = 0, minSimSusp = 0, minFitCompOrSimSusp = c(0.54, 0.65), extraOptsFMCSR = NULL, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, neutralizeTPs = TRUE, TPStructParams = getDefTPStructParams(), parallel = TRUE )
parents |
The parents for which transformation products should be obtained. This can be
The parents need to have SMILES or InChI information available. |
compounds |
The |
TPsRef |
A |
fGroupsComps |
The |
minRTDiff |
Minimum retention time (in seconds) difference between the parent and a TP to calculate the retention order direction. Candidates with unexpected retention orders are filtered out. |
minFitFormula, minFitCompound, minSimSusp
|
Thresholds to filter out unlikely candidates. For |
minFitCompOrSimSusp |
A two-sized numeric vector specifying the thresholds for |
extraOptsFMCSR |
A |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
TPStructParams |
Parameters that influence the calculation of structural properties. See
|
parallel |
If set to |
This function uses compound annotations to obtain transformation products. This function is called when calling generateTPs with
algorithm="ann_comp".
The generateTPsAnnComp function implements the unknown TP screening from compound candidates approach
as described in (Helmus et al. 2025). This algorithm does not rely on any known or predicted TPs and is
therefore suitable for 'full non-target' workflows. All compound candidates are considered as potential TPs
and are ranked by the TP score:
With:
annSim: the annotation similarity
fitCompound: the structural fit of the compound candidate into the parent (or vice versa, maximum is
taken). Calculated as the "Overlap coefficient" with fmcsR::fmcs. The molecular
data is prepared with rcdk and ChemmineR.
simSusp: the maximum structural similarity with TP suspect candidates for this parent (i.e.
obtained from other algorithms of generateTPs). The calculation is configured by the
TPStructParams.
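To make the fitCompound term more concrete, the sketch below computes an overlap coefficient from toy numbers. It assumes the usual definition (size of the maximum common substructure divided by the size of the smaller molecule); the function name and the atom counts are illustrative, and the real values are obtained with fmcsR::fmcs.

```r
# Overlap coefficient: size of the maximum common substructure (MCS) relative
# to the smaller of the two molecules. Sizes are atom counts here (assumption).
overlapCoefficient <- function(mcsSize, sizeA, sizeB) {
  stopifnot(mcsSize <= min(sizeA, sizeB))
  mcsSize / min(sizeA, sizeB)
}

# Toy numbers: a parent with 20 heavy atoms, a candidate with 16, sharing 12
fitCompound <- overlapCoefficient(mcsSize = 12, sizeA = 20, sizeB = 16)
print(fitCompound)
```

Because the smaller molecule is used as the reference, a candidate that is fully contained in its parent (or vice versa) reaches a value of 1.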
To speed up the calculation process, several thresholds are applied to rule out unlikely candidates. These thresholds are defaulted to those derived in (Helmus et al. 2025). Nevertheless, calculations can take a very long time (multiple hours), especially when processing large numbers of candidates from e.g. PubChem.
Unlike most other TP generation algorithms, no additional suspect screening step is required.
generateTPsAnnComp returns an object of the class transformationProductsAnnComp. Please
see its documentation for e.g. filtering steps that can be performed on this object.
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
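The effect of prefCalcChemProps on existing versus calculated values can be sketched as below. The helper mergeChemProp and the toy formulae are hypothetical and only illustrate the precedence rule described in the steps above, not the actual implementation.

```r
# Toy values: formulae supplied by the user and formulae calculated from SMILES
existing   <- c("C8H11N4O2", NA,         "C6H6O")
calculated <- c("C8H10N4O2", "C7H8N4O2", NA)

# Hypothetical helper illustrating the precedence rule
mergeChemProp <- function(existing, calculated, prefCalcChemProps) {
  if (prefCalcChemProps) {
    # Calculated values win whenever they are available
    ifelse(!is.na(calculated), calculated, existing)
  } else {
    # Calculated values only fill in missing entries
    ifelse(is.na(existing), calculated, existing)
  }
}

preferCalc <- mergeChemProp(existing, calculated, prefCalcChemProps = TRUE)
fillOnly   <- mergeChemProp(existing, calculated, prefCalcChemProps = FALSE)
print(preferCalc)
print(fillOnly)
```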
Setting parallel=TRUE can speed up calculations considerably on multi-core systems, but will also add to
RAM usage. Furthermore, parallelization is only favorable for long calculations due to the overhead of setting up
multiple R processes. Note that the parallel workers must be on the same system, i.e. this will not work on
e.g. clusters.
It is possible that candidates are equal to their parent. To remove these the removeParentIsomers
filter can be used afterwards.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Helmus R, Bagdonaite I, de Voogt P, van Bommel MR, Schymanski EL, van Wezel AP, ter Laak TL (2025).
“Comprehensive Mass Spectrometry Workflows to Systematically Elucidate Transformation Processes of Organic Micropollutants: A Case Study on the Photodegradation of Four Pharmaceuticals.”
Environmental Science & Technology, 59(7), 3723–3736.
ISSN 1520-5851.
doi:10.1021/acs.est.4c09121.
http://dx.doi.org/10.1021/acs.est.4c09121.
Wang Y, Backman TWH, Horan K, Girke T (2013).
“fmcsR: mismatch tolerant maximum common substructure searching in R.”
Bioinformatics, 29(21), 2792–2794.
ISSN 1367-4811.
doi:10.1093/bioinformatics/btt475.
http://dx.doi.org/10.1093/bioinformatics/btt475.
Guha R (2007).
“Chemical Informatics Functionality in R.”
Journal of Statistical Software, 18(6).
Cao Y, Charisi A, Cheng L, Jiang T, Girke T (2008).
“ChemmineR: a compound mining framework for R.”
Bioinformatics, 24(15), 1733–1734.
ISSN 1367-4803.
doi:10.1093/bioinformatics/btn307.
http://dx.doi.org/10.1093/bioinformatics/btn307.
generateTPs for more details and other algorithms.
Transforms and prioritizes formula annotation candidates to obtain TPs.
generateTPsAnnForm( parents, formulas, minFitFormula = 0.94, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, parallel = TRUE )
parents |
The parents for which transformation products should be obtained. This can be
The parents need to have formula information available. |
formulas |
The |
minFitFormula |
Minimum |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
parallel |
If set to |
This function uses formula annotations to obtain transformation products. This function is called when calling generateTPs with
algorithm="ann_form".
The generateTPsAnnForm function implements the unknown TP screening from formula candidates approach
as described in (Helmus et al. 2025). This algorithm does not rely on any known or predicted TPs and is
therefore suitable for 'full non-target' workflows. All formula candidates are considered as potential TPs
and are ranked by the TP score:
With:
annSim: the annotation similarity
fitFormula: the common element count divided by the total element count for the formulae of the
parent/TP or TP/parent (maximum is taken)
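One possible reading of the fitFormula definition above is sketched in base R below. The helpers parseFormula and fitFormula are hypothetical, the parser only handles simple formulae without brackets, charges or isotopes, and interpreting the "common element count" as the summed per-element minimum is an assumption.

```r
# Hypothetical helper: element counts of a simple formula such as "C6H6O"
parseFormula <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1" # no number means a single atom
  tapply(as.integer(counts), elements, sum)
}

# Common element count relative to the parent or the TP, whichever is larger
fitFormula <- function(parent, TP) {
  p <- parseFormula(parent)
  tp <- parseFormula(TP)
  shared <- intersect(names(p), names(tp))
  common <- sum(pmin(p[shared], tp[shared]))
  max(common / sum(p), common / sum(tp))
}

fit <- fitFormula("C6H6O", "C5H4Cl2")
print(fit)
```

In this toy case the formulae share 9 atoms (C5 and H4) out of 13 and 11, so the value falls below the default minFitFormula of 0.94 and the candidate would be ruled out.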
To speed up the calculation process, a threshold for fitFormula is applied to rule out unlikely candidates.
The default was derived in (Helmus et al. 2025).
Unlike most other TP generation algorithms, no additional suspect screening step is required.
generateTPsAnnForm returns an object of the class transformationProductsAnnForm. Please
see its documentation for e.g. filtering steps that can be performed on this object.
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
Setting parallel=TRUE may speed up calculations, but is only favorable for long calculations due to the
overhead of setting up multiple R processes.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Helmus R, Bagdonaite I, de Voogt P, van Bommel MR, Schymanski EL, van Wezel AP, ter Laak TL (2025). “Comprehensive Mass Spectrometry Workflows to Systematically Elucidate Transformation Processes of Organic Micropollutants: A Case Study on the Photodegradation of Four Pharmaceuticals.” Environmental Science & Technology, 59(7), 3723–3736. ISSN 1520-5851. doi:10.1021/acs.est.4c09121. http://dx.doi.org/10.1021/acs.est.4c09121.
generateTPs for more details and other algorithms.
Uses BioTransformer to predict TPs.
generateTPsBioTransformer( parents, type = "env", generations = 2, maxExpGenerations = generations + 2, extraOpts = NULL, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, neutralizeTPs = TRUE, TPStructParams = getDefTPStructParams(), MP = FALSE )
parents |
The parents for which transformation products should be obtained. This can be
The parents need to have SMILES or InChI information available. |
type |
The type of prediction. Valid values are: |
generations |
The number of generations (steps) for the predictions. Sets the |
maxExpGenerations |
The maximum number of generations during hierarchy expansion, see below. |
extraOpts |
A |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
TPStructParams |
Parameters that influence the calculation of structural properties. See
|
MP |
If |
This function uses BioTransformer to obtain transformation products. This function is called when calling generateTPs with
algorithm="biotransformer".
In order to use this function the ‘.jar’ command line utility should be installed and specified in the
patRoon.path.BioTransformer option. The ‘.jar’ file can be obtained via
https://bitbucket.org/djoumbou/biotransformer/src/master. Alternatively, the patRoonExt package can be
installed to automatically install/configure the necessary files.
The TPs are stored in an object derived from the transformationProductsStructure class.
BioTransformer only reports the direct parent for a TP, not
the complete pathway. For instance, consider the following results:
parent –> TP1
parent –> TP2
TP1 –> TP2
TP2 –> TP3
In this case, TP3 may be formed either as:
parent –> TP1 –> TP2 –> TP3
parent –> TP2 –> TP3
For this reason, patRoon simply expands the hierarchy and assumes that all routes are possible. For instance,
          Parent
         /      \
      TP1        TP2
       |          |
      TP2        TP3
       |
      TP3
Note that this may result in pathways with more generations than defined by the generations argument. Thus,
the maxExpGenerations argument is used to avoid excessive expansions.
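The hierarchy expansion can be sketched with a small recursive function. This is an illustration only: expandRoutes and its arguments are hypothetical, and the way patRoon actually applies maxExpGenerations may differ from the simple depth limit used here.

```r
# Direct parent/TP relations as in the example above
edges <- data.frame(
  from = c("parent", "parent", "TP1", "TP2"),
  to   = c("TP1",    "TP2",    "TP2", "TP3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow every possible route, but stop expanding once a
# route contains maxGen generations
expandRoutes <- function(edges, node, maxGen, path = node) {
  children <- edges$to[edges$from == node]
  if (length(children) == 0 || length(path) - 1 >= maxGen)
    return(list(path))
  unlist(lapply(children, function(ch) {
    expandRoutes(edges, ch, maxGen, c(path, ch))
  }), recursive = FALSE)
}

routes <- expandRoutes(edges, "parent", maxGen = 4)
routeLabels <- vapply(routes, paste, character(1), collapse = " > ")
print(routeLabels)
```

Lowering maxGen truncates the longer route, which shows how a limit on generations keeps the expansion from growing excessively.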
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
generateTPsBioTransformer uses multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
When the parents argument is a compounds object, the
candidate library identifier is used in case the candidate has no defined compoundName.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Djoumbou-Feunang Y, Fiamoncini J, Gil-de-la-Fuente A, Greiner R, Manach C, Wishart DS (2019).
“BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification.”
Journal of Cheminformatics, 11(1).
doi:10.1186/s13321-018-0324-5.
Wicker J, Lorsbach T, Gutlein M, Schmid E, Latino D, Kramer S, Fenner K (2015).
“enviPath - The environmental contaminant biotransformation pathway resource.”
Nucleic Acids Research, 44(D1), D502–D508.
doi:10.1093/nar/gkv1229.
generateTPs for more details and other algorithms.
Uses Chemical Transformation Simulator (CTS) to predict TPs.
generateTPsCTS( parents, transLibrary, generations = 1, errorRetries = 3, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, neutralizeTPs = TRUE, TPStructParams = getDefTPStructParams(), parallel = TRUE )
parents |
The parents for which transformation products should be obtained. This can be
The parents need to have SMILES or InChI information available. |
transLibrary |
A |
generations |
An |
errorRetries |
The maximum number of connection retries. Sets the |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
TPStructParams |
Parameters that influence the calculation of structural properties. See
|
parallel |
If set to |
This function uses CTS to obtain transformation products. This function is called when calling generateTPs with
algorithm="cts".
This function uses the httr package to access the Web API of CTS for automatic TP prediction. Hence, an Internet connection is mandatory. Please take care to not 'abuse' the CTS servers, e.g. by running very large batch calculations in parallel, as this may result in rejected connections.
The TPs are stored in an object derived from the transformationProductsStructure class.
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
When the parents argument is a compounds object, the
candidate library identifier is used in case the candidate has no defined compoundName.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Wolfe K, Pope N, Parmar R, Galvin M, Stevens C, Weber E, Flaishans J, Purucker T (2016).
“Chemical transformation system: Cloud based cheminformatic services to support integrated environmental modeling.”
Proceedings of the 8th International Congress on Environmental Modelling and Software.
Tebes-Stevens C, Patel JM, Jones WJ, Weber EJ (2017).
“Prediction of Hydrolysis Products of Organic Chemicals under Environmental pH Conditions.”
Environmental Science & Technology, 51(9), 5008–5016.
doi:10.1021/acs.est.6b05412.
Yuan C, Tebes-Stevens C, Weber EJ (2020).
“Reaction Library to Predict Direct Photochemical Transformation Products of Environmental Organic Contaminants in Sunlit Aquatic Systems.”
Environmental Science & Technology, 54(12), 7271–7279.
doi:10.1021/acs.est.0c00484.
Yuan C, Tebes-Stevens C, Weber EJ (2021).
“Prioritizing Direct Photolysis Products Predicted by the Chemical Transformation Simulator: Relative Reasoning and Absolute Ranking.”
Environmental Science & Technology, 55(9), 5950–5958.
doi:10.1021/acs.est.0c08745. PMID: 33881833.
generateTPs for more details and other algorithms.
The website: https://qed.epa.gov/cts/ and the CTS User guide.
Automatically obtains transformation products from a library.
generateTPsLibrary(parents = NULL, TPLibrary = NULL, generations = 1, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, neutralizeTPs = FALSE, matchParentsBy = "InChIKey", matchGenerationsBy = "InChIKey", TPStructParams = getDefTPStructParams())
parents |
The parents for which transformation products should be obtained. This can be
The parents need to have SMILES or InChI information available. |
TPLibrary |
If |
generations |
An |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
neutralizeTPs |
If |
matchParentsBy |
A |
matchGenerationsBy |
Similar to |
TPStructParams |
Parameters that influence the calculation of structural properties. See
|
This function uses a library to obtain transformation products. This function is called when calling generateTPs with
algorithm="library".
By default, a library is used that is based on data from PubChem. However, it is also possible to use your own library.
The TPs are stored in an object derived from the transformationProductsStructure class.
The TPLibrary argument is used to specify a custom TP library. This should be a
data.frame where each row specifies a TP for a parent, with the following columns:
parent_name and TP_name: The name of the parent/TP.
parent_SMILES and TP_SMILES: The SMILES of the parent/TP structure.
retDir: The expected retention order direction. (optional)
For generateTPsLibrary: If not specified or forceCalcRetDir=TRUE from
TPStructParams, then the log P values below may be used to calculate
retention order directions.
parent_LogP and TP_LogP: The log P values for the parent/TP. (optional)
logPDiff: The difference between parent and TP log P values. Ignored if both
parent_LogP and TP_LogP are specified. (optional)
Other columns are allowed, and will be included in the final object. Multiple TPs for a single parent are specified
by repeating the value within parent_ columns.
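A custom library with the columns described above could be assembled as follows. This is a minimal sketch: the compounds are chosen purely for illustration, the retDir values are assumptions (here -1 is taken to mean that the TP elutes earlier than its parent, which should be checked against the generateTPs documentation), and the parents table assumes a suspect list with name and SMILES columns. The call to generateTPsLibrary is left commented out because it needs patRoon and OpenBabel.

```r
# Two illustrative TPs for one parent: the parent_ columns are simply repeated.
TPLib <- data.frame(
    parent_name   = c("anisole", "anisole"),
    parent_SMILES = c("COc1ccccc1", "COc1ccccc1"),
    TP_name       = c("phenol", "4-methoxyphenol"),
    TP_SMILES     = c("Oc1ccccc1", "COc1ccc(O)cc1"),
    retDir        = c(-1, -1),   # assumed coding, see lead-in
    stringsAsFactors = FALSE
)

# Check that the mandatory columns are present before using the library.
requiredCols <- c("parent_name", "parent_SMILES", "TP_name", "TP_SMILES")
stopifnot(all(requiredCols %in% names(TPLib)))

# Assumed minimal suspect list for the parent.
parents <- data.frame(name = "anisole", SMILES = "COc1ccccc1", stringsAsFactors = FALSE)

# TPs <- patRoon::generateTPsLibrary(parents = parents, TPLibrary = TPLib)
```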
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
When the parents argument is a compounds object, the
candidate library identifier is used in case the candidate has no defined compoundName.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
generateTPs for more details and other algorithms.
Automatically obtains transformation products from a library with formula data.
generateTPsLibraryFormula(parents = NULL, TPLibrary, generations = 1, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, matchParentsBy = "name", matchGenerationsBy = "name")
parents |
The parents for which transformation products should be obtained. This can be
The parents need to have formula information available. |
TPLibrary |
A |
generations |
An |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
matchParentsBy |
A |
matchGenerationsBy |
Similar to |
This function uses a library to obtain transformation products. This function is called when calling generateTPs with
algorithm="library_formula".
This function is similar to generateTPsLibrary; however, it only requires formula information
of the parent and TPs.
The TPs are stored in an object derived from the transformationProductsFormula class.
The TPLibrary argument is used to specify a custom TP library. This should be a
data.frame where each row specifies a TP for a parent, with the following columns:
parent_name and TP_name: The name of the parent/TP.
parent_formula and TP_formula: The formula of the parent/TP structure.
retDir: The expected retention order direction. (optional)
For generateTPsLibrary: If not specified or forceCalcRetDir=TRUE from
TPStructParams, then the log P values below may be used to calculate
retention order directions.
parent_LogP and TP_LogP: The log P values for the parent/TP. (optional)
logPDiff: The difference between parent and TP log P values. Ignored if both
parent_LogP and TP_LogP are specified. (optional)
Other columns are allowed, and will be included in the final object. Multiple TPs for a single parent are specified
by repeating the value within parent_ columns.
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
Unlike generateTPsLibrary, this function defaults the matchParentsBy and
matchGenerationsBy arguments to "name". While matching by formula is also possible, it is
likely that duplicate parent formulae (i.e. isomers) are present in parents and/or TPLibrary,
making matching by formula unsuitable. However, if you are sure that no duplicate formulae are present, it may be
better to set the matching method to "formula".
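Whether matching by formula is safe can be checked before calling the function. The sketch below uses an illustrative formula library (compound choices are assumptions) and picks the matching method depending on whether isomeric parents occur; the patRoon call is commented out because it requires the package.

```r
TPLib <- data.frame(
    parent_name    = c("anisole", "anisole", "benzyl alcohol"),
    parent_formula = c("C7H8O", "C7H8O", "C7H8O"),
    TP_name        = c("phenol", "4-methoxyphenol", "benzoic acid"),
    TP_formula     = c("C6H6O", "C7H8O2", "C7H6O2"),
    stringsAsFactors = FALSE
)

# Reduce to one row per distinct parent, then look for shared formulae.
libParents <- unique(TPLib[, c("parent_name", "parent_formula")])
hasIsomers <- anyDuplicated(libParents$parent_formula) > 0

# Fall back to name matching as soon as two parents share a formula.
matchBy <- if (hasIsomers) "name" else "formula"

# TPs <- patRoon::generateTPsLibraryFormula(parents, TPLibrary = TPLib,
#                                           matchParentsBy = matchBy)
```

Here anisole and benzyl alcohol share the formula C7H8O, so the check selects "name".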
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
generateTPs for more details and other algorithms.
generateTPsLibrary to generate TPs from a library that contains structural information.
genFormulaTPLibrary to automatically generate formula TP libraries.
Automatically calculate potential transformation products with metabolic logic.
generateTPsLogic(fGroups, minMass = 40, ...)
## S4 method for signature 'featureGroups'
generateTPsLogic(fGroups, minMass = 40, adduct = NULL, transformations = NULL)
## S4 method for signature 'featureGroupsSet'
generateTPsLogic(fGroups, minMass = 40, transformations = NULL)
fGroups |
A |
minMass |
A |
... |
Further arguments specified to the methods. |
adduct |
An (sets workflow) The |
transformations |
A |
This function uses metabolic logic to obtain transformation products. This function is called when calling generateTPs with
algorithm="logic".
With this algorithm, TPs are predicted from common (environmental) chemical reactions, such as hydroxylation and demethylation. Each TP is generated by applying the mass difference of a reaction to the mass of a parent feature. While this yields only little information on the chemical properties of the TP, an advantage of this method is that it does not rely on structural information of the parent, which may be unknown in a full non-target analysis.
A transformationProducts (derived) object containing all generated TPs.
The transformations argument specifies custom rules to calculate
transformation products. This should be a data.frame with the following columns:
transformation: The name of the chemical transformation.
add: The elements that are added by this reaction (e.g. "O").
sub: The elements that are removed by this reaction (e.g. "H2O").
retDir: The expected retention order direction.
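As an illustration, a custom transformations table and the mass differences it implies can be sketched in base R. The retDir values, the parent mass and the helper formulaMass() are assumptions made for this example (the helper only handles simple formulae without repeated elements or brackets); the element masses are standard monoisotopic values. How patRoon applies minMass internally is not shown here, the last lines merely assume that TPs lighter than minMass are dropped.

```r
transformations <- data.frame(
    transformation = c("hydroxylation", "demethylation", "dehydration"),
    add    = c("O", "", ""),
    sub    = c("", "CH2", "H2O"),
    retDir = c(-1, -1, 1),   # assumed values for illustration
    stringsAsFactors = FALSE
)

monoMass <- c(C = 12, H = 1.00782503, O = 15.99491462)

# Hypothetical helper: monoisotopic mass of a simple formula such as "CH2".
formulaMass <- function(formula) {
    if (!nzchar(formula)) return(0)
    parts <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
    elements <- sub("[0-9]+$", "", parts)
    counts <- sub("^[A-Za-z]+", "", parts)
    counts[!nzchar(counts)] <- "1"
    sum(monoMass[elements] * as.numeric(counts))
}

# Mass shift of each reaction: mass of added minus mass of removed elements.
transformations$massDiff <- mapply(
    function(a, s) formulaMass(a) - formulaMass(s),
    transformations$add, transformations$sub, USE.NAMES = FALSE
)

parentMass <- 108.05751   # illustrative neutral mass of a parent feature
minMass <- 40
TPMasses <- parentMass + transformations$massDiff
TPMasses <- TPMasses[TPMasses >= minMass]
```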
The algorithms using transformation reactions are directly based on the work done by Schollee et al. (see references).
Schollee JE, Schymanski EL, Avak SE, Loos M, Hollender J (2015). “Prioritizing Unknown Transformation Products from Biologically-Treated Wastewater Using High-Resolution Mass Spectrometry, Multivariate Statistics, and Metabolic Logic.” Analytical Chemistry, 87(24), 12121–12129. doi:10.1021/acs.analchem.5b02905.
generateTPs for more details and other algorithms.
Various (S4) generic functions providing a common interface for common tasks such as plotting and filtering data. The actual functionality and function arguments are often specific to the implemented methods. For this reason, please refer to the linked method documentation for each generic.
adducts(obj, ...)
adducts(obj, ...) <- value
algorithm(obj)
analysisInfo(obj, df = FALSE)
analysisInfo(obj) <- value
analyses(obj)
annotatedPeakList(obj, ...)
annotations(obj, ...)
assignMobilities(obj, ...)
calculatePeakQualities(obj, weights = NULL, flatnessFactor = 0.05, featureQualities = NULL, featureGroupQualities = NULL, ...)
clusterProperties(obj)
clusters(obj)
consensus(obj, ...)
convertToMFDB(TPs, out, ...)
convertToSuspects(obj, ...)
cutClusters(obj)
defaultExclNormScores(obj)
export(obj, type, out, ...)
featureTable(obj, ...)
filter(obj, ...)
fromIMS(obj)
getBPCs(obj, ...)
getFeatures(obj)
getFeatureQualityNames(obj, ...)
getMCS(obj, ...)
getTICs(obj, ...)
groupNames(obj)
hasIMS(obj)
plotBPCs(obj, ...)
plotChord(obj, addSelfLinks = FALSE, addRetMzPlots = TRUE, ...)
plotChroms(obj, ...)
plotChroms3D(obj, ...)
plotGraph(obj, ...)
plotInt(obj, ...)
plotScores(obj, ...)
plotSilhouettes(obj, kSeq, ...)
plotSpectrum(obj, ...)
plotStructure(obj, ...)
plotTICs(obj, ...)
plotVenn(obj, ...)
plotUpSet(obj, ...)
predictRespFactors(obj, ...)
predictTox(obj, ...)
delete(obj, ...)
plotVolcano(obj, ...)
replicates(obj)
setObjects(obj)
sets(obj)
treeCut(obj, k = NULL, h = NULL, ...)
treeCutDynamic(obj, maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 1, ...)
unset(obj, set)
obj |
The object the generic should be applied to. |
... |
Any further method specific arguments. See method documentation for details. |
value |
The replacement value. |
df |
If |
weights, flatnessFactor, featureQualities, featureGroupQualities
|
See method documentation. |
TPs |
The |
out |
Output file. |
type |
The export type. |
addSelfLinks |
If |
addRetMzPlots |
Set to |
kSeq |
An integer vector containing the sequence that should be used for average silhouette width calculation. |
k, h
|
Desired numbers of clusters. See |
maxTreeHeight, deepSplit, minModuleSize
|
Arguments used by
|
set |
The name of the set. |
adducts returns assigned adducts of the object.
Methods are defined for: featureGroups; featureGroupsSet.
adducts<- sets adducts of the object.
Methods are defined for: featureGroups; featureGroupsSet.
algorithm returns the algorithm that was used to generate the object.
Methods are defined for: optimizationResult; workflowStep.
analysisInfo returns the analysis information of an object.
Methods are defined for: featureGroups; features; MSPeakListsSet.
analysisInfo<- modifies the analysis information of an object.
Methods are defined for: featureGroups; featureGroupsXCMS; features; featuresXCMS.
analyses returns a character vector with the analyses for which data is present in this object.
Methods are defined for: featureGroups; features; formulas; MSPeakLists.
annotatedPeakList returns an annotated MS peak list.
Methods are defined for: compounds; compoundsSet; formulas; formulasSet.
annotations returns annotations.
Methods are defined for: featureAnnotations; featureGroups; formulas.
assignMobilities assigns ion mobility and/or CCS values to workflow data.
Methods are defined for: compounds; compoundsSet; featureGroups; featureGroupsScreening; featureGroupsScreeningSet; featureGroupsSet.
calculatePeakQualities calculates chromatographic peak qualities and scores.
Methods are defined for: featureGroups; features.
clusterProperties Obtain a list with properties of the generated cluster(s).
Methods are defined for: componentsClust; compoundsCluster.
clusters Obtain clustering object(s).
Methods are defined for: componentsClust; compoundsCluster.
consensus combines and merges data from various algorithms to generate a consensus.
Methods are defined for: components; componentsSet; compounds; compoundsSet; featureGroupsComparison; featureGroupsComparisonSet; formulas; formulasSet; transformationProductsStructure.
convertToMFDB Exports the object to a local database that can be used with MetFrag.
Methods are defined for: .
convertToSuspects Converts an object to a suspect list.
Methods are defined for: MSLibrary; transformationProducts.
cutClusters Returns assigned cluster indices of a cut cluster.
Methods are defined for: componentsClust; compoundsCluster.
defaultExclNormScores Returns default scorings that are excluded from normalization.
export exports workflow data to a given format.
Methods are defined for: featureGroups; featureGroupsSet; MSLibrary.
featureTable returns feature information.
Methods are defined for: featureGroups; featureGroupsSet; features.
filter provides various functionality to do post-filtering of data.
Methods are defined for: components; componentsSet; componentsTPs; compounds; compoundsSet; featureAnnotations; featureGroups; featureGroupsScreening; featureGroupsScreeningSet; featureGroupsSet; features; featuresSet; formulasSet; MSLibrary; MSPeakLists; MSPeakListsSet; transformationProducts; transformationProductsAnnComp; transformationProductsAnnForm; transformationProductsStructure.
fromIMS returns TRUE if the object was directly created from IMS data.
Methods are defined for: featureGroups; features.
getBPCs gets base peak chromatogram(s).
Methods are defined for: featureGroups; features.
getFeatures returns the object's features object.
Methods are defined for: featureGroups.
getFeatureQualityNames returns the object's feature quality names.
Methods are defined for: featureGroups; features.
getMCS Calculates the maximum common substructure.
Methods are defined for: compounds; compoundsCluster.
getTICs gets total ion chromatogram(s).
Methods are defined for: featureGroups; features.
groupNames returns a character vector with the names of the feature groups for which data is present in this object.
Methods are defined for: components; compoundsCluster; featureAnnotations; featureGroups; MSPeakLists.
hasIMS returns TRUE if the object has ion mobility values.
Methods are defined for: featureGroups; featureGroupsComparison; features.
plotBPCs plots base peak chromatogram(s).
Methods are defined for: featureGroups; features.
plotChord plots a Chord diagram to assess overlapping data.
Methods are defined for: featureGroups; featureGroupsComparison.
plotChroms plots extracted ion chromatogram(s).
Methods are defined for: components; featureGroups.
plotChroms3D plots a three dimensional chromatogram.
Methods are defined for: featureGroups; featureGroupsSet.
plotGraph Plots an interactive network graph.
Methods are defined for: componentsNT; componentsNTSet; componentsTPs; featureGroups; featureGroupsSet; transformationProductsFormula; transformationProductsStructure.
plotInt plots the intensity of all contained features.
Methods are defined for: componentsIntClust; featureGroups.
plotScores plots candidate scorings.
plotSilhouettes plots silhouette widths to evaluate the desired cluster size.
Methods are defined for: componentsClust; compoundsCluster.
plotSpectrum plots an (annotated) spectrum.
Methods are defined for: components; compounds; compoundsSet; formulas; formulasSet; MSPeakLists; MSPeakListsSet.
plotStructure plots a chemical structure.
Methods are defined for: compounds; compoundsCluster.
plotTICs plots total ion chromatogram(s).
Methods are defined for: featureGroups; features.
plotVenn plots a Venn diagram to assess unique and overlapping data.
Methods are defined for: featureAnnotations; featureGroups; featureGroupsComparison; transformationProductsStructure.
plotUpSet plots an UpSet diagram to assess unique and overlapping data.
Methods are defined for: featureAnnotations; featureGroups; featureGroupsComparison; transformationProductsStructure.
predictRespFactors Prediction of response factors.
Methods are defined for: compounds; compoundsSet; compoundsSIRIUS; featureGroupsScreening; featureGroupsScreeningSet; formulasSet; formulasSIRIUS.
predictTox Prediction of toxicity values.
Methods are defined for: compounds; compoundsSet; compoundsSIRIUS; featureGroupsScreening; featureGroupsScreeningSet; formulasSet; formulasSIRIUS.
delete Deletes results.
Methods are defined for: components; componentsClust; componentsSet; compoundsSet; compoundsSIRIUS; featureAnnotations; featureGroups; featureGroupsKPIC2; featureGroupsScreening; featureGroupsScreeningSet; featureGroupsSet; featureGroupsXCMS; featureGroupsXCMS3; features; featuresKPIC2; featuresPiek; featuresXCMS; featuresXCMS3; formulas; formulasSet; formulasSIRIUS; MSLibrary; MSPeakLists; MSPeakListsSet; transformationProducts.
plotVolcano plots a volcano plot.
Methods are defined for: featureGroups.
replicates returns a character vector with the replicates for which data is present in this object.
Methods are defined for: featureGroups; features.
setObjects returns the set objects of this object. See the documentation of workflowStepSet.
Methods are defined for: workflowStepSet.
sets returns the names of the sets inside this object. See the documentation for sets workflows.
Methods are defined for: featureGroupsSet; featuresSet; workflowStepSet.
treeCut Manually cut a cluster.
Methods are defined for: componentsClust; compoundsCluster.
treeCutDynamic Automatically cut a cluster.
Methods are defined for: componentsClust; compoundsCluster.
unset Converts this object to a regular non-set object. See the documentation for sets workflows.
Methods are defined for: componentsNTSet; componentsSet; compoundsConsensusSet; compoundsSet; featureGroupsScreeningSet; featureGroupsSet; featuresSet; formulasConsensusSet; formulasSet; MSPeakListsSet.
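The way a single generic dispatches to class-specific methods can be illustrated with a toy S4 example. Everything in this sketch (the generic groupNamesDemo, the class demoGroups and the group names) is hypothetical and only mirrors the mechanism; it does not use or reproduce patRoon classes.

```r
library(methods)

# A generic defines the common interface ...
setGeneric("groupNamesDemo", function(obj) standardGeneric("groupNamesDemo"))

# ... and each class supplies its own method.
setClass("demoGroups", representation(groups = "character"))
setMethod("groupNamesDemo", "demoGroups", function(obj) obj@groups)

x <- new("demoGroups", groups = c("M120_R268", "M137_R312"))
groupNamesDemo(x)

# Check for which classes a method is defined.
existsMethod("groupNamesDemo", "demoGroups")
```

For the real generics, the classes listed under "Methods are defined for" indicate which method documentation to consult.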
Below are methods that are defined for existing
generics (e.g. defined in base). Please see method specific
documentation for more details.
[ Subsets data within an object.
Methods are defined for: components,ANY,ANY,missing; componentsSet,ANY,ANY,missing; compoundsCluster,ANY,missing,missing; compoundsSet,ANY,missing,missing; featureAnnotations,ANY,missing,missing; featureGroups,ANY,ANY,missing; featureGroupsComparison,ANY,missing,missing; featureGroupsScreening,ANY,ANY,missing; featureGroupsScreeningSet,ANY,ANY,missing; featureGroupsSet,ANY,ANY,missing; features,ANY,missing,missing; featuresSet,ANY,missing,missing; formulasSet,ANY,missing,missing; MSLibrary,ANY,missing,missing; MSPeakLists,ANY,ANY,missing; MSPeakListsSet,ANY,ANY,missing; transformationProducts,ANY,missing,missing.
[[ Extract data from an object.
Methods are defined for: components,ANY,ANY; featureAnnotations,ANY,missing; featureGroups,ANY,ANY; featureGroupsComparison,ANY,missing; features,ANY,missing; formulas,ANY,ANY; MSLibrary,ANY,missing; MSPeakLists,ANY,ANY; transformationProducts,ANY,missing.
$ Extract data from an object.
Methods are defined for: components; featureAnnotations; featureGroups; featureGroupsComparison; features; MSLibrary; MSPeakLists; transformationProducts.
as.data.table Converts an object to a table (data.table).
Methods are defined for: components; componentsTPs; featureAnnotations; featureGroups; featureGroupsScreening; featureGroupsScreeningSet; features; featuresSet; formulas; MSLibrary; MSPeakLists; MSPeakListsSet; transformationProducts; workflowStep.
as.data.frame Converts an object to a table (data.frame).
Methods are defined for: workflowStep.
length Returns the length of an object.
Methods are defined for: components; compoundsCluster; featureAnnotations; featureGroups; featureGroupsComparison; features; MSLibrary; MSPeakLists; optimizationResult; transformationProducts.
lengths Returns the lengths of elements within this object.
Methods are defined for: compoundsCluster; optimizationResult.
names Return names for this object.
Methods are defined for: components; featureGroups; featureGroupsComparison; MSLibrary; transformationProducts.
plot Generates a plot for an object.
Methods are defined for: componentsClust,missing; compoundsCluster,missing; featureGroups,missing; featureGroupsComparison,missing; optimizationResult,missing.
show Prints information about this object.
Methods are defined for: adduct; C++Object; components; componentsFeatures; componentsSet; compounds; compoundsCluster; compoundsSet; featureGroups; featureGroupsScreening; featureGroupsScreeningSet; featureGroupsSet; features; featuresSet; formulas; formulasSet; MSLibrary; MSPeakLists; MSPeakListsSet; optimizationResult; transformationProducts; workflowStep; workflowStepSet.
Functionality to automatically generate a TP library with formula data from a set of transformation rules, which can
be used with generateTPsLibraryFormula. TP calculation will be skipped if the transformation involves
subtraction of elements not present in the parent.
genFormulaTPLibrary(parents, transformations = NULL, minMass = 40, generations = 1, skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE)
parents |
The parents to which the given transformation rules should be applied to generate the TP library. Should
be either a suspect list (see suspect screening for more information) or the resulting
output of |
transformations |
A |
minMass |
The minimum mass for a TP to be kept. |
generations |
An |
skipInvalid |
Set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
A data.table that is suitable for the TPLibrary argument to
generateTPsLibraryFormula.
The transformations argument specifies custom rules to calculate
transformation products. This should be a data.frame with the following columns:
transformation: The name of the chemical transformation.
add: The elements that are added by this reaction (e.g. "O").
sub: The elements that are removed by this reaction (e.g. "H2O").
retDir: The expected retention order direction.
The algorithms using transformation reactions are directly based on the work done by Schollee et al. (see references).
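The rule that a TP is skipped when the transformation subtracts elements that the parent does not contain can be sketched as follows. The helpers parseFormula() and canSubtract() are hypothetical and are not patRoon functions; the sketch assumes simple formulae in which each element occurs once, and it additionally checks element counts, which may or may not match the package's exact behaviour.

```r
# Hypothetical helper: element counts of a simple formula, e.g. "C7H8O".
parseFormula <- function(formula) {
    if (!nzchar(formula)) return(integer())
    parts <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
    counts <- sub("^[A-Za-z]+", "", parts)
    counts[!nzchar(counts)] <- "1"
    counts <- as.integer(counts)
    names(counts) <- sub("[0-9]+$", "", parts)
    counts
}

# TRUE if all elements in 'sub' are available in sufficient number in 'parent'.
canSubtract <- function(parent, sub) {
    p <- parseFormula(parent)
    s <- parseFormula(sub)
    all(names(s) %in% names(p)) && all(p[names(s)] >= s)
}

subs <- c(hydroxylation = "", demethylation = "CH2", dehydration = "H2O")

# Benzene contains no oxygen, so the dehydration rule would be skipped.
applicable <- vapply(subs, canSubtract, logical(1), parent = "C6H6")
```

Note that the check is purely formula based: it says nothing about whether a reaction is chemically plausible for a given structure.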
Chemical properties such as SMILES, InChIKey and formulae in the parent suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
Schollee JE, Schymanski EL, Avak SE, Loos M, Hollender J (2015). “Prioritizing Unknown Transformation Products from Biologically-Treated Wastewater Using High-Resolution Mass Spectrometry, Multivariate Statistics, and Metabolic Logic.” Analytical Chemistry, 87(24), 12121–12129. doi:10.1021/acs.analchem.5b02905.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
generateTPsLibraryFormula and generateTPsLogic
Detects background MS/MS peaks by gathering frequently occurring peaks in MS/MS spectra from blanks.
getBGMSMSPeaks(anaInfo, replicates = NULL, MSLevel = 2, retentionRange = NULL, mobilityRange = NULL, minBPIntensity = 5000, avgSpectraParams = getDefAvgPListParams(relMinAbundance = 0.1, topMost = 25), avgAnalysesParams = getDefAvgPListParams(relMinAbundance = 0.8, topMost = 25))
anaInfo |
The analysis info object with the analyses to be used for background peak detection. |
replicates |
A |
MSLevel |
The MS level of the spectra to be used for background peak detection. This should be ‘1’ or
‘2’. This function is only tested with |
retentionRange, mobilityRange
|
A two-sized |
minBPIntensity |
The minimum basepeak intensity of a spectrum to be considered for background peak detection. This is primarily intended to optimize the detection procedure. |
avgSpectraParams, avgAnalysesParams
|
A |
This function iterates through all MS/MS spectra of the given analyses and collects the most frequently occurring
peaks. It first averages all spectra within the same analysis, and retains those peaks above an abundance threshold
(set by avgSpectraParams). The analysis-averaged spectra are then averaged in turn, and again only peaks with a
minimum abundance are kept (set by avgAnalysesParams).
The frequent occurrence of a peak throughout MS/MS spectra, including DDA spectra with different isolation m/z values, is often a sign of contamination from the analytical system (e.g. from the mobile phase, MS quadrupoles etc.). Hence, the analyses used for blank subtraction are typically just measurements of e.g. ultrapure water or another solvent, and do not need to be treated by any extraction procedure.
The output of this function can be passed directly to the filter method for
MSPeakLists to remove these background peaks, which may improve subsequent feature annotation.
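The two-stage idea (frequent peaks per analysis, then frequent peaks across analyses) can be mimicked with a toy base R sketch. This is not the patRoon implementation: the helper frequentPeaks(), the rounding-based m/z matching, the interpretation of relMinAbundance as the fraction of spectra containing a peak, and all peak values are assumptions made for illustration.

```r
# Hypothetical helper: keep m/z values that occur in at least a given fraction
# of the spectra. Each spectrum is a numeric vector of m/z values.
frequentPeaks <- function(spectra, relMinAbundance, digits = 2) {
    mzs <- unlist(lapply(spectra, function(mz) unique(round(mz, digits))))
    freq <- table(mzs) / length(spectra)
    as.numeric(names(freq)[as.vector(freq) >= relMinAbundance])
}

# Toy MS/MS spectra of two blank analyses.
blanks <- list(
    blankA = list(c(91.05, 149.02), c(149.02, 200.10), 149.02),
    blankB = list(c(149.02, 301.14), c(149.02, 301.14))
)

# Stage 1: frequent peaks within each analysis.
perAnalysis <- lapply(blanks, frequentPeaks, relMinAbundance = 0.5)

# Stage 2: peaks that are frequent across the analyses.
background <- frequentPeaks(perAnalysis, relMinAbundance = 0.8)

# With patRoon, the real table would be obtained and used roughly like this
# (the replicate name "blank" and the object mslists are placeholders):
# bg <- getBGMSMSPeaks(anaInfo, replicates = "blank")
# mslists <- filter(mslists, removeMZs = bg)
```

In the toy data only m/z 149.02 occurs often enough in both blanks to be reported as background.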
A data.table with the detected background peaks and abundance statistics. The table can
directly be passed to the removeMZs argument of the filter method for
MSPeakLists.
The raw data interface of patRoon is used by getBGMSMSPeaks to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
The use of profile m/z HRMS data (not IMS-HRMS) is currently not supported.
Helmus R, Bagdonaite I, de Voogt P, van Bommel MR, Schymanski EL, van Wezel AP, ter Laak TL (2025). “Comprehensive Mass Spectrometry Workflows to Systematically Elucidate Transformation Processes of Organic Micropollutants: A Case Study on the Photodegradation of Four Pharmaceuticals.” Environmental Science & Technology, 59(7), 3723–3736. ISSN 1520-5851. doi:10.1021/acs.est.4c09121. http://dx.doi.org/10.1021/acs.est.4c09121.
Configuration and description of parameters used for CCS calculation.
getCCSParams(method, ..., calibrant = NULL)
method, calibrant
|
Sets the CCS calculation method and the calibrant data (only required if
|
... |
optional named arguments that override defaults. |
The following parameters exist to configure the CCS calculation:
method The CCS calculation method. Should be "bruker", "mason-schamp_k",
"mason-schamp_1/k" or "agilent". See details below.
defaultCharge The default charge of the ions. This is used when no charge information is available.
temperature,massGas The temperature (Kelvin) and exact mass of the drift gas. See calculation
details below.
MasonSchampConstant The Mason-Schamp constant. See calculation details below.
calibrant If method="agilent": the calibrant data to be used for CCS calculation. This should
either be
A path to an Agilent ‘.d’ file.
A path to an ‘OverrideImsCal.xml’ file (found in ‘sample.d/AcqData’).
A named list with the elements massGas, TFix and beta.
The CCS calculation depends on the method parameter:
bruker: uses the Bruker TDF-SDK for calculations. See msdata for configuration
options. Only applicable to TIMS data.
mason-schamp_k: uses the Mason-Schamp equation, with the following terms:
C the Mason-Schamp constant, which can be changed by setting the MasonSchampConstant parameter.
See (George et al. 2024) for details.
u the reduced mass of the drift gas and the ion, i.e. (m_gas * m_ion) / (m_gas + m_ion).
The mass of the drift gas is defined by the massGas parameter.
T the temperature (Kelvin) as defined by the temperature parameter.
mason-schamp_1/k: as mason-schamp_k but assuming an inverse mobility (1/k).
This is meant for TIMS data. Compared to method="bruker", this doesn't rely on the TDF-SDK but may
produce results with very minor differences (George et al. 2024).
agilent: uses an equation based on Agilent calibration data, with the TFix and beta values taken from the
calibration data. The massGas parameter sets the mass of the drift gas.
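As a worked illustration of the Mason-Schamp based methods, the sketch below uses the commonly applied simplified form CCS = C * z / (sqrt(u * T) * K0), with u the reduced mass. This is a hedged sketch and not the patRoon code: the function names are hypothetical, and the default values for the constant, temperature and drift gas mass (nitrogen) are assumptions chosen for illustration, not confirmed package defaults.

```r
# Reduced mass of the ion and the drift gas
reducedMass <- function(mIon, massGas) (mIon * massGas) / (mIon + massGas)

# Hypothetical helper for the "mason-schamp_k" case: K0 is the mobility.
# All default values below are illustrative assumptions.
ccsMasonSchamp <- function(K0, mIon, charge = 1, temperature = 305,
                           massGas = 28.0061,
                           MasonSchampConstant = 18509.8632163405) {
    u <- reducedMass(mIon, massGas)
    MasonSchampConstant * charge / (sqrt(u * temperature) * K0)
}

# Hypothetical helper for the "mason-schamp_1/k" case: the input is the
# inverse mobility, so it is inverted before applying the same equation.
ccsMasonSchampInv <- function(invK0, ...) ccsMasonSchamp(1 / invK0, ...)

ccsSingle <- ccsMasonSchamp(K0 = 1.0, mIon = 500)
ccsDouble <- ccsMasonSchamp(K0 = 1.0, mIon = 500, charge = 2)
ccsFromInv <- ccsMasonSchampInv(1.25, mIon = 500)
```

In this form the result scales linearly with the charge and inversely with the mobility, which is why the 1/k variant only needs to invert its input.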
The getCCSParams function generates such a parameter list with defaults.
The calculation formulas were derived from (Haler et al. 2017), (George et al. 2024)
and the implementation used by MS-DIAL (MobilityToCrossSection method from the IonMobilityUtility class;
Tsugawa et al. 2020).
George AC, Schmitz I, Rouviere F, Alves S, Colsch B, Heinisch S, Afonso C, Fenaille F, Loutelier-Bourhis C (2024).
“Interplatform comparison between three ion mobility techniques for human plasma lipid collision cross sections.”
Analytica Chimica Acta, 1304, 342535.
ISSN 0003-2670.
doi:10.1016/j.aca.2024.342535.
http://dx.doi.org/10.1016/j.aca.2024.342535.
Haler JRN, Kune C, Massonnet P, Comby-Zerbino C, Jordens J, Honing M, Mengerink Y, Far J, De Pauw E (2017).
“Comprehensive Ion Mobility Calibration: Poly(ethylene oxide) Polymer Calibrants and General Strategies.”
Analytical Chemistry, 89(22), 12076–12086.
ISSN 1520-6882.
doi:10.1021/acs.analchem.7b02564.
http://dx.doi.org/10.1021/acs.analchem.7b02564.
Tsugawa H, Ikeda K, Takahashi M, Satoh A, Mori Y, Uchino H, Okahashi N, Yamada Y, Tada I, Bonini P, Higashi Y, Okazaki Y, Zhou Z, Zhu Z, Koelmel J, Cajka T, Fiehn O, Saito K, Arita M, Arita M (2020).
“A lipidome atlas in MS-DIAL 4.”
Nature Biotechnology, 38(10), 1159–1163.
ISSN 1546-1696.
doi:10.1038/s41587-020-0531-2.
http://dx.doi.org/10.1038/s41587-020-0531-2.
Create parameter lists for averaging MS peak list data.
getDefAvgPListParams(..., IMS = getLimIMS())
... |
Optional named arguments that override defaults. |
IMS |
A |
The parameter set used for averaging peak lists is specified by the avgFeatParams and avgFGroupParams
arguments to generateMSPeakLists and its related algorithm specific functions. The parameters are
specified as a named list with the following values:
method,clusterMzWindow The cluster method and window (see clustering
parameters) used to average mass spectra. clusterMzWindow defaults to defaultLim("mz", "medium") (see
limits).
topMost Only retain this maximum number of MS peaks when generating averaged spectra. Lowering this
number may exclude more irrelevant (noisy) MS peaks and decrease processing time, whereas higher values may avoid
excluding less intense MS peaks that may still be of interest.
minIntensityPre MS peaks with intensities below this value will be removed (applied prior to selection
by topMost and averaging).
minIntensityPost MS peaks with intensities below this value will be removed (after averaging).
minIntensityIMS MS peaks in spectra of raw IMS frames with intensities below this value will be removed
(applied prior to any other treatment steps).
absMinAbundance,relMinAbundance Minimum absolute/relative abundance of an MS peak across the
spectra that are averaged. If absMinAbundance exceeds the number of spectra then the threshold is
automatically lowered to the number of spectra.
minRelCumIntensity Minimum relative cumulative intensity of an MS peak in the averaged spectrum.
smoothWindowIMS,halfWindowIMS,maxGapIMS Parameters used for centroiding m/z peaks
from IMS-HRMS data. See Centroiding IMS data for more details.
withPrecursorMS For MS data only: ignore any spectra that do not contain the precursor peak.
For IMS data this excludes MS spectra within an IMS frame that do not contain the precursor peak, typically due to mobility separation. Hence, setting this option performs some crude cleanup of MS spectra, even for features for which no mobilities were assigned (e.g. non-IMS workflows).
pruneMissingPrecursorMS For MS data only: if TRUE then peak lists without a precursor peak are
removed. Note that even when this is set to FALSE, functionality that relies on MS (not MS/MS) peak lists
(e.g. formulae calculation) will still skip calculation if a precursor is not found.
retainPrecursorMSMS For MS/MS data only: if TRUE then always retain the precursor mass peak even
if it is not amongst the topMost peaks. Note that MS precursor mass peaks are always kept. Furthermore, note that
precursor peaks in both MS and MS/MS data may still be removed by intensity thresholds (this is unlike the
filter method function).
The getDefAvgPListParams function can be used to generate a default parameter list. The defaults are (with
IMS="bruker"):
list( method = "distance_mean", clusterMzWindow = 0.005, topMost = 50, minIntensityPre = 500, minIntensityPost = 500, minIntensityIMS = 25, absMinAbundance = 0, relMinAbundance = 0, minRelCumIntensity = 0, smoothWindowIMS = 0, halfWindowIMS = 2, maxGapIMS = 0.005, withPrecursorMS = TRUE, pruneMissingPrecursorMS = TRUE, retainPrecursorMSMS = TRUE )
getDefAvgPListParams returns a list with the peak list averaging parameters.
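The idea of a default parameter list with named overrides can be sketched in base R with utils::modifyList. This only mimics the documented behaviour and is not how patRoon implements it: the helper overrideDefaults is hypothetical, and the default values are copied from the list shown above (IMS="bruker").

```r
# Defaults copied from the documented list (IMS = "bruker")
defaults <- list(
    method = "distance_mean", clusterMzWindow = 0.005, topMost = 50,
    minIntensityPre = 500, minIntensityPost = 500, minIntensityIMS = 25,
    absMinAbundance = 0, relMinAbundance = 0, minRelCumIntensity = 0,
    smoothWindowIMS = 0, halfWindowIMS = 2, maxGapIMS = 0.005,
    withPrecursorMS = TRUE, pruneMissingPrecursorMS = TRUE,
    retainPrecursorMSMS = TRUE
)

# Hypothetical helper: replace defaults by any named arguments given in '...'
overrideDefaults <- function(defaults, ...) utils::modifyList(defaults, list(...))

# Same overrides as used for avgSpectraParams in getBGMSMSPeaks
params <- overrideDefaults(defaults, relMinAbundance = 0.1, topMost = 25)
str(params[c("topMost", "relMinAbundance", "minIntensityPre")])
```

Parameters that are not overridden, such as minIntensityPre, keep their default values.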
With IMS-HRMS data the m/z peaks are often not or only partially centroided. The following steps are performed to centroid the data:
Sum up mass spectra within an IMS frame. If the feature has mobility data, only spectra within its mobility boundaries are considered.
Use point-distance clustering (see clustering parameters) with a window defined by
maxGapIMS to find related mass signals. This is primarily meant for non-continuous data, e.g. due to
intensity thresholding. The maxGapIMS parameter should be set to a value that represents the maximum
expected distance between two m/z datapoints. For some instruments, such as Agilent IMS-QTOF, this value may
be higher than expected. For that reason, if IMS="agilent" then the default is set to 0.01.
Smooth the intensity data using a centered moving average with window size smoothWindowIMS (set to
zero to disable smoothing).
Find local maxima within a sliding window with +/- halfWindowIMS points and eliminate non-centroids.
This algorithm is based on the C_localMaxima function from MALDIquant.
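The centroiding steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions and not the patRoon or MALDIquant code: the functions movingAverage and centroidSketch are hypothetical, the input is assumed to be a single spectrum that was already summed over the IMS frame (step 1), and ties between equal intensities are resolved by taking the first maximum.

```r
# Centred moving average; the window is clipped at the edges of the data
movingAverage <- function(x, window) {
    if (window <= 1) return(x)
    half <- window %/% 2
    vapply(seq_along(x), function(i) {
        mean(x[max(1, i - half):min(length(x), i + half)])
    }, numeric(1))
}

# Hypothetical sketch of steps 2-4 for one summed spectrum
centroidSketch <- function(mz, intensity, maxGapIMS = 0.005,
                           smoothWindowIMS = 0, halfWindowIMS = 2) {
    ord <- order(mz)
    mz <- mz[ord]
    intensity <- intensity[ord]
    # Step 2: a new cluster starts when the m/z gap exceeds maxGapIMS
    cluster <- cumsum(c(TRUE, diff(mz) > maxGapIMS))
    pieces <- lapply(split(seq_along(mz), cluster), function(idx) {
        # Step 3: optional smoothing (disabled when the window is zero)
        ints <- movingAverage(intensity[idx], smoothWindowIMS)
        n <- length(ints)
        # Step 4: keep points that are the maximum of their +/- window
        isMax <- vapply(seq_len(n), function(i) {
            lo <- max(1, i - halfWindowIMS)
            hi <- min(n, i + halfWindowIMS)
            which.max(ints[lo:hi]) == i - lo + 1
        }, logical(1))
        data.frame(mz = mz[idx][isMax], intensity = ints[isMax])
    })
    res <- do.call(rbind, unname(pieces))
    rownames(res) <- NULL
    res
}

# Toy profile data with two mass signals
mz <- c(100.000, 100.002, 100.004, 100.006, 100.008, 200.000, 200.002, 200.004)
intensity <- c(10, 50, 100, 40, 5, 20, 80, 30)
centroids <- centroidSketch(mz, intensity)
print(centroids)
```

With the toy data, each of the two mass signals is reduced to the data point at its apex.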
Gibb S, Strimmer K (2012). “MALDIquant: a versatile R package for the analysis of mass spectrometry data.” Bioinformatics, 28(17), 2270–2271. doi:10.1093/bioinformatics/bts447.
Algorithms and parameters for automatic detection of peaks in chromatograms and mobilograms.
getDefPeakParams(type, algorithm, ...)
type |
The type of parameter defaults: |
algorithm |
The peak detection algorithm: |
... |
optional named arguments that override defaults. |
The algorithm and its parameters for peak detection should be in a named list with the format:
list(algorithm = <algorithm>, param1 = ..., param2 = ..., ...)
Where <algorithm> is the name of the algorithm and param1, param2 etc are the parameters. The
getDefPeakParams function generates such a parameter list with the algorithm and default parameters.
The following algorithms are currently supported:
"openms": uses
MRMTransitionGroupPicker
tool from OpenMS.
"xcms3": uses the xcms::peaksWithCentWave function.
"envipick": uses the enviPick::mzpick function.
"piek": uses the peak detection algorithm from (Dietrich et al. 2021), which was optimized
with OpenMP parallelization. See findFeaturesPiek for more details.
The parameters are discussed in the next sections.
These parameters are applicable to all algorithms:
forcePeakWidth a two-sized numeric vector with the minimum and maximum width for a peak. Peaks
that are narrower or wider will be clamped to this range. This is especially useful for algorithms that consider
an extensive part of the fronting/tailing noise as part of the peak. Set to c(0, 0) to disable.
relMinIntensity the minimum intensity threshold for a peak relative to the highest peak in the same
chromatogram/mobilogram. This is e.g. useful to exclude noise in mobilograms where normally few peaks are
expected.
calcCentroid Controls how the peak centroid is calculated, which is used for retention time or
mobility determination. Valid values are: "algorithm" (use the centroid as determined by the algorithm),
"max" (use the apex of the peak), "weighted.mean" (use the intensity weighted mean of all data points
in the peak) or "centerOfMass" (use the center of mass or first statistical moment of the peak). The latter
two might be of interest for asymmetrical peaks. However, most algorithms, including those not interfaced by
patRoon, seem to use the peak apex. Hence, calcCentroid="max" (or calcCentroid="algorithm"
which is usually the same) seems a good default for comparative reasons.
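The difference between the "max" and "weighted.mean" centroid options can be shown with a small worked example in base R. The data are made up and the calculation is only a sketch of the two definitions given above, not the package code.

```r
# Toy chromatographic peak with some tailing (retention time in seconds)
rt <- c(10, 11, 12, 13, 14)
intensity <- c(1, 4, 10, 6, 3)

# "max": retention time at the peak apex
centroidMax <- rt[which.max(intensity)]

# "weighted.mean": intensity weighted mean of all data points in the peak
centroidWMean <- sum(rt * intensity) / sum(intensity)

print(c(max = centroidMax, weighted.mean = centroidWMean))
# max = 12, weighted.mean = 12.25
```

Because the peak tails, the weighted mean lies slightly after the apex, which illustrates why results from the two options are not directly comparable.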
openms
The parameters directly map to the command line options for
MRMTransitionGroupPicker, please see
its
documentation.
minPeakWidth the minimum peak width, sets the min_peak_width option.
backgroundSubtraction the background subtraction method, sets the
-algorithm:background_subtraction option.
SGolayFrameLength the frame length for Savitzky-Golay smoothing, sets the
-algorithm:PeakPickerMRM:sgolay_frame_length option.
SGolayPolyOrder order of the polynomial, sets the
-algorithm:PeakPickerMRM:sgolay_polynomial_order option.
useGauss set to TRUE to use Gaussian smoothing (instead of Savitzky-Golay), sets the
-algorithm:PeakPickerMRM:use_gauss option.
gauss_width the Gaussian width, estimated peak size, sets the
-algorithm:PeakPickerMRM:gauss_width option.
SN signal to noise threshold, sets the -algorithm:PeakPickerMRM:signal_to_noise option.
SNWinLen SN window length, sets the -algorithm:PeakPickerMRM:sn_win_len option.
SNBinCount SN bin count, sets the -algorithm:PeakPickerMRM:sn_bin_count option.
method peak picking method, sets the -algorithm:PeakPickerMRM:method option.
integrationType the integration technique, sets the
-algorithm:PeakIntegrator:integration_type option.
baselineType the baseline type, sets the -algorithm:PeakIntegrator:baseline_type option.
fitEMG if TRUE then the EMG model is used for fitting, sets the
-algorithm:PeakIntegrator:fit_EMG option.
xcms3 and envipick
See the documentation for
xcms::peaksWithCentWave and enviPick::mzpick
for xcms3 and envipick, respectively.
piek
minIntensity the minimum intensity of a peak.
SN the signal to noise ratio.
peakWidth two-sized vector with the minimum and maximum peak width (seconds)
RTRange two-sized vector with the minimum and maximum retention time range (seconds). Set the
2nd element to Inf for no upper limit.
maxPeaksPerSignal upper threshold for consecutive maxima of similar size to be regarded as noise.
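As an illustration of the documented list format, a parameter list for the piek algorithm could be written by hand as sketched below. The parameter names are those listed above, but all values are arbitrary examples and not the defaults that getDefPeakParams would return.

```r
# Hand-written parameter list in the documented format:
# list(algorithm = <algorithm>, param1 = ..., param2 = ..., ...)
# All values are illustrative assumptions, not package defaults.
peakParams <- list(
    algorithm = "piek",
    # parameters applicable to all algorithms
    forcePeakWidth = c(0, 0),      # c(0, 0) disables clamping of peak widths
    relMinIntensity = 0.05,
    calcCentroid = "max",
    # piek specific parameters
    minIntensity = 1000,
    SN = 3,
    peakWidth = c(5, 60),          # minimum and maximum peak width (seconds)
    RTRange = c(0, Inf),           # Inf as 2nd element: no upper limit
    maxPeaksPerSignal = 10
)

# Basic sanity checks on the hand-written list
supported <- c("openms", "xcms3", "envipick", "piek")
stopifnot(peakParams$algorithm %in% supported)
stopifnot(length(peakParams$peakWidth) == 2, length(peakParams$RTRange) == 2)
```

In practice it is usually more convenient to start from the generated defaults and only override the values that need to change.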
The peak detection used by algorithm="openms" is different from that of
findFeaturesOpenMS.
The patRoon.threads package option sets the number of threads for the piek algorithm.
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
pugixml (via
Rcpp) is used to process OpenMS XML output.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
Dietrich C, Wick A, Ternes TA (2021).
“Open‐source feature detection for non‐target LC–MS analytics.”
Rapid Communications in Mass Spectrometry, 36(2).
ISSN 1097-0231.
doi:10.1002/rcm.9206.
http://dx.doi.org/10.1002/rcm.9206.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
Bioinformatics, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
Parameters used by generateTPs with algorithms that use structural information.
getDefTPStructParams(...)
... |
optional named arguments that override defaults. |
The following parameters can be configured:
calcLogP A character specifying whether log P values should be calculated with
rcdk::get.xlogp (calcLogP="rcdk"),
OpenBabel (calcLogP="obabel") or not at all
(calcLogP="none"). The log P values of parents and TPs are used for retention
order calculation.
forceCalcLogP Force calculation of Log P values, even if already provided by the TP generation
algorithm. This is primarily useful to obtain log P values that were consistently calculated with the same
algorithm, as some algorithms may only partially output these values (e.g. not for parents).
forceCalcRetDir Force calculation of retention order directions, even if
already provided by the TP generation algorithm. This is primarily intended for re-calculation of library TP data,
which may have been calculated with different log P values.
minLogPDiff The minimum difference in log P values between a parent and its TPs to be
considered eluting differently. This is used for retention order calculation.
calcSims If set to TRUE then structural similarities between the parent and its TPs are
calculated. The filter method can be used to threshold
structural similarities. This may be useful under the assumption that parents and TPs with a high structural
similarity are also likely to have a high MS/MS spectral similarity (which can be evaluated after componentization with
generateComponentsTPs).
fpType The type of structural fingerprint that should be calculated. See the type argument of
the get.fingerprint function of rcdk.
fpSimMethod The method for calculating similarities (i.e. not dissimilarity!). See the method
argument of the fp.sim.matrix function of the fingerprint package.
These parameters are passed as a named list as the TPStructParams argument to functions.
The getDefTPStructParams function generates such a parameter list with defaults.
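How minLogPDiff could enter a retention order calculation is sketched below in base R. This is an assumption-laden illustration and not the patRoon implementation: the function retDirSketch, the -1/0/+1 encoding, the default of 1 and the reversed-phase assumption (a lower log P means earlier elution) are all hypothetical choices made for this example.

```r
# Hypothetical sketch: expected elution direction of a TP relative to its parent.
# Assumes reversed-phase chromatography, i.e. lower log P elutes earlier.
# Returns -1 (TP elutes earlier), +1 (TP elutes later) or 0 (no clear difference).
retDirSketch <- function(logPParent, logPTP, minLogPDiff = 1) {
    d <- logPTP - logPParent
    ifelse(abs(d) < minLogPDiff, 0, sign(d))
}

retDirSketch(logPParent = 2.5, logPTP = 1.0)   # more polar TP: earlier
retDirSketch(logPParent = 2.5, logPTP = 2.9)   # difference below the threshold
retDirSketch(logPParent = 1.0, logPTP = 3.0)   # less polar TP: later
```

Raising minLogPDiff makes the classification more conservative, since more parent/TP pairs are then treated as not eluting differently.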
Guha R (2007).
“Chemical Informatics Functionality in R.”
Journal of Statistical Software, 18(6).
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011).
“Open Babel: An open chemical toolbox.”
Journal of Cheminformatics, 3(1).
doi:10.1186/1758-2946-3-33.
This function generates one or more EIC(s) for given retention time, m/z and optionally mobility ranges.
getEICs( analysisInfo, ranges, gapFactor = 3, output = "fill", minIntensityIMS = 25 )
analysisInfo |
A |
ranges |
A |
gapFactor |
A |
output |
Should be |
minIntensityIMS |
(IMS workflow) Raw intensity threshold for IMS data. This is primarily intended to speed up raw data processing. |
A list with for each analysis a list with EIC data for each of the rows in ranges.
If output="raw" then additional columns with e.g. mean-averaged and base peak m/z values for
each data point are returned. Furthermore, the allXValues attribute is set that can be used to obtain the
original retention time values to reconstruct the original complete chromatogram.
The raw data interface of patRoon is used by getEICs to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
Fold change calculation
getFCParams(replicates, ...)
replicates |
A |
... |
Optional named arguments that override defaults. |
Fold change calculation can be used to easily identify significant changes between replicates. The
calculation process is configured through a parameter list, which can be constructed with the getFCParams
function. The parameter list has the following entries:
replicates: the names of the two replicates to compare (taken from the replicates argument to
getFCParams).
thresholdFC: the threshold log FC for a feature group to be classified as increasing/decreasing.
thresholdPV: the threshold log P for a feature group to be significantly different.
zeroMethod,zeroValue: how to handle zero values when calculating the FC: "add" adds an
offset to zero values, "fixed" sets zero values to a fixed number and "omit" removes zero data. The
number that is added/set by the former two options is defined by zeroValue.
PVTestFunc: a function that is used to calculate P values (usually using t.test).
PVAdjFunc: a function that is used to adjust P values (usually using p.adjust).
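The roles of these parameters can be illustrated with a minimal base R sketch for a single feature group. It is not the package implementation: the helpers fixZeros and foldChangeSketch are hypothetical, the use of log2 and of mean intensities per replicate are assumptions, and only the "fixed" and "omit" zero methods are covered.

```r
# Hypothetical helper: handle zero intensities before the FC calculation
fixZeros <- function(x, zeroMethod = "fixed", zeroValue = 0.01) {
    switch(zeroMethod,
           fixed = replace(x, x == 0, zeroValue),
           omit = x[x != 0],
           stop("zeroMethod not covered by this sketch"))
}

# Hypothetical helper: log2 fold change of replicate B versus replicate A and
# a P value from a user-definable test function (cf. PVTestFunc)
foldChangeSketch <- function(intA, intB, zeroMethod = "fixed", zeroValue = 0.01,
                             PVTestFunc = function(x, y) t.test(x, y)$p.value) {
    intA <- fixZeros(intA, zeroMethod, zeroValue)
    intB <- fixZeros(intB, zeroMethod, zeroValue)
    list(logFC = log2(mean(intB) / mean(intA)), PV = PVTestFunc(intA, intB))
}

# Toy intensities of one feature group in two replicates
res <- foldChangeSketch(c(100, 110, 90), c(400, 420, 380))
print(res)
```

When many feature groups are tested, the resulting P values would typically be corrected afterwards, for instance with p.adjust, which corresponds to the role of PVAdjFunc.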
The code to calculate and plot Fold change data was created by Bas van de Velde.
featureGroups-class and feature-plotting
Parameters that define how mobility or CCS values between e.g. features and suspects should be matched.
getIMSMatchParams(param, ...)
param |
Should be |
... |
optional named arguments that override defaults. |
The following parameters should be defined:
param Should be "mobility" or "CCS" to match by mobility or CCS, respectively.
window,relative The window parameter sets the tolerance window size used for matching.
If relative=TRUE then the tolerance is relative (‘0-1’). The defaults for window are
(see limits):
defaultLim("mobility", "medium") (param="mobility" and relative=FALSE)
defaultLim("mobility", "medium_rel") (param="mobility" and relative=TRUE)
defaultLim("CCS", "medium") (param="CCS" and relative=FALSE)
defaultLim("CCS", "medium_rel") (param="CCS" and relative=TRUE)
minMatches The minimum number of mobility/CCS matches for a suspect hit. If the number of
available reference mobility/CCS values in the suspect list is less than minMatches, then that
number is used as threshold. Set to 0 to disable.
These parameters are passed as a named list as the IMSMatchParams argument to functions.
The getIMSMatchParams function generates such a parameter list with defaults.
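The matching logic of the window, relative and minMatches parameters can be sketched in base R as shown below. The helpers matchesIMS and passesMinMatches are hypothetical, and the choice to express a relative deviation as a fraction of the reference value is an assumption; the package may normalize differently.

```r
# Hypothetical helper: does a measured mobility/CCS value match a reference?
# Assumption: relative deviations are expressed relative to the reference.
matchesIMS <- function(value, reference, window, relative = FALSE) {
    dev <- abs(value - reference)
    if (relative) dev <- dev / reference
    dev <= window
}

# Hypothetical helper: minMatches is lowered to the number of available
# reference values if fewer references are available
passesMinMatches <- function(matched, minMatches) {
    sum(matched) >= min(minMatches, length(matched))
}

# A measured CCS of 200 versus a reference CCS of 204
matchesIMS(200, 204, window = 0.03, relative = TRUE)   # 2% deviation, within 3%
matchesIMS(200, 204, window = 2, relative = FALSE)     # absolute deviation of 4

# One available reference value that matches, with minMatches = 2
passesMinMatches(matchesIMS(200, 204, 0.03, TRUE), minMatches = 2)
```

The last call passes because only one reference value is available, so the effective threshold is one match instead of two.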
If negation is enabled with suspect filtering and minMatches>0, then the window match filter is
not negated. Negating both would lead to unexpected results, i.e. suspects outside window are
kept and increase the number of matched suspects as seen by the minMatches filter.
Parameters that define a range of mobility or CCS values.
getIMSRangeParams(param, lower, upper, mzRelative = FALSE)
param, lower, upper, mzRelative
|
Arguments to specify the IMS range parameters, see Details. |
The following parameters are used to define an IMS range:
param Should be "mobility" or "CCS" to specify a mobility or CCS range,
respectively.
lower,upper The lower and upper range.
mzRelative Set to TRUE to specify an IMS range that is normalized by m/z.
These parameters are passed as a named list as the IMSRangeParams argument to functions.
The getIMSRangeParams function generates such a parameter list with defaults.
Returns the supported file formats of MS files in patRoon.
getMSFileFormats(fileType = NULL)
fileType |
The type of file for which formats should be returned (see |
Get supported MS file types
getMSFileTypes()
Group equal features across analyses.
groupFeatures(obj, algorithm, ...)
## S4 method for signature 'features'
groupFeatures(obj, algorithm, ..., verbose = TRUE)
## S4 method for signature 'data.frame'
groupFeatures(obj, algorithm, ..., verbose = TRUE)
obj |
Either a |
algorithm |
A |
... |
Further parameters passed to the selected grouping algorithm. |
verbose |
if |
After features have been found, the next step is to align and group them across analyses. This process is necessary to allow comparison of features between multiple analyses, which otherwise would be difficult due to small deviations in retention and mass data. Thus, algorithms of 'feature groupers' are used to collect features with similar retention and mass data. In addition, advanced retention time alignment algorithms exist to enhance grouping of features even with relatively large retention time deviations (e.g. possibly observed from analyses collected over a long period). Like findFeatures, various algorithms are supported which may have many parameters that can be fine-tuned. This fine-tuning is likely to be necessary, since optimal settings often depend on applied methodology and instrumentation.
groupFeatures is a generic function that will group features by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as groupFeaturesOpenMS and groupFeaturesXCMS3. While these
functions may be called directly, groupFeatures provides a generic interface and is therefore usually preferred.
The data.frame method for groupFeatures is a special case that currently only supports the
"sirius" algorithm.
An object of a class which is derived from featureGroups.
The featureGroups output class and its methods and the algorithm specific functions:
groupFeaturesOpenMS, groupFeaturesXCMS, groupFeaturesXCMS3, groupFeaturesKPIC2, groupFeaturesGreedy, groupFeaturesSIRIUS
Group features using a greedy algorithm that maximizes group scores based on retention time, m/z, mobility, and intensity similarities.
groupFeaturesGreedy(feat, ...)
## S4 method for signature 'features'
groupFeaturesGreedy( feat, rtalign = FALSE, rtWindow = defaultLim("retention", "medium"), mzWindow = defaultLim("mz", "medium"), mobWindow = defaultLim("mobility", "medium"), scoreWeights = c(retention = 1, mz = 1, mobility = 1, intensity = 1), verbose = TRUE )
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Not yet supported. Provided for consistency with other grouping methods. |
rtWindow, mzWindow, mobWindow
|
Numeric tolerances for retention time (seconds), m/z, and mobility,
respectively. The scoring terms are normalized to these values. Defaults to |
scoreWeights |
Numeric vector specifying the scoring weights. Should contain the following named elements:
|
verbose |
if |
This function uses the greedy algorithm to group features. This function is called when calling groupFeatures with
algorithm="greedy".
The greedy algorithm is a simple feature grouping algorithm that can work with both HRMS and IMS-HRMS
data. The algorithm groups features by iteratively building the best possible groups. Features are processed in
order of decreasing intensity. For each feature, candidate groups are formed from all other (ungrouped) features
within the specified retention time, m/z and mobility windows. Each candidate group only contains a maximum
of one feature per analysis. The candidates are then scored and the group with the lowest overall variations in
retention time, m/z, mobility and replicate intensity is then selected. This process is repeated until all
features have been assigned to a group. The weights for each of the scoring terms can be configured.
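The general idea of the greedy procedure can be illustrated with a strongly simplified base R sketch. It is not the algorithm as implemented in patRoon: the function greedyGroupSketch, the column names and the default windows are hypothetical, mobility and replicate intensity terms are omitted, all weights are equal, and candidates are scored only by their deviation from the seed feature instead of by the variation within complete candidate groups.

```r
# Hypothetical sketch of greedy feature grouping on a plain data.frame with
# the columns analysis, ret (seconds), mz and intensity.
greedyGroupSketch <- function(feat, rtWindow = 12, mzWindow = 0.005) {
    feat$group <- NA_integer_
    nextGroup <- 1L
    # Seeds are processed in order of decreasing intensity
    for (i in order(feat$intensity, decreasing = TRUE)) {
        if (!is.na(feat$group[i])) next
        # Ungrouped features from other analyses within both windows
        cand <- which(is.na(feat$group) & feat$analysis != feat$analysis[i] &
                      abs(feat$ret - feat$ret[i]) <= rtWindow &
                      abs(feat$mz - feat$mz[i]) <= mzWindow)
        # Deviations normalized to the windows (lower is better)
        score <- abs(feat$ret[cand] - feat$ret[i]) / rtWindow +
            abs(feat$mz[cand] - feat$mz[i]) / mzWindow
        # Keep at most one feature per analysis: the best scoring one
        best <- cand[order(score)]
        best <- best[!duplicated(feat$analysis[best])]
        feat$group[c(i, best)] <- nextGroup
        nextGroup <- nextGroup + 1L
    }
    feat
}

# Toy feature table with two analyses
feat <- data.frame(
    analysis = c("a", "b", "a", "b", "b"),
    ret = c(100, 102, 300, 301, 104),
    mz = c(200.000, 200.001, 350.000, 350.002, 200.004),
    intensity = c(1000, 900, 500, 400, 50)
)
grouped <- greedyGroupSketch(feat)
print(grouped)
```

In the toy data the fifth feature falls within the windows of the first feature, but it loses against the better matching second feature from the same analysis and therefore ends up in a group of its own.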
An object of a class which is derived from featureGroups.
Any links between IMS precursors and IMS features are removed. This can occur e.g. when greedy
is used to generate a feature consensus from a post
mobility assignment workflow.
groupFeatures for more details and other algorithms.
Uses the KPIC2 R package for grouping of features.
groupFeaturesKPIC2(feat, ...)
## S4 method for signature 'features'
groupFeaturesKPIC2( feat, rtalign = TRUE, loadRawData = TRUE, groupArgs = list(tolerance = c(0.005, 12)), alignArgs = list(), verbose = TRUE )
## S4 method for signature 'featuresSet'
groupFeaturesKPIC2( feat, groupArgs = list(tolerance = c(0.005, 12)), verbose = TRUE )
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
loadRawData |
Set to |
groupArgs, alignArgs
|
Named |
verbose |
if |
This function uses KPIC2 to group features. This function is called when calling groupFeatures with
algorithm="kpic2".
Grouping of features and alignment of their retention times are performed with the
KPIC::PICset.group and KPIC::PICset.align
functions, respectively.
An object of a class which is derived from featureGroups.
loadRawData and arguments related to retention time alignment are currently not
supported for sets workflows.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
groupFeatures for more details and other algorithms.
Group and align features with OpenMS tools
groupFeaturesOpenMS(feat, ...)
## S4 method for signature 'features'
groupFeaturesOpenMS( feat, rtalign = TRUE, QT = FALSE, maxAlignRT = defaultLim("retention", "wide"), maxAlignMZ = defaultLim("mz", "medium"), maxGroupRT = defaultLim("retention", "medium"), maxGroupMZ = defaultLim("mz", "medium"), extraOptsRT = NULL, extraOptsGroup = NULL, verbose = TRUE )
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
QT |
If enabled, use |
maxAlignRT, maxAlignMZ
|
Used for retention alignment. Maximum retention time or m/z difference (seconds/Dalton)
for feature pairing. Sets |
maxGroupRT, maxGroupMZ
|
as |
extraOptsRT, extraOptsGroup
|
Named |
verbose |
if |
This function uses OpenMS to group features. This function is called when calling groupFeatures with
algorithm="openms".
Retention times may be aligned by the MapAlignerPoseClustering TOPP tool. Grouping is achieved by either the FeatureLinkerUnlabeled or FeatureLinkerUnlabeledQT TOPP tools.
An object of a class which is derived from featureGroups.
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
pugixml (via
Rcpp) is used to process OpenMS XML output.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
groupFeatures for more details and other algorithms.
Uses SIRIUS to find and group features.
groupFeaturesSIRIUS(analysisInfo, verbose = TRUE)
analysisInfo |
A |
verbose |
if |
This function uses SIRIUS to group features. This function is called when calling groupFeatures with
algorithm="sirius".
Finding and grouping features is done by running the lcms-align command on all analyses at once.
For this reason, grouping feature data from algorithms other than SIRIUS is not supported.
The MS files should be in the ‘mzML’ or ‘mzXML’ format. Furthermore, this algorithm requires the presence of (data-dependent) MS/MS data.
The input MS data files need to be centroided. The convertMSFiles function can be used
to centroid data.
An object of a class which is derived from featureGroups.
Duhrkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Bocker S (2019). “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods, 16(4), 299–302. doi:10.1038/s41592-019-0344-8.
groupFeatures for more details and other algorithms.
Group and align features with the legacy xcmsSet function from the xcms package.
groupFeaturesXCMS(feat, ...)
## S4 method for signature 'features'
groupFeaturesXCMS(
  feat,
  rtalign = TRUE,
  loadRawData = TRUE,
  groupArgs = list(mzwid = 0.015),
  retcorArgs = list(method = "obiwarp"),
  verbose = TRUE
)
## S4 method for signature 'featuresSet'
groupFeaturesXCMS(feat, groupArgs = list(mzwid = 0.015), verbose = TRUE)
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
loadRawData |
Set to |
groupArgs |
named |
retcorArgs |
named |
verbose |
if |
This function uses XCMS to group features. This function is called when calling groupFeatures with
algorithm="xcms".
Grouping of features and
alignment of their retention times are performed with the xcms::group and
xcms::retcor functions, respectively. Both functions have an extensive list of
parameters to modify their behavior and may therefore be used to potentially optimize results.
An object of a class which is derived from featureGroups.
loadRawData and arguments related to retention time alignment are currently not
supported for sets workflows.
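As an illustration of how the grouping and alignment parameters might be customized, the sketch below builds the two argument lists and wraps the call in a small function. `groupWithXCMS` is a hypothetical wrapper, `feat` is assumed to be a features object obtained elsewhere, and the `bw` value is only an example and not a recommendation.

```r
# Arguments for xcms::group(): 'mzwid' is the default from the usage above,
# 'bw' (bandwidth of the density grouping) is an illustrative value.
groupArgs <- list(mzwid = 0.015, bw = 5)
# Arguments for xcms::retcor(), identical to the default shown above
retcorArgs <- list(method = "obiwarp")

# Hypothetical wrapper: 'feat' is a features object obtained elsewhere.
groupWithXCMS <- function(feat) {
    if (!requireNamespace("patRoon", quietly = TRUE))
        stop("this sketch needs the patRoon package")
    patRoon::groupFeatures(feat, algorithm = "xcms", rtalign = TRUE,
                           groupArgs = groupArgs, retcorArgs = retcorArgs)
}
```

For a sets workflow only `groupArgs` would apply, since the alignment related arguments are not supported there.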
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
groupFeatures for more details and other algorithms.
Uses the new xcms3 interface from the xcms package to group features.
groupFeaturesXCMS3(feat, ...)
## S4 method for signature 'features'
groupFeaturesXCMS3(
  feat,
  rtalign = TRUE,
  loadRawData = TRUE,
  groupParam = xcms::PeakDensityParam(sampleGroups = analysisInfo(feat)$replicate),
  preGroupParam = groupParam,
  retAlignParam = xcms::ObiwarpParam(),
  verbose = TRUE
)
## S4 method for signature 'featuresSet'
groupFeaturesXCMS3(
  feat,
  groupParam = xcms::PeakDensityParam(sampleGroups = analysisInfo(feat)$replicate),
  verbose = TRUE
)
feat |
The |
... |
Further parameters passed to the selected grouping algorithm. |
rtalign |
Set to |
loadRawData |
Set to |
groupParam, retAlignParam
|
parameter object that is directly passed to
|
preGroupParam |
grouping parameters applied when features are grouped prior to alignment (only with peak groups alignment). |
verbose |
if |
This function uses XCMS3 to group features. This function is called when calling groupFeatures with
algorithm="xcms3".
Grouping of features and alignment of their retention times are performed with the
xcms::groupChromPeaks and xcms::adjustRtime
functions, respectively. Both of these functions support a large number of parameters that modify their
behavior and may therefore require optimization.
An object of a class which is derived from featureGroups.
loadRawData and arguments related to retention time alignment are currently not
supported for sets workflows.
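A minimal sketch of passing customized parameter objects is given below. `groupWithXCMS3` is a hypothetical wrapper, `feat` is assumed to be a features object obtained elsewhere, and the `bw` and `binSize` values are illustrative only. The function is merely defined here; running it requires both patRoon and xcms.

```r
# Hypothetical wrapper: 'feat' is a features object obtained elsewhere.
groupWithXCMS3 <- function(feat) {
    for (pkg in c("patRoon", "xcms"))
        if (!requireNamespace(pkg, quietly = TRUE))
            stop("this sketch needs the ", pkg, " package")
    # Peak density grouping; 'bw' and 'binSize' are illustrative values
    gp <- xcms::PeakDensityParam(
        sampleGroups = patRoon::analysisInfo(feat)$replicate,
        bw = 5, binSize = 0.015
    )
    # Obiwarp alignment with its default settings, as in the usage above
    patRoon::groupFeatures(feat, algorithm = "xcms3", rtalign = TRUE,
                           groupParam = gp,
                           retAlignParam = xcms::ObiwarpParam())
}
```

Because `preGroupParam` defaults to `groupParam`, it only needs to be set when the grouping prior to alignment should differ from the final grouping.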
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
groupFeatures for more details and other algorithms.
Functions to estimate the identification confidence for suspects and annotation candidates.
estimateIDConfidence(obj, ...)
## S4 method for signature 'formulas'
estimateIDConfidence(
  obj,
  absMzDev = defaultLim("mz", "medium"),
  normalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = NULL
)
## S4 method for signature 'compounds'
estimateIDConfidence(
  obj,
  absMzDev = defaultLim("mz", "medium"),
  MSPeakLists = NULL,
  formulas = NULL,
  specSimParams = getDefSpecSimParams(removePrecursor = TRUE),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = NULL
)
## S4 method for signature 'featureGroupsScreening'
estimateIDConfidence(
  obj,
  MSPeakLists = NULL,
  formulas = NULL,
  compounds = NULL,
  absMzDev = defaultLim("mz", "medium"),
  checkFragments = c("mz", "formula", "compound"),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = file.path("log", "ident")
)
## S4 method for signature 'featureGroupsScreeningSet'
estimateIDConfidence(
  obj,
  MSPeakLists = NULL,
  formulas = NULL,
  compounds = NULL,
  absMzDev = defaultLim("mz", "medium"),
  checkFragments = c("mz", "formula", "compound"),
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = file.path("log", "ident")
)
## S4 method for signature 'compoundsSet'
estimateIDConfidence(
  obj,
  absMzDev = defaultLim("mz", "medium"),
  MSPeakLists = NULL,
  formulas = NULL,
  formulasNormalizeScores = "max",
  compoundsNormalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = NULL
)
## S4 method for signature 'formulasSet'
estimateIDConfidence(
  obj,
  absMzDev = defaultLim("mz", "medium"),
  normalizeScores = "max",
  IDFile = system.file("misc", "IDLevelRules.yml", package = "patRoon"),
  logPath = NULL
)
numericIDLevel(level)
genIDLevelRulesFile(out, inLevels = NULL, exLevels = NULL)
obj |
The object for which identification confidence should be estimated. |
... |
Method specific arguments. |
absMzDev |
Maximum absolute m/z deviation. |
normalizeScores, compoundsNormalizeScores, formulasNormalizeScores
|
A
|
IDFile |
A file path to a YAML file with rules used for estimation of identification levels. See the
|
logPath |
A directory path to store logging information. If |
MSPeakLists, formulas, compounds
|
Annotation data ( |
specSimParams |
A named |
checkFragments |
Which type(s) of MS/MS fragments from workflow data should be checked to evaluate the number of
suspect fragment matches (i.e. from the |
level |
The identification level to be converted. |
out |
The file path to the target file. |
inLevels, exLevels
|
A regular expression for the identification levels to include or exclude,
respectively. For instance, |
The estimateIDConfidence methods calculate various properties that help to estimate the confidence of
identifications assigned to suspects and feature annotation candidates. These functions are typically executed after
running screenSuspects, generateFormulas and generateCompounds. Afterwards,
the following columns are added to the result tables (obtained with e.g. screenInfo,
annotations and as.data.table):
annSim: the annotation similarity, defined as the similarity between the MS/MS peak list of a
feature with (a) only the peaks that were annotated and (b) all the peaks. Thus, a value of one means that all
MS/MS peaks were annotated. The similarity calculation is configured with the specSimParams argument to
estimateIDConfidence.
annSimForm: the annotation similarity specifically for formula annotations (equaling the annSim
column from formula annotations). Only calculated for suspects and
compounds.
annSimBoth: the annotation similarity calculated with the combined set of annotated MS/MS peaks from
formula and compound annotations. Only calculated for suspects and
compounds.
estIDLevel: provides an estimation of the identification level, roughly following that of
(Schymanski et al. 2014). However, please note that this value is only an estimation, and manual
interpretation is still necessary to assign final identification levels. The estimation is done through a set of
rules, see the Identification level rules section below.
In addition, the following columns are specifically added to suspect screening results:
annSimComp: the annotation similarity specifically for compound annotations (this equals the
annSim column in compound annotations).
formRank, compRank: the rank of the suspect within the formula/compound annotation results.
maxFrags: the maximum number of MS/MS fragments that can be matched for this suspect (based on the
fragments_* columns from the suspect list).
maxFragMatches, maxFragMatchesRel: the absolute and relative number of experimental MS/MS peaks
that were matched from the fragments specified in the suspect list. The value for maxFragMatchesRel is
relative to the value for maxFrags. The calculation of this column is influenced by the
checkFragments argument to estimateIDConfidence.
The data for these columns is only calculated if estimateIDConfidence has the required data to do so. For
instance, annSimForm and formRank are only calculated if the formulas argument is set, and
levels for estIDLevel will be poor if no compound annotations are available.
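The idea behind the annotation similarity can be sketched with a plain cosine similarity. This is a simplification: the actual calculation is configured through specSimParams and may weight and filter peaks differently. The peak list and the `cosineSim` helper below are made up for illustration.

```r
# Plain cosine similarity between two intensity vectors of equal length
cosineSim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Made-up MS/MS peak list; 'annotated' marks peaks explained by a candidate
peaks <- data.frame(
    mz = c(57.07, 85.06, 113.06, 141.09),
    intensity = c(100, 40, 250, 10),
    annotated = c(TRUE, FALSE, TRUE, TRUE)
)

# (a) only the annotated peaks (others set to zero) versus (b) all peaks
annotatedOnly <- ifelse(peaks$annotated, peaks$intensity, 0)
annSimSketch <- cosineSim(annotatedOnly, peaks$intensity)

# If every peak is annotated, both spectra are identical
allAnnotated <- cosineSim(peaks$intensity, peaks$intensity)

print(c(partly = annSimSketch, fully = allAnnotated))
```

In this example the single unannotated peak is of low intensity, so the similarity stays close to (but below) one.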
numericIDLevel Extracts the numeric part of a given identification level (e.g. "3a"
becomes ‘3’).
genIDLevelRulesFile Generates a template YAML file that is used to configure the rules for automatic
estimation of identification levels. This file can then be used as input for estimateIDConfidence.
estimateIDConfidence amends the input object with aforementioned identification confidence properties.
The estimation of identification levels is configured through a YAML file which specifies the rules for each level. The default file is shown below.
1:
    suspectFragments: 3
    retention: 12
2a:
    or:
        - individualMoNAScore:
            min: 0.9
            higherThanNext: .inf
        - libMatch:
            min: 0.9
            higherThanNext: .inf
    rank:
        max: 1
        type: compound
3a:
    or:
        - individualMoNAScore: 0.7
        - libMatch: 0.7
3b:
    suspectFragments: 3
3c:
    annMSMSSim:
        type: compound
        min: 0.7
4a:
    annMSMSSim:
        type: formula
        min: 0.7
    isoScore:
        min: 0.5
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
4b:
    isoScore:
        min: 0.9
        higherThanNext: 0.2
    rank:
        max: 1
        type: formula
5:
    all: yes
Most of the file should be self-explanatory. Some notes:
Each rule is either a field of suspectFragments (minimum number of MS/MS fragments matched from
suspect list), retention (maximum retention deviation from suspect list), rank (the maximum
annotation rank from formula or compound annotations), all (this level is always matched) or any of the
scorings available from the formula or compound annotations.
In case any of the rules could be applied to either formula or compound annotations, the annotation type must
be specified with the type field (formula or compound).
Identification levels should start with a number and may optionally be followed by an alphabetic character. The lowest levels are checked first.
If relative=yes then the relative scoring will be used for testing.
For suspectFragments: if the number of fragments from the suspect list (maxFrags column) is
less than the minimum rule value, the minimum is adjusted to the number of available fragments.
The or and and keywords can be used to combine multiple conditions.
Any conditions that require suspect data (e.g. suspectFragments) are only met with the suspects
method of estimateIDConfidence.
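The adjustment of the suspectFragments minimum described in the notes above can be sketched as follows. `passesFragmentRule` is a hypothetical helper and the suspect data are made up; the sketch only mirrors the documented rule and is not the package's implementation.

```r
# Hypothetical helper: does a suspect pass a 'suspectFragments' rule?
passesFragmentRule <- function(maxFragMatches, maxFrags, ruleMin) {
    # The minimum is lowered if the suspect list holds fewer fragments
    effectiveMin <- pmin(ruleMin, maxFrags)
    maxFragMatches >= effectiveMin
}

suspects <- data.frame(
    name = c("A", "B", "C"),
    maxFrags = c(5, 2, 4),          # fragments listed in the suspect list
    maxFragMatches = c(3, 2, 1)     # fragments matched experimentally
)
# Relative matches, i.e. relative to maxFrags
suspects$maxFragMatchesRel <- suspects$maxFragMatches / suspects$maxFrags
suspects$passes <- passesFragmentRule(suspects$maxFragMatches,
                                      suspects$maxFrags, ruleMin = 3)
print(suspects)
```

Suspect B passes although it has only two matches, because its suspect list entry contains just two fragments.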
A template rules file can be generated with the genIDLevelRulesFile function, and this file can
subsequently be passed to estimateIDConfidence. The file format is highly flexible and (sub)levels can be added
or removed if desired. Note that the default file is currently only suitable when annotation is performed with
GenForm and MetFrag; for other algorithms it is crucial to modify the rules.
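A customized rules file might be prepared as sketched below. The shortened rule set is only an illustration in the format of the default file, and the temporary file location is arbitrary. The patRoon calls are commented out because they need the package and an existing `compounds` object (an assumption of this sketch).

```r
# A shortened, illustrative rule set in the format of the default file
rules <- c(
    "3c:",
    "    annMSMSSim:",
    "        type: compound",
    "        min: 0.7",
    "4a:",
    "    isoScore:",
    "        min: 0.5",
    "        higherThanNext: 0.2",
    "    rank:",
    "        max: 1",
    "        type: formula",
    "5:",
    "    all: yes"
)
rulesFile <- tempfile(fileext = ".yml")
writeLines(rules, rulesFile)

# With patRoon installed, a full template can be generated instead and the
# (edited) file passed on through the IDFile argument:
# patRoon::genIDLevelRulesFile(rulesFile)
# compounds <- patRoon::estimateIDConfidence(compounds, IDFile = rulesFile)
```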
estimateIDConfidence performs its estimations per set. In addition, the
following overall (not set specific) columns are calculated:
formRank and compRank based on the ranking of the formula/compound in the set consensus data.
estIDLevel: based on the 'best' estimated identification level among the sets data (i.e. the
lowest). In case there is a tie between sub-levels (e.g. ‘3a’ and ‘3b’), then the sub-level is
stripped (e.g. ‘3’).
Annotation similarities: taken as the maximum value from the data for each set.
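How an overall identification level could be derived from the per-set levels is sketched below, following the description above. Both helpers are hypothetical: `numericPart` is a base R stand-in for numericIDLevel, and `overallLevel` reflects one reading of the tie rule, not the package's actual code.

```r
# Hypothetical stand-in for numericIDLevel(): "3a" -> 3
numericPart <- function(level) as.integer(sub("[[:alpha:]]*$", "", level))

# Hypothetical helper: overall level from the levels estimated per set
overallLevel <- function(setLevels) {
    setLevels <- setLevels[!is.na(setLevels)]
    nums <- numericPart(setLevels)
    best <- unique(setLevels[nums == min(nums)])
    # One best level: keep it. Several sub-levels tie: strip the sub-level.
    if (length(best) == 1) best else as.character(min(nums))
}

print(overallLevel(c("2a", "3b")))  # lowest level wins
print(overallLevel(c("3a", "3b")))  # tie between sub-levels
print(overallLevel(c("4a", "4a")))  # same level in both sets
```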
Rick Helmus <[email protected]>, Emma Schymanski <[email protected]> (contributions to identification level rules), Bas van de Velde (contributions to spectral similarity calculation).
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, Hollender J (2014).
“Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence.”
Environmental Science and Technology, 48(4), 2097–2098.
doi:10.1021/es5002105.
Stein SE, Scott DR (1994).
“Optimization and testing of mass spectral library search algorithms for compound identification.”
Journal of the American Society for Mass Spectrometry, 5(9), 859–866.
doi:10.1016/1044-0305(94)87009-8.
Generic function to import feature groups produced by other software from files.
importFeatureGroups(input, type, ...)
input |
The input object or path that should be imported. See the algorithm specific functions for more details. |
type |
What type of data should be imported: |
... |
Further arguments passed to the selected import algorithm function. |
importFeatureGroups is a generic function that will import feature groups from files by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as importFeatureGroupsXCMS3 and importFeatureGroupsTable. While these
functions may be called directly, importFeatureGroups provides a generic interface and is therefore usually preferred.
An object of a class which is derived from featureGroups.
The featureGroups output class and its methods and the algorithm specific functions:
importFeatureGroupsXCMS, importFeatureGroupsXCMS3, importFeatureGroupsKPIC2, importFeatureGroupsTable, importFeatureGroupsBrukerPA, importFeatureGroupsBrukerTASQ, importFeatureGroupsEnviMass
groupFeatures to group features.
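The relation between the type argument and the algorithm specific functions can be pictured with a simple lookup. This is a conceptual sketch only: `importFunctionFor` is hypothetical and does not reflect how the generic is implemented internally. The type values are those mentioned in the documentation of the individual functions, and `"fgroups.csv"` and `anaInfo` in the commented calls are placeholders.

```r
# Hypothetical lookup from 'type' to the algorithm specific function name
importFunctionFor <- function(type) {
    switch(type,
        brukerpa = "importFeatureGroupsBrukerPA",
        brukertasq = "importFeatureGroupsBrukerTASQ",
        envimass = "importFeatureGroupsEnviMass",
        kpic2 = "importFeatureGroupsKPIC2",
        table = "importFeatureGroupsTable",
        xcms = "importFeatureGroupsXCMS",
        xcms3 = "importFeatureGroupsXCMS3",
        stop("unsupported type: ", type)
    )
}

print(importFunctionFor("table"))

# With patRoon installed, the two calls below would be expected to give
# the same result:
# fGroups <- patRoon::importFeatureGroups("fgroups.csv", type = "table",
#                                         analysisInfo = anaInfo)
# fGroups <- patRoon::importFeatureGroupsTable("fgroups.csv",
#                                              analysisInfo = anaInfo)
```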
Imports a 'bucket table' produced by Bruker ProfileAnalysis (PA).
importFeatureGroupsBrukerPA(
  input,
  feat,
  rtWindow = defaultLim("retention", "medium"),
  mzWindow = defaultLim("mz", "medium"),
  intWindow = 5,
  warn = TRUE
)
input |
The file path to an exported 'bucket table' ‘.txt’ file from PA. |
feat |
The |
rtWindow, mzWindow, intWindow
|
Search window values for retention time (seconds), m/z (Da) and intensity used to find back features within feature groups from PA (+/- the retention/mass/intensity value of a feature). |
warn |
Warn about missing or duplicate features when relating them back from grouped features. |
This function imports data from Bruker ProfileAnalysis. This function is called when calling importFeatureGroups with
type="brukerpa".
The 'bucket table' should be exported as a ‘.txt’ file. Please note that this function only supports
features generated by findFeaturesBruker and it is crucial that DataAnalysis files remain
unchanged when features are collected and the bucket table is generated. Furthermore, please note that PA does not
retain information about originating features for generated buckets. For this reason, this function tries to find
back the original features and care must be taken to correctly specify search parameters (rtWindow,
mzWindow, intWindow).
An object of a class which is derived from featureGroups.
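The search for original features within the specified windows can be sketched in base R as follows. `findBack` is a hypothetical helper, the feature data are made up, and the retention time and m/z window values are illustrative (the actual defaults come from defaultLim); only `intWindow = 5` is taken from the usage above.

```r
# Hypothetical helper: rows of 'feats' within the search windows of a bucket
findBack <- function(feats, ret, mz, intensity, rtWindow, mzWindow, intWindow) {
    feats[abs(feats$ret - ret) <= rtWindow &
          abs(feats$mz - mz) <= mzWindow &
          abs(feats$intensity - intensity) <= intWindow, ]
}

# Made-up features from one analysis
feats <- data.frame(
    ID = 1:3,
    ret = c(120.5, 121.0, 300.2),
    mz = c(215.0930, 215.1500, 215.0931),
    intensity = c(10000, 10002, 5000)
)

hits <- findBack(feats, ret = 120.8, mz = 215.0932, intensity = 10001,
                 rtWindow = 5, mzWindow = 0.005, intWindow = 5)
print(hits)
```

Zero or multiple hits correspond to the missing or duplicate features that the warn argument reports, which is why narrow but realistic windows matter.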
importFeatureGroups for more details and other algorithms.
Imports screening results from Bruker TASQ as feature groups.
importFeatureGroupsBrukerTASQ(
  input,
  analysisInfo,
  clusterRTWindow = defaultLim("retention", "medium")
)
input |
The file path to an Excel export of the Global results table from TASQ, converted to ‘.csv’ format. |
analysisInfo |
A |
clusterRTWindow |
This retention time window (in seconds) is used to group hits across analyses together. See also the details section. |
This function imports data from Bruker TASQ. This function is called when calling importFeatureGroups with
type="brukertasq".
The feature groups across analyses are formed based on the name of suspects and their closeness in retention
time. The latter is necessary because TASQ does not necessarily perform checks on retention times and may therefore
assign a suspect to peaks with different retention times across analyses (or within a single analysis). Hence,
suspects with equal names are hierarchically clustered on their retention times (using fastcluster) to
form the feature groups. The cut-off value for this is specified by the clusterRTWindow argument. The input
for this function is obtained by generating an Excel export of the 'global' results and subsequently converting the
file to ‘.csv’ format.
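The clustering step can be illustrated with base R. The sketch below uses stats::hclust as a stand-in for fastcluster; the complete linkage method (the hclust default), the hits and the window value are assumptions made for illustration.

```r
# Made-up retention times (seconds) of hits sharing one suspect name
hits <- data.frame(
    analysis = c("a1", "a2", "a1", "a2", "a3"),
    ret = c(100, 101, 250, 252, 400)
)
clusterRTWindow <- 12  # illustrative cut-off in seconds

# Cluster on retention time and cut the tree at the window height.
# Complete linkage (the hclust default) is an assumption of this sketch.
hc <- hclust(dist(hits$ret))
hits$cluster <- cutree(hc, h = clusterRTWindow)
print(hits)
```

Each resulting cluster would then correspond to one feature group, so a suspect assigned to peaks at clearly different retention times ends up in separate groups.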
A new featureGroups object containing converted screening results from Bruker TASQ.
This function uses estimated min/max values for retention times and dummy min/max m/z values for
conversion to features, since this information is not (readily) available. Hence, when plotting, for instance,
extracted ion chromatograms (with plotChroms) the integrated chromatographic peak range shown is
incorrect.
This function may base file names used for reporting, logging etc. on suspect names. Therefore, it is important that these names are valid for use in file names.
Müllner D (2013). “fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python.” Journal of Statistical Software, 53(9), 1–18. doi:10.18637/jss.v053.i09.
importFeatureGroups for more details and other algorithms.
Imports 'profiles' produced by enviMass.
importFeatureGroupsEnviMass(input, feat, positive)
input |
The path of the enviMass project. |
feat |
The |
positive |
Whether data from positive ( |
This function imports data from enviMass. This function is called when calling importFeatureGroups with
type="envimass".
This function only imports 'raw' profiles, not any results from further componentization steps
performed in enviMass. Furthermore, this functionality has only been tested with older versions of
enviMass. Finally, please note that this function only supports features imported by
importFeaturesEnviMass (obviously, the same project should be used for both importing functions).
An object of a class which is derived from featureGroups.
importFeatureGroups for more details and other algorithms.
Imports feature groups from KPIC2.
importFeatureGroupsKPIC2(input, analysisInfo)
input |
A grouped |
analysisInfo |
A |
This function imports data from KPIC2. This function is called when calling importFeatureGroups with
type="kpic2".
An object of a class which is derived from featureGroups.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
importFeatureGroups for more details and other algorithms.
This function imports grouped features from a table, which can be either a data frame or a path to a file with
tabular data (e.g. ‘.csv’).
importFeatureGroupsTable(
  input,
  analysisInfo,
  addCols = NULL,
  groupAlgo,
  groupArgs = NULL
)
input |
The input to be imported: either a |
analysisInfo |
A |
addCols |
Passed to |
groupAlgo, groupArgs
|
(sets workflow) The grouping algorithm and a |
This function imports data from a table. This function is called when calling importFeatureGroups with
type="table".
This function can be used to import feature group data from a table generated by the
as.data.table (and as.data.frame) function, or the output
from any other feature detection and grouping software. The column format mostly follows the format used by the
as.data.table function with the features argument set to TRUE.
The function first imports the feature data from the table using importFeaturesTable. Please see its
documentation for an overview of the columns that are expected in the input table.
In addition to the columns for feature data, the following columns are expected in the input table:
group: a string naming the feature group of which the feature is part. Values in this column
should be unique per analysis. (mandatory)
group_ret,group_mz: the retention time (in seconds) and m/z assigned to the feature group.
Will be calculated from mean values of feature data if missing. (optional)
For data from an IMS workflow, the following additional columns are relevant:
group_mobility,group_CCS: the mobility and CCS assigned to the feature group. Will
be calculated from mean values of feature data if missing. (optional)
ims_precursor_group: a string naming the IMS precursor feature group of the feature group (NA
if none or not an IMS feature). (optional)
For data from a sets workflow, the following additional columns are relevant:
group_ion_mz-<set>,group_neutralMass,group_adduct-<set>: the ion m/z, neutral mass and
adduct assigned to the feature group. Columns should be present for each set (e.g.
ion_mz-positive). Missing columns will be calculated from feature data. NOTE: if feature columns
are missing, then the group columns will be used to create them.
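A minimal input table with the mandatory group column, and the way missing group_ret and group_mz values could be derived from feature means, is sketched below. All data are made up and the uniqueness check reflects one reading of the requirement above (a group name appears at most once per analysis).

```r
# Made-up feature table with the mandatory 'group' column
tab <- data.frame(
    analysis = c("a1", "a2", "a1", "a2"),
    group = c("grp-1", "grp-1", "grp-2", "grp-2"),
    ret = c(120.0, 122.0, 300.0, 301.0),
    mz = c(215.0930, 215.0934, 301.1410, 301.1414),
    intensity = c(1e4, 2e4, 5e3, 6e3)
)

# A group name should not repeat within an analysis
stopifnot(!anyDuplicated(tab[, c("analysis", "group")]))

# Group level values as means of the feature data (ave() defaults to mean)
tab$group_ret <- ave(tab$ret, tab$group)
tab$group_mz <- ave(tab$mz, tab$group)
print(unique(tab[, c("group", "group_ret", "group_mz")]))

# With patRoon installed ('anaInfo' is a placeholder for the analysis info):
# fGroups <- patRoon::importFeatureGroupsTable(tab, analysisInfo = anaInfo)
```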
An object derived from the class featureGroupsSet (if the imported data is from a
sets workflow) or featureGroups object otherwise.
This function does not yet allow importing more advanced feature group properties, such as normalized intensities and predicted concentrations.
importFeatureGroups for more details and other algorithms.
importFeaturesTable for importing features from a table.
as.data.table for converting feature groups data to a
data.table format. groupFeatures to generate feature groups.
Imports feature groups from XCMS (old interface).
importFeatureGroupsXCMS(input, analysisInfo)
input |
An |
analysisInfo |
A |
This function imports data from XCMS. This function is called when calling importFeatureGroups with
type="xcms".
An object of a class which is derived from featureGroups.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
importFeatureGroups for more details and other algorithms.
Imports feature groups from XCMS (new interface).
importFeatureGroupsXCMS3(input, analysisInfo)
input |
An |
analysisInfo |
A |
This function imports data from XCMS3. This function is called when calling importFeatureGroups with
type="xcms3".
An object of a class which is derived from featureGroups.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
importFeatureGroups for more details and other algorithms.
Generic function to import features produced by other software.
importFeatures(input, type, ...)
input |
The input object or path that should be imported. See the algorithm specific functions for more details. |
type |
What type of data should be imported: |
... |
Further arguments passed to the selected import algorithm function. |
importFeatures is a generic function that will import features by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as importFeaturesXCMS3 and importFeaturesTable. While these
functions may be called directly, importFeatures provides a generic interface and is therefore usually preferred.
An object of a class which is derived from features.
The features output class and its methods and the algorithm specific functions:
importFeaturesXCMS, importFeaturesXCMS3, importFeaturesKPIC2, importFeaturesTable, importFeaturesEnviMass
findFeatures to find new features.
Imports features from a project generated by the enviMass package.
importFeaturesEnviMass(input, analysisInfo)
input |
The path of the enviMass project. |
analysisInfo |
A |
This function imports data from enviMass. This function is called when calling importFeatures with
type="envimass".
An object of a class which is derived from features.
This functionality has only been tested with older versions of enviMass.
importFeatures for more details and other algorithms.
Imports feature data generated by the KPIC2 package.
importFeaturesKPIC2(input, analysisInfo)
input |
A |
analysisInfo |
A |
This function imports data from KPIC2. This function is called when calling importFeatures with
type="kpic2".
An object of a class which is derived from features.
Ji H, Zeng F, Xu Y, Lu H, Zhang Z (2017). “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry, 89(14), 7631–7640. doi:10.1021/acs.analchem.7b01547.
importFeatures for more details and other algorithms.
This function imports features from a table, which can be either a data frame or a path to a file with tabular
data (e.g. ‘.csv’).
importFeaturesTable(input, analysisInfo, addCols = NULL)
input |
The input to be imported: either a |
analysisInfo |
A |
addCols |
A |
This function imports data from a table. This function is called when calling importFeatures with
type="table".
This function can be used to import features from a table generated by the
as.data.table (and as.data.frame) function, or the output from
any other feature detection software. The column format mostly follows the format used by the as.data.table
function.
The following columns must be present:
analysis: the analysis name (corresponding to the analysis information).
ret: the retention time of the feature (in seconds).
mz: the m/z value of the feature.
intensity: the peak intensity of the feature.
The following columns are optional, but usually recommended:
ID: the feature ID. Should be unique across the features from the same analysis. If not present, a
sequential ID is automatically generated.
area: the peak area of the feature. If not present, it is calculated as 2.5*intensity.
retmin and retmax: the minimum and maximum retention time range of the feature. If not present,
they are derived by subtraction or addition of ret by defaultLim("retention", "narrow") (see
limits).
mzmin and mzmax: the minimum and maximum m/z range of the feature. If not present, they are
derived by subtraction or addition of mz by defaultLim("mz", "narrow") (see limits).
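The column rules above can be illustrated with a small base R sketch that checks the mandatory columns and fills in the optional ones. `fillFeatureDefaults` is a hypothetical helper; `retWindow` and `mzWindow` stand in for defaultLim("retention", "narrow") and defaultLim("mz", "narrow"), and their values here are illustrative. Numbering IDs per analysis is an assumption of this sketch.

```r
# Hypothetical helper that fills optional columns following the rules above
fillFeatureDefaults <- function(tab, retWindow, mzWindow) {
    required <- c("analysis", "ret", "mz", "intensity")
    missingCols <- setdiff(required, names(tab))
    if (length(missingCols) > 0)
        stop("missing mandatory columns: ", paste(missingCols, collapse = ", "))
    has <- function(col) col %in% names(tab)
    if (!has("ID"))  # sequential ID, numbered within each analysis
        tab$ID <- ave(seq_len(nrow(tab)), tab$analysis, FUN = seq_along)
    if (!has("area")) tab$area <- 2.5 * tab$intensity
    if (!has("retmin")) tab$retmin <- tab$ret - retWindow
    if (!has("retmax")) tab$retmax <- tab$ret + retWindow
    if (!has("mzmin")) tab$mzmin <- tab$mz - mzWindow
    if (!has("mzmax")) tab$mzmax <- tab$mz + mzWindow
    tab
}

# Made-up table with only the mandatory columns
tab <- data.frame(
    analysis = c("a1", "a1", "a2"),
    ret = c(120, 300, 121),
    mz = c(215.093, 301.141, 215.0932),
    intensity = c(1e4, 5e3, 2e4)
)
full <- fillFeatureDefaults(tab, retWindow = 3, mzWindow = 0.002)
print(full)
```

Supplying real area and peak range columns is preferable where available, since the derived values are only rough approximations.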
If a mobility column is present, it is assumed that the features are from an IMS
workflow. In this case:
mobility: specifies the mobility of the feature.
mobmin and mobmax: the minimum and maximum mobility range of the feature. If not present,
they are derived by subtraction or addition of mobility by defaultLim("mobility", "narrow") (see
limits). (optional)
mob_area and mob_intensity: the peak area and intensity of the peak in the mobilogram for the
feature. (optional)
ims_precursor_ID: the ID of the IMS precursor in a post mobility
assignment workflow. (optional)
mob_assign_method: a string that names the method used to assign the mobility to the feature.
(optional)
If a set column is present, it is assumed that the features are from a sets workflow.
In this case the following columns are recognized:
set: specifies the set name in which the feature is present.
mz and ion_mz: specify the neutral mass and ionized m/z of the feature,
respectively.
adduct: specifies the adduct of the feature (generic textual format), e.g.
"[M+H]+".
An object derived from the class featuresSet (if the imported features are from a
sets workflow) or features object otherwise.
The "set" column in the analysis information is not used when importing
data (this column is present when obtaining the analysis information from a sets object with
analysisInfo).
importFeatures for more details and other algorithms.
importFeatureGroupsTable to import feature group data from a table.
as.data.table for converting features to a data.table format.
findFeatures to generate feature data.
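The column requirements above can be illustrated with a small base R sketch. It builds a table with the four mandatory columns and then fills in the optional ones by following the documented fallbacks. The helper fillOptionalColumns is hypothetical and only mimics the described behaviour; it is not part of patRoon. The tolerances are assumed to equal the narrow values of the default limits file (6 seconds and 0.002 m/z).

```r
# Minimal feature table with the four mandatory columns.
feat <- data.frame(
    analysis = c("sample-1", "sample-1", "sample-2"),
    ret = c(120.5, 310.2, 121.0),            # retention time in seconds
    mz = c(181.0707, 255.2319, 181.0709),
    intensity = c(15000, 8200, 14100),
    stringsAsFactors = FALSE
)

# Assumed tolerances: the "narrow" values of the default limits file.
retNarrow <- 6
mzNarrow <- 0.002

# Hypothetical helper that mimics the documented fallbacks for optional columns.
fillOptionalColumns <- function(tab, retTol, mzTol) {
    if (!"ID" %in% names(tab))      # sequential ID per analysis
        tab$ID <- ave(seq_len(nrow(tab)), tab$analysis, FUN = seq_along)
    if (!"area" %in% names(tab))    # documented fallback: 2.5 * intensity
        tab$area <- 2.5 * tab$intensity
    if (!"retmin" %in% names(tab)) tab$retmin <- tab$ret - retTol
    if (!"retmax" %in% names(tab)) tab$retmax <- tab$ret + retTol
    if (!"mzmin" %in% names(tab)) tab$mzmin <- tab$mz - mzTol
    if (!"mzmax" %in% names(tab)) tab$mzmax <- tab$mz + mzTol
    tab
}

filled <- fillOptionalColumns(feat, retNarrow, mzNarrow)
print(filled)

# With patRoon installed and an analysis information table at hand, the
# original table could then be imported (not run here):
# fList <- patRoon::importFeaturesTable(feat, analysisInfo)
```

Only the mandatory columns need to be supplied to importFeaturesTable; the sketch merely shows what the derived columns would look like under the stated assumptions.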
Imports feature data generated with the legacy xcmsSet function from the xcms package.
importFeaturesXCMS(input, analysisInfo)
input |
An |
analysisInfo |
A |
This function imports data from XCMS. This function is called when calling importFeatures with
type="xcms".
An object of a class which is derived from features.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
importFeatures for more details and other algorithms.
Imports feature data generated from an existing XCMSnExp object generated by the xcms package.
importFeaturesXCMS3(input, analysisInfo)
input |
An |
analysisInfo |
A |
This function imports data from XCMS3. This function is called when calling importFeatures with
type="xcms3".
An object of a class which is derived from features.
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
BIOINFORMATICS, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith, C.A., Want, E.J., O'Maille, G., Abagyan,R., Siuzdak, G. (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.
importFeatures for more details and other algorithms.
Automatically installs the C3SDB Python package
installC3SDB(envname = "patRoon-C3SDB", clearEnv = FALSE, ...)
envname |
The name of the virtual |
clearEnv |
Set to |
... |
Further arguments passed to |
This function uses reticulate to install the C3SDB
Python package in a virtual environment.
Ross DH, Cho JH, Xu L (2020). “Breaking Down Structural Diversity for Comprehensive Prediction of Ion-Neutral Collision Cross Sections.” Analytical Chemistry, 92(6), 4548–4557. ISSN 1520-6882. doi:10.1021/acs.analchem.9b05772. http://dx.doi.org/10.1021/acs.analchem.9b05772.
Automatically installs the TIMSCONVERT Python package
installTIMSCONVERT(envname = "patRoon-TIMSCONVERT", clearEnv = FALSE, ...)
envname |
The name of the virtual Python environment to install |
clearEnv |
Set to |
... |
Further arguments passed to |
This function uses reticulate to install the TIMSCONVERT Python package in a virtual environment.
Luu GT, Freitas MA, Lizama-Chamu I, McCaughey CS, Sanchez LM, Wang M (2022). “TIMSCONVERT: a workflow to convert trapped ion mobility data to open data formats.” Bioinformatics, 38(16), 4046–4047. ISSN 1367-4811. doi:10.1093/bioinformatics/btac419. http://dx.doi.org/10.1093/bioinformatics/btac419.
Converts a features object to a KPIC object.
getPICSet(obj, ...)
## S4 method for signature 'features'
getPICSet(obj, loadRawData = TRUE, IMS = FALSE, EICParams = getDefEICParams(window = 0))
## S4 method for signature 'featuresKPIC2'
getPICSet(obj, ...)
obj |
The |
... |
Ignored |
loadRawData |
Set to |
IMS |
(IMS workflow) Specifies which feature groups are considered for export in IMS workflows. The following options are valid:
This should be kept |
EICParams |
A named |
The conversion process will introduce some dummy values for metadata not present in patRoon objects. If the
features object was generated with KPIC2 and no post mobility assignment was performed, then no
conversion is performed and the original KPIC2 object will be returned.
The raw data interface of patRoon is used by getPICSet to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
Launches a Shiny app for generating and plotting EICs and EIMs.
## S4 method for signature 'data.frame'
launchEICGUI(obj, suspects = NULL, adduct = NULL)
## S4 method for signature 'features'
launchEICGUI(obj)
## S4 method for signature 'featureGroups'
launchEICGUI(obj)
obj |
For data.frame method: analysis information data.frame. For other methods: features/featureGroups object. |
suspects |
Optional suspect list data.frame (data.frame method only). |
adduct |
An |
These tools were originally developed as internal tools to debug and explore IMS related algorithms. They were mostly generated with LLMs, hence, may contain nonsense and bugs. Nevertheless, they may be useful for users to explore EICs and EIMs interactively. Please use with caution and report any issues.
A Shiny app object.
Get and configure default limits and tolerances used by patRoon.
defaultLim(category, level)
getLimIMS()
genLimitsFile(out = "limits.yml", IMS = "bruker")
category, level
|
The category and level of the limit to be returned. See the detail sections below. For mobility
related limits, the |
out |
The full output path for the new file. |
IMS |
The IMS instrument type. This sets the |
Tolerances for retention times, m/z values and other numerical limits and tolerances are widely used in patRoon to process data. Their defaults used to be hardcoded directly as defaults to function arguments. Since version 3.0 the defaults are centralized in a ‘limits YAML’ file. This simplifies their configuration and makes it easier to switch the defaults between different HRMS instruments.
The limits configuration file is taken from one of the following locations (in order):
the path specified in the patRoon.path.limits option (if specified)
the ‘limits.yml’ file in the current working directory (if present)
the default ‘limits.yml’ file embedded in patRoon
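The lookup order above can be sketched in base R as follows. The function resolveLimitsFile is hypothetical and is not the routine used by patRoon; it returns NA where patRoon would fall back to its embedded defaults, since the location of that embedded file is not stated here.

```r
# Hypothetical sketch of the documented lookup order for the limits file.
resolveLimitsFile <- function(wd = getwd()) {
    # 1. the path given by the patRoon.path.limits option, if set
    optPath <- getOption("patRoon.path.limits")
    if (!is.null(optPath) && nzchar(optPath))
        return(optPath)
    # 2. a limits.yml file in the working directory, if present
    localPath <- file.path(wd, "limits.yml")
    if (file.exists(localPath))
        return(localPath)
    # 3. placeholder for the defaults embedded in patRoon
    NA_character_
}

print(resolveLimitsFile())
```

Setting the option, e.g. options(patRoon.path.limits = "path/to/limits.yml"), would take precedence over a file in the working directory in this sketch.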
defaultLim returns the limits for a specific category and tolerance level.
getLimIMS returns the type of IMS instrument specified in the limits configuration file. This is used
to determine which mobility limits to use, and is also used in some other functions to determine the default
behavior for IMS related processing.
genLimitsFile generates a new ‘limits.yml’ configuration file in the specified path. The file is
created with the defaults embedded in patRoon (see details below). Generating a custom limits file is primarily
useful for non-Bruker IMS workflows.
The limits are configured in a simple ‘YAML’ file. A brief summary of the format is given here.
The general section has the IMS field which specifies the type of instrument used. Currently this
is either bruker or agilent.
The next sections describe absolute and relative (suffixed by _rel) limits and tolerances for retention,
m/z, mobility and CCS data. These are divided into several tolerance levels: very_narrow,
narrow, medium and wide. In general, the very narrow tolerances are used to compare
data that should be equivalent, but may be slightly different due to e.g. small rounding errors. The
narrow tolerances are generally sufficient when only small deviations are expected, e.g. comparing
different m/z values from well calibrated HRMS instruments. The medium tolerances are generally used
when a slightly larger tolerance is needed, e.g. when comparing m/z values from raw (non-averaged) spectra. The
wide values are mainly used for plot limits, e.g. to provide a reasonable zoom-out. The sections only
define values for the tolerance levels actually used in patRoon.
The mobility_bruker and mobility_agilent sections specify the default mobility limits for Bruker and
Agilent systems, respectively. Which of the two is used is set by the IMS field in the general section.
To see which limits are used in which functions, please refer to the Usage section of the respective
functions, specifically how the defaultLim function is used to assign function argument defaults.
NOTE: the choice between using the narrow and medium tolerance is not always clear, and
there is some inconsistency in its use throughout patRoon (primarily due to legacy code or keeping defaults
from external algorithms).
general:
  version: 1
  IMS: bruker
retention:
  very_narrow: 2
  narrow: 6
  medium: 12
  wide: 30
mz:
  very_narrow: 0.001
  narrow: 0.002
  medium: 0.005
  wide: 0.02
  narrow_rel: 5
  medium_rel: 10
mobility_bruker:
  very_narrow: 0.01
  narrow: 0.02
  medium: 0.04
  wide: 0.2
  medium_rel: 0.05
mobility_agilent:
  very_narrow: 0.1
  narrow: 0.2
  medium: 0.4
  wide: 2
  medium_rel: 0.05
CCS:
  medium: 10
  medium_rel: 0.05
Most of the defaults were derived with Bruker TOF HRMS instrumentation in mind, but should be reasonable with minor adjustments for others.
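To make the relation between categories, tolerance levels and the IMS setting more concrete, the following base R sketch stores the default values listed above in a nested list and looks them up. The function lookupLimit is hypothetical and only approximates what defaultLim is described to do; in particular, the mapping of the "mobility" category to the instrument-specific section is an assumption based on the description of the general section.

```r
# Default limits as listed above, represented as a nested list.
limits <- list(
    general = list(version = 1, IMS = "bruker"),
    retention = list(very_narrow = 2, narrow = 6, medium = 12, wide = 30),
    mz = list(very_narrow = 0.001, narrow = 0.002, medium = 0.005, wide = 0.02,
              narrow_rel = 5, medium_rel = 10),
    mobility_bruker = list(very_narrow = 0.01, narrow = 0.02, medium = 0.04,
                           wide = 0.2, medium_rel = 0.05),
    mobility_agilent = list(very_narrow = 0.1, narrow = 0.2, medium = 0.4,
                            wide = 2, medium_rel = 0.05),
    CCS = list(medium = 10, medium_rel = 0.05)
)

# Hypothetical lookup that mimics the described behaviour of defaultLim().
lookupLimit <- function(lim, category, level) {
    # assumption: mobility limits are taken from the section of the configured instrument
    if (category == "mobility")
        category <- paste0("mobility_", lim$general$IMS)
    val <- lim[[category]][[level]]
    if (is.null(val))
        stop("No limit defined for category '", category, "' and level '", level, "'")
    val
}

print(lookupLimit(limits, "mz", "narrow"))
print(lookupLimit(limits, "mobility", "medium"))

# Switching the instrument type changes which mobility section is used.
limitsAgilent <- limits
limitsAgilent$general$IMS <- "agilent"
print(lookupLimit(limitsAgilent, "mobility", "medium"))
```

Levels that are not defined for a category (e.g. a narrow CCS limit) raise an error in this sketch, which mirrors the statement that sections only define the levels actually used.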
Loads, parses, verifies and curates MS library data, e.g. obtained from MassBank.
loadMSLibrary(file, algorithm, ...)
file |
A |
algorithm |
A character string describing the algorithm that should be
used: |
... |
Any parameters to be passed to the selected MS library loading algorithm. |
loadMSLibrary is a generic function that loads MS library data by one of the supported algorithms. The actual
functionality is provided by algorithm specific functions such as loadMSLibraryMSP and loadMSLibraryMoNAJSON. While these
functions may be called directly, loadMSLibrary provides a generic interface and is therefore usually preferred.
A MSLibrary object containing the loaded library data.
The MSLibrary output class and its methods and the algorithm specific functions:
loadMSLibraryMSP, loadMSLibraryMoNAJSON
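As a small illustration of the generic interface, the sketch below picks the algorithm argument from the file extension. The helper guessLibraryAlgorithm and the file names are hypothetical; only the algorithm names "msp" and "json" are taken from this documentation.

```r
# Hypothetical helper: derive the loadMSLibrary() algorithm from the file extension.
guessLibraryAlgorithm <- function(file) {
    ext <- tolower(tools::file_ext(file))
    switch(ext,
           msp = "msp",
           json = "json",
           stop("Unsupported MS library file extension: ", ext))
}

print(guessLibraryAlgorithm("MassBank_NIST.msp"))
print(guessLibraryAlgorithm("MoNA-export.json"))

# With patRoon installed and a library file available (not run here):
# libFile <- "MassBank_NIST.msp"
# mslib <- patRoon::loadMSLibrary(libFile, algorithm = guessLibraryAlgorithm(libFile))
```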
This function loads, verifies and curates MS library data from MoNA ‘.json’ files.
loadMSLibraryMoNAJSON(file, prefCalcChemProps = TRUE, neutralChemProps = FALSE, potAdducts = TRUE, potAdductsLib = TRUE, absMzDev = defaultLim("mz", "narrow"), calcSPLASH = TRUE)
file |
A |
prefCalcChemProps |
If |
neutralChemProps |
If |
potAdducts, potAdductsLib
|
If and how missing adducts (
|
absMzDev |
The maximum absolute m/z deviation when guessing missing adducts. |
calcSPLASH |
If set to |
This function uses an efficient C++ JSON loader to load MS library data. This function is called when calling loadMSLibrary with
algorithm="json".
This function uses C++ with Rcpp and rapidjsonr to efficiently load and parse
JSON files from MoNA. An advantage compared to
loadMSLibraryMSP is that this function supports loading spectral annotations.
The record field names are converted to those used in ‘.msp’ files.
The loaded data is returned in an MSLibrary object.
Several strategies are applied to automatically verify and improve
library data. This is important, since library records may have inconsistent or erroneous data, which makes them
unsuitable in automated workflows such as compounds annotation with generateCompoundsLibrary.
The loaded library data is post-treated as follows:
The DB# field is renamed to DB_ID to improve compatibility with R column names.
Synonyms (Synon fields) are merged together, mainly to save memory usage.
Inconsistently formatted NA data (e.g. "n/a", "N/A" or empty strings) are set to
regular R NA values.
The case of record field names is made consistent.
The Formula and ExactMass fields are renamed to formula and neutralMass,
respectively. This is for consistency with other data generated with patRoon.
character field data is trimmed from leading/trailing whitespace.
Mass data is verified to be properly numeric, and set to NA otherwise.
The format of formulae data is made consistent: ionic species (with or without square brackets) are converted to a regular formula format.
Chemical identifiers such as SMILES and formulae are verified and missing values are calculated if possible. See below for more details.
Shortened data in the Ion_mode field (P/N) is converted to the long format
(POSITIVE/NEGATIVE).
Many different adduct flavors typically found as Precursor_type data are converted and normalized to
the generic textual format used by patRoon (see as.adduct).
If potAdducts!=FALSE then missing or invalid adduct data in Precursor_type is guessed based on
the difference between the neutral and ionic mass. If multiple adducts explain the mass difference the result is
NA.
Missing ion m/z data (PrecursorMZ field) is calculated from adduct data, if possible.
Missing SPLASH data is calculated with the splashR package
if calcSPLASH=TRUE.
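A few of the post-treatment steps listed above can be sketched in base R on a toy table. The function curateRecords and the example records are hypothetical; this is a simplified illustration of the described clean-up (whitespace trimming, NA normalization, field renaming, numeric verification and Ion_mode expansion), not the C++ code used by patRoon.

```r
# Toy library records with typical inconsistencies.
records <- data.frame(
    Name = c("  Caffeine", "Atrazine ", "Diclofenac"),
    Formula = c("C8H10N4O2", "n/a", "C14H11Cl2NO2"),
    ExactMass = c("194.0804", "215.0938", "N/A"),
    Ion_mode = c("P", "POSITIVE", "N"),
    stringsAsFactors = FALSE
)

# Hypothetical helper mimicking some of the documented curation steps.
curateRecords <- function(recs) {
    charCols <- names(recs)[vapply(recs, is.character, logical(1))]
    for (col in charCols) {
        vals <- trimws(recs[[col]])                        # trim whitespace
        vals[vals %in% c("", "n/a", "N/A")] <- NA_character_  # normalize NA data
        recs[[col]] <- vals
    }
    # rename fields for consistency with other patRoon data
    names(recs)[names(recs) == "Formula"] <- "formula"
    names(recs)[names(recs) == "ExactMass"] <- "neutralMass"
    # verify that mass data is numeric, NA otherwise
    recs$neutralMass <- suppressWarnings(as.numeric(recs$neutralMass))
    # expand shortened ion modes
    shortModes <- c(P = "POSITIVE", N = "NEGATIVE")
    isShort <- recs$Ion_mode %in% names(shortModes)
    recs$Ion_mode[isShort] <- unname(shortModes[recs$Ion_mode[isShort]])
    recs
}

curated <- curateRecords(records)
print(curated)
```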
Chemical properties such as SMILES, InChIKey and formulae in the MS library are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
Guessing adducts from neutral/ionic mass differences was inspired by MetFrag.
Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O (2016).
“SPLASH, a hashed identifier for mass spectra.”
Nature Biotechnology, 34(11), 1099–1101.
doi:10.1038/nbt.3689.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016).
“MetFrag relaunched: incorporating strategies beyond in silico fragmentation.”
Journal of Cheminformatics, 8(1).
doi:10.1186/s13321-016-0115-9.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
loadMSLibrary for more details and other algorithms.
The MSLibrary documentation for various methods to post-process the data and
generateCompoundsLibrary for annotation of features with the library data.
This function loads, verifies and curates MS library data from MSP files.
loadMSLibraryMSP(file, parseComments = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE, potAdducts = TRUE, potAdductsLib = TRUE, absMzDev = defaultLim("mz", "narrow"), calcSPLASH = TRUE)
file |
A |
parseComments |
If |
prefCalcChemProps |
If |
neutralChemProps |
If |
potAdducts, potAdductsLib
|
If and how missing adducts (
|
absMzDev |
The maximum absolute m/z deviation when guessing missing adducts. |
calcSPLASH |
If set to |
This function uses an efficient C++ MSP loader to load MS library data. This function is called when calling loadMSLibrary with
algorithm="msp".
This function uses C++ with Rcpp to efficiently load and parse MSP files, and is mainly
optimized for loading the ‘.msp’ files from MassBank EU and
MoNA. Files from other sources may also work, any feedback on this is
welcome!
The loaded data is returned in an MSLibrary object.
Several strategies are applied to automatically verify and improve
library data. This is important, since library records may have inconsistent or erroneous data, which makes them
unsuitable in automated workflows such as compounds annotation with generateCompoundsLibrary.
The loaded library data is post-treated as follows:
The DB# field is renamed to DB_ID to improve compatibility with R column names.
Synonyms (Synon fields) are merged together, mainly to save memory usage.
Inconsistently formatted NA data (e.g. "n/a", "N/A" or empty strings) are set to
regular R NA values.
The case of record field names is made consistent.
The Formula and ExactMass fields are renamed to formula and neutralMass,
respectively. This is for consistency with other data generated with patRoon.
character field data is trimmed from leading/trailing whitespace.
Mass data is verified to be properly numeric, and set to NA otherwise.
The format of formulae data is made consistent: ionic species (with or without square brackets) are converted to a regular formula format.
Chemical identifiers such as SMILES and formulae are verified and missing values are calculated if possible. See below for more details.
Shortened data in the Ion_mode field (P/N) is converted to the long format
(POSITIVE/NEGATIVE).
Many different adduct flavors typically found as Precursor_type data are converted and normalized to
the generic textual format used by patRoon (see as.adduct).
If potAdducts!=FALSE then missing or invalid adduct data in Precursor_type is guessed based on
the difference between the neutral and ionic mass. If multiple adducts explain the mass difference the result is
NA.
Missing ion m/z data (PrecursorMZ field) is calculated from adduct data, if possible.
Missing SPLASH data is calculated with the splashR package
if calcSPLASH=TRUE.
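The adduct guessing step described above can be illustrated with a minimal sketch. The function guessAdduct and the small candidate list are assumptions made for this example (singly charged adducts only, with commonly used monoisotopic mass shifts); the candidates actually considered by patRoon depend on the potAdducts and potAdductsLib arguments. The default tolerance of 0.002 is assumed to match defaultLim("mz", "narrow").

```r
# Assumed candidate adducts with their m/z shifts relative to the neutral mass
# (singly charged ions only).
adductShifts <- c("[M+H]+" = 1.007276, "[M+NH4]+" = 18.033823,
                  "[M+Na]+" = 22.989218, "[M-H]-" = -1.007276)

# Hypothetical helper: return the adduct that explains the mass difference,
# or NA if none or more than one candidate fits within the tolerance.
guessAdduct <- function(neutralMass, ionMz, shifts = adductShifts, absMzDev = 0.002) {
    hits <- names(shifts)[abs((neutralMass + shifts) - ionMz) <= absMzDev]
    if (length(hits) == 1) hits else NA_character_
}

print(guessAdduct(180.0634, 181.0707))                 # one candidate fits
print(guessAdduct(180.0634, 190.0000))                 # no candidate fits
print(guessAdduct(180.0634, 181.0707, absMzDev = 30))  # ambiguous with a very wide tolerance
```

Returning NA for ambiguous cases follows the documented rule that the result is NA if multiple adducts explain the mass difference.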
Chemical properties such as SMILES, InChIKey and formulae in the MS library are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
The mass spectrum parser currently only supports space separated entries (MSP formally also allows other formats).
Guessing adducts from neutral/ionic mass differences was inspired by MetFrag.
Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O (2016).
“SPLASH, a hashed identifier for mass spectra.”
Nature Biotechnology, 34(11), 1099–1101.
doi:10.1038/nbt.3689.
Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S (2016).
“MetFrag relaunched: incorporating strategies beyond in silico fragmentation.”
Journal of Cheminformatics, 8(1).
doi:10.1186/s13321-016-0115-9.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
loadMSLibrary for more details and other algorithms.
The MSLibrary documentation for various methods to post-process the data and
generateCompoundsLibrary for annotation of features with the library data.
Initiate sets workflows from specified feature data.
makeSet(obj, ...)
## S4 method for signature 'features'
makeSet(obj, ..., adducts, labels = NULL)
## S4 method for signature 'featuresSet'
makeSet(obj, ...)
## S4 method for signature 'featureGroups'
makeSet(obj, ..., groupAlgo, groupArgs = NULL, verbose = TRUE, adducts = NULL, labels = NULL)
## S4 method for signature 'featureGroupsSet'
makeSet(obj, ...)
obj, ...
|
|
adducts |
The adduct assignments to each set. Should either be a For the |
labels |
The labels, or set names, for each set to be created. The order should follow that of the
objects given to the |
groupAlgo |
The name of the feature grouping algorithm. See the |
groupArgs |
A |
verbose |
If set to |
The makeSet method function is used to initiate a sets workflow. The features from
input objects are combined and then neutralized by replacing their m/z values by neutral monoisotopic
masses. After neutralization features measured with e.g. different ionization polarities can be grouped since
their neutral mass will be the same.
The analysis information for this object is updated with all analyses, and a set
column is added to designate the set of each analysis. Note that currently, all analysis names must be
unique across different sets.
makeSet supports two types of input:
features objects: makeSet combines the input objects into a featuresSet object,
which is then grouped in the 'usual way' with groupFeatures.
featureGroups objects: In this case the features from the input objects are first neutralized and
feature groups between sets are then combined with groupFeatures.
The advantage of the featureGroups method is that it preserves any adduct annotations already present
(e.g. as set by selectIons or adducts<-). Furthermore, this approach allows more advanced
workflows where the input featureGroups are first pre-treated with e.g. filter before the sets object
is made. On the other hand, the features method is easier, as it doesn't require intermediate feature grouping
steps and is often sufficient since adduct annotations can be made afterwards with selectIons/adducts<-
and most filter operations do not need to be done per individual set.
The adduct information used for feature neutralization is specified through the adducts argument.
Alternatively, when the featureGroups method of makeSet is used, then the adduct annotations already
present in the input objects can also be used by setting adducts=NULL. The adduct information is also used to
add adduct annotations to the output of makeSet.
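The idea behind neutralization can be shown with a small base R sketch. The function neutralize and the example values are hypothetical, and only two singly charged adducts are considered; it is meant to show why features from different polarities become comparable, not how makeSet is implemented.

```r
# Assumed m/z shifts of two singly charged adducts relative to the neutral mass.
shifts <- c("[M+H]+" = 1.007276, "[M-H]-" = -1.007276)

# Toy features of the same compound measured in positive and negative mode.
pos <- data.frame(mz = c(181.0707, 256.2635), adduct = "[M+H]+", stringsAsFactors = FALSE)
neg <- data.frame(mz = c(179.0561, 301.1234), adduct = "[M-H]-", stringsAsFactors = FALSE)

# Hypothetical helper: keep the ionized m/z in ion_mz and replace mz by the neutral mass.
neutralize <- function(tab, shifts) {
    tab$ion_mz <- tab$mz
    tab$mz <- tab$ion_mz - unname(shifts[tab$adduct])
    tab
}

posN <- neutralize(pos, shifts)
negN <- neutralize(neg, shifts)

# The first feature of each set now has (nearly) the same neutral mass.
print(posN)
print(negN)
print(abs(posN$mz[1] - negN$mz[1]))
```

After this step the first positive and negative feature differ by far less than a typical m/z tolerance, so a grouping algorithm could place them in the same feature group.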
Either a featuresSet object (features method) or featureGroupsSet object
(featureGroups method).
Initiating a sets workflow recursively, i.e. with featuresSet or featureGroupsSet objects
as input, is currently not supported.
Conversion of MS analysis files between several open and closed data formats.
getMSConversionTypes(algorithm, direction)
getMSConversionFormats(algorithm, direction, type = NULL)
convertMSFilesPWiz(inFiles, outFiles, formatTo = "mzML", centroid = TRUE, IMS = FALSE, minIntensity = 0, filters = NULL, extraOpts = NULL, PWizBatchSize = 1)
convertMSFilesOpenMS(inFiles, outFiles, formatTo = "mzML", extraOpts = NULL)
convertMSFilesBruker(inFiles, outFiles, formatTo = "mzML", centroid = TRUE)
convertMSFilesIMSCollapse(inFiles, outFiles, typeFrom, formatTo = "mzML", mzRange = NULL, mobilityRange = NULL, smoothWindow = 0, halfWindow = 2, maxGap = 0.005, clusterMethod = "distance_mean", mzWindow = defaultLim("mz", "medium"), minIntensityIMS = 0, includeMSMS = FALSE, ...)
convertMSFilesTIMSCONVERT(inFiles, outFiles, formatTo = "mzML", centroid = TRUE, centroidRaw = FALSE, IMS = FALSE, extraOpts = NULL, virtualenv = "patRoon-TIMSCONVERT")
convertMSFilesPaths(files, formatFrom, formatTo = "mzML", outPath = NULL, dirs = TRUE, overwrite = FALSE, algorithm = "pwiz", ...)
convertMSFiles(anaInfo, typeFrom = "raw", typeTo = "centroid", formatFrom, formatTo = "mzML", overwrite = FALSE, algorithm = "pwiz", centroidVendor = TRUE, ...)
algorithm |
Either |
direction |
A |
type, typeFrom, typeTo
|
The type of the input or output files. See |
inFiles, outFiles
|
A |
centroid |
Set to For |
IMS |
How to handle IMS data. For For |
minIntensity |
The minimum intensity of the mass peaks to be kept. Applying an intensity threshold is especially beneficial to reduce export file size when there are a lot of zero or very low intensity mass peaks. NOTE this currently does not work well with IMS data. |
filters |
A |
extraOpts |
A |
PWizBatchSize |
The number of analyses to process by a single call to |
mzRange, mobilityRange
|
A two sized vector specifying the m/z and mobility range to be exported, respectively.
Set to |
smoothWindow, halfWindow, maxGap
|
Centroiding parameters: see |
clusterMethod, mzWindow
|
The clustering method and window (see clustering parameters) used to find and combine MS/MS spectra of precursors with close m/z. |
minIntensityIMS |
The minimum intensity for MS peaks in raw data. |
includeMSMS |
Set to |
... |
For For |
centroidRaw |
Only applicable if |
virtualenv |
The virtual Python environment in which |
files, dirs
|
The |
formatFrom, formatTo
|
The input or output format. See |
outPath |
A character vector specifying directories that should be used for the output. Will be re-cycled if
necessary. If |
overwrite |
Should existing destination file be overwritten ( |
anaInfo |
An analysis info table that is used to retrieve the input files. The
paths set by |
centroidVendor |
Only for |
getMSConversionTypes returns a character with all supported input or output conversion types
for an algorithm.
getMSConversionFormats returns a character with all supported input or output conversion
formats for an algorithm, optionally filtered by the given type.
convertMSFilesPWiz converts and pre-treats HRMS data with the msconvert tool from
ProteoWizard.
convertMSFilesOpenMS converts HRMS data with the FileConvert tool of
OpenMS.
convertMSFilesBruker converts and pre-treats Bruker HRMS data with Bruker DataAnalysis. Note that
TIMS data currently is not supported.
convertMSFilesIMSCollapse is used to convert IMS data to data that mimics 'regular' HRMS data by
collapsing the IMS dimension. The raw data interface of patRoon first sums up all spectra within each IMS
frame, performs centroiding and finally exports the resulting data with the
mzR::writeMSData function. Several thresholds can be set to speed up the conversion
process and reduce noise, but care should be taken that no mass peaks of interest are lost.
convertMSFilesTIMSCONVERT converts and pre-treats TIMS data with
TIMSCONVERT. The installTIMSCONVERT function can be used
to automatically install TIMSCONVERT.
convertMSFilesPaths is a wrapper function that simplifies the use of algorithm specific MS conversion
functions, such as convertMSFilesPWiz and convertMSFilesTIMSCONVERT.
convertMSFiles is a wrapper function that simplifies the use of convertMSFilesPaths.
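As a rough illustration of the wrapper functions described above, the following sketch prepares a call to convertMSFilesPaths using only arguments from its usage. The directory names (‘raw_data’, ‘converted’) and the "thermo" input format are assumptions made for this example; getMSConversionFormats can be used to check which formats an algorithm actually supports. The call is only made when patRoon is installed and the input directory exists.

```r
# Arguments follow the documented usage of convertMSFilesPaths().
# The paths and the "thermo" format are hypothetical example values.
convArgs <- list(
  files = "raw_data",      # directory with vendor files (assumed)
  formatFrom = "thermo",   # assumed input format
  formatTo = "mzML",
  outPath = "converted",   # assumed output directory
  dirs = TRUE,
  overwrite = FALSE,
  algorithm = "pwiz"
)

if (requireNamespace("patRoon", quietly = TRUE) && dir.exists(convArgs$files)) {
  do.call(patRoon::convertMSFilesPaths, convArgs)
} else {
  message("patRoon or the input directory is not available; conversion skipped")
}
```

When an analysis info table is available, convertMSFiles can be called instead with the anaInfo argument, so that the input paths do not need to be specified manually.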
convertMSFilesPWiz, convertMSFilesOpenMS and convertMSFilesTIMSCONVERT use multiprocessing to parallelize
computations. Please see the parallelization section in the handbook for
more details and patRoon options for configuration
options.
The raw data interface of patRoon is used by convertMSFilesIMSCollapse to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
Rost HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich H, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmstrom L, Aebersold R, Reinert K, Kohlbacher O (2016).
“OpenMS: a flexible open-source software platform for mass spectrometry data analysis.”
Nature Methods, 13(9), 741–748.
doi:10.1038/nmeth.3959.
Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman N, Shulman N, Frewen B, Baker TA, Brusniak M, Paulse C, Creasy D, Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre B, Kuhlmann F, Roark J, Paape R, Suckau D, Hemenway T, Huhmer A, Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW, Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, Mallick P (2012).
“A cross-platform toolkit for mass spectrometry and proteomics.”
Nature Biotechnology, 30(10), 918–920.
doi:10.1038/nbt.2377.
Luu GT, Freitas MA, Lizama-Chamu I, McCaughey CS, Sanchez LM, Wang M (2022).
“TIMSCONVERT: a workflow to convert trapped ion mobility data to open data formats.”
Bioinformatics, 38(16), 4046–4047.
ISSN 1367-4811.
doi:10.1093/bioinformatics/btac419.
http://dx.doi.org/10.1093/bioinformatics/btac419.
Keller A, Eng J, Zhang N, Li X, Aebersold R (2005).
“A uniform proteomics MS/MS analysis platform utilizing open XML file formats.”
Mol Syst Biol.
Kessner D, Chambers M, Burke R, Agus D, Mallick P (2008).
“ProteoWizard: open source software for rapid proteomics tools
development.”
Bioinformatics, 24(21), 2534–2536.
doi:10.1093/bioinformatics/btn323.
Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Rompp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz P, Deutsch EW (2010).
“mzML - a Community Standard for Mass Spectrometry Data.”
Mol Cell Proteomics.
doi:10.1074/mcp.R110.000133.
Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R (2004).
“A common open representation of mass spectrometry data and its
application to proteomics research.”
Nat Biotechnol, 22(11), 1459–1466.
doi:10.1038/nbt1031.
Description, configuration and utilities for the raw (IMS-)HRMS data interface of patRoon.
availableBackends( anaInfo = NULL, needTypes = NULL, checkOption = TRUE, verbose = TRUE )
anaInfo |
Optional. If not |
needTypes |
Only applicable if |
checkOption |
Set to |
verbose |
Set to |
Version 3.0 of patRoon introduced an extensible and highly optimized interface to read raw data from HRMS and
IMS-HRMS instruments. This interface supports selectable 'backends' which perform the reading of file data from
various formats. Subsequent steps such as the formation of extracted ion chromatograms, mobilograms and collection
and averaging of mass spectra are then performed in patRoon. The interface is largely coded in C++ (using
Rcpp), uses OpenMP parallelization and applies several other optimization
strategies to make it suitable to rapidly process large amounts of raw data, e.g. as encountered in IMS-HRMS
workflows.
The following backends for reading (IMS-)HRMS data are currently available:
"opentims": uses the OpenTIMS library to read Bruker
TIMS data. This backend supports very fast reading of raw instrument ‘.d’ data files directly, and therefore
does not require any file conversions. This backend only supports 64 bit Windows and Linux systems. See the
Backend installation section below for installation instructions.
"mzr": uses the mzR package to read ‘.mzML’ and ‘.mzXML’ files. This package was
more or less the default in patRoon prior to 3.0, and due to its popularity and age is a stable and well
tested option. The mzr backend currently reads the complete analysis file at once, which makes it more RAM
intensive compared to other backends. The read data is cached to speed up any subsequent operations that require
the file data. This backend currently does not support IMS data. Since mzR is a dependency of patRoon,
no additional installation is necessary.
"mstoolkit": Uses the MSToolKit C++ library to read
‘.mzML’ and ‘.mzXML’ files, including IMS-HRMS data. The MSToolKit library has been developed
for many years, and was recently updated with IMS-HRMS support. See the Backend installation section below
for installation instructions.
"streamcraft": Uses the StreamCraft C++ library to
read ‘.mzML’ and ‘.mzXML’ files, including IMS-HRMS data. The StreamCraft library is still
young and somewhat experimental. The library is integrated within patRoon and therefore does not require any
further installation.
The availableBackends function is used to query the available backends on the system.
availableBackends returns (invisibly) a character vector with the names of the available
backends.
The following package options influence the behavior of the raw data interface:
patRoon.MS.backends: A character vector with the names of the backends that may be chosen. The
default is all backends. The first backend is chosen that is available, is able to read at least one of the
available analysis file types and formats (as configured by the analysis information)
and supports IMS if needed.
patRoon.MS.preferIMS: A logical value that indicates whether the IMS data should be preferred,
even if the processing step does not require IMS data and non-IMS data is also available. Setting this to
TRUE will probably result in some additional computational overhead, but may avoid any inconsistencies between
the IMS data and non-IMS data that may have been introduced during the conversion step of the latter. This option
is only relevant for the mstoolkit and streamcraft backends (and only if one of these backends is actually
used).
patRoon.threads: An integer value that indicates the number of threads to use for
parallelization (multithreading). The default is determined from the number of physical cores of the system
(obtained with the parallel::detectCores function).
patRoon.path.TDFSDK: The file path to the Bruker TDF-SDK library file. See the
Backend installation section below.
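The options above are regular R options, so they can be set with options(). The following is a minimal sketch; the chosen backend order and thread count are arbitrary example values, not recommendations. The availableBackends call is guarded, since it assumes that patRoon 3.0 or later is installed.

```r
# Example values only: prefer the mstoolkit backend, fall back to mzr,
# do not prefer IMS data and use four threads.
old <- options(
  patRoon.MS.backends = c("mstoolkit", "mzr"),
  patRoon.MS.preferIMS = FALSE,
  patRoon.threads = 4L
)

# Query which backends are usable on this system (needs patRoon >= 3.0).
if (requireNamespace("patRoon", quietly = TRUE) &&
    utils::packageVersion("patRoon") >= "3.0.0") {
  backends <- patRoon::availableBackends(verbose = FALSE)
  print(backends)
}

# The previous settings can be restored afterwards with: options(old)
```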
The opentims backend requires the ‘win64/timsdata.dll’ (Windows) or
‘linux64/libtimsdata.so’ (Linux) file from the TDF-SDK from
Bruker
(requires login). The patRoonExt package makes these files automatically available for patRoon.
Otherwise the patRoon.path.TDFSDK option should be manually set to the file path of the
‘timsdata.dll’ or ‘linux64/libtimsdata.so’ file.
When patRoon is installed from source, e.g. on Linux/macOS systems or when using
remotes::install_github for installation, the Rmstoolkitlib R package
(https://github.com/rickhelmus/Rmstoolkitlib) must be
installed in advance.
The availableBackends function can be used to verify if the dependencies for these backends are met.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
Dagum L, Menon R (1998).
“OpenMP: an industry standard API for shared-memory programming.”
IEEE Computational Science and Engineering, 5(1), 46-55.
doi:10.1109/99.660313.
Łącki MK, Startek MP, Brehmer S, Distler U, Tenzer S (2021).
“OpenTIMS, TimsPy, and TimsR: Open and Easy Access to timsTOF Raw Data.”
Journal of Proteome Research, 20(4), 2122–2129.
ISSN 1535-3907.
doi:10.1021/acs.jproteome.0c00962.
http://dx.doi.org/10.1021/acs.jproteome.0c00962.
Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman N, Shulman N, Frewen B, Baker TA, Brusniak M, Paulse C, Creasy D, Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre B, Kuhlmann F, Roark J, Paape R, Suckau D, Hemenway T, Huhmer A, Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW, Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, Mallick P (2012).
“A cross-platform toolkit for mass spectrometry and proteomics.”
Nature Biotechnology, 30(10), 918–920.
doi:10.1038/nbt.2377.
Keller A, Eng J, Zhang N, Li X, Aebersold R (2005).
“A uniform proteomics MS/MS analysis platform utilizing open XML file formats.”
Mol Syst Biol.
Kessner D, Chambers M, Burke R, Agus D, Mallick P (2008).
“ProteoWizard: open source software for rapid proteomics tools
development.”
Bioinformatics, 24(21), 2534–2536.
doi:10.1093/bioinformatics/btn323.
Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Rompp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz P, Deutsch EW (2010).
“mzML - a Community Standard for Mass Spectrometry Data.”
Mol Cell Proteomics.
doi:10.1074/mcp.R110.000133.
Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R (2004).
“A common open representation of mass spectrometry data and its
application to proteomics research.”
Nat Biotechnol, 22(11), 1459–1466.
doi:10.1038/nbt1031.
Stores the spectra and metadata from the records of an MS library.
records(obj) spectra(obj) ## S4 method for signature 'MSLibrary' records(obj) ## S4 method for signature 'MSLibrary' spectra(obj) ## S4 method for signature 'MSLibrary' length(x) ## S4 method for signature 'MSLibrary' names(x) ## S4 method for signature 'MSLibrary' show(object) ## S4 method for signature 'MSLibrary,ANY,missing,missing' x[i, j, ..., drop = TRUE] ## S4 method for signature 'MSLibrary,ANY,missing' x[[i, j]] ## S4 method for signature 'MSLibrary' x$name ## S4 method for signature 'MSLibrary' as.data.table(x) ## S4 method for signature 'MSLibrary' delete(obj, i = NULL, j = NULL, ...) ## S4 method for signature 'MSLibrary' filter( obj, properties = NULL, massRange = NULL, mzRangeSpec = NULL, relMinIntensity = NULL, topMost = NULL, onlyAnnotated = FALSE, negate = FALSE ) ## S4 method for signature 'MSLibrary' convertToSuspects( obj, adduct, spectrumType = "MS2", avgSpecParams = getDefAvgPListParams(minIntensityPre = 0, minIntensityPost = 2, topMost = 10), collapse = TRUE, suspects = NULL, prefCalcChemProps = TRUE, neutralChemProps = FALSE ) ## S4 method for signature 'MSLibrary' export(obj, type = "msp", out) ## S4 method for signature 'MSLibrary,MSLibrary' merge(x, y, ...)
x, obj, object
|
|
i |
For |
... |
Unused. |
drop, j
|
ignored. |
name |
The record name (partially matched). |
properties |
A named |
massRange |
Records with a neutral mass outside this range will be removed. Should be a two-sized |
mzRangeSpec |
Similar to the |
relMinIntensity |
The minimum relative intensity (‘0-1’) of a mass peak to be kept. Set to |
topMost |
Only keep |
onlyAnnotated |
If |
negate |
If |
adduct |
An |
spectrumType |
A |
avgSpecParams |
A |
collapse |
Whether records with the same first-block InChIKey should be collapsed. See the
|
suspects |
If not |
prefCalcChemProps |
If |
neutralChemProps |
If |
type |
The export type. Currently just |
out |
The file path to the output library file. |
y |
The |
This class is used by loadMSLibrary to store the loaded MS library data.
delete returns the object for which the specified data was removed.
filter returns a filtered MSLibrary object.
convertToSuspects returns a suspect list (data.table), which can be used with
screenSuspects.
merge returns a merged MSLibrary object.
records(MSLibrary): Accessor method for the records slot of an MSLibrary class.
spectra(MSLibrary): Accessor method for the spectra slot of an MSLibrary class.
length(MSLibrary): Obtains the total number of records stored.
names(MSLibrary): Obtains the names of the stored records (DB_ID field).
show(MSLibrary): Shows summary information for this object.
x[i]: Subset on records.
x[[i]]: Extracts a spectrum table for a record.
x$name: Extracts a spectrum table for a record.
as.data.table(MSLibrary): Converts all the data (spectra and metadata) to a single data.table.
delete(MSLibrary): Completely deletes specified full records or spectra.
filter(MSLibrary): Performs rule-based filtering of records and spectra. This may be especially useful to improve
annotation with generateCompoundsLibrary.
convertToSuspects(MSLibrary): Converts the MS library data to a suspect list, which can be used with
screenSuspects. See the Suspect conversion section for details.
export(MSLibrary): Exports the library data to a ‘.msp’ file. The export is accelerated by a C++
interface with Rcpp.
merge(x = MSLibrary, y = MSLibrary): Merges two MSLibrary objects (x and y). The records from y that are
unique are added to x. Records that were already in x are simply ignored. The
SPLASH values are used to test equality between records, hence, the
calcSPLASH argument to loadMSLibrary should be TRUE.
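The merge rule described above (records from y are only added when their SPLASH is not yet present in x) can be illustrated with plain data frames. This is a simplified sketch in base R, not the actual implementation; the record identifiers and SPLASH strings are placeholders.

```r
# Toy record tables; DB_ID and SPLASH values are placeholders.
libX <- data.frame(DB_ID = c("x1", "x2"),
                   SPLASH = c("splash-A", "splash-B"),
                   stringsAsFactors = FALSE)
libY <- data.frame(DB_ID = c("y1", "y2"),
                   SPLASH = c("splash-B", "splash-C"),
                   stringsAsFactors = FALSE)

# Keep everything from x and add only the records of y with a new SPLASH.
isNew <- !libY$SPLASH %in% libX$SPLASH
merged <- rbind(libX, libY[isNew, , drop = FALSE])
rownames(merged) <- NULL

print(merged)
```

Here record y1 is ignored because its SPLASH equals that of x2, whereas y2 is added.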
records: A data.table with metadata for all records. Use the records method for access.
spectra: A list with all (annotated) spectra. Each spectrum is stored in a data.table. Use
the spectra method for access.
The convertToSuspects method converts MS library data to a suspect list, which
can be used with e.g. screenSuspects. Furthermore, this function can also amend existing
suspect lists with spectral data.
Conversion is performed with one of the following three methods:
Direct (collapse=FALSE and suspects=NULL): each record is considered a suspect, and the
resulting suspect list is generated directly by converting the records metadata. The fragments_mz column for
each suspect is constructed from the mass peaks of the corresponding record.
Collapse (collapse=TRUE and suspects=NULL): All records with the same first-block
InChIKey are first merged, and their spectra are averaged using the parameters from the
avgSpecParams argument (see getDefAvgPListParams). The suspect list is based on the merged
records, where the fragments_mz column is constructed from the averaged spectra. This is generally a good
default, especially with large MS libraries.
Amend (suspects is not NULL): only those records are considered if their first-block
InChIKey is present in the suspect list. The remaining records and their spectra are then collapsed as
described for the Collapse method, and the fragments_mz column for each suspect is set from the
averaged spectra. If a suspect is not present in the library, its fragments_mz value will be empty. Note
that any existing fragments_mz data will be overwritten.
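The grouping step that underlies the Collapse and Amend methods can be sketched in base R: records are grouped by the first block of their InChIKey, i.e. the part before the first hyphen. The snippet below only illustrates this grouping with placeholder data; the spectrum averaging performed by convertToSuspects is not reproduced here.

```r
# Toy record metadata; the InChIKeys are placeholders, not real compounds.
records <- data.frame(
  DB_ID = c("rec1", "rec2", "rec3"),
  InChIKey = c("AAAAAAAAAAAAAA-BBBBBBBBSA-N",
               "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
               "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
  stringsAsFactors = FALSE
)

# The first block is everything before the first hyphen (14 characters).
records$InChIKey1 <- sub("-.*$", "", records$InChIKey)

# Records that share a first block would be collapsed into one suspect.
collapsed <- split(records$DB_ID, records$InChIKey1)
print(collapsed)
```

In this toy data rec1 and rec2 share a first block and would therefore end up as a single suspect, while rec3 remains on its own.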
Chemical properties such as SMILES,
InChIKey and formulae in the input suspect list to convertToSuspects are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and can largely be avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel; please make sure it is installed.
export does not split any Synon data that was merged when the library was loaded.
Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O (2016).
“SPLASH, a hashed identifier for mass spectra.”
Nature Biotechnology, 34(11), 1099–1101.
doi:10.1038/nbt.3689.
Eddelbuettel D (2013).
Seamless R and C++ Integration with Rcpp.
Springer, New York.
doi:10.1007/978-1-4614-6868-4.
ISBN 978-1-4614-6867-7.
Eddelbuettel D, Balamuta J (2018).
“Extending R with C++: A Brief Introduction to Rcpp.”
The American Statistician, 72(1), 28-36.
doi:10.1080/00031305.2017.1375990.
Eddelbuettel D, François R (2011).
“Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18.
doi:10.18637/jss.v040.i08.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Ucar I, Bates D, Chambers J (2026).
Rcpp: Seamless R and C++ Integration.
R package version 1.1.1-1.1, https://www.rcpp.org.
OBoyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
Contains all MS (and MS/MS where available) peak lists for a featureGroups object.
peakLists(obj, ...) averagedPeakLists(obj, ...) spectrumSimilarity(obj, ...) spectrumSimilarityIMS(obj, ...) ## S4 method for signature 'MSPeakLists' peakLists(obj) ## S4 method for signature 'MSPeakLists' averagedPeakLists(obj) ## S4 method for signature 'MSPeakLists' analyses(obj) ## S4 method for signature 'MSPeakLists' groupNames(obj) ## S4 method for signature 'MSPeakLists' length(x) ## S4 method for signature 'MSPeakLists' show(object) ## S4 method for signature 'MSPeakLists,ANY,ANY,missing' x[i, j, ..., reAverage = FALSE, drop = TRUE] ## S4 method for signature 'MSPeakLists,ANY,ANY' x[[i, j]] ## S4 method for signature 'MSPeakLists' x$name ## S4 method for signature 'MSPeakLists' as.data.table(x, fGroups = NULL, averaged = TRUE) ## S4 method for signature 'MSPeakLists' delete(obj, i = NULL, j = NULL, k = NULL, reAverage = FALSE, ...) ## S4 method for signature 'MSPeakLists' filter( obj, MSLevel = 1:2, absMinIntensity = NULL, relMinIntensity = NULL, topMostPeaks = NULL, minPeaks = NULL, maxMZOverPrec = NULL, absMinAbundanceFeat = NULL, relMinAbundanceFeat = NULL, absMinAbundanceFGroup = NULL, relMinAbundanceFGroup = NULL, relMinCumIntensity = NULL, isolatePrec = NULL, deIsotope = FALSE, removeMZs = NULL, withMSMS = FALSE, annotatedBy = NULL, retainPrecursor = TRUE, mzWindow = defaultLim("mz", "medium"), reAverage = FALSE, negate = FALSE ) ## S4 method for signature 'MSPeakLists' plotSpectrum( obj, groupName, analysis = NULL, MSLevel = 1, title = NULL, normalized = "multiple", specSimParams = getDefSpecSimParams(), xlim = NULL, ylim = NULL, showLegend = TRUE, ... ) ## S4 method for signature 'MSPeakLists' spectrumSimilarity( obj, groupName1, groupName2 = NULL, analysis1 = NULL, analysis2 = NULL, MSLevel = 1, specSimParams = getDefSpecSimParams(), NAToZero = FALSE, drop = TRUE ) ## S4 method for signature 'MSPeakLists' spectrumSimilarityIMS(obj, fGroups, doFGroups = TRUE, warn = TRUE, ...) 
## S4 method for signature 'MSPeakListsSet' analysisInfo(obj, df = FALSE) ## S4 method for signature 'MSPeakListsSet' show(object) ## S4 method for signature 'MSPeakListsSet,ANY,ANY,missing' x[i, j, ..., reAverage = FALSE, sets = NULL, drop = TRUE] ## S4 method for signature 'MSPeakListsSet' as.data.table(x, fGroups = NULL, averaged = TRUE) ## S4 method for signature 'MSPeakListsSet' delete(obj, i = NULL, j = NULL, k = NULL, reAverage = FALSE, ...) ## S4 method for signature 'MSPeakListsSet' filter( obj, ..., removeMZs = NULL, withMSMS = FALSE, annotatedBy = NULL, retainPrecursor = TRUE, mzWindow = defaultLim("mz", "medium"), reAverage = FALSE, negate = FALSE, sets = NULL ) ## S4 method for signature 'MSPeakListsSet' plotSpectrum( obj, groupName, analysis = NULL, MSLevel = 1, title = NULL, normalized = "multiple", specSimParams = getDefSpecSimParams(), xlim = NULL, ylim = NULL, perSet = TRUE, mirror = TRUE, ... ) ## S4 method for signature 'MSPeakListsSet' spectrumSimilarity( obj, groupName1, groupName2 = NULL, analysis1 = NULL, analysis2 = NULL, MSLevel = 1, specSimParams = getDefSpecSimParams(), NAToZero = FALSE, drop = TRUE ) ## S4 method for signature 'MSPeakListsSet' unset(obj, set) getDefIsolatePrecParams(...)
obj, x, object
|
The |
... |
For the For For For For sets workflow methods: further arguments passed to the base |
i, j
|
For |
reAverage |
Set to |
drop |
If set to |
name |
The feature group name (partially matched). |
fGroups |
The |
averaged |
If |
k |
A vector with analyses ( |
MSLevel |
The MS level for which data is plotted or filtered: ‘1’ for regular MS, ‘2’ for MSMS. For |
absMinIntensity, relMinIntensity
|
Absolute/relative intensity threshold for peaks. Set to |
topMostPeaks |
Only consider this number of most intense peaks. Set to |
minPeaks |
If the number of peaks in an MS/MS peak list (excluding the precursor peak) is lower than
this it will be completely removed. Set to |
maxMZOverPrec |
Any mass peaks with an m/z higher than this value (relative to the precursor) will be removed.
Set to |
absMinAbundanceFeat, relMinAbundanceFeat
|
The minimum absolute/relative abundance for a mass peak across spectra
that are averaged for a feature. Setting
In most cases Set to |
absMinAbundanceFGroup, relMinAbundanceFGroup
|
The minimum absolute/relative abundance of a mass peak across
spectra that are averaged for a feature group. Set to |
relMinCumIntensity |
The minimum relative cumulative intensity of the peaks. For instance, a value of
‘0.95’ means that only the most intense peaks that together account for 95% of the total intensity are
retained. Set to |
isolatePrec |
If not |
deIsotope |
Remove any isotopic peaks in peak lists. This may improve data processing steps which do not assume
the presence of isotopic peaks (e.g. MetFrag for MS/MS). Note that |
removeMZs |
A set of m/z values to be removed from the peak lists. This is typically used to remove background
peaks. The m/z values should be specified by either be a (sets workflow) Should be a |
withMSMS |
If set to |
annotatedBy |
Either a NOTE: the |
retainPrecursor |
If |
mzWindow |
The m/z window used to find peaks to be removed from the |
negate |
If |
groupName |
The name of the feature group for which a plot should be made. To compare spectra, two group names can be specified. |
analysis |
The name of the analysis for which a plot should be made. If
|
title |
The title of the plot. If |
normalized |
Controls intensity normalization. Should be |
specSimParams |
A named |
xlim, ylim
|
Sets the plot size limits used by
|
showLegend |
Set to |
groupName1, groupName2
|
The names of the feature groups for which the comparison should be made. If both
arguments are specified then a comparison is made with the spectra specified by |
analysis1, analysis2
|
The name of the analysis (analyses) for the comparison. If |
NAToZero |
Set to |
doFGroups |
Set to |
warn |
Set to |
df |
If |
sets |
(sets workflow) A |
perSet, mirror
|
(sets workflow) If |
set |
(sets workflow) The name of the set. |
Objects for this class are returned by generateMSPeakLists.
The getDefIsolatePrecParams is used to create a parameter
list for isolating the precursor and its isotopes (see Isolating precursor data).
peakLists returns a nested list containing MS (and MS/MS where
available) peak lists per feature group and per analysis. The format is:
[[analysis]][[featureGroupName]][[MSType]][[PeakLists]] where
MSType is either "MS" or "MSMS" and PeakLists a
data.table containing all m/z values (mz
column) and their intensities (intensity column). In addition, the
peak list tables may contain a cmp column which contains a unique
alphabetical identifier of the isotopic cluster (or "compound") to which a mass
belongs (only supported by MS peak lists generated by Bruker tools at the
moment).
averagedPeakLists returns a nested list of feature group
averaged peak lists in a similar format as peakLists.
delete returns the object for which the specified data was removed.
spectrumSimilarityIMS returns a data.table with spectral similarities for each IMS
precursor and feature pair.
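To make the nested format returned by peakLists more concrete, the sketch below builds a small mock list with the same [[analysis]][[featureGroupName]][[MSType]] layout and extracts an MS/MS table from it. The analysis name, feature group name and peak values are invented for illustration, and plain data frames are used instead of data.table objects to keep the example free of dependencies.

```r
# Mock structure mimicking the documented layout; all names and values are made up.
pl <- list(
  "analysis-1" = list(
    "M195_R300_1" = list(
      MS = data.frame(mz = c(195.0877, 196.0905),
                      intensity = c(1e5, 9e3)),
      MSMS = data.frame(mz = c(110.0713, 138.0662, 195.0877),
                        intensity = c(4e3, 2e4, 8e3))
    )
  )
)

# Index by analysis, then feature group, then MS type.
msms <- pl[["analysis-1"]][["M195_R300_1"]][["MSMS"]]

# m/z of the most intense MS/MS peak.
basePeak <- msms$mz[which.max(msms$intensity)]
print(basePeak)
```

With a real MSPeakLists object, the same indexing would be applied to the result of the peakLists method.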
peakLists(MSPeakLists): Accessor method to obtain the MS peak lists.
averagedPeakLists(MSPeakLists): Accessor method to obtain the feature group averaged
MS peak lists.
analyses(MSPeakLists): returns a character vector with the names of the
analyses for which data is present in this object.
groupNames(MSPeakLists): returns a character vector with the names of the
feature groups for which data is present in this object.
length(MSPeakLists): Obtain total number of m/z values.
show(MSPeakLists): Shows summary information for this object.
[: Subset on analyses/feature groups.
[[: Extract a list with MS and MS/MS (if available) peak
lists. If the second argument (j) is not specified the averaged peak
lists for the group specified by the first argument (i) will be
returned.
$: Extract group averaged MS peaklists for a feature group.
as.data.table(MSPeakLists): Returns all MS peak list data in a table.
delete(MSPeakLists): Completely deletes specified peaks from MS peak lists.
filter(MSPeakLists): provides post filtering of generated MS peak lists, which may further enhance quality of
subsequent workflow steps (e.g. formulae calculation and compounds identification) and/or speed up these
processes. The filters are applied to peak lists for each feature and feature group. The feature group peak lists
are not re-averaged by default (see the reAverage argument).
plotSpectrum(MSPeakLists): Plots a spectrum using MS or MS/MS peak lists for a given feature group. Two spectra can be
compared when two feature groups are specified.
spectrumSimilarity(MSPeakLists): Calculates the spectral similarity between two or more spectra.
spectrumSimilarityIMS(MSPeakLists): Calculates the spectral similarity between spectra from IMS features (or feature groups)
and their IMS precursors in post mobility workflows (see assignMobilities).
peakLists: Contains a list of all MS (and MS/MS) peak lists. Use the peakLists method for access.
metadata: Metadata for all spectra used to generate peak lists. Follows the format of the peakLists slot.
averagedPeakLists: A list with averaged MS (and MS/MS) peak lists for each feature group.
avgPeakListArgs: A list with arguments used to generate feature group averaged MS(/MS) peak lists.
origFGNames: A character with the original input feature group names.
analysisInfo: (sets workflow) Analysis information. Use the analysisInfo method
for access.
Formula calculation typically relies on evaluating the measured isotopic pattern
from the precursor to score candidates. Some algorithms (currently only GenForm) penalize candidates if
mass peaks are present in MS1 spectra that do not contribute to the isotopic pattern. Since these spectra are
typically very 'noisy' due to background and co-eluting ions, an additional filtering step may be recommended prior
to formula calculation. During this precursor isolation step all mass peaks are removed that are (1) not the
precursor and (2) not likely to be an isotopologue of the precursor. To determine potential isotopic peaks the
following parameters are used:
maxIsotopes The maximum number of isotopes to consider. For instance, a value of ‘5’ means that
M+0 (i.e. the monoisotopic peak) through M+5 are considered. All mass peaks outside this range are
removed.
mzDefectRange A two-sized vector specifying the minimum (can be negative) and maximum
m/z defect deviation compared to the precursor m/z defect. When chlorinated, brominated or other
compounds with strong m/z defect in their isotopologues are to be considered a higher range may be desired.
On the other hand, for natural compounds this range may be tightened. Note that the search range scales with
increasing distance from the precursor, e.g. the search range is doubled for M+2, tripled for
M+3 etc.
intRange A two-sized vector specifying the minimum and maximum relative intensity range
compared to the precursor. For instance, c(0.001, 2) removes all peaks that have an intensity below 0.1% or
above 200% of that of the precursor.
z The z value (i.e. absolute charge) to be considered. For instance, a value of 2
would look for M+0.5, M+1 etc. Note that the mzDefectRange is adjusted accordingly
(e.g. halved if z=2).
maxGap The maximum number of missing adjacent isotopic peaks ('gaps'). If the (rounded) m/z
difference to the previous peak exceeds this value then this and all next peaks will be removed. Similar to
z, the maximum gap is automatically adjusted for charge.
These parameters should be in a list that is passed to the isolatePrec argument to filter. The
default values can be obtained with the getDefIsolatePrecParams function:
maxIsotopes=5; mzDefectRange=c(-0.01, 0.01); intRange=c(0.001, 2); z=1; maxGap=2
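The isolation rules listed above can be sketched in base R. This is a simplified illustration and not the package's implementation: the function name isolatePrecSketch is hypothetical, the precursor is assumed to be the peak closest to the given m/z, and maxIsotopes and maxGap are assumed to count isotope steps of 1/z. The parameter list uses the default values quoted above.

```r
# Simplified illustration of the precursor isolation rules (hypothetical helper).
isolatePrecSketch <- function(spec, precMZ, params) {
    # Assumption: the peak closest to precMZ is the precursor.
    precInd <- which.min(abs(spec$mz - precMZ))
    precInt <- spec$intensity[precInd]

    # Isotope index of every peak in steps of 1/z (0 = monoisotopic peak).
    n <- round((spec$mz - precMZ) * params$z)
    # Deviation of each peak from the ideal spacing of n/z.
    dev <- (spec$mz - precMZ) - n / params$z

    # The allowed deviation widens with the mass distance (n/z) from the precursor.
    scale <- n / params$z
    inDefect <- dev >= params$mzDefectRange[1] * scale &
        dev <= params$mzDefectRange[2] * scale

    # Intensity relative to the precursor must fall within intRange.
    relInt <- spec$intensity / precInt
    inInt <- relInt >= params$intRange[1] & relInt <= params$intRange[2]

    cand <- which(n >= 1 & n <= params$maxIsotopes & inDefect & inInt)
    cand <- cand[order(spec$mz[cand])]

    # Drop the first peak that follows a too large gap, and everything after it.
    tooFar <- which(diff(c(0, n[cand])) > params$maxGap)
    if (length(tooFar) > 0)
        cand <- cand[seq_len(tooFar[1] - 1)]

    spec[sort(c(precInd, cand)), ]
}

# Default values as listed in the text above.
params <- list(maxIsotopes = 5, mzDefectRange = c(-0.01, 0.01),
               intRange = c(0.001, 2), z = 1, maxGap = 2)

# Made-up MS spectrum, sorted by m/z.
spec <- data.frame(
    mz = c(150.0500, 300.0000, 301.0034, 301.2000, 302.0050, 303.0010, 305.9000),
    intensity = c(5e4, 1e5, 1.5e4, 5e3, 2e3, 50, 1e3)
)

isolated <- isolatePrecSketch(spec, precMZ = 300, params = params)
print(isolated)
```

In this made-up spectrum only the precursor and the peaks near M+1 and M+2 remain: the other peaks fall outside the isotope range, deviate too much in m/z defect, or are below the relative intensity threshold.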
spectrumSimilarity: The principles of spectral binning and cosine similarity calculations
were loosely based on the code from the SpectrumSimilarity() function of OrgMassSpecR.
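The general idea of spectral binning followed by a cosine similarity calculation can be sketched as follows. This is a generic base R illustration and neither the code from OrgMassSpecR nor the implementation used by spectrumSimilarity: the helper names, the fixed bin width and the example spectra are assumptions, and no intensity or m/z weighting is applied.

```r
# Sum intensities of peaks that fall into the same m/z bin (hypothetical helper).
binSpectrum <- function(spec, binWidth) {
    bins <- round(spec$mz / binWidth)
    binned <- tapply(spec$intensity, bins, sum)
    setNames(as.numeric(binned), names(binned))
}

# Cosine similarity between two binned spectra (hypothetical helper).
cosineSimSketch <- function(spec1, spec2, binWidth = 0.01) {
    b1 <- binSpectrum(spec1, binWidth)
    b2 <- binSpectrum(spec2, binWidth)

    # Align both spectra on the union of bins; absent bins get zero intensity.
    allBins <- union(names(b1), names(b2))
    v1 <- b1[allBins]
    v1[is.na(v1)] <- 0
    v2 <- b2[allBins]
    v2[is.na(v2)] <- 0

    sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
}

# Made-up spectra for illustration.
specA <- data.frame(mz = c(77.0386, 103.0542, 120.0808), intensity = c(40, 20, 100))
specB <- data.frame(mz = c(77.0386, 120.0808, 150.0000), intensity = c(40, 100, 50))
specC <- data.frame(mz = c(55.0100, 200.2000), intensity = c(10, 30))

simAB <- cosineSimSketch(specA, specB)
simAA <- cosineSimSketch(specA, specA)
simAC <- cosineSimSketch(specA, specC)
print(c(simAB, simAA, simAC))
```

Identical spectra give a similarity of one and spectra without shared bins give zero; partially overlapping spectra fall in between.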
The MSPeakListsSet class is applicable for sets workflows. This class is derived from MSPeakLists and therefore largely follows the same user interface.
The following methods are specifically defined for sets workflows:
All the methods from base class workflowStepSet.
unset Converts the object data for a specified set into a 'non-set' object (MSPeakListsUnset), which allows it to be used in 'regular' workflows. Only the MS peaks that are present in the specified set are kept.
analysisInfo Returns the analysis info for this object.
The following methods are changed or have new functionality:
filter and the subset operator ([) Can be used to select data that is only present for selected
sets (sets argument).
The filter method is applied for each set individually, and afterwards the results are combined again
(see generateMSPeakLists). Note that this has important implications for e.g. intensity
filters (absMinIntensity/relMinIntensity), topMostPeaks and
minPeaks. Furthermore, when the annotatedBy filter is applied, each set specific MS peak list is
filtered by the annotation results from only that set. Finally, the removeMZs filter should be set for each
set separately.
plotSpectrum Is able to highlight set specific mass peaks (perSet and mirror arguments).
spectrumSimilarity First calculates similarities for each spectral pair per set (e.g. all
positive mode spectra are compared and then all negative mode spectra are compared). This data is then combined
into an overall similarity value. How this combination is performed depends on the setCombineMethod field of
the specSimParams argument.
For spectrumSimilarity: major contributions by Bas van de Velde for spectral binning and similarity
calculation.
The newProject function is used to quickly generate a processing R script. This tool allows the user to
quickly select the targeted analyses and workflow steps and to configure some of their common parameters. This function
must be run within an RStudio session. The resulting script is either added to
the currently open file or to a new file. The analysis information will be written to a
‘.csv’ file so that it can easily be modified afterwards.
newProject(destPath = NULL)
destPath |
Set destination path value to this value (useful for debugging). Set to |
Objects from this class contain optimization results resulting from design of experiment (DoE).
    optimizedParameters(object, paramSet = NULL, DoEIteration = NULL)
    optimizedObject(object, paramSet = NULL)
    scores(object, paramSet = NULL, DoEIteration = NULL)
    experimentInfo(object, paramSet, DoEIteration)
    ## S4 method for signature 'optimizationResult'
    algorithm(obj)
    ## S4 method for signature 'optimizationResult'
    length(x)
    ## S4 method for signature 'optimizationResult'
    lengths(x, use.names = FALSE)
    ## S4 method for signature 'optimizationResult'
    show(object)
    ## S4 method for signature 'optimizationResult,missing'
    plot(x, paramSet, DoEIteration, paramsToPlot = NULL, maxCols = NULL, type = "contour", image = TRUE, contours = "colors", ...)
    ## S4 method for signature 'optimizationResult'
    optimizedParameters(object, paramSet = NULL, DoEIteration = NULL)
    ## S4 method for signature 'optimizationResult'
    optimizedObject(object, paramSet = NULL)
    ## S4 method for signature 'optimizationResult'
    scores(object, paramSet = NULL, DoEIteration = NULL)
    ## S4 method for signature 'optimizationResult'
    experimentInfo(object, paramSet, DoEIteration)
paramSet |
Numeric index of the parameter set (i.e. the first
parameter set gets index ‘1’). For some methods optional: if
|
DoEIteration |
Numeric index specifying the DoE iteration within the
specified |
obj, x, object
|
An |
use.names |
Ignored. |
paramsToPlot |
Which parameter relations should be plotted. If |
maxCols |
Multiple parameter pairs are plotted in a grid. The maximum
number of columns can be set with this argument. Set to |
type |
The type of plots to be generated: |
image |
Passed to |
contours |
Passed to |
... |
Further arguments passed to |
Objects from this class are returned by optimizeFeatureFinding and
optimizeFeatureGrouping.
algorithm(optimizationResult): Returns the algorithm that was used for finding features.
length(optimizationResult): Obtain total number of experimental design iterations performed.
lengths(optimizationResult): Obtain number of experimental design iterations performed for each parameter set.
show(optimizationResult): Shows summary information for this object.
plot(x = optimizationResult, y = missing): Generates response plots for all or a selected
set of parameters.
optimizedParameters(optimizationResult): Returns parameter set yielding optimal
results. The paramSet and DoEIteration arguments can be
NULL.
optimizedObject(optimizationResult): Returns the object (i.e. a
features or featureGroups object) that was
generated with optimized parameters. The paramSet argument can be
NULL.
scores(optimizationResult): Returns optimization scores. The
paramSet and DoEIteration arguments can be NULL.
experimentInfo(optimizationResult): Returns a list with optimization
information from a DoE iteration.
algorithm: A character specifying the algorithm that was optimized.
paramSets: A list with detailed results from each parameter set
that was tested.
bestParamSet: Numeric index of the parameter set yielding the best response.
    ## Not run:
    # ftOpt is an optimization object.
    # plot contour of all parameter pairs from the first parameter set/iteration.
    plot(ftOpt, paramSet = 1, DoEIteration = 1)
    # as above, but only plot two parameter pairs
    plot(ftOpt, paramSet = 1, DoEIteration = 1,
         paramsToPlot = list(c("mzPPM", "chromFWHM"), c("chromFWHM", "chromSNR")))
    # plot 3d perspective plots
    plot(ftOpt, paramSet = 1, DoEIteration = 1, type = "persp")
    ## End(Not run)
Parameters that are used by method functions such as as.data.table to
aggregate predicted concentrations or toxicities.
getDefPredAggrParams(all = mean, ...)
all |
The default aggregation function for all types, e.g. |
... |
optional named arguments that override defaults. |
Multiple concentration or toxicity values may be assigned to a single feature group. To ease the interpretation and data handling, several functions aggregate these values prior to their use. Aggregation occurs by the following data:
The candidate (i.e. suspect or annotation candidate). This is mainly relevant for sets workflows, where calculations among sets may yield different results for the same candidate.
The prediction type, e.g. all values that were obtained from suspect or compound annotation data.
The feature group.
The aggregation of all data first occurs by the same candidate/type/feature group, then the same type/feature group and finally for each feature group. This ensures that e.g. large numbers of data points for a prediction type do not bias results.
The candidateFunc, typeFunc and groupFunc parameters specify the function that should be used to
aggregate data. Commonly, functions such as mean, min or max can be used here.
Note that the function does not need to handle NA values, as these are removed in advance.
The preferType parameter specifies the preferred prediction type. Any values from other prediction
types will be ignored unless the preferred type is not available for a feature group. Valid values are
"suspect" (the default), "compound" (results from compound annotation by SMILES),
"SIRIUS_FP" (results from formula/compound annotation with SIRIUS+CSI:FingerID) or "none".
These parameters should be stored inside a list. The getDefPredAggrParams function can be used to
generate such a parameter list with defaults.
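The three aggregation stages described above can be sketched with base R. This is only an illustration of the ordering and not the package's code: the table layout, column names and values are made up, and the parameter list is built by hand here and only contains the three function parameters named above. The preferType selection is left out.

```r
# Made-up predictions for one feature group: three values for suspect "A"
# and one value for annotation candidate "B".
preds <- data.frame(
    group = c("FG1", "FG1", "FG1", "FG1"),
    type = c("suspect", "suspect", "suspect", "compound"),
    candidate = c("A", "A", "A", "B"),
    value = c(10, 20, 30, 100)
)

# Hand-made stand-in for the parameter list described above.
aggrParams <- list(candidateFunc = mean, typeFunc = mean, groupFunc = mean)

# Stage 1: aggregate values of the same candidate/type/feature group.
byCandidate <- aggregate(value ~ group + type + candidate, data = preds,
                         FUN = aggrParams$candidateFunc)
# Stage 2: aggregate values of the same type/feature group.
byType <- aggregate(value ~ group + type, data = byCandidate,
                    FUN = aggrParams$typeFunc)
# Stage 3: aggregate values of the same feature group.
byGroup <- aggregate(value ~ group, data = byType,
                     FUN = aggrParams$groupFunc)

print(byGroup)
# For comparison: a plain mean over all rows gives more weight to the suspect values.
print(mean(preds$value))
```

The staged result (60) differs from the plain mean over all rows (40), which shows how the stepwise aggregation keeps the many suspect values from dominating the single compound value.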
Functions to predict response factors and feature concentrations from SMILES and/or
SIRIUS+CSI:FingerID fingerprints using the MS2Quant package.
    calculateConcs(fGroups, ...)
    ## S4 method for signature 'featureGroups'
    calculateConcs(fGroups, featureAnn, areas = FALSE)
    ## S4 method for signature 'featureGroupsSet'
    calculateConcs(fGroups, featureAnn, areas = FALSE)
    ## S4 method for signature 'compounds'
    predictRespFactors(obj, fGroups, calibrants, eluent, organicModifier, pHAq, concUnit = "ugL", calibConcUnit = concUnit, updateScore = FALSE, scoreWeight = 1, parallel = TRUE)
    ## S4 method for signature 'featureGroupsScreening'
    predictRespFactors(obj, calibrants, eluent, organicModifier, pHAq, concUnit = "ugL", calibConcUnit = concUnit)
    ## S4 method for signature 'featureGroupsScreening'
    calculateConcs(fGroups, featureAnn = NULL, areas = FALSE)
    ## S4 method for signature 'featureGroupsScreeningSet'
    predictRespFactors(obj, calibrants, ...)
    ## S4 method for signature 'featureGroupsScreeningSet'
    calculateConcs(fGroups, featureAnn = NULL, areas = FALSE)
    ## S4 method for signature 'compoundsSet'
    predictRespFactors(obj, fGroups, calibrants, ...)
    ## S4 method for signature 'compoundsSIRIUS'
    predictRespFactors(obj, fGroups, calibrants, eluent, organicModifier, pHAq, concUnit = "ugL", calibConcUnit = concUnit, type = "FP")
    ## S4 method for signature 'formulasSet'
    predictRespFactors(obj, fGroups, calibrants, ...)
    ## S4 method for signature 'formulasSIRIUS'
    predictRespFactors(obj, fGroups, calibrants, eluent, organicModifier, pHAq, concUnit = "ugL", calibConcUnit = concUnit)
    getQuantCalibFromScreening(fGroups, concs, areas = FALSE, average = FALSE, IMS = "maybe")
fGroups |
For For For |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
featureAnn |
A |
areas |
Set to |
obj |
The workflow object for which predictions should be performed, e.g. feature groups with screening
results ( |
calibrants |
A (sets workflow) Should be a |
eluent |
A |
organicModifier |
The organic modifier of the mobile phase: either |
pHAq |
The pH of the aqueous part of the mobile phase. |
concUnit |
The concentration unit for calculated concentrations. Can be molar based ( |
calibConcUnit |
The concentration unit used in the calibrants table. For possible values see the |
updateScore, scoreWeight
|
If |
parallel |
If set to |
type |
Which types of predictions should be performed: should be |
concs |
A |
average |
Set to |
IMS |
(IMS workflow) Specifies which feature groups are considered for generating calibrant data in IMS workflows. The following options are valid:
Best to keep this |
The MS2Quant R package predicts concentrations from SMILES
and/or MS/MS fingerprints obtained with SIRIUS+CSI:FingerID. The predictRespFactors method functions
interface with this package to calculate response factors, which can then be used to calculate feature concentrations
with the calculateConcs method function.
predictRespFactors returns an object amended with response factors (RF_SMILES/LRF_SIRFP
columns).
calculateConcs returns a featureGroups based object amended with concentrations for each
feature group (accessed with the concentrations method).
The MS2Quant package requires calibration to convert predicted ionization efficiencies to
instrument/method specific response factors. The calibration data should be specified with the calibrants
argument to predictRespFactors. This should be a data.frame with intensity observations at different
concentrations for a set of calibrants. Each row specifies one intensity observation at one concentration. The
table should have the following columns:
name The name of the calibrant. Can be freely chosen.
SMILES The SMILES of the calibrant.
rt The retention time of the calibrant (in seconds).
intensity The peak intensity (or area, see the areas argument) of the calibrant.
conc The concentration of the calibrant (see the calibConcUnit argument for specifying the unit).
It is recommended to include multiple calibrants (e.g. ‘>=10’) at multiple concentrations (e.g.
‘>=5’). The latter is achieved by adding multiple rows for the same calibrant (keeping the
name/SMILES/rt columns constant). It is also possible to follow the column naming used by
MS2Quant (however retention times should still be in seconds!). For more details and tips see
https://github.com/kruvelab/MS2Quant.
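A calibrants table with the columns listed above could be assembled as in the following sketch. The intensities, retention times and concentrations are invented for illustration, and only two calibrants at three levels are shown to keep the example short (fewer than recommended above).

```r
# Made-up calibration data: one row per intensity observation at one concentration.
calibrants <- data.frame(
    name = rep(c("atrazine", "caffeine"), each = 3),
    SMILES = rep(c("CCNC1=NC(=NC(=N1)Cl)NC(C)C", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"), each = 3),
    rt = rep(c(420, 180), each = 3),                     # retention times in seconds
    intensity = c(1.1e4, 5.4e4, 1.08e5, 8.0e3, 4.1e4, 8.3e4),
    conc = rep(c(1, 5, 10), times = 2)                   # unit set via calibConcUnit
)

# Basic sanity checks before handing the table to predictRespFactors.
requiredCols <- c("name", "SMILES", "rt", "intensity", "conc")
missingCols <- setdiff(requiredCols, names(calibrants))
if (length(missingCols) > 0)
    stop("Missing columns: ", paste(missingCols, collapse = ", "))

# Number of distinct concentration levels per calibrant.
levelsPerCalibrant <- tapply(calibrants$conc, calibrants$name,
                             function(x) length(unique(x)))
print(levelsPerCalibrant)
```

The name, SMILES and rt values are repeated for every concentration level of the same calibrant, as described above.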
The getQuantCalibFromScreening function can be used to automatically generate a calibrants table from a
feature groups object with suspect screening results. Here, the idea is to perform a screening with
screenSuspects with a suspect list that contains the calibrants, which is then used to construct the
calibrant table. It is highly recommended to add retention times for the calibrants in the suspect list to ensure
the calibrant is assigned to the correct feature. Furthermore, it is possible to simply add the calibrants to the
'regular' suspect list in case a suspect screening was already part of the workflow. The
getQuantCalibFromScreening function still requires you to specify concentration data, which is achieved via
the concs argument. This should be a data.frame with a column name corresponding to the
calibrant name (i.e. same as used by screenSuspects above) and columns with concentration data. The
latter columns specify the concentrations of a calibrant in different replicates (as defined in the
analysis information). The concentration columns should be named after the
corresponding replicate. Only those replicates that should be used for calibration need to be included.
Furthermore, NA values can be used if a replicate should be ignored for a specific calibrant.
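The layout of the concs table can be sketched as follows. The calibrant names and the replicate names ("standard-1" etc.) are hypothetical; in practice they should match the suspect list and the analysis information, respectively. The long-format conversion is only there to show which calibrant/replicate combinations carry a concentration.

```r
# One row per calibrant, one column per replicate used for calibration.
# NA marks a replicate that should be ignored for that calibrant.
concs <- data.frame(
    name = c("atrazine", "caffeine"),
    "standard-1" = c(1, 1),
    "standard-2" = c(5, NA),
    "standard-3" = c(10, 10),
    check.names = FALSE  # keep the replicate names unchanged
)

# Reshape to long format to list the combinations that carry a concentration.
replicateCols <- setdiff(names(concs), "name")
long <- data.frame(
    name = rep(concs$name, times = length(replicateCols)),
    replicate = rep(replicateCols, each = nrow(concs)),
    conc = unlist(concs[replicateCols], use.names = FALSE)
)
used <- long[!is.na(long$conc), ]
print(used)
```

A table like concs is what would be passed to the concs argument of getQuantCalibFromScreening.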
The response factors are predicted with the predictRespFactors generic function,
which accepts the following input:
Suspect screening results. The SMILES data is used to predict response factors for suspect hits.
Formula annotation data obtained with "sirius" algorithm (generateFormulasSIRIUS). The
predictions are performed for each formula candidate using SIRIUS+CSI:FingerID fingerprints. For this
reason, the getFingerprint argument must be set to TRUE when generating the formula data.
Compound annotation data obtained with the "sirius" algorithm (generateCompoundsSIRIUS).
The predictions are performed for each annotation candidate using its SMILES and/or
SIRIUS+CSI:FingerID fingerprints. The predictions are performed on a per formula basis, hence,
response factors for isomers will be equal.
Compound annotation data obtained with algorithms other than "sirius". The response factors are predicted
from SMILES data.
When SMILES data is used then predictions of response factors are generally more accurate. However,
calculations with SIRIUS+CSI:FingerID fingerprints are faster and only require the formula and MS/MS
spectrum, i.e. not the full structure. Hence, calculations with SMILES are mostly useful in
suspect screening workflows, or with high confidence compound annotation data, whereas MS/MS fingerprints are
suitable with unknowns.
For annotation data the calculations are performed for all candidates. This can especially lead to long
running calculations when SMILES data is used. Hence, it is strongly recommended to first
prioritize the annotation results, e.g. with the topMost argument to the
filter method.
When response factors are predicted from SIRIUS+CSI:FingerID fingerprints then only formula and MS/MS
spectra are used, even if compound annotations are used for input. The major difference is that with formula
annotation input all formula candidates for which a fingerprint could be generated are considered, whereas
with compound annotations only candidate formulae are considered for which also a structure could be assigned.
Hence, the formula annotation input could be more comprehensive, whereas predictions from structure annotations
could lead to more representative results as only formulae are considered for which at least one structure could be
assigned.
The calculateConcs generic function is used to assign concentrations for each
feature using the response factors discussed in the previous section. The function takes response factors from suspect
screening results and/or feature annotation data. If multiple response factors were predicted for the same feature
group, for instance when multiple annotation candidates or suspect hits for this feature group are present, then a
concentration is assigned for each response factor. These values can later be easily aggregated with e.g. the
as.data.table function.
In IMS workflows with post mobility assignments (see
assignMobilities), the intensities of IMS precursors are used for the
calculation of concentrations for IMS features (as calibration typically does not consider differences due to
mobility filtering). However, the response factors assigned to IMS features are still used. This may yield
small differences compared to workflows where concentrations are assigned prior to mobility assignments, as
assignMobilities simply copies concentrations from IMS precursors to IMS features.
The rcdk package and OpenBabel tool are used
internally to calculate molecular weights. Please make sure that OpenBabel is installed.
MS2Quant currently only supports ‘M+H’ and ‘M+’ adducts when performing predictions with
SIRIUS+CSI:FingerID fingerprints. Predictions for candidates with other adducts, including ‘M-H’, are
skipped with a warning.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011).
“Open Babel: An open chemical toolbox.”
Journal of Cheminformatics, 3(1).
doi:10.1186/1758-2946-3-33.
Guha R (2007).
“Chemical Informatics Functionality in R.”
Journal of Statistical Software, 18(6).
Sepman H, Malm L, Peets P, MacLeod M, Martin J, Breitholtz M, Kruve A (2023). “Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data.” Analytical Chemistry, 95(33), 12329–12338. doi:10.1021/acs.analchem.3c01744. https://doi.org/10.1021/acs.analchem.3c01744.
Functions to predict toxicities from SMILES and/or SIRIUS+CSI:FingerID fingerprints using the
MS2Tox package.
    calculateTox(fGroups, ...)
    ## S4 method for signature 'featureGroups'
    calculateTox(fGroups, featureAnn)
    ## S4 method for signature 'featureGroupsSet'
    calculateTox(fGroups, featureAnn)
    ## S4 method for signature 'compounds'
    predictTox(obj, LC50Mode = "static", concUnit = "ugL", updateScore = FALSE, scoreWeight = 1, parallel = TRUE)
    ## S4 method for signature 'featureGroupsScreening'
    predictTox(obj, LC50Mode = "static", concUnit = "ugL")
    ## S4 method for signature 'featureGroupsScreening'
    calculateTox(fGroups, featureAnn = NULL)
    ## S4 method for signature 'featureGroupsScreeningSet'
    predictTox(obj, LC50Mode = "static", concUnit = "ugL")
    ## S4 method for signature 'featureGroupsScreeningSet'
    calculateTox(fGroups, featureAnn = NULL)
    ## S4 method for signature 'compoundsSet'
    predictTox(obj, ...)
    ## S4 method for signature 'compoundsSIRIUS'
    predictTox(obj, type = "FP", LC50Mode = "static", concUnit = "ugL")
    ## S4 method for signature 'formulasSet'
    predictTox(obj, ...)
    ## S4 method for signature 'formulasSIRIUS'
    predictTox(obj, LC50Mode = "static", concUnit = "ugL")
fGroups |
For For |
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
featureAnn |
A |
obj |
The workflow object for which predictions should be performed, e.g. feature groups with screening
results ( |
LC50Mode |
The mode used for predictions: should be |
concUnit |
The concentration unit for calculated toxicities. Can be molar based ( |
updateScore, scoreWeight
|
If |
parallel |
If set to |
type |
Which types of predictions should be performed: should be |
The MS2Tox R package predicts toxicities from SMILES and/or
MS/MS fingerprints obtained with SIRIUS+CSI:FingerID. The predictTox method functions interface with
this package to predict toxicities, which can then be assigned to feature groups with the calculateTox method
function.
predictTox returns an object amended with LC50 values (LC50_SMILES/LC50_SIRFP columns).
calculateTox returns a featureGroups based object amended with toxicity values for each
feature group (accessed with the toxicities method).
The toxicities are predicted with the predictTox generic function,
which accepts the following input:
Suspect screening results. The SMILES data is used to predict toxicities for suspect hits.
Formula annotation data obtained with "sirius" algorithm (generateFormulasSIRIUS). The
predictions are performed for each formula candidate using SIRIUS+CSI:FingerID fingerprints. For this
reason, the getFingerprint argument must be set to TRUE when generating the formula data.
Compound annotation data obtained with the "sirius" algorithm (generateCompoundsSIRIUS).
The predictions are performed for each annotation candidate using its SMILES and/or
SIRIUS+CSI:FingerID fingerprints. The predictions are performed on a per formula basis, hence,
toxicities for isomers will be equal.
Compound annotation data obtained with algorithms other than "sirius". The toxicities are predicted
from SMILES data.
When SMILES data is used then predictions of toxicities are generally more accurate. However,
calculations with SIRIUS+CSI:FingerID fingerprints are faster and only require the formula and MS/MS
spectrum, i.e. not the full structure. Hence, calculations with SMILES are mostly useful in
suspect screening workflows, or with high confidence compound annotation data, whereas MS/MS fingerprints are
suitable with unknowns.
For annotation data the calculations are performed for all candidates. This can especially lead to long
running calculations when SMILES data is used. Hence, it is strongly recommended to first
prioritize the annotation results, e.g. with the topMost argument to the
filter method.
When toxicities are predicted from SIRIUS+CSI:FingerID fingerprints then only formula and MS/MS
spectra are used, even if compound annotations are used for input. The major difference is that with formula
annotation input all formula candidates for which a fingerprint could be generated are considered, whereas
with compound annotations only candidate formulae are considered for which also a structure could be assigned.
Hence, the formula annotation input could be more comprehensive, whereas predictions from structure annotations
could lead to more representative results as only formulae are considered for which at least one structure could be
assigned.
The calculateTox generic function is used to assign toxicities for each
feature using the predicted toxicities discussed in the previous section. The function takes toxicities from suspect
screening results and/or feature annotation data. If multiple toxicities were predicted for the same feature
group, for instance when multiple annotation candidates or suspect hits for this feature group are present, then
all of these toxicity values are assigned. These values can later be easily aggregated with e.g. the
as.data.table function.
The rcdk package and OpenBabel tool are used
internally to calculate molecular weights. Please make sure that OpenBabel is installed.
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011).
“Open Babel: An open chemical toolbox.”
Journal of Cheminformatics, 3(1).
doi:10.1186/1758-2946-3-33.
Guha R (2007).
“Chemical Informatics Functionality in R.”
Journal of Statistical Software, 18(6).
Peets P, Wang W, MacLeod M, Breitholtz M, Martin JW, Kruve A (2022). “MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS.” Environmental Science & Technology, 56(22), 15508-15517. doi:10.1021/acs.est.2c02536. PMID: 36269851, https://doi.org/10.1021/acs.est.2c02536.
Prints all the package options of patRoon and their currently set values.
printPackageOpts()
Functionality to report data produced by most workflow steps such as features, feature groups, formula and compound annotations, and TPs.
report(
  fGroups, MSPeakLists = NULL, formulas = NULL, compounds = NULL,
  compsCluster = NULL, components = NULL, TPs = NULL,
  settingsFile = system.file("report", "settings.yml", package = "patRoon"),
  path = NULL,
  EICParams = getDefEICParams(topMost = 1, topMostByReplicate = TRUE),
  EIMParams = getDefEIMParams(topMost = 1, topMostByReplicate = TRUE),
  specSimParams = getDefSpecSimParams(), clearPath = FALSE, openReport = TRUE,
  parallel = FALSE, overrideSettings = list()
)

## S4 method for signature 'featureGroups'
report(
  fGroups, MSPeakLists = NULL, formulas = NULL, compounds = NULL,
  compsCluster = NULL, components = NULL, TPs = NULL,
  settingsFile = system.file("report", "settings.yml", package = "patRoon"),
  path = NULL,
  EICParams = getDefEICParams(topMost = 1, topMostByReplicate = TRUE),
  EIMParams = getDefEIMParams(topMost = 1, topMostByReplicate = TRUE),
  specSimParams = getDefSpecSimParams(), clearPath = FALSE, openReport = TRUE,
  parallel = FALSE, overrideSettings = list()
)

genReportSettingsFile(out = "report.yml", baseFrom = NULL)
fGroups |
The |
MSPeakLists, formulas, compounds, compsCluster, components, TPs
|
Further objects ( |
settingsFile |
The path to the report settings file used for report configuration (see |
path |
The destination file path for files generated during reporting. Will be generated if needed. If
|
EICParams |
A named |
EIMParams |
A named |
specSimParams |
A named |
clearPath |
If |
openReport |
If set to |
parallel |
If set to NOTE: parallelization is disabled by default, as it may slow down reporting on some systems (e.g. Windows) and under some circumstances. It is best to experiment with this setting to see if it speeds up report generation for your system and data. |
overrideSettings |
A |
out |
The output file path. |
baseFrom |
An existing report settings file on which the new settings should be based. This is primarily used to update old settings files: the output settings file will be based on the old settings and amended with any missing settings. |
The reporting functionality is typically used at the very end of the workflow. It provides an overview of the data generated during the workflow, such as features, their annotations and TP screening results.
report reports all workflow data in an interactive HTML file. The reports include both
tabular data (e.g. retention times, annotation properties, screening results) and various plots (e.g.
chromatograms, (annotated) mass spectra and many more). This function uses functionality from other R packages,
such as rmarkdown, knitr and bslib.
The genReportSettingsFile function generates a new template ‘YAML’ file to configure report
settings (see the next section).
The report generation can be customized with a variety of settings that are read from a
‘YAML’ file. This is especially useful if you want to change more advanced settings or want to add or remove
the parts that are reported. The report settings file is specified through the settingsFile argument. If not
specified then default settings will be used. To ease creation of a new template settings file, the
genReportSettingsFile function can be used.
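A minimal sketch of this step is given below. It only uses the genReportSettingsFile and report signatures listed above; the temporary output location is an arbitrary choice, and fGroups is assumed to be a feature groups object created earlier in the workflow, which is why the report call is left commented out.

```r
# Location for the template settings file (arbitrary choice for this sketch)
settingsPath <- file.path(tempdir(), "report.yml")

if (requireNamespace("patRoon", quietly = TRUE)) {
    # Write a template settings file that can afterwards be edited by hand
    patRoon::genReportSettingsFile(out = settingsPath)

    # 'fGroups' is assumed to exist from earlier workflow steps:
    # patRoon::report(fGroups, settingsFile = settingsPath, path = "report",
    #                 openReport = FALSE)
}
```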
The following settings are currently available:
General
  version: version of the settings file.
  format: the report format. Currently this can only be "html".
  path: the destination path (ignored if the path argument is specified).
  keepUnusedPlots: the number of days that unused plot files are kept (see Plot file caching).
  selfContained: If true then the output ‘report.html’ embeds all graphics and script
  dependencies. Otherwise these files are read from the report_files/ directory. Self-contained reports are
  generally smaller and easily shared, since only the ‘report.html’ needs to be copied. However, they are
  slower to generate and plots cannot be cached.
  noDate: Set to true to omit the date from the report. Mainly used for internal purposes.
summary: defines the plots on the summary page: chord, venn and/or upset.
features
  retMin: if true then retention times are reported in minutes.
  chromatograms
    large: inclusion of large chromatograms (used in feature group table and TP parent chromatogram
    view).
    small: inclusion of small chromatograms (feature group table).
    features: inclusion of chromatograms for individual features (features view). Set to all
    to also include plots for analyses in which a feature was not found (or removed afterwards).
    intMax: Method to determine the maximum intensity plot range: eic or feature.
    Sets the intMax argument to plotChroms.
  mobilograms
    large, small, features: inclusion of mobilogram plots, see chromatograms above.
  intensityPlots: inclusion of intensity trend plots.
  aggregateConcs, aggregateTox: function name used for concentration and toxicity aggregation,
  e.g. mean.
MSPeakLists
  spectra: inclusion of MS and MS/MS spectra (not annotated).
formulas
  include: whether formula results are reported (formula view). If false then the input
  formulas object is still used to amend e.g. compound annotated spectra.
  normalizeScores, exclNormScores: controls score normalization and which score fields to
  exclude from normalization; these are forwarded to e.g. plotScores.
  topMost: only report this number of top ranked candidates. This number can be lowered to speed up
  report generation.
compounds
  normalizeScores, exclNormScores, topMost: same as formulas, see above.
  onlyUsedScorings: if true only scorings used by the current dataset are considered when
  normalizing or reporting compound scores.
TPs
internalStandards
  graph: inclusion of internal standard network plot
  (plotGraph).
When a new report is generated, the plot files are stored inside the report_files
sub-directory of the destination path of the report. The plot files are kept so they can be reused to speed up
re-creation of reports (e.g. with different report settings). After the report is generated, any unused plot
files are removed unless they were recently created (controlled by the keepUnusedPlots setting, see previous
section). The clearPath argument can be used to completely remove any old files.
The raw data interface of patRoon is used by report to
process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported
formats and available configuration options.
No data will be reported for feature groups in any of the reported objects (formulas, compounds
etc) which are not present in the input featureGroups object (fGroups).
The topMost, topMostByReplicate and onlyPresent EIC parameters may be ignored,
e.g., when generating overview plots.
The creation of MetFrag landing page URLs is based on code from the
MetFamily R package.
Xie Y (2014).
“knitr: A Comprehensive Tool for Reproducible Research in R.”
In Stodden V, Leisch F, Peng RD (eds.), Implementing Reproducible Computational Research.
Chapman and Hall/CRC.
ISBN 978-1466561595.
Xie Y (2015).
Dynamic Documents with R and knitr, 2nd edition.
Chapman and Hall/CRC, Boca Raton, Florida.
ISBN 978-1498716963, https://yihui.org/knitr/.
Xie Y (2025).
knitr: A General-Purpose Package for Dynamic Report Generation in R.
R package version 1.51, https://yihui.org/knitr/.
Functionality to report data produced by most workflow steps such as features, feature groups, calculated chemical formulae and tentatively identified compounds. This is the legacy interface, for the updated interface see reporting.
reportCSV(
  fGroups, path = "report", reportFeatures = FALSE, formulas = NULL,
  formulasNormalizeScores = "max", formulasExclNormScores = NULL,
  compounds = NULL, compoundsNormalizeScores = "max",
  compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"),
  compsCluster = NULL, components = NULL, retMin = TRUE, clearPath = FALSE
)

reportPDF(
  fGroups, path = "report", reportFGroups = TRUE, formulas = NULL,
  formulasTopMost = 5, formulasNormalizeScores = "max",
  formulasExclNormScores = NULL, reportFormulaSpectra = TRUE,
  compounds = NULL, compoundsNormalizeScores = "max",
  compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"),
  compoundsOnlyUsedScorings = TRUE, compoundsTopMost = 5,
  compsCluster = NULL, components = NULL, MSPeakLists = NULL, retMin = TRUE,
  EICGrid = c(2, 1),
  EICParams = getDefEICParams(window = 20, topMost = 1, topMostByReplicate = TRUE),
  clearPath = FALSE
)

## S4 method for signature 'featureGroups'
reportCSV(
  fGroups, path = "report", reportFeatures = FALSE, formulas = NULL,
  formulasNormalizeScores = "max", formulasExclNormScores = NULL,
  compounds = NULL, compoundsNormalizeScores = "max",
  compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"),
  compsCluster = NULL, components = NULL, retMin = TRUE, clearPath = FALSE
)

## S4 method for signature 'featureGroups'
reportPDF(
  fGroups, path = "report", reportFGroups = TRUE, formulas = NULL,
  formulasTopMost = 5, formulasNormalizeScores = "max",
  formulasExclNormScores = NULL, reportFormulaSpectra = TRUE,
  compounds = NULL, compoundsNormalizeScores = "max",
  compoundsExclNormScores = c("score", "individualMoNAScore", "annoTypeCount", "annotHitCount", "libMatch"),
  compoundsOnlyUsedScorings = TRUE, compoundsTopMost = 5,
  compsCluster = NULL, components = NULL, MSPeakLists = NULL, retMin = TRUE,
  EICGrid = c(2, 1), EICParams = getDefEICParams(), clearPath = FALSE
)
fGroups |
The |
path |
The destination file path for files generated during reporting. Will be generated if needed. |
reportFeatures |
If set to |
formulas, compounds, compsCluster, components
|
Further objects ( |
compoundsNormalizeScores, formulasNormalizeScores
|
A |
compoundsExclNormScores, formulasExclNormScores
|
A
For |
retMin |
If |
clearPath |
If |
reportFGroups |
If |
formulasTopMost, compoundsTopMost
|
Only this number of top ranked candidate formulae/compounds is reported.
Lower values may significantly speed up reporting. Set to |
reportFormulaSpectra |
If |
compoundsOnlyUsedScorings |
If |
MSPeakLists |
A |
EICGrid |
An integer vector in the form |
EICParams |
A named |
These functions are usually called at the very end of the workflow. They are used to report various data on features and
feature groups. In addition, these functions may be used for reporting formulae and/or compounds that were generated
for the specified feature groups. Data can be reported in tabular form (i.e. ‘.csv’ files) by
reportCSV or graphically by reportPDF and reportHTML. The latter functions will plot, for
instance, chromatograms and annotated mass spectra, which are useful to get a graphical overview of results.
All functions have a wide variety of arguments that influence the reporting process. Nevertheless, most parameters are optional and only required to be given for fine tuning. In addition, only those objects (e.g. formulae, compounds, clustering) that are desired to be reported need to be specified.
reportCSV generates tabular data (i.e. ‘.csv’
files) for given data to be reported. This may also be useful to allow
import by other tools for post processing.
reportPDF will report graphical data (e.g. chromatograms and mass spectra) within PDF files.
Compared to reportHTML this function may be faster and yield smaller report files. However, its
functionality is a bit more basic and the generated data is more 'scattered' around.
The raw data interface of patRoon is used by the report functions to process HRMS (or IMS-HRMS) data. Please see its documentation for more information on the supported formats and available configuration options.
Any formulae and compounds for feature groups which are not present within fGroups (e.g. because
it was subset afterwards) will not be reported.
The topMost, topMostByReplicate and onlyPresent EIC parameters may be ignored,
e.g., when generating overview plots.
reporting
Calculation of the relative retention order between a parent and its transformation product (TP).
The relative retention order between a parent and its TP (retDir) is used throughout TP screening workflows
for characterization and prioritization purposes. These are numeric values that hint at what the
chromatographic retention order of a TP might be compared to its parent: a value of ‘-1’ means it will elute
earlier, ‘1’ it will elute later and ‘0’ that there is no significant difference or the direction is
unknown.
For TP data obtained with generateTPs, the missing retDir values are automatically calculated
based on the log P difference between the parent and TP. Here, a typical reversed phase separation is assumed,
i.e. compounds with (significantly) lower log P values likely elute earlier. The minLogPDiff parameter
of the TPStructParams argument sets the minimum log P difference to be considered
significant.
For TP feature candidates that were linked by generateComponentsTPs, the retDir values are
calculated based on the retention time difference between the parent and TP feature groups. The minRTDiff
argument sets the minimum difference to be considered significant.
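The logic described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: calcRetDir is a hypothetical helper and not part of patRoon, the example values are made up, and how patRoon treats a difference that is exactly equal to the threshold is not specified here.

```r
# Hypothetical helper: derive the retention direction of a TP relative to its
# parent from a difference (TP minus parent) in log P or retention time.
# Differences smaller than 'minDiff' (or unknown differences) yield 0.
calcRetDir <- function(parentValue, TPValue, minDiff) {
    d <- TPValue - parentValue
    ifelse(is.na(d) | abs(d) < minDiff, 0, sign(d))
}

# log P based (reversed phase assumed): lower log P, so likely earlier elution
calcRetDir(parentValue = 3.2, TPValue = 1.1, minDiff = 1)

# retention time based (seconds): TP elutes clearly later than the parent
calcRetDir(parentValue = 300, TPValue = 420, minDiff = 30)
```

The same helper covers both cases because only the kind of input (log P or retention time) and the threshold (minLogPDiff or minRTDiff) differ.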
Helmus R, Bagdonaite I, de Voogt P, van Bommel MR, Schymanski EL, van Wezel AP, ter Laak TL (2025). “Comprehensive Mass Spectrometry Workflows to Systematically Elucidate Transformation Processes of Organic Micropollutants: A Case Study on the Photodegradation of Four Pharmaceuticals.” Environmental Science & Technology, 59(7), 3723–3736. ISSN 1520-5851. doi:10.1021/acs.est.4c09121. http://dx.doi.org/10.1021/acs.est.4c09121.
With sets workflows in patRoon a complete non-target (or suspect) screening workflow is performed with sample analyses that were measured with different MS methods (typically positive and negative ionization).
The analysis files that were measured with a different method are grouped in sets. In the most typical case,
there is a "positive" and "negative" set, for the positively/negatively ionized data, respectively.
However, other distinctions than polarity are also possible (although currently the chromatographic method should be
the same between sets). A sets workflow is typically initiated with the makeSet method. The handbook
contains many more details about sets workflows.
makeSet to initiate sets workflows, workflowStepSet, the Sets workflows
sections in other documentation pages and the patRoon handbook.
Parameters relevant for calculation of similarities between mass spectra.
getDefSpecSimParams(...)
... |
optional named arguments that override defaults. |
For the calculation of spectral similarities the following parameters exist:
method The similarity method: either "cosine" or "jaccard".
removePrecursor If TRUE then precursor peaks (i.e. the mass peak corresponding to the
feature) are removed prior to similarity calculation.
mzWeight,intWeight Mass and intensity weights used for cosine calculation.
absMzDev Maximum absolute m/z deviation between mass peaks, used for binning spectra. Defaults to
defaultLim("mz", "medium") (see limits).
relMinIntensity The minimum relative intensity for mass peaks (‘0-1’). Peaks with lower intensities
are not considered for similarity calculation. The relative intensities are calculated after the precursor peak is
removed when removePrecursor=TRUE.
minPeaks Only consider spectra that have at least this number of peaks (after the spectrum is
filtered).
shift If and how shifting is applied prior to similarity calculation. Valid options are: "none"
(no shifting), "precursor" (all mass peaks of the second spectrum are shifted by the mass difference between
the precursors of both spectra) or "both" (the spectra are first binned without shifting, and peaks still
unaligned are then shifted as is done when shift="precursor").
setCombinedMethod (sets workflow) Determines how spectral similarities from different sets are combined.
Possible values are "mean", "min" or "max", which calculates the combined value as the mean,
minimum or maximum value, respectively. NA values (e.g. if a set does not have peak list data to
combine) are removed in advance.
These parameters are typically passed as a named list as the specSimParams argument to functions that
do spectral similarity calculations. The getDefSpecSimParams function can be used to generate such a parameter
list with defaults.
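To make the role of some of these parameters more concrete, the base R sketch below computes a simple binned cosine similarity and combines per-set similarities. It is not the implementation used by patRoon: specSimSketch and combineSets are hypothetical helpers, the default values are placeholders, the weighting scheme (mz^mzWeight * intensity^intWeight) is an assumption, and precursor removal, minPeaks and shifting are left out.

```r
# Hypothetical sketch of a binned, weighted cosine similarity.
# Spectra are data.frames with 'mz' and 'intensity' columns.
specSimSketch <- function(spec1, spec2, absMzDev = 0.005, mzWeight = 0,
                          intWeight = 1, relMinIntensity = 0.05) {
    prep <- function(s) {
        s$intensity <- s$intensity / max(s$intensity)     # relative intensities
        s[s$intensity >= relMinIntensity, , drop = FALSE] # drop low peaks
    }
    pk <- rbind(cbind(prep(spec1), which = 1), cbind(prep(spec2), which = 2))
    pk <- pk[order(pk$mz), ]
    # peaks closer than absMzDev to their neighbour end up in the same bin
    pk$bin <- cumsum(c(TRUE, diff(pk$mz) > absMzDev))
    pk$w <- pk$mz^mzWeight * pk$intensity^intWeight
    u <- as.numeric(tapply(ifelse(pk$which == 1, pk$w, 0), pk$bin, sum))
    v <- as.numeric(tapply(ifelse(pk$which == 2, pk$w, 0), pk$bin, sum))
    sum(u * v) / sqrt(sum(u^2) * sum(v^2))
}

# Combine similarities of different sets; NA values are removed in advance
combineSets <- function(sims, method = "mean") {
    match.fun(method)(sims[!is.na(sims)])
}

# Made-up example spectra
specA <- data.frame(mz = c(50.01, 77.04, 105.03), intensity = c(100, 500, 1000))
specB <- data.frame(mz = c(50.011, 77.041, 105.031), intensity = c(120, 450, 1000))
simAB <- specSimSketch(specA, specB)
simSets <- combineSets(c(positive = 0.8, negative = NA), method = "mean")
```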
Utilities to screen for analytes with known or suspected identity.
screenSuspects(
  fGroups, suspects, rtWindow = defaultLim("retention", "medium"),
  mzWindow = defaultLim("mz", "medium"), IMSMatchParams = NULL, adduct = NULL,
  skipInvalid = TRUE, prefCalcChemProps = TRUE, neutralChemProps = FALSE,
  onlyHits = FALSE, ...
)

## S4 method for signature 'featureGroups'
screenSuspects(
  fGroups, suspects, rtWindow, mzWindow, IMSMatchParams, adduct, skipInvalid,
  prefCalcChemProps, neutralChemProps, onlyHits
)

## S4 method for signature 'featureGroupsScreening'
screenSuspects(
  fGroups, suspects, rtWindow, mzWindow, IMSMatchParams, adduct, skipInvalid,
  prefCalcChemProps, neutralChemProps, onlyHits, amend = FALSE
)

## S4 method for signature 'featureGroupsSet'
screenSuspects(
  fGroups, suspects, rtWindow, mzWindow, IMSMatchParams, adduct, skipInvalid,
  prefCalcChemProps, neutralChemProps, onlyHits
)

## S4 method for signature 'featureGroupsScreeningSet'
screenSuspects(
  fGroups, suspects, rtWindow, mzWindow, IMSMatchParams, adduct, skipInvalid,
  prefCalcChemProps, neutralChemProps, onlyHits, amend = FALSE
)
fGroups |
The |
suspects |
A (sets workflow) Can also be a |
rtWindow, mzWindow
|
The retention time window (in seconds) and m/z window that will be used for matching a suspect (+/- feature data). |
IMSMatchParams |
(IMS workflow) A |
adduct |
An |
skipInvalid |
If set to |
prefCalcChemProps |
If |
neutralChemProps |
If |
onlyHits |
If |
... |
Further arguments specified to the methods. |
amend |
If |
Besides 'full non-target analysis', where compounds may be identified with little to no prior knowledge, a common strategy is to screen for compounds with known or suspected identity. This may be a generally favorable approach if possible, as it can significantly reduce the load on data interpretation.
screenSuspects is used to perform suspect screening. The input featureGroups object will be
screened for suspects by m/z values and optionally retention times. Afterwards, any feature groups not matched
may be kept or removed, depending on whether a full non-target analysis is desired.
screenSuspects returns a featureGroupsScreening object, which is a copy of the input
fGroups object amended with additional screening information.
The suspects argument for screenSuspects should be a data.frame
with the following mandatory and optional columns:
name The suspect name. Must be file-compatible. (mandatory)
rt The retention time (in seconds) for the suspect. If specified the suspect will only be matched if
its retention matches the experimental value (tolerance defined by the rtWindow argument).
(optional)
neutralMass,formula,SMILES,InChI The neutral monoisotopic mass, chemical formula,
SMILES or InChI for the suspect. (data from one of these columns are mandatory in case no value from the
mz column is available for a suspect)
mz The ionized m/z of the suspect. (mandatory unless it can be calculated from one of
the aforementioned columns)
adduct A character that can be converted with as.adduct. Can be used to
automatically calculate values for the mz column. (mandatory unless data from the mz column
is available, the adduct argument is set or fGroups has adduct annotations)
fragments_mz,fragments_formula One or more MS/MS fragments (specified as m/z or
formulae, respectively). Multiple values can be specified by separating them with a semicolon (;). This data
is used by estimateIDConfidence to report detected MS/MS fragments and calculate identification levels.
(optional)
mobility,CCS The mobility or CCS value of the suspect. These values may be used to
filter out suspects, see the IMSMatchParams argument. Multiple values for a single suspect can be specified
by separating them with a semicolon (;). Adduct specific columns may be added by suffixing the adduct to the
column name, e.g. mobility_[M+H]+ and CCS_[M-H]-. (optional)
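A small suspect list following the column conventions above could be assembled as sketched below. The retention time and fragment values are illustrative only and should be replaced by your own reference data; fGroups in the commented call is assumed to be a feature groups object created earlier in the workflow.

```r
# Sketch of a suspect list; 'rt' and 'fragments_mz' values are illustrative only
suspects <- data.frame(
    name = c("1H-benzotriazole", "carbamazepine"),  # file-compatible names
    formula = c("C6H5N3", "C15H12N2O"),             # used to calculate the mz
    adduct = c("[M+H]+", "[M+H]+"),
    rt = c(NA, 600),                                # seconds; NA means no RT check
    fragments_mz = c("", "194.0964;192.0808"),      # semicolon separated
    stringsAsFactors = FALSE
)

# With a feature groups object at hand, screening could then look like:
# fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = TRUE)
```

Since no mz column is given here, the formula and adduct columns together provide the information needed to calculate the ionized m/z of each suspect.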
How the mass of a suspect is matched with the mass of a feature depends on the available data:
If the suspect has data from the mz column of the suspect list, then this data is matched with the
detected feature m/z.
Otherwise, if the suspect has data in the adduct column of the suspect list, this data is used to
calculate its mz value, which is then used like above.
In the last case, the neutral mass of the suspect is matched with the neutral mass of the feature. Hence,
either the adduct argument needs to be specified, or the featureGroups input object must have adduct
annotations.
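The first two cases can be illustrated with the base R sketch below. It is a simplification under stated assumptions: resolveSuspectMz is a hypothetical helper, only the [M+H]+ adduct is handled (by adding the proton mass), the feature m/z values are made up, and the window of 0.005 is a placeholder rather than the actual default of defaultLim("mz", "medium").

```r
protonMass <- 1.007276  # mass added for an [M+H]+ adduct (Da)

# Hypothetical helper: pick the m/z that is matched against the features
resolveSuspectMz <- function(mz, neutralMass, adductMass) {
    if (!is.na(mz)) {
        mz                         # case 1: mz given in the suspect list
    } else if (!is.na(adductMass)) {
        neutralMass + adductMass   # case 2: calculated from the adduct
    } else {
        NA_real_                   # case 3: neutral masses are compared instead
    }
}

# 1H-benzotriazole (C6H5N3), monoisotopic neutral mass 119.048347
suspectMz <- resolveSuspectMz(mz = NA, neutralMass = 119.048347,
                              adductMass = protonMass)

featureMz <- c(120.0553, 120.0701, 237.1020)  # made-up feature m/z values
mzWindow <- 0.005                             # placeholder tolerance
hits <- abs(featureMz - suspectMz) <= mzWindow
which(hits)
```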
If both adduct-specific and non-adduct-specific mobility or CCS reference values are available,
then the non-adduct-specific data is chosen (unless NA) as reference for the suspect hit. Otherwise, data is
taken from the adduct-specific data corresponding to the adduct assigned to the feature group (or adduct
argument). If multiple mobility or CCS values for a suspect are specified in the suspect list, then the
reference value is chosen which is the closest to that of the feature.
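Selecting the closest of several semicolon-separated reference values can be sketched as follows. The helper pickClosest is hypothetical and the CCS values are made up; the sketch only illustrates the selection rule described above.

```r
# Hypothetical helper: choose the reference value closest to the measured one
pickClosest <- function(refValues, measured) {
    refs <- as.numeric(strsplit(refValues, ";", fixed = TRUE)[[1]])
    refs[which.min(abs(refs - measured))]
}

# Two reference CCS values in the suspect list, one measured feature value
closestCCS <- pickClosest("150.2;162.7", measured = 161.9)
closestCCS
```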
Chemical properties such as SMILES, InChIKey and formulae in the suspect list are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid
entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by
(de-)protonation (using the --neutralized option of OpenBabel). An additional column
molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires
either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES,
InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing
InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time
consuming, and is potentially largely avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible
(prefCalcChemProps=TRUE).
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, please make sure it is installed.
In a sets workflow, screenSuspects performs suspect screening
for each set separately, and the screening results are combined afterwards. The sets column in the
screenInfo data marks in which sets the suspect hit was found.
screenSuspects may use the suspect names to base file names used for reporting, logging etc. Therefore,
it is important that these are file-compatible names. For this purpose, screenSuspects will automatically
try to convert long, non-unique and/or otherwise incompatible suspect names.
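For readers who want to prepare names themselves, a comparable clean-up can be sketched in base R as shown below. This is not the conversion routine used by screenSuspects: the helper makeFileCompatible, the set of allowed characters and the maximum length are all assumptions made for this example.

```r
# Hypothetical helper: make names short, unique and safe to use in file names
makeFileCompatible <- function(x, maxLen = 30) {
    x <- gsub("[^A-Za-z0-9_.-]+", "_", x)  # replace runs of unsafe characters
    x <- substr(x, 1, maxLen)              # truncate long names
    make.unique(x, sep = "-")              # disambiguate duplicates
}

cleanNames <- makeFileCompatible(c("2,4-D", "2,4-D", "N/A compound: test"))
cleanNames
```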
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011). “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3(1). doi:10.1186/1758-2946-3-33.
featureGroupsScreening
This function returns a data.frame with the default rules for metabolic logic, which can be used by
generateTPsLogic and genFormulaTPLibrary.
TPLogicTransformations()
A data.frame with columns describing each transformation rule.
The table is based on the work done by Schollee et al. (see references).
Schollee JE, Schymanski EL, Avak SE, Loos M, Hollender J (2015). “Prioritizing Unknown Transformation Products from Biologically-Treated Wastewater Using High-Resolution Mass Spectrometry, Multivariate Statistics, and Metabolic Logic.” Analytical Chemistry, 87(24), 12121–12129. doi:10.1021/acs.analchem.5b02905.
Holds information for all TPs for a set of parents.
parents(TPs)

products(TPs)

## S4 method for signature 'transformationProducts'
parents(TPs)

## S4 method for signature 'transformationProducts'
products(TPs)

## S4 method for signature 'transformationProducts'
length(x)

## S4 method for signature 'transformationProducts'
names(x)

## S4 method for signature 'transformationProducts'
show(object)

## S4 method for signature 'transformationProducts,ANY,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'transformationProducts,ANY,missing'
x[[i, j]]

## S4 method for signature 'transformationProducts'
x$name

## S4 method for signature 'transformationProducts'
as.data.table(x)

## S4 method for signature 'transformationProducts'
convertToSuspects(obj, includeParents = FALSE)

## S4 method for signature 'transformationProducts'
delete(obj, i = NULL, j = NULL, ...)

## S4 method for signature 'transformationProducts'
filter(obj, properties = NULL, verbose = TRUE, negate = FALSE)
TPs, x, obj, object
|
|
i, j
|
For |
... |
For |
drop |
ignored. |
name |
The parent name (partially matched). |
includeParents |
If |
properties |
A named |
verbose |
If set to |
negate |
If |
This class holds all generated data for transformation products for a set of parents. The class is virtual and
derived objects are created by TP generators.
delete returns the object from which the specified data was removed.
filter returns a filtered transformationProducts object.
parents(transformationProducts): Accessor method for the parents slot of a
transformationProducts class.
products(transformationProducts): Accessor method for the products slot.
length(transformationProducts): Obtain total number of transformation products.
names(transformationProducts): Obtain the names of all parents in this object.
show(transformationProducts): Show summary information for this object.
x[i]: Subset on parents.
x[[i]]: Extracts a table with TPs for a parent.
x$name: Extracts a table with TPs for a parent.
as.data.table(transformationProducts): Returns all TP data in a table.
convertToSuspects(transformationProducts): Converts this object to a suspect list that can be used as input for
screenSuspects.
delete(transformationProducts): Completely deletes specified transformation product data.
filter(transformationProducts): Performs rule-based filtering. Useful to simplify and clean-up the data.
parents: A data.table with metadata for all parents that have TPs in this object. Use the
parents method for access.
products: A list with data.table entries with TP information for each parent. Use the
products method for access.
The derived transformationProductsStructure class for more methods, generateTPs
and retDir.
Class to store results for transformation products (TPs) obtained from compound annotations.
## S4 method for signature 'transformationProductsAnnComp'
filter(
  obj, ..., minFitFormula = 0, minFitCompound = 0, minSimSusp = 0,
  minFitCompOrSimSusp = c(0, 0), minTPScore = 0, topMost = NULL,
  verbose = TRUE, negate = FALSE
)
obj |
The |
... |
Further arguments passed to the parent filter method. |
minFitFormula, minFitCompound, minSimSusp, minFitCompOrSimSusp, minTPScore
|
Thresholds related to TP scoring. See
|
topMost |
Only keep this number of top-most TPs (based on |
verbose |
If set to |
negate |
If |
This class is derived from the transformationProductsStructure base class; please see its documentation
for more details. Objects from this class are returned by generateTPsAnnComp.
filter(transformationProductsAnnComp): Performs rule-based filtering. Useful to simplify and clean up the data.
The base class transformationProductsStructure for more relevant methods and
generateTPsAnnComp
Class to store results for transformation products (TPs) obtained from formula annotations.
## S4 method for signature 'transformationProductsAnnForm'
filter(
  obj,
  ...,
  minFitFormula = 0,
  minTPScore = 0,
  topMost = NULL,
  verbose = TRUE,
  negate = FALSE
)
obj |
The |
... |
Further arguments passed to the parent filter method. |
minFitFormula, minTPScore
|
Thresholds related to TP scoring. See |
topMost |
Only keep this number of top-most TPs (based on |
verbose |
If set to |
negate |
If |
This class is derived from the transformationProductsFormula base class; please see its documentation
for more details. Objects from this class are returned by generateTPsAnnForm.
filter(transformationProductsAnnForm): Performs rule-based filtering. Useful to simplify and clean up the data.
The base class transformationProductsFormula for more relevant methods and
generateTPsAnnForm
Holds information for all TPs for a set of parents, including chemical formulae.
## S4 method for signature 'transformationProductsFormula'
plotGraph(
  obj,
  which,
  components = NULL,
  prune = TRUE,
  onlyCompletePaths = FALSE,
  width = NULL,
  height = NULL
)
obj |
|
which |
Either a |
components |
If specified (i.e. not |
prune |
If |
onlyCompletePaths |
If |
width, height
|
Passed to |
This (virtual) class is derived from the transformationProducts base class; please see its
documentation for more details. Objects from this class are returned by TP generators. More
specifically, algorithms that work with chemical formulae (e.g. library_formula) use this class to
store their results. The methods defined for this class extend the functionality of the base
transformationProducts class.
plotGraph returns the result of visNetwork.
plotGraph(transformationProductsFormula): Plots an interactive hierarchy graph of the transformation products. The
resulting graph can be browsed interactively and allows exploration of the different TP formation pathways.
Furthermore, results from TP componentization can be used to match the hierarchy
with screening results. The graph is rendered with visNetwork.
The base class transformationProducts for more relevant methods and generateTPs
Holds information for all TPs for a set of parents, including structural information.
## S4 method for signature 'transformationProductsStructure'
convertToMFDB(TPs, out, includeParents = FALSE)

## S4 method for signature 'transformationProductsStructure'
filter(
  obj,
  ...,
  removeParentIsomers = FALSE,
  removeTPIsomers = FALSE,
  removeDuplicates = FALSE,
  minSimilarity = NULL,
  verbose = TRUE,
  negate = FALSE
)

## S4 method for signature 'transformationProductsStructure'
plotGraph(
  obj,
  which,
  components = NULL,
  structuresMax = 25,
  prune = TRUE,
  onlyCompletePaths = FALSE,
  width = NULL,
  height = NULL
)

## S4 method for signature 'transformationProductsStructure'
plotVenn(obj, ..., commonParents = FALSE, labels = NULL, vennArgs = NULL)

## S4 method for signature 'transformationProductsStructure'
plotUpSet(
  obj,
  ...,
  commonParents = FALSE,
  labels = NULL,
  nsets = NULL,
  nintersects = NA,
  upsetArgs = NULL
)

## S4 method for signature 'transformationProductsStructure'
consensus(
  obj,
  ...,
  absMinAbundance = NULL,
  relMinAbundance = NULL,
  uniqueFrom = NULL,
  uniqueOuter = FALSE,
  labels = NULL
)
out |
The file name of the output |
includeParents |
Set to |
obj, TPs
|
|
... |
For For |
removeParentIsomers |
If |
removeTPIsomers |
If |
removeDuplicates |
If |
minSimilarity |
Minimum structure similarity (‘0-1’) that a TP should have relative to its parent. This
data is only available if the |
verbose |
If set to |
negate |
If |
which |
Either a |
components |
If specified (i.e. not |
structuresMax |
An |
prune |
If |
onlyCompletePaths |
If |
width, height
|
Passed to |
commonParents |
Only consider TPs from parents that are common to all compared objects. |
labels |
A |
vennArgs |
A |
nsets, nintersects
|
See |
upsetArgs |
A list with any further arguments to be passed to
|
absMinAbundance, relMinAbundance
|
Minimum absolute or relative
(‘0-1’) abundance across objects for a result to be kept. For
instance, |
uniqueFrom |
Set this argument to only retain TPs that are unique
within one or more of the objects for which the consensus is made.
Selection is done by setting the value of |
uniqueOuter |
If |
This (virtual) class is derived from the transformationProducts base class; please see its
documentation for more details. Objects from this class are returned by TP generators. More
specifically, algorithms that work with chemical structures (e.g. biotransformer) use this class to
store their results. The methods defined for this class extend the functionality of the base
transformationProducts class.
filter returns a filtered transformationProductsStructure object.
plotGraph returns the result of visNetwork.
plotVenn (invisibly) returns a list with the following fields:
gList the gList object that was returned by
the utilized VennDiagram plotting function.
areas The total area for each plotted group.
intersectionCounts The number of intersections between groups.
The order for the areas and intersectionCounts fields is the same as the parameter order
from the used plotting function (see e.g. draw.pairwise.venn and
draw.triple.venn).
consensus returns a transformationProductsStructure object that is produced by merging results
from multiple transformationProductsStructure objects.
convertToMFDB(transformationProductsStructure): Exports this object as a ‘.csv’ file that can be used as a MetFrag local
database. Any duplicate TPs (formed by different pathways or parents) will be merged based on their
InChIKey.
filter(transformationProductsStructure): Performs rule-based filtering. Useful to simplify and clean up the data.
plotGraph(transformationProductsStructure): Plots an interactive hierarchy graph of the transformation products. The
resulting graph can be browsed interactively and allows exploration of the different TP formation pathways.
Furthermore, results from TP componentization can be used to match the hierarchy
with screening results. The graph is rendered with visNetwork.
plotVenn(transformationProductsStructure): Plots a Venn diagram (using VennDiagram) outlining unique and
shared TPs of up to five different transformationProductsStructure objects.
plotUpSet(transformationProductsStructure): Plots an UpSet diagram (using the upset function)
outlining unique and shared TPs between different transformationProductsStructure objects.
consensus(transformationProductsStructure): Generates a consensus from different
transformationProductsStructure objects. Currently this removes any hierarchical data, and all TPs are
considered to originate from the same (original) parent.
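As a minimal sketch of how the methods listed above might be chained, the hypothetical helper below generates TPs, filters them and converts the result to a suspect list for screening. It assumes that patRoon is installed, that parents is valid parents input for generateTPs and that fGroups is an existing featureGroups object. The helper name tpScreeningSketch and the similarity threshold of 0.5 are illustrative only and not part of patRoon; the comments describe what the argument names suggest and should be checked against the argument documentation.

```r
# Hypothetical helper (not part of patRoon): chains methods documented here.
# Nothing is executed until the function is called with real workflow objects.
tpScreeningSketch <- function(parents, fGroups, minSim = 0.5) {
    # calcSims = TRUE makes structure similarities available for minSimilarity
    TPs <- patRoon::generateTPs("biotransformer", parents = parents,
                                calcSims = TRUE)

    # Assumed intent of the arguments: drop TPs that are isomers of their
    # parent, drop duplicate TPs and keep only sufficiently similar TPs
    TPs <- patRoon::filter(TPs, removeParentIsomers = TRUE,
                           removeDuplicates = TRUE, minSimilarity = minSim)

    # Convert to a suspect list and use it as input for screenSuspects
    suspects <- patRoon::convertToSuspects(TPs)
    patRoon::screenSuspects(fGroups, suspects)
}
```

Wrapping the calls in a function keeps the sketch free of side effects; it only does work once real parents and feature groups are supplied.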
The methods that compare different objects (e.g. plotVenn and
consensus) use the InChIKey to match TPs between objects. Moreover, the parents between objects
are matched by their name. Hence, it is crucial that the input parents to generateTPs
(i.e. the parents argument) are named equally.
consensus: If the retDir values differ between matched TPs it will be set to ‘0’. If
structure similarity data is available (i.e. calcSims=TRUE to generateTPs) then the mean
similarity is calculated.
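The matching rules described above can be illustrated with plain data frames. This is a sketch in base R and not the patRoon implementation: the tables, the placeholder keys (KEY-1 etc.), the retDir values of -1 and 1 and the column names are all made up, and it assumes that each InChIKey occurs at most once per object.

```r
# Mock TP tables from two hypothetical TP objects; all values are made up.
tpsA <- data.frame(InChIKey = c("KEY-1", "KEY-2", "KEY-3"),
                   retDir = c(-1, 1, -1),
                   similarity = c(0.8, 0.6, 0.9),
                   stringsAsFactors = FALSE)
tpsB <- data.frame(InChIKey = c("KEY-1", "KEY-2", "KEY-4"),
                   retDir = c(-1, -1, 1),
                   similarity = c(0.6, 0.4, 0.5),
                   stringsAsFactors = FALSE)
objects <- list(A = tpsA, B = tpsB)

# Match TPs between objects by their InChIKey
combined <- do.call(rbind, objects)
merged <- do.call(rbind, lapply(split(combined, combined$InChIKey), function(g) {
    data.frame(
        InChIKey = g$InChIKey[1],
        abundance = nrow(g),  # number of objects that contain this TP
        # retDir becomes 0 when the matched TPs disagree
        retDir = if (length(unique(g$retDir)) == 1) g$retDir[1] else 0,
        # similarities of matched TPs are averaged
        similarity = mean(g$similarity),
        stringsAsFactors = FALSE
    )
}))

# Keep TPs with a relative abundance of at least relMinAbundance
relMinAbundance <- 1
consensusSketch <- merged[merged$abundance / length(objects) >= relMinAbundance, ]
print(consensusSketch)
```

With these mock values only KEY-1 and KEY-2 occur in both objects and are therefore kept; KEY-2 ends up with a retDir of 0 because the two objects disagree on its direction.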
Conway JR, Lex A, Gehlenborg N (2017).
“UpSetR: an R package for the visualization of intersecting sets and their properties.”
Bioinformatics, 33(18), 2938-2940.
doi:10.1093/bioinformatics/btx364.
http://dx.doi.org/10.1093/bioinformatics/btx364.
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H (2014).
“UpSet: Visualization of Intersecting Sets.”
IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
doi:10.1109/tvcg.2014.2346248.
The base class transformationProducts for more relevant methods and generateTPs
Verifies if all dependencies are installed properly and instructs the user if this is not the case.
verifyDependencies()
This function is inspired by
withr::with_options: it can be used to
execute some code where package options are temporarily changed. This
function uses a shortened syntax, especially when changing options for
patRoon.
withOpt(code, ..., prefix = "patRoon.")
code |
The code to be executed. |
... |
Named arguments with options to change. |
prefix |
A |
## Not run: 
# Set max parallel processes to five while performing formula calculations
withOpt(MP.maxProcs = 5, {
    formulas <- generateFormulas(fGroups, "genform", ...)
})
## End(Not run)
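The idea of temporarily changing prefixed options can be sketched in base R. The helper withOptSketch below is hypothetical and is not the patRoon implementation; it only illustrates the general pattern of setting options, evaluating the code and restoring the previous values afterwards.

```r
# Hypothetical re-implementation for illustration only (not patRoon code)
withOptSketch <- function(code, ..., prefix = "patRoon.") {
    opts <- list(...)
    # Prepend the prefix, e.g. MP.maxProcs becomes patRoon.MP.maxProcs
    names(opts) <- paste0(prefix, names(opts))
    old <- options(opts)               # options() returns the previous values
    on.exit(options(old), add = TRUE)  # restore them, also after an error
    force(code)                        # 'code' is evaluated lazily, i.e. only now
}

options(patRoon.MP.maxProcs = 2)
inside <- withOptSketch(MP.maxProcs = 5, getOption("patRoon.MP.maxProcs"))
after <- getOption("patRoon.MP.maxProcs")
print(c(inside = inside, after = after))
```

Because R evaluates function arguments lazily, the code argument only runs once force() is reached, at which point the temporary options are already active.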
All workflow objects (e.g. featureGroups,
compounds, etc) are derived from this class. Objects from this
class are never created directly.
## S4 method for signature 'workflowStep'
algorithm(obj)

## S4 method for signature 'workflowStep'
as.data.table(x, keep.rownames = FALSE, ...)

## S4 method for signature 'workflowStep'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

## S4 method for signature 'workflowStep'
show(object)
obj, x, object
|
An object (derived from) this class. |
keep.rownames |
Ignored. |
... |
Method specific arguments. Please see the documentation of the derived classes. |
row.names, optional
|
Ignored. |
algorithm(workflowStep): Returns the algorithm that was used to generate an
object.
as.data.table(workflowStep): Summarizes the data in this object and returns this
as a data.table.
as.data.frame(workflowStep): This method simply calls as.data.table and
converts the result to a classic data.frame.
show(workflowStep): Shows summary information for this object.
algorithm: The algorithm that was used to generate this object. Use the
algorithm method for access.
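The relation between a virtual base class, its derived classes and a slot accessor can be sketched with the methods package. All class, generic and slot names below (stepSketch, derivedSketch, algorithmSketch, results) and the value "someAlgorithm" are hypothetical and only mimic the design described here; they are not patRoon definitions.

```r
library(methods)

# Virtual base class with an 'algorithm' slot
setClass("stepSketch",
         representation = representation("VIRTUAL", algorithm = "character"))

# Accessor generic and method for the slot
setGeneric("algorithmSketch", function(obj) standardGeneric("algorithmSketch"))
setMethod("algorithmSketch", "stepSketch", function(obj) obj@algorithm)

# A derived, non-virtual class inherits both the slot and the accessor
setClass("derivedSketch", contains = "stepSketch",
         representation = representation(results = "data.frame"))

obj <- new("derivedSketch", algorithm = "someAlgorithm",
           results = data.frame(value = 1:3))
usedAlgorithm <- algorithmSketch(obj)
print(usedAlgorithm)

# A virtual class cannot be instantiated directly: new() signals an error
directCreation <- tryCatch(new("stepSketch"), error = function(e) e)
print(inherits(directCreation, "error"))
```

Methods defined on the virtual class are available to every derived class, which is why common functionality such as the accessor only needs to be written once.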
This class is the base for many sets workflow related classes. It is virtual and is therefore never created directly.
## S4 method for signature 'workflowStepSet'
setObjects(obj)

## S4 method for signature 'workflowStepSet'
sets(obj)

## S4 method for signature 'workflowStepSet'
show(object)
obj, object
|
An object that is derived from |
The most important purpose of this class is to hold data that is specific for a set. These set objects are
typically objects with classes from a regular non-sets workflow (e.g. components,
compounds), and are used by the sets workflow object to e.g. form a consensus. Since the set
objects may contain additional data, such as algorithm specific slots, it may in some cases be of interest to access
them directly with the setObjects method (described below).
setObjects(workflowStepSet): Accessor for the setObjects slot.
sets(workflowStepSet): Returns the names for each set in this object.
show(workflowStepSet): Shows summary information for this object.
setObjects: A list with the set objects (see the Details section). The list is named
with the set names.
Converts a features or featureGroups object to an xcmsSet or
XCMSnExp object.
getXCMSSet(obj, verbose = TRUE, ...)

getXCMSnExp(obj, verbose = TRUE, ...)

## S4 method for signature 'features'
getXCMSSet(obj, verbose, loadRawData, IMS = FALSE)

## S4 method for signature 'featuresXCMS'
getXCMSSet(obj, verbose = TRUE, ...)

## S4 method for signature 'featureGroups'
getXCMSSet(obj, verbose, loadRawData, IMS = FALSE)

## S4 method for signature 'featureGroupsXCMS'
getXCMSSet(obj, verbose, loadRawData, ...)

## S4 method for signature 'featuresSet'
getXCMSSet(obj, ..., set)

## S4 method for signature 'featureGroupsSet'
getXCMSSet(obj, ..., set)

## S4 method for signature 'features'
getXCMSnExp(obj, verbose, loadRawData, IMS = FALSE)

## S4 method for signature 'featuresXCMS3'
getXCMSnExp(obj, verbose = TRUE, ...)

## S4 method for signature 'featureGroups'
getXCMSnExp(obj, verbose, loadRawData, IMS = FALSE)

## S4 method for signature 'featureGroupsXCMS3'
getXCMSnExp(obj, verbose, loadRawData, ...)

## S4 method for signature 'featuresSet'
getXCMSnExp(obj, ..., set)

## S4 method for signature 'featureGroupsSet'
getXCMSnExp(obj, ..., set)
obj |
The object that should be converted. |
verbose |
If |
... |
(sets workflow) Further arguments passed to non-sets method. Otherwise ignored. |
loadRawData |
Set to |
IMS |
(IMS workflow) Specifies which feature groups are considered for export in IMS workflows. The following options are valid:
This should be kept |
set |
(sets workflow) The name of the set to be exported. |
The conversion process will introduce some dummy values for metadata not present in patRoon objects. If the
features or featureGroups object was generated with XCMS, then no conversion is performed and the
original XCMS object will be returned, if possible. Conversion may still occur e.g. due to the
application of some subsetting or filtering steps or the re-ordering of analyses.
In a sets workflow, unset is used to convert the
feature (group) data before the object is exported.
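A possible way to wrap the export for both regular and sets workflows is sketched below. The helper exportXCMSSketch is hypothetical and not part of patRoon; it assumes that patRoon is installed, that obj is an existing features or featureGroups object and that loadRawData accepts a logical value (its exact meaning should be taken from the argument description).

```r
# Hypothetical helper (not part of patRoon). Nothing is executed until the
# function is called with a real features or featureGroups object.
exportXCMSSketch <- function(obj, loadRawData = TRUE, set = NULL) {
    if (is.null(set)) {
        # Regular workflow: convert the object directly
        patRoon::getXCMSnExp(obj, verbose = FALSE, loadRawData = loadRawData)
    } else {
        # Sets workflow: 'set' names the set to export; the other arguments
        # are passed on to the non-sets method
        patRoon::getXCMSnExp(obj, verbose = FALSE, loadRawData = loadRawData,
                             set = set)
    }
}
```

The same pattern should apply to getXCMSSet, since both functions share the argument layout shown in the usage above.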
Benton HP, Want EJ, Ebbels TMD (2010).
“Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.”
Bioinformatics, 26, 2488.
Louail P, Brunius C, Garcia-Aloy M, Kumler W, Storz N, Stanstrup J, Treutler H, Vangeenderhuysen P, Witting M, Neumann S, Rainer J (2025).
“xcms in Peak Form: Now Anchoring a Complete Metabolomics Data Preprocessing and Analysis Software Ecosystem.”
Analytical Chemistry.
doi:10.1021/acs.analchem.5c04338.
https://doi.org/10.1021/acs.analchem.5c04338.
Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G (2006).
“XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.”
Analytical Chemistry, 78, 779–787.
Tautenhahn R, Boettcher C, Neumann S (2008).
“Highly sensitive feature detection for high resolution LC/MS.”
BMC Bioinformatics, 9, 504.