Title: | Detecting Isotope, Adduct and Homologue Relations in LC-MS Data |
---|---|
Description: | Screening a HRMS data set for peaks related by (1) isotope patterns, (2) different adducts of the same molecule and/or (3) homologue series. The resulting isotopic pattern and adduct groups can then be combined to so-called components, with homologue series information attached. Also allows plotting and filtering HRMS data for mass defects, frequent m/z distances and components vs. non-components. |
Authors: | Martin Loos |
Maintainer: | Martin Loos <[email protected]> |
License: | GPL-3 |
Version: | 2.2 |
Built: | 2024-11-12 04:26:47 UTC |
Source: | https://github.com/blosloos/nontarget |
Grouping of peaks in a HRMS data set for (1) isotopic pattern relations and (2) different adducts of the same molecule; detection of (3) homologue series. Isotopic pattern and adduct groups can then be related to their (unknown) candidate chemical component, with homologue series information attached. Includes various plotting and filtering functions for e.g. mass defects, frequent m/z distances, components vs. non-components, adduct frequencies.
Package: | nontarget |
Type: | Package |
Version: | 1.9 |
Date: | 2016-03-21 |
License: | GPL-3 |
Screens a HRMS data set for peaks related by (1) isotopic patterns and/or (2) different adducts of the same molecule and/or (3) being part of a homologue series, including various plausibility checks. The resulting isotopic pattern groups and adduct groups can then be combined to components, with each component tagged if being part of a homologue series. This does not require prior knowledge about the chemical nature of the components assigned.
Includes various plotting functions, such as (a) m/z vs. RT vs. mass defect, (b) mass defect vs. detected isotope m/z increments, (c) adduct frequencies and their intensity distributions, (d) relations among peaks within single isotope/adduct groups and within single components and (e) homologue series within RT vs. m/z plots. Allows filtering HRMS data for mass defects, satellite peaks, frequent m/z distances and components vs. non-components. Lists of most-common adducts and isotopes are provided or may be user-defined.
Requires HRMS centroid peak lists as input, i.e., a dataframe or matrix with values of (a) m/z, (b) intensity and (c) retention time (RT) per peak. In addition, tolerances for m/z, RT and uncertainties in peak intensity must be defined by the user.
Martin Loos
Maintainer: Martin Loos <[email protected]>
Loos, M., Hollender, J., Schymanski, E., Ruff, M., Singer, H., 2012. Bottom-up peak grouping for unknown identification from high-resolution mass spectrometry data. ASMS 2012 annual conference Vancouver, oral session Informatics: Identification.
Detecting isotope pattern groups:
peaklist
make.isos
pattern.search
pattern.search2
plotisotopes
plotdefect
isotopes
resolution_list
Detecting adduct groups:
peaklist
adduct.search
plotadduct
adducts
Detecting homologue series:
peaklist
homol.search
plothomol
On combining groups to components:
combine
plotisotopes
plotcomp
ms.filter
On filtering and plotting:
rm.sat
plotall
plotgroup
ms.filter
plotdiff
deter.iso
###################################################### # (0) load required data: ############################ # (0.1) HRMS peak list & remove satelite peaks: ###### data(peaklist); peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3]; # (0.2) list of adducts - package enviPat ############ data(adducts); # (0.3) list of isotopes - package enviPat ########### data(isotopes); ###################################################### # (1) run isotope pattern grouping ################### # (1.1) define isotopes and charge argument ########## iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) # (1.2) run isotope grouping ######################### pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); # (1.3) plot results ################################# plotisotopes(pattern); plotdefect(pattern,elements=c("N")); ###################################################### # (2.1) run grouping of peaks for different adducts ## # of the same candidate molecule ##################### adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); # (2.2) plot results ################################# plotadduct(adduct); ###################################################### # (3) show single pattern group and its relation ##### # to adduct groups ################################### plotall(pattern,adduct); plotgroup(pattern,adduct,groupID=1,massrange=10,allmass=FALSE); ###################################################### # (4.1) Screen for homologue series ################## homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=-1, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=5, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) # (4.2) Plot results ################################# plothomol(homol,xlim=FALSE,ylim=FALSE,plotlegend=TRUE); plothomolplotly(homol); ###################################################### # (5.1) Combine grouping results to components ####### comp <- combine( pattern, adduct, homol, dont = FALSE, rules = c(TRUE, TRUE, FALSE, FALSE) ); # (5.2) plot results ################################# plotisotopes(comp); #plotcomp(comp,compoID=1,peakID=FALSE); ###################################################### # (6) Select data from interactive plot ############## # ms.filter( component=comp,x="mz",y="dm",xlim=FALSE, # ylim=FALSE,rm.comp=TRUE,plot.comp=TRUE,rm.noncomp=FALSE, # select.polygon="inside",res=100,filter.for="raw" ); ######################################################
###################################################### # (0) load required data: ############################ # (0.1) HRMS peak list & remove satelite peaks: ###### data(peaklist); peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3]; # (0.2) list of adducts - package enviPat ############ data(adducts); # (0.3) list of isotopes - package enviPat ########### data(isotopes); ###################################################### # (1) run isotope pattern grouping ################### # (1.1) define isotopes and charge argument ########## iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) # (1.2) run isotope grouping ######################### pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); # (1.3) plot results ################################# plotisotopes(pattern); plotdefect(pattern,elements=c("N")); ###################################################### # (2.1) run grouping of peaks for different adducts ## # of the same candidate molecule ##################### adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); # (2.2) plot results ################################# plotadduct(adduct); ###################################################### # (3) show single pattern group and its relation ##### # to adduct groups ################################### plotall(pattern,adduct); plotgroup(pattern,adduct,groupID=1,massrange=10,allmass=FALSE); ###################################################### # (4.1) Screen for homologue series ################## homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=-1, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=5, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) # (4.2) Plot results ################################# plothomol(homol,xlim=FALSE,ylim=FALSE,plotlegend=TRUE); plothomolplotly(homol); ###################################################### # (5.1) Combine grouping results to components ####### comp <- combine( pattern, adduct, homol, dont = FALSE, rules = c(TRUE, TRUE, FALSE, FALSE) ); # (5.2) plot results ################################# plotisotopes(comp); #plotcomp(comp,compoID=1,peakID=FALSE); ###################################################### # (6) Select data from interactive plot ############## # ms.filter( component=comp,x="mz",y="dm",xlim=FALSE, # ylim=FALSE,rm.comp=TRUE,plot.comp=TRUE,rm.noncomp=FALSE, # select.polygon="inside",res=100,filter.for="raw" ); ######################################################
Algorithm for detecting m/z differences among peaks that may correspond to m/z differences among different adducts.
adduct.search(peaklist, adducts, rttol = 0, mztol = 2,ppm = TRUE, use_adducts = c("M+H", "M+K", "M+Na"), ion_mode = "positive", get_pairs = FALSE)
adduct.search(peaklist, adducts, rttol = 0, mztol = 2,ppm = TRUE, use_adducts = c("M+H", "M+K", "M+Na"), ion_mode = "positive", get_pairs = FALSE)
peaklist |
Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as |
adducts |
Data.frame |
rttol |
Retention time tolerance. Units as given in column 3 of peaklist argument, e.g. [min]. |
mztol |
m/z tolerance setting: +/- value by which the m/z of a measured peak may vary from its expected m/z value. If parameter |
ppm |
Should |
use_adducts |
Vector of adducts to be screened for. Corresponds to names in the first column of |
ion_mode |
|
get_pairs |
enviMass output, please ignore. |
Given a peak from the peaklist, the adduct.search
algorithm screens within tolerances mztol
and rttol
whether any other peaks may correspond to this one peak via adduct m/z differences.
More precisely, the one peak m/z is reset to all possible candidate molecular mass values (M; uncharged, non-adduct). The latter are then used to calculate for all other candidate adduct peaks,
which, if found, are subsequently grouped.
For example, consider use_adducts=c("M+H", "M+K")
. Given the m/z-value of the one peak, two other peaks with
((m/z*z("M+H")-X("M+H"))/z("M+K"))+X("M+K") and ((m/z*z("M+K")-X("M+K"))/z("M+H"))+X("M+H") are searched for. The peak found for
the first term (i.e. with "M+H" being the candidate adduct of the one peak) leads to one group of associated adduct peaks (M+H<->M+K). Another adduct
peak (i.e. with "M+K" being the candidate adduct of the one peak) would lead to a second group of associated adduct peaks (M+K<->M+H).
Logically, larger adduct groups than the one exemplified can be present, if argument "use_adducts" allows for it (e.g. M+H<->M+K,M+H<->M+Na,M+Na<->M+K).
For clarification, mztol
states the maximum m/z deviation of a measured peak from its true value, i.e., the theoretical mass-to-charge ratio of the (often unknown) analyte adduct measured. The latter true value thus ranges +/-mztol
from
the measured value, leading to a lower and an upper m/z bound for this true value. These bounds are then modified by pairwise adduct m/z differences, leading to new bounds for the true value at other m/z positions. In turn, the values of measured peaks at
exactly these positions can again deviate by +/-mztol
from these bounds, which are hence adapted accordingly at these positions for a final search window.
This entails: (a) the bounds are calculated from measured instead of true values (using ppm==TRUE
, bound differences are assumed negligible between utilizing measured or true values) and
(b) the final search window is larger than 2*mztol
and can therefore lead to contradictory assignments (i.e., peaks B and C can be assigned to peak A for different adduct m/z differences within mztol
but peaks B and C cannot be paired
within mztol
).
List of type adduct with 5 entries
adduct[[1]] |
|
adduct[[2]] |
|
adduct[[3]] |
|
adduct[[4]] |
|
adduct[[5]] |
|
Peak IDs refer to the order in which peaks are provided.
Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks
and homologue series cluster. Yet other IDs exist for the individual components (see note section of combine
).
The same peak may appear as different adducts in column adduct[[1]][,7]
, indicating a conflict in assigning the correct adduct.
Beware, some adduct combinations from adducts
may lead to the same results (e.g. M+H<->M+Na vs M+3H<->M+3Na).
Martin Loos
rm.sat
adducts
peaklist
plotadduct
combine
plotgroup
###################################################### # load required data: ################################ # HRMS peak list: #################################### data(peaklist) # list of adducts #################################### data(adducts) ###################################################### # run grouping of peaks for different adducts ######## # of the same candidate molecule ##################### adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); # plot results ####################################### plotadduct(adduct); ######################################################
###################################################### # load required data: ################################ # HRMS peak list: #################################### data(peaklist) # list of adducts #################################### data(adducts) ###################################################### # run grouping of peaks for different adducts ######## # of the same candidate molecule ##################### adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); # plot results ####################################### plotadduct(adduct); ######################################################
Combines the grouping of isotope pattern peaks from pattern.search
with that of adducts from adduct.search
to components, with information on homologue series relations from homol.search
attached. Includes some checks for component
plausibility. Needs at least two inputs of (1) isotope pattern relations, (2) adduct relations and/or (3) homologue series relations. Extracts the most intensive peak
per component, allowing for a comparison of components among HRMS data sets.
Individual components and peak relations therein can then be plotted with plotcomp
.
Numbers for detected isotope m/z differences among components can be summarized with plotisotopes
.
Subsets of components and HRMS data can be interactively selected for with ms.filter
.
combine(pattern, adduct, homol = FALSE, rules = c(TRUE, TRUE, FALSE, FALSE), dont = FALSE)
combine(pattern, adduct, homol = FALSE, rules = c(TRUE, TRUE, FALSE, FALSE), dont = FALSE)
pattern |
List of type pattern produced by |
adduct |
List of type adduct produced by |
homol |
List of type homol produed by |
rules |
Vector with four entries of |
dont |
Numeric vector with one or several values in between 1 and 4, to exclude components with particular warnings; if not used, set to |
The algorithm sorts relations among peaks in HRMS data sets generated by pattern.search
, adduct.search
and
homol.search
to components in a repetition of four consecutive steps.
In a first step, and along decreasing peak intensities, individual peaks are checked for being part of an isotope pattern group and thus relatable to other peaks.
In a second step, all peaks within this group from the first step are checked for being part of adduct groups, thus relating to more peaks. Step one and two should thereby lead to the full set of peaks defining a component.
In a third step (check rules[1] = TRUE
), all peaks in a component are checked for having adduct or isotope pattern relations to other interfering peaks not yet subsumed into the component, e.g., as
a result of overlapping isotope pattern groups.
In a fourth step, all peaks found for a component are, if available, related to any homologue series they may be part of.
Once thus assigned to a component, peaks take not further part in subsequent repetitions of step one to initiate a new component (except for interfering peaks, check rules[2] = TRUE
).
However, they may repeatedly be involved in steps two and three to reflect ambiguities of assigning components.
Four plausibility checks are implemented, represented by warning
indices 1 to 4.
The first test checks whether the adduct relations found for the peaks assorted under above steps one and two are consistent. If ambiguous adduct relations (e.g. M+H<->M+K AND M+Na<->M+NH4)
are found for at least one peak, warning 1 is tagged to the concerned component.
The second test checks whether variations in peak intensities within isotope pattern groups are consistent among the different adducts of the same component. This
must account for uncertainty in peak intensities via argument inttol of pattern.search
.
The third check examines whether interfering peaks occur.
The fourth check takes effect if a component consists of ambiguously merged isotope pattern groups (only relevant if several charges are used, see use_charges
argument in
make.isos
and the last of the rules
in pattern.search
).
These warning indices can then be used to exclude components affected, using argument dont.
For example, dont=c(1,3)
excludes components with ambiguous adduct relations and interfering peaks from the
final component list.
List of type comp with 7 entries
comp[[1]] |
|
comp[[2]] |
|
comp[[3]] |
|
comp[[4]] |
|
comp[[5]] |
|
comp[[6]] |
|
comp[[7]] |
|
rules[1]
: TRUE
= interfering peaks are annotated to a component (see details). Set to FALSE if peak matrix is too crowded, e.g., in the chromatographic dead time.
rules[2]
: TRUE
= interfering peaks already annotated to a component can also enter step one of the algorithm to check for their component as well (see details).
rules[2]
: TRUE
= remove single-peaked components.
rules[3]
: TRUE
= only list components being part of (a) homologue serie(s).
Do not combine adduct pattern groups and/or isotope pattern groups and/or homologue series information from (a) different peak lists or (b) the same peak list differently ordered.
Beware of combining rules[1] = FALSE
with rules[2] = FALSE
: most peaks interfering in components will get lost.
Component IDs are allocated in decreasing peak intensity order of the most intensive peak per component, see section value, comp[[1]]
.
In contrast, IDs of individual peaks refer to the order in which peaks are provided.
Setting the argument pattern
to FALSE
skips the first step in the algorithm; adducts group are then only searched for a single peak along decreasing peak intensities.
Setting the argument adduct
to FALSE
skips the second step in the algorithm; no adduct groups are then searched for.
Martin Loos
pattern.search
pattern.search2
adduct.search
homol.search
plotisotopes
plotcomp
ms.filter
plotisotopes
###################################################### # (0) Group for isotopologues, adducts & homologues # data(peaklist); data(adducts); data(isotopes); iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=1, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=5, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) ############################################################## # Combine these individual groups to components # ############################################################## # (1) Standard setting: # # Produce a component list, allowing for single-peaked # # components and with interfering peaks also listed as indi- # # vidual components (with inputs pattern, adduct, homol): # comp <- combine( pattern, adduct, homol, dont = FALSE, rules = c(TRUE, TRUE, FALSE, FALSE) ); comp[[6]]; ############################################################## # (2) Produce a list with those components related to a homo-# # logue series only (requires inputs pattern,adduct,homol): # comp <- combine( pattern, adduct, homol, dont = FALSE, rules = c(TRUE, TRUE, FALSE, TRUE) ); comp[[6]]; ################################################################ # (3) Extract only components seeming plausible and containing # # more than one peak per component, without homologue series # # information attached (with inputs pattern and adduct): # comp <- combine( pattern, adduct, homol=FALSE, dont= FALSE, rules= c(TRUE, TRUE, TRUE, TRUE) ); comp[[6]]; ##############################################################
###################################################### # (0) Group for isotopologues, adducts & homologues # data(peaklist); data(adducts); data(isotopes); iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=1, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=5, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) ############################################################## # Combine these individual groups to components # ############################################################## # (1) Standard setting: # # Produce a component list, allowing for single-peaked # # components and with interfering peaks also listed as indi- # # vidual components (with inputs pattern, adduct, homol): # comp <- combine( pattern, adduct, homol, dont = FALSE, rules = c(TRUE, TRUE, FALSE, FALSE) ); comp[[6]]; ############################################################## # (2) Produce a list with those components related to a homo-# # logue series only (requires inputs pattern,adduct,homol): # comp <- combine( pattern, adduct, homol, dont = FALSE, rules = c(TRUE, TRUE, FALSE, TRUE) ); comp[[6]]; ################################################################ # (3) Extract only components seeming plausible and containing # # more than one peak per component, without homologue series # # information attached (with inputs pattern and adduct): # comp <- combine( pattern, adduct, homol=FALSE, dont= FALSE, rules= c(TRUE, TRUE, TRUE, TRUE) ); comp[[6]]; ##############################################################
Produces a list of m/z differences from diffs
output generated by plotdiff
to be used as argument iso
in pattern.search
.
Thus, replaces the iso
argument to pattern.search
from make.isos
and
isotopes
by another iso
argument of the most frequent m/z differences detected among the HRMS peaks by plotdiff
.
deter.iso(diffs, histbreaks = 50000, mzmin = 0, mzmax = 0.5, cutcount = 180, plotit = TRUE)
deter.iso(diffs, histbreaks = 50000, mzmin = 0, mzmax = 0.5, cutcount = 180, plotit = TRUE)
diffs |
vector diffs, i.e. output of function |
histbreaks |
Number of histogram breaks; thus defines the mztol window to be used with |
mzmin |
Minimum value for m/z differences in resulting iso list. |
mzmax |
Maximum value for m/z differences in resulting iso list. |
cutcount |
Histogram classes with frequencies below cutcount will be excluded from iso |
plotit |
Should a plot of the sekected histogram classes be plotted? Recommended. |
In addition to returning a iso
argument, prints argument values for mztol
, ppm
and deter
of pattern.search
when using iso from deter.iso
.
Also see deter
argument of pattern.search
.
List of type iso with 5 entries
iso[[1]] |
"list of isotopes". |
iso[[2]] |
"list of isotope masses". |
iso[[3]] |
"charges". |
iso[[4]] |
"number of isotope m/z". |
iso[[5]] |
"elements". |
Experimental. For more consistent results, use make.isos
instead.
Martin Loos
pattern.search
plotdiff
make.isos
data(peaklist) diffs<-plotdiff(peaklist, histbreaks = 10000, rttol = c(0, 0), mztol = c(0.3, 100), plotit = TRUE); iso<-deter.iso(diffs,histbreaks=10000,mzmin=0,mzmax=40,cutcount=500,plotit=TRUE); pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=0.005, mzfrac=0.1, ppm=FALSE, inttol=0.2, #rules=c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE), rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=TRUE, entry=50 );
data(peaklist) diffs<-plotdiff(peaklist, histbreaks = 10000, rttol = c(0, 0), mztol = c(0.3, 100), plotit = TRUE); iso<-deter.iso(diffs,histbreaks=10000,mzmin=0,mzmax=40,cutcount=500,plotit=TRUE); pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=0.005, mzfrac=0.1, ppm=FALSE, inttol=0.2, #rules=c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE), rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=TRUE, entry=50 );
Dynamic programming algorithm for unsupervised detection of homologue series in LC-(HR)MS data.
homol.search(peaklist,isotopes, elements=c("C","H","O"),use_C=FALSE,minmz=5, maxmz=120,minrt=-2,maxrt=2,ppm=TRUE,mztol=3.5,rttol=0.5,minlength=5, mzfilter=FALSE,vec_size=3E6,mat_size=3,R2=.98,spar=.45,plotit=FALSE,deb=0)
homol.search(peaklist,isotopes, elements=c("C","H","O"),use_C=FALSE,minmz=5, maxmz=120,minrt=-2,maxrt=2,ppm=TRUE,mztol=3.5,rttol=0.5,minlength=5, mzfilter=FALSE,vec_size=3E6,mat_size=3,R2=.98,spar=.45,plotit=FALSE,deb=0)
peaklist |
Dataframe of picked LC-MS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as |
isotopes |
Dataframe |
elements |
FALSE or chemical elements in the changing units of the homologue series, e.g. c("C","H") for alkane chains. Used to restrict search. |
use_C |
For |
minmz |
Defines the lower limit of the m/z window to search homologue series peaks, relative to the m/z of the one peak to search from. Absolute m/z value [u]. |
maxmz |
Defines the upper limit of the m/z window to search homologue series peaks, relative to the m/z of the one peak to search from. Absolute m/z value [u]. |
minrt |
Defines the lower limit of the retention time (RT) window to look for other homologue peaks, relative to the RT of the one peak to search from, i.e., RT+minrt. For decreasing RT with increasing HS mass, use negative values of minrt. |
maxrt |
Defines the upper limit of the RT window to look for other homologue peaks, relative to the RT of the one peak to search from, i.e., RT+maxrt. See |
ppm |
Should |
mztol |
m/z tolerance setting: +/- value by which the m/z of a peak may vary from its expected value. If parameter |
rttol |
Retention time (RT) tolerance by which the RT between two adjacent pairs of a homologue series is allowed to differ. Units as given in column 3 of peaklist argument, e.g. [min]. |
minlength |
Minimum number of peaks in a homologue series. |
mzfilter |
Vector of numerics to filter for homologue series with specific m/z differences of their repeating units, given the tolerances in |
vec_size |
Vector size. Ignore unless a relevant error message is printed (then try to increase size). |
mat_size |
Matrix size for recombining, multiple of input tuples. Ignore unless a relevant error message is printed (then try to increase size). |
R2 |
FALSE or 0<numeric<=1. Coefficient of determination for cubic smoothing spline fits of m/z versus retention time; homologue series with lower R2 are rejected. See |
spar |
Smoothing parameter, typically (but not necessarily) in (0,1]. See |
plotit |
Logical FALSE or 0<integer<5. Intermediate plots of nearest neigbour paths, spline fits of individual homologues series >= |
deb |
Debug returns, ignore. |
A dynamic programming approach is used to extract series of peaks that differ in constant m/z units and smooth changes in their retention time within bounds of mass defect changes. First, a nearest neighbour path through a kd-tree representation of the data is used to extract all feasible peak triplets. These triplets are then combined to all plausible n-tupels in n-3 steps. At each such step, each newly formed n-tupel is checked for smooth changes of RT with increasing m/z of the homologues, using cubic splines and a R2-based threshold of the model fit.
List of type homol with 6 entries
homol[[1]] |
|
homol[[2]] |
|
homol[[3]] |
|
homol[[4]] |
|
homol[[5]] |
|
homol[[6]] |
Ignore. List with superjacent HS IDs per group - for set |
The rttol
argument of homol.search
must not be mixed with that of pattern.search
or pattern.search2
.
Arguments isotopes
and elements
are needed to limit intermediate numbers of m/z differences to screen over, based on feasible changes in mass defect.
Similarly, intermediate numbers are also limited by the retention time and m/z windows defined by minmz/maxmz
and minrt/maxrt/rttol
, respectively.
The latter are always set relative to the individual RT and m/z values of the peaks to be searched from.
Overall, these parameters must be chosen carefully to avoid a combinatorial explosion of triplet m/z differences, leading to slow computation, memory problems or senseless results.
Values for spar
and R2
have to be adjusted for different chromatographic settings; the smoothing spline fits are used to eliminate homologue series candidates with erratic RT-behaviour.
Spline fits at >=minlength
can be viewed by plotit=2
.
Peak IDs refer to the order in which peaks are provided. Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks
and homologue series cluster. Yet other IDs exist for the individual components (see note section of combine
).
Here, IDs of homologue series group are given both in the function output homol[[1]]
, homol[[3]]
and homol[[6]]
, with one homologue series stating one group of interrelated peaks.
Martin Loos
rm.sat
isotopes
peaklist
plothomol
data(peaklist); data(isotopes) homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=-.5, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=5, mzfilter=FALSE, vec_size=3E6, mat_size=3, spar=.45, R2=.98, plotit=FALSE ) plothomol(homol);
data(peaklist); data(isotopes) homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=-.5, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=5, mzfilter=FALSE, vec_size=3E6, mat_size=3, spar=.45, R2=.98, plotit=FALSE ) plothomol(homol);
pattern.search
.
Deriving list of m/z isotope differences for input into pattern.search
.
make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C", "15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2))
make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C", "15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2))
isotopes |
Dataframe with isotopes, from dependency |
use_isotopes |
Character string of non-monoisotopic isotopes for isotopologue search. |
use_charges |
Vector of signed integers with length equal to that of |
List of type iso with 5 entries
iso[[1]] |
|
iso[[2]] |
|
iso[[3]] |
|
iso[[4]] |
|
iso[[5]] |
|
Martin Loos
data(isotopes) iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2))
data(isotopes) iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2))
Mark peaks and components in plots of retention time, m/z, mass defect or peak intensity. Select components or peaks by drawing a polygon.
ms.filter(component, x = "mz", y = "dm", xlim = FALSE, ylim = FALSE, rm.comp = FALSE, plot.comp = TRUE, rm.noncomp = TRUE, select.polygon = "inside", res = 100, filter.for = "raw")
ms.filter(component, x = "mz", y = "dm", xlim = FALSE, ylim = FALSE, rm.comp = FALSE, plot.comp = TRUE, rm.noncomp = TRUE, select.polygon = "inside", res = 100, filter.for = "raw")
component |
List of type comp generated by |
x |
Scale of x-axis, any of |
y |
Scale of y-axis, any of |
xlim |
xlim=c(upper bound,lower bound), default = |
ylim |
ylim=c(upper bound,lower bound), default = |
rm.comp |
Select (i.e. remove) peaks assigned to components by |
plot.comp |
Highlight peaks part of a component (red)? |
rm.noncomp |
Select (i.e. remove) peaks not assigned to components by |
select.polygon |
Select peaks and/or components to be excluded |
res |
Resolution of polygon selection; increase if problems with selection by complicated polygons or along polygon boarder occur. Otherwise, ignore. |
filter.for |
What should be filtered and subsequently returned as value by the polygon selection? Any of |
Selection refers to those peaks and/or components to be excluded. If not all peaks in the data set are assigned to components, they are still plotted
and can thus e.g. be separated from those assigned to components by setting rm.comp
vs. rm.noncomp
.
See filter.for
argument.
Either raw data (i.e. peak list), isotope pattern peak relations, (i.e. subset of first entry in list of type pattern generated by pattern.search
) or
adduct relations (i.e. subset of first entry in list of type adduct generated by adduct.search
).
Here, mass defect is defined as the difference of m/z to the nearest integer from rounding.
rm.comp = FALSE
and rm.noncomp = FALSE
leads to no selection and thus no exclusions of anything.
Martin Loos
Algorithm for detecting isotopes pattern peak groups generated by an unknown candidate chemical component.
pattern.search(peaklist, iso, cutint = min(peaklist[, 2]), rttol = c(-0.5, 0.5), mztol = 3, mzfrac = 0.1, ppm = TRUE, inttol = 0.5, rules = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), deter = FALSE, entry = 20)
pattern.search(peaklist, iso, cutint = min(peaklist[, 2]), rttol = c(-0.5, 0.5), mztol = 3, mzfrac = 0.1, ppm = TRUE, inttol = 0.5, rules = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), deter = FALSE, entry = 20)
peaklist |
Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as |
iso |
Object generated by |
cutint |
Cutoff intensity. Peaks below this intensity will be (a) omitted and (b) not expected by any of the plausibility rules (see details). See parameter |
rttol |
Minus, plus retention time tolerance. Units as given in column 3 of |
mztol |
m/z tolerance setting: value by which the m/z of a peak may vary from its expected value. If parameter |
mzfrac |
"Small" mass tolerance used. Given as a fraction of |
ppm |
Should |
inttol |
Intensity tolerance setting: fraction by which peak intensities may vary. E.g. if set to 0.2, a peak with expected intensity 10000 may range in between 8000 and 12000. |
rules |
Enabling( |
deter |
If using |
entry |
Memory allocation setting. Increase value if the corresponding warning is issued. Otherwise, ignore. |
Detecting groups of isotope pattern peaks involves two steps.
In a first step, and within the given tolerances rttol
and mztol
, m/z differences among any two peaks are screened for matching differences in m/z among different isotope(s) of an element,
as provided by the iso
argument. This leads to a set of candidate isotope m/z differences, with each subsequently undergoing four plausibility checks (rules
parameter entries 1 to 7).
In a second step, the remaining candidate m/z isotope differences are sorted in tree-like structures (so-called isotope pattern groups), starting from the lowest m/z peak of the data set.
Thus, a tree consists of several (>=2) peaks related by isotope m/z differences; the peak with lowest m/z in the tree (root node) represents the monoisotopic peak of the associated candidate
molecular component. This does not require prior knowledge about the chemical nature of the components assigned. Again, the resulting trees undergo plausibilization (rules
parameter entries 8 to 11).
In addition, groups with m/z isotope differences being detected within "small" mztol
are used to calculate a minimum number of atoms per element associated with that m/z isotope difference.
List of type pattern with 12 entries
pattern[[1]] |
|
pattern[[2]] |
|
pattern[[3]] |
|
pattern[[4]] |
|
pattern[[5]] |
|
pattern[[6]] |
|
pattern[[7]] |
|
pattern[[8]] |
|
pattern[[9]] |
|
pattern[[10]] |
|
pattern[[11]] |
|
pattern[[12]] |
|
rules[1]
: Intensities between two peaks associated via any of the candidate m/z isotope differences of the iso
argument are compared.
Given this difference in intensity, the minimum number of atoms for the element with highest abundance in argument iso
is calculated.
If , the candidate m/z difference is
found implausible and therefore rejected. The minimum mass is set to that of protium (1H) plus its minimum association to numbers of
carbon atoms, i.e. 1.0078 + (1/6 * 12.0000). Fast precheck to
rules[2]
and rules[3]
.
rules[2]
: Repeats rules[1]
, but uses abundances and minimum masses (including the C-ratios of isotopes
) for only those isotope(s) of argument iso
ranging within the "large" m/z tolerance set by mztol
.
rules[3]
: Repeats rules[1]
, but now uses abundance and minimum masses (including the C-ratios of isotopes
) individually for only
those isotope(s) of argument iso
ranging within the "small" m/z tolerance set by mztol*mzfrac
.
rules[4]
: If the intensity ratio between two peaks associated via any of the candidate m/z isotope differences of the iso
argument is smaller than
the smallest isotope abundance ratio of an element of argument iso
, the candidate m/z difference is found implausible and therefore rejected.
Fast precheck to rules[5]
and rules[6]
.
rules[5]
: Repeats rules[4]
, but now uses abundances for only those isotope(s) of argument iso
ranging within the "large" m/z tolerance set by mztol
.
rules[6]
: Repeats rules[4]
, but now uses abundances for only those isotope(s) of argument iso
ranging within the "small" m/z tolerance set by mztol*mzfrac
.
rules[7]
: Given those isotopes of argument iso
ranging within the "small" m/z tolerance set by mztol
and mzfrac
and their C-ratio set in isotopes
,
the minimum number of carbon atoms and the associated 13C peak intensity to be expected at M+1 can be calculated. Checks if this expected 13C peak is present in the data set.
If not, the candidate m/z difference is rejected.
rules[8]
: Given the intensity and m/z of the monoisotopic peak in a growing isotope pattern tree and values from argument iso
, the maximum m/z to which a tree can grow is restrict.
rules[9]
: Given (a) the intensities of the monoisotopic peak (=tree root node, interaction level 1) and its first isotopic daughter peaks (tree interaction level 2) and
(b) the candidate m/z isotope(s) within the "small" m/z tolerance set by mztol
and mzfrac
associated with (a), the occurrence of expected peaks (interaction level >2) above
the value set by argument cutint is checked. If expected but not found, the peak at interaction level 1 is rejected as being the monoisotopic candidate peak, and a tree is
grown on the remaining interrelated peaks. For example, if a monoisotopic peak (= tree interaction level 1) is associated with an intensive 13-C isotope peak (= tree interaction
level 2), a second peak from two 13-C vs. 12-C isotope replacements can be expected and must be checked for.
rules[10]
: Restriction to rules[7]
and [9]
: expected peaks are searched for only if no other measured peaks of higher intensity exist in a tolerance window of absolute m/z = 0.5 around
the m/z of the expected peak. This allows skipping the search of expected peaks in cases of intensity masking by other peaks. For example, intensive 37-Cl often mask the occurrence
of a second 13-C peak to be expected from rules[6]
, depending on the number of Cl and C atoms and the measurement resolution used.
rules[11]
: In some cases, trees may - if several charges z are used - be nested within each other.
This rule merges the nested group of charge z=x into the nesting peak group of z>x.
Acceptable outcomes strongly depend on appropriate parametrization of the algorithm.
Including many isotopes and overly large values for rttol
and/or mztol
may lead to overflows. In this case, a warning is issued to increase parameter entry
or to adjust values of rttol
and/or mztol
.
Group IDs are valid both for pattern[[1]]
and pattern[[3]]
.
Peak IDs refer to the order in which peaks are provided.
Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks
and homologue series cluster. Moreover, and at the highest level, yet other IDs exist for the individual components (see note section of combine
).
Depending on values of mztol
, several m/z isotope differences from argument iso
may match a measured m/z difference between two peaks.
rules[1]
to rules[11]
encompass uncertainties in intensity set by parameter inttol
.
In some cases, two or several isotope pattern trees may overlap. Overlapping trees are not merged by rules[11]
but only fully nested ones.
Disabling rules[10]
may in some cases lead to false rejections of candidate m/z isotope differences for rules[7]
and rules[9]
, especially for low resolutions.
rules[9]
is recursive, i.e. may be applied several times on an ever decreasing number of peaks per tree, until plausibility holds or no m/z isotopic differences remain.
Martin Loos
pattern.search2
rm.sat
peaklist
make.isos
plotisotopes
plotdefect
combine
plotgroup
isotopes
resolution_list
###################################################### # load required data: ################################ # HRMS peak list: #################################### data(peaklist) peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3]; # list of isotopes ################################### data(isotopes) ###################################################### # (1) run isotope pattern grouping ################### # (1.1) define isotopes and charge (z) argument ###### iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) # (1.2) run isotope grouping ######################### # save the list returned as "pattern" ################ pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); names(pattern); # extract peaks listed in isotope pattern group no.1 # # under pattern[[3]] from pattern[[1]] ############### pattern[[1]][as.numeric(strsplit(as.character(pattern[[3]][1,2]),",")[[1]]),]; # (1.3) plot results ################################# plotisotopes(pattern); plotdefect(pattern,elements=c("N")); ######################################################
###################################################### # load required data: ################################ # HRMS peak list: #################################### data(peaklist) peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3]; # list of isotopes ################################### data(isotopes) ###################################################### # (1) run isotope pattern grouping ################### # (1.1) define isotopes and charge (z) argument ###### iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) # (1.2) run isotope grouping ######################### # save the list returned as "pattern" ################ pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); names(pattern); # extract peaks listed in isotope pattern group no.1 # # under pattern[[3]] from pattern[[1]] ############### pattern[[1]][as.numeric(strsplit(as.character(pattern[[3]][1,2]),",")[[1]]),]; # (1.3) plot results ################################# plotisotopes(pattern); plotdefect(pattern,elements=c("N")); ######################################################
Algorithm for grouping isotope pattern centroids of chemical components by querying quantized simulation data
pattern.search2(peaklist,quantiz,mztol=2,ppm=TRUE,inttol=0.5,rttol=0.3, use_isotopes=c("13C","37Cl","15N","81Br","34S","18O"),use_charges=c(1,2), use_marker=TRUE,quick=FALSE,isotopes, get_pairs = FALSE, get_matches = FALSE)
pattern.search2(peaklist,quantiz,mztol=2,ppm=TRUE,inttol=0.5,rttol=0.3, use_isotopes=c("13C","37Cl","15N","81Br","34S","18O"),use_charges=c(1,2), use_marker=TRUE,quick=FALSE,isotopes, get_pairs = FALSE, get_matches = FALSE)
peaklist |
Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as |
quantiz |
Quantized instrument-specific (!) simulation data of feasible centroid-centroid relations as provided by package |
mztol |
m/z tolerance setting: value by which the m/z of a peak may vary from its expected value. If parameter |
ppm |
Should |
inttol |
Intensity tolerance setting = fraction by which peak intensities may vary; e.g., if set to 0.2, a peak with expected intensity 10000 may range in between 8000 and 12000. |
rttol |
+/- retention time tolerance. Units as given in column 3 of |
use_isotopes |
Restrict query to certain isotopes dominating centroid relations; set to |
use_charges |
Vector of signed integers. Restrict query to certain charges z; set to |
use_marker |
Query for marker peaks, |
quick |
Continue if query finds first hit? Speeds up, but leaves resulting information on underlying isotopes incomplete. |
isotopes |
Dataframe of relevant isotopes as provided by package |
get_pairs |
enviMass output, please ignore. |
get_matches |
enviMass output, please ignore. |
As alternative to rule-based pattern.search
, differences among measured centroids (peaklist
) are queried to match those of compressed (=quantized) simulation data within bounds
of measurement tolerances and the quantization distortion. Hence, in comparion to pattern.search
, this approach accounts for centroid mass shifts induced by peak profile
interferences prevalent at even high m/z resolution.
To derive the quantized data, isotope pattern centroids of several million organic molecular formulas from the PubChem database were calculated for various classes of adducts.
Molecular formulas were filtered to be unique and only to contain C, H, O, N, Cl, Br, K, Na, S, Si, F, P and/or I.
The resulting >250 million centroid pairs from individual patterns were then categorized for their dominant isotopologues, charge and
the possible presence of another centroid of higher intensity than that of the pair (=marker peak).
Within these categories, data on centroid pair (a) m/z, (b) m/z differences, (c) intensity ratios and (d) marker m/z was quantized by a
recursive partitioning procedure.
The resulting compressed data representation was extended by nearest neigbour estimates in the above dimensions (a) to (d) to
account for queries with molecular formulas possibly not present in the PubChem set.
Internally, the quantized simulation data is queried by a tree-like space-partitioning structure for hyperrectangles, while centroids from peaklist
are restructured into kd-trees.
List of type pattern with 12 entries
pattern[[1]] |
|
pattern[[2]] |
|
pattern[[3]] |
|
pattern[[4]] |
|
pattern[[5]] |
|
pattern[[6]] |
|
pattern[[7]] |
|
pattern[[8]] |
|
pattern[[9]] |
|
pattern[[10]] |
|
pattern[[11]] |
|
pattern[[12]] |
|
Acceptable outcomes strongly depend on appropriate parametrization of the algorithm and using the correct quantiz
data set from package nontargetData
.
Using overly large values for rttol
and/or mztol
may lead to slow execution.
Peak IDs refer to the order in which peaks are provided.
If you do not find quantized simulation data for your instrument in package nontargetData
and you can provide resolution=f(m/z) information: contact maintainer.
Martin Loos
rm.sat
peaklist
plotisotopes
plotdefect
combine
plotgroup
pattern.search
###################################################### # load HRMS centroid list: ########################### data(peaklist) # load isotope data ################################## data(isotopes) # load quantized simulation data ##################### data(OrbitrapXL_VelosPro_R60000at400_q) ###################################################### # run isotope pattern grouping ####################### # save the list returned as "pattern" ################ pattern<-pattern.search2( peaklist, OrbitrapXL_VelosPro_R60000at400_q, mztol=2, ppm=TRUE, inttol=0.5, rttol=0.3, use_isotopes=FALSE, use_charges=FALSE, use_marker=TRUE, quick=FALSE, isotopes ) names(pattern); ######################################################
###################################################### # load HRMS centroid list: ########################### data(peaklist) # load isotope data ################################## data(isotopes) # load quantized simulation data ##################### data(OrbitrapXL_VelosPro_R60000at400_q) ###################################################### # run isotope pattern grouping ####################### # save the list returned as "pattern" ################ pattern<-pattern.search2( peaklist, OrbitrapXL_VelosPro_R60000at400_q, mztol=2, ppm=TRUE, inttol=0.5, rttol=0.3, use_isotopes=FALSE, use_charges=FALSE, use_marker=TRUE, quick=FALSE, isotopes ) names(pattern); ######################################################
LC-HRMS peak list of a sewage treatment plant effluent sample.
data(peaklist)
data(peaklist)
Numeric data frame with 11172 observations on the following 3 variables.
mass
Peak m/z
intensity
Peak intensity
rt
Peak retention time [min]
HPLC-ESI-FTMS Thermo Fisher orbitrap, resolution 100.000@400m/z, positive ionization, profile data. Generated by Thermo Fisher Formulator peak-picking algorithm; contains satellite peaks for some high intensity peaks.
data(peaklist) plot(peaklist[,3],peaklist[,1],pch=19,cex=0.5,xlab="Retention time [min]",ylab="m/z")
data(peaklist) plot(peaklist[,3],peaklist[,1],pch=19,cex=0.5,xlab="Retention time [min]",ylab="m/z")
Plots absolute frequencies of different adducts detected by adduct.search
and boxplots the intensity distributions of associated peaks.
plotadduct(adduct)
plotadduct(adduct)
adduct |
List of type adduct produced by |
Martin Loos
RT vs. m/z scatterplot marking isotope pattern and adduct group peaks.
plotall(pattern, adduct)
plotall(pattern, adduct)
pattern |
List of type pattern produced by |
adduct |
List of type adduct produced by |
Martin Loos
data(peaklist); data(adducts); data(isotopes); iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); plotall(pattern, adduct)
data(peaklist); data(adducts); data(isotopes); iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); plotall(pattern, adduct)
Plot and print isotope and adduct relations among peaks of a single component. Also lists all other peaks of the data set within tolerance ranges of m/z and retention time (RT).
plotcomp(comp, compoID, peakID = FALSE)
plotcomp(comp, compoID, peakID = FALSE)
comp |
List of type |
compoID |
ID of component to be plotted. For description of component IDs see |
peakID |
ID of a peak in a component; selects the component containing the peak with this ID. For description of peak IDs see note section.
Use with argument |
The upper plot panel provides a circular plot of peak relations, with m/z increasing clockwise starting from noon. Herein, peaks are
represented by their peak IDs; numbers in brackets give decreasing peak intensity ranks over all peaks in the shown component.
Adduct relations are symbolized by red lines and isotope relations by blue arrows. Thin instead of thick lines stand for interfering peaks.
In addition, all relations, other peaks within range and homologue series information are printed as value of plotcomp
The lower panel barplot shows intensities vs. m/z of both the peaks in the component (bold) and the peaks within tolerance ranges of m/z and RT (grey),
defined by arguments mztol
and rttol
of pattern.search
and adduct.search
.
Input peaklist is internally sorted and saved in the lists returned by (a) increasing retention time and (b) m/z by all pattern.search
, adduct.search
and homol.search
. Peak IDs refer to this very order - in contrast to group IDs. Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks
and homologue series cluster. Moreover, and at the highest level, IDs exist for the individual components (see note section of combine
).
Martin Loos
Mass defect vs. m/z scatterplot of HRMS peaks, with specific m/z isotope differences highlighted.
plotdefect(pattern, elements = c("Br"))
plotdefect(pattern, elements = c("Br"))
pattern |
List of type pattern produced by |
elements |
Character string of an element for which the isotope m/z differences between two peaks detected by |
Here, mass defect is defined as the difference of m/z to the nearest integer from rounding.
rm.comp = FALSE
and rm.noncomp = FALSE
leads to no selection and thus no exclusion of anything.
Martin Loos
data(peaklist); peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3]; data(isotopes); iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); plotdefect(pattern,elements=c("N")); plotdefect(pattern,elements=c("Cl")); plotdefect(pattern,elements=c("Br")); plotdefect(pattern,elements=c("S")); plotdefect(pattern,elements=c("C")); plotdefect(pattern,elements=c("K")); # P has only one isotope, hence: # plotdefect(pattern,elements=c("P"));
data(peaklist); peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3]; data(isotopes); iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); plotdefect(pattern,elements=c("N")); plotdefect(pattern,elements=c("Cl")); plotdefect(pattern,elements=c("Br")); plotdefect(pattern,elements=c("S")); plotdefect(pattern,elements=c("C")); plotdefect(pattern,elements=c("K")); # P has only one isotope, hence: # plotdefect(pattern,elements=c("P"));
Produce a vector and histogram of m/z differences among peaks in a HRMS data set. Frequent m/z differences may be relatable to isotope patterns and the presence of different adducts.
plotdiff(peaklist, histbreaks = 10000, rttol = c(0, 0), mztol = c(0, 100), plotit = TRUE)
plotdiff(peaklist, histbreaks = 10000, rttol = c(0, 0), mztol = c(0, 100), plotit = TRUE)
peaklist |
Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as |
histbreaks |
Number of histogram breaks. |
rttol |
Window (upper and lower difference bound, relative to the one peak) of retentiom time (RT) differences of peaks to the one peak screened from, see details and note. Units as given in column 3 of peaklist argument, e.g. [min]. |
mztol |
Window (upper and lower difference bound, relative to the one peak) of m/z differences [u] of peaks to the one peak screened from, see details. |
plotit |
Should histogram be plotted? If |
For each one peak in the dataset, plotdiff
screens for other peaks within arguments rttol
and mztol
, saves their m/z difference to the m/z value
of the one peak and, over all one peaks, finally generates a histogram of all these m/z differences. Thus, and depending on the resolution set by argument histbreaks
,
frequent m/z differences can be visualized.
Vector diffs
of m/z differences. Can serve as input to deter.iso
.
Argument rttol
can e.g. be used to only include m/z differences of peaks with a higher RT relative to that of the one peak (as is the case in homologue series).
For example, let one peak have RT=12 min. Using rttol=c(1,4)
, only m/z differences with peaks having a RT in between 13 and 14 min will then be screened for this one peak.
Akin for argument mztol
.
Martin Loos
data(peaklist) diffs<-plotdiff(peaklist, histbreaks = 10000, rttol = c(0, 0), mztol = c(0, 100), plotit = TRUE)
data(peaklist) diffs<-plotdiff(peaklist, histbreaks = 10000, rttol = c(0, 0), mztol = c(0, 100), plotit = TRUE)
Plots the m/z isotope relations among peaks part of an isotope pattern group detected by pattern.search
.
Optionally, adduct relations from adduct.search
can be depicted, too.
plotgroup(pattern, adduct = FALSE, groupID, massrange = 10, allmass = TRUE)
plotgroup(pattern, adduct = FALSE, groupID, massrange = 10, allmass = TRUE)
pattern |
List of type pattern, i.e. value generated by |
adduct |
List of type pattern, i.e. value generated by |
groupID |
Isotope pattern group ID of the isotope pattern group to be plotted. Group ID as generated by |
massrange |
m/z range of other peaks in the HRMS peaks in the data set below and above the smallest and largest m/z of the isotope pattern group to be plotted, respectively. |
allmass |
Prints only the peaks in the isotope pattern ( |
The upper pannel barplot shows all peaks included by massrange. The lower one only those of the isotope pattern group specified by argument groupID
.
Below that, shown by lines, come the isotope relations among peaks. At the bottom, relations of individual peaks in the isotope pattern group to adduct groups
are highlighted, as far as available. Herein, this adduct
refers to the adduct assigned to the isotope pattern group, whereas further adducts
to those of other peaks
relatable via adduct groups. Peak number
refers to the line number of the peak dataframe printed (see value).
Dataframe with peaks, see argument allmass
.
Martin Loos
############################################################# data(peaklist); data(adducts); data(isotopes); # run isotope grouping ###################################### iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); plotgroup(pattern,adduct=FALSE,groupID=3,massrange=10,allmass=FALSE) # run adduct grouping ####################################### adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); plotgroup(pattern,adduct,groupID=3,massrange=10,allmass=FALSE) #############################################################
############################################################# data(peaklist); data(adducts); data(isotopes); # run isotope grouping ###################################### iso<-make.isos(isotopes, use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"), use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2)) pattern<-pattern.search( peaklist, iso, cutint=10000, rttol=c(-0.05,0.05), mztol=2, mzfrac=0.1, ppm=TRUE, inttol=0.2, rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), deter=FALSE, entry=50 ); plotgroup(pattern,adduct=FALSE,groupID=3,massrange=10,allmass=FALSE) # run adduct grouping ####################################### adduct<-adduct.search( peaklist, adducts, rttol=0.05, mztol=3, ppm=TRUE, use_adducts=c("M+K","M+H","M+Na","M+NH4"), ion_mode="positive" ); plotgroup(pattern,adduct,groupID=3,massrange=10,allmass=FALSE) #############################################################
Given results from homol.search
, a scatterplot of peaks within m/z and RT is generated with homologue series marked.
Herein, homologue series receive a color code based on the mean m/z differences between adjacent peaks of a series; these differences
are rounded up to the second digit.
plothomol(homol, xlim = FALSE, ylim = FALSE,plotlegend=TRUE,plotdefect=FALSE)
plothomol(homol, xlim = FALSE, ylim = FALSE,plotlegend=TRUE,plotdefect=FALSE)
homol |
List of type homol produed by |
xlim |
|
ylim |
|
plotlegend |
Should a listing of m/z differences within homologue series and the concommittant color codes been added to the plot? If not, set to FALSE. |
plotdefect |
Plot the mass defect instead of the m/z value. |
Martin Loos
data(peaklist); data(isotopes) homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=2, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=3, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) plothomol(homol,xlim=FALSE,ylim=FALSE,plotlegend=FALSE,plotdefect=FALSE);
data(peaklist); data(isotopes) homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=2, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=3, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) plothomol(homol,xlim=FALSE,ylim=FALSE,plotlegend=FALSE,plotdefect=FALSE);
Given results from homol.search
, an interactive plot of homologue series (HS) is generated.
The homologue series are color coded in groups of common m/z increments. Hovering over the HS reveals further information of the peaks.
plothomolplotly(homol)
plothomolplotly(homol)
homol |
List of type homol produed by |
Martin Loos; Diana Masch
data(peaklist); data(isotopes) homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=2, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=3, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) plothomolplotly(homol);
data(peaklist); data(isotopes) homol<-homol.search( peaklist, isotopes, elements=c("C","H","O"), use_C=TRUE, minmz=5, maxmz=120, minrt=2, maxrt=2, ppm=TRUE, mztol=3.5, rttol=0.5, minlength=3, mzfilter=FALSE, vec_size=3E6, spar=.45, R2=.98, plotit=FALSE ) plothomolplotly(homol);
Plots and prints counts of m/z isotope differences detected either by pattern.search
or by combine
.
plotisotopes(input)
plotisotopes(input)
input |
Either list of type |
The function allows to track the number of m/z isotope differences (a) over individual pairs of peaks and (b) aggregated over isotope pattern groups (argument pattern
)
or (c) aggregated over components and (d) aggregated over components within small mass tolerance (argument comp
).
The small mass tolerance is defined by the massfrac
and mztol
arguments of pattern.search
and adduct.search
.
Dataframe listing counts
Martin Loos
Brute force method to remove satellite peak from a FT-HRMS peak list.
rm.sat(peaklist, dmz = 0.3, drt = 0.3, intrat = 0.01, spar = 0.8, corcut = 0.8, plotit = TRUE)
rm.sat(peaklist, dmz = 0.3, drt = 0.3, intrat = 0.01, spar = 0.8, corcut = 0.8, plotit = TRUE)
peaklist |
Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as |
dmz |
m/z window around a parent peak within which satellite peaks are searched for. |
drt |
Retention time window around a parent peak within which satellite peaks are searched for. |
intrat |
Intensity ratio between satellite peak/associated parent peak below which the former are removed. |
spar |
|
corcut |
Correlation coefficient above which symmetrical peaks are marked as satellite peaks. To disable, set to |
plotit |
Plot results? |
"Parent" peak refers to a peak having associated satellite peaks as artifacts from FT calculations.
rm.sat
screens, along decreasing intensity, peaks for having other peaks within ranges set by arguments dmz
, drt
and intrat
.
If present, the latter are marked as satellite peaks and are subsequently excluded from further screening within rm.sat
.
In addition, arguments spar
and corcut
evaluate the symmetry of satellite peaks around the parent peak (i.e. below and above the parent peak m/z), if
enough peaks around a parent peak within ranges set by arguments dmz
, drt
and intrat
are found (here: at least 8 peaks, 4 above and 4 below the parent peak m/z).
Two splines are fitted by R function smooth.spline
, one to those peaks above and one to those peaks below the parent peak m/z.
If the splines are symmetric (i.e. correlated with each other, see argument corcut), the associated peaks are termed satellites. This approach has not yet faced
validation and is highly dependent on the peak-picking algorithm.
A dataframe with four columns. The first three columns are identical to those of argument peaklist
.
The fourth columns marks potential satellite peaks with FALSE
, the other peaks with TRUE
(see example).
Not removing satellite peaks may lead to undesirable artifacts when screening for isotope pattern and adduct relations using pattern.search
and adduct.search
, respectively. For example, consider a satellite peaks having a slightly larger m/z than its monoisotopic parent peaks.
Then, a m/z difference from a 13C isotope between monoisotopic parent and M+1 peak often leads to a 15N isotope difference between satellite and M+1 peak. This artifact
causes bogus isotope pattern groups (with the satellite peak assigned the monoisotopic peak in this example), group overlaps (see pattern.search
)
and interfering peaks in components (see combine
).
Still, given the brute approach of rm.sat
, there is no guarantee that all peaks removed are indeed satellite peaks. As an alternative, one may
filter for peaks with an overly short eluation time/scan number or use data from MS devices that are less prone to produce satellite peaks.
Martin Loos
data(peaklist); peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3];
data(peaklist); peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE); peaklist<-peaklist[peaklist[,4],1:3];