MetaboExploreR is an R package that provides a streamlined workflow for
the processing and quality control of targeted mass spectrometry data.
It is designed to take raw vendor files and produce concentration values
ready for statistical analysis. The package is built to be
cross-platform compatible through the use of Docker.
The main workflow of the package is centred around three core functions:
- msConvertR(): Converts raw mass spectrometry vendor files into the open-standard mzML format.
- PeakForgeR(): Performs retention time correction, peak picking, and integration.
- qcCheckR(): Handles quality control and batch correction, and generates summary reports.
This vignette will guide you through a complete example workflow using the sample data provided with the package.
Before installing MetaboExploreR, you need to have Docker Desktop
installed on your system. You can download it from the official Docker
website: https://www.docker.com/get-started/
Once Docker is installed and running, you can install MetaboExploreR
from GitHub using the following commands in R:
source("https://raw.githubusercontent.com/Hszemray/MetaboExploreR/master/R/install.R")
install_MetaboExploreR()
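Because the package relies on Docker, it can help to confirm from within R that Docker is reachable. The following is a minimal sketch using base R only; `docker_available()` is a hypothetical helper written for this illustration and is not part of MetaboExploreR.

```r
# Hypothetical helper, not part of MetaboExploreR: returns TRUE only if the
# Docker command-line client is on the PATH and `docker info` succeeds.
docker_available <- function(timeout = 30) {
  docker_path <- unname(Sys.which("docker"))
  if (!nzchar(docker_path)) {
    return(FALSE)  # no Docker client found on the PATH
  }
  status <- tryCatch(
    suppressWarnings(
      system2(docker_path, "info",
              stdout = FALSE, stderr = FALSE, timeout = timeout)
    ),
    error = function(e) 1L  # treat any failure to launch as "not available"
  )
  identical(as.integer(status), 0L)
}

if (!docker_available()) {
  message("Docker does not appear to be running; start Docker Desktop first.")
}
```

`docker info` exits with a non-zero status when the client cannot reach the Docker daemon, which is why the exit status is used as the test here.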
After installation, load the package into your R session:
library(MetaboExploreR)
The MetaboExploreR workflow consists of four main steps. We will use the
example data included in the package to demonstrate the workflow.
First, you need to set up a project directory with a specific
structure. The raw data files should be placed in a subdirectory named
raw_data.
For this vignette, we will use the example files provided with the package. We will create a temporary directory for our project and copy the necessary files into it.
# Create a temporary directory for the project
project_dir <- tempdir()
# Create the raw_data subdirectory
raw_data_dir <- file.path(project_dir, "raw_data")
dir.create(raw_data_dir, recursive = TRUE)
# For the purpose of this vignette, we assume that the raw files are in the raw_data_dir.
# In a real analysis, you would place your vendor-specific raw files (e.g., .wiff) here.
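Before converting, it can be useful to check which vendor files are actually in `raw_data`. The sketch below uses base R only and builds its own throwaway project folder; the file names are made-up placeholders (empty files), and the `.wiff` / `.wiff.scan` pattern is an assumption based on the file types mentioned in this vignette.

```r
# Throwaway project folder for this illustration only
demo_project <- file.path(tempdir(), "demo_project")
demo_raw_dir <- file.path(demo_project, "raw_data")
dir.create(demo_raw_dir, recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files standing in for real vendor files (hypothetical names)
file.create(file.path(
  demo_raw_dir,
  c("Plate_1.wiff", "Plate_1.wiff.scan", "notes.txt")
))

# Keep only files ending in .wiff or .wiff.scan
raw_files <- list.files(demo_raw_dir, pattern = "\\.wiff(\\.scan)?$")
raw_files

if (length(raw_files) == 0) {
  warning("No vendor files found in ", demo_raw_dir)
}
```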
msConvertR()
The msConvertR function is used to convert vendor-specific raw mass
spectrometry files into the open-standard mzML format. This step is
crucial for ensuring that the data can be processed by the downstream
tools. Input and output locations can be different.
Since we don’t have access to vendor raw files in this example, we will skip this step. In a real-world scenario, you would run the following command:
msConvertR(input_directory = project_dir, output_directory = project_dir)
This would create mzML files in the appropriate directory structure.
PeakForgeR()
The PeakForgeR function is the core of the peak picking and integration
workflow. It takes the mzML files and an MRM transition list as input
and produces a report with the integrated peak areas.
The MRM transition list is a tab-separated file that contains
information about the molecules to be quantified. An example file,
LGW_lipid_mrm_template_v1.tsv, is included in the package. Let’s inspect
its contents.
mrm_template_path <- system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR")
mrm_template <- read.delim(mrm_template_path)
head(mrm_template)
#> Molecule.List.Name Precursor.Name Precursor.Mz Precursor.Charge Product.Mz
#> 1 CE CE(14:0) 614.6 1 369.4
#> 2 CE CE(16:0) 642.6 1 369.4
#> 3 CE CE(16:1) 640.6 1 369.4
#> 4 CE CE(18:0) 670.6 1 369.4
#> 5 CE CE(18:1) 668.6 1 369.4
#> 6 CE CE(18:2) 666.6 1 369.4
#> Product.Charge Explicit.Retention.Time Explicit.Retention.Time.Window
#> 1 1 11.600 0.5
#> 2 1 12.295 0.5
#> 3 1 11.615 0.5
#> 4 1 12.845 0.5
#> 5 1 12.310 0.5
#> 6 1 11.685 0.5
#> Note control_chart
#> 1 SIL_CE(16:0)_d7_Lipidyzer FALSE
#> 2 SIL_CE(16:0)_d7_Lipidyzer TRUE
#> 3 SIL_CE(16:1)_d7_Lipidyzer TRUE
#> 4 SIL_CE(18:1)_d7_Lipidyzer FALSE
#> 5 SIL_CE(18:1)_d7_Lipidyzer TRUE
#> 6 SIL_CE(18:2)_d7_Lipidyzer TRUE
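To get a quick overview of a transition list, the targets can be grouped by the internal standard named in the Note column. The sketch below rebuilds the six rows printed above (a subset of columns) so that it runs without the package. Reading `control_chart` as "include this target in control charts" is our interpretation of the column name, not something the template itself documents.

```r
# Six rows copied from the head() output above (subset of columns)
mrm_demo <- data.frame(
  Precursor.Name = c("CE(14:0)", "CE(16:0)", "CE(16:1)",
                     "CE(18:0)", "CE(18:1)", "CE(18:2)"),
  Note = c("SIL_CE(16:0)_d7_Lipidyzer", "SIL_CE(16:0)_d7_Lipidyzer",
           "SIL_CE(16:1)_d7_Lipidyzer", "SIL_CE(18:1)_d7_Lipidyzer",
           "SIL_CE(18:1)_d7_Lipidyzer", "SIL_CE(18:2)_d7_Lipidyzer"),
  control_chart = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Targets grouped by the internal standard listed in the Note column
targets_by_standard <- split(mrm_demo$Precursor.Name, mrm_demo$Note)
targets_by_standard

# Targets flagged TRUE in the control_chart column
control_chart_targets <- mrm_demo$Precursor.Name[mrm_demo$control_chart]
control_chart_targets
```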
For this example, we will use the pre-computed PeakForgeR report that is
included in the package. In a real analysis, you would run the
PeakForgeR function as follows:
# Path to the project directory
project_directory <- "path/to/your/project"
# List of MRM template files
mrm_template_list <- list(system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR"))
PeakForgeR(
user_name = "User",
project_directory = project_directory,
mrm_template_list = mrm_template_list,
QC_sample_label = "QC",
plateID_outputs = NULL
)
The output of PeakForgeR is a CSV file containing the integrated peak
information for each sample and molecule. Let’s look at the example
report provided in the package.
peakforger_report_path <- system.file("extdata", "Example_PeakForgeR_report.csv", package = "MetaboExploreR")
peakforger_report <- read.csv(peakforger_report_path, check.names = FALSE)
head(peakforger_report)
#> FileName MoleculeListName MoleculeName PrecursorMz ProductMz RetentionTime
#> 1 PLASMA_LTR CE CE(14:0) 614.6 369.4 11.65722
#> 2 Sample_1 CE CE(14:0) 614.6 369.4 11.61070
#> 3 PLASMA_LTR CE CE(16:0) 642.6 369.4 12.33465
#> 4 Sample_1 CE CE(16:0) 642.6 369.4 12.33795
#> 5 PLASMA_LTR CE CE(16:1) 640.6 369.4 11.63252
#> 6 Sample_1 CE CE(16:1) 640.6 369.4 11.66010
#> StartTime EndTime Area Height AcquiredTime
#> 1 11.48432 11.75602 382663.0 71444.38 03/14/2021 05:33:51
#> 2 11.48720 11.75890 132454.2 23166.53 03/13/2021 16:21:44
#> 3 12.07065 12.56565 3879869.0 539971.88 03/14/2021 05:33:51
#> 4 12.07395 12.56895 4172605.2 1158824.12 03/13/2021 16:21:44
#> 5 11.43492 11.97832 15003148.0 4006770.00 03/14/2021 05:33:51
#> 6 11.43780 11.98120 9977041.0 2016591.00 03/13/2021 16:21:44
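The report is in long format, with one row per file and molecule. For a quick look, it can be pivoted into a sample-by-molecule table of peak areas. The following is a base R sketch on the six rows printed above; it is only an exploratory view and not a step that the package requires.

```r
# Six rows copied from the head() output above (subset of columns)
report_demo <- data.frame(
  FileName = rep(c("PLASMA_LTR", "Sample_1"), times = 3),
  MoleculeName = rep(c("CE(14:0)", "CE(16:0)", "CE(16:1)"), each = 2),
  StartTime = c(11.48432, 11.48720, 12.07065, 12.07395, 11.43492, 11.43780),
  EndTime = c(11.75602, 11.75890, 12.56565, 12.56895, 11.97832, 11.98120),
  Area = c(382663.0, 132454.2, 3879869.0, 4172605.2, 15003148.0, 9977041.0)
)

# Integration window width for each peak
report_demo$PeakWidth <- report_demo$EndTime - report_demo$StartTime

# Pivot to a matrix with one row per file and one column per molecule.
# sum() is used as the aggregator; with one row per file/molecule pair it
# simply returns that row's area.
area_wide <- with(report_demo,
                  tapply(Area, list(FileName, MoleculeName), sum))
area_wide
```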
qcCheckR()
The final step in the workflow is to perform quality control and batch
correction using the qcCheckR function. This function takes the output
from PeakForgeR and generates various plots and reports to assess the
quality of the data.
# In a real analysis, you would use the project directory where the PeakForgeR output is located.
# qcCheckR can handle tsv and csv data inputs.
# See ?MetaboExploreR::qcCheckR for further information.
library(MetaboExploreR)
library(readr) # provides read_tsv() and read_csv()
# Load the example MRM template
file_path <- system.file("extdata",
                         "LGW_lipid_mrm_template_v1.tsv",
                         package = "MetaboExploreR")
mrm_template_example <- read_tsv(file_path)
# Load the example concentration guide
file_path <- system.file("extdata",
                         "LGW_SIL_batch_Ultimate_2023_03_06.tsv",
                         package = "MetaboExploreR")
conc_guide_example <- read_tsv(file_path)
# Load the example report file
file_path <- system.file("extdata",
                         "Example_PeakForgeR_report.csv",
                         package = "MetaboExploreR")
report_file <- read_csv(file_path)
# Run the qcCheckR function
qcCheckR(user_name = "user1",
         project_directory = "path/to/project_directory",
         mrm_template_list = list(v1 = list(
           SIL_guide = "path to/mrm_guide1.tsv",
           conc_guide = "path to/SIL_concentration_guide1.tsv")
         ),
         QC_sample_label = "qc",
         sample_tags = c("sample", "control", "qc"),
         mv_threshold = 0.5) # default is 0.5, i.e. 50% missing values
The qcCheckR function generates an HTML report with interactive plots,
such as PCA plots and control charts, as well as an Excel file with the
final concentration data (contains a guide for navigation).
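To make the mv_threshold argument concrete, here is a generic sketch of a missing-value filter on made-up data: features with more than 50% missing values are dropped. This illustrates the general idea only. How qcCheckR applies the threshold internally, including whether a feature at exactly 50% is kept, is an assumption here.

```r
# Made-up concentration table: rows are samples, columns are features
conc <- data.frame(
  lipid_a = c(1.2, NA, 1.1, 1.3),  # 25% missing
  lipid_b = c(NA, NA, NA, 0.8),    # 75% missing
  lipid_c = c(2.0, 2.1, NA, NA)    # 50% missing
)

mv_threshold <- 0.5

# Fraction of missing values per feature
missing_fraction <- colMeans(is.na(conc))
missing_fraction

# Keep features whose missing fraction does not exceed the threshold
# (keeping the boundary case is an assumption of this sketch)
keep <- missing_fraction <= mv_threshold
conc_filtered <- conc[, keep, drop = FALSE]
names(conc_filtered)
```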
These checks are already built into the workflow, but running them on your templates before the core functions can save time. There are two assistive functions:
transition_checkR
Checks Q1 and Q3 transitions to ensure all transitions are unique. Use
the example below to try it out:
mrm_template_path <- system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR")
mrm_template_df <- read.delim(mrm_template_path, check.names = FALSE)
head(mrm_template_df)
# Now let's run the function and see the output
transition_checkR(mrm_template_df)
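For intuition, a uniqueness check on Q1/Q3 pairs can be sketched in base R as below. This is not the implementation of transition_checkR, which may apply further criteria such as retention time. The column names follow the read.delim() output shown earlier, and the last row is a made-up duplicate added for the demonstration.

```r
# Demo transitions; the fourth row is a made-up duplicate of the second
transitions <- data.frame(
  Precursor.Name = c("CE(14:0)", "CE(16:0)", "CE(16:1)", "duplicate_demo"),
  Precursor.Mz = c(614.6, 642.6, 640.6, 642.6),
  Product.Mz = c(369.4, 369.4, 369.4, 369.4)
)

# Build one key per Q1/Q3 pair and flag every row whose key occurs more than once
pair_key <- paste(transitions$Precursor.Mz, transitions$Product.Mz, sep = "_")
is_duplicated <- duplicated(pair_key) | duplicated(pair_key, fromLast = TRUE)

# Rows that share a Q1/Q3 pair with another row
transitions[is_duplicated, ]
```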
compare_mrm_template_with_guide
Checks whether every internal standard in the Note column of the
transition list has a match in the SIL_name column of the concentration
guide. Use the example below to try it out:
mrm_template_path <- system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR")
mrm_template_df <- read.delim(mrm_template_path, check.names = FALSE)
head(mrm_template_df)
conc_guide_path <- system.file("extdata","LGW_SIL_batch_103.tsv", package = "MetaboExploreR")
conc_guide_df <- read.delim(conc_guide_path, check.names = FALSE)
head(conc_guide_df)
# Now let's run the function and see the output
compare_mrm_template_with_guide(mrm_template_df, conc_guide_df)
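The comparison described above amounts to a set difference between the Note values and the SIL_name values. The following base R sketch shows that idea on demo data; it is not the package's implementation. The Note values are taken from the template shown earlier, while the guide rows are made up and deliberately omit one standard.

```r
# Note values as they appear in the example template
template_demo <- data.frame(
  Note = c("SIL_CE(16:0)_d7_Lipidyzer", "SIL_CE(16:1)_d7_Lipidyzer",
           "SIL_CE(18:1)_d7_Lipidyzer", "SIL_CE(18:2)_d7_Lipidyzer")
)

# Made-up concentration guide that lacks SIL_CE(18:2)_d7_Lipidyzer
guide_demo <- data.frame(
  SIL_name = c("SIL_CE(16:0)_d7_Lipidyzer", "SIL_CE(16:1)_d7_Lipidyzer",
               "SIL_CE(18:1)_d7_Lipidyzer")
)

# Internal standards referenced by the template but absent from the guide
missing_standards <- setdiff(unique(template_demo$Note), guide_demo$SIL_name)
missing_standards
```

An empty result would mean every internal standard in the template has a matching entry in the guide.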
Previously, we demonstrated how to set up MetaboExploreR for a single method. However, the package also supports multi-method analysis, provided that a consistent long-term reference material (LTR) has been used across all methods. To enable this functionality, users must supply the appropriate transition lists and concentration guides for each method.
We know this may sound obvious, but it’s crucial to emphasise: The same long-term reference material must be used across all plates. If not, the results will be invalid due to inconsistencies in signal correction.
Once PeakForgeR has completed its processing, qcCheckR will automatically gather all reports from the project directory. Each plate’s concentration data is processed using its respective transition list and concentration guide. During signal drift and batch correction, qcCheckR identifies plates run on different method versions and aligns target features across them. Signal drift and batch correction are first applied to the long-term reference materials within each plate, and then across all plates collectively.
# Path to the project directory
project_directory <- "path/to/your/project"
# Create the raw_data subdirectory
raw_data_dir <- file.path(project_directory, "raw_data")
dir.create(raw_data_dir, recursive = TRUE)
# In a real analysis, you would place your vendor-specific raw files (e.g., .wiff and .wiff.scan) here.
# Convert vendor files to mzML
msConvertR(input_directory = project_directory, output_directory = project_directory)
# Provide transition list paths to PeakForgeR
# It will test each transition list on a plate in turn until a match is found
PeakForgeR(
  user_name = "User",
  project_directory = project_directory,
  mrm_template_list = list(v1 = "path to/mrm_guide1.tsv",
                           v2 = "path to/mrm_guide2.tsv"),
  QC_sample_label = "LTR"
)
# Provide transition lists and their respective concentration guide paths to qcCheckR
qcCheckR(user_name = "user",
         project_directory = project_directory,
         mrm_template_list = list(
           v1 = list(
             SIL_guide = "path to/mrm_guide1.tsv",
             conc_guide = "path to/SIL_concentration_guide1.tsv"),
           v2 = list(
             SIL_guide = "path to/mrm_guide2.tsv",
             conc_guide = "path to/SIL_concentration_guide2.tsv")
         ),
         QC_sample_label = "qc",
         sample_tags = c("sample", "control", "qc"),
         mv_threshold = 0.5) # default is 0.5, i.e. 50% missing values
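To illustrate why a shared reference material matters, the sketch below aligns two made-up plates on their common features and then rescales each plate so that its LTR median matches the pooled LTR median. This is a deliberately simple, generic between-plate scaling and is not the correction algorithm used by qcCheckR. All sample names, feature names, and values are hypothetical.

```r
# Made-up concentrations for two plates run with different method versions
plate1 <- data.frame(sample = c("LTR_1", "LTR_2", "S1"),
                     type = c("LTR", "LTR", "sample"),
                     lipid_a = c(10, 12, 20),
                     lipid_b = c(5, 5, 8))
plate2 <- data.frame(sample = c("LTR_3", "LTR_4", "S2"),
                     type = c("LTR", "LTR", "sample"),
                     lipid_a = c(20, 24, 30),
                     lipid_c = c(1, 1, 2))

# 1. Align features: keep only the columns present on both plates
shared_cols <- intersect(names(plate1), names(plate2))
features <- setdiff(shared_cols, c("sample", "type"))
plates <- list(p1 = plate1[, shared_cols], p2 = plate2[, shared_cols])

# 2. Pooled LTR median per feature across all plates
all_ltr <- do.call(rbind, lapply(plates, function(p) p[p$type == "LTR", ]))
target <- sapply(all_ltr[features], median)

# 3. Rescale every plate so its own LTR median equals the pooled median
corrected <- lapply(plates, function(p) {
  for (f in features) {
    plate_ltr_median <- median(p[[f]][p$type == "LTR"])
    p[[f]] <- p[[f]] * target[[f]] / plate_ltr_median
  }
  p
})
corrected
```

If the plates had used different reference materials, the LTR medians would not be comparable and this kind of scaling would distort the sample values rather than correct them.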
sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
#> [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_Australia.utf8
#>
#> time zone: Australia/Perth
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.53
#> [5] cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29
#> [9] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 evaluate_1.0.5
#> [17] bslib_0.9.0 yaml_2.3.10 rlang_1.1.6 jsonlite_2.0.0