MetaboExploreR

Harrison_Szemray

2025-09-18

Introduction

MetaboExploreR is an R package that provides a streamlined workflow for the processing and quality control of targeted mass spectrometry data. It is designed to take raw vendor files and produce concentration values ready for statistical analysis. The package is built to be cross-platform compatible through the use of Docker.

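The main workflow of the package is centred around three core functions: msConvertR, which converts vendor-specific raw files into the open-standard mzML format; PeakForgeR, which performs peak picking and integration against an MRM transition list; and qcCheckR, which performs quality control and batch correction.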

This vignette will guide you through a complete example workflow using the sample data provided with the package.

Installation

Before installing MetaboExploreR, you need to have Docker Desktop installed on your system. You can download it from the official Docker website: https://www.docker.com/get-started/
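If you are unsure whether Docker is visible from your R session, a quick check along the following lines may help. This is a minimal sketch using base R only; docker_available() is a hypothetical helper written for this vignette, not a MetaboExploreR function, and it assumes that a zero exit status from `docker info` means the Docker daemon is running.

```r
# Hypothetical helper (not part of MetaboExploreR): checks whether the
# Docker CLI is on the PATH and whether the Docker daemon responds.
docker_available <- function() {
  docker_path <- unname(Sys.which("docker"))
  if (!nzchar(docker_path)) {
    return(FALSE)  # no docker executable found on the PATH
  }
  # Assumption: `docker info` exits with a non-zero status when the
  # daemon is not running
  status <- tryCatch(
    suppressWarnings(
      system2(docker_path, "info", stdout = FALSE, stderr = FALSE)
    ),
    error = function(e) 1L
  )
  identical(as.integer(status), 0L)
}

if (docker_available()) {
  message("Docker is installed and running.")
} else {
  message("Docker was not found or is not running; start Docker Desktop first.")
}
```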

Once Docker is installed and running, you can install MetaboExploreR from GitHub using the following commands in R:

source("https://raw.githubusercontent.com/Hszemray/MetaboExploreR/master/R/install.R")
install_MetaboExploreR()

After installation, load the package into your R session:

library(MetaboExploreR)

Workflow

The MetaboExploreR workflow consists of four main steps. We will use the example data included in the package to demonstrate the workflow.

1. Project Setup

First, you need to set up a project directory with a specific structure. The raw data files should be placed in a subdirectory named raw_data.

For this vignette, we will use the example files provided with the package. We will create a temporary directory for our project and copy the necessary files into it.

# Create a temporary directory for the project
project_dir <- tempdir()

# Create the raw_data subdirectory
raw_data_dir <- file.path(project_dir, "raw_data")
dir.create(raw_data_dir, recursive = TRUE)

# For the purpose of this vignette, we assume that the raw files are in the raw_data_dir.
# In a real analysis, you would place your vendor-specific raw files (e.g., .wiff) here.
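
Before converting anything, it can be worth confirming that the raw files are where you expect them. The sketch below uses base R only and a throwaway directory; demo_dir and the empty placeholder file are assumptions made for illustration, and the .wiff extension is simply the example mentioned above.

```r
# Illustrative check of the project layout using base R only.
# `demo_dir` and the placeholder file are assumptions for this sketch.
demo_dir <- file.path(tempdir(), "layout_demo")
demo_raw <- file.path(demo_dir, "raw_data")
dir.create(demo_raw, recursive = TRUE, showWarnings = FALSE)

# Create an empty placeholder that stands in for a vendor file
file.create(file.path(demo_raw, "Sample_1.wiff"))

# List vendor files found in raw_data (case-insensitive match on .wiff)
raw_files <- list.files(demo_raw, pattern = "\\.wiff$", ignore.case = TRUE)
if (length(raw_files) == 0) {
  warning("No .wiff files found in ", demo_raw)
}
print(raw_files)
```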

2. msConvertR()

The msConvertR function is used to convert vendor-specific raw mass spectrometry files into the open-standard mzML format. This step is crucial for ensuring that the data can be processed by the downstream tools. Input and output locations can be different.

Since we don’t have access to vendor raw files in this example, we will skip this step. In a real-world scenario, you would run the following command:

msConvertR(input_directory = project_dir, output_directory = project_dir)

This would create mzML files in the appropriate directory structure.

3. PeakForgeR()

The PeakForgeR function is the core of the peak picking and integration workflow. It takes the mzML files and an MRM transition list as input and produces a report with the integrated peak areas.

The MRM transition list is a tab-separated file that contains information about the molecules to be quantified. An example file, LGW_lipid_mrm_template_v1.tsv, is included in the package. Let’s inspect its contents.

mrm_template_path <- system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR")
mrm_template <- read.delim(mrm_template_path)
head(mrm_template)
#>   Molecule.List.Name Precursor.Name Precursor.Mz Precursor.Charge Product.Mz
#> 1                 CE       CE(14:0)        614.6                1      369.4
#> 2                 CE       CE(16:0)        642.6                1      369.4
#> 3                 CE       CE(16:1)        640.6                1      369.4
#> 4                 CE       CE(18:0)        670.6                1      369.4
#> 5                 CE       CE(18:1)        668.6                1      369.4
#> 6                 CE       CE(18:2)        666.6                1      369.4
#>   Product.Charge Explicit.Retention.Time Explicit.Retention.Time.Window
#> 1              1                  11.600                            0.5
#> 2              1                  12.295                            0.5
#> 3              1                  11.615                            0.5
#> 4              1                  12.845                            0.5
#> 5              1                  12.310                            0.5
#> 6              1                  11.685                            0.5
#>                        Note control_chart
#> 1 SIL_CE(16:0)_d7_Lipidyzer         FALSE
#> 2 SIL_CE(16:0)_d7_Lipidyzer          TRUE
#> 3 SIL_CE(16:1)_d7_Lipidyzer          TRUE
#> 4 SIL_CE(18:1)_d7_Lipidyzer         FALSE
#> 5 SIL_CE(18:1)_d7_Lipidyzer          TRUE
#> 6 SIL_CE(18:2)_d7_Lipidyzer          TRUE
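
If you are building your own transition list, a quick structural check against the example can catch missing columns early. The sketch below is only an illustration: it treats the column names printed above as the expected set, which is an assumption about what PeakForgeR requires, and it uses a toy data frame (toy_template) built from the first two rows of the example rather than the real file.

```r
# Column names as they appear in the read.delim() output shown above
expected_cols <- c(
  "Molecule.List.Name", "Precursor.Name", "Precursor.Mz", "Precursor.Charge",
  "Product.Mz", "Product.Charge", "Explicit.Retention.Time",
  "Explicit.Retention.Time.Window", "Note", "control_chart"
)

# Toy template built from the first two rows printed above
toy_template <- data.frame(
  Molecule.List.Name = c("CE", "CE"),
  Precursor.Name = c("CE(14:0)", "CE(16:0)"),
  Precursor.Mz = c(614.6, 642.6),
  Precursor.Charge = c(1, 1),
  Product.Mz = c(369.4, 369.4),
  Product.Charge = c(1, 1),
  Explicit.Retention.Time = c(11.600, 12.295),
  Explicit.Retention.Time.Window = c(0.5, 0.5),
  Note = c("SIL_CE(16:0)_d7_Lipidyzer", "SIL_CE(16:0)_d7_Lipidyzer"),
  control_chart = c(FALSE, TRUE)
)

# Report any expected column that is absent from the template
missing_cols <- setdiff(expected_cols, names(toy_template))
if (length(missing_cols) > 0) {
  stop("Template is missing columns: ", paste(missing_cols, collapse = ", "))
}

# Precursors whose control_chart value is TRUE
toy_template$Precursor.Name[toy_template$control_chart]
```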

For this example, we will use the pre-computed PeakForgeR report that is included in the package. In a real analysis, you would run the PeakForgeR function as follows:

# Path to the project directory
project_directory <- "path/to/your/project"

# List of MRM template files
mrm_template_list <- list(system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR"))

PeakForgeR(
  user_name = "User",
  project_directory = project_directory,
  mrm_template_list = mrm_template_list,
  QC_sample_label = "QC", 
  plateID_outputs = NULL
)

The output of PeakForgeR is a CSV file containing the integrated peak information for each sample and molecule. Let’s look at the example report provided in the package.

peakforger_report_path <- system.file("extdata", "Example_PeakForgeR_report.csv", package = "MetaboExploreR")
peakforger_report <- read.csv(peakforger_report_path, check.names = FALSE)
head(peakforger_report)
#>     FileName MoleculeListName MoleculeName PrecursorMz ProductMz RetentionTime
#> 1 PLASMA_LTR               CE     CE(14:0)       614.6     369.4      11.65722
#> 2   Sample_1               CE     CE(14:0)       614.6     369.4      11.61070
#> 3 PLASMA_LTR               CE     CE(16:0)       642.6     369.4      12.33465
#> 4   Sample_1               CE     CE(16:0)       642.6     369.4      12.33795
#> 5 PLASMA_LTR               CE     CE(16:1)       640.6     369.4      11.63252
#> 6   Sample_1               CE     CE(16:1)       640.6     369.4      11.66010
#>   StartTime  EndTime       Area     Height        AcquiredTime
#> 1  11.48432 11.75602   382663.0   71444.38 03/14/2021 05:33:51
#> 2  11.48720 11.75890   132454.2   23166.53 03/13/2021 16:21:44
#> 3  12.07065 12.56565  3879869.0  539971.88 03/14/2021 05:33:51
#> 4  12.07395 12.56895  4172605.2 1158824.12 03/13/2021 16:21:44
#> 5  11.43492 11.97832 15003148.0 4006770.00 03/14/2021 05:33:51
#> 6  11.43780 11.98120  9977041.0 2016591.00 03/13/2021 16:21:44
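
The report columns lend themselves to simple sanity checks before quality control. As a minimal sketch, the code below rebuilds the first four rows printed above as a toy data frame (toy_report), computes the width of each integration window, and checks that the reported retention time falls inside it. The derived columns PeakWidth and ApexInWindow are names introduced here for illustration and are not produced by PeakForgeR.

```r
# Toy report built from the first four rows printed above
toy_report <- data.frame(
  FileName = c("PLASMA_LTR", "Sample_1", "PLASMA_LTR", "Sample_1"),
  MoleculeName = c("CE(14:0)", "CE(14:0)", "CE(16:0)", "CE(16:0)"),
  RetentionTime = c(11.65722, 11.61070, 12.33465, 12.33795),
  StartTime = c(11.48432, 11.48720, 12.07065, 12.07395),
  EndTime = c(11.75602, 11.75890, 12.56565, 12.56895),
  Area = c(382663.0, 132454.2, 3879869.0, 4172605.2)
)

# Integration window width (same time units as the report)
toy_report$PeakWidth <- toy_report$EndTime - toy_report$StartTime

# Sanity check: the retention time should lie inside the integration window
toy_report$ApexInWindow <- toy_report$RetentionTime >= toy_report$StartTime &
  toy_report$RetentionTime <= toy_report$EndTime

toy_report[, c("FileName", "MoleculeName", "PeakWidth", "ApexInWindow")]
```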

4. qcCheckR()

The final step in the workflow is to perform quality control and batch correction using the qcCheckR function. This function takes the output from PeakForgeR and generates various plots and reports to assess the quality of the data.

# In a real analysis, you would use the project directory where the PeakForgeR output is located.
# qcCheckR can handle tsv and csv data inputs
# See documentation ??MetaboExploreR::qcCheckR for further information.

library(MetaboExploreR)
library(readr) # provides read_tsv() and read_csv()

# Load example MRM template
mrm_template_path <- system.file("extdata",
                                 "LGW_lipid_mrm_template_v1.tsv",
                                 package = "MetaboExploreR")

mrm_template_example <- read_tsv(mrm_template_path)

# Load example conc_guide
conc_guide_path <- system.file("extdata",
                               "LGW_SIL_batch_Ultimate_2023_03_06.tsv",
                               package = "MetaboExploreR")

conc_guide_example <- read_tsv(conc_guide_path)

# Load example report file
report_path <- system.file("extdata",
                           "Example_PeakForgeR_report.csv",
                           package = "MetaboExploreR")

report_file <- read_csv(report_path)

# Run qcCheckR function
qcCheckR(user_name = "user1",
         project_directory = "path/to/project_directory",
         mrm_template_list = list(v1 = list(
                                    SIL_guide = "path to/mrm_guide1.tsv",
                                    conc_guide = "path to/SIL_concentration_guide1.tsv")
                                  ),
         QC_sample_label = "qc",
         sample_tags = c("sample", "control", "qc"),
         mv_threshold = 0.5) # default is 0.5, i.e. 50% missing values

The qcCheckR function generates an HTML report with interactive plots, such as PCA plots and control charts, as well as an Excel file with the final concentration data (contains a guide for navigation).
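
To make the mv_threshold argument more concrete, here is a minimal sketch of a missing-value filter at a threshold of 0.5. It illustrates the general idea only and is not the qcCheckR implementation: whether the package applies the threshold per feature or per sample, and how it treats values exactly at the threshold, are assumptions made here. The toy data frame toy_conc is invented for the example.

```r
# Toy concentration table: rows are samples, columns are features
toy_conc <- data.frame(
  feature_a = c(1.2, 1.1, NA, 1.3),   # 25% missing
  feature_b = c(NA, NA, NA, 0.8),     # 75% missing
  feature_c = c(2.0, NA, NA, 2.1)     # 50% missing
)

mv_threshold <- 0.5

# Proportion of missing values per feature
missing_fraction <- colMeans(is.na(toy_conc))

# Assumed rule for this sketch: drop features whose missing fraction
# exceeds the threshold (features exactly at the threshold are kept)
kept_features <- names(missing_fraction)[missing_fraction <= mv_threshold]
toy_filtered <- toy_conc[, kept_features, drop = FALSE]

print(missing_fraction)
print(kept_features)
```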

Transition List and conc guide development

Although these checks are already built into the workflow, we thought it would be helpful for users to be able to check their templates before running the core functions, to save time. There are two assistive functions:

  1. transition_checkR checks Q1 and Q3 transitions to ensure that all transitions are unique. Try it with the example below:
mrm_template_path <- system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR")
mrm_template_df <- read.delim(mrm_template_path, check.names = FALSE)
head(mrm_template_df)

# Now let's run the function and see the output
transition_checkR(mrm_template_df)
  2. compare_mrm_template_with_guide checks that every internal standard in the Note column of the transition list has a match in the SIL_name column of the concentration guide. Try it with the example below:
mrm_template_path <- system.file("extdata", "LGW_lipid_mrm_template_v1.tsv", package = "MetaboExploreR")
mrm_template_df <- read.delim(mrm_template_path, check.names = FALSE)
head(mrm_template_df)

conc_guide_path <- system.file("extdata", "LGW_SIL_batch_103.tsv", package = "MetaboExploreR")
conc_guide_df <- read.delim(conc_guide_path, check.names = FALSE)
head(conc_guide_df)

# Now let's run the function and see the output
compare_mrm_template_with_guide(mrm_template_df, conc_guide_df)
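
For readers who want to see what these two checks amount to, the sketch below reproduces the idea in base R on toy data. It is not the code behind transition_checkR or compare_mrm_template_with_guide. The data frames toy_transitions and toy_guide are invented for the example, and their column names follow the read.delim() output shown earlier in this vignette (the real files may use different headers when read with check.names = FALSE).

```r
# Toy transition list; the last row deliberately duplicates a Q1/Q3 pair
toy_transitions <- data.frame(
  Precursor.Name = c("CE(14:0)", "CE(16:0)", "CE(16:0)_dup"),
  Precursor.Mz = c(614.6, 642.6, 642.6),
  Product.Mz = c(369.4, 369.4, 369.4),
  Note = c("SIL_CE(16:0)_d7_Lipidyzer", "SIL_CE(16:0)_d7_Lipidyzer",
           "SIL_CE(18:1)_d7_Lipidyzer")
)

# Toy concentration guide; only the SIL_name column is taken from the text
toy_guide <- data.frame(SIL_name = "SIL_CE(16:0)_d7_Lipidyzer")

# 1. Flag every row that shares its Q1/Q3 pair with another row
pair_key <- paste(toy_transitions$Precursor.Mz, toy_transitions$Product.Mz)
non_unique <- toy_transitions[pair_key %in% pair_key[duplicated(pair_key)], ]

# 2. Internal standards named in Note that have no match in SIL_name
unmatched_sil <- setdiff(unique(toy_transitions$Note), toy_guide$SIL_name)

print(non_unique$Precursor.Name)
print(unmatched_sil)
```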

Multi Method Project Setup

Previously, we demonstrated how to set up MetaboExploreR for a single method. However, the package also supports multi-method analysis, provided that a consistent long-term reference material (LTR) has been used across all methods. To enable this functionality, users must supply the appropriate transition lists and concentration guides for each method.

We know this may sound obvious, but it’s crucial to emphasise: The same long-term reference material must be used across all plates. If not, the results will be invalid due to inconsistencies in signal correction.

Once PeakForgeR has completed its processing, qcCheckR will automatically gather all reports from the project directory. Each plate's concentration data is processed using its respective transition list and concentration guide. During signal drift and batch correction, qcCheckR identifies plates run on different method versions and aligns target features across them. Signal drift and batch correction are applied first to the long-term reference materials within each plate, and then across all plates collectively.
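
To give a feel for what aligning target features across method versions can involve, the sketch below keeps only the features that every version measures. This is an assumed rule chosen for illustration; the vignette does not specify how qcCheckR performs the alignment, and the feature sets in features_by_method are invented.

```r
# Toy feature sets for two method versions (names are illustrative only)
features_by_method <- list(
  v1 = c("CE(14:0)", "CE(16:0)", "CE(16:1)"),
  v2 = c("CE(16:0)", "CE(16:1)", "CE(18:0)")
)

# Features measured by every method version
shared_features <- Reduce(intersect, features_by_method)

# Features each version would lose under this assumed rule
dropped_features <- lapply(features_by_method, setdiff, y = shared_features)

print(shared_features)
print(dropped_features)
```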

# Path to the project directory
project_directory <- "path/to/your/project"

# Create the raw_data subdirectory
raw_data_dir <- file.path(project_directory, "raw_data")
dir.create(raw_data_dir, recursive = TRUE)

# In a real analysis, you would place your vendor-specific raw files (e.g., .wiff and .wiff.scan) here.

# Convert vendor files to mzML
msConvertR(input_directory = project_directory, output_directory = project_directory)

# Provide transition list paths to PeakForgeR
# It tests each transition list against a plate in turn until a match is found
PeakForgeR(
  user_name = "User",
  project_directory = project_directory,
  mrm_template_list = list(v1 = "path to/mrm_guide1.tsv",
                           v2 = "path to/mrm_guide2.tsv"
                          ),
  QC_sample_label = "LTR"
)

# Provide transition lists and their respective concentration guide paths to qcCheckR
qcCheckR(user_name = "user",
         project_directory = "path/to/project_directory",
         mrm_template_list = list(v1 = list(
                                    SIL_guide = "path to/mrm_guide1.tsv",
                                    conc_guide = "path to/SIL_concentration_guide1.tsv"),
                                  v2 = list(
                                    SIL_guide = "path to/mrm_guide2.tsv",
                                    conc_guide = "path to/SIL_concentration_guide2.tsv")
                                 ),
         QC_sample_label = "qc",
         sample_tags = c("sample", "control", "qc"),
         mv_threshold = 0.5) # default is 0.5, i.e. 50% missing values
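
With several methods, a mistyped path in the nested list is easy to miss. As a minimal sketch, the helper below flattens a list of the shape shown above and reports any path that does not exist on disk. find_missing_guides() is a hypothetical helper written for this vignette, not a MetaboExploreR function, and the demo files are throwaway placeholders.

```r
# Hypothetical helper (not part of MetaboExploreR): returns the paths in a
# nested template list that do not exist on disk.
find_missing_guides <- function(template_list) {
  all_paths <- unlist(template_list)   # names become e.g. "v1.SIL_guide"
  all_paths[!file.exists(all_paths)]
}

# Demo with throwaway files; real projects would use their own paths
demo_dir <- file.path(tempdir(), "guide_demo")
dir.create(demo_dir, showWarnings = FALSE)
existing_guide <- file.path(demo_dir, "mrm_guide1.tsv")
file.create(existing_guide)

demo_list <- list(
  v1 = list(
    SIL_guide = existing_guide,
    # This file is deliberately not created
    conc_guide = file.path(demo_dir, "SIL_concentration_guide1.tsv")
  )
)

missing_guides <- find_missing_guides(demo_list)
print(names(missing_guides))
```

Running a check like this before calling qcCheckR means a missing file surfaces immediately rather than part-way through processing.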

Session Info

sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
#> [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
#> [5] LC_TIME=English_Australia.utf8    
#> 
#> time zone: Australia/Perth
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.53        
#>  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
#>  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.9.0       yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0