DLOmix

Published

October 31, 2025

Dataset Description

This is a direct subset of the ProteomeTools dataset with iRTs computed based on PROCAL. The total data contains ~27,200 peptides and is mainly useful for teaching purposes. It is provided in the following splits:

- Training: 27,160 peptides
- Validation: 6,800 peptides
- Testing: 6,000 peptides
- Train/val: 27,200 peptides

Attributes

title: DLOmix deep learning in proteomics Python framework for retention time
dataset tag: retentiontime/DLOmix_RT
data publication: ProteomeTools
machine learning publication: Prosit
data source identifier: PXD004732
data type: retention time
format: CSV
columns: peptide sequence, iRT (calibrated retention time)
instrument: Orbitrap Fusion ETD
organism: Homo sapiens (human)
variable modification: unmodified
chromatography separation:
peak measurement:
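
Since the data is distributed as CSV with a peptide sequence and an iRT value per row, a minimal loading sketch may help. It uses only the Python standard library. The column names `sequence` and `irt` are assumptions rather than confirmed header names, and the two rows are made up for illustration, so check the header of the downloaded file before relying on it.

```python
import csv
import io

# Hypothetical column names; check the header of the downloaded file.
SEQUENCE_COLUMN = "sequence"
IRT_COLUMN = "irt"


def read_rt_csv(handle):
    """Return (sequences, irts) parsed from an open CSV file handle."""
    reader = csv.DictReader(handle)
    sequences, irts = [], []
    for row in reader:
        sequences.append(row[SEQUENCE_COLUMN])
        irts.append(float(row[IRT_COLUMN]))
    return sequences, irts


# Made-up rows for illustration only; these are not values from the dataset.
example = io.StringIO("sequence,irt\nPEPTIDEK,12.5\nSAMPLER,-3.0\n")
sequences, irts = read_rt_csv(example)
print(len(sequences), min(irts), max(irts))
```

For a real split, the same function can be called on a file opened with `open(path, newline="")`, where `path` points to the downloaded CSV.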

Sample Protocol

Tryptic peptides were individually synthesized by solid phase synthesis, combined into pools of ~1,000 peptides and measured on an Orbitrap Fusion mass spectrometer. For each peptide pool, an inclusion list was generated to target peptides for fragmentation in further LC-MS experiments using five fragmentation methods (HCD, CID, ETD, EThCD, ETciD) with ion trap or Orbitrap readout. HCD spectra were recorded at 6 different collision energies.

Data Analysis Protocol

The ProteomeTools project aims to derive molecular and digital tools from the human proteome to facilitate biomedical and life science research. Here, we describe the generation and multimodal LC-MS/MS analysis of >350,000 synthetic tryptic peptides representing nearly all canonical human gene products. This resource will be extended to 1.4 million peptides within two years and all data will be made available to the public in ProteomicsDB.

Comments

Internal DLOmix tutorial
DLOmix GitHub