February 18, 2024


Dataset Description

This is a direct subset of the ProteomeTools dataset, with iRTs computed based on PROCAL. The total data contains ~27,200 peptides and is mainly useful for teaching purposes. It is provided in the following splits:

  • Training: 27,160 peptides
  • Validation: 6,800 peptides
  • Testing: 6,000 peptides
  • Train/val: 27,200 peptides
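To illustrate what an iRT value is, the sketch below fits a straight line that maps observed retention times of calibration peptides to their reference iRT values, then applies that line to another measurement. This is only an assumed, simplified form of iRT calibration; the description above does not state the exact procedure used for this dataset. The function names and all numbers are hypothetical.

```python
def fit_linear_calibration(observed_rt, reference_irt):
    """Ordinary least-squares fit of reference_irt = slope * observed_rt + intercept."""
    n = len(observed_rt)
    mean_x = sum(observed_rt) / n
    mean_y = sum(reference_irt) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed_rt, reference_irt))
    sxx = sum((x - mean_x) ** 2 for x in observed_rt)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_irt(rt, slope, intercept):
    """Convert an observed retention time to the iRT scale."""
    return slope * rt + intercept


# Hypothetical calibration peptides: observed RT (minutes) and reference iRT.
# These numbers are made up for illustration and are NOT PROCAL values.
calibrant_rt = [10.0, 20.0, 30.0, 40.0]
calibrant_irt = [-20.0, 20.0, 60.0, 100.0]

slope, intercept = fit_linear_calibration(calibrant_rt, calibrant_irt)
irt_value = to_irt(25.0, slope, intercept)
print(irt_value)
```

Because the calibration is fitted per run, iRT values from different runs become comparable, which is what makes them a convenient target for retention time models.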


  • title: DLOmix deep learning in proteomics Python framework for retention time
  • dataset tag: retentiontime/DLOmix_RT
  • data publication: ProteomeTools
  • machine learning publication: Prosit
  • data source identifier: PXD004732
  • data type: retention time
  • format: CSV
  • columns: peptide sequence, iRT calibrated retention time
  • instrument: Orbitrap Fusion ETD
  • organism: Homo sapiens (human)
  • variable modification: unmodified
  • chromatography separation:
  • peak measurement:
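
As a minimal sketch of reading a file with this layout, the snippet below parses CSV text into (sequence, iRT) pairs using Python's standard csv module. The column names `sequence` and `irt`, the function name `load_rt_rows`, and the two sample rows are assumptions made for illustration; they are not taken from the dataset, so check the actual file header before use.

```python
import csv
import io


def load_rt_rows(handle, seq_col="sequence", rt_col="irt"):
    """Return a list of (peptide sequence, iRT) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    rows = []
    for record in reader:
        # Convert the retention time field from text to a float.
        rows.append((record[seq_col], float(record[rt_col])))
    return rows


# Made-up example content, NOT real dataset values.
sample_csv = "sequence,irt\nPEPTIDEK,12.5\nSAMPLER,-3.0\n"

rows = load_rt_rows(io.StringIO(sample_csv))
print(rows)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory `io.StringIO` handle.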

Sample Protocol

Tryptic peptides were individually synthesized by solid-phase synthesis, combined into pools of ~1,000 peptides, and measured on an Orbitrap Fusion mass spectrometer. For each peptide pool, an inclusion list was generated to target peptides for fragmentation in further LC-MS experiments using five fragmentation methods (HCD, CID, ETD, EThCD, ETciD) with ion trap or Orbitrap readout. HCD spectra were recorded at six different collision energies.

Data Analysis Protocol

The ProteomeTools project aims to derive molecular and digital tools from the human proteome to facilitate biomedical and life science research. Here, we describe the generation and multimodal LC-MS/MS analysis of >350,000 synthetic tryptic peptides representing nearly all canonical human gene products. This resource will be extended to 1.4 million peptides within two years and all data will be made available to the public in ProteomicsDB.


  • Internal DLOmix tutorial
  • DLOmix GitHub