February 18, 2024


Dataset Descriptions

The full dataset contains 1,000,000 unmodified peptides and 200,000 oxidized peptides, all with MaxQuant scores > 100 (as described in the Prosit paper), split into five groups:
- Small: Contains 100,000 unmodified peptides (good for teaching)
- Medium: Contains 250,000 unmodified peptides (good for validating)
- Large: Contains 1,000,000 unmodified peptides (good for training)
- Oxidized: Contains 200,000 peptides, all oxidized.
- Mixed: Contains 200,000 oxidized and 150,000 unmodified peptides.


  • title: ProteomeTools synthetic peptides and iRT calibrated retention times
  • dataset tag: ProteomeTools_RT
  • data publication: ProteomeTools
  • machine learning publication: Prosit
  • data source identifier: PXD004732
  • data type: retention time
  • format: CSV
  • columns: raw file, sequence, retention time, modified sequence, modifications
  • instrument: Orbitrap Fusion ETD
  • organism: Homo sapiens (human)
  • variable modification: unmodified & oxidation
  • chromatography separation:
  • peak measurement:
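
As a minimal sketch of how a file with the columns listed above might be read, the following Python example parses a small in-memory CSV and separates unmodified from oxidized peptides. The header spellings, the `Unmodified` and `Oxidation (M)` labels, and the example rows are assumptions for illustration, not values taken from the dataset; `load_rows` and `split_by_oxidation` are hypothetical helper names.

```python
import csv
import io

# Column names follow the "columns" entry above; the exact header
# spelling in the real files is an assumption.
COLUMNS = ["raw file", "sequence", "retention time",
           "modified sequence", "modifications"]

# Made-up rows for illustration only; these are NOT records from the dataset.
# The modification labels and modified-sequence notation are assumed.
EXAMPLE_CSV = """raw file,sequence,retention time,modified sequence,modifications
example_pool_1,PEPTIDEK,12.5,_PEPTIDEK_,Unmodified
example_pool_1,AMPLEK,20.0,_AM(ox)PLEK_,Oxidation (M)
"""


def load_rows(handle):
    """Read all rows and convert the retention time column to float."""
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["retention time"] = float(row["retention time"])
        rows.append(row)
    return rows


def split_by_oxidation(rows):
    """Return (unmodified, oxidized) lists based on the modifications column."""
    unmodified, oxidized = [], []
    for row in rows:
        if "oxidation" in row["modifications"].lower():
            oxidized.append(row)
        else:
            unmodified.append(row)
    return unmodified, oxidized


rows = load_rows(io.StringIO(EXAMPLE_CSV))
unmodified, oxidized = split_by_oxidation(rows)
print(len(unmodified), len(oxidized))
```

To read an actual file, `io.StringIO(EXAMPLE_CSV)` would be replaced by a handle from `open(path, newline="")`, with the column names and modification labels adjusted to match the real header.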

Sample Protocol

Tryptic peptides were individually synthesized by solid-phase synthesis, combined into pools of ~1,000 peptides, and measured on an Orbitrap Fusion mass spectrometer. For each peptide pool, an inclusion list was generated to target peptides for fragmentation in further LC-MS experiments using five fragmentation methods (HCD, CID, ETD, EThCD, ETciD) with ion trap or Orbitrap readout. HCD spectra were recorded at six different collision energies.

Data Analysis Protocol

The ProteomeTools project aims to derive molecular and digital tools from the human proteome to facilitate biomedical and life science research. Here, we describe the generation and multimodal LC-MS/MS analysis of >350,000 synthetic tryptic peptides representing nearly all canonical human gene products. This resource will be extended to 1.4 million peptides within two years and all data will be made available to the public in ProteomicsDB.