NIST Peptide libraries


February 18, 2024


Dataset Description

The original dataset is 646 MB (zipped). The MSP library was parsed into a tabular format, retaining only the peak intensities of singly charged b- and y-ions, and then randomly split into a test subset (3.4 MB, 27 036 spectra) and a train/validation subset (30 MB, 243 404 spectra). The files with encoded peptides were prepared for machine learning as described in the fragmentation tutorial "NIST (part 2): Traditional ML: Gradient boosting".
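The parsing and splitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset. It assumes that MSP entries are separated by blank lines, that each peak line reads `m/z intensity "annotation"`, and that singly charged b- and y-ions without neutral losses are annotated like `b2/...` or `y1/...`. The function names, the example entries, and the 10% test fraction (inferred only approximately from the spectrum counts) are hypothetical.

```python
import random
import re

# Assumed annotation pattern: ion type (b or y), ordinal, then "/" directly.
# Neutral losses ("b2-H2O/...") and higher charges ("y1^2/...") do not match.
SINGLY_CHARGED_BY = re.compile(r"^([by])(\d+)/")


def parse_msp(text):
    """Parse MSP text into rows, keeping only singly charged b/y intensities."""
    rows = []
    # Assumption: entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        row = {"name": None, "ions": {}}
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("Name:"):
                row["name"] = line.split(":", 1)[1].strip()
            elif line and line[0].isdigit():
                # Peak line: m/z, intensity, quoted annotation.
                parts = line.split(None, 2)
                if len(parts) < 3:
                    continue
                annotation = parts[2].strip('"')
                match = SINGLY_CHARGED_BY.match(annotation)
                if match:
                    ion = match.group(1) + match.group(2)
                    row["ions"][ion] = float(parts[1])
        if row["name"] is not None:
            rows.append(row)
    return rows


def split_rows(rows, test_fraction=0.1, seed=0):
    """Randomly split rows into (train_validation, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Two made-up entries for illustration; the values are not real library data.
EXAMPLE = (
    'Name: AAK/1_0\n'
    'Num peaks: 4\n'
    '129.1\t500.0\t"b2/1.0ppm"\n'
    '147.1\t1000.0\t"y1/0.5ppm"\n'
    '111.1\t50.0\t"b2-H2O/2.0ppm"\n'
    '74.0\t20.0\t"y1^2/1.5ppm"\n'
    '\n'
    'Name: GGR/1_0\n'
    'Num peaks: 1\n'
    '175.1\t800.0\t"y1/0.2ppm"\n'
)

rows = parse_msp(EXAMPLE)
train_val, test = split_rows(rows * 5, test_fraction=0.1, seed=42)
```

In this sketch each row holds a peptide name and a dictionary of ion labels to intensities, which could then be written out as one table row per spectrum. Fixing the seed keeps the split reproducible.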


  • title: NIST
  • dataset tag: fragmentation/nist
  • data publication: Sheetlin et al. 2020
  • machine learning publication:
  • data source identifier:
  • data type: fragmentation intensity
  • format: MSP
  • columns:
  • instrument:
  • organism: Homo sapiens (human)
  • fixed modifications: Carbamidomethylation of C
  • variable modifications: unmodified and Oxidation of M
  • dissociation method: HCD (beam-type CID)
  • collision energy: various
  • mass analyzer type: Orbitrap
  • spectra encoding:

Sample Protocol

See for more information.

Data Analysis Protocol

The consensus spectral libraries were generated by the US National Institute of Standards and Technology (NIST).