Bottom-up Proteomics: An Overview
MS-based proteomics falls into two categories: bottom-up proteomics and top-down proteomics.
In top-down proteomics, intact proteins are analyzed directly using high-resolution mass spectrometry,
and the highly charged protein is fragmented to produce an MS/MS spectrum.
In bottom-up proteomics, also called shotgun or peptide-specific proteomics, proteins are digested
via chemical cleavage or proteolysis prior to MS analysis.
The goal of bottom-up proteomics is often to identify and/or quantify all proteins in a complex biological matrix,
along with their complete sequences, including post-translational modifications.
Identifying the proteins in such a mixture is challenging: each one could be any of more than a hundred thousand
encoded proteins, not counting possible post-translational modifications.
Bottom-up proteomics approaches this problem through LC-MS/MS analysis of the peptide mixture
produced by site-specific cleavage or proteolysis of the original proteins.
This solves two problems:
- Solubility: Peptides are generally more soluble than proteins
- Sensitivity: LC-MS can detect peptides at much lower levels than the parent proteins
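The site-specific cleavage step can be illustrated with an in silico tryptic digest. The sketch below assumes the textbook trypsin rule (cleave C-terminal to K or R unless the next residue is P) and unmodified residues. `tryptic_digest` and `peptide_mass` are hypothetical helper names, and real digests also contain non-specific and incomplete cleavages that the `missed_cleavages` parameter only roughly approximates.

```python
import re

# Monoisotopic residue masses in Da (unmodified residues; Cys without alkylation).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Cleave C-terminal to K or R unless the next residue is P."""
    # Index just after each cleavable K/R
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + sites
    if bounds[-1] != len(protein):
        bounds.append(len(protein))  # C-terminal peptide that does not end in K/R
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavage sites inside the peptide
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in tryptic_digest("AKPGRSTKLM"):
    print(pep, round(peptide_mass(pep), 4))
```

The lookahead `(?!P)` encodes the proline exception, which is why `AKPGR` stays intact in the toy sequence.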
The resulting MS/MS spectra can be matched against a database of
MS/MS spectra simulated for peptides generated by in silico
digestion of proteins.
Identifying proteins this way is called automated sequence database searching.
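A minimal sketch of the matching idea, assuming only singly charged b- and y-ions and a simple shared-peak-count score. Search engines such as Mascot or SEQUEST use probabilistic or cross-correlation scoring instead, so this is an illustration only; `fragment_ions`, `shared_peak_count`, `best_match`, and the 0.02 Da tolerance are hypothetical choices.

```python
# Monoisotopic residue masses in Da (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str) -> list[float]:
    """Theoretical m/z of singly charged b1..b(n-1) and y1..y(n-1) ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(observed: list[float], theoretical: list[float], tol: float = 0.02) -> int:
    """Number of theoretical ions matched by some observed peak within +/- tol Da."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_match(observed: list[float], candidates: list[str], tol: float = 0.02):
    """Score every candidate peptide; return the top hit and all scores."""
    scores = {pep: shared_peak_count(observed, fragment_ions(pep), tol) for pep in candidates}
    return max(scores, key=scores.get), scores


# Toy "observed" spectrum: the theoretical ions of STK plus one noise peak
spectrum = fragment_ions("STK") + [500.0]
top, scores = best_match(spectrum, ["STK", "LM", "AKPGR"])
print(top, scores)
```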
Alternatives are isolation and purification of individual proteins followed by residue-by-residue sequencing,
and sequence inference by de novo interpretation of fragment ion spectra or by means of sequence tags.
Stable isotope-labelled peptides can be added for quantitative analysis of specific proteins.
Label-free quantitation can be done with spectral counting and ion current measurement.
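Raw spectral counts favor long proteins, which yield more peptides and therefore more spectra. One common length-normalized metric is the normalized spectral abundance factor, NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral count and L the protein length. The `nsaf` helper below is a hypothetical sketch of that formula, not part of any particular software package.

```python
def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalized spectral abundance factor: (SpC/L) / sum over proteins of (SpC/L)."""
    # Length-normalize each protein's spectral count (the SAF)
    saf = {prot: spectral_counts[prot] / lengths[prot] for prot in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {prot: 0.0 for prot in saf}
    # Divide by the run total so the values sum to 1 within one run
    return {prot: value / total for prot, value in saf.items()}


counts = {"PROT_A": 10, "PROT_B": 10}     # hypothetical spectral counts
lengths = {"PROT_A": 100, "PROT_B": 200}  # protein lengths in residues
print(nsaf(counts, lengths))
```

With equal counts, the shorter protein receives twice the NSAF of the longer one, reflecting its higher estimated abundance per residue.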
Standard protocol for protein identification
- Digestion of proteins into peptides
- Peptide separation with liquid chromatography
  - Complex samples may require additional off-line separation (LC, or 1D/2D gel electrophoresis)
- MS/MS spectra acquisition
  - Data-dependent acquisition (DDA) or data-independent acquisition (DIA)
  - Hundreds of thousands of MS/MS spectra are obtained for one sample:
    some arise from noise, and low-abundance proteins may not yield any spectra
- Protein identification through database matching or de novo sequencing
  - Database search: Mascot, PEAKS, SEQUEST, X!Tandem, OMSSA, Phenyx, etc.
  - De novo sequencing: constructing ladders of fragment ions, for example,
    y-ions (or b-ions) in a CID MS/MS spectrum.
    The amino acid sequence of the peptide is determined from the mass differences between adjacent peaks in the ladders.
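The ladder-reading step of de novo sequencing can be sketched as follows, assuming a clean, complete, singly charged ion series with no noise peaks or missing fragments (real spectra rarely satisfy this). `read_ladder` and the 0.02 Da tolerance are hypothetical. Leucine and isoleucine have identical masses and are reported as `L/I`; reading a b-ion ladder from low to high m/z gives the sequence N- to C-terminal, while a y-ion ladder gives it C- to N-terminal.

```python
# Monoisotopic residue masses in Da; L and I are isobaric and cannot be told apart.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder_mz: list[float], tol: float = 0.02) -> list[str]:
    """Assign a residue to each gap between adjacent peaks of one ion series."""
    peaks = sorted(ladder_mz)
    residues = []
    for lower, upper in zip(peaks, peaks[1:]):
        delta = upper - lower
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol]
        # "?" marks a gap matching no single residue (e.g., a missing peak)
        residues.append("/".join(hits) if hits else "?")
    return residues


# Hypothetical b-ion ladder (b1..b4) of the peptide SALGE
print(read_ladder([88.0393, 159.0764, 272.1605, 329.1819]))
```

Only the gaps are read here; the first residue would have to be inferred from b1 itself, and a gap equal to two residues (a missing peak) shows up as `?`.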
Assumptions in bottom-up analysis
- Proteolysis by a specific protease reproducibly results in a small number of peptides
- Unique identification of a protein precursor is possible with a small subset of peptides
- This relationship holds whether the protein is purified or in a complex protein mixture
- The target protein and all its variants are in the database
- Peptides resulting from protein cleavage are fully recovered and detected
- Protein identification of a biological sample
Single-analyte assay: differences in protein levels between two or more sample populations
- Not very useful, as comprehensive, consistent, and reproducible results are not possible
PTM characterization: report post-translational modifications (PTMs) in the target protein
- Method validation
  - Accuracy and precision (repeatability, intermediate precision, and reproducibility)
  - Limit of detection (LOD), limit of quantitation (LOQ), linearity, and range
  - Ruggedness and robustness
- Label-free quantification: fast, cost-effective, and simple
  - Samples are analyzed in separate experiments under the same conditions
  - Digest the protein mixture with a protease, e.g., trypsin
  - Spectral counting or ion-current measurement (chromatographic peak intensity)
    - Ion-current measurement is difficult in practice because multiple confounding factors compromise precision:
      sample preparation, injection volume, retention time, coeluting species,
      temperature and pressure fluctuations, etc.
  - Identify and map the characteristic peaks of the same peptide across different runs (peptide features)
    - Retention-time correction for LC variations
    - Spectra/sequence alignment algorithms (Smith-Waterman)
  - Compute protein ratios from averaged peptide ratios
    - Intensity splitting when a peptide feature is shared by multiple proteins
    - Outlier removal: errors from incorrect peptide identification or
      overlapping peptide features
- Stable isotope labels: improved precision at the expense of higher cost and complexity
  - Isotope-Coded Affinity Tags (ICAT)
  - Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)
  - Isobaric Tags for Relative and Absolute Quantitation (iTRAQ)
  - Relative intensities between the characteristic peaks are used to compute the quantity ratio
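The step of computing protein ratios from peptide ratios can be sketched as below. Assumptions: each input is a peptide-level ratio (e.g., light/heavy intensity of an isotope-labeled pair, or sample A/sample B intensity of a label-free feature); ratios are averaged in log2 space, which amounts to a geometric mean; and outliers are removed with a median-absolute-deviation (MAD) filter. The MAD filter, the cutoff of 3, and the `protein_ratio` name are illustrative choices, not a prescribed method.

```python
import math
import statistics


def protein_ratio(peptide_ratios: list[float], mad_cutoff: float = 3.0) -> float:
    """Combine peptide-level ratios into one protein-level ratio.

    Ratios are averaged in log2 space (a geometric mean) after dropping
    peptides whose log2 ratio lies more than mad_cutoff * MAD from the median.
    """
    # Non-positive ratios cannot be log-transformed and are skipped
    logs = [math.log2(r) for r in peptide_ratios if r > 0]
    if not logs:
        raise ValueError("need at least one positive peptide ratio")
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    # When MAD is 0, only peptides exactly at the median are kept
    kept = [x for x in logs if abs(x - med) <= mad_cutoff * mad]
    return 2 ** statistics.mean(kept)


# Hypothetical peptide ratios; the last one mimics a misidentified peptide
print(protein_ratio([1.0, 2.0, 4.0, 2.0 ** 20]))
```

Working in log space keeps a twofold increase and a twofold decrease symmetric, so they cancel when averaged.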
PTM quantification: what percentage of proteins are modified by a certain variable PTM
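As a worked formula, and under the simplifying assumption that the modified and unmodified forms of a peptide ionize equally well (often not true in practice), the percentage of a site that carries the modification can be estimated from the summed intensities I_mod and I_unmod of the two forms of the same peptide:

```latex
\text{occupancy} = \frac{I_{\text{mod}}}{I_{\text{mod}} + I_{\text{unmod}}} \times 100\%
```

For example, I_mod = 2 x 10^6 and I_unmod = 6 x 10^6 give 2 / (2 + 6) = 25% occupancy.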
Sequencing of non-standard peptides with non-linear sequence: peptides with disulfide bonds or non-ribosomal peptides
- Most common PTMs: phosphorylation, glycosylation, methylation, acetylation, and acylation
- Modification sites are not defined by the genome, so they are often not present in protein databases
- Searching for all possible PTMs increases computational complexity exponentially
- Proteins with PTMs may be low in abundance and not detectable by MS/MS
- Peptides with PTMs can produce overly complex spectra
- Glycans have a branched (tree) structure instead of the linear structure of peptides
- Glycans can have variable structures and masses depending on the modification site
- Cleave glycans enzymatically, then identify the structure of the released glycans?
- The existence and extent of PTMs in a given sample are unknown
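The combinatorial growth of the PTM search can be illustrated by counting the modification patterns of a single peptide. The sketch assumes a small hypothetical table of variable modifications (phosphorylation on S/T/Y, acetylation and methylation on K, oxidation on M, with their standard monoisotopic mass shifts) and allows any combination. Search engines typically limit the number of variable modifications per peptide to keep the search tractable; that cap is omitted here.

```python
from itertools import product

# Hypothetical variable-modification table: residue -> monoisotopic mass shifts (Da)
VARIABLE_MODS = {
    "S": [79.96633],            # phosphorylation
    "T": [79.96633],            # phosphorylation
    "Y": [79.96633],            # phosphorylation
    "K": [42.01057, 14.01565],  # acetylation, methylation
    "M": [15.99491],            # oxidation
}


def count_variants(peptide: str) -> int:
    """Number of modification patterns if every site may carry any allowed mod or none."""
    n = 1
    for aa in peptide:
        n *= 1 + len(VARIABLE_MODS.get(aa, []))
    return n


def variant_mass_shifts(peptide: str) -> list[float]:
    """Total mass shift of every modification pattern (one entry per pattern)."""
    options = [[0.0] + VARIABLE_MODS.get(aa, []) for aa in peptide]
    return [sum(combo) for combo in product(*options)]


print(count_variants("STK"))
print(count_variants("S" * 10))
shifts = variant_mass_shifts("STK")
print(len(shifts), len({round(s, 4) for s in shifts}))
```

Ten serines alone give 2^10 patterns, and several patterns share the same total mass shift (phosphorylation on S versus T), so the precursor mass by itself cannot localize the modification.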