Bottom-up Proteomics: An Overview
MS-based proteomics falls into two categories:
bottom-up proteomics and
top-down proteomics.
In top-down proteomics, intact proteins are analyzed directly by high-resolution mass spectrometry,
and the highly charged protein ions are fragmented to produce an MS/MS spectrum.
In bottom-up proteomics, also called shotgun or peptide-centric proteomics, proteins are digested
by chemical cleavage or proteolysis prior to MS analysis.
The goal of bottom-up proteomics is often to identify and/or quantify all proteins in a complex biological matrix,
together with their complete sequences, including post-translational modifications.
Because each protein in a complex mixture may be any one of more than a hundred thousand encoded proteins,
not counting possible post-translational modifications,
identifying the proteins in the mixture is challenging.
Bottom-up proteomics approaches this by LC-MS/MS analysis of the peptide mixture
produced by site-specific cleavage or proteolysis of the original proteins.
This solves two problems:
- Solubility: peptides are generally more soluble than proteins
- Sensitivity: LC-MS can detect peptides at much lower levels than their parent proteins
The resulting MS/MS spectra can be matched against a database of
simulated MS/MS spectra of peptides generated by
in silico digestion of protein sequences.
Identifying proteins this way is called
automated sequence database searching.
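To make in silico digestion and simulated spectra concrete, here is a minimal Python sketch. It assumes trypsin-like cleavage (after K or R, not before P), no missed cleavages, unmodified residues, and singly charged b- and y-ions only; the function names and the toy sequence are hypothetical. Real search engines also handle missed cleavages, modifications, multiple charge states, and scoring.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # charge carrier for singly protonated ions


def tryptic_digest(protein):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_spectrum(peptide):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(peptide))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(peptide))]
    return b_ions, y_ions


# Toy sequence (not a real protein): digests to AKPLR, GK, DE
for pep in tryptic_digest("AKPLRGKDE"):
    b, y = theoretical_spectrum(pep)
    print(pep, round(peptide_mass(pep), 4), [round(v, 4) for v in y])
```

Matching an observed spectrum then amounts to comparing its peaks with these lists within a mass tolerance and scoring the overlap; how that score is computed differs between search engines.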
Alternatives are isolation and purification of individual proteins followed by residue-by-residue sequencing,
and sequence inference by de novo interpretation of fragment ion spectra or by means of sequence tags.
Stable isotope-labelled peptides can be added for quantitative analysis of specific proteins.
Label-free quantitation can be done with spectral counting and ion current measurement.
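Raw spectral counts favor long proteins, which yield more peptides. One widely used normalization, shown here as an assumed choice rather than the method of any particular tool, is the normalized spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the resulting values are scaled to sum to 1 within a run. The function name and the counts below are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one run.

    spectral_counts: protein ID -> number of MS/MS spectra assigned to it
    lengths: protein ID -> sequence length in residues
    """
    # Spectral abundance factor: counts per residue of protein length
    saf = {prot: count / lengths[prot] for prot, count in spectral_counts.items()}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra assigned to any protein")
    # Scale so that the values of all proteins in the run sum to 1
    return {prot: value / total for prot, value in saf.items()}


# Hypothetical counts: equal spectral counts, but protein B is 4x longer
print(nsaf({"A": 10, "B": 10}, {"A": 100, "B": 400}))
```

With equal counts, the shorter protein A receives about four times the NSAF of B (0.8 vs 0.2), reflecting its higher count per residue.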
Standard protocol for protein identification
- Digestion of proteins into peptides
- Peptide separation with liquid chromatography
- MS/MS spectra acquisition
- Data-dependent acquisition (DDA) or data-independent acquisition (DIA)
- Off-line LC or 1D/2D gel separation may be needed for complex samples
- Hundreds of thousands of MS/MS spectra may be acquired for one sample:
some arise from noise, and low-abundance proteins may not yield any spectra
- Protein identification through matching with a database or de novo sequencing
- Database search
- Mascot, PEAKS, SEQUEST, X! Tandem, OMSSA, Phenyx, etc.
- De novo sequencing: construct ladders of fragment ions, for example
y-ions (or b-ions) in a CID MS/MS spectrum;
the amino acid sequence of the peptide is determined from the mass differences between adjacent peaks in a ladder
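The ladder-reading step of de novo sequencing can be sketched in Python under simplifying assumptions: the input is a clean, gap-free, singly charged y-ion (or b-ion) series starting at y1 (or b1), with no noise peaks, and only unmodified standard residues are considered. Leucine and isoleucine have identical masses, so they are reported as `I/L`. The function name and the 0.02 Da tolerance are hypothetical; practical de novo tools must also cope with missing and spurious peaks.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def read_ladder(ion_mzs, ion_type="y", tol=0.02):
    """Infer residues from mass differences between adjacent ladder peaks.

    Returns residue calls in N- to C-terminal order; unmatched gaps are "?".
    """
    # Virtual peak for the "empty" fragment, so the first difference gives a residue
    offset = WATER + PROTON if ion_type == "y" else PROTON
    peaks = [offset] + sorted(ion_mzs)
    calls = []
    for lower, upper in zip(peaks, peaks[1:]):
        diff = upper - lower
        hits = sorted({aa for aa, mass in RESIDUE_MASS.items() if abs(mass - diff) <= tol})
        calls.append("/".join(hits) if hits else "?")
    # y-ions grow from the C-terminus, so reverse their calls to read N -> C
    return calls[::-1] if ion_type == "y" else calls


# y1..y6 of the example peptide PEPTIDE; the N-terminal P would need y7
seq = "PEPTIDE"
y_series = [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]
print(read_ladder(y_series))  # -> ['E', 'P', 'T', 'I/L', 'D', 'E']
```

The example also shows why complementary b- and y-ladders are useful: each series alone leaves one terminal residue unread.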
Assumptions in bottom-up analysis
- Proteolysis by a specific protease reproducibly results in a small number of peptides
- Unique identification of a protein precursor is possible with a small subset of peptides
- This relationship holds whether the protein is purified or in a complex protein mixture
- The target protein and all its variants are in the database
- Peptides resulting from protein cleavage are fully recovered and detected
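The assumption that a protein can be identified from a small subset of its peptides holds only for peptides unique to that protein. A minimal sketch of that check maps every peptide to the proteins that could have produced it, assuming trypsin-like cleavage without missed cleavages and a toy two-protein database (sequences and IDs are hypothetical). It also treats I and L as distinct, although they cannot be told apart by mass.

```python
import re
from collections import defaultdict


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def unique_peptides(proteins):
    """Return protein ID -> peptides that map to that protein only."""
    owners = defaultdict(set)  # peptide -> IDs of proteins containing it
    for prot_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            owners[peptide].add(prot_id)
    unique = defaultdict(list)
    for peptide, ids in owners.items():
        if len(ids) == 1:  # shared peptides cannot distinguish proteins
            unique[next(iter(ids))].append(peptide)
    return dict(unique)


# Hypothetical database: "MSTK" is shared, so it cannot tell P1 from P2
toy_db = {"P1": "MSTKLLDRGGAK", "P2": "MSTKWWER"}
print(unique_peptides(toy_db))
# -> {'P1': ['LLDR', 'GGAK'], 'P2': ['WWER']}
```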
Practical applications
- Protein identification of a biological sample
- Of limited use, because comprehensive, consistent, and reproducible results are not possible
- Single-analyte assay: measure differences in protein levels between two or more sample populations
- Method validation
- Accuracy, precision (repeatability, intermediate precision, and reproducibility)
- Specificity
- LOD, LOQ, linearity and range
- Ruggedness and robustness
- Label-free quantification: fast, cost-effective, and simple
- Samples are analyzed in separate experiments under the same conditions
- Digest the protein mixture with a protease, e.g. trypsin
- Spectral counting or ion-current measurement (chromatographic peak intensity)
- Ion-current measurement: difficult in practice because
multiple confounding factors compromise precision:
sample preparation, injection volume, retention time, coeluting species,
temperature and pressure fluctuations, etc.
- Identify and map characteristic peaks of the same peptide across different data sets (peptide features)
- Retention time correction due to LC variations
- Spectra/sequence alignment algorithms (Smith-Waterman)
- Compute protein ratios from averaged peptide ratios
- Intensity splitting when a peptide feature is shared by multiple proteins
- Outlier removal: discard errors from incorrect peptide identifications or
overlapping peptide features
- Stable isotope labels: improved precision at the expense of higher cost and complexity
- Isotope-Coded Affinity Tags (ICAT)
- Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)
- Isobaric Tag for Relative and Absolute Quantitation (iTRAQ)
- Relative intensities between the characteristic peaks are used to compute the quantity ratio
- PTM characterization: report post-translational modifications (PTMs) in the target protein
- Most common PTMs: phosphorylation, glycosylation, methylation, acetylation, and acylation
- Modification sites are not defined by the genome, so they are often not annotated in protein databases
- Searching for all possible PTMs increases computational complexity exponentially
- Modified proteins may be low in abundance and not detectable by MS/MS
- Peptides with PTMs can produce spectra that are too complex to interpret
- Phosphorylation
- Glycosylation
- Glycans have a branched (tree) structure instead of a linear structure like peptides
- Glycans can have variable structures and masses depending on the modification site
- Cleave glycans with enzymes, then identify the structures of the released glycans?
- The existence and extent of PTMs in a given sample are unknown
- PTM quantification: what percentage of a protein is modified by a given variable PTM
- Sequencing of non-standard peptides with non-linear sequence: peptides with disulfide bonds or non-ribosomal peptides
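The label-free step of computing protein ratios from averaged peptide ratios, with outlier removal, might look like the following sketch. The outlier rule, a median absolute deviation (MAD) cutoff on log2 ratios, is an assumed choice for illustration rather than one taken from these notes, and the function name and cutoff are hypothetical. Shared peptides, whose intensity would need to be split between proteins, are assumed to have been excluded beforehand.

```python
import math
import statistics


def protein_ratio(peptide_ratios, mad_cutoff=3.0):
    """Average peptide ratios (sample A / sample B) into one protein ratio.

    Ratios are averaged in log2 space so that 2.0 and 0.5 are symmetric.
    Peptides farther than mad_cutoff * MAD from the median are discarded;
    if MAD is 0, only peptides at the median are kept.
    """
    logs = [math.log2(r) for r in peptide_ratios if r > 0]
    if not logs:
        raise ValueError("need at least one positive peptide ratio")
    center = statistics.median(logs)
    mad = statistics.median([abs(x - center) for x in logs])
    kept = [x for x in logs if abs(x - center) <= mad_cutoff * mad]
    return 2 ** statistics.mean(kept)


# Three consistent peptides and one outlier (e.g. a misidentified peptide)
print(protein_ratio([2.0, 2.0, 2.0, 16.0]))  # -> 2.0
```

Using the median as the reference point keeps a single bad peptide from dragging the center toward itself, which a plain mean-and-standard-deviation filter would not guarantee.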
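Why searching for all possible PTMs blows up can be seen by counting the modified forms of a single peptide. This sketch assumes each residue carries at most one modification; the modification table (phosphorylation on S/T/Y, oxidation on M) and the peptide are hypothetical examples. Each modifiable site multiplies the count, so n sites with one option each give 2^n forms; the hypothetical `max_mods` parameter models the per-peptide cap that search settings commonly impose.

```python
def count_modified_forms(peptide, variable_mods, max_mods=None):
    """Count distinct modification patterns of one peptide.

    variable_mods: residue -> list of variable modifications allowed on it.
    Each residue is either unmodified or carries exactly one allowed mod.
    """
    # counts[k] = number of patterns with exactly k modified residues
    counts = [1]
    for aa in peptide:
        options = len(variable_mods.get(aa, ()))
        extended = counts + [0]  # leaving this residue unmodified keeps k
        for k, n in enumerate(counts):
            extended[k + 1] += n * options  # modifying it raises k by one
        counts = extended
    if max_mods is not None:
        counts = counts[: max_mods + 1]
    return sum(counts)


mods = {"S": ["Phospho"], "T": ["Phospho"], "Y": ["Phospho"], "M": ["Oxidation"]}
print(count_modified_forms("SSTYMK", mods))              # -> 32
print(count_modified_forms("SSTYMK", mods, max_mods=2))  # -> 16
```

Every one of these forms has its own precursor mass and fragment ladder that must be scored, on top of the peptides from all other proteins.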