Validation

Introduction

The Validation tab gives you a complete overview of PeptideShaker's scoring metrics. Every identified population is processed to provide an accurate and robust confidence estimation.

For every group you can:

More about Validation


Group Selection

The standard groups inspected by PeptideShaker are PSMs, peptides and proteins. However, if statistical significance is ensured, PSMs are separated according to their charge. Similarly, peptides can be separated based on their modification status.

This grouping strategy allows you to increase the sensitivity of the processing without compromising robustness. Note, however, that changes at the PSM level affect the results at the peptide and protein levels. Similarly, changes at the peptide level affect the protein level.

It is thus important to apply the upstream changes first!

For more information about peptide grouping see Vaudel et al: Peptide identification quality control, Proteomics 2011;11(10):2105-14.
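
As an illustration of the charge-based separation described above, here is a minimal sketch. It is not PeptideShaker's own code: the `min_size` cutoff is a hypothetical stand-in for "statistical significance is ensured", and the PSM records, the `charge` key and the group names are assumptions made for the example.

```python
from collections import defaultdict


def group_psms_by_charge(psms, min_size=100):
    """Split PSMs into per-charge groups, pooling charges with too few PSMs.

    psms: list of dicts with a 'charge' key (hypothetical record layout).
    min_size: hypothetical cutoff below which a charge is not scored alone.
    """
    by_charge = defaultdict(list)
    for psm in psms:
        by_charge[psm["charge"]].append(psm)

    groups = {}
    pooled = []
    for charge, members in sorted(by_charge.items()):
        if len(members) >= min_size:
            # Enough PSMs to estimate confidence for this charge on its own.
            groups[f"charge {charge}"] = members
        else:
            # Too few PSMs: score them together with the other small groups.
            pooled.extend(members)
    if pooled:
        groups["other charges"] = pooled
    return groups
```

Pooling the small groups keeps every PSM in a population that is large enough to be scored, which is the trade-off between sensitivity and robustness mentioned above.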





Estimator Optimization

The estimator plots will help you improve the accuracy of confidence estimation by adjusting the bin size used to estimate the PEP.

When the PEP value is confidently estimated, probabilistic estimators provide a smoothed version of the classical estimators. However, sometimes the PEP cannot be accurately estimated, e.g., for small populations. The confidence and probabilistic estimators will then no longer be reliable.

It is advised to keep the PEP and FDR estimator advanced settings at the default values.
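
To see why the size of the estimation window matters, the following sketch estimates a PEP for each hit as the local ratio of decoy to target hits. It is only an illustration under stated assumptions, not PeptideShaker's estimator: it assumes a target-decoy search with equally sized target and decoy databases, lower scores being better, and a hypothetical `window` parameter counting hits on each side.

```python
def estimate_pep(scores, is_decoy, window=5):
    """Estimate a PEP per hit from the local decoy/target ratio.

    scores: list of scores, lower is better (assumption).
    is_decoy: list of booleans, True for decoy hits.
    window: number of neighbouring hits taken on each side (hypothetical).
    Returns the sorted scores and the matching PEP estimates.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flags = [is_decoy[i] for i in order]

    peps = []
    for k in range(len(flags)):
        lo = max(0, k - window)
        hi = min(len(flags), k + window + 1)
        decoys = sum(flags[lo:hi])
        targets = (hi - lo) - decoys
        # Each decoy stands for one expected false target in the window.
        pep = 1.0 if targets == 0 else min(1.0, decoys / targets)
        peps.append(pep)
    return [scores[i] for i in order], peps
```

With a small window, a single decoy changes the local ratio a lot, so the estimate fluctuates. A larger window smooths the curve but needs more hits, which is why small populations are hard to estimate.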





Threshold Optimization

The score threshold used, illustrated by a red vertical line in the confidence plot, can be changed to meet three kinds of requirements:


By default the threshold is set to 1% FDR.
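
As an illustration of what a 1% FDR threshold means, the sketch below finds the largest score at which the classical target-decoy FDR estimate stays within the limit. It is a simplified example, not PeptideShaker's implementation: it assumes that lower scores are better and estimates the FDR as the number of decoys divided by the number of targets.

```python
def threshold_at_fdr(scores, is_decoy, max_fdr=0.01):
    """Return the largest target score whose cumulative FDR is <= max_fdr.

    scores: list of scores, lower is better (assumption).
    is_decoy: list of booleans, True for decoy hits.
    Returns None if no target hit satisfies the limit.
    """
    targets = 0
    decoys = 0
    best = None
    for score, decoy in sorted(zip(scores, is_decoy)):
        if decoy:
            decoys += 1
            continue
        targets += 1
        # Classical estimate: decoys / targets among the hits retained so far.
        if decoys / targets <= max_fdr:
            best = score
    return best
```

The loop keeps the last passing score rather than stopping at the first failure, because the estimated FDR is not monotonic along the ranked list.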





Identification Summary

The identification summary provides essential metrics for the selected group:





Confidence Plot

This plot displays the confidence plotted against the score of the selected group's identifications. If the confidence is fluctuating, the confidence estimation might not be robust enough and should be optimized as described above.

The red vertical line indicates the chosen threshold. The green area on the left of the threshold illustrates the amount of retained true positives. The red area on the right of the threshold illustrates the amount of potential true positives not validated, i.e., the false negatives.

Tip: It is important to verify that the confidence reaches 0, otherwise the total number of true positives will be under-estimated.

No red line is displayed? You should use a less restrictive threshold.





FDR/FNR Plot

This plot displays the two FDR estimators and the FNR estimator plotted against the score of the selected group's identifications. If the two FDR estimators do not agree, the confidence estimation might not be robust enough and should be optimized.

Three points indicate the FDR and FNR of the validated identifications.
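
The three values can be illustrated with a small sketch. It rests on assumptions that are not stated on this page: lower scores are better, the classical FDR is the ratio of decoy to target hits, the probabilistic FDR is the mean PEP of the retained target hits, and the FNR is the expected share of true positives left out. The tuple layout is hypothetical.

```python
def fdr_fnr_at_threshold(hits, threshold):
    """Compute (classical FDR, probabilistic FDR, FNR) at a score threshold.

    hits: list of (score, is_decoy, pep) tuples, lower score is better
          (hypothetical layout).
    Target hits with score <= threshold are counted as validated.
    """
    targets = [(s, p) for s, d, p in hits if not d]
    kept = [p for s, p in targets if s <= threshold]
    kept_decoys = sum(1 for s, d, _ in hits if d and s <= threshold)

    # Classical estimator: decoys over targets among the retained hits.
    classical = kept_decoys / len(kept) if kept else 0.0
    # Probabilistic estimator: expected share of errors among retained targets.
    probabilistic = sum(kept) / len(kept) if kept else 0.0

    # Expected true positives in total and among the retained targets.
    total_tp = sum(1.0 - p for _, p in targets)
    kept_tp = sum(1.0 - p for p in kept)
    fnr = (total_tp - kept_tp) / total_tp if total_tp > 0 else 0.0
    return classical, probabilistic, fnr
```

When the PEP is well estimated, the classical and probabilistic values should be close to each other at any threshold. A large gap between them suggests that the PEP estimation needs attention.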





Cost/Benefit Plot

This plot displays the expected benefit, i.e., the proportion of retained true positives (1-FNR), plotted against its cost, i.e., the proportion of false positive identifications (FDR). In other words, it is a ROC curve for the selected group.

A point indicates the performance at the selected threshold. You can move this point along the curve (using the slider below the plot) to optimize the threshold, balancing quality against quantity. If the point diverges from the curve, the confidence estimation should be optimized.
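
The curve can be pictured as one (cost, benefit) pair per possible threshold. The sketch below builds these pairs from PEP values alone. It is an illustration under assumptions, not PeptideShaker's code: the input is taken to be the PEPs of the target hits ordered from best to worst score, and both rates are computed as expectations from those PEPs.

```python
def cost_benefit_curve(peps):
    """Return one (FDR, 1 - FNR) pair per cut along a ranked list.

    peps: PEPs of target hits, ordered from best to worst score (assumption).
    """
    total_tp = sum(1.0 - p for p in peps)
    curve = []
    expected_fp = 0.0
    for k, pep in enumerate(peps, start=1):
        expected_fp += pep
        # Cost: expected share of false positives among the k retained hits.
        fdr = expected_fp / k
        # Benefit: expected share of all true positives that are retained.
        benefit = (k - expected_fp) / total_tp if total_tp > 0 else 0.0
        curve.append((fdr, benefit))
    return curve
```

Moving the threshold towards worse scores raises the benefit but, sooner or later, also the cost. Choosing a threshold amounts to picking one point on this list.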





PEP Estimation Plot

This plot displays the Posterior Error Probability (PEP) plotted against the score of the selected group's identifications. If the PEP is fluctuating, the confidence estimation is not robust enough and should be optimized.





FDR Estimation Plot

This plot displays the probabilistic FDR plotted against the classical FDR for identifications with a confidence >0. The curve should closely follow the black diagonal. If it does not, the confidence estimation should be optimized.


