comp_detection_metrics
- comp_detection_metrics(list_scores, list_positions, threshold=0.5, tolerance=0)
Compute pooled detection metrics at a fixed score threshold.
Answers “is the true site actually called?”, which is distinct from how well sites are ranked. Residues scoring >= threshold are positive calls. These calls are pooled across proteins into true positives, false positives, false negatives, and true negatives (TP/FP/FN/TN), which are then reduced to recall, precision, F1, and the Matthews Correlation Coefficient (MCC). tolerance credits a call that lies within tolerance residues of a true site, and each site is credited at most once.
Added in version 1.1.0.
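The pooling and reduction described above can be sketched in plain NumPy. This is a minimal sketch, not the aaanalysis implementation. It assumes exact matching (tolerance=0), drops NaN residues before counting, and returns 0.0 whenever a ratio's denominator is zero. The helper names pooled_counts and detection_metrics are hypothetical.

```python
import math

import numpy as np


def pooled_counts(list_scores, list_positions, threshold=0.5):
    """Pool exact-match (tolerance=0) TP/FP/FN/TN counts across proteins.

    Hypothetical helper, not part of aaanalysis. NaN residues are dropped
    before counting; treating a NaN-scored true site as absent is an assumption.
    """
    tp = fp = fn = tn = 0
    for scores, positions in zip(list_scores, list_positions):
        scores = np.asarray(scores, dtype=float)
        is_site = np.zeros(len(scores), dtype=bool)
        is_site[np.asarray(positions, dtype=int)] = True
        valid = ~np.isnan(scores)            # ignore NaN residues entirely
        called = scores >= threshold         # positive call: score >= threshold
        tp += int(np.sum(valid & called & is_site))
        fp += int(np.sum(valid & called & ~is_site))
        fn += int(np.sum(valid & ~called & is_site))
        tn += int(np.sum(valid & ~called & ~is_site))
    return tp, fp, fn, tn


def detection_metrics(tp, fp, fn, tn):
    """Reduce pooled counts to recall, precision, F1 and MCC.

    Returning 0.0 for an undefined ratio (zero denominator) is an assumption;
    the library may handle these edge cases differently.
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc,
            "tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Same toy data as in the Examples section
list_scores = [np.array([0.9, 0.1, 0.8, 0.2]), np.array([0.1, 0.9, 0.2, 0.7])]
list_positions = [[0, 2], [1, 3]]
print(detection_metrics(*pooled_counts(list_scores, list_positions, threshold=0.5)))
# {'recall': 1.0, 'precision': 1.0, 'f1': 1.0, 'mcc': 1.0, 'tp': 4, 'fp': 0, 'fn': 0, 'tn': 4}
```

Because counts are summed over all proteins before any ratio is taken, proteins with many residues weigh more than short ones. This is the pooled (micro-averaged) view rather than an average of per-protein scores.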
- Parameters:
list_scores (list of array-like) – Per-protein per-residue score vectors. NaN scores are ignored.
list_positions (list of array-like) – Per-protein 0-based indices of positive sites.
threshold (float, default=0.5) – Score threshold for a positive call.
tolerance (int, default=0) – Positional tolerance (in residues) for counting a TP.
- Returns:
metrics – Dictionary with keys recall, precision, f1, mcc (floats) and tp, fp, fn, tn (ints).
- Return type:
dict
See also
comp_per_protein_ap() for the ranking-based site-localization score.
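Detection and ranking can disagree when the true sites are ranked first but still score below threshold. The sketch below computes a textbook, non-interpolated average precision (AP) by hand alongside threshold-based recall. This AP definition is an assumption and may differ in detail (for example, tie handling) from how comp_per_protein_ap() computes its score. The helper average_precision is hypothetical.

```python
import numpy as np


def average_precision(scores, positions):
    """Non-interpolated average precision for one protein (hypothetical helper).

    Assumes the textbook definition: the mean of precision@k over the ranks k
    at which true sites appear. NaN residues are dropped first; returning 0.0
    for a protein without sites is an assumption.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.zeros(len(scores), dtype=bool)
    labels[np.asarray(positions, dtype=int)] = True
    keep = ~np.isnan(scores)
    scores, labels = scores[keep], labels[keep]
    hits = labels[np.argsort(-scores, kind="stable")]  # labels in descending-score order
    if not hits.any():
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


# Both true sites are ranked first, yet neither reaches the 0.5 threshold.
scores = np.array([0.4, 0.1, 0.3, 0.05])
sites = [0, 2]
ap = average_precision(scores, sites)                                  # 1.0
recall = float(np.isin(sites, np.flatnonzero(scores >= 0.5)).mean())   # 0.0
print(ap, recall)
```

A perfect ranking score paired with zero recall is exactly the situation a threshold-based detection metric is meant to expose.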
Examples
comp_detection_metrics pools per-residue predictions at a fixed score threshold and returns recall / precision / F1 / MCC (and the TP/FP/FN/TN counts) as a dict:

    import numpy as np
    import aaanalysis as aa

    list_scores = [np.array([0.9, 0.1, 0.8, 0.2]), np.array([0.1, 0.9, 0.2, 0.7])]
    list_positions = [[0, 2], [1, 3]]
    aa.comp_detection_metrics(list_scores=list_scores, list_positions=list_positions, threshold=0.5)
{'recall': 1.0, 'precision': 1.0, 'f1': 1.0, 'mcc': np.float64(1.0), 'tp': 4, 'fp': 0, 'fn': 0, 'tn': 4}
tolerance widens what counts as a hit: a predicted residue counts as a true positive when it lies within tolerance positions of a true site (the default, 0, requires an exact match). Relaxing it recovers near-miss predictions:

    import pandas as pd

    df_tol = pd.DataFrame(
        [aa.comp_detection_metrics(list_scores=list_scores, list_positions=list_positions,
                                   threshold=0.5, tolerance=t)
         for t in [0, 1]],
        index=["tolerance=0", "tolerance=1"])
    aa.display_df(df_tol)
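                   recall  precision        f1       mcc  tp  fp  fn  tn
    tolerance=0  1.000000   1.000000  1.000000  1.000000   4   0   0   4
    tolerance=1  1.000000   1.000000  1.000000  1.000000   4   0   0   4

Both rows are identical because every call in this toy example already lands exactly on a true site, so relaxing tolerance has nothing to recover here.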
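To see tolerance rescue near misses, here is a minimal sketch of one possible matching rule. It assumes calls are visited left to right, and each call takes the leftmost still-unmatched true site within tolerance residues. The helper tolerant_counts is hypothetical and is not part of aaanalysis. The library's actual matching rule, and how it counts TN under a tolerance, may differ, so TN is omitted here.

```python
import numpy as np


def tolerant_counts(scores, positions, threshold=0.5, tolerance=0):
    """TP/FP/FN for one protein under a positional tolerance (hypothetical helper).

    Assumed matching rule: calls (score >= threshold) are visited left to right
    and each takes the leftmost still-unmatched true site within `tolerance`
    residues, so every site is credited at most once. TN is left out because
    how it is counted under a tolerance is not specified here.
    """
    scores = np.asarray(scores, dtype=float)
    calls = np.flatnonzero(scores >= threshold)  # NaN >= threshold is False, so NaN is never called
    sites = sorted(int(p) for p in positions)
    matched = [False] * len(sites)
    tp = 0
    for call in calls:
        for i, site in enumerate(sites):
            if not matched[i] and abs(site - int(call)) <= tolerance:
                matched[i] = True   # credit this site once
                tp += 1
                break
    fp = len(calls) - tp            # calls with no site nearby
    fn = len(sites) - tp            # sites that no call reached
    return tp, fp, fn


# Both calls sit one residue away from a true site (near misses).
scores = [0.1, 0.9, 0.1, 0.1, 0.8]
sites = [0, 3]
for t in (0, 1):
    print(t, tolerant_counts(scores, sites, threshold=0.5, tolerance=t))
# 0 (0, 2, 2)
# 1 (2, 0, 0)
```

Always taking the leftmost free site keeps the rule simple and deterministic. With tolerance=0 the two near misses count as two false positives plus two false negatives. With tolerance=1 both are credited as true positives.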