aaanalysis.comp_detection_metrics

aaanalysis.comp_detection_metrics(list_scores=None, list_positions=None, threshold=0.5, tolerance=0)

Compute pooled detection metrics at a fixed score threshold.

Answers the question “is the true site actually called?”, which is distinct from ranking. Residues scoring >= threshold are positive calls. Calls are pooled across proteins into TP/FP/FN/TN counts, which are then reduced to recall, precision, F1, and MCC. With tolerance > 0, a call within tolerance residues of a true site is credited as a TP, and each site is credited at most once.

Added in version 1.1.0.

Parameters:
  • list_scores (list of array-like) – Per-protein per-residue score vectors. NaN scores are ignored.

  • list_positions (list of array-like) – Per-protein 0-based indices of positive sites.

  • threshold (float, default=0.5) – Score threshold for a positive call.

  • tolerance (int, default=0) – Positional tolerance (in residues) for counting a TP.

Returns:

metrics – Keys recall, precision, f1, mcc (floats) and tp, fp, fn, tn (ints).

Return type:

dict
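
For reference, the four summary metrics can be written in terms of the pooled counts using the standard confusion-matrix definitions. This is the textbook form; how the function handles a zero denominator is not stated on this page.

```latex
\begin{aligned}
\text{recall} &= \frac{TP}{TP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}, \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
                   {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{aligned}
```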

Examples

comp_detection_metrics pools per-residue predictions at a fixed score threshold and returns recall / precision / F1 / MCC (and the TP/FP/FN/TN counts) as a dict.

import numpy as np
import aaanalysis as aa

list_scores = [np.array([0.9, 0.1, 0.8, 0.2]), np.array([0.1, 0.9, 0.2, 0.7])]
list_positions = [[0, 2], [1, 3]]
aa.comp_detection_metrics(list_scores=list_scores, list_positions=list_positions,
                          threshold=0.5)
{'recall': 1.0,
 'precision': 1.0,
 'f1': 1.0,
 'mcc': np.float64(1.0),
 'tp': 4,
 'fp': 0,
 'fn': 0,
 'tn': 4}
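
To make the pooling step concrete, the following is a minimal NumPy sketch of how the counts could be derived for the tolerance=0 case. The function name detection_metrics_sketch is hypothetical and this is not the library's implementation. Two behaviours in it are assumptions: residues with NaN scores are dropped from all four counts, and ratios with a zero denominator are reported as 0.0.

```python
import math
import numpy as np


def detection_metrics_sketch(list_scores, list_positions, threshold=0.5):
    """Hypothetical re-implementation for tolerance=0; not the library's code."""
    tp = fp = fn = tn = 0
    for scores, positions in zip(list_scores, list_positions):
        scores = np.asarray(scores, dtype=float)
        # Boolean mask marking the annotated (true) sites of this protein
        is_site = np.zeros(scores.shape[0], dtype=bool)
        is_site[np.asarray(positions, dtype=int)] = True
        # Assumption: residues with NaN scores are excluded from all four counts
        valid = ~np.isnan(scores)
        called = scores[valid] >= threshold
        site = is_site[valid]
        # Pool the per-protein counts into one confusion matrix
        tp += int(np.sum(called & site))
        fp += int(np.sum(called & ~site))
        fn += int(np.sum(~called & site))
        tn += int(np.sum(~called & ~site))

    def ratio(num, den):
        # Assumption: undefined ratios (zero denominator) are reported as 0.0
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc,
            "tp": tp, "fp": fp, "fn": fn, "tn": tn}


list_scores = [np.array([0.9, 0.1, 0.8, 0.2]), np.array([0.1, 0.9, 0.2, 0.7])]
list_positions = [[0, 2], [1, 3]]
result = detection_metrics_sketch(list_scores, list_positions, threshold=0.5)
print(result["tp"], result["fp"], result["fn"], result["tn"])  # 4 0 0 4
```

On the input from the example above, the sketch yields the same counts (tp=4, fp=0, fn=0, tn=4) and a value of 1.0 for all four metrics. It does not attempt the tolerance matching, where each true site can be credited at most once.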