aaanalysis.comp_per_protein_ap

aaanalysis.comp_per_protein_ap(list_scores=None, list_positions=None, tolerance=0)

Compute per-protein average precision (AP) for windowed site prediction.

This is the canonical site-localization metric in protease / PTM prediction: for each protein, residues are ranked by score from highest to lowest, and AP is computed against the known positive sites. tolerance allows off-by-k positional jitter: a ranked residue within tolerance of an unmatched positive counts as a hit.

Added in version 1.1.0.

Parameters:
  • list_scores (list of array-like) – Per-protein per-residue score vectors. NaN scores are ignored.

  • list_positions (list of array-like) – Per-protein 0-based indices of positive sites (empty if none).

  • tolerance (int, default=0) – Positional tolerance (in residues) for counting a hit.

Returns:

ap – Per-protein AP. Entries are np.nan for proteins with no positives or no finite scores; use np.nanmean(ap) to obtain the dataset-level score.

Return type:

array-like, shape (n_proteins,)

Examples

comp_per_protein_ap computes per-protein average precision for site-localization ranking. Pass a list of per-residue score arrays (one per protein) and the matching list of 0-based true site positions. The optional tolerance credits predictions within ±k residues of a true site.

import numpy as np
import aaanalysis as aa

list_scores = [np.array([0.9, 0.1, 0.8, 0.2]), np.array([0.1, 0.9, 0.2, 0.7])]
list_positions = [[0, 2], [1, 3]]
ap = aa.comp_per_protein_ap(list_scores=list_scores, list_positions=list_positions)
ap
array([1., 1.])
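
The Returns note recommends np.nanmean for the dataset-level score. The snippet below is a minimal sketch of that aggregation step using only NumPy; the ap values are hypothetical and are not produced by aaanalysis. The NaN entry stands for a protein without positive sites.

```python
import numpy as np

# Hypothetical per-protein AP values (assumed for illustration only);
# the NaN marks a protein with no positives or no finite scores.
ap = np.array([1.0, 0.5, np.nan])

# Count the proteins that actually contribute to the average.
n_scored = int(np.sum(~np.isnan(ap)))

# np.nanmean skips NaN entries, so only the two scored proteins are averaged.
dataset_ap = float(np.nanmean(ap))

print(n_scored, dataset_ap)  # 2 0.75
```

Reporting the number of scored proteins alongside the mean is useful because np.nanmean silently drops the NaN entries. If every entry is NaN, np.nanmean returns nan and emits a RuntimeWarning.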

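With off-by-one jitter, tolerance=1 rescues a near-miss. Here the top-scoring residue (index 1) lies one position from the true site (index 2), so it counts as a hit: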

aa.comp_per_protein_ap(list_scores=[np.array([0.1, 0.9, 0.2, 0.05])],
                       list_positions=[[2]], tolerance=1)
array([1.])
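
To make the tolerance rule from the description concrete, the following is a rough single-protein sketch in plain NumPy. It is not the aaanalysis implementation: the function name sketch_protein_ap is hypothetical, and the details it fixes (stable tie-breaking, matching the closest unmatched positive, dividing by the number of positives) are assumptions chosen so that the examples above are reproduced.

```python
import numpy as np

def sketch_protein_ap(scores, positions, tolerance=0):
    """Illustrative AP for one protein (hypothetical helper, not the library code)."""
    scores = np.asarray(scores, dtype=float)
    finite_idx = np.flatnonzero(np.isfinite(scores))  # non-finite scores are ignored
    positives = sorted({int(p) for p in positions})
    if len(positives) == 0 or finite_idx.size == 0:
        return np.nan  # undefined without positives or without finite scores

    # Rank residues from highest to lowest score (assumption: stable order on ties).
    order = finite_idx[np.argsort(-scores[finite_idx], kind="stable")]

    unmatched = set(positives)
    n_hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        # Unmatched positives that lie within the tolerance window of this residue.
        near = [p for p in unmatched if abs(p - int(idx)) <= tolerance]
        if near:
            # Assumption: consume the closest positive so it cannot be hit twice.
            unmatched.remove(min(near, key=lambda p: (abs(p - int(idx)), p)))
            n_hits += 1
            precision_sum += n_hits / rank  # precision at this hit
    # Assumption: normalize by the number of known positive sites.
    return precision_sum / len(positives)

scores = [0.1, 0.9, 0.2, 0.05]
print(sketch_protein_ap(scores, [2], tolerance=0))  # 0.5
print(sketch_protein_ap(scores, [2], tolerance=1))  # 1.0
```

Under these assumptions, the strict setting only finds the true site at rank 2, giving a precision of 1/2, while tolerance=1 credits the top-ranked neighbor. Each positive is consumed once it is matched, so a cluster of high-scoring residues around one site cannot inflate the score.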