aaanalysis.plot_rank

aaanalysis.plot_rank(df_rank=None, col_score='score', col_group='group', group_order=None, dict_color=None, threshold=None, ax=None, figsize=(7, 5), marker_size=25, xlabel='Protein rank', ylabel='Max score per protein', fontsize_labels=None)

Plot a per-protein rank scatter: max-score-per-protein sorted by score, colored by group.

This is the single most useful sanity check for a deployed per-protein predictor: proteins are ranked by their maximum score and colored by group membership (e.g. substrate / hold-out / non-substrate), with optional horizontal lines marking the score thresholds used at deployment.

Added in version 1.1.0.

Parameters:
  • df_rank (pd.DataFrame, shape (n_proteins, n_info)) – One row per protein; must contain col_score (the protein’s max score) and col_group (its group label).

  • col_score (str, default="score") – Column with the per-protein score. Proteins are ranked by this score in descending order (x-axis), and the score itself is plotted on the y-axis.

  • col_group (str, default="group") – Column with the per-protein group label used for coloring.

  • group_order (list of str, optional) – Order in which groups are colored / drawn. Defaults to first-appearance order.

  • dict_color (dict, optional) – Mapping group -> color. Canonical group names (substrate, non-substrate, hold-out) default to the locked sample palette; entries given here override those defaults.

  • threshold (int, float, or list, optional) – One or more y-values drawn as horizontal threshold lines.

  • ax (matplotlib.axes.Axes, optional) – Axes to draw on. If None, a new figure and axes are created.

  • figsize (tuple, default=(7, 5)) – Figure size when ax is None.

  • marker_size (int or float, default=25) – Scatter marker size.

  • xlabel (str, default="Protein rank") – Label for the x-axis.

  • ylabel (str, default="Max score per protein") – Label for the y-axis.

  • fontsize_labels (int or float, optional) – Font size for the axis labels (matplotlib default if None).
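
The df_rank requirement above (one row per protein, holding that protein's maximum score and its group label) can be met by collapsing finer-grained predictions. The following is a minimal sketch, assuming the raw predictions sit in a long-format table with several scored windows per protein. The column names entry and window_score and all values are hypothetical and not part of aaanalysis; only the output columns score and group follow the documented defaults.

```python
import pandas as pd

# Hypothetical long-format predictions: several scored windows per protein.
# "entry" and "window_score" are illustrative names, not aaanalysis conventions.
df_windows = pd.DataFrame({
    "entry": ["P1", "P1", "P2", "P2", "P2", "P3"],
    "window_score": [0.20, 0.90, 0.40, 0.10, 0.35, 0.05],
    "group": ["substrate", "substrate", "hold-out",
              "hold-out", "hold-out", "non-substrate"],
})

# Collapse to one row per protein: keep the maximum score and the group label.
df_rank = (
    df_windows
    .groupby("entry", as_index=False)
    .agg(score=("window_score", "max"), group=("group", "first"))
)
```

The resulting table has the default column names expected by col_score and col_group, so it could be passed as df_rank without further renaming.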

Returns:

  • fig (matplotlib.figure.Figure) – The figure.

  • ax (matplotlib.axes.Axes) – The axes with the rank scatter.

Examples

The aa.plot_rank() function draws a per-protein rank scatter: each protein’s maximum prediction score is plotted against its rank (highest score first) and colored by group (e.g. substrate / hold-out / non-substrate), with optional horizontal threshold lines. It complements the numeric metrics from aa.comp_per_protein_ap / aa.comp_detection_metrics.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import aaanalysis as aa

aa.plot_settings()
# Simulate max scores for 12 substrates, 6 hold-out proteins, and 20 non-substrates
rng = np.random.default_rng(42)
df_rank = pd.DataFrame({
    "score": np.concatenate([rng.uniform(0.5, 1.0, 12),
                             rng.uniform(0.3, 0.8, 6),
                             rng.uniform(0.0, 0.5, 20)]),
    "group": ["substrate"] * 12 + ["hold-out"] * 6 + ["non-substrate"] * 20,
})
# Rank scatter with a single horizontal threshold line at 0.5
fig, ax = aa.plot_rank(df_rank=df_rank, threshold=0.5)
plt.tight_layout()
plt.show()
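
To make the ranking explicit, the sketch below reproduces the idea of a rank scatter with pandas and matplotlib only. It is not the aaanalysis implementation: the colors, line style, and data values are assumptions chosen for illustration, and the library's own palette and styling may differ. It shows how rank 1 is assigned to the highest score and how a list of thresholds maps to several horizontal lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative per-protein table (one row per protein, values made up).
df_rank = pd.DataFrame({
    "score": [0.91, 0.15, 0.62, 0.48, 0.77, 0.05],
    "group": ["substrate", "non-substrate", "hold-out",
              "non-substrate", "substrate", "non-substrate"],
})

# Rank 1 = highest score: sort descending, then number the rows from 1.
df_sorted = df_rank.sort_values("score", ascending=False).reset_index(drop=True)
df_sorted["rank"] = df_sorted.index + 1

# Assumed colors for this sketch only; not the aaanalysis default palette.
colors = {"substrate": "tab:green", "hold-out": "tab:orange",
          "non-substrate": "tab:gray"}

fig, ax = plt.subplots(figsize=(7, 5))
# One scatter call per group so each group gets its own color and legend entry.
for group, df_g in df_sorted.groupby("group", sort=False):
    ax.scatter(df_g["rank"], df_g["score"], s=25, color=colors[group], label=group)
# Several thresholds become several horizontal lines.
for thr in [0.5, 0.8]:
    ax.axhline(thr, linestyle="--", color="black", linewidth=1)
ax.set_xlabel("Protein rank")
ax.set_ylabel("Max score per protein")
ax.legend()
```

In a well-behaved predictor, the known substrates cluster at the low ranks (left side) above the threshold lines, while non-substrates fall to the right and below them.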