comp_smooth_scores

comp_smooth_scores(scores, method='triangular', window=2, sigma=None, peak_preserving=True)

Smooth a per-residue score vector with a NaN-aware, peak-preserving kernel.

Windowed protease and post-translational modification (PTM) predictors often place a site one residue off its true position. Smoothing the per-residue scores lets nearby high scores reinforce one another, so a slightly shifted prediction still supports the same site. With peak_preserving=True the function returns max(smoothed, raw) elementwise, so a peak never drops below its raw height. The implementation uses NumPy only; SciPy is not required.

Added in version 1.1.0.

Parameters:
  • scores (array-like, shape (n_residues,)) – Per-residue score vector. NaN positions are excluded from the weighted average, and the kernel weights are renormalized over the remaining finite neighbours.

  • method (str, default='triangular') – Smoothing kernel: 'triangular' or 'gaussian'.

  • window (int, default=2) – Half-width of the kernel (covers +/- window residues).

  • sigma (float, optional) – Gaussian standard deviation; defaults to window / 2 when None.

  • peak_preserving (bool, default=True) – If True, return max(smoothed, raw) elementwise.

Returns:

smoothed – Smoothed score vector, same length as scores.

Return type:

array-like, shape (n_residues,)

See also

  • plot_rank() for visualizing per-protein score tracks.

Examples

A single isolated peak illustrates the behaviour: smoothing spreads part of the score to neighbouring residues, while max(smoothed, raw) keeps the peak itself at its raw height.

import numpy as np
import aaanalysis as aa

track = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
smoothed = aa.comp_smooth_scores(scores=track, method="triangular", window=2)
smoothed
array([0.16666667, 0.25      , 1.        , 0.25      , 0.16666667])
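The output above is consistent with triangular weights of window + 1 - |d| at offset d, renormalized over the positions that fall inside the track. The sketch below reproduces the numbers under that assumption. It is an illustration, not the library's implementation: triangular_smooth_sketch is a hypothetical helper name, and NaN handling is left out for brevity.

```python
import numpy as np

def triangular_smooth_sketch(scores, window=2, peak_preserving=True):
    """Hypothetical helper; assumes weights window + 1 - |d| (not confirmed by the library)."""
    raw = np.asarray(scores, dtype=float)
    n = raw.size
    smoothed = np.empty(n)
    for i in range(n):
        # Clip the kernel to the track so edge positions use fewer neighbours
        lo, hi = max(0, i - window), min(n, i + window + 1)
        offsets = np.arange(lo, hi) - i
        weights = window + 1 - np.abs(offsets)
        # Weighted average, renormalized over the weights actually used
        smoothed[i] = np.sum(weights * raw[lo:hi]) / np.sum(weights)
    if peak_preserving:
        # Never let a position fall below its raw score
        smoothed = np.maximum(smoothed, raw)
    return smoothed

track = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
result = triangular_smooth_sketch(track, window=2)
print(result)  # approx. [0.1667, 0.25, 1.0, 0.25, 0.1667]
```

Under this assumption the edge value is 1 / (3 + 2 + 1) = 1/6 and the next one is 2 / (2 + 3 + 2 + 1) = 0.25. The purely smoothed centre would be 3 / 9, which peak preservation lifts back to 1.0.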

The gaussian method weights neighbours by a Gaussian with standard deviation sigma. With peak_preserving=False the smoothed values are returned as-is, so the peak is pulled down (here from 1.0 to 0.403):

import numpy as np
import pandas as pd
import aaanalysis as aa

# Same single-peak track as in the first example
track = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
df_smooth = pd.DataFrame({
    "raw": track,
    "gaussian, peak_preserving=True":
        np.round(aa.comp_smooth_scores(scores=track, method="gaussian", sigma=1.0,
                                       peak_preserving=True), 3),
    "gaussian, peak_preserving=False":
        np.round(aa.comp_smooth_scores(scores=track, method="gaussian", sigma=1.0,
                                       peak_preserving=False), 3)})
aa.display_df(df_smooth)
   raw       gaussian, peak_preserving=True   gaussian, peak_preserving=False
1  0.000000  0.078000                         0.078000
2  0.000000  0.258000                         0.258000
3  1.000000  1.000000                         0.403000
4  0.000000  0.258000                         0.258000
5  0.000000  0.078000                         0.078000
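The NaN handling described under Parameters (NaN positions excluded, weights renormalized over finite neighbours) can be sketched as the ratio of two convolutions: one over the scores with NaN replaced by zero, and one over the finite mask. This is a sketch under assumptions, not the library's code. nan_aware_gaussian_sketch is a hypothetical name, the kernel is assumed to be truncated at +/- window, and this page does not say what the library returns at a position whose own raw score is NaN, so the sketch simply returns the neighbour-based average there.

```python
import numpy as np

def nan_aware_gaussian_sketch(scores, window=2, sigma=None):
    """Hypothetical helper; Gaussian kernel truncated at +/- window (assumed)."""
    raw = np.asarray(scores, dtype=float)
    n = raw.size
    sigma = window / 2 if sigma is None else sigma
    offsets = np.arange(-window, window + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)

    finite = np.isfinite(raw)
    filled = np.where(finite, raw, 0.0)  # NaN contributes nothing to the numerator

    # Full convolution, then keep the n outputs centred on each residue
    num = np.convolve(filled, kernel, mode="full")[window:window + n]
    den = np.convolve(finite.astype(float), kernel, mode="full")[window:window + n]

    # den is the sum of weights over finite neighbours; 0 means no finite neighbour
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)

track = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
out = nan_aware_gaussian_sketch(track, window=2, sigma=1.0)
print(np.round(out, 3))  # matches the peak_preserving=False column above

track_nan = np.array([0.0, np.nan, 1.0, 0.0, 0.0])
out_nan = nan_aware_gaussian_sketch(track_nan, window=2, sigma=1.0)
print(np.round(out_nan, 3))
```

Because the missing neighbour drops out of the denominator as well as the numerator, the smoothed peak in track_nan comes out higher than in track: the remaining finite neighbours share the full weight.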