SeqMut

class SeqMut(verbose=False, df_scales=None, model=None, target_class=None)

Bases: object

Sequence Mutator (SeqMut) class for CPP-guided sequence mutation and ΔCPP analysis [Breimann24a].

SeqMut is the CPP-aware counterpart of AAMut: it applies point mutations to protein sequences and measures the deterministic, model-free change they induce in a set of CPP features (ΔCPP). It inverts the direction of CPP prediction: instead of asking what distinguishes two groups, it asks how a mutation moves a sequence’s feature profile. It supports residue/region mutation, exhaustive ΔCPP scanning, and target-shift suggestion.
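The idea of a model-free feature change can be illustrated with a minimal sketch. It assumes that a feature is the mean of a scale over a sequence region, which is a simplification of how CPP features are defined. The scale values and the helper names (`region_mean`, `point_mutate`, `delta_feature`) are hypothetical and are not part of SeqMut.

```python
# Toy scale with illustrative values only; not a real amino acid scale.
TOY_SCALE = {"A": 0.6, "D": 0.1, "G": 0.5, "K": 0.0, "L": 1.0}


def region_mean(seq, start, stop, scale=TOY_SCALE):
    """Mean scale value over seq[start:stop] (0-based, stop exclusive)."""
    values = [scale[aa] for aa in seq[start:stop]]
    return sum(values) / len(values)


def point_mutate(seq, pos, to_aa):
    """Return seq with the residue at 0-based position pos replaced by to_aa."""
    return seq[:pos] + to_aa + seq[pos + 1:]


def delta_feature(seq, pos, to_aa, start, stop):
    """Mutant feature value minus wild-type feature value for one region."""
    mutant = point_mutate(seq, pos, to_aa)
    return region_mean(mutant, start, stop) - region_mean(seq, start, stop)


wild_type = "GALKDL"
# K -> L at 0-based position 3, feature computed over residues 2..5
print(delta_feature(wild_type, pos=3, to_aa="L", start=2, stop=6))
```

Because the change depends only on the sequence, the region, and the scale, the same mutation always yields the same value, and a mutation outside the region leaves the feature unchanged.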

Added in version 1.0.0.

Methods

combine(df_seq, variants, df_feat[, ...])

Score combined (multi-mutation) variants by applying their mutations together.

eval(df_scan[, th])

Evaluate a mutational scan: tag mutations stable/disruptive and summarize per region.

mutate(df_seq, mutations[, df_feat, ...])

Apply specific point mutations to sequences and (optionally) measure their ΔCPP.

scan(df_seq, df_feat[, region, to_aa, ...])

Run an exhaustive single-position mutational scan and rank mutations by |ΔCPP|.

suggest(df_seq, df_feat[, n, region, to_aa, ...])

Suggest the top mutations that move a sequence toward the desired CPP / model outcome.
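The scan and suggest summaries above can be pictured with a small stand-alone sketch. It assumes a single toy feature (the mean scale value over the whole sequence) and a "K2L"-style mutation label with 1-based positions. The functions `toy_scan` and `toy_suggest` are hypothetical and do not reproduce the signatures or output columns of SeqMut.scan() or SeqMut.suggest().

```python
# Toy scale with illustrative values only; not a real amino acid scale.
TOY_SCALE = {"A": 0.6, "D": 0.1, "G": 0.5, "K": 0.0, "L": 1.0}


def mean_scale(seq):
    """Toy feature: mean scale value over the whole sequence."""
    return sum(TOY_SCALE[aa] for aa in seq) / len(seq)


def toy_scan(seq, to_aa=None):
    """Try every substitution at every position; rank by absolute change."""
    alphabet = list(to_aa) if to_aa is not None else sorted(TOY_SCALE)
    base = mean_scale(seq)
    rows = []
    for pos, from_aa in enumerate(seq):
        for aa in alphabet:
            if aa == from_aa:
                continue  # skip silent "mutations"
            mutant = seq[:pos] + aa + seq[pos + 1:]
            label = f"{from_aa}{pos + 1}{aa}"  # 1-based position in the label
            rows.append((label, mean_scale(mutant) - base))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


def toy_suggest(seq, n=3, increase=True):
    """Keep the n strongest mutations that move the feature in one direction."""
    sign = 1 if increase else -1
    hits = [row for row in toy_scan(seq) if sign * row[1] > 0]
    return hits[:n]


for label, delta in toy_suggest("GKL", n=3):
    print(label, round(delta, 3))
```

In this sketch, suggesting is just scanning plus a direction filter; a model-guided variant would rank by a prediction shift instead of the raw feature change.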

__init__(verbose=False, df_scales=None, model=None, target_class=None)
Parameters:
  • verbose (bool, default=False) – If True, verbose outputs are enabled.

  • df_scales (pd.DataFrame, shape (n_letters, n_scales), optional) – DataFrame of amino acid scales (index = canonical amino acids, columns = scale ids). Default from load_scales().

  • model (object, optional) – A fitted classifier exposing predict_proba (e.g. TreeModel or any scikit-learn classifier), trained on the CPP feature matrix built from the same df_feat that is passed at call time. When given, the methods add a prediction-shift column delta_pred (the change in the predicted score that a mutation induces, in percentage points), and SeqMut.suggest() is guided by it. When None (default), the class stays deterministic and model-free.

  • target_class (int or str, optional) – Class whose predicted probability is tracked by delta_pred. None (default) selects the positive class. A class label is matched against model.classes_ when available. Requires model.
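How model and target_class interact can be sketched as follows. This is a rough illustration under stated assumptions, not the library's implementation: `ToyClassifier` is a stand-in that only mimics the `classes_` and `predict_proba` interface of a fitted classifier, the helpers `class_column` and `delta_pred` are hypothetical, and treating the second column as the positive class is an assumption borrowed from the usual binary scikit-learn convention.

```python
import math


class ToyClassifier:
    """Stand-in for a fitted classifier; mimics classes_ and predict_proba only."""

    classes_ = ["neg", "pos"]

    def predict_proba(self, X):
        # Logistic function of the row sum; the "weights" are arbitrary.
        out = []
        for row in X:
            p = 1.0 / (1.0 + math.exp(-sum(row)))
            out.append([1.0 - p, p])
        return out


def class_column(model, target_class=None):
    """Pick the predict_proba column to track."""
    if target_class is None:
        return 1  # assumption: second column is the positive class
    classes = list(getattr(model, "classes_", []))
    if target_class not in classes:
        raise ValueError(f"unknown class label: {target_class!r}")
    return classes.index(target_class)


def delta_pred(model, x_wild_type, x_mutant, target_class=None):
    """Change of the predicted probability in percentage points."""
    col = class_column(model, target_class)
    p_wt, p_mut = (row[col] for row in model.predict_proba([x_wild_type, x_mutant]))
    return 100.0 * (p_mut - p_wt)


model = ToyClassifier()
x_wt, x_mut = [0.0, 0.0], [1.0, 1.0]  # hypothetical feature vectors
print(round(delta_pred(model, x_wt, x_mut), 2))
print(round(delta_pred(model, x_wt, x_mut, target_class="neg"), 2))
```

For a binary problem, tracking the other class simply flips the sign of the shift, which is why the choice of target_class matters when ranking mutations.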

See also

  • AAMut for the residue-level, CPP-agnostic substitution analysis.

  • CPP whose df_feat defines the features mutated against.

  • TreeModel whose predict_proba provides the prediction score.