SeqMut.suggest

SeqMut.suggest(df_seq=None, df_feat=None, n=10, region=None, to_aa=None, weight=None, jmd_n_len=10, jmd_c_len=10)

Suggest the top mutations that shift a sequence toward the test-class CPP profile.

Mutations are ranked by shift_score = Σ sign(mean_dif) * ΔX, summed over the features in df_feat and optionally weighted by a df_feat column. The score measures how strongly a mutation moves features in the direction in which the test class differs from the reference class. This is the single-objective design primitive; multi-objective design and library generation are out of scope (issues #57-#60).
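As a rough illustration of the ranking formula, the sketch below computes a shift score for one hypothetical mutation with NumPy. The helper shift_score and all input values are made up for illustration and are not part of the library. In particular, multiplying each term by its weight is an assumption about how the weight option is applied.

```python
import numpy as np

def shift_score(mean_dif, delta_x, weights=None):
    """Hypothetical helper: sum of sign(mean_dif) * delta_x over features."""
    mean_dif = np.asarray(mean_dif, dtype=float)
    delta_x = np.asarray(delta_x, dtype=float)
    # A feature contributes positively if the mutation moves it in the
    # direction of the test class, and negatively otherwise.
    terms = np.sign(mean_dif) * delta_x
    if weights is not None:
        # Assumption: the weight scales each feature's contribution.
        terms = terms * np.asarray(weights, dtype=float)
    return float(terms.sum())

# Made-up values for three features (not taken from a real CPP run)
mean_dif = [0.2, -0.1, 0.05]   # signed difference, test vs. reference class
delta_x = [1.0, -2.0, -0.5]    # feature change caused by one mutation

print(shift_score(mean_dif, delta_x))                           # unweighted
print(shift_score(mean_dif, delta_x, weights=[2.0, 1.0, 0.0]))  # weighted
```

Only the sign of mean_dif enters the score, so a feature with a small class difference counts as much as one with a large difference unless a weight is supplied.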

Parameters:
  • df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers, in the position-based format (sequence, tmd_start, tmd_stop). See SequenceFeature.get_df_parts() for the full df_seq format specification.

  • df_feat (pd.DataFrame) – CPP feature set (output of CPP.run()); its signed mean_dif defines the target direction.

  • n (int, default=10) – Number of top mutations to return.

  • region (str or list of int, optional) – Region to which the scan is restricted, e.g. 'tmd' (see SeqMut.scan()).

  • to_aa (list of str, optional) – Substitution alphabet (see SeqMut.scan()).

  • weight (str, optional) – Optionally weight the shift score by a df_feat column ('feat_importance' or 'abs_auc'). If None, all features contribute equally.

  • jmd_n_len (int, default=10) – Length of JMD-N in number of amino acids.

  • jmd_c_len (int, default=10) – Length of JMD-C in number of amino acids.
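To make the position-based format and the two JMD length parameters more concrete, here is a minimal sketch of how a sequence might be split into JMD-N, TMD, and JMD-C. It assumes that tmd_start and tmd_stop are 1-based, inclusive positions. The helper split_parts and the toy sequence are hypothetical and not part of the library.

```python
def split_parts(sequence, tmd_start, tmd_stop, jmd_n_len=10, jmd_c_len=10):
    """Hypothetical helper: slice a sequence into (jmd_n, tmd, jmd_c)."""
    # Convert the assumed 1-based, inclusive positions to Python slice bounds
    start = tmd_start - 1
    stop = tmd_stop
    # Clamp at 0 so a short N-terminal flank does not wrap around
    jmd_n = sequence[max(0, start - jmd_n_len):start]
    tmd = sequence[start:stop]
    jmd_c = sequence[stop:stop + jmd_c_len]
    return jmd_n, tmd, jmd_c

# Toy sequence: 5 flanking residues, a 6-residue "TMD", 5 flanking residues
seq = "MKRSG" + "LLVVAA" + "KRRQE"
print(split_parts(seq, tmd_start=6, tmd_stop=11, jmd_n_len=3, jmd_c_len=3))
```

For the actual part definitions used by the library, see SequenceFeature.get_df_parts().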

Returns:

df_suggest – The top-n mutations sorted by descending shift_score.

Return type:

pd.DataFrame, shape (n, 8)

Examples

SeqMut.suggest() returns the top mutations that move a sequence toward the test-class CPP profile, ranked by shift_score (Σ sign(mean_dif) * ΔX).

import aaanalysis as aa
aa.options["verbose"] = False

# Load the example dataset and extract the class labels
df_seq = aa.load_dataset(name="DOM_GSEC", n=10)
labels = df_seq["label"].to_list()

# Build sequence parts and run CPP to obtain the feature set
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
split_kws = sf.get_split_kws()
cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, verbose=False)
df_feat = cpp.run(labels=labels, n_filter=25)

# Suggest the top 10 mutations within the TMD
seqmut = aa.SeqMut()
df_suggest = seqmut.suggest(df_seq=df_seq, df_feat=df_feat, n=10, region="tmd")
aa.display_df(df_suggest, n_rows=10, show_shape=True)
CPP using the Python kernel fallback — the compiled Cython extension is not available in this install. Output is bit-exact with the Cython path but ~2x slower. Reinstall via pip install --force-reinstall aaanalysis to fetch a prebuilt wheel.
DataFrame shape: (10, 8)
    entry   pos  from_aa  to_aa  mutation  region  delta_cpp  shift_score
1   Q8IUW5   74  G        A      G74A      tmd      3.415670     3.415670
2   P05556  744  G        A      G744A     tmd      3.415670     3.415670
3   Q14802   52  G        A      G52A      tmd      3.415660     3.415660
4   P53801  112  G        A      G112A     tmd      3.415660     3.415660
5   Q8IUW5   74  G        E      G74E      tmd      2.904170     2.904170
6   P05556  744  G        E      G744E     tmd      2.904170     2.904170
7   Q14802   52  G        E      G52E      tmd      2.904160     2.904160
8   P53801  112  G        E      G112E     tmd      2.904160     2.904160
9   Q8IUW5   78  C        A      C78A      tmd      2.859590     2.859590
10  P01135  118  C        A      C118A     tmd      2.859580     2.859580
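Because the same protein can appear several times in the result, a common follow-up is to keep only the best-scoring mutation per entry. The sketch below does this with plain pandas on a copy of the rows shown above (three of the eight columns); it does not call the library, and the variable names are illustrative.

```python
import pandas as pd

# Rows copied from the example output above (subset of columns)
df_suggest = pd.DataFrame({
    "entry": ["Q8IUW5", "P05556", "Q14802", "P53801", "Q8IUW5",
              "P05556", "Q14802", "P53801", "Q8IUW5", "P01135"],
    "mutation": ["G74A", "G744A", "G52A", "G112A", "G74E",
                 "G744E", "G52E", "G112E", "C78A", "C118A"],
    "shift_score": [3.415670, 3.415670, 3.415660, 3.415660, 2.904170,
                    2.904170, 2.904160, 2.904160, 2.859590, 2.859580],
})

# Index label of the highest-scoring row within each protein
idx_best = df_suggest.groupby("entry")["shift_score"].idxmax()

# One row per protein, again sorted by descending shift_score
df_best = df_suggest.loc[idx_best].sort_values("shift_score", ascending=False)
print(df_best.to_string(index=False))
```

This keeps one mutation for each of the five proteins in the table; the order of rows with identical scores is not guaranteed.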