SeqMut.scan
- SeqMut.scan(df_seq, df_feat, region=None, to_aa=None, jmd_n_len=10, jmd_c_len=10)
Run an exhaustive single-position mutational scan and rank mutations by |ΔCPP|.
For every scannable position and every substitution, the change ΔX in the CPP feature vector is measured and aggregated into delta_cpp, its L1 magnitude Σ|ΔX|.
- Parameters:
  - df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers, in the position-based format (sequence, tmd_start, tmd_stop). See SequenceFeature.get_df_parts() for the full df_seq format specification.
  - df_feat (pd.DataFrame) – CPP feature set (output of CPP.run()) defining which features ΔCPP is measured over.
  - region (str or list of int, optional) – Restrict the scan: None covers the full JMD-N + TMD + JMD-C span, a part name ('jmd_n', 'tmd', 'jmd_c') restricts the scan to that part, and a list restricts it to those 1-based positions.
  - to_aa (list of str, optional) – Substitution alphabet. If None, every canonical amino acid (except the wild-type residue) is tried at each position.
  - jmd_n_len (int, default=10) – Length of JMD-N in number of amino acids.
  - jmd_c_len (int, default=10) – Length of JMD-C in number of amino acids.
- Returns:
  df_scan – Tidy mutation landscape with columns entry, pos, from_aa, to_aa, mutation, region, delta_cpp, and shift_score, sorted by descending delta_cpp. When a model is bound to this SeqMut, the prediction-shift columns delta_pred (ΔP, in percentage points), wt_pred, and wt_pred_std are appended; this is the data behind the mutation-scan heatmap.
- Return type:
  pd.DataFrame, shape (n_mutations, 8), or (n_mutations, 11) when a model is bound
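The aggregation into delta_cpp and the descending sort of df_scan can be illustrated with a small plain-Python sketch. It is not the SeqMut implementation: it assumes ΔX is the per-feature difference between mutant and wild-type CPP feature values, and the feature vectors, the entry name "demo", the Mutation class, and the helpers l1_delta_cpp and rank_mutations are all hypothetical.

```python
# Hypothetical sketch of the delta_cpp aggregation and ranking;
# not the SeqMut implementation.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Mutation:
    entry: str
    pos: int
    from_aa: str
    to_aa: str
    delta_cpp: float

    @property
    def mutation(self) -> str:
        # Same label style as the df_scan "mutation" column, e.g. "A669P"
        return f"{self.from_aa}{self.pos}{self.to_aa}"


def l1_delta_cpp(x_wt: Sequence[float], x_mut: Sequence[float]) -> float:
    """L1 magnitude of the feature change: sum of |x_mut - x_wt| over features."""
    # Assumption: both vectors hold the same df_feat features in the same order
    if len(x_wt) != len(x_mut):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(m - w) for w, m in zip(x_wt, x_mut))


def rank_mutations(rows: List[Mutation]) -> List[Mutation]:
    """Sort mutations by descending delta_cpp."""
    return sorted(rows, key=lambda r: r.delta_cpp, reverse=True)


# Hypothetical CPP feature values for one wild type and two mutants
x_wt = [0.40, 0.10, 0.75]
x_a5p = [0.10, 0.30, 0.70]  # |ΔX| = 0.30 + 0.20 + 0.05 = 0.55
x_a5g = [0.35, 0.15, 0.75]  # |ΔX| = 0.05 + 0.05 + 0.00 = 0.10

rows = [
    Mutation("demo", 5, "A", "G", l1_delta_cpp(x_wt, x_a5g)),
    Mutation("demo", 5, "A", "P", l1_delta_cpp(x_wt, x_a5p)),
]
for r in rank_mutations(rows):
    print(r.mutation, round(r.delta_cpp, 2))
```

Because an L1 magnitude is never negative, the descending sort puts the mutations with the largest overall feature change first, regardless of the direction of the individual feature shifts.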
Examples
SeqMut measures the model-free change a mutation induces in a set of CPP features (ΔCPP). We first build a feature set with a small CPP run; SeqMut.scan then enumerates every TMD substitution and ranks the substitutions by delta_cpp (the L1 magnitude of the feature change).

```python
import aaanalysis as aa

aa.options["verbose"] = False

# Load a small sequence dataset and its class labels
df_seq = aa.load_dataset(name="DOM_GSEC", n=10)
labels = df_seq["label"].to_list()

# Split sequences into parts and build a small CPP feature set
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
split_kws = sf.get_split_kws()
cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, verbose=False)
df_feat = cpp.run(labels=labels, n_filter=25)

# Scan every TMD substitution and rank by delta_cpp
seqmut = aa.SeqMut()
df_scan = seqmut.scan(df_seq=df_seq, df_feat=df_feat, region="tmd")
aa.display_df(df_scan, n_rows=10, show_shape=True)
```
DataFrame shape: (8740, 8)
```text
    entry   pos  from_aa  to_aa  mutation  region  delta_cpp  shift_score
1   P16070  669  A        P      A669P     tmd     4.046420   -3.934420
2   P16070  665  A        P      A665P     tmd     3.975670   -3.863670
3   P09803  730  L        P      L730P     tmd     3.523250   -3.355250
4   Q03157  604  L        P      L604P     tmd     3.523250   -3.355250
5   P05556  748  L        P      L748P     tmd     3.523250   -3.355250
6   Q06481  713  L        P      L713P     tmd     3.523250   -3.355250
7   P05067  720  L        P      L720P     tmd     3.523250   -3.355250
8   P16070  669  A        G      A669G     tmd     3.422670   -3.422670
9   P70180  492  L        P      L492P     tmd     3.417250   -3.249250
10  P01135  114  L        P      L114P     tmd     3.417250   -3.249250
```

region can be a part name ('jmd_n', 'tmd', 'jmd_c'), None (the full JMD-N + TMD + JMD-C span), or a list of 1-based positions; to_aa restricts the substitution alphabet.
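How region, jmd_n_len, jmd_c_len, and to_aa could decide which mutations are enumerated is sketched below. This is a hypothetical illustration, not SeqMut code, and it rests on assumptions not stated in this reference: tmd_start and tmd_stop are 1-based inclusive positions, JMD-N and JMD-C are the jmd_n_len residues directly before and the jmd_c_len residues directly after the TMD, positions falling outside the sequence (or, for a position list, outside the JMD-N + TMD + JMD-C span) are skipped, and a residue is never substituted by itself. With to_aa=None this yields 19 substitutions per position. The helpers resolve_positions and enumerate_mutations and the toy sequence are made up.

```python
# Hypothetical sketch of how a single-position scan could be enumerated;
# not the SeqMut implementation.
CANONICAL_AA = "ACDEFGHIKLMNPQRSTVWY"


def resolve_positions(seq, tmd_start, tmd_stop, region=None,
                      jmd_n_len=10, jmd_c_len=10):
    """Map a region argument to a list of (1-based position, part name) pairs."""
    # Assumption: tmd_start/tmd_stop are 1-based and inclusive, JMD-N/JMD-C
    # flank the TMD directly, and positions outside the sequence are dropped.
    parts = {
        "jmd_n": range(tmd_start - jmd_n_len, tmd_start),
        "tmd": range(tmd_start, tmd_stop + 1),
        "jmd_c": range(tmd_stop + 1, tmd_stop + 1 + jmd_c_len),
    }
    span = [(pos, name) for name, rng in parts.items() for pos in rng
            if 1 <= pos <= len(seq)]
    if region is None:  # full JMD-N + TMD + JMD-C span
        return span
    if isinstance(region, str):  # a single part name
        if region not in parts:
            raise ValueError(f"unknown part name: {region!r}")
        return [(pos, name) for pos, name in span if name == region]
    # Assumption: listed positions outside the span are ignored
    wanted = set(region)
    return [(pos, name) for pos, name in span if pos in wanted]


def enumerate_mutations(seq, positions, to_aa=None):
    """Yield (pos, from_aa, to_aa, mutation, region) tuples."""
    alphabet = CANONICAL_AA if to_aa is None else to_aa
    for pos, region in positions:
        wt = seq[pos - 1]  # 1-based position -> 0-based string index
        for aa in alphabet:
            if aa != wt:  # assumption: never substitute a residue by itself
                yield pos, wt, aa, f"{wt}{pos}{aa}", region


# Toy example: TMD at positions 6-10 with 3-residue JMDs (hypothetical values)
seq = "MKTAYLLLAVGSPQ"
tmd = resolve_positions(seq, tmd_start=6, tmd_stop=10, region="tmd",
                        jmd_n_len=3, jmd_c_len=3)
muts = list(enumerate_mutations(seq, tmd))
print(len(tmd), len(muts))  # -> 5 95
print(muts[0])              # -> (6, 'L', 'A', 'L6A', 'tmd')
```

For this toy sequence, restricting to the TMD gives 5 positions and 95 mutations; passing to_aa=["P", "G"] instead reduces this to 2 substitutions per position, since none of the wild-type TMD residues is P or G.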