SeqMut.mutate
- SeqMut.mutate(df_seq=None, mutations=None, df_feat=None, jmd_n_len=10, jmd_c_len=10)
Apply specific point mutations to sequences and (optionally) measure their ΔCPP.
Each row of mutations edits one residue of its entry's sequence. The mutated sequence and a human-readable mutation label are always returned; the feature-space change is added when df_feat is supplied.
- Parameters:
df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers, in the position-based format (sequence, tmd_start, tmd_stop). See SequenceFeature.get_df_parts() for the full df_seq format specification.
mutations (pd.DataFrame, shape (n_mutations, >=3)) – Tidy mutation table with the columns entry, pos (1-based position in the full sequence), and to_aa (target amino acid). from_aa is derived and checked.
df_feat (pd.DataFrame, optional) – CPP feature set (output of CPP.run()). If given, the per-mutation ΔCPP (delta_cpp) and the shift_score toward the test-class profile are added.
jmd_n_len (int, default=10) – Length of JMD-N in number of amino acids.
jmd_c_len (int, default=10) – Length of JMD-C in number of amino acids.
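The jmd_n_len and jmd_c_len parameters determine which residues around the TMD count as JMD-N and JMD-C. The sketch below shows one way to place a 1-based mutation position into a part. It assumes that tmd_start and tmd_stop are 1-based and inclusive and that both JMDs directly flank the TMD; the helper locate_part is hypothetical and not part of the AAanalysis API.

```python
from typing import Optional


def locate_part(pos: int, tmd_start: int, tmd_stop: int,
                jmd_n_len: int = 10, jmd_c_len: int = 10) -> Optional[str]:
    """Hypothetical helper: return 'jmd_n', 'tmd', 'jmd_c', or None.

    Assumes 1-based, inclusive TMD bounds and JMDs that directly flank
    the TMD (jmd_n_len residues before it, jmd_c_len residues after it).
    """
    if tmd_start <= pos <= tmd_stop:
        return "tmd"
    if tmd_start - jmd_n_len <= pos < tmd_start:
        return "jmd_n"
    if tmd_stop < pos <= tmd_stop + jmd_c_len:
        return "jmd_c"
    return None  # outside all three parts


# Hypothetical TMD spanning positions 37-59, default JMD lengths.
for p in (26, 27, 36, 37, 59, 60, 69, 70):
    print(p, locate_part(p, tmd_start=37, tmd_stop=59))
```

With these assumed bounds, positions 27 to 36 fall into JMD-N and 60 to 69 into JMD-C, while 26 and 70 lie outside every part.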
- Returns:
df_mut – The mutations table augmented with from_aa, mutation ("<from><pos><to>"), sequence_mut (the mutated sequence), and, when df_feat is given, delta_cpp and shift_score.
- Return type:
pd.DataFrame, shape (n_mutations, n_info)
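To make the returned columns concrete, here is a minimal pandas sketch of how from_aa, the "<from><pos><to>" label, and sequence_mut can be derived from a 1-based pos. It is an illustration, not the SeqMut implementation: the helper apply_point_mutations, the check against the 20 canonical amino acids, and reading "checked" as comparing an optional user-supplied from_aa with the sequence are all assumptions. It does not compute delta_cpp or shift_score.

```python
import pandas as pd

# Assumption: only the 20 canonical one-letter amino-acid codes are accepted.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")


def apply_point_mutations(df_seq: pd.DataFrame, mutations: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: add from_aa, mutation and sequence_mut columns.

    pos is 1-based. If mutations already has a from_aa column, it is checked
    against the sequence (an assumed reading of "derived and checked").
    """
    seqs = df_seq.set_index("entry")["sequence"]
    has_from_aa = "from_aa" in mutations.columns
    rows = []
    for row in mutations.itertuples(index=False):
        seq = seqs[row.entry]
        if not 1 <= row.pos <= len(seq):
            raise ValueError(f"pos {row.pos} is outside {row.entry} (length {len(seq)})")
        if row.to_aa not in CANONICAL_AA:
            raise ValueError(f"to_aa {row.to_aa!r} is not a canonical amino acid")
        from_aa = seq[row.pos - 1]  # convert the 1-based pos to a 0-based index
        if has_from_aa and row.from_aa != from_aa:
            raise ValueError(f"{row.entry}: expected {row.from_aa} at {row.pos}, found {from_aa}")
        rows.append({
            "from_aa": from_aa,
            "mutation": f"{from_aa}{row.pos}{row.to_aa}",
            "sequence_mut": seq[:row.pos - 1] + row.to_aa + seq[row.pos:],
        })
    # Drop any user-supplied from_aa so the derived column is not duplicated.
    base = mutations.drop(columns="from_aa", errors="ignore").reset_index(drop=True)
    return pd.concat([base, pd.DataFrame(rows)], axis=1)


# Toy input (hypothetical sequence, not taken from a dataset).
df_seq = pd.DataFrame({"entry": ["P1"], "sequence": ["MKTLLVA"]})
mutations = pd.DataFrame({"entry": "P1", "pos": [2, 5], "to_aa": "P"})
out = apply_point_mutations(df_seq, mutations)
print(out)
```

The resulting column order (entry, pos, to_aa, from_aa, mutation, sequence_mut) mirrors the first six columns of the documented output.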
Examples
SeqMut.mutate applies specific point mutations from a tidy mutations table (entry, pos, to_aa) and, when df_feat is given, reports each mutation's ΔCPP and its shift toward the test-class profile.

import aaanalysis as aa
import pandas as pd

aa.options["verbose"] = False

# Load the DOM_GSEC example dataset and compute a CPP feature set (df_feat).
df_seq = aa.load_dataset(name="DOM_GSEC", n=10)
labels = df_seq["label"].to_list()
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
split_kws = sf.get_split_kws()
cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, verbose=False)
df_feat = cpp.run(labels=labels, n_filter=25)

# A proline scan across the first 12 TMD positions of one protein.
entry = df_seq["entry"].iloc[0]
start = int(df_seq.set_index("entry").loc[entry, "tmd_start"])
mutations = pd.DataFrame({"entry": entry, "pos": range(start, start + 12), "to_aa": "P"})

seqmut = aa.SeqMut()
df_mut = seqmut.mutate(df_seq=df_seq, mutations=mutations, df_feat=df_feat)
aa.display_df(df_mut, n_rows=10, show_shape=True)

CPP using the Python kernel fallback — the compiled Cython extension is not available in this install. Output is bit-exact with the Cython path but ~2x slower. Reinstall via pip install --force-reinstall aaanalysis to fetch a prebuilt wheel.
DataFrame shape: (12, 8)

    entry   pos  to_aa  from_aa  mutation  sequence_mut                       delta_cpp  shift_score
1   Q14802  37   P      L        L37P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.167000    0.167000
2   Q14802  38   P      Q        Q38P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.000000    0.000000
3   Q14802  39   P      V        V39P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.160000   -0.160000
4   Q14802  40   P      G        G40P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.000000    0.000000
5   Q14802  41   P      G        G41P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.181000    0.181000
6   Q14802  42   P      L        L42P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.000000    0.000000
7   Q14802  43   P      I        I43P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.179750   -0.179750
8   Q14802  44   P      C        C44P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.248320    0.248320
9   Q14802  45   P      A        A45P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.000000    0.000000
10  Q14802  46   P      G        G46P      MQKVTLGLLVFLAGF...PGETPPLITPGSAQS  0.000000    0.000000
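One way to read this output is to drop mutations whose delta_cpp is zero and rank the rest by shift_score. The sketch below rebuilds the ten displayed rows of the example output by hand and does this with plain pandas. It assumes, based on the parameter description, that a positive shift_score means a move toward the test-class profile and a negative one a move away from it; that interpretation is not confirmed here.

```python
import pandas as pd

# Values copied from the ten rows shown in the example output.
df_mut = pd.DataFrame({
    "mutation": ["L37P", "Q38P", "V39P", "G40P", "G41P",
                 "L42P", "I43P", "C44P", "A45P", "G46P"],
    "delta_cpp": [0.167, 0.0, 0.160, 0.0, 0.181,
                  0.0, 0.17975, 0.24832, 0.0, 0.0],
    "shift_score": [0.167, 0.0, -0.160, 0.0, 0.181,
                    0.0, -0.17975, 0.24832, 0.0, 0.0],
})

# Keep mutations that change the CPP profile at all (nonzero delta_cpp).
changed = df_mut[df_mut["delta_cpp"] != 0]

# Assumption: higher shift_score = stronger shift toward the test-class profile.
ranked = changed.sort_values("shift_score", ascending=False)
print(ranked[["mutation", "delta_cpp", "shift_score"]].to_string(index=False))
```

Here C44P ranks first, while V39P and I43P change the profile by a similar magnitude but in the opposite direction.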