SequenceFeature.get_feature_descriptions

static SequenceFeature.get_feature_descriptions(features=None, df_cat=None, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10)

Build a standardized, human-readable description for each feature id (PART-SPLIT-SCALE).

Complements the compact label from SequenceFeature.get_feature_names() ('scale name [positions]') with one self-contained sentence per feature that spells out all three id fields: the sequence part as a readable label, the split as a phrase (e.g., 'segment 2 of 4'), and the scale as its AAontology name together with its category and subcategory from df_cat. Because part labels come from a fixed vocabulary and the category and subcategory wording comes from the AAontology table, the output is deterministic and consistent across runs. The result can be assigned to a df_feat column ('feature_description') for readable CPP output without changing the 'feature' id string.
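The id format described above can be illustrated with a minimal, stdlib-only sketch that splits a feature id into its three fields and renders the split as a phrase. The names `PART_LABELS`, `parse_feature_id`, and `describe_split` are hypothetical and not part of aaanalysis; the part labels and phrases are copied from the example outputs on this page, and only the Segment and Pattern split types shown there are handled.

```python
import re

# Hypothetical fixed vocabulary, limited to the part labels seen in this page's examples
PART_LABELS = {"TMD": "TMD", "JMD_N": "JMD-N", "TMD_C_JMD_C": "TMD-C+JMD-C"}


def parse_feature_id(feature: str) -> tuple[str, str, str]:
    """Split a 'PART-SPLIT-SCALE' id into its three fields."""
    # Parts use underscores and splits use commas, so '-' only separates the fields
    # (assumes the scale id itself contains no '-')
    part, split, scale = feature.split("-")
    return part, split, scale


def describe_split(split: str) -> str:
    """Render a split string such as 'Segment(3,4)' as a readable phrase."""
    seg = re.fullmatch(r"Segment\((\d+),(\d+)\)", split)
    if seg:
        i, n = seg.groups()
        return f"segment {i} of {n}"
    pat = re.fullmatch(r"Pattern\(([NC]),([\d,]+)\)", split)
    if pat:
        return f"pattern (from {pat.group(1)}-terminus)"
    # Other split types are not covered by this sketch
    return split


part, split, scale = parse_feature_id("TMD_C_JMD_C-Segment(3,4)-KLEP840101")
print(PART_LABELS[part], describe_split(split), scale)
# TMD-C+JMD-C segment 3 of 4 KLEP840101
```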

Added in version 1.1.0.

Parameters:
  • features (array-like, shape (n_features,) or pd.DataFrame) – List of feature ids ('PART-SPLIT-SCALE'). Alternatively, a df_feat DataFrame, in which case its 'feature' column is used.

  • df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales used in features. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].

  • start (int, default=1) – Position label of first residue position (starting at N-terminus).

  • tmd_len (int, default=20) – Length of target middle domain (TMD) (>0).

  • jmd_n_len (int, default=10) – Length of JMD-N (>=0).

  • jmd_c_len (int, default=10) – Length of JMD-C (>=0).
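How start and the three length parameters translate into the position labels of each part can be sketched as follows. This assumes the parts are laid out N- to C-terminally as JMD-N, TMD, JMD-C with consecutive position labels beginning at start; `part_ranges` is a hypothetical helper, not an aaanalysis function. Its TMD ranges agree with the positions printed in the examples below (11-30 by default, 30-49 with start=20, 11-110 with tmd_len=100).

```python
def part_ranges(start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10):
    """Return (first, last) position labels of JMD-N, TMD, and JMD-C.

    Assumed layout: JMD-N | TMD | JMD-C, labels counted from `start`.
    Parts of length 0 map to None.
    """
    if tmd_len <= 0 or jmd_n_len < 0 or jmd_c_len < 0:
        raise ValueError("tmd_len must be > 0 and JMD lengths must be >= 0")
    tmd_first = start + jmd_n_len
    tmd_last = tmd_first + tmd_len - 1
    return {
        "JMD-N": (start, tmd_first - 1) if jmd_n_len else None,
        "TMD": (tmd_first, tmd_last),
        "JMD-C": (tmd_last + 1, tmd_last + jmd_c_len) if jmd_c_len else None,
    }


print(part_ranges()["TMD"])             # (11, 30)
print(part_ranges(start=20)["TMD"])     # (30, 49)
print(part_ranges(tmd_len=100)["TMD"])  # (11, 110)
```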

Returns:

feat_descriptions – Human-readable description for each feature, one per feature id.

Return type:

list of str

Notes

  • Length parameters (tmd_len, jmd_n_len, jmd_c_len) must match the lengths used to create the feature ids in features.

  • Part labels come from a fixed vocabulary; category and subcategory wording is taken verbatim from df_cat (the AAontology scale categories table).
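Mismatched lengths cannot always be detected from the ids alone, but one consequence of the first note can be checked. The sketch below is a hypothetical consistency check (`split_fits_part` is illustrative, not the library's validation): it assumes a Segment(i,n) split needs at least n residues so that no segment is empty, and that Pattern positions, counted from one terminus, must lie inside the part. For example, 'JMD_N-Pattern(N,1,3)-Scale1' would then require jmd_n_len >= 3.

```python
import re


def split_fits_part(split: str, part_len: int) -> bool:
    """Check whether a Segment or Pattern split can be placed on a part of part_len residues."""
    seg = re.fullmatch(r"Segment\((\d+),(\d+)\)", split)
    if seg:
        i, n = map(int, seg.groups())
        # Assumption: each of the n segments needs at least one residue
        return 1 <= i <= n <= part_len
    pat = re.fullmatch(r"Pattern\([NC],([\d,]+)\)", split)
    if pat:
        positions = [int(p) for p in pat.group(1).split(",")]
        # Pattern positions are counted from one terminus and must lie inside the part
        return all(1 <= p <= part_len for p in positions)
    raise ValueError(f"Unsupported split type: {split}")


print(split_fits_part("Pattern(N,1,3)", part_len=10))  # True
print(split_fits_part("Pattern(N,1,3)", part_len=2))   # False
print(split_fits_part("Segment(6,9)", part_len=5))     # False
```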

Examples

The SequenceFeature().get_feature_descriptions() method turns each compact PART-SPLIT-SCALE feature id into one standardized, human-readable sentence. It complements the shorter labels from get_feature_names() by also spelling out the sequence region and the split, and by adding the AAontology scale name, category, and subcategory.

First, we retrieve feature ids using the SequenceFeature().get_features() method:

import pandas as pd
import aaanalysis as aa
aa.options["verbose"] = False
sf = aa.SequenceFeature()
features = sf.get_features()
print(features[0:3])
['TMD-Segment(1,1)-ANDN920101', 'TMD-Segment(1,1)-ARGP820101', 'TMD-Segment(1,1)-ARGP820102']

A readable description per feature id is then created with SequenceFeature().get_feature_descriptions():

feature_descriptions = sf.get_feature_descriptions(features=features)
for d in feature_descriptions[0:3]:
    print(d)
TMD, segment 1 of 1 (positions 11-30) — α-CH chemical shifts (backbone-dynamics) [Structure-Activity: Backbone-dynamics (-CH)]
TMD, segment 1 of 1 (positions 11-30) — Hydrophobicity [Polarity: Hydrophobicity]
TMD, segment 1 of 1 (positions 11-30) — Signal sequence helical potential [Polarity: Amphiphilicity (α-helix)]

Instead of a list of feature ids, a df_feat DataFrame (e.g., from aa.load_features() or CPP output) can be passed directly; its 'feature' column is then used automatically:

df_feat = aa.load_features(name="DOM_GSEC")
feature_descriptions = sf.get_feature_descriptions(features=df_feat)
for d in feature_descriptions[0:3]:
    print(d)
TMD-C+JMD-C, segment 3 of 4 (positions 31-35) — Charge [Energy: Charge]
TMD-C+JMD-C, segment 3 of 4 (positions 31-35) — α-helix termination [Conformation: α-helix (C-cap)]
TMD-C+JMD-C, segment 6 of 9 (positions 32, 33) — Side chain length [Shape: Side chain length]
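The positions in these outputs can be reproduced with a short arithmetic sketch. It rests on two assumptions not stated on this page: that TMD-C+JMD-C covers positions 21-40 with the default lengths (the C-terminal half of the TMD followed by JMD-C), and that segment i of n on a part of length L spans indices floor((i-1)·L/n) up to, but excluding, floor(i·L/n). Both reproduce the printed ranges but may differ from the library's actual implementation; `segment_positions` and `format_positions` are hypothetical helpers.

```python
def segment_positions(first_pos: int, part_len: int, i: int, n: int) -> list[int]:
    """Position labels of segment i of n on a part whose first label is first_pos.

    Assumed rule: segment bounds at floor((i-1)*part_len/n) and floor(i*part_len/n).
    """
    begin = (i - 1) * part_len // n
    end = i * part_len // n
    return list(range(first_pos + begin, first_pos + end))


def format_positions(positions: list[int]) -> str:
    """Render a consecutive list of 3+ positions as 'a-b', anything else comma-separated."""
    if len(positions) >= 3 and positions == list(range(positions[0], positions[-1] + 1)):
        return f"{positions[0]}-{positions[-1]}"
    return ", ".join(str(p) for p in positions)


# Assumed TMD-C+JMD-C span with default lengths: positions 21-40 (20 residues)
print(format_positions(segment_positions(21, 20, 3, 4)))  # 31-35
print(format_positions(segment_positions(21, 20, 6, 9)))  # 32, 33
# Whole TMD (positions 11-30) as a single segment
print(format_positions(segment_positions(11, 20, 1, 1)))  # 11-30
```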

Because the descriptions are added alongside the ids (the 'feature' id itself is unchanged), they can be stored in an optional 'feature_description' column for readable df_feat output:

df_feat = df_feat.head(5).copy()
df_feat["feature_description"] = sf.get_feature_descriptions(features=df_feat)
aa.display_df(df_feat[["feature", "feature_description"]], n_rows=5, show_shape=True)
DataFrame shape: (5, 2)
  feature feature_description
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 TMD-C+JMD-C, se...Energy: Charge]
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 TMD-C+JMD-C, se...-helix (C-cap)]
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 TMD-C+JMD-C, se...e chain length]
4 TMD_C_JMD_C-Seg...3,4)-HUTJ700102 TMD-C+JMD-C, se...nergy: Entropy]
5 TMD_C_JMD_C-Seg...6,9)-RADA880106 TMD-C+JMD-C, se...Volume: Volume]

The start position and the lengths of the sequence parts (tmd_len, jmd_n_len, and jmd_c_len) can be adjusted; they must match the lengths used to build the feature ids so that the reported residue positions are correct:

# Shift the first residue position label from 1 to 20
feature_descriptions = sf.get_feature_descriptions(features=features, start=20)
print(feature_descriptions[0])
TMD, segment 1 of 1 (positions 30-49) — α-CH chemical shifts (backbone-dynamics) [Structure-Activity: Backbone-dynamics (-CH)]
# Change TMD length from 20 to 100 (JMD lengths kept at their defaults of 10)
feature_descriptions = sf.get_feature_descriptions(features=features, tmd_len=100,
                                                   jmd_n_len=10, jmd_c_len=10)
print(feature_descriptions[0])
TMD, segment 1 of 1 (positions 11-110) — α-CH chemical shifts (backbone-dynamics) [Structure-Activity: Backbone-dynamics (-CH)]

If features with customized scales are used, provide a matching df_cat containing the scale_id, category, subcategory, and scale_name columns:

features_custom = ["TMD-Segment(1,1)-Scale1", "JMD_N-Pattern(N,1,3)-Scale1"]
cols = ["scale_id", "category", "subcategory", "scale_name"]
vals = ["Scale1", "Scale1_category", "Scale1_subcategory", "scale_name1"]
df_cat = pd.DataFrame([vals], columns=cols)
feature_descriptions = sf.get_feature_descriptions(features=features_custom, df_cat=df_cat)
for d in feature_descriptions:
    print(d)
TMD, segment 1 of 1 (positions 11-30) — scale_name1 [Scale1_category: Scale1_subcategory]
JMD-N, pattern (from N-terminus) (positions 1, 3) — scale_name1 [Scale1_category: Scale1_subcategory]
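How the scale fields of df_cat end up in each sentence can be sketched in plain Python. The rows below mirror the custom df_cat from the example (as dicts rather than a DataFrame, to keep the sketch dependency-free), and the sentence layout is copied from the printed outputs. `describe_feature` is a hypothetical helper, and raising a KeyError for a scale missing from df_cat is this sketch's choice, not documented library behavior.

```python
def describe_feature(part_label, split_phrase, positions, scale_id, cat_rows):
    """Assemble one description sentence from its pieces and df_cat-like rows."""
    # Index the rows (dicts using df_cat's column names) by scale_id for lookup
    by_id = {row["scale_id"]: row for row in cat_rows}
    if scale_id not in by_id:
        raise KeyError(f"Scale '{scale_id}' is missing from df_cat")
    row = by_id[scale_id]
    scale_info = f"{row['scale_name']} [{row['category']}: {row['subcategory']}]"
    # '\u2014' reproduces the dash separator seen in the printed outputs
    return f"{part_label}, {split_phrase} (positions {positions}) \u2014 {scale_info}"


cat_rows = [{"scale_id": "Scale1", "category": "Scale1_category",
             "subcategory": "Scale1_subcategory", "scale_name": "scale_name1"}]
print(describe_feature("TMD", "segment 1 of 1", "11-30", "Scale1", cat_rows))
```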