SequenceFeature.get_feature_descriptions
- static SequenceFeature.get_feature_descriptions(features=None, df_cat=None, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10)
Build a standardized, human-readable description for each feature id (PART-SPLIT-SCALE).
Complements the compact SequenceFeature.get_feature_names() label ('scale name [positions]') with one self-contained sentence per feature that spells out all three id fields: the sequence part as a readable label, the split as a phrase (e.g., 'segment 2 of 4'), and the scale as its AAontology name together with the category and subcategory from df_cat. Terminology is drawn from fixed vocabularies (the part labels and the AAontology category/subcategory wording), so the output is deterministic and consistent across runs. The result can be assigned to a df_feat column ('feature_description') for readable CPP output without changing the 'feature' id string.

Added in version 1.1.0.
- Parameters:
features (array-like, shape (n_features,) or pd.DataFrame) – List of feature ids ('PART-SPLIT-SCALE'). Alternatively, a df_feat DataFrame, in which case its 'feature' column is used.
df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales from df_scales. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].
start (int, default=1) – Position label of first residue position (starting at N-terminus).
tmd_len (int, default=20) – Length of target middle domain (TMD) (>0).
jmd_n_len (int, default=10) – Length of JMD-N (>=0).
jmd_c_len (int, default=10) – Length of JMD-C (>=0).
- Returns:
feat_descriptions – Human-readable description for each feature, one per feature id.
- Return type:
Notes
Length parameters (tmd_len, jmd_n_len, jmd_c_len) must match the ids in features.
Part labels come from a fixed vocabulary; category and subcategory wording is taken verbatim from df_cat (the AAontology scale categories table).
See also
SequenceFeature.get_feature_names() for the compact label form.
Examples
The SequenceFeature().get_feature_descriptions() method turns each compact PART-SPLIT-SCALE feature id into one standardized, human-readable sentence. It complements the shorter labels from get_feature_names() by also spelling out the sequence region and the split, and by adding the AAontology scale name, category, and subcategory.

First, we retrieve feature ids using the SequenceFeature().get_features() method:

import pandas as pd
import aaanalysis as aa
aa.options["verbose"] = False

sf = aa.SequenceFeature()
features = sf.get_features()
print(features[0:3])

['TMD-Segment(1,1)-ANDN920101', 'TMD-Segment(1,1)-ARGP820101', 'TMD-Segment(1,1)-ARGP820102']
A readable description per feature id is then created with SequenceFeature().get_feature_descriptions():

feature_descriptions = sf.get_feature_descriptions(features=features)
for d in feature_descriptions[0:3]:
    print(d)

TMD, segment 1 of 1 (positions 11-30) — α-CH chemical shifts (backbone-dynamics) [Structure-Activity: Backbone-dynamics (-CH)]
TMD, segment 1 of 1 (positions 11-30) — Hydrophobicity [Polarity: Hydrophobicity]
TMD, segment 1 of 1 (positions 11-30) — Signal sequence helical potential [Polarity: Amphiphilicity (α-helix)]
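
The relationship between an id and its description can be made concrete with a small sketch. It is not the library's implementation: `split_feature_id` and `segment_phrase` are hypothetical helpers, and the sketch assumes that the PART and SPLIT fields never contain a hyphen, which holds for the ids shown on this page.

```python
import re


def split_feature_id(feature_id):
    """Split a 'PART-SPLIT-SCALE' id into its three fields.

    Hypothetical helper for illustration; assumes PART and SPLIT
    contain no '-' so that the first two hyphens are the separators.
    """
    part, split, scale = feature_id.split("-", 2)
    return part, split, scale


def segment_phrase(split):
    """Render 'Segment(i,n)' as 'segment i of n'; return None for other split types."""
    match = re.fullmatch(r"Segment\((\d+),(\d+)\)", split)
    if match is None:
        return None
    i, n = match.groups()
    return f"segment {i} of {n}"


part, split, scale = split_feature_id("TMD-Segment(1,1)-ANDN920101")
print(part, split, scale)     # TMD Segment(1,1) ANDN920101
print(segment_phrase(split))  # segment 1 of 1
```

The scale name, category, and subcategory in the real output are looked up in df_cat, which this sketch does not attempt.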
Instead of a list of feature ids, a df_feat DataFrame (e.g., from load_features() or CPP output) can be passed directly; its 'feature' column is used automatically:

df_feat = aa.load_features(name="DOM_GSEC")
feature_descriptions = sf.get_feature_descriptions(features=df_feat)
for d in feature_descriptions[0:3]:
    print(d)

TMD-C+JMD-C, segment 3 of 4 (positions 31-35) — Charge [Energy: Charge]
TMD-C+JMD-C, segment 3 of 4 (positions 31-35) — α-helix termination [Conformation: α-helix (C-cap)]
TMD-C+JMD-C, segment 6 of 9 (positions 32, 33) — Side chain length [Shape: Side chain length]
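
Accepting either a list of ids or a DataFrame amounts to a small input-normalization step. The sketch below shows one way such a step could look; `as_feature_list` is a hypothetical name, and this is an assumption about the general pattern, not a copy of what aaanalysis does internally.

```python
import pandas as pd


def as_feature_list(features):
    """Return feature ids as a plain list (hypothetical helper, not library code).

    A DataFrame contributes its 'feature' column; any other iterable
    is taken to hold the ids directly.
    """
    if isinstance(features, pd.DataFrame):
        if "feature" not in features.columns:
            raise ValueError("DataFrame input needs a 'feature' column")
        return features["feature"].tolist()
    return list(features)


ids = ["TMD-Segment(1,1)-ANDN920101", "TMD-Segment(1,1)-ARGP820101"]
df = pd.DataFrame({"feature": ids})

# Both input forms yield the same list of ids
print(as_feature_list(df) == as_feature_list(ids))  # True
```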
Because descriptions are additive (the 'feature' id is unchanged), they can be assigned to an optional 'feature_description' column for readable df_feat output:

df_feat = df_feat.head(5).copy()
df_feat["feature_description"] = sf.get_feature_descriptions(features=df_feat)
aa.display_df(df_feat[["feature", "feature_description"]], n_rows=5, show_shape=True)

DataFrame shape: (5, 2)
   feature                            feature_description
1  TMD_C_JMD_C-Seg...3,4)-KLEP840101  TMD-C+JMD-C, se...Energy: Charge]
2  TMD_C_JMD_C-Seg...3,4)-FINA910104  TMD-C+JMD-C, se...-helix (C-cap)]
3  TMD_C_JMD_C-Seg...6,9)-LEVM760105  TMD-C+JMD-C, se...e chain length]
4  TMD_C_JMD_C-Seg...3,4)-HUTJ700102  TMD-C+JMD-C, se...nergy: Entropy]
5  TMD_C_JMD_C-Seg...6,9)-RADA880106  TMD-C+JMD-C, se...Volume: Volume]

The start position and the lengths of the sequence parts (tmd_len, jmd_n_len, and jmd_c_len) can be adjusted; they must match the lengths used to build the feature ids so that the reported residue positions are correct:

# Shift the first residue position label from 1 to 20
feature_descriptions = sf.get_feature_descriptions(features=features, start=20)
print(feature_descriptions[0])

TMD, segment 1 of 1 (positions 30-49) — α-CH chemical shifts (backbone-dynamics) [Structure-Activity: Backbone-dynamics (-CH)]

# Change the TMD length from 20 to 100 (JMD lengths are passed explicitly at their default of 10)
feature_descriptions = sf.get_feature_descriptions(features=features, tmd_len=100, jmd_n_len=10, jmd_c_len=10)
print(feature_descriptions[0])

TMD, segment 1 of 1 (positions 11-110) — α-CH chemical shifts (backbone-dynamics) [Structure-Activity: Backbone-dynamics (-CH)]
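
The position ranges in the three TMD outputs above (11-30, 30-49, 11-110) follow from simple arithmetic, assuming JMD-N sits directly before the TMD. The sketch below reproduces only that arithmetic for the whole TMD; `tmd_position_range` is a hypothetical helper and does not cover other parts or how the library divides a part into several segments.

```python
def tmd_position_range(start=1, tmd_len=20, jmd_n_len=10):
    """First and last position label of the full TMD.

    Illustrative only: assumes the sequence is laid out as
    JMD-N, TMD, JMD-C and that `start` labels the first JMD-N residue.
    """
    first = start + jmd_n_len
    last = first + tmd_len - 1
    return first, last


print(tmd_position_range())             # (11, 30)
print(tmd_position_range(start=20))     # (30, 49)
print(tmd_position_range(tmd_len=100))  # (11, 110)
```

If the lengths passed here differ from those used when the feature ids were built, the labels shift in the same way, which is why the parameters must match.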
If features with customized scales are used, provide a matching df_cat containing the scale_id, category, subcategory, and scale_name columns:

features_custom = ["TMD-Segment(1,1)-Scale1", "JMD_N-Pattern(N,1,3)-Scale1"]
cols = ["scale_id", "category", "subcategory", "scale_name"]
vals = ["Scale1", "Scale1_category", "Scale1_subcategory", "scale_name1"]
df_cat = pd.DataFrame([vals], columns=cols)
feature_descriptions = sf.get_feature_descriptions(features=features_custom, df_cat=df_cat)
for d in feature_descriptions:
    print(d)

TMD, segment 1 of 1 (positions 11-30) — scale_name1 [Scale1_category: Scale1_subcategory]
JMD-N, pattern (from N-terminus) (positions 1, 3) — scale_name1 [Scale1_category: Scale1_subcategory]
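
Before passing a custom df_cat, it can help to confirm that it has the four columns named above and a row for every scale used in the feature ids. The following is a minimal sketch of such a check using only pandas; `missing_scales` and `REQUIRED_COLS` are hypothetical names, the second scale id `Scale2` is made up for the demonstration, and the check assumes the scale id is everything after the second hyphen.

```python
import pandas as pd

REQUIRED_COLS = ["scale_id", "category", "subcategory", "scale_name"]


def missing_scales(features, df_cat):
    """Return scale ids used in `features` that have no row in `df_cat`.

    Hypothetical pre-flight check, not part of aaanalysis.
    """
    absent_cols = [c for c in REQUIRED_COLS if c not in df_cat.columns]
    if absent_cols:
        raise ValueError(f"df_cat is missing columns: {absent_cols}")
    used = {f.split("-", 2)[2] for f in features}  # SCALE field of each id
    known = set(df_cat["scale_id"])
    return sorted(used - known)


# 'Scale2' is deliberately absent from df_cat to show a failing case
features_custom = ["TMD-Segment(1,1)-Scale1", "JMD_N-Pattern(N,1,3)-Scale2"]
df_cat = pd.DataFrame(
    [["Scale1", "Scale1_category", "Scale1_subcategory", "scale_name1"]],
    columns=REQUIRED_COLS,
)
print(missing_scales(features_custom, df_cat))  # ['Scale2']
```

An empty list means every scale in the ids has a row in df_cat.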