SequenceFeature.get_feature_names

static SequenceFeature.get_feature_names(features, df_cat=None, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10)

Convert feature ids (PART-SPLIT-SCALE) into feature names (scale name [positions]).

Replaces each compact PART-SPLIT-SCALE id produced by SequenceFeature.get_features() with a human-readable string: the full scale name from df_cat followed by the residue positions covered by the feature’s Split. This makes feature results easier to interpret in CPP output DataFrames.

Added in version 0.1.0.

Parameters:

features (array-like, shape (n_features,) or pd.DataFrame) – List of feature ids ('PART-SPLIT-SCALE'). Alternatively, a df_feat DataFrame, in which case its 'feature' column is used.
df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales from df_scales. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].
start (int, default=1) – Position label of first residue position (starting at N-terminus).
tmd_len (int, default=20) – Length of target middle domain (TMD) (>0).
jmd_n_len (int, default=10) – Length of JMD-N (>=0).
jmd_c_len (int, default=10) – Length of JMD-C (>=0).

Returns:

feat_names – Names of features.

Return type:

list of str

Notes

Length parameters (tmd_len, jmd_n_len, jmd_c_len) must match the ids in features.
Positions are given depending on the three split types:
- Segment: [first…last]
- Pattern: [all positions]
- PeriodicPattern: [first..step1/step2..last]
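
The position labels for Segment features in the TMD can be reproduced by hand. The following is a minimal sketch and not the library's implementation: tmd_segment_range is a hypothetical helper, and it assumes that the TMD is cut into n_split equal-length segments, so it only covers TMD lengths divisible by n_split.

```python
def tmd_segment_range(i_th, n_split, start=1, tmd_len=20, jmd_n_len=10):
    """Hypothetical helper: (first, last) position label of a TMD segment.

    Assumption: the TMD is divided into n_split segments of equal length.
    """
    if tmd_len % n_split != 0:
        raise ValueError("sketch only covers tmd_len divisible by n_split")
    seg_len = tmd_len // n_split
    # The TMD begins right after the JMD-N, counted from the start label
    tmd_first = start + jmd_n_len
    first = tmd_first + (i_th - 1) * seg_len
    last = first + seg_len - 1
    return first, last


print(tmd_segment_range(1, 1))               # Segment(1,1) with defaults
print(tmd_segment_range(2, 2))               # Segment(2,2) with defaults
print(tmd_segment_range(1, 1, start=20))     # shifted start label
print(tmd_segment_range(1, 1, tmd_len=100))  # longer TMD
```

Under these assumptions, the helper yields the same ranges as the Segment examples on this page, such as (11, 30) for Segment(1,1) with default parameters.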

Examples

To obtain feature names, we first retrieve feature ids using the SequenceFeature().get_features() method:

import pandas as pd
import aaanalysis as aa
sf = aa.SequenceFeature()
features = sf.get_features()
print(features[0:5])

['TMD-Segment(1,1)-ANDN920101', 'TMD-Segment(1,1)-ARGP820101', 'TMD-Segment(1,1)-ARGP820102', 'TMD-Segment(1,1)-ARGP820103', 'TMD-Segment(1,1)-BEGF750101']
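
Each id printed above follows the PART-SPLIT-SCALE layout. As a rough illustration, an id can be taken apart with plain string handling. split_feature_id is a hypothetical helper, not part of the package, and it assumes that neither the part nor the split contains a hyphen.

```python
def split_feature_id(feature_id):
    """Split a 'PART-SPLIT-SCALE' id into its three components (illustrative only)."""
    # maxsplit=2 keeps any further hyphens inside the scale id intact
    part, split, scale = feature_id.split("-", 2)
    return part, split, scale


print(split_feature_id("TMD-Segment(1,1)-ANDN920101"))
# ('TMD', 'Segment(1,1)', 'ANDN920101')
```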

A list of feature names can now be created using the SequenceFeature().get_feature_names() method:

feature_names = sf.get_feature_names(features=features)
print(feature_names[0:5])

['Backbone-dynamics (-CH) [11-30]', 'Hydrophobicity [11-30]', 'Amphiphilicity (α-helix) [11-30]', 'Buried [11-30]', 'α-helix [11-30]']

Instead of a list of feature ids, a df_feat DataFrame (e.g. from load_features() or CPP output) can be passed directly; its 'feature' column is used automatically:

df_feat = aa.load_features(name="DOM_GSEC")
feature_names = sf.get_feature_names(features=df_feat)
print(feature_names[0:5])

['Charge [31-35]', 'α-helix (C-cap) [31-35]', 'Side chain length [32,33]', 'Entropy [31-35]', 'Volume [32,33]']

The start position and the length of the sequence parts (tmd_len, jmd_n_len, and jmd_c_len) can be adjusted:

# Shift start position from 1 to 20
feature_names = sf.get_feature_names(features=features, start=20)
print(feature_names[0:5])

['Backbone-dynamics (-CH) [30-49]', 'Hydrophobicity [30-49]', 'Amphiphilicity (α-helix) [30-49]', 'Buried [30-49]', 'α-helix [30-49]']

# Change TMD length from 20 to 100
feature_names = sf.get_feature_names(features=features, tmd_len=100)
print(feature_names[0:5])

['Backbone-dynamics (-CH) [11-110]', 'Hydrophobicity [11-110]', 'Amphiphilicity (α-helix) [11-110]', 'Buried [11-110]', 'α-helix [11-110]']

If features with customized scales are used, provide a matching df_cat, which must contain scale_id, category, and subcategory columns:

# Create customized features and df_cat
features = ["TMD-Segment(1,1)-Scale1", "TMD-Segment(1,2)-Scale1", "TMD-Segment(2,2)-Scale1"]
cols = ["scale_id", "category", "subcategory", "scale_name"]
vals = ["Scale1", "Scale1_category", "Scale1_subcategory", "scale_name1"]
df_cat = pd.DataFrame([vals], columns=cols)
feature_names = sf.get_feature_names(features=features, df_cat=df_cat)
print(feature_names)

['Scale1_subcategory [11-30]', 'Scale1_subcategory [11-20]', 'Scale1_subcategory [21-30]']
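
The output above suggests that each name combines the subcategory of the scale with the position range. A minimal sketch of that composition follows, using a plain dict as a stand-in for df_cat. sketch_feature_name and subcategory_by_scale are hypothetical names, and the lookup by subcategory is an assumption read off the printed result, not a statement about the library internals.

```python
# Hypothetical stand-in for a df_cat holding one customized scale
subcategory_by_scale = {"Scale1": "Scale1_subcategory"}


def sketch_feature_name(scale_id, first, last, lookup):
    """Compose 'label [first-last]' for a Segment feature (illustrative only)."""
    return f"{lookup[scale_id]} [{first}-{last}]"


# Position ranges as printed above for Segment(1,1), Segment(1,2), Segment(2,2)
ranges = [(11, 30), (11, 20), (21, 30)]
names = [sketch_feature_name("Scale1", first, last, subcategory_by_scale)
         for first, last in ranges]
print(names)
```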

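The jmd_n_len and jmd_c_len parameters set the lengths of the JMD flanks and thereby determine the residue-position labels in each feature name. Passing their default values (10 and 10) explicitly gives the same names as the df_feat example above: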

df_feat = aa.load_features(name="DOM_GSEC")
feature_names = sf.get_feature_names(features=df_feat, jmd_n_len=10, jmd_c_len=10)
print(feature_names[0:5])

['Charge [31-35]', 'α-helix (C-cap) [31-35]', 'Side chain length [32,33]', 'Entropy [31-35]', 'Volume [32,33]']