aaanalysis.SequenceFeature.get_feature_names
- static SequenceFeature.get_feature_names(features=None, df_cat=None, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10)
Convert feature ids (PART-SPLIT-SCALE) into feature names (scale name [positions]).
- Parameters:
features (array-like, shape (n_features,)) – List of feature ids (>0).
df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales from df_scales. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].
start (int, default=1) – Position label of first residue position (starting at N-terminus).
tmd_len (int, default=20) – Length of TMD (>0).
jmd_n_len (int, default=10) – Length of JMD-N (>=0).
jmd_c_len (int, default=10) – Length of JMD-C (>=0).
- Returns:
feat_names – Names of features.
- Return type:
Notes
Length parameters (tmd_len, jmd_n_len, jmd_c_len) must match the ids in features.
Positions are given depending on the three split types:
Segment: [first…last]
Pattern: [all positions]
PeriodicPattern: [first..step1/step2..last]
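The Segment position labels printed in the examples on this page can be reproduced with a little arithmetic. The following is a minimal sketch, not the library's implementation: `tmd_segment_positions` is a hypothetical helper, it covers only the TMD part, it assumes `Segment(i,n)` denotes the i-th of n equally sized segments, and it assumes `tmd_len` is divisible by `n`.

```python
def tmd_segment_positions(i_th, n_split, start=1, tmd_len=20, jmd_n_len=10):
    """Return (first, last) position labels of the i-th of n_split TMD segments.

    Hypothetical helper for illustration only; assumes tmd_len is
    evenly divisible by n_split.
    """
    if tmd_len % n_split != 0:
        raise ValueError("this sketch only handles evenly divisible TMD lengths")
    seg_len = tmd_len // n_split
    tmd_first = start + jmd_n_len            # the TMD begins right after JMD-N
    first = tmd_first + (i_th - 1) * seg_len
    last = first + seg_len - 1
    return first, last


print(tmd_segment_positions(1, 1))               # (11, 30)
print(tmd_segment_positions(1, 2))               # (11, 20)
print(tmd_segment_positions(2, 2))               # (21, 30)
print(tmd_segment_positions(1, 1, start=20))     # (30, 49)
print(tmd_segment_positions(1, 1, tmd_len=100))  # (11, 110)
```

Under these assumptions, changing `start` shifts every label by the same offset, while changing `tmd_len` moves only the last position of a segment that begins at the start of the TMD.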
Examples
To obtain feature names, we first retrieve feature ids using the SequenceFeature().get_features() method:
    import pandas as pd
    import aaanalysis as aa
    sf = aa.SequenceFeature()
    features = sf.get_features()
    print(features[0:5])
['TMD-Segment(1,1)-ANDN920101', 'TMD-Segment(1,1)-ARGP820101', 'TMD-Segment(1,1)-ARGP820102', 'TMD-Segment(1,1)-ARGP820103', 'TMD-Segment(1,1)-BEGF750101']
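Each id follows the PART-SPLIT-SCALE layout, so it can be taken apart with plain string handling. The sketch below is an illustration rather than part of the aaanalysis API: `split_feature_id` is a hypothetical helper, and it assumes that neither the part nor the split contains a hyphen (any hyphens in the scale id are kept intact).

```python
def split_feature_id(feature_id):
    """Split a 'PART-SPLIT-SCALE' feature id into its three components.

    Hypothetical helper; splits on the first two hyphens only, so a
    scale id that itself contains hyphens stays in one piece.
    """
    part, split, scale = feature_id.split("-", 2)
    return part, split, scale


print(split_feature_id("TMD-Segment(1,1)-ANDN920101"))
# ('TMD', 'Segment(1,1)', 'ANDN920101')
```

An id with fewer than three components raises a `ValueError` during unpacking, which is a convenient early check for malformed ids.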
A list of feature names can now be created using the SequenceFeature().get_feature_names() method:
    feature_names = sf.get_feature_names(features=features)
    print(feature_names[0:5])
['Backbone-dynamics (-CH) [11-30]', 'Hydrophobicity [11-30]', 'Amphiphilicity (α-helix) [11-30]', 'Buried [11-30]', 'α-helix [11-30]']
The start position and the length of the sequence parts (tmd_len, jmd_n_len, and jmd_c_len) can be adjusted:
    # Shift start position from 1 to 20
    feature_names = sf.get_feature_names(features=features, start=20)
    print(feature_names[0:5])
['Backbone-dynamics (-CH) [30-49]', 'Hydrophobicity [30-49]', 'Amphiphilicity (α-helix) [30-49]', 'Buried [30-49]', 'α-helix [30-49]']
    # Change TMD length from 20 to 100
    feature_names = sf.get_feature_names(features=features, tmd_len=100)
    print(feature_names[0:5])
['Backbone-dynamics (-CH) [11-110]', 'Hydrophobicity [11-110]', 'Amphiphilicity (α-helix) [11-110]', 'Buried [11-110]', 'α-helix [11-110]']
If features with customized scales are used, provide a matching df_cat, which must comprise a scale_id, category, and subcategory column:
    # Create customized features and df_cat
    features = ["TMD-Segment(1,1)-Scale1", "TMD-Segment(1,2)-Scale1", "TMD-Segment(2,2)-Scale1"]
    cols = ["scale_id", "category", "subcategory", "scale_name"]
    vals = ["Scale1", "Scale1_category", "Scale1_subcategory", "scale_name1"]
    df_cat = pd.DataFrame([vals], columns=cols)
    feature_names = sf.get_feature_names(features=features, df_cat=df_cat)
    print(feature_names)
['Scale1_subcategory [11-30]', 'Scale1_subcategory [11-20]', 'Scale1_subcategory [21-30]']
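Before passing a customized df_cat, it can help to confirm that every scale referenced by the feature ids has a matching scale_id entry. The following is a minimal sketch under stated assumptions: `missing_scales` is a hypothetical helper (not an aaanalysis function), and it assumes the scale is everything after the second hyphen of a PART-SPLIT-SCALE id.

```python
def missing_scales(features, scale_ids):
    """Return the sorted scale ids used in `features` but absent from `scale_ids`.

    Hypothetical helper; `scale_ids` can be any iterable of strings,
    for example the scale_id column of a df_cat DataFrame.
    """
    needed = {feature_id.split("-", 2)[2] for feature_id in features}
    return sorted(needed - set(scale_ids))


features = ["TMD-Segment(1,1)-Scale1", "TMD-Segment(1,2)-Scale1", "TMD-Segment(2,2)-Scale1"]
print(missing_scales(features, ["Scale1"]))  # []
print(missing_scales(features, ["Scale2"]))  # ['Scale1']
```

With the DataFrame from the example above, the call would be `missing_scales(features, df_cat["scale_id"])`; an empty list means all scales are covered.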