SequenceFeature.get_features

SequenceFeature.get_features(list_parts=None, all_parts=False, split_kws=None, list_scales=None)

Create list of all feature ids for given Parts, Splits, and Scales.

Enumerates every combination of the requested sequence parts, split types (Segment, Pattern, PeriodicPattern from SequenceFeature.get_split_kws()), and scale names, returning structured PART-SPLIT-SCALE feature ids. These ids can be passed directly to SequenceFeature.feature_matrix() or used to pre-select a feature space before calling CPP.run().
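The enumeration can be pictured as a Cartesian product. The following is a minimal pure-Python sketch, not the library's implementation: build_feature_ids is a hypothetical helper, the split labels are passed in as ready-made strings, and both the upper-casing of part names and the ordering (scales vary fastest) are inferred from the example output in the Examples section.

```python
from itertools import product


def build_feature_ids(list_parts, list_splits, list_scales):
    """Hypothetical helper: join every part, split, and scale as PART-SPLIT-SCALE."""
    # product() varies the last iterable fastest, so scales change first,
    # then splits, then parts.
    return [
        f"{part.upper()}-{split}-{scale}"
        for part, split, scale in product(list_parts, list_splits, list_scales)
    ]


feature_ids = build_feature_ids(
    list_parts=["tmd", "jmd_n_tmd_n"],
    list_splits=["Segment(1,2)", "Segment(2,2)"],
    list_scales=["scale_1", "scale_2"],
)
print(len(feature_ids))  # 2 parts x 2 splits x 2 scales = 8
print(feature_ids[:2])
```

The number of ids is the product of the three list lengths, which is why the default settings produce several hundred thousand features.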

Added in version 0.1.0.

Parameters:

list_parts (list of str, default=["tmd", "jmd_n_tmd_n", "tmd_c_jmd_c"]) – Names of the sequence parts for which features are created (e.g., ‘tmd’). Must contain at least one name.
all_parts (bool, default=False) – Whether to create features for all possible sequence parts (if True) or only for the parts given by list_parts.
split_kws (dict, optional) – Dictionary mapping each chosen split type to its parameter dictionary. Defaults to the output of SequenceFeature.get_split_kws().
list_scales (list of str, optional) – Names of the scales. Defaults to the scales returned by load_scales() with name='scales'.

Returns:

features – Ids of all possible features for the given combinations of Parts, Splits, and Scales, in the form PART-SPLIT-SCALE.

Return type:

list of str
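Because each id is a plain string, it can be taken apart again with ordinary string handling. The sketch below uses a hypothetical helper, parse_feature_id, and assumes that neither the part nor the split field contains a hyphen, which holds for the ids shown in the Examples but is not guaranteed by this page.

```python
def parse_feature_id(feature_id):
    """Hypothetical helper: split 'PART-SPLIT-SCALE' into its three fields."""
    # maxsplit=2 keeps any further hyphens inside the scale field.
    part, split, scale = feature_id.split("-", 2)
    return part, split, scale


part, split, scale = parse_feature_id("TMD-Segment(1,1)-ANDN920101")
print(part, split, scale)
```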

Notes

If ext_len in aaanalysis.options is not set to a value > 0, the following parts, which contain the extended TMD, are not considered when all_parts=True: [‘tmd_e’, ‘ext_c’, ‘ext_n’, ‘ext_n_tmd_n’, ‘tmd_c_ext_c’].

Examples

By default, the SequenceFeature().get_features() method creates all features for the default Parts, Splits, and Scales:

import aaanalysis as aa
sf = aa.SequenceFeature()
features = sf.get_features()
print(f"{len(features)} features were created, such as:")
print(features[0:5])

580140 features were created, such as:
['TMD-Segment(1,1)-ANDN920101', 'TMD-Segment(1,1)-ARGP820101', 'TMD-Segment(1,1)-ARGP820102', 'TMD-Segment(1,1)-ARGP820103', 'TMD-Segment(1,1)-BEGF750101']

Besides the default parts, the default splits can be retrieved with the SequenceFeature().get_split_kws() method and the default scales with the load_scales() function:

split_kws = sf.get_split_kws()
list_scales = list(aa.load_scales())
list_parts = ["tmd", "jmd_n_tmd_n", "tmd_c_jmd_c"]
features = sf.get_features(list_parts=list_parts, split_kws=split_kws, list_scales=list_scales)
n_parts = len(list_parts)
n_scales = len(list_scales)
n_splits = len(features) // (n_parts * n_scales)
print(f"{n_parts} parts x {n_splits} splits x {n_scales} scales = {len(features)} features")

3 parts x 330 splits x 586 scales = 580140 features

To obtain features for all Parts, set all_parts=True:

features = sf.get_features(all_parts=True)
print(f"{len(features)} features were created")

1547040 features were created
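As a quick sanity check, the count above can be divided by the number of default splits and scales reported earlier (330 and 586) to recover how many parts were enumerated. This is plain arithmetic on the printed values, not a call into the library, and the result applies only to the run shown here.

```python
n_features_all = 1547040  # count printed for all_parts=True
n_splits = 330            # default number of splits reported above
n_scales = 586            # default number of scales reported above

# divmod returns the whole number of parts and any leftover features.
n_parts_all, remainder = divmod(n_features_all, n_splits * n_scales)
print(f"{n_parts_all} parts (remainder {remainder})")
```

A remainder of zero confirms that the total is an exact multiple of splits times scales, as expected for a full enumeration.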

Parts and Scales can be easily changed by adjusting their respective lists. To change Splits, you can create a new split_kws:

split_kws = sf.get_split_kws(split_types=["Segment"], n_split_min=5, n_split_max=5)
features = sf.get_features(list_parts=["tmd"], list_scales=["scale_1"], split_kws=split_kws)
print(f"{len(features)} features were created: ")
print(features)

5 features were created:
['TMD-Segment(1,5)-scale_1', 'TMD-Segment(2,5)-scale_1', 'TMD-Segment(3,5)-scale_1', 'TMD-Segment(4,5)-scale_1', 'TMD-Segment(5,5)-scale_1']
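The Segment labels in this output follow a simple pattern: Segment(i,n) names the i-th of n segments. The sketch below reproduces that pattern with a hypothetical helper, segment_labels. It is inferred from the printed ids rather than taken from the library, and the ordering across several values of n is an assumption.

```python
def segment_labels(n_split_min, n_split_max):
    """Hypothetical helper: label segment i of n as 'Segment(i,n)'."""
    return [
        f"Segment({i},{n})"
        for n in range(n_split_min, n_split_max + 1)  # number of segments
        for i in range(1, n + 1)                      # 1-based segment index
    ]


labels = segment_labels(n_split_min=5, n_split_max=5)
print([f"TMD-{label}-scale_1" for label in labels])
```

With n_split_min equal to n_split_max, exactly n labels are produced, matching the five features above. Widening the range adds one group of labels per additional value of n.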