SequenceFeature.get_feature_positions

static SequenceFeature.get_feature_positions(features, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10, tmd_seq=None, jmd_n_seq=None, jmd_c_seq=None)

Create a list of the residue positions or amino acids that correspond to each feature.

Resolves each PART-SPLIT-SCALE feature id produced by SequenceFeature.get_features() to the concrete residue positions it covers, using the supplied domain lengths. When sequence strings (tmd_seq, jmd_n_seq, jmd_c_seq) are also provided the method returns the actual amino acid segments or patterns instead of position numbers, which is useful for inspecting CPP feature results on a specific protein.
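
The structure of a PART-SPLIT-SCALE id can be illustrated with a small parser. This is only a sketch: parse_feature_id and the FEATURE_ID pattern are hypothetical names introduced here, not part of the aaanalysis API, and the pattern assumes that neither the part name nor the scale id contains a hyphen.

```python
import re

# Hypothetical pattern for illustration; assumes no hyphen inside PART or SCALE.
FEATURE_ID = re.compile(r"^(?P<part>[^-]+)-(?P<split>\w+\([^)]*\))-(?P<scale>[^-]+)$")


def parse_feature_id(feature_id):
    """Split a 'PART-SPLIT-SCALE' id into part, split type, split arguments, and scale."""
    match = FEATURE_ID.match(feature_id)
    if match is None:
        raise ValueError(f"Not a PART-SPLIT-SCALE id: {feature_id!r}")
    # 'Segment(1,10)' -> 'Segment' and '1,10'
    split_type, raw_args = match["split"].rstrip(")").split("(", 1)
    # Arguments are kept as strings because not every split type uses integers only.
    return match["part"], split_type, raw_args.split(","), match["scale"]


print(parse_feature_id("TMD-Segment(1,10)-ARGP820101"))
# ('TMD', 'Segment', ['1', '10'], 'ARGP820101')
```

The part and the split arguments together determine which residues a feature covers; the scale id does not affect the positions.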

Added in version 0.1.0.

Parameters:

features (array-like, shape (n_features,) or pd.DataFrame) – List of feature ids ('PART-SPLIT-SCALE'). Alternatively, a df_feat DataFrame, in which case its 'feature' column is used.
start (int, default=1) – Position label of first residue position (starting at N-terminus).
tmd_len (int, default=20) – Length of target middle domain (TMD) (>0).
jmd_n_len (int, default=10) – Length of JMD-N (>=0).
jmd_c_len (int, default=10) – Length of JMD-C (>=0).
tmd_seq (str, optional) – Sequence of TMD. If given, respective amino acid segments/patterns will be returned instead of positions.
jmd_n_seq (str, optional) – Sequence of JMD-N. If given, respective amino acid segments/patterns will be returned instead of positions.
jmd_c_seq (str, optional) – Sequence of JMD-C. If given, respective amino acid segments/patterns will be returned instead of positions.

Returns:

list_pos (list) – List of residue positions for each feature. Returned when no sequence arguments are provided.
list_aa (list) – List of amino acid segments or patterns for each feature. Returned instead of list_pos when sequence arguments (tmd_seq, jmd_n_seq, jmd_c_seq) are provided.

Notes

Length parameters (tmd_len, jmd_n_len, jmd_c_len) must be consistent with the feature ids in features.
Lengths of the sequences (tmd_seq, jmd_n_seq, jmd_c_seq) must be consistent with the feature ids in features.

Examples

To obtain feature positions, first retrieve feature ids using the SequenceFeature().get_features() method:

import aaanalysis as aa
sf = aa.SequenceFeature()
split_kws = sf.get_split_kws(n_split_min=10, n_split_max=10, split_types=["Segment"])
features = sf.get_features(split_kws=split_kws, list_scales=["ARGP820101"])
print(features[0:5])

['TMD-Segment(1,10)-ARGP820101', 'TMD-Segment(2,10)-ARGP820101', 'TMD-Segment(3,10)-ARGP820101', 'TMD-Segment(4,10)-ARGP820101', 'TMD-Segment(5,10)-ARGP820101']

A list of feature positions can now be created using the SequenceFeature().get_feature_positions() method:

feature_positions = sf.get_feature_positions(features=features)
print(feature_positions[0:5])

['11,12', '13,14', '15,16', '17,18', '19,20']

Instead of a list of feature ids, a df_feat DataFrame (e.g. from load_features() or CPP output) can be passed directly; its 'feature' column is used automatically:

df_feat = aa.load_features(name="DOM_GSEC")
feature_positions = sf.get_feature_positions(features=df_feat)
print(feature_positions[0:5])

['31,32,33,34,35', '31,32,33,34,35', '32,33', '31,32,33,34,35', '32,33']

The start position and the length of the sequence parts (tmd_len, jmd_n_len, and jmd_c_len) can be adjusted:

# Shift start position from 1 to 20
feature_positions = sf.get_feature_positions(features=features, start=20)
print(feature_positions[0:5])

['30,31', '32,33', '34,35', '36,37', '38,39']
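
The position labels shown so far follow from simple arithmetic, which the sketch below reproduces for TMD segment features. It is an illustration under stated assumptions, not the library's implementation: segment_positions is a hypothetical helper, it handles only the TMD part and the Segment split, and it assumes that tmd_len is divisible by the number of segments (how the library treats other lengths is not covered here).

```python
def segment_positions(i_th, n_split, start=1, tmd_len=20, jmd_n_len=10):
    """Positions of the i-th of n_split equal TMD segments (hypothetical helper)."""
    if tmd_len % n_split != 0:
        raise ValueError("this sketch only covers TMD lengths divisible by n_split")
    seg_len = tmd_len // n_split
    # The TMD begins right after JMD-N, so its first label is start + jmd_n_len.
    first = start + jmd_n_len + (i_th - 1) * seg_len
    return ",".join(str(pos) for pos in range(first, first + seg_len))


print([segment_positions(i, 10) for i in range(1, 6)])
# ['11,12', '13,14', '15,16', '17,18', '19,20']
print([segment_positions(i, 10, start=20) for i in range(1, 6)])
# ['30,31', '32,33', '34,35', '36,37', '38,39']
```

In this sketch, increasing tmd_len widens every segment: with tmd_len=40 and ten segments, each segment spans four residues, so the first one covers '11,12,13,14'.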

# Change TMD length from 20 to 40
feature_positions = sf.get_feature_positions(features=features, tmd_len=40)
print(feature_positions[0:5])

['11,12,13,14', '15,16,17,18', '19,20,21,22', '23,24,25,26', '27,28,29,30']

To obtain amino acid segments or patterns instead of positions, provide the sequence parts that the features refer to using the tmd_seq, jmd_n_seq, and jmd_c_seq parameters:

tmd_seq = "ABCDEFGHIJKLMNOPQRST"
segments = sf.get_feature_positions(features=features, tmd_seq=tmd_seq)
print(segments[0:5])

['AB', 'CD', 'EF', 'GH', 'IJ']

All three part sequences (jmd_n_seq, tmd_seq, jmd_c_seq) can also be provided together with matching jmd_n_len and jmd_c_len; the amino acid segments are then returned for each feature instead of numeric positions:

# jmd_n_seq / jmd_c_seq lengths must match jmd_n_len / jmd_c_len; tmd_seq matches tmd_len (default 20)
segments = sf.get_feature_positions(features=features,
                                    tmd_seq="ABCDEFGHIJKLMNOPQRST",
                                    jmd_n_seq="abcdefghij", jmd_c_seq="klmnopqrst",
                                    jmd_n_len=10, jmd_c_len=10)
print(segments[0:5])

['AB', 'CD', 'EF', 'GH', 'IJ']
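
How positions relate to residues can be sketched by indexing into the concatenated parts (JMD-N, then TMD, then JMD-C). The helper positions_to_segment is hypothetical and not part of aaanalysis; it assumes contiguous or non-contiguous positions are simply joined into one string, which may differ from how the library formats patterns.

```python
def positions_to_segment(positions, jmd_n_seq, tmd_seq, jmd_c_seq, start=1):
    """Map a 'p1,p2,...' position string onto JMD-N + TMD + JMD-C (hypothetical helper)."""
    full_seq = jmd_n_seq + tmd_seq + jmd_c_seq
    # Position labels begin at `start`, string indices begin at 0.
    indices = [int(pos) - start for pos in positions.split(",")]
    if min(indices) < 0 or max(indices) >= len(full_seq):
        raise ValueError("positions fall outside JMD-N + TMD + JMD-C")
    return "".join(full_seq[i] for i in indices)


parts = dict(jmd_n_seq="abcdefghij", tmd_seq="ABCDEFGHIJKLMNOPQRST", jmd_c_seq="klmnopqrst")
print([positions_to_segment(p, **parts) for p in ["11,12", "13,14", "19,20"]])
# ['AB', 'CD', 'IJ']
```

Because the lookup is purely index-based, a sequence whose length does not match the corresponding length parameter would shift every residue after it, which is why the lengths have to agree.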