SequenceFeature: Creation of CPP feature components
A CPP feature is a combination of three components:
Part: A continuous subset of a sequence, such as a protein domain.
Split: Continuous or discontinuous subset of a Part, either segment or pattern.
Scale: A physicochemical scale, i.e., a set of numerical values (typically [0-1]) assigned to amino acids.
While Scales can be obtained using the load_scales() function
and selected with the AAclust class, the SequenceFeature class is
designed to create various forms of Parts and Splits, which can
then all be provided to CPP. See the SequenceFeature API for more
details.
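To make the three components concrete, the following sketch computes a single feature value, assuming it is the average scale value over the residues that a Split selects from a Part. The sequence, the toy scale, and the helper name `feature_value` are hypothetical and only serve as illustration; this is not the aaanalysis implementation.

```python
# Illustrative sketch only: toy values and a hypothetical helper, not an aaanalysis API.

def feature_value(part, positions, scale):
    """Average the scale values of the residues at the given 1-based positions."""
    residues = [part[p - 1] for p in positions]
    return sum(scale[aa] for aa in residues) / len(residues)

# Hypothetical Part and a toy scale normalized to [0, 1]
part = "LLAGV"
toy_scale = {"L": 1.0, "A": 0.5, "G": 0.0, "V": 0.75}

# Split: a Segment covering positions 1-3 of the Part (residues L, L, A)
value = feature_value(part, positions=[1, 2, 3], scale=toy_scale)
print(round(value, 3))
```

Changing any one of the three inputs (Part, positions of the Split, or Scale) gives a different feature, which is why the number of possible combinations grows quickly.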
Creation of Parts
To define Parts, the SequenceFeature class provides the
SequenceFeature.get_df_parts() method. To demonstrate this method,
we first obtain an example sequence dataset using the load_dataset()
function:
import aaanalysis as aa
aa.options["verbose"] = False
sf = aa.SequenceFeature()
df_seq = aa.load_dataset(name="SEQ_CAPSID", min_len=40, max_len=100)
aa.display_df(df_seq, n_rows=3, show_shape=True, char_limit=15)
DataFrame shape: (172, 3)
| | entry | sequence | label |
|---|---|---|---|
| 1 | CAPSID_4 | MERGDIP...EMDAGLI | 0 |
| 2 | CAPSID_26 | MDTGDRL...PANAGMY | 0 |
| 3 | CAPSID_35 | MTKLLLT...LDDGQAA | 0 |
By default, three sequence parts (tmd, jmd_n_tmd_n,
tmd_c_jmd_c) are provided, with jmd_n and jmd_c each being 10
residues long:
df_parts = sf.get_df_parts(df_seq=df_seq)
aa.display_df(df=df_parts, n_rows=5, show_shape=True, char_limit=15)
DataFrame shape: (172, 3)
| entry | tmd | jmd_n_tmd_n | tmd_c_jmd_c |
|---|---|---|---|
| CAPSID_4 | VGRHRRI...KRRQALE | MERGDIP...KAEDVSK | YQRIRDE...EMDAGLI |
| CAPSID_26 | GEVAALF...PAAPTGP | MDTGDRL...PGGHRRF | RESEVRA...PANAGMY |
| CAPSID_35 | AADLLGV...LLAFVHR | MTKLLLT...LNSGDLE | SVRIGRA...LDDGQAA |
| CAPSID_58 | KYLEALF...NTLRKGQ | MRWDGLS...TVYRWLQ | TGVIPAY...VNDEDQP |
| CAPSID_141 | YLTLSEA...SVLNEPI | MYLTIKE...FDGQQHL | INKEQFN...PDVKDED |
Any combination of valid sequence parts can be obtained using the
list_parts parameter:
df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=['jmd_n', 'tmd', 'jmd_c', 'tmd_jmd'])
aa.display_df(df=df_parts, n_rows=3, show_shape=True, char_limit=15)
DataFrame shape: (172, 4)
| entry | jmd_n | tmd | jmd_c | tmd_jmd |
|---|---|---|---|---|
| CAPSID_4 | MERGDIPFKY | VGRHRRI...KRRQALE | ELAEMDAGLI | MERGDIP...EMDAGLI |
| CAPSID_26 | MDTGDRLLTP | GEVAALF...PAAPTGP | GPGPANAGMY | MDTGDRL...PANAGMY |
| CAPSID_35 | MTKLLLTPTE | AADLLGV...LLAFVHR | LRGLDDGQAA | MTKLLLT...LDDGQAA |
Set the lengths of both JMDs using the jmd_n_len and jmd_c_len
parameters:
df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=['jmd_n', 'tmd', 'jmd_c', 'tmd_jmd'], jmd_c_len=8, jmd_n_len=8)
aa.display_df(df=df_parts, n_rows=3, show_shape=True, char_limit=15)
DataFrame shape: (172, 4)
| entry | jmd_n | tmd | jmd_c | tmd_jmd |
|---|---|---|---|---|
| CAPSID_4 | MERGDIPF | KYVGRHR...RQALEEL | AEMDAGLI | MERGDIP...EMDAGLI |
| CAPSID_26 | MDTGDRLL | TPGEVAA...APTGPGP | GPANAGMY | MDTGDRL...PANAGMY |
| CAPSID_35 | MTKLLLTP | TEAADLL...AFVHRLR | GLDDGQAA | MTKLLLT...LDDGQAA |
For more details, see the SequenceFeature.get_df_parts API.
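The slicing behind the tables above can be sketched in plain Python. The helper `get_parts` is hypothetical and uses a toy alphabet string instead of a protein sequence; it assumes that the JMDs are taken from both sequence ends, that the TMD is the remaining middle, and that the TMD is halved (rounding down) for the combined parts. It is not the library code.

```python
# Illustrative sketch only: mimics the slicing seen in the example output.

def get_parts(seq, jmd_n_len=10, jmd_c_len=10):
    """Slice a sequence into JMD-N, TMD, and JMD-C plus three combined parts."""
    jmd_n = seq[:jmd_n_len]
    tmd = seq[jmd_n_len:len(seq) - jmd_c_len]
    jmd_c = seq[len(seq) - jmd_c_len:]
    half = len(tmd) // 2  # Assumption: the TMD is split in half, rounding down
    return {
        "jmd_n": jmd_n,
        "tmd": tmd,
        "jmd_c": jmd_c,
        "jmd_n_tmd_n": jmd_n + tmd[:half],
        "tmd_c_jmd_c": tmd[half:] + jmd_c,
        "tmd_jmd": jmd_n + tmd + jmd_c,
    }

# Toy sequence (26 letters) with 3-residue JMDs
parts = get_parts("ABCDEFGHIJKLMNOPQRSTUVWXYZ", jmd_n_len=3, jmd_c_len=3)
for name, sub_seq in parts.items():
    print(f"{name:12s} {sub_seq}")
```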
Creation of Splits
Three different types of splits exist:
Segment: continuous sub-sequence.
Pattern: non-periodic discontinuous sub-sequence.
PeriodicPattern: periodic discontinuous sub-sequence.
Due to the plethora of combinatorial options, SequenceFeature has a
dedicated method (SequenceFeature.get_split_kws()) to create a
dictionary containing all relevant Split information.
You can get the default arguments for all split types as follows:
split_kws = sf.get_split_kws()
split_kws
{'Segment': {'n_split_min': 1, 'n_split_max': 15},
'Pattern': {'steps': [3, 4], 'n_min': 2, 'n_max': 4, 'len_max': 15},
'PeriodicPattern': {'steps': [3, 4]}}
You can also retrieve arguments for specific split types:
split_kws = sf.get_split_kws(split_types=["Segment", "Pattern"])
split_kws
{'Segment': {'n_split_min': 1, 'n_split_max': 15},
'Pattern': {'steps': [3, 4], 'n_min': 2, 'n_max': 4, 'len_max': 15}}
The arguments for each split type can be adjusted. For Segments,
the range of segment sizes can be changed with the n_split_min
(default=1) and n_split_max (default=15) parameters, which set the
minimum and maximum number of segments a Part is divided into:
split_kws = sf.get_split_kws(split_types="Segment", n_split_min=5, n_split_max=10)
split_kws
{'Segment': {'n_split_min': 5, 'n_split_max': 10}}
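As a rough illustration of what these two parameters control, the sketch below derives the positions of the i-th of n segments and enumerates all (i, n) pairs for a given range. The helper `segment_positions` and the floor-based boundaries are assumptions. With a 20-residue Part, it yields positions 1-3 for Segment(1,6), in line with the 11,12,13 reported for TMD-Segment(1,6) in the CPP output at the end of this section if the TMD is assumed to start at position 11.

```python
# Illustrative sketch only: hypothetical helper, boundaries chosen by floor division.

def segment_positions(part_len, i_th, n_split):
    """Return the 1-based positions of the i-th of n_split segments in a Part."""
    start = part_len * (i_th - 1) // n_split
    stop = part_len * i_th // n_split
    return list(range(start + 1, stop + 1))

def list_segments(n_split_min=1, n_split_max=15):
    """Enumerate all (i_th, n_split) pairs for the given range of splits."""
    return [(i, n) for n in range(n_split_min, n_split_max + 1)
            for i in range(1, n + 1)]

print(segment_positions(part_len=20, i_th=1, n_split=6))
print(segment_positions(part_len=20, i_th=2, n_split=12))
print(len(list_segments()), len(list_segments(5, 10)))
```

A larger n_split_max therefore produces shorter segments, while a larger n_split_min removes the longest ones.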
For PeriodicPattern, the step sizes of the odd and even steps can be
specified using the steps_periodicpattern parameter (default=[3,
4]):
split_kws = sf.get_split_kws(split_types="PeriodicPattern", steps_periodicpattern=[5, 10])
split_kws
{'PeriodicPattern': {'steps': [5, 10]}}
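The idea of alternating step sizes can be sketched as follows. The helper `periodic_positions` is hypothetical and assumes positions are counted from a start position until the end of the Part is reached; how the library combines the two step sizes and chooses start positions is not shown here.

```python
# Illustrative sketch only: hypothetical helper, not an aaanalysis API.

def periodic_positions(part_len, start, steps):
    """Return 1-based positions from `start`, advancing by alternating step sizes."""
    positions = []
    pos = start
    k = 0
    while pos <= part_len:
        positions.append(pos)
        pos += steps[k % len(steps)]  # Alternate between the given step sizes
        k += 1
    return positions

print(periodic_positions(part_len=20, start=1, steps=[4, 3]))
print(periodic_positions(part_len=20, start=1, steps=[5, 10]))
```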
For Patterns, the step sizes, the minimum and maximum number of
steps, and the maximum residue size of the pattern can be adjusted using
the steps_pattern (default=[3, 4]), n_min (default=2), n_max
(default=4), and len_max (default=15) parameters:
split_kws = sf.get_split_kws(split_types="Pattern", steps_pattern=[5, 10], n_min=3, n_max=5, len_max=30)
split_kws
{'Pattern': {'steps': [5, 10], 'n_min': 3, 'n_max': 5, 'len_max': 30}}
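The following sketch shows how a Pattern could select residues and how the four parameters might constrain it. Both helpers are hypothetical, and the reading of len_max as an upper bound on the last position is an assumption rather than the documented behavior.

```python
# Illustrative sketch only: hypothetical helpers, not an aaanalysis API.

def pattern_residues(part, positions, terminus="N"):
    """Return residues at 1-based positions counted from the N- or C-terminus."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    seq = part if terminus == "N" else part[::-1]
    return "".join(seq[p - 1] for p in positions)

def is_valid_pattern(positions, steps, n_min, n_max, len_max):
    """Check the number of positions, the step sizes, and the maximum extent."""
    diffs = [b - a for a, b in zip(positions, positions[1:])]
    return (bool(positions)
            and n_min <= len(positions) <= n_max
            and all(d in steps for d in diffs)
            and positions[-1] <= len_max)  # Assumption: len_max bounds the last position

toy_part = "ABCDEFGHIJKLMNO"
print(pattern_residues(toy_part, [2, 6, 10, 13]))
print(is_valid_pattern([2, 6, 10, 13], steps=[3, 4], n_min=2, n_max=4, len_max=15))
```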
Combining Parts + Splits + Scales
Any combination of the three feature components can be provided to
CPP, which will create all Part-Split-Scale combinations and
filter them down to a user-defined number (default=100) of non-redundant
features through the CPP.run() method:
# Load default scales, parts, and splits
df_scales = aa.load_scales()
df_parts = sf.get_df_parts(df_seq=df_seq)
split_kws = sf.get_split_kws()
# Get labels for test and reference class
labels = df_seq["label"].to_list()
# Create CPP object by providing three feature components
cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, df_scales=df_scales)
df_feat = cpp.run(labels=labels)
aa.display_df(df=df_feat, show_shape=True)
DataFrame shape: (100, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_C_JMD_C-Seg...,15)-AURR980107 | Conformation | α-helix (N-term) | α-helix (N-terminal, inside) | Normalized posi...ora-Rose, 1998) | 0.268000 | 0.136000 | -0.136000 | 0.143000 | 0.152000 | 0.000000 | 0.000045 | 37 |
| 2 | TMD-Segment(2,12)-PALJ810113 | Conformation | α-helix (left-handed) | β-turn (α class) | Normalized freq...u et al., 1981) | 0.263000 | 0.144000 | 0.144000 | 0.152000 | 0.132000 | 0.000000 | 0.000040 | 12,13 |
| 3 | TMD-Segment(1,6)-TANS770107 | Conformation | α-helix (left-handed) | α-helix (left-handed) | Normalized freq...Scheraga, 1977) | 0.258000 | 0.101000 | 0.101000 | 0.113000 | 0.089000 | 0.000000 | 0.000042 | 11,12,13 |
| 4 | TMD-PeriodicPat...3,2)-HUTJ700103 | Energy | Entropy | Entropy | Entropy of form...Hutchens, 1970) | 0.257000 | 0.065000 | -0.065000 | 0.065000 | 0.074000 | 0.000000 | 0.000030 | 11,15,18,22,25,29 |
| 5 | JMD_N_TMD_N-Pat...,13)-KARS160119 | Shape | Graph (1. eigenvalue) | Eigenvalue (maximum) | Weighted maximu...-Knisley, 2016) | 0.256000 | 0.115000 | -0.115000 | 0.130000 | 0.120000 | 0.000000 | 0.000029 | 2,6,10,13 |
| 6 | JMD_N_TMD_N-Seg...1,4)-ROBB760102 | Conformation | α-helix (N-term) | α-helix (N-terminal) | Information mea...n-Suzuki, 1976) | 0.256000 | 0.086000 | -0.086000 | 0.108000 | 0.086000 | 0.000000 | 0.000030 | 1,2,3,4,5 |
| 7 | TMD-Segment(1,6)-BULH740102 | ASA/Volume | Partial specific volume | Partial specific volume | Apparent partia...l-Breese, 1974) | 0.254000 | 0.097000 | -0.097000 | 0.124000 | 0.098000 | 0.000000 | 0.000021 | 11,12,13 |
| 8 | TMD_C_JMD_C-Seg...4,4)-RICJ880105 | Conformation | α-helix (N-term) | α-helix (N-terminal) | Relative prefer...chardson, 1988) | 0.253000 | 0.090000 | -0.090000 | 0.082000 | 0.111000 | 0.000000 | 0.000020 | 36,37,38,39,40 |
| 9 | JMD_N_TMD_N-Seg...1,3)-FAUJ880112 | Energy | Charge (negative) | Charge (negative) | Negative charge...e et al., 1988) | 0.253000 | 0.086000 | -0.086000 | 0.081000 | 0.104000 | 0.000000 | 0.000020 | 1,2,3,4,5,6 |
| 10 | TMD_C_JMD_C-Seg...1,2)-SNEP660104 | Others | PC 4 | Principal Component 4 (Sneath) | Principal compo... (Sneath, 1966) | 0.251000 | 0.069000 | 0.069000 | 0.078000 | 0.064000 | 0.000000 | 0.000023 | 21,22,23,24,25,26,27,28,29,30 |
| 11 | TMD-Segment(1,6)-SUEM840101 | Structure-Activity | Stability (helix-coil) | Stability (helix-coil) | Zimm-Bragg para...i et al., 1984) | 0.249000 | 0.090000 | -0.090000 | 0.114000 | 0.084000 | 0.000000 | 0.000025 | 11,12,13 |
| 12 | TMD_C_JMD_C-Seg...5,5)-BURA740102 | Conformation | β-strand | Extended | Normalized freq...s et al., 1974) | 0.249000 | 0.069000 | 0.069000 | 0.078000 | 0.068000 | 0.000000 | 0.000026 | 37,38,39,40 |
| 13 | JMD_N_TMD_N-Seg...2,5)-RICJ880104 | Conformation | Unclassified (Conformation) | α-helix (N-terminal, inside) | Relative prefer...chardson, 1988) | 0.247000 | 0.090000 | -0.090000 | 0.090000 | 0.100000 | 0.000000 | 0.000026 | 5,6,7,8 |
| 14 | TMD_C_JMD_C-Seg...,12)-WOEC730101 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polar requireme...t (Woese, 1973) | 0.245000 | 0.115000 | -0.115000 | 0.164000 | 0.131000 | 0.000000 | 0.000027 | 37,38 |
| 15 | TMD_C_JMD_C-Seg...4,4)-LAWE840101 | Polarity | Hydrophobicity | Transfer free e...TFE) to outside | Transfer free e...n et al., 1984) | 0.244000 | 0.072000 | 0.072000 | 0.081000 | 0.078000 | 0.000000 | 0.000030 | 36,37,38,39,40 |
| 16 | TMD_C_JMD_C-Pat...,14)-LINS030103 | ASA/Volume | Accessible surface area (ASA) | Hydrophilic ASA | Hydrophilic acc...s et al., 2003) | 0.243000 | 0.137000 | -0.137000 | 0.158000 | 0.154000 | 0.000000 | 0.000033 | 27,30,33,36 |
| 17 | TMD-Segment(1,6)-QIAN880131 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.242000 | 0.076000 | 0.076000 | 0.107000 | 0.091000 | 0.000000 | 0.000034 | 11,12,13 |
| 18 | JMD_N_TMD_N-Pat...,13)-LEVM760104 | Shape | Shape and Surface | Side chain angle (Phi) | Side chain tors... (Levitt, 1976) | 0.241000 | 0.094000 | 0.094000 | 0.123000 | 0.097000 | 0.000000 | 0.000034 | 2,6,10,13 |
| 19 | TMD-Segment(1,6)-BULH740101 | Composition | MPs (anchor) | TFE to surface | Transfer free e...l-Breese, 1974) | 0.240000 | 0.111000 | 0.111000 | 0.150000 | 0.122000 | 0.000000 | 0.000036 | 11,12,13 |
| 20 | TMD-Segment(1,6)-CIDH920103 | Polarity | Hydrophobicity | Hydrophobicity | Normalized hydr...d et al., 1992) | 0.235000 | 0.089000 | -0.089000 | 0.127000 | 0.096000 | 0.000000 | 0.000044 | 11,12,13 |
| 21 | TMD_C_JMD_C-Seg...1,1)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.233000 | 0.073000 | -0.073000 | 0.091000 | 0.078000 | 0.000000 | 0.000049 | 21,22,23,24,25,...,36,37,38,39,40 |
| 22 | TMD_C_JMD_C-Pat...,14)-SNEP660102 | Others | PC 2 | Principal Component 2 (Sneath) | Principal compo... (Sneath, 1966) | 0.230000 | 0.121000 | 0.121000 | 0.133000 | 0.149000 | 0.000000 | 0.000057 | 27,30,33,36 |
| 23 | JMD_N_TMD_N-Seg...2,6)-ROBB760111 | Conformation | β-turn (C-term) | β-turn (C-terminal) | Information mea...n-Suzuki, 1976) | 0.230000 | 0.093000 | 0.093000 | 0.106000 | 0.104000 | 0.000000 | 0.000057 | 4,5,6 |
| 24 | TMD-PeriodicPat...4,3)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.228000 | 0.070000 | -0.070000 | 0.091000 | 0.072000 | 0.000000 | 0.000061 | 12,16,20,24,28 |
| 25 | JMD_N_TMD_N-Pat...,13)-SUEM840102 | Structure-Activity | Unclassified (S...cture-Activity) | Stability (extended-coil) | Zimm-Bragg para...i et al., 1984) | 0.227000 | 0.122000 | -0.122000 | 0.156000 | 0.151000 | 0.000000 | 0.000064 | 2,5,9,13 |
| 26 | JMD_N_TMD_N-Seg...7,9)-RACS820103 | Conformation | Unclassified (Conformation) | α-helix (left-handed) | Average relativ...Scheraga, 1982) | 0.225000 | 0.139000 | -0.139000 | 0.184000 | 0.163000 | 0.000000 | 0.000068 | 14,15 |
| 27 | TMD-Segment(1,6)-CORJ870108 | Polarity | Hydrophilicity | TOTLS index | TOTLS index (Co...e et al., 1987) | 0.225000 | 0.090000 | 0.090000 | 0.132000 | 0.107000 | 0.000000 | 0.000068 | 11,12,13 |
| 28 | TMD-Segment(1,6)-PARJ860101 | Others | PC 5 | HPLC parameter | HPLC parameter ...r et al., 1986) | 0.225000 | 0.086000 | 0.086000 | 0.130000 | 0.100000 | 0.000000 | 0.000072 | 11,12,13 |
| 29 | TMD_C_JMD_C-Seg...2,4)-VASM830101 | Conformation | Unclassified (Conformation) | α-helix | Relative popula...z et al., 1983) | 0.225000 | 0.071000 | -0.071000 | 0.091000 | 0.084000 | 0.000000 | 0.000073 | 26,27,28,29,30 |
| 30 | JMD_N_TMD_N-Pat...6,9)-KARS160119 | Shape | Graph (1. eigenvalue) | Eigenvalue (maximum) | Weighted maximu...-Knisley, 2016) | 0.224000 | 0.116000 | -0.116000 | 0.158000 | 0.129000 | 0.000000 | 0.000074 | 2,6,9 |
| 31 | TMD_C_JMD_C-Seg...,13)-AURR980106 | Conformation | α-helix (N-term) | α-helix (N-terminal) | Normalized posi...ora-Rose, 1998) | 0.224000 | 0.081000 | -0.081000 | 0.099000 | 0.135000 | 0.000000 | 0.000073 | 36 |
| 32 | JMD_N_TMD_N-Pat...,12)-RACS820103 | Conformation | Unclassified (Conformation) | α-helix (left-handed) | Average relativ...Scheraga, 1982) | 0.223000 | 0.145000 | -0.145000 | 0.187000 | 0.148000 | 0.000000 | 0.000077 | 9,12,15,18 |
| 33 | TMD-Segment(4,12)-RACS820103 | Conformation | Unclassified (Conformation) | α-helix (left-handed) | Average relativ...Scheraga, 1982) | 0.223000 | 0.127000 | -0.127000 | 0.171000 | 0.148000 | 0.000001 | 0.000081 | 16 |
| 34 | TMD-Segment(3,7)-RACS820103 | Conformation | Unclassified (Conformation) | α-helix (left-handed) | Average relativ...Scheraga, 1982) | 0.223000 | 0.100000 | -0.100000 | 0.136000 | 0.113000 | 0.000000 | 0.000077 | 16,17,18 |
| 35 | TMD-Segment(6,11)-FUKS010101 | Composition | AA composition (surface) | Proteins of thermophiles (INT) | Surface composi...ishikawa, 2001) | 0.222000 | 0.084000 | -0.084000 | 0.097000 | 0.104000 | 0.000001 | 0.000081 | 20 |
| 36 | JMD_N_TMD_N-Seg...,15)-LINS030116 | ASA/Volume | Accessible surface area (ASA) | ASA (folded β-strand) | Total median ac...s et al., 2003) | 0.221000 | 0.097000 | -0.097000 | 0.119000 | 0.145000 | 0.000001 | 0.000089 | 1 |
| 37 | TMD-Segment(1,6)-VINM940102 | Structure-Activity | Flexibility | Flexibility (0 ...igid neighbors) | Normalized flex...n et al., 1994) | 0.221000 | 0.094000 | 0.094000 | 0.134000 | 0.104000 | 0.000001 | 0.000086 | 11,12,13 |
| 38 | TMD-Segment(1,6)-NAKH900112 | Composition | Membrane proteins (MPs) | Membrane proteins | Transmembrane r...a et al., 1990) | 0.221000 | 0.073000 | -0.073000 | 0.108000 | 0.095000 | 0.000001 | 0.000086 | 11,12,13 |
| 39 | TMD_C_JMD_C-Seg...,14)-DAYM780201 | Others | Mutability | Mutability | Relative mutabi... et al., 1978b) | 0.220000 | 0.119000 | -0.119000 | 0.161000 | 0.124000 | 0.000001 | 0.000091 | 39,40 |
| 40 | JMD_N_TMD_N-Pat...,14)-OOBM770105 | Energy | Non-bonded energy | Non-bonded energy per residue | Short and mediu...take-Ooi, 1977) | 0.220000 | 0.076000 | 0.076000 | 0.100000 | 0.094000 | 0.000001 | 0.000091 | 2,6,10,14 |
| 41 | TMD-Segment(1,5)-QIAN880119 | Conformation | β-sheet | β-sheet | Weights for bet...ejnowski, 1988) | 0.220000 | 0.065000 | -0.065000 | 0.082000 | 0.082000 | 0.000001 | 0.000091 | 11,12,13,14 |
| 42 | JMD_N_TMD_N-Pat...2,6)-FASG760101 | ASA/Volume | Volume | Weight | Molecular weigh... (Fasman, 1976) | 0.219000 | 0.112000 | -0.112000 | 0.148000 | 0.128000 | 0.000001 | 0.000094 | 2,6 |
| 43 | TMD_C_JMD_C-Pat...,14)-JACR890101 | Polarity | Hydrophobicity (surrounding) | Hydrophobicity (surrounding) | Weights from th...bs-White, 1989) | 0.219000 | 0.104000 | 0.104000 | 0.144000 | 0.141000 | 0.000001 | 0.000096 | 27,30,33,36 |
| 44 | JMD_N_TMD_N-Pat...,13)-PALJ810109 | Conformation | α-helix | α-helix | Normalized freq...u et al., 1981) | 0.218000 | 0.115000 | -0.115000 | 0.132000 | 0.143000 | 0.000001 | 0.000104 | 3,6,9,13 |
| 45 | TMD-Segment(1,6)-RACS820113 | Shape | Unclassified (Shape) | Side chain angle (theta) | Value of theta(...Scheraga, 1982) | 0.218000 | 0.074000 | 0.074000 | 0.103000 | 0.076000 | 0.000001 | 0.000100 | 11,12,13 |
| 46 | TMD-PeriodicPat...4,3)-FINA770101 | Structure-Activity | Stability (helix-coil) | Stability (helix-coil) | Helix-coil equi...-Ptitsyn, 1977) | 0.218000 | 0.068000 | -0.068000 | 0.085000 | 0.072000 | 0.000001 | 0.000103 | 12,16,20,24,28 |
| 47 | JMD_N_TMD_N-Seg...1,2)-TANS770107 | Conformation | α-helix (left-handed) | α-helix (left-handed) | Normalized freq...Scheraga, 1977) | 0.218000 | 0.066000 | 0.066000 | 0.092000 | 0.063000 | 0.000001 | 0.000102 | 1,2,3,4,5,6,7,8,9,10 |
| 48 | TMD-Segment(1,6)-VENT840101 | Others | Unclassified (Others) | Bitterness | Bitterness (Venanzi, 1984) | 0.217000 | 0.125000 | -0.125000 | 0.189000 | 0.160000 | 0.000001 | 0.000108 | 11,12,13 |
| 49 | JMD_N_TMD_N-Seg...,15)-FASG760105 | Polarity | Unclassified (Polarity) | pK-C | pK-C (Fasman, 1976) | 0.217000 | 0.093000 | 0.093000 | 0.117000 | 0.119000 | 0.000001 | 0.000104 | 1 |
| 50 | JMD_N_TMD_N-Pat...,13)-HUTJ700103 | Energy | Entropy | Entropy | Entropy of form...Hutchens, 1970) | 0.217000 | 0.082000 | -0.082000 | 0.106000 | 0.100000 | 0.000001 | 0.000105 | 2,6,10,13 |
| 51 | JMD_N_TMD_N-Pat...2,6)-JUNJ780101 | Composition | AA composition | AA composition | Sequence freque... (Jungck, 1978) | 0.216000 | 0.123000 | 0.123000 | 0.165000 | 0.150000 | 0.000001 | 0.000111 | 2,6 |
| 52 | TMD_C_JMD_C-Seg...4,4)-CHAM830107 | Energy | Charge (negative) | Charge (transfer) | A parameter of ...-Charton, 1983) | 0.216000 | 0.117000 | -0.117000 | 0.140000 | 0.153000 | 0.000001 | 0.000111 | 36,37,38,39,40 |
| 53 | TMD_C_JMD_C-Pat...,14)-GEIM800103 | Conformation | Unclassified (Conformation) | α-helix (β-proteins) | Alpha-helix ind...-Roberts, 1980) | 0.215000 | 0.103000 | -0.103000 | 0.140000 | 0.119000 | 0.000001 | 0.000123 | 27,31,35,38 |
| 54 | JMD_N_TMD_N-Seg...2,4)-BULH740101 | Composition | MPs (anchor) | TFE to surface | Transfer free e...l-Breese, 1974) | 0.214000 | 0.098000 | 0.098000 | 0.162000 | 0.114000 | 0.000001 | 0.000126 | 6,7,8,9,10 |
| 55 | JMD_N_TMD_N-Seg...,14)-KHAG800101 | Others | Unclassified (Others) | Kerr-constant | The Kerr-consta...an-Moore, 1980) | 0.214000 | 0.081000 | 0.081000 | 0.119000 | 0.094000 | 0.000001 | 0.000127 | 1 |
| 56 | JMD_N_TMD_N-Pat...2,5)-KARS160119 | Shape | Graph (1. eigenvalue) | Eigenvalue (maximum) | Weighted maximu...-Knisley, 2016) | 0.213000 | 0.129000 | -0.129000 | 0.184000 | 0.133000 | 0.000001 | 0.000131 | 2,5 |
| 57 | TMD_C_JMD_C-Pat...,11)-KARS160112 | Shape | Graph (2. eigenvalue) | Eigenvalue (2. smallest) | Second smallest...-Knisley, 2016) | 0.213000 | 0.125000 | 0.125000 | 0.157000 | 0.162000 | 0.000001 | 0.000131 | 30,33,37 |
| 58 | JMD_N_TMD_N-Seg...,11)-BULH740102 | ASA/Volume | Partial specific volume | Partial specific volume | Apparent partia...l-Breese, 1974) | 0.213000 | 0.118000 | -0.118000 | 0.180000 | 0.134000 | 0.000002 | 0.000132 | 8,9 |
| 59 | TMD_C_JMD_C-Per...3,2)-KARS160112 | Shape | Graph (2. eigenvalue) | Eigenvalue (2. smallest) | Second smallest...-Knisley, 2016) | 0.213000 | 0.071000 | 0.071000 | 0.090000 | 0.086000 | 0.000001 | 0.000132 | 22,25,28,31,34,37,40 |
| 60 | JMD_N_TMD_N-Per...3,2)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.213000 | 0.067000 | -0.067000 | 0.099000 | 0.091000 | 0.000001 | 0.000131 | 1,5,8,12,15,19 |
| 61 | TMD-Segment(1,6)-FAUJ880101 | Shape | Steric parameter | Shape Index | Graph shape ind...e et al., 1988) | 0.213000 | 0.064000 | -0.064000 | 0.095000 | 0.079000 | 0.000002 | 0.000130 | 11,12,13 |
| 62 | JMD_N_TMD_N-Seg...,15)-WOLS870102 | Others | PC 3 | Principal Component 2 (Wold) | Principal prope...d et al., 1987) | 0.212000 | 0.099000 | -0.099000 | 0.131000 | 0.091000 | 0.000002 | 0.000138 | 1 |
| 63 | TMD_C_JMD_C-Pat...,14)-FAUJ880113 | Polarity | Unclassified (Polarity) | pK-C | pK-a(RCOOH) (Fa...e et al., 1988) | 0.212000 | 0.077000 | -0.077000 | 0.127000 | 0.089000 | 0.000002 | 0.000136 | 27,30,33,37 |
| 64 | JMD_N_TMD_N-Pat...,13)-AURR980106 | Conformation | α-helix (N-term) | α-helix (N-terminal) | Normalized posi...ora-Rose, 1998) | 0.212000 | 0.071000 | -0.071000 | 0.092000 | 0.108000 | 0.000002 | 0.000135 | 3,6,9,13 |
| 65 | JMD_N_TMD_N-Seg...1,9)-VASM830101 | Conformation | Unclassified (Conformation) | α-helix | Relative popula...z et al., 1983) | 0.211000 | 0.114000 | -0.114000 | 0.160000 | 0.136000 | 0.000002 | 0.000144 | 1,2 |
| 66 | JMD_N_TMD_N-Pat...,14)-CHAM820101 | Polarity | Amphiphilicity | Polarizability | Polarizability ...-Charton, 1982) | 0.211000 | 0.072000 | -0.072000 | 0.110000 | 0.095000 | 0.000002 | 0.000144 | 2,6,10,14 |
| 67 | TMD-Segment(4,12)-KARS160120 | Shape | Unclassified (Shape) | Eigenvalue (minimum) | Weighted minimu...-Knisley, 2016) | 0.211000 | 0.065000 | 0.065000 | 0.099000 | 0.121000 | 0.000002 | 0.000143 | 16 |
| 68 | JMD_N_TMD_N-Seg...,12)-NADH010107 | Polarity | Unclassified (Polarity) | Hydrophobicity ...rmation values) | Hydropathy scal...h et al., 2001) | 0.210000 | 0.089000 | -0.089000 | 0.101000 | 0.126000 | 0.000002 | 0.000147 | 1 |
| 69 | JMD_N_TMD_N-Pat...,11)-QIAN880127 | Conformation | Coil (N-term) | Coil (N-terminal) | Weights for coi...ejnowski, 1988) | 0.209000 | 0.083000 | -0.083000 | 0.119000 | 0.114000 | 0.000002 | 0.000156 | 2,5,8,11 |
| 70 | TMD-Segment(4,7)-FUKS010101 | Composition | AA composition (surface) | Proteins of thermophiles (INT) | Surface composi...ishikawa, 2001) | 0.209000 | 0.064000 | -0.064000 | 0.084000 | 0.090000 | 0.000002 | 0.000157 | 19,20,21 |
| 71 | JMD_N_TMD_N-Pat...,13)-VASM830101 | Conformation | Unclassified (Conformation) | α-helix | Relative popula...z et al., 1983) | 0.208000 | 0.115000 | -0.115000 | 0.157000 | 0.142000 | 0.000003 | 0.000162 | 3,6,9,13 |
| 72 | TMD_C_JMD_C-Pat...,14)-FASG760103 | Others | Unclassified (Others) | Optical rotation | Optical rotatio... (Fasman, 1976) | 0.207000 | 0.097000 | -0.097000 | 0.149000 | 0.092000 | 0.000003 | 0.000165 | 27,30,33,37 |
| 73 | JMD_N_TMD_N-Pat...,13)-WERD780102 | Energy | Unclassified (Energy) | Free energy change | Free energy cha...Scheraga, 1978) | 0.207000 | 0.076000 | 0.076000 | 0.117000 | 0.087000 | 0.000003 | 0.000170 | 6,9,13 |
| 74 | JMD_N_TMD_N-Seg...2,4)-PLIV810101 | ASA/Volume | Partial specific volume | Partition coefficient | Partition Coeff...a et al., 1981) | 0.205000 | 0.097000 | -0.097000 | 0.165000 | 0.115000 | 0.000004 | 0.000185 | 6,7,8,9,10 |
| 75 | TMD-Pattern(N,4,7)-LEVM760106 | ASA/Volume | Volume | Volume | van der Waals p... (Levitt, 1976) | 0.204000 | 0.132000 | -0.132000 | 0.187000 | 0.165000 | 0.000004 | 0.000197 | 14,17 |
| 76 | JMD_N_TMD_N-Pat...,11)-ROBB760111 | Conformation | β-turn (C-term) | β-turn (C-terminal) | Information mea...n-Suzuki, 1976) | 0.204000 | 0.088000 | 0.088000 | 0.129000 | 0.108000 | 0.000004 | 0.000203 | 1,4,8,11 |
| 77 | JMD_N_TMD_N-Pat...,14)-WOLS870102 | Others | PC 3 | Principal Component 2 (Wold) | Principal prope...d et al., 1987) | 0.204000 | 0.077000 | -0.077000 | 0.115000 | 0.103000 | 0.000004 | 0.000195 | 2,6,10,14 |
| 78 | TMD-Pattern(N,3,6)-FAUJ880113 | Polarity | Unclassified (Polarity) | pK-C | pK-a(RCOOH) (Fa...e et al., 1988) | 0.204000 | 0.071000 | -0.071000 | 0.134000 | 0.110000 | 0.000004 | 0.000194 | 13,16 |
| 79 | JMD_N_TMD_N-Seg...2,4)-QIAN880114 | Conformation | β-sheet (N-term) | β-sheet (N-terminal) | Weights for bet...ejnowski, 1988) | 0.203000 | 0.080000 | 0.080000 | 0.125000 | 0.088000 | 0.000004 | 0.000208 | 6,7,8,9,10 |
| 80 | JMD_N_TMD_N-Seg...2,4)-MANP780101 | Polarity | Hydrophobicity (surrounding) | Surrounding hydrophobicity | Average surroun...nnuswamy, 1978) | 0.203000 | 0.078000 | -0.078000 | 0.142000 | 0.105000 | 0.000005 | 0.000208 | 6,7,8,9,10 |
| 81 | TMD_C_JMD_C-Per...3,1)-SNEP660102 | Others | PC 2 | Principal Component 2 (Sneath) | Principal compo... (Sneath, 1966) | 0.203000 | 0.071000 | 0.071000 | 0.081000 | 0.100000 | 0.000004 | 0.000207 | 21,25,28,32,35,39 |
| 82 | TMD_C_JMD_C-Seg...,12)-LINS030121 | ASA/Volume | Accessible surface area (ASA) | Hydrophilic ASA...olded β-strand) | % Hydrophilic a...s et al., 2003) | 0.202000 | 0.108000 | -0.108000 | 0.198000 | 0.155000 | 0.000005 | 0.000218 | 37,38 |
| 83 | JMD_N_TMD_N-Pat...,14)-YUTK870104 | Energy | Free energy (unfolding) | Free energy (unfolding) | Activation Gibb...i et al., 1987) | 0.202000 | 0.087000 | 0.087000 | 0.097000 | 0.160000 | 0.000005 | 0.000218 | 7,11,14,17 |
| 84 | TMD-Segment(1,6)-RACS770101 | Shape | Reduced distance | Reduced distance (C-α) | Average reduced...Scheraga, 1977) | 0.202000 | 0.082000 | 0.082000 | 0.135000 | 0.109000 | 0.000005 | 0.000217 | 11,12,13 |
| 85 | JMD_N_TMD_N-Seg...2,6)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.201000 | 0.075000 | 0.075000 | 0.106000 | 0.107000 | 0.000006 | 0.000236 | 4,5,6 |
| 86 | JMD_N_TMD_N-Seg...3,7)-PALJ810113 | Conformation | α-helix (left-handed) | β-turn (α class) | Normalized freq...u et al., 1981) | 0.200000 | 0.090000 | 0.090000 | 0.144000 | 0.123000 | 0.000006 | 0.000244 | 6,7,8 |
| 87 | TMD_C_JMD_C-Pat...,14)-AURR980111 | Conformation | α-helix | α-helix | Normalized posi...ora-Rose, 1998) | 0.199000 | 0.102000 | -0.102000 | 0.139000 | 0.137000 | 0.000007 | 0.000261 | 27,30,33,37 |
| 88 | JMD_N_TMD_N-Pat...,14)-GEIM800103 | Conformation | Unclassified (Conformation) | α-helix (β-proteins) | Alpha-helix ind...-Roberts, 1980) | 0.199000 | 0.096000 | 0.096000 | 0.123000 | 0.149000 | 0.000007 | 0.000264 | 7,11,14,17 |
| 89 | TMD_C_JMD_C-Pat...3,6)-GEOR030109 | Conformation | Linker (>14 AA) | Linker (Non-Helical) | Linker propensi...-Heringa, 2003) | 0.199000 | 0.092000 | 0.092000 | 0.182000 | 0.104000 | 0.000007 | 0.000260 | 35,38 |
| 90 | TMD_C_JMD_C-Per...3,1)-LINS030118 | ASA/Volume | Accessible surface area (ASA) | Hydrophilic ASA...olded β-strand) | Hydrophilic med...s et al., 2003) | 0.199000 | 0.081000 | -0.081000 | 0.097000 | 0.117000 | 0.000007 | 0.000254 | 21,25,28,32,35,39 |
| 91 | TMD-Pattern(N,3,6)-MUNV940101 | Energy | Free energy (folding) | Free energy (α-helix) | Free energy in ...-Serrano, 1994) | 0.199000 | 0.079000 | 0.079000 | 0.132000 | 0.117000 | 0.000007 | 0.000253 | 13,16 |
| 92 | JMD_N_TMD_N-Pat...,14)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.199000 | 0.076000 | -0.076000 | 0.101000 | 0.125000 | 0.000007 | 0.000260 | 7,11,14,17 |
| 93 | JMD_N_TMD_N-Seg...2,6)-OOBM850105 | Structure-Activity | Flexibility | Side chain interaction | Optimized side ...e et al., 1985) | 0.198000 | 0.065000 | 0.065000 | 0.099000 | 0.083000 | 0.000008 | 0.000272 | 4,5,6 |
| 94 | JMD_N_TMD_N-Seg...,14)-CHAM830102 | Conformation | Unclassified (Conformation) | β-sheet | A parameter def...-Charton, 1983) | 0.197000 | 0.102000 | 0.102000 | 0.129000 | 0.159000 | 0.000009 | 0.000284 | 3,4 |
| 95 | TMD_C_JMD_C-Seg...7,8)-WERD780104 | Energy | Unclassified (Energy) | Free energy (α-helix) | Free energy cha...Scheraga, 1978) | 0.197000 | 0.088000 | 0.088000 | 0.125000 | 0.118000 | 0.000009 | 0.000277 | 36,37 |
| 96 | JMD_N_TMD_N-Pat...,10)-ROBB760102 | Conformation | α-helix (N-term) | α-helix (N-terminal) | Information mea...n-Suzuki, 1976) | 0.196000 | 0.132000 | -0.132000 | 0.152000 | 0.196000 | 0.000010 | 0.000292 | 3,7,10 |
| 97 | JMD_N_TMD_N-Seg...2,4)-WOLS870101 | Others | PC 5 | Principal Component 1 (Wold) | Principal prope...d et al., 1987) | 0.196000 | 0.083000 | 0.083000 | 0.152000 | 0.098000 | 0.000009 | 0.000283 | 6,7,8,9,10 |
| 98 | TMD-Pattern(N,1...,10)-QIAN880114 | Conformation | β-sheet (N-term) | β-sheet (N-terminal) | Weights for bet...ejnowski, 1988) | 0.195000 | 0.098000 | 0.098000 | 0.131000 | 0.147000 | 0.000011 | 0.000309 | 11,14,17,20 |
| 99 | JMD_N_TMD_N-Seg...,15)-CHOC760104 | ASA/Volume | Buried | Buried | Proportion of r...(Chothia, 1976) | 0.194000 | 0.118000 | 0.118000 | 0.170000 | 0.153000 | 0.000011 | 0.000314 | 1 |
| 100 | TMD-Pattern(N,4,7)-JOND750101 | Polarity | Hydrophobicity | Hydrophobicity | Hydrophobicity (Jones, 1975) | 0.193000 | 0.130000 | -0.130000 | 0.179000 | 0.176000 | 0.000014 | 0.000337 | 14,17 |
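The entries in the feature column appear to follow a PART-SPLIT-SCALE naming scheme. Assuming that the Part name and the scale ID contain no hyphens, a feature ID can be decomposed with a small hypothetical helper such as `parse_feature`:

```python
# Illustrative sketch only: assumes the PART-SPLIT-SCALE format seen in the table.

def parse_feature(feature):
    """Split a feature ID into its Part, Split, and Scale components."""
    part, _, rest = feature.partition("-")    # Part: text before the first hyphen
    split, _, scale = rest.rpartition("-")    # Scale: text after the last hyphen
    return {"part": part, "split": split, "scale": scale}

print(parse_feature("TMD-Segment(1,6)-TANS770107"))
print(parse_feature("TMD-Pattern(N,4,7)-LEVM760106"))
```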
Further information on the CPP feature concept can be found in the CPP Usage Principles section.