SequenceFeature: Creation of CPP feature components

A CPP feature is the combination of three components:

  • Part: A continuous subset of a sequence, such as a protein domain.

  • Split: Continuous or discontinuous subset of a Part, either segment or pattern.

  • Scale: A physicochemical scale, i.e., a set of numerical values (typically [0-1]) assigned to amino acids.
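
To make the three components concrete, the following minimal sketch combines a toy Part, a toy Split, and a toy Scale into one number. It assumes that a feature value is the average of the scale values over the residues selected by the Split. The scale values, the sequence, and the helper feature_value() are made up for illustration and are not part of the aaanalysis API.

```python
# Toy illustration only: these scale values are invented, not taken from load_scales().
toy_scale = {"A": 0.2, "L": 0.9, "K": 0.1, "G": 0.4}


def feature_value(part_seq, positions, scale):
    """Average the scale values of the residues at the given 1-based positions."""
    residues = [part_seq[p - 1] for p in positions]
    return sum(scale[aa] for aa in residues) / len(residues)


part = "ALKGAL"              # Part: a continuous stretch of a sequence (hypothetical)
split_positions = [1, 2, 3]  # Split: here a segment covering the first three residues
value = feature_value(part, split_positions, toy_scale)
print(round(value, 3))
```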

While Scales can be obtained using the load_scales() function and selecting by the AAclust class, the SequenceFeature class is designed to create various forms of Parts and Splits, which can then all be provided to CPP. See the SequenceFeature API for more details.

Creation of Parts

To define Parts, the SequenceFeature class provides the SequenceFeature.get_df_parts() method. To demonstrate this method, we first obtain an example sequence dataset using the load_dataset() function:

import aaanalysis as aa
aa.options["verbose"] = False

sf = aa.SequenceFeature()
df_seq = aa.load_dataset(name="SEQ_CAPSID", min_len=40, max_len=100)
aa.display_df(df_seq, n_rows=3, show_shape=True, char_limit=15)
DataFrame shape: (172, 3)
  entry sequence label
1 CAPSID_4 MERGDIP...EMDAGLI 0
2 CAPSID_26 MDTGDRL...PANAGMY 0
3 CAPSID_35 MTKLLLT...LDDGQAA 0

By default, three sequence parts (tmd, jmd_n_tmd_n, tmd_c_jmd_c) are provided, with jmd_n and jmd_c each being 10 residues long:

df_parts = sf.get_df_parts(df_seq=df_seq)
aa.display_df(df=df_parts, n_rows=5, show_shape=True, char_limit=15)
DataFrame shape: (172, 3)
  tmd jmd_n_tmd_n tmd_c_jmd_c
entry      
CAPSID_4 VGRHRRI...KRRQALE MERGDIP...KAEDVSK YQRIRDE...EMDAGLI
CAPSID_26 GEVAALF...PAAPTGP MDTGDRL...PGGHRRF RESEVRA...PANAGMY
CAPSID_35 AADLLGV...LLAFVHR MTKLLLT...LNSGDLE SVRIGRA...LDDGQAA
CAPSID_58 KYLEALF...NTLRKGQ MRWDGLS...TVYRWLQ TGVIPAY...VNDEDQP
CAPSID_141 YLTLSEA...SVLNEPI MYLTIKE...FDGQQHL INKEQFN...PDVKDED

Any combination of valid sequence parts can be obtained using the list_parts parameter:

df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=['jmd_n', 'tmd', 'jmd_c', 'tmd_jmd'])
aa.display_df(df=df_parts, n_rows=3, show_shape=True, char_limit=15)
DataFrame shape: (172, 4)
  jmd_n tmd jmd_c tmd_jmd
entry        
CAPSID_4 MERGDIPFKY VGRHRRI...KRRQALE ELAEMDAGLI MERGDIP...EMDAGLI
CAPSID_26 MDTGDRLLTP GEVAALF...PAAPTGP GPGPANAGMY MDTGDRL...PANAGMY
CAPSID_35 MTKLLLTPTE AADLLGV...LLAFVHR LRGLDDGQAA MTKLLLT...LDDGQAA

Set the length of both JMDs using the jmd_c_len and jmd_n_len parameters:

df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=['jmd_n', 'tmd', 'jmd_c', 'tmd_jmd'], jmd_c_len=8, jmd_n_len=8)
aa.display_df(df=df_parts, n_rows=3, show_shape=True, char_limit=15)
DataFrame shape: (172, 4)
  jmd_n tmd jmd_c tmd_jmd
entry        
CAPSID_4 MERGDIPF KYVGRHR...RQALEEL AEMDAGLI MERGDIP...EMDAGLI
CAPSID_26 MDTGDRLL TPGEVAA...APTGPGP GPANAGMY MDTGDRL...PANAGMY
CAPSID_35 MTKLLLTP TEAADLL...AFVHRLR GLDDGQAA MTKLLLT...LDDGQAA
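
The outputs above suggest that, for sequences without an explicit TMD annotation, jmd_n is the first jmd_n_len residues, jmd_c is the last jmd_c_len residues, and tmd is everything in between. The following plain-Python sketch mirrors that behavior under this assumption. The helper slice_parts() and the toy sequence are hypothetical and do not reproduce the library's implementation.

```python
def slice_parts(seq, jmd_n_len=10, jmd_c_len=10):
    """Cut a sequence into JMD-N, TMD, and JMD-C (assumes no explicit TMD annotation)."""
    if len(seq) <= jmd_n_len + jmd_c_len:
        raise ValueError("sequence too short for the requested JMD lengths")
    jmd_n = seq[:jmd_n_len]
    tmd = seq[jmd_n_len:len(seq) - jmd_c_len]
    jmd_c = seq[len(seq) - jmd_c_len:]
    return {"jmd_n": jmd_n, "tmd": tmd, "jmd_c": jmd_c, "tmd_jmd": jmd_n + tmd + jmd_c}


# Made-up 30-residue sequence, not an entry of SEQ_CAPSID
toy_seq = "MKTAYIAKQR" + "LLLLLAAAAA" + "GSGSGSGSGS"

parts_10 = slice_parts(toy_seq)                             # default JMD length of 10
parts_8 = slice_parts(toy_seq, jmd_n_len=8, jmd_c_len=8)    # shorter JMDs, longer TMD
print(parts_10["tmd"], parts_8["tmd"])
```

Shortening both JMDs from 10 to 8 residues moves two residues on each side into the tmd, which matches the shift visible between the two tables above.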

For more details, see the SequenceFeature.get_df_parts API.

Creation of Splits

Three different types of splits exist:

  • Segment: continuous sub-sequence.

  • Pattern: non-periodic discontinuous sub-sequence.

  • PeriodicPattern: periodic discontinuous sub-sequence.

Due to the plethora of combinatorial options, SequenceFeature has a special method (SequenceFeature.get_split_kws()) to create a dictionary containing all relevant Split information.

You can get the default arguments for all split types as follows:

split_kws = sf.get_split_kws()
split_kws
{'Segment': {'n_split_min': 1, 'n_split_max': 15},
 'Pattern': {'steps': [3, 4], 'n_min': 2, 'n_max': 4, 'len_max': 15},
 'PeriodicPattern': {'steps': [3, 4]}}

You can also retrieve arguments for specific split types:

split_kws = sf.get_split_kws(split_types=["Segment", "Pattern"])
split_kws
{'Segment': {'n_split_min': 1, 'n_split_max': 15},
 'Pattern': {'steps': [3, 4], 'n_min': 2, 'n_max': 4, 'len_max': 15}}

The arguments for each split type can be adjusted. For Segments, their minimum and maximum length can be changed by the n_split_min (default=1) and n_split_max (default=15) parameters:

split_kws = sf.get_split_kws(split_types="Segment", n_split_min=5, n_split_max=10)
split_kws
{'Segment': {'n_split_min': 5, 'n_split_max': 10}}
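
As an illustration of what a Segment selects, the sketch below reads a split such as Segment(2,12) as "the 2nd of 12 segments of a Part". Both this reading and the floor-based segment boundaries are assumptions; they were chosen because they reproduce the positions reported for Segment features in the CPP result table further down. The helper segment_positions() is hypothetical and not part of aaanalysis.

```python
def segment_positions(part_len, i_th, n_split, start=1):
    """Return the 1-based positions of the i_th of n_split segments.

    Assumption: segment boundaries are obtained by integer (floor) division.
    `start` is the position of the first residue of the Part.
    """
    if not 1 <= i_th <= n_split <= part_len:
        raise ValueError("need 1 <= i_th <= n_split <= part_len")
    first = (i_th - 1) * part_len // n_split
    last = i_th * part_len // n_split
    return [start + k for k in range(first, last)]


# A 20-residue TMD that starts at position 11 (after a 10-residue JMD-N)
print(segment_positions(20, 2, 12, start=11))  # [12, 13]
print(segment_positions(20, 1, 6, start=11))   # [11, 12, 13]
```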

For PeriodicPattern, the step sizes of the odd and even steps can be specified using the steps_periodicpattern parameter (default=[3, 4]):

split_kws = sf.get_split_kws(split_types="PeriodicPattern", steps_periodicpattern=[5, 10])
split_kws
{'PeriodicPattern': {'steps': [5, 10]}}
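
The alternation between odd and even steps can be sketched as follows. The function periodic_positions() is a hypothetical helper written for this illustration, not a library call, and it only covers patterns counted from the N-terminus of a Part.

```python
def periodic_positions(part_len, step_odd, step_even, start=1, offset=0):
    """Collect positions by alternating two step sizes until the Part ends.

    `start` is the 1-based position within the Part where the pattern begins;
    `offset` shifts the result to sequence-level numbering.
    """
    positions = []
    pos = start
    step_index = 1
    while pos <= part_len:
        positions.append(pos + offset)
        # Odd-numbered steps use step_odd, even-numbered steps use step_even
        pos += step_odd if step_index % 2 == 1 else step_even
        step_index += 1
    return positions


# 20-residue Part that begins at sequence position 21, alternating steps of 4 and 3
print(periodic_positions(20, 4, 3, start=1, offset=20))  # [21, 25, 28, 32, 35, 39]
```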

For Patterns, the step sizes, the minimum and maximum number of steps, and the maximum residue size of the pattern can be adjusted using the steps_pattern (default=[3, 4]), n_min (default=2), n_max (default=4), and len_max (default=15) parameters:

split_kws = sf.get_split_kws(split_types="Pattern", steps_pattern=[5, 10], n_min=3, n_max=5, len_max=30)
split_kws
{'Pattern': {'steps': [5, 10], 'n_min': 3, 'n_max': 5, 'len_max': 30}}
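
One way to read these four constraints is as a validity check on a list of pattern positions. The sketch below assumes that n_min and n_max bound the number of selected positions, that every gap between neighboring positions must be one of the allowed steps, and that the last position may not exceed len_max. The helper is_valid_pattern() is hypothetical and only illustrates this interpretation.

```python
def is_valid_pattern(positions, steps=(3, 4), n_min=2, n_max=4, len_max=15):
    """Check a list of ascending 1-based positions against the Pattern constraints."""
    if not positions or not n_min <= len(positions) <= n_max:
        return False
    if positions[0] < 1 or positions[-1] > len_max:
        return False
    # Every gap between neighboring positions must be an allowed step size
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return all(gap in steps for gap in gaps)


print(is_valid_pattern([2, 6, 10, 13]))     # True: gaps 4, 4, 3 with default settings
print(is_valid_pattern([2, 5, 9, 13, 16]))  # False: five positions and 16 > len_max
```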

Combining Parts + Splits + Scales

Any combination of the three feature components can be provided to CPP, which will create all Part-Split-Scale combinations and filter them down to a user-defined number (default=100) of non-redundant features through the CPP.run() method:

# Load default scales, parts, and splits
df_scales = aa.load_scales()
df_parts = sf.get_df_parts(df_seq=df_seq)
split_kws = sf.get_split_kws()

# Get labels for test and reference class
labels = df_seq["label"].to_list()

# Create CPP object by providing three feature components
cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, df_scales=df_scales)
df_feat = cpp.run(labels=labels)

aa.display_df(df=df_feat, show_shape=True)
DataFrame shape: (100, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_C_JMD_C-Seg...,15)-AURR980107 Conformation α-helix (N-term) α-helix (N-terminal, inside) Normalized posi...ora-Rose, 1998) 0.268000 0.136000 -0.136000 0.143000 0.152000 0.000000 0.000045 37
2 TMD-Segment(2,12)-PALJ810113 Conformation α-helix (left-handed) β-turn (α class) Normalized freq...u et al., 1981) 0.263000 0.144000 0.144000 0.152000 0.132000 0.000000 0.000040 12,13
3 TMD-Segment(1,6)-TANS770107 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1977) 0.258000 0.101000 0.101000 0.113000 0.089000 0.000000 0.000042 11,12,13
4 TMD-PeriodicPat...3,2)-HUTJ700103 Energy Entropy Entropy Entropy of form...Hutchens, 1970) 0.257000 0.065000 -0.065000 0.065000 0.074000 0.000000 0.000030 11,15,18,22,25,29
5 JMD_N_TMD_N-Pat...,13)-KARS160119 Shape Graph (1. eigenvalue) Eigenvalue (maximum) Weighted maximu...-Knisley, 2016) 0.256000 0.115000 -0.115000 0.130000 0.120000 0.000000 0.000029 2,6,10,13
6 JMD_N_TMD_N-Seg...1,4)-ROBB760102 Conformation α-helix (N-term) α-helix (N-terminal) Information mea...n-Suzuki, 1976) 0.256000 0.086000 -0.086000 0.108000 0.086000 0.000000 0.000030 1,2,3,4,5
7 TMD-Segment(1,6)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.254000 0.097000 -0.097000 0.124000 0.098000 0.000000 0.000021 11,12,13
8 TMD_C_JMD_C-Seg...4,4)-RICJ880105 Conformation α-helix (N-term) α-helix (N-terminal) Relative prefer...chardson, 1988) 0.253000 0.090000 -0.090000 0.082000 0.111000 0.000000 0.000020 36,37,38,39,40
9 JMD_N_TMD_N-Seg...1,3)-FAUJ880112 Energy Charge (negative) Charge (negative) Negative charge...e et al., 1988) 0.253000 0.086000 -0.086000 0.081000 0.104000 0.000000 0.000020 1,2,3,4,5,6
10 TMD_C_JMD_C-Seg...1,2)-SNEP660104 Others PC 4 Principal Component 4 (Sneath) Principal compo... (Sneath, 1966) 0.251000 0.069000 0.069000 0.078000 0.064000 0.000000 0.000023 21,22,23,24,25,26,27,28,29,30
11 TMD-Segment(1,6)-SUEM840101 Structure-Activity Stability (helix-coil) Stability (helix-coil) Zimm-Bragg para...i et al., 1984) 0.249000 0.090000 -0.090000 0.114000 0.084000 0.000000 0.000025 11,12,13
12 TMD_C_JMD_C-Seg...5,5)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.249000 0.069000 0.069000 0.078000 0.068000 0.000000 0.000026 37,38,39,40
13 JMD_N_TMD_N-Seg...2,5)-RICJ880104 Conformation Unclassified (Conformation) α-helix (N-terminal, inside) Relative prefer...chardson, 1988) 0.247000 0.090000 -0.090000 0.090000 0.100000 0.000000 0.000026 5,6,7,8
14 TMD_C_JMD_C-Seg...,12)-WOEC730101 Polarity Hydrophilicity Polarity (hydrophilicity) Polar requireme...t (Woese, 1973) 0.245000 0.115000 -0.115000 0.164000 0.131000 0.000000 0.000027 37,38
15 TMD_C_JMD_C-Seg...4,4)-LAWE840101 Polarity Hydrophobicity Transfer free e...TFE) to outside Transfer free e...n et al., 1984) 0.244000 0.072000 0.072000 0.081000 0.078000 0.000000 0.000030 36,37,38,39,40
16 TMD_C_JMD_C-Pat...,14)-LINS030103 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA Hydrophilic acc...s et al., 2003) 0.243000 0.137000 -0.137000 0.158000 0.154000 0.000000 0.000033 27,30,33,36
17 TMD-Segment(1,6)-QIAN880131 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.242000 0.076000 0.076000 0.107000 0.091000 0.000000 0.000034 11,12,13
18 JMD_N_TMD_N-Pat...,13)-LEVM760104 Shape Shape and Surface Side chain angle (Phi) Side chain tors... (Levitt, 1976) 0.241000 0.094000 0.094000 0.123000 0.097000 0.000000 0.000034 2,6,10,13
19 TMD-Segment(1,6)-BULH740101 Composition MPs (anchor) TFE to surface Transfer free e...l-Breese, 1974) 0.240000 0.111000 0.111000 0.150000 0.122000 0.000000 0.000036 11,12,13
20 TMD-Segment(1,6)-CIDH920103 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.235000 0.089000 -0.089000 0.127000 0.096000 0.000000 0.000044 11,12,13
21 TMD_C_JMD_C-Seg...1,1)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.233000 0.073000 -0.073000 0.091000 0.078000 0.000000 0.000049 21,22,23,24,25,...,36,37,38,39,40
22 TMD_C_JMD_C-Pat...,14)-SNEP660102 Others PC 2 Principal Component 2 (Sneath) Principal compo... (Sneath, 1966) 0.230000 0.121000 0.121000 0.133000 0.149000 0.000000 0.000057 27,30,33,36
23 JMD_N_TMD_N-Seg...2,6)-ROBB760111 Conformation β-turn (C-term) β-turn (C-terminal) Information mea...n-Suzuki, 1976) 0.230000 0.093000 0.093000 0.106000 0.104000 0.000000 0.000057 4,5,6
24 TMD-PeriodicPat...4,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.228000 0.070000 -0.070000 0.091000 0.072000 0.000000 0.000061 12,16,20,24,28
25 JMD_N_TMD_N-Pat...,13)-SUEM840102 Structure-Activity Unclassified (S...cture-Activity) Stability (extended-coil) Zimm-Bragg para...i et al., 1984) 0.227000 0.122000 -0.122000 0.156000 0.151000 0.000000 0.000064 2,5,9,13
26 JMD_N_TMD_N-Seg...7,9)-RACS820103 Conformation Unclassified (Conformation) α-helix (left-handed) Average relativ...Scheraga, 1982) 0.225000 0.139000 -0.139000 0.184000 0.163000 0.000000 0.000068 14,15
27 TMD-Segment(1,6)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.225000 0.090000 0.090000 0.132000 0.107000 0.000000 0.000068 11,12,13
28 TMD-Segment(1,6)-PARJ860101 Others PC 5 HPLC parameter HPLC parameter ...r et al., 1986) 0.225000 0.086000 0.086000 0.130000 0.100000 0.000000 0.000072 11,12,13
29 TMD_C_JMD_C-Seg...2,4)-VASM830101 Conformation Unclassified (Conformation) α-helix Relative popula...z et al., 1983) 0.225000 0.071000 -0.071000 0.091000 0.084000 0.000000 0.000073 26,27,28,29,30
30 JMD_N_TMD_N-Pat...6,9)-KARS160119 Shape Graph (1. eigenvalue) Eigenvalue (maximum) Weighted maximu...-Knisley, 2016) 0.224000 0.116000 -0.116000 0.158000 0.129000 0.000000 0.000074 2,6,9
31 TMD_C_JMD_C-Seg...,13)-AURR980106 Conformation α-helix (N-term) α-helix (N-terminal) Normalized posi...ora-Rose, 1998) 0.224000 0.081000 -0.081000 0.099000 0.135000 0.000000 0.000073 36
32 JMD_N_TMD_N-Pat...,12)-RACS820103 Conformation Unclassified (Conformation) α-helix (left-handed) Average relativ...Scheraga, 1982) 0.223000 0.145000 -0.145000 0.187000 0.148000 0.000000 0.000077 9,12,15,18
33 TMD-Segment(4,12)-RACS820103 Conformation Unclassified (Conformation) α-helix (left-handed) Average relativ...Scheraga, 1982) 0.223000 0.127000 -0.127000 0.171000 0.148000 0.000001 0.000081 16
34 TMD-Segment(3,7)-RACS820103 Conformation Unclassified (Conformation) α-helix (left-handed) Average relativ...Scheraga, 1982) 0.223000 0.100000 -0.100000 0.136000 0.113000 0.000000 0.000077 16,17,18
35 TMD-Segment(6,11)-FUKS010101 Composition AA composition (surface) Proteins of thermophiles (INT) Surface composi...ishikawa, 2001) 0.222000 0.084000 -0.084000 0.097000 0.104000 0.000001 0.000081 20
36 JMD_N_TMD_N-Seg...,15)-LINS030116 ASA/Volume Accessible surface area (ASA) ASA (folded β-strand) Total median ac...s et al., 2003) 0.221000 0.097000 -0.097000 0.119000 0.145000 0.000001 0.000089 1
37 TMD-Segment(1,6)-VINM940102 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Normalized flex...n et al., 1994) 0.221000 0.094000 0.094000 0.134000 0.104000 0.000001 0.000086 11,12,13
38 TMD-Segment(1,6)-NAKH900112 Composition Membrane proteins (MPs) Membrane proteins Transmembrane r...a et al., 1990) 0.221000 0.073000 -0.073000 0.108000 0.095000 0.000001 0.000086 11,12,13
39 TMD_C_JMD_C-Seg...,14)-DAYM780201 Others Mutability Mutability Relative mutabi... et al., 1978b) 0.220000 0.119000 -0.119000 0.161000 0.124000 0.000001 0.000091 39,40
40 JMD_N_TMD_N-Pat...,14)-OOBM770105 Energy Non-bonded energy Non-bonded energy per residue Short and mediu...take-Ooi, 1977) 0.220000 0.076000 0.076000 0.100000 0.094000 0.000001 0.000091 2,6,10,14
41 TMD-Segment(1,5)-QIAN880119 Conformation β-sheet β-sheet Weights for bet...ejnowski, 1988) 0.220000 0.065000 -0.065000 0.082000 0.082000 0.000001 0.000091 11,12,13,14
42 JMD_N_TMD_N-Pat...2,6)-FASG760101 ASA/Volume Volume Weight Molecular weigh... (Fasman, 1976) 0.219000 0.112000 -0.112000 0.148000 0.128000 0.000001 0.000094 2,6
43 TMD_C_JMD_C-Pat...,14)-JACR890101 Polarity Hydrophobicity (surrounding) Hydrophobicity (surrounding) Weights from th...bs-White, 1989) 0.219000 0.104000 0.104000 0.144000 0.141000 0.000001 0.000096 27,30,33,36
44 JMD_N_TMD_N-Pat...,13)-PALJ810109 Conformation α-helix α-helix Normalized freq...u et al., 1981) 0.218000 0.115000 -0.115000 0.132000 0.143000 0.000001 0.000104 3,6,9,13
45 TMD-Segment(1,6)-RACS820113 Shape Unclassified (Shape) Side chain angle (theta) Value of theta(...Scheraga, 1982) 0.218000 0.074000 0.074000 0.103000 0.076000 0.000001 0.000100 11,12,13
46 TMD-PeriodicPat...4,3)-FINA770101 Structure-Activity Stability (helix-coil) Stability (helix-coil) Helix-coil equi...-Ptitsyn, 1977) 0.218000 0.068000 -0.068000 0.085000 0.072000 0.000001 0.000103 12,16,20,24,28
47 JMD_N_TMD_N-Seg...1,2)-TANS770107 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1977) 0.218000 0.066000 0.066000 0.092000 0.063000 0.000001 0.000102 1,2,3,4,5,6,7,8,9,10
48 TMD-Segment(1,6)-VENT840101 Others Unclassified (Others) Bitterness Bitterness (Venanzi, 1984) 0.217000 0.125000 -0.125000 0.189000 0.160000 0.000001 0.000108 11,12,13
49 JMD_N_TMD_N-Seg...,15)-FASG760105 Polarity Unclassified (Polarity) pK-C pK-C (Fasman, 1976) 0.217000 0.093000 0.093000 0.117000 0.119000 0.000001 0.000104 1
50 JMD_N_TMD_N-Pat...,13)-HUTJ700103 Energy Entropy Entropy Entropy of form...Hutchens, 1970) 0.217000 0.082000 -0.082000 0.106000 0.100000 0.000001 0.000105 2,6,10,13
51 JMD_N_TMD_N-Pat...2,6)-JUNJ780101 Composition AA composition AA composition Sequence freque... (Jungck, 1978) 0.216000 0.123000 0.123000 0.165000 0.150000 0.000001 0.000111 2,6
52 TMD_C_JMD_C-Seg...4,4)-CHAM830107 Energy Charge (negative) Charge (transfer) A parameter of ...-Charton, 1983) 0.216000 0.117000 -0.117000 0.140000 0.153000 0.000001 0.000111 36,37,38,39,40
53 TMD_C_JMD_C-Pat...,14)-GEIM800103 Conformation Unclassified (Conformation) α-helix (β-proteins) Alpha-helix ind...-Roberts, 1980) 0.215000 0.103000 -0.103000 0.140000 0.119000 0.000001 0.000123 27,31,35,38
54 JMD_N_TMD_N-Seg...2,4)-BULH740101 Composition MPs (anchor) TFE to surface Transfer free e...l-Breese, 1974) 0.214000 0.098000 0.098000 0.162000 0.114000 0.000001 0.000126 6,7,8,9,10
55 JMD_N_TMD_N-Seg...,14)-KHAG800101 Others Unclassified (Others) Kerr-constant The Kerr-consta...an-Moore, 1980) 0.214000 0.081000 0.081000 0.119000 0.094000 0.000001 0.000127 1
56 JMD_N_TMD_N-Pat...2,5)-KARS160119 Shape Graph (1. eigenvalue) Eigenvalue (maximum) Weighted maximu...-Knisley, 2016) 0.213000 0.129000 -0.129000 0.184000 0.133000 0.000001 0.000131 2,5
57 TMD_C_JMD_C-Pat...,11)-KARS160112 Shape Graph (2. eigenvalue) Eigenvalue (2. smallest) Second smallest...-Knisley, 2016) 0.213000 0.125000 0.125000 0.157000 0.162000 0.000001 0.000131 30,33,37
58 JMD_N_TMD_N-Seg...,11)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.213000 0.118000 -0.118000 0.180000 0.134000 0.000002 0.000132 8,9
59 TMD_C_JMD_C-Per...3,2)-KARS160112 Shape Graph (2. eigenvalue) Eigenvalue (2. smallest) Second smallest...-Knisley, 2016) 0.213000 0.071000 0.071000 0.090000 0.086000 0.000001 0.000132 22,25,28,31,34,37,40
60 JMD_N_TMD_N-Per...3,2)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.213000 0.067000 -0.067000 0.099000 0.091000 0.000001 0.000131 1,5,8,12,15,19
61 TMD-Segment(1,6)-FAUJ880101 Shape Steric parameter Shape Index Graph shape ind...e et al., 1988) 0.213000 0.064000 -0.064000 0.095000 0.079000 0.000002 0.000130 11,12,13
62 JMD_N_TMD_N-Seg...,15)-WOLS870102 Others PC 3 Principal Component 2 (Wold) Principal prope...d et al., 1987) 0.212000 0.099000 -0.099000 0.131000 0.091000 0.000002 0.000138 1
63 TMD_C_JMD_C-Pat...,14)-FAUJ880113 Polarity Unclassified (Polarity) pK-C pK-a(RCOOH) (Fa...e et al., 1988) 0.212000 0.077000 -0.077000 0.127000 0.089000 0.000002 0.000136 27,30,33,37
64 JMD_N_TMD_N-Pat...,13)-AURR980106 Conformation α-helix (N-term) α-helix (N-terminal) Normalized posi...ora-Rose, 1998) 0.212000 0.071000 -0.071000 0.092000 0.108000 0.000002 0.000135 3,6,9,13
65 JMD_N_TMD_N-Seg...1,9)-VASM830101 Conformation Unclassified (Conformation) α-helix Relative popula...z et al., 1983) 0.211000 0.114000 -0.114000 0.160000 0.136000 0.000002 0.000144 1,2
66 JMD_N_TMD_N-Pat...,14)-CHAM820101 Polarity Amphiphilicity Polarizability Polarizability ...-Charton, 1982) 0.211000 0.072000 -0.072000 0.110000 0.095000 0.000002 0.000144 2,6,10,14
67 TMD-Segment(4,12)-KARS160120 Shape Unclassified (Shape) Eigenvalue (minimum) Weighted minimu...-Knisley, 2016) 0.211000 0.065000 0.065000 0.099000 0.121000 0.000002 0.000143 16
68 JMD_N_TMD_N-Seg...,12)-NADH010107 Polarity Unclassified (Polarity) Hydrophobicity ...rmation values) Hydropathy scal...h et al., 2001) 0.210000 0.089000 -0.089000 0.101000 0.126000 0.000002 0.000147 1
69 JMD_N_TMD_N-Pat...,11)-QIAN880127 Conformation Coil (N-term) Coil (N-terminal) Weights for coi...ejnowski, 1988) 0.209000 0.083000 -0.083000 0.119000 0.114000 0.000002 0.000156 2,5,8,11
70 TMD-Segment(4,7)-FUKS010101 Composition AA composition (surface) Proteins of thermophiles (INT) Surface composi...ishikawa, 2001) 0.209000 0.064000 -0.064000 0.084000 0.090000 0.000002 0.000157 19,20,21
71 JMD_N_TMD_N-Pat...,13)-VASM830101 Conformation Unclassified (Conformation) α-helix Relative popula...z et al., 1983) 0.208000 0.115000 -0.115000 0.157000 0.142000 0.000003 0.000162 3,6,9,13
72 TMD_C_JMD_C-Pat...,14)-FASG760103 Others Unclassified (Others) Optical rotation Optical rotatio... (Fasman, 1976) 0.207000 0.097000 -0.097000 0.149000 0.092000 0.000003 0.000165 27,30,33,37
73 JMD_N_TMD_N-Pat...,13)-WERD780102 Energy Unclassified (Energy) Free energy change Free energy cha...Scheraga, 1978) 0.207000 0.076000 0.076000 0.117000 0.087000 0.000003 0.000170 6,9,13
74 JMD_N_TMD_N-Seg...2,4)-PLIV810101 ASA/Volume Partial specific volume Partition coefficient Partition Coeff...a et al., 1981) 0.205000 0.097000 -0.097000 0.165000 0.115000 0.000004 0.000185 6,7,8,9,10
75 TMD-Pattern(N,4,7)-LEVM760106 ASA/Volume Volume Volume van der Waals p... (Levitt, 1976) 0.204000 0.132000 -0.132000 0.187000 0.165000 0.000004 0.000197 14,17
76 JMD_N_TMD_N-Pat...,11)-ROBB760111 Conformation β-turn (C-term) β-turn (C-terminal) Information mea...n-Suzuki, 1976) 0.204000 0.088000 0.088000 0.129000 0.108000 0.000004 0.000203 1,4,8,11
77 JMD_N_TMD_N-Pat...,14)-WOLS870102 Others PC 3 Principal Component 2 (Wold) Principal prope...d et al., 1987) 0.204000 0.077000 -0.077000 0.115000 0.103000 0.000004 0.000195 2,6,10,14
78 TMD-Pattern(N,3,6)-FAUJ880113 Polarity Unclassified (Polarity) pK-C pK-a(RCOOH) (Fa...e et al., 1988) 0.204000 0.071000 -0.071000 0.134000 0.110000 0.000004 0.000194 13,16
79 JMD_N_TMD_N-Seg...2,4)-QIAN880114 Conformation β-sheet (N-term) β-sheet (N-terminal) Weights for bet...ejnowski, 1988) 0.203000 0.080000 0.080000 0.125000 0.088000 0.000004 0.000208 6,7,8,9,10
80 JMD_N_TMD_N-Seg...2,4)-MANP780101 Polarity Hydrophobicity (surrounding) Surrounding hydrophobicity Average surroun...nnuswamy, 1978) 0.203000 0.078000 -0.078000 0.142000 0.105000 0.000005 0.000208 6,7,8,9,10
81 TMD_C_JMD_C-Per...3,1)-SNEP660102 Others PC 2 Principal Component 2 (Sneath) Principal compo... (Sneath, 1966) 0.203000 0.071000 0.071000 0.081000 0.100000 0.000004 0.000207 21,25,28,32,35,39
82 TMD_C_JMD_C-Seg...,12)-LINS030121 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded β-strand) % Hydrophilic a...s et al., 2003) 0.202000 0.108000 -0.108000 0.198000 0.155000 0.000005 0.000218 37,38
83 JMD_N_TMD_N-Pat...,14)-YUTK870104 Energy Free energy (unfolding) Free energy (unfolding) Activation Gibb...i et al., 1987) 0.202000 0.087000 0.087000 0.097000 0.160000 0.000005 0.000218 7,11,14,17
84 TMD-Segment(1,6)-RACS770101 Shape Reduced distance Reduced distance (C-α) Average reduced...Scheraga, 1977) 0.202000 0.082000 0.082000 0.135000 0.109000 0.000005 0.000217 11,12,13
85 JMD_N_TMD_N-Seg...2,6)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.201000 0.075000 0.075000 0.106000 0.107000 0.000006 0.000236 4,5,6
86 JMD_N_TMD_N-Seg...3,7)-PALJ810113 Conformation α-helix (left-handed) β-turn (α class) Normalized freq...u et al., 1981) 0.200000 0.090000 0.090000 0.144000 0.123000 0.000006 0.000244 6,7,8
87 TMD_C_JMD_C-Pat...,14)-AURR980111 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.199000 0.102000 -0.102000 0.139000 0.137000 0.000007 0.000261 27,30,33,37
88 JMD_N_TMD_N-Pat...,14)-GEIM800103 Conformation Unclassified (Conformation) α-helix (β-proteins) Alpha-helix ind...-Roberts, 1980) 0.199000 0.096000 0.096000 0.123000 0.149000 0.000007 0.000264 7,11,14,17
89 TMD_C_JMD_C-Pat...3,6)-GEOR030109 Conformation Linker (>14 AA) Linker (Non-Helical) Linker propensi...-Heringa, 2003) 0.199000 0.092000 0.092000 0.182000 0.104000 0.000007 0.000260 35,38
90 TMD_C_JMD_C-Per...3,1)-LINS030118 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded β-strand) Hydrophilic med...s et al., 2003) 0.199000 0.081000 -0.081000 0.097000 0.117000 0.000007 0.000254 21,25,28,32,35,39
91 TMD-Pattern(N,3,6)-MUNV940101 Energy Free energy (folding) Free energy (α-helix) Free energy in ...-Serrano, 1994) 0.199000 0.079000 0.079000 0.132000 0.117000 0.000007 0.000253 13,16
92 JMD_N_TMD_N-Pat...,14)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.199000 0.076000 -0.076000 0.101000 0.125000 0.000007 0.000260 7,11,14,17
93 JMD_N_TMD_N-Seg...2,6)-OOBM850105 Structure-Activity Flexibility Side chain interaction Optimized side ...e et al., 1985) 0.198000 0.065000 0.065000 0.099000 0.083000 0.000008 0.000272 4,5,6
94 JMD_N_TMD_N-Seg...,14)-CHAM830102 Conformation Unclassified (Conformation) β-sheet A parameter def...-Charton, 1983) 0.197000 0.102000 0.102000 0.129000 0.159000 0.000009 0.000284 3,4
95 TMD_C_JMD_C-Seg...7,8)-WERD780104 Energy Unclassified (Energy) Free energy (α-helix) Free energy cha...Scheraga, 1978) 0.197000 0.088000 0.088000 0.125000 0.118000 0.000009 0.000277 36,37
96 JMD_N_TMD_N-Pat...,10)-ROBB760102 Conformation α-helix (N-term) α-helix (N-terminal) Information mea...n-Suzuki, 1976) 0.196000 0.132000 -0.132000 0.152000 0.196000 0.000010 0.000292 3,7,10
97 JMD_N_TMD_N-Seg...2,4)-WOLS870101 Others PC 5 Principal Component 1 (Wold) Principal prope...d et al., 1987) 0.196000 0.083000 0.083000 0.152000 0.098000 0.000009 0.000283 6,7,8,9,10
98 TMD-Pattern(N,1...,10)-QIAN880114 Conformation β-sheet (N-term) β-sheet (N-terminal) Weights for bet...ejnowski, 1988) 0.195000 0.098000 0.098000 0.131000 0.147000 0.000011 0.000309 11,14,17,20
99 JMD_N_TMD_N-Seg...,15)-CHOC760104 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.194000 0.118000 0.118000 0.170000 0.153000 0.000011 0.000314 1
100 TMD-Pattern(N,4,7)-JOND750101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity (Jones, 1975) 0.193000 0.130000 -0.130000 0.179000 0.176000 0.000014 0.000337 14,17
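
The feature column encodes the three components in a single identifier. A minimal sketch for taking such an identifier apart is given below; it assumes the PART-SPLIT-SCALE layout seen in the table, with no hyphens inside any component. The helper parse_feature() is hypothetical, and identifiers that are truncated for display (containing "...") cannot be parsed this way.

```python
def parse_feature(feature):
    """Split a feature ID such as 'TMD-Segment(2,12)-PALJ810113' into its components."""
    part, split, scale = feature.split("-")
    # 'Segment(2,12)' -> split type 'Segment' and arguments ['2', '12']
    split_type, args = split.rstrip(")").split("(")
    return {
        "part": part,
        "split_type": split_type,
        "split_args": args.split(","),
        "scale": scale,
    }


print(parse_feature("TMD-Segment(2,12)-PALJ810113"))
print(parse_feature("TMD-Pattern(N,4,7)-LEVM760106"))
```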

Further information on the CPP feature concept can be found in the CPP Usage Principles section.