CPP.simplify

CPP.simplify(df_feat=None, labels=None, strategy='greedy', candidate_search='exact', max_interpret_grade=None, min_cor=0.7, ml_model='svm', ml_metric='balanced_accuracy', ml_th=0.0, ml_cv=5, allow_drop=True, on_unimprovable='keep', redundancy_tie_break='interpretability', label_test=1, label_ref=0, max_std_test=0.2, max_cor=0.5, max_overlap=0.5, check_cat=True, return_details=False)

Simplify a feature set by swapping scales for more interpretable correlated ones.

For each feature (PART-SPLIT-SCALE), the scale is replaced by a correlated scale from a more interpretable AAontology subcategory, while PART and SPLIT are kept. Interpretability is graded 1-10 (1 = best); the per-subcategory grades are available via load_scales(name='subcat'). The swapped feature’s statistics are recomputed, and a swap is accepted only if it passes CPP’s per-feature filtering (max_std_test) and a cross-validation gate (the score may fall at most ml_th below that of the current set). The swapped set is then redundancy-reduced, yielding a more interpretable and ideally smaller df_feat. The candidate pool (the full rated AAontology scale set) is loaded internally.

Added in version 1.1.0.
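The acceptance rule described above can be sketched as a small predicate. This is a minimal illustration, not the library's implementation: accept_swap and its argument names are hypothetical, and treating both thresholds as inclusive bounds is an assumption.

```python
def accept_swap(std_test, cv_score, cv_baseline, max_std_test=0.2, ml_th=0.0):
    """Return True if a candidate swap passes both gates (illustrative only)."""
    # Gate 1: per-feature filter on the recomputed std of the test group
    # (assumed here to be an inclusive upper bound).
    passes_filter = std_test <= max_std_test
    # Gate 2: the CV score may fall at most ml_th below the current baseline.
    passes_cv = cv_score >= cv_baseline - ml_th
    return passes_filter and passes_cv


# Same score as the baseline: accepted with the default ml_th=0.0.
same_score = accept_swap(std_test=0.15, cv_score=0.80, cv_baseline=0.80)
# Slightly worse score: rejected by default, accepted with a tolerance.
worse_strict = accept_swap(std_test=0.15, cv_score=0.78, cv_baseline=0.80)
worse_tolerant = accept_swap(std_test=0.15, cv_score=0.78, cv_baseline=0.80, ml_th=0.05)
# Better score but too variable in the test group: rejected by the filter.
too_variable = accept_swap(std_test=0.25, cv_score=0.90, cv_baseline=0.80)
print(same_score, worse_strict, worse_tolerant, too_variable)
```

With strategy='swap_all' the second gate does not apply, since that strategy skips cross-validation.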

Parameters:
  • df_feat (pd.DataFrame, shape (n_features, n_feature_info)) – Feature DataFrame from run() (the standardized CPP output schema).

  • labels (array-like, shape (n_samples,)) – Class labels for the samples in the sequence DataFrame (typically test=1, reference=0).

  • strategy ({'greedy', 'consolidate', 'swap_all'}, default='greedy') –

    How candidate swaps are chosen and validated (see Notes for full behavior):

    • 'greedy': per-feature — each targeted feature is swapped to its best candidate that keeps the cross-validation (CV) score within ml_th.

    • 'consolidate': set-level — funnels features into the fewest interpretable subcategories, keeping each batch swap only if the set CV score holds.

    • 'swap_all': apply every eligible best-candidate swap with no CV gate (fastest; ml_model / ml_metric / ml_th / ml_cv ignored).

  • candidate_search ({'exact', 'fast'}, default='exact') – How many candidate scales are evaluated per feature. 'exact' (default) tests every eligible candidate and reproduces the original result exactly. 'fast' is an approximate speed-up that caps the search to the most promising candidates per feature (highest interpretability, then strongest correlation). Because it can change which features are kept, it is most useful on large scale pools, where the speed-up matters. If none of the searched candidates is accepted, the feature keeps its original scale (it is never dropped for this reason), so 'fast' may leave a feature un-simplified that 'exact' would have swapped. The speed-up is concentrated in strategy='greedy' (one cross-validation per candidate tried); 'consolidate' gains less, and 'swap_all' is unaffected (it already stops at the first viable candidate). With return_details=True, the df_candidates report is correspondingly shorter under 'fast'.

  • max_interpret_grade (int, optional) – The maximum (worst) interpretability grade kept (1-10, where grade 1 is the best / most interpretable, so lower is better). Every feature whose scale subcategory is graded worse (higher) than this is targeted for replacement. If None (default), every improvable feature is attempted.

  • min_cor (float, default=0.7) – Minimum absolute Pearson correlation between a candidate scale and the original scale (between 0 and 1); anti-correlation is allowed via the absolute value.

  • ml_model (str or sklearn estimator, default='svm') – Model for the cross-validation gate ('greedy' / 'consolidate'). A string preset 'svm' (default; fast), 'rf' (recommended for non-linear feature relationships, but slower), or 'log_reg' (fastest); or any configured scikit-learn classifier instance (e.g. SVC(kernel='poly', C=0.1)), used as-is.

  • ml_metric (str, default='balanced_accuracy') – Scoring metric for the CV gate (any scikit-learn classification scorer name).

  • ml_th (float, default=0.0) – CV-gate tolerance (>=0): a swap is accepted if its CV score is at least baseline - ml_th.

  • ml_cv (int, default=5) – Number of cross-validation folds (at least 2 and at most the number of samples in the smallest class).

  • allow_drop (bool, default=True) – Whether simplify may drop features. If False, it only swaps scales and never removes a feature, so the output keeps every input feature 1:1 (the redundancy reduction is skipped and on_unimprovable is forced to 'keep').

  • on_unimprovable (str, default='keep') – What to do with a targeted feature that cannot be improved: 'keep' (retain the original), 'drop' (remove it), or 'drop_if_perf_allows' (remove only if the CV score does not drop). The last feature is never dropped.

  • redundancy_tie_break (str, default='interpretability') – Which of two redundant swapped features to keep: 'interpretability' keeps the one with the better interpretability grade (ties broken by abs_auc); 'performance' keeps the one with the higher abs_auc.

  • label_test (int, default=1) – Class label of the test group in labels.

  • label_ref (int, default=0) – Class label of the reference group in labels.

  • max_std_test (float, default=0.2) – Per-feature pre-filter threshold a swapped feature must satisfy (between 0 and 1).

  • max_cor (float, default=0.5) – Redundancy correlation threshold for the post-swap reduction (between 0 and 1).

  • max_overlap (float, default=0.5) – Redundancy position-overlap threshold for the post-swap reduction (between 0 and 1).

  • check_cat (bool, default=True) – Whether the redundancy reduction only compares features within the same scale category.

  • return_details (bool, default=False) – If True, also return a long-form df_candidates reporting every candidate considered (scale, interpretability, correlation, recomputed std, accepted-flag).
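How min_cor and the interpretability grade interact when candidates are selected can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's code: pearson and rank_candidates are hypothetical helpers, the scale ids and values are made-up toy data, and the ordering (best grade first, then strongest absolute correlation) is inferred from the candidate_search description.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_candidates(original, candidates, original_grade, min_cor=0.7):
    """Filter and order candidate scales for one feature (illustrative only).

    candidates maps a scale id to (grade, values); grade 1 is the best.
    """
    eligible = []
    for scale_id, (grade, values) in candidates.items():
        r = pearson(original, values)
        # Keep only better-graded scales whose |r| reaches min_cor;
        # anti-correlated scales qualify through the absolute value.
        if grade < original_grade and abs(r) >= min_cor:
            eligible.append((scale_id, grade, r))
    # Best interpretability first, then strongest absolute correlation.
    return sorted(eligible, key=lambda c: (c[1], -abs(c[2])))


# Toy scale values (hypothetical, five residues only).
original = [0.1, 0.4, 0.5, 0.9, 1.0]
candidates = {
    "SCALE_A": (2, [0.2, 0.5, 0.6, 1.0, 1.1]),  # strongly correlated
    "SCALE_B": (1, [1.0, 0.7, 0.5, 0.2, 0.0]),  # strongly anti-correlated
    "SCALE_C": (1, [0.5, 0.1, 0.9, 0.3, 0.6]),  # weakly correlated, filtered out
    "SCALE_D": (5, [0.2, 0.5, 0.6, 1.0, 1.1]),  # graded worse than the original
}
ranked = rank_candidates(original, candidates, original_grade=4)
for scale_id, grade, r in ranked:
    print(scale_id, grade, round(r, 3))
```

In this toy case the anti-correlated SCALE_B ranks first because its grade is better, which is the situation in which the sign of mean_dif flips after the swap.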

Returns:

  • df_feat (pd.DataFrame) – The simplified feature DataFrame (CPP output schema), with swapped scales, recomputed statistics, and redundant features removed.

  • df_candidates (pd.DataFrame) – Returned only if return_details=True: one row per candidate considered.

Notes

  • The CV-gate model is seeded from the CPP instance’s random_state; set it once via aa.CPP(..., random_state=...) for a reproducible result.

  • Redundancy reduction protects original features — it never drops a feature the user already had, it only removes a swapped feature when the swap made it redundant with a kept feature (using signed correlation, matching run()).

  • The strategy controls how swaps are chosen and validated:

    • 'greedy': per-feature. Each targeted feature is swapped to its best correlated candidate that keeps the cross-validation score within ml_th of the current set; otherwise the next candidate is tried. Each swap is individually justified.

    • 'consolidate': set-level. Interpretable subcategories are taken best-first, and every targeted feature that can move into the current subcategory is swapped as one batch, which is kept only if the set CV score stays within ml_th. This funnels features into the fewest subcategories.

    • 'swap_all': apply every eligible best-candidate swap with no cross-validation (fastest); ml_model / ml_metric / ml_th / ml_cv are ignored. This is a pure interpretability transform that you evaluate yourself afterwards.

  • Features whose scale is not a rated AAontology scale (e.g. run_num pseudo-scales or unclassified scales) carry no interpretability grade and are skipped. If no feature is rated, df_feat is returned unchanged with a RuntimeWarning.

  • An anti-correlated swap flips the sign of mean_dif (the feature still discriminates); the correlation sign is reported in df_candidates.
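The per-feature control flow of the 'greedy' strategy in the notes above can be sketched as follows. This is a hedged illustration only: greedy_simplify, toy_quality, and toy_cv_score are hypothetical names, and the scorer is a toy stand-in for the real cross-validation gate.

```python
def greedy_simplify(features, candidates, cv_score, ml_th=0.0):
    """Swap each feature to its first candidate that passes the CV gate.

    features: list of scale ids in the current set.
    candidates: dict mapping a scale id to its ordered candidates (best first).
    cv_score: callable scoring a whole feature list (stand-in for the CV gate).
    """
    current = list(features)
    for i, scale in enumerate(features):
        baseline = cv_score(current)
        for cand in candidates.get(scale, []):
            trial = current[:i] + [cand] + current[i + 1:]
            # Accept the first candidate whose set score stays within ml_th.
            if cv_score(trial) >= baseline - ml_th:
                current = trial
                break
        # If no candidate passes, the original scale is kept.
    return current


# Toy scorer: each scale contributes a fixed integer "quality" (hypothetical).
toy_quality = {"A": 3, "A1": 1, "A2": 3, "B": 2, "B1": 2, "C": 4}


def toy_cv_score(feature_list):
    return sum(toy_quality[f] for f in feature_list)


simplified = greedy_simplify(
    features=["A", "B", "C"],
    candidates={"A": ["A1", "A2"], "B": ["B1"]},
    cv_score=toy_cv_score,
)
print(simplified)
```

Here A1 is rejected because it lowers the set score, so the next candidate A2 is tried and accepted; C has no candidates and keeps its original scale.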

See also

  • run() for the feature DataFrame produced and its schema.

  • load_scales() for the interpretability-tiered explainable scale sets (top_explain_n).

Examples

CPP().simplify() rewrites a fitted df_feat into a more interpretable and ideally smaller one. For each feature (PART-SPLIT-SCALE), it swaps the scale for a correlated scale from a better-graded AAontology subcategory (interpretability grade 1-10, where grade 1 is the best, so lower is better), recomputes the feature statistics, and accepts the swap only if the feature still passes CPP filtering and the cross-validation score does not decrease. The swapped set is then redundancy-reduced without dropping any original feature (only a swapped feature that became redundant is removed). We start from a precomputed DOM_GSEC feature set (see [Breimann25]), which already carries feat_importance:

import aaanalysis as aa
aa.options["verbose"] = False
df_feat = aa.load_features(name="DOM_GSEC")
df_seq = aa.load_dataset(name="DOM_GSEC")
labels = df_seq["label"].to_list()
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
# Reproducibility: the CV-gate model is seeded from the CPP instance's random_state
cpp = aa.CPP(df_parts=df_parts, random_state=0)
aa.display_df(df_feat, n_rows=5, show_shape=True)
DataFrame shape: (150, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...3,4)-HUTJ700102 Energy Entropy Entropy Absolute entrop...Hutchens, 1970) 0.229000 0.098224 0.098224 0.106865 0.124608 0.000000 0.000001 31,32,33,34,35 3.111200 3.109955
5 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000

With only df_feat and labels, simplify runs the default greedy strategy with an SVM cross-validation gate and returns the simplified feature set:

df_simple = cpp.simplify(df_feat=df_feat, labels=labels)
aa.display_df(df_simple, show_shape=True)
DataFrame shape: (94, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
18 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
19 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
20 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
21 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
22 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
23 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
24 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
25 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
26 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
27 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
28 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
29 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
30 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
31 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
32 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
33 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
34 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
35 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
36 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
37 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
38 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
39 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
40 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
41 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
42 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
43 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
44 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
45 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
46 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
47 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
48 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
49 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
50 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
51 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
52 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
53 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
54 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
55 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
56 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
57 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
58 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
59 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
60 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
61 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
62 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
63 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
64 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
65 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
66 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
67 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
68 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
69 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
70 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
71 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
72 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
73 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
74 TMD_C_JMD_C-Pat...4,8)-KARS160103 Shape Side chain length Graph (weighted degree) Total weighted ...-Knisley, 2016) 0.153000 0.052948 0.052948 0.106553 0.121516 0.002988 0.002988 33,37,40 nan nan
75 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
76 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
77 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
78 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
79 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
80 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
81 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
82 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
83 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
84 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
85 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
86 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
87 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
88 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
89 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
90 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
91 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
92 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
93 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
94 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

max_interpret_grade caps the worst interpretability grade allowed to remain (1 = best). With max_interpret_grade=2 every feature graded worse than 2 is targeted for replacement; if it is None (default) every improvable feature is attempted:

df_grade = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2)
aa.display_df(df_grade, show_shape=True)
DataFrame shape: (96, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
35 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
70 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
85 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427
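
The targeting rule described above can be pictured with a small stand-alone sketch. It is not the library's implementation: select_targets, the feature ids, and the grades are hypothetical, and the None case is simplified to "any grade worse than 1", whereas the real method also needs a correlated, more interpretable candidate to exist.

```python
def select_targets(feature_grades, max_interpret_grade=None):
    """Return the features whose interpretability grade should be improved.

    feature_grades: dict mapping feature id -> grade (1 = best, 10 = worst).
    """
    if max_interpret_grade is None:
        # Assumption: without a cap, every feature that is not already at
        # the best grade (1) is treated as potentially improvable.
        return [f for f, g in feature_grades.items() if g > 1]
    # With a cap, only features graded worse than the cap are targeted.
    return [f for f, g in feature_grades.items() if g > max_interpret_grade]


# Hypothetical feature ids and grades, for illustration only
grades = {"feat_a": 1, "feat_b": 2, "feat_c": 5, "feat_d": 9}
targets_cap = select_targets(grades, max_interpret_grade=2)
targets_all = select_targets(grades)
print(targets_cap)  # ['feat_c', 'feat_d']
print(targets_all)  # ['feat_b', 'feat_c', 'feat_d']
```

A lower cap targets more features, so it generally means more candidate swaps to evaluate.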

The strategy controls how swaps are chosen and validated. greedy swaps feature by feature behind the CV gate; consolidate batches features by subcategory toward the fewest subcategories; swap_all applies every eligible swap with no cross-validation (fastest):

import pandas as pd

# Run each strategy and record the number of kept features and distinct subcategories
rows = []
for strategy in ["greedy", "consolidate", "swap_all"]:
    out = cpp.simplify(df_feat=df_feat, labels=labels, strategy=strategy)
    rows.append([strategy, len(out), out["subcategory"].nunique()])
df_strategies = pd.DataFrame(rows, columns=["strategy", "n_features", "n_subcategories"])
aa.display_df(df_strategies)
  strategy n_features n_subcategories
1 greedy 94 25
2 consolidate 92 25
3 swap_all 94 25
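
To make the greedy idea concrete, here is a minimal toy sketch of a per-feature swap loop behind a score gate. It is an illustration under stated assumptions, not the code used by simplify: greedy_swap, score, and the feature names are hypothetical, the score function stands in for a cross-validated score, and the baseline is assumed to be the score of the current (already partly swapped) set.

```python
def greedy_swap(features, candidates, score_fn, ml_th=0.0):
    """Toy greedy loop: swap one feature at a time behind a score gate.

    features: list of current feature ids.
    candidates: dict feature id -> list of replacement ids, best first.
    score_fn: callable taking a feature list and returning a score
              (a stand-in for the cross-validated score).
    """
    current = list(features)
    for i, feat in enumerate(features):
        baseline = score_fn(current)  # score of the current set
        for cand in candidates.get(feat, []):
            trial = current[:i] + [cand] + current[i + 1:]
            # Gate: keep the swap only if the score drops by at most ml_th
            if score_fn(trial) >= baseline - ml_th:
                current = trial
                break  # first passing candidate in ranked order wins
    return current


# Hypothetical per-feature contributions; the toy "score" is their mean
weights = {"f1": 0.80, "f1_simple": 0.79, "f2": 0.70, "f2_simple": 0.50}


def score(feats):
    return sum(weights[f] for f in feats) / len(feats)


cands = {"f1": ["f1_simple"], "f2": ["f2_simple"]}
strict = greedy_swap(["f1", "f2"], cands, score, ml_th=0.0)
lenient = greedy_swap(["f1", "f2"], cands, score, ml_th=0.01)
print(strict)   # ['f1', 'f2']
print(lenient)  # ['f1_simple', 'f2']
```

With ml_th=0.0 neither swap passes because both lower the toy score; a tolerance of 0.01 admits the small drop for f1 but still rejects the large drop for f2. A swap_all-style variant would skip the gate entirely.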

The candidate_search mode trades exactness for speed. exact (default) tests every eligible candidate per feature and reproduces the result exactly. fast is an opt-in heuristic that restricts the search to the most promising candidates (best interpretability first, then strongest correlation), which speeds up runs on large pools of scales. It mainly accelerates greedy, because swap_all already stops at the first viable candidate. The kept-feature set stays close to that of exact:

df_exact = cpp.simplify(df_feat=df_feat, labels=labels, candidate_search="exact")
df_fast = cpp.simplify(df_feat=df_feat, labels=labels, candidate_search="fast")
# Jaccard index of the kept feature sets: |intersection| / |union| (1.00 = identical)
kept_exact, kept_fast = set(df_exact["feature"]), set(df_fast["feature"])
jaccard = len(kept_exact & kept_fast) / len(kept_exact | kept_fast)
df_cs = pd.DataFrame([["exact", len(df_exact)], ["fast", len(df_fast)]],
                     columns=["candidate_search", "n_features"])
print(f"kept-feature Jaccard (fast vs exact): {jaccard:.2f}")
aa.display_df(df_cs)
kept-feature Jaccard (fast vs exact): 1.00
  candidate_search n_features
1 exact 94
2 fast 94
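
When the Jaccard index falls below 1.00, it helps to see which features differ between the two runs. The helper below is a small stand-alone sketch using only Python sets; compare_kept and the feature ids are hypothetical and not part of the library.

```python
def compare_kept(features_a, features_b):
    """Compare two kept-feature collections.

    Returns the Jaccard index plus the features unique to each side.
    """
    a, b = set(features_a), set(features_b)
    union = a | b
    # Two empty sets are treated as identical to avoid division by zero
    jaccard = len(a & b) / len(union) if union else 1.0
    return {
        "jaccard": jaccard,
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Hypothetical feature ids, for illustration only
report = compare_kept(["f1", "f2", "f3"], ["f2", "f3", "f4"])
print(report)  # {'jaccard': 0.5, 'only_a': ['f1'], 'only_b': ['f4']}
```

In the example above, the same helper could be called as compare_kept(df_exact["feature"], df_fast["feature"]).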

The cross-validation gate (greedy / consolidate) decides whether a swap is kept. ml_model selects the classifier: either a preset ('svm', the default and fast; 'rf', recommended for non-linear relationships but slower; 'log_reg', the fastest) or a custom scikit-learn estimator instance. ml_metric is the scoring metric, ml_cv the number of folds, and ml_th the tolerated CV-score drop (a swap is kept if its score is at least baseline - ml_th):

df_rf = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                     ml_model="rf", ml_metric="accuracy", ml_cv=3, ml_th=0.05)
aa.display_df(df_rf, show_shape=True)
DataFrame shape: (96, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
35 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
70 TMD_C_JMD_C-Seg...,11)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.280000 0.112063 0.112063 0.084763 0.131666 0.000000 0.000000 28,29 nan nan
71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
85 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

Instead of a model name, ml_model also accepts a custom estimator instance, which is used as-is:

from sklearn.svm import SVC
df_custom = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                         ml_model=SVC(kernel="linear"))
aa.display_df(df_custom, show_shape=True)
DataFrame shape: (109, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
12 TMD_C_JMD_C-Seg...6,9)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.198000 0.138293 0.138293 0.172194 0.198814 0.000000 0.000017 32,33 0.832400 1.383718
13 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
14 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
15 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
16 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
17 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
18 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
19 JMD_N_TMD_N-Pat...,10)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.184000 0.112728 -0.112728 0.166431 0.183800 0.000001 0.000061 11,15 0.857600 1.339550
20 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
21 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
22 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
23 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
24 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
25 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
26 TMD-Pattern(N,4,7)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.181000 0.118349 -0.118349 0.169282 0.185522 0.000002 0.000078 14,17 1.226400 1.510986
27 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
28 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
29 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
30 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
31 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
32 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
33 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
34 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
35 TMD_C_JMD_C-Seg...4,5)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.175000 0.078020 0.078020 0.113536 0.125285 0.000005 0.000129 33,34,35,36 0.000000 0.000000
36 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
37 JMD_N_TMD_N-Per...4,3)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.174000 0.067216 0.067216 0.105047 0.116197 0.000005 0.000133 1,4,8,11,15,18 0.000000 0.000000
38 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
39 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
40 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
41 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
42 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
43 JMD_N_TMD_N-Pat...,11)-QIAN880127 Conformation Coil (N-term) Coil (N-terminal) Weights for coi...ejnowski, 1988) 0.168000 0.071770 -0.071770 0.116934 0.123667 0.000011 0.000216 4,7,11 0.616400 1.124195
44 TMD_C_JMD_C-Seg...2,2)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.168000 0.067627 0.067627 0.098469 0.110321 0.000011 0.000215 31,32,33,34,35,36,37,38,39,40 1.105200 1.425601
45 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
46 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
47 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
48 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
49 TMD-Pattern(C,5...,12)-HUTJ700102 Energy Entropy Entropy Absolute entrop...Hutchens, 1970) 0.165000 0.063134 -0.063134 0.104624 0.113955 0.000015 0.000258 19,22,26 0.000000 0.000000
50 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
51 TMD_C_JMD_C-Pat...,12)-CHOP780212 Conformation β-sheet (C-term) β-turn (1st residue) Frequency of th...-Fasman, 1978b) 0.164000 0.076207 -0.076207 0.125506 0.147002 0.000016 0.000267 24,28,32 1.095600 1.575630
52 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
53 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
54 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
55 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
56 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
57 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
58 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
59 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
60 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
61 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
62 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
63 JMD_N_TMD_N-Pat...,13)-RICJ880107 Conformation π-helix α-helix Relative prefer...chardson, 1988) 0.155000 0.066867 -0.066867 0.105803 0.129430 0.000047 0.000496 3,6,9,13 0.335200 0.649905
64 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
65 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
66 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
67 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
68 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
69 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
70 TMD-Pattern(N,2...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.153000 0.062678 -0.062678 0.109868 0.123054 0.000061 0.000570 12,15,18,21 0.333600 0.598524
71 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
72 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
73 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
74 JMD_N_TMD_N-Pat...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.149000 0.063073 -0.063073 0.107731 0.126806 0.000091 0.000716 10,13,16,19 0.000000 0.000000
75 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
76 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
77 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
78 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
79 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
80 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
81 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
82 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
83 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
84 JMD_N_TMD_N-Seg...3,6)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.141000 0.067593 0.067593 0.146572 0.140332 0.000225 0.001173 7,8,9,10 0.484800 0.832789
85 TMD_C_JMD_C-Pat...,14)-FUKS010111 Composition AA composition Proteins of mesophiles (EXT) Entire chain co...ishikawa, 2001) 0.003000 0.015230 -0.015230 0.140051 0.227922 0.959141 0.959141 30,34 nan nan
86 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
87 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
88 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
89 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
90 TMD_C_JMD_C-Pat...,10)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.137000 0.065719 -0.065719 0.114425 0.146722 0.000312 0.001404 31,35,39 0.360800 0.882718
91 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
92 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
93 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
94 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
95 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
96 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
97 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
98 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
99 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
100 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
101 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
102 JMD_N_TMD_N-Pat...,11)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.130000 0.087733 -0.087733 0.180612 0.187328 0.000674 0.002190 10,13,17 0.420000 0.643453
103 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
104 JMD_N_TMD_N-Pat...,11)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.129000 0.077724 0.077724 0.148907 0.164954 0.000708 0.002247 4,8,11 0.160400 0.302939
105 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
106 JMD_N_TMD_N-Seg...3,9)-KOEP990102 Conformation β-sheet (N-term) Extended (designed β-sheet) Beta-sheet prop...l-Levitt, 1999) 0.128000 0.086726 0.086726 0.184173 0.184291 0.000769 0.002364 5,6 0.565600 0.778424
107 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
108 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
109 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427
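
As a rough illustration of what a cross-validation gate with a custom estimator can look like, the sketch below scores a scikit-learn estimator instance with 5-fold balanced accuracy before and after a hypothetical swap, then accepts the swap only if the score does not drop by more than ml_th. This is not the library's internal code: the helpers cv_score and passes_gate, the synthetic data, and the exact comparison rule are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-ins (hypothetical): X_current and X_swapped play the role of
# the feature matrices before and after one candidate swap.
X_current, y = make_classification(n_samples=120, n_features=10,
                                   n_informative=5, random_state=0)
X_swapped = X_current.copy()
X_swapped[:, 0] = -X_swapped[:, 0]  # mimic an anti-correlated replacement scale


def cv_score(model, X, y, metric="balanced_accuracy", cv=5):
    """Mean cross-validated score; cross_val_score clones the estimator per fold."""
    return float(cross_val_score(model, X, y, scoring=metric, cv=cv).mean())


def passes_gate(score_new, score_current, ml_th=0.0):
    """Assumed rule: accept if the new score is at most ml_th below the current one."""
    return score_new >= score_current - ml_th


model = SVC(kernel="linear")  # same kind of instance as passed via ml_model above
score_current = cv_score(model, X_current, y)
score_swapped = cv_score(model, X_swapped, y)
accepted = passes_gate(score_swapped, score_current, ml_th=0.0)
print(f"current={score_current:.3f} swapped={score_swapped:.3f} accepted={accepted}")
```

Any estimator that follows the scikit-learn fit/predict interface could be substituted for SVC in this sketch; a larger ml_th makes the assumed gate more tolerant of small score drops.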

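Swap candidates must correlate with the scale they replace, and the min_cor parameter (default 0.7 in the signature) sets the cutoff on the absolute correlation. A minimal sketch of such a cutoff follows, assuming Pearson correlation (the documentation here does not state which coefficient is used) and made-up scale values; pearson and eligible_candidates are hypothetical helpers, not part of the library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def eligible_candidates(original, candidates, min_cor=0.7):
    """Keep candidates whose absolute correlation with the original is >= min_cor."""
    kept = {}
    for name, values in candidates.items():
        r = pearson(original, values)
        if abs(r) >= min_cor:  # absolute value, so anti-correlated scales pass too
            kept[name] = r
    return kept


# Toy per-residue values (invented for illustration only)
original = [0.1, 0.4, 0.5, 0.9, 1.0]
candidates = {
    "scale_pos": [0.2, 0.5, 0.6, 1.0, 1.1],   # shifted copy of the original
    "scale_neg": [0.9, 0.6, 0.5, 0.1, 0.0],   # mirrored copy (anti-correlated)
    "scale_weak": [0.5, 0.1, 0.9, 0.2, 0.6],  # nearly uncorrelated
}

kept = eligible_candidates(original, candidates, min_cor=0.8)
print(sorted(kept))
```

In this toy setup the shifted and the mirrored scale both survive the cutoff, while the nearly uncorrelated one is dropped.
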
Candidate eligibility and per-feature filtering. min_cor is the minimum absolute correlation a candidate scale must have with the original scale, so anti-correlated scales also qualify. max_std_test is the CPP per-feature pre-filter threshold that the recomputed swapped feature must satisfy. The call below raises min_cor from its default of 0.7 to 0.8 and keeps max_std_test at its default of 0.2:

df_strict = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                         min_cor=0.8, max_std_test=0.2)
aa.display_df(df_strict, show_shape=True)
DataFrame shape: (120, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...3,4)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.289000 0.193943 0.193943 0.159718 0.184043 0.000000 0.000000 31,32,33,34,35 nan nan
11 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
12 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
13 TMD_C_JMD_C-Seg...3,4)-PRAM820102 Shape Shape and Surface Slope in Regression Slope in Regres...nnuswamy, 1982) 0.199000 0.073023 -0.073023 0.087336 0.107750 0.000000 0.000017 31,32,33,34,35 0.616000 0.847660
14 TMD_C_JMD_C-Seg...6,9)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.198000 0.138293 0.138293 0.172194 0.198814 0.000000 0.000017 32,33 0.832400 1.383718
15 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
16 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
18 TMD_C_JMD_C-Seg...6,9)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.189000 0.093759 0.093759 0.136715 0.137320 0.000001 0.000039 32,33 0.000000 0.000000
19 TMD_C_JMD_C-Pat...4,8)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.283000 0.239889 0.239889 0.181777 0.235674 0.000000 0.000000 33,37 nan nan
20 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
21 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
22 JMD_N_TMD_N-Pat...,10)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.184000 0.112728 -0.112728 0.166431 0.183800 0.000001 0.000061 11,15 0.857600 1.339550
23 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
24 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
25 TMD_C_JMD_C-Seg...4,5)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.182000 0.121315 0.121315 0.147184 0.184212 0.000002 0.000070 33,34,35,36 0.865200 1.553379
26 TMD-Pattern(C,3...,15)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.182000 0.098529 -0.098529 0.141641 0.162412 0.000002 0.000072 16,20,24,28 0.221200 0.519240
27 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
28 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
29 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
30 TMD-Pattern(N,4,7)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.181000 0.118349 -0.118349 0.169282 0.185522 0.000002 0.000078 14,17 1.226400 1.510986
31 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
32 TMD_C_JMD_C-Pat...,12)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.180000 0.096784 -0.096784 0.151260 0.170153 0.000002 0.000084 25,29,32 0.356800 0.617224
33 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
34 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
35 JMD_N_TMD_N-Per...4,3)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.179000 0.069852 0.069852 0.103576 0.116589 0.000003 0.000093 3,6,10,13,17,20 0.385200 0.555965
36 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
37 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
38 TMD_C_JMD_C-Pat...,12)-FAUJ880108 Energy Electron-ion interaction pot. Electrical Effect Localized Elect...e et al., 1988) 0.176000 0.064253 -0.064253 0.092619 0.113588 0.000004 0.000113 21,24,28,32 0.826400 1.303426
39 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
40 TMD_C_JMD_C-Seg...4,5)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.175000 0.078020 0.078020 0.113536 0.125285 0.000005 0.000129 33,34,35,36 0.000000 0.000000
41 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
42 JMD_N_TMD_N-Per...4,3)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.174000 0.067216 0.067216 0.105047 0.116197 0.000005 0.000133 1,4,8,11,15,18 0.000000 0.000000
43 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
44 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
45 TMD_C_JMD_C-Seg...2,3)-PRAM820102 Shape Shape and Surface Slope in Regression Slope in Regres...nnuswamy, 1982) 0.172000 0.056268 -0.056268 0.074692 0.093571 0.000006 0.000151 27,28,29,30,31,32,33 0.303600 0.618242
46 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
47 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
48 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
49 TMD_C_JMD_C-Seg...2,2)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.168000 0.067627 0.067627 0.098469 0.110321 0.000011 0.000215 31,32,33,34,35,36,37,38,39,40 1.105200 1.425601
50 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
51 TMD-Pattern(C,5...,12)-PRAM820102 Shape Shape and Surface Slope in Regression Slope in Regres...nnuswamy, 1982) 0.167000 0.077343 0.077343 0.135340 0.134263 0.000012 0.000228 19,22,26 1.301600 1.697263
52 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
53 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
54 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
55 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
56 TMD-PeriodicPat...3,1)-COHE430101 ASA/Volume Partial specific volume Partial specific volume Partial specifi...n-Edsall, 1943) 0.164000 0.058745 0.058745 0.092103 0.106413 0.000017 0.000276 12,15,18,21,24,27,30 1.141600 1.375595
57 TMD-Pattern(C,4,7)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.163000 0.128817 -0.128817 0.184672 0.227780 0.000020 0.000293 24,27 0.872800 1.063156
58 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
59 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
60 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
61 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
62 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
63 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
64 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
65 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
66 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
67 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
68 JMD_N_TMD_N-Pat...,13)-RICJ880107 Conformation π-helix α-helix Relative prefer...chardson, 1988) 0.155000 0.066867 -0.066867 0.105803 0.129430 0.000047 0.000496 3,6,9,13 0.335200 0.649905
69 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
70 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
71 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
72 TMD-Pattern(C,5...,12)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.154000 0.062226 0.062226 0.144085 0.119863 0.000057 0.000548 19,22,26 0.715200 1.186306
73 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
74 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
75 TMD-Pattern(N,2...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.153000 0.062678 -0.062678 0.109868 0.123054 0.000061 0.000570 12,15,18,21 0.333600 0.598524
76 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
77 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
78 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
79 TMD_C_JMD_C-Pat...,13)-CHOP780212 Conformation β-sheet (C-term) β-turn (1st residue) Frequency of th...-Fasman, 1978b) 0.149000 0.069627 -0.069627 0.113251 0.143949 0.000093 0.000725 26,29,33 0.842800 1.314094
80 JMD_N_TMD_N-Pat...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.149000 0.063073 -0.063073 0.107731 0.126806 0.000091 0.000716 10,13,16,19 0.000000 0.000000
81 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
82 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
83 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
84 TMD_C_JMD_C-Seg...2,5)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.147000 0.079575 -0.079575 0.145620 0.160200 0.000115 0.000811 25,26,27,28 0.322000 0.559943
85 TMD-Pattern(C,4...,11)-RICJ880107 Conformation π-helix α-helix Relative prefer...chardson, 1988) 0.146000 0.068957 0.068957 0.131400 0.140413 0.000131 0.000868 20,23,27 0.697200 1.056350
86 TMD_C_JMD_C-Seg...,11)-COHE430101 ASA/Volume Partial specific volume Partial specific volume Partial specifi...n-Edsall, 1943) 0.145000 0.124999 0.124999 0.180151 0.242281 0.000145 0.000912 28,29 1.740800 2.317117
87 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
88 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
89 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
90 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
91 JMD_N_TMD_N-Seg...3,6)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.141000 0.067593 0.067593 0.146572 0.140332 0.000225 0.001173 7,8,9,10 0.484800 0.832789
92 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
93 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
94 TMD_C_JMD_C-Pat...,12)-QIAN880114 Conformation β-sheet (N-term) β-sheet (N-terminal) Weights for bet...ejnowski, 1988) 0.138000 0.070821 -0.070821 0.121293 0.151868 0.000310 0.001400 24,28,32 0.718800 1.295090
95 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
96 TMD_C_JMD_C-Pat...,15)-KOEH090102 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.186000 0.073577 0.073577 0.103456 0.126131 0.000305 0.000305 26,30,33 nan nan
97 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
98 TMD_C_JMD_C-Pat...,10)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.137000 0.065719 -0.065719 0.114425 0.146722 0.000312 0.001404 31,35,39 0.360800 0.882718
99 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
100 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
101 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
102 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
103 TMD_C_JMD_C-Seg...6,9)-PALJ810113 Conformation α-helix (left-handed) β-turn (α class) Normalized freq...u et al., 1981) 0.135000 0.072992 -0.072992 0.138972 0.165851 0.000412 0.001667 32,33 0.292400 0.546994
104 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
105 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
106 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
107 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
108 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
109 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
110 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
111 TMD_C_JMD_C-Pat...,10)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.131000 0.056621 0.056621 0.144377 0.128425 0.000597 0.002043 31,34,38 0.726800 0.885807
112 JMD_N_TMD_N-Pat...,11)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.130000 0.087733 -0.087733 0.180612 0.187328 0.000674 0.002190 10,13,17 0.420000 0.643453
113 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
114 JMD_N_TMD_N-Seg...2,4)-QIAN880114 Conformation β-sheet (N-term) β-sheet (N-terminal) Weights for bet...ejnowski, 1988) 0.130000 0.058210 0.058210 0.127516 0.112411 0.000633 0.002111 6,7,8,9,10 0.140800 0.376807
115 JMD_N_TMD_N-Pat...,11)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.129000 0.077724 0.077724 0.148907 0.164954 0.000708 0.002247 4,8,11 0.160400 0.302939
116 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
117 JMD_N_TMD_N-Seg...3,9)-KOEP990102 Conformation β-sheet (N-term) Extended (designed β-sheet) Beta-sheet prop...l-Levitt, 1999) 0.128000 0.086726 0.086726 0.184173 0.184291 0.000769 0.002364 5,6 0.565600 0.778424
118 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
119 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
120 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427
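
The min_cor rule from the example above can be illustrated in isolation. The following is a minimal sketch, not the library's implementation: it assumes Pearson correlation (the correlation type is not stated here), and the helper eligible_candidates as well as the toy scale values are hypothetical. It shows why an anti-correlated scale passes the filter while an unrelated one does not.

```python
import numpy as np


def eligible_candidates(original, candidates, min_cor=0.7):
    """Return (name, r) pairs for candidate scales with |r| >= min_cor.

    Hypothetical helper: Pearson correlation is an assumption of this sketch.
    """
    eligible = []
    for name, values in candidates.items():
        r = np.corrcoef(original, values)[0, 1]
        # The absolute value is compared, so strong anti-correlation also qualifies
        if abs(r) >= min_cor:
            eligible.append((name, round(float(r), 3)))
    return eligible


# Toy scale values (made up for illustration, not AAontology data)
original = np.array([0.1, 0.4, 0.5, 0.9, 1.0])
candidates = {
    "similar": np.array([0.2, 0.3, 0.6, 0.8, 0.9]),
    "inverted": 1.0 - original,
    "unrelated": np.array([0.5, 0.1, 0.9, 0.2, 0.5]),
}

result = eligible_candidates(original, candidates, min_cor=0.8)
print(result)  # "similar" and "inverted" pass, "unrelated" does not
```

Raising min_cor shrinks the candidate pool, which tends to leave more features without any accepted swap.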

Unimprovable features. When a targeted feature has no accepted swap, on_unimprovable decides its fate: 'keep' (retain the original feature, default), 'drop' (remove it), or 'drop_if_perf_allows' (remove it only if the CV score does not drop). The last remaining feature is never dropped, so the returned set is never empty:
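
Read as a decision rule, the three options could look roughly like the sketch below. This is an illustration under stated assumptions, not the library's code: the function should_drop_unimprovable and its arguments are hypothetical, and 'drop_if_perf_allows' is modeled as a plain comparison of two CV scores without any tolerance.

```python
def should_drop_unimprovable(n_features, on_unimprovable="keep",
                             cv_with=None, cv_without=None):
    """Decide whether a feature without an accepted swap is removed.

    Hypothetical helper mirroring the documented options; cv_with and
    cv_without are CV scores of the set with and without the feature.
    """
    if n_features <= 1:
        return False  # the last remaining feature is never dropped
    if on_unimprovable == "keep":
        return False
    if on_unimprovable == "drop":
        return True
    if on_unimprovable == "drop_if_perf_allows":
        # Assumption: "does not drop" means the score without the feature
        # is at least the score with it
        return cv_without >= cv_with
    raise ValueError(f"Unknown on_unimprovable option: {on_unimprovable!r}")


print(should_drop_unimprovable(10, "drop"))
print(should_drop_unimprovable(1, "drop"))
print(should_drop_unimprovable(10, "drop_if_perf_allows", cv_with=0.80, cv_without=0.78))
```

The call that follows applies the 'drop' option to the running example.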

df_drop = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                       on_unimprovable="drop")
aa.display_df(df_drop, show_shape=True)
DataFrame shape: (78, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
3 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
4 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
6 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
7 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
8 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
9 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
10 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
11 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
12 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
13 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
14 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
15 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
16 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
17 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
18 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
19 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
20 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
21 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
22 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
23 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
24 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
25 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
26 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
27 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
28 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
29 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
30 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
31 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
32 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
33 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
34 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
35 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
36 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
37 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
38 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
39 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
40 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
41 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
42 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
43 TMD-Pattern(N,1...4,7)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.218000 0.114519 -0.114519 0.151936 0.147686 0.000024 0.000024 11,14,17 nan nan
44 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
45 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
46 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
47 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
48 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
49 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
50 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
51 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
52 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
53 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
54 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
55 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
56 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
57 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
58 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
59 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
60 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
61 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
62 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
63 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
64 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
65 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
66 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
67 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
68 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
69 TMD_C_JMD_C-Seg...6,9)-PALJ810105 Conformation β-turn β-turn Normalized freq...u et al., 1981) 0.087000 0.050175 -0.050175 0.089871 0.163195 0.091373 0.091373 32,33 nan nan
70 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
71 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
72 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
73 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
74 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
75 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
76 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
77 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
78 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

Redundancy reduction removes only swapped features that became redundant; original features are always protected. redundancy_tie_break chooses which feature of a redundant pair is kept ('interpretability' or 'performance'), max_cor and max_overlap are the scale-correlation and position-overlap thresholds, and check_cat restricts comparisons to features of the same scale category:

df_red = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                      redundancy_tie_break="performance", max_cor=0.5, max_overlap=0.5,
                      check_cat=True)
aa.display_df(df_red, show_shape=True)
DataFrame shape: (96, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
35 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
70 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
85 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427
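
To make the roles of max_cor, max_overlap and check_cat more concrete, the sketch below shows one way a pairwise redundancy check could combine them. It is an illustration under stated assumptions, not the AAanalysis implementation: the helper names (position_overlap, is_redundant), the overlap definition (shared positions relative to the smaller set), the strict ">" comparisons, and the scale correlation of 0.8 are all hypothetical. Only the category and positions values are taken from rows 3 and 4 of the output above.

```python
# Illustrative sketch only; NOT the AAanalysis internals.
# Assumption: a pair counts as redundant when both the scale correlation
# and the position overlap exceed their thresholds.

def position_overlap(pos_a, pos_b):
    """Share of common positions relative to the smaller set (assumed definition)."""
    set_a = {int(p) for p in pos_a.split(",")}
    set_b = {int(p) for p in pos_b.split(",")}
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def is_redundant(feat_a, feat_b, scale_cor,
                 max_cor=0.5, max_overlap=0.5, check_cat=True):
    """Return True if the two features would be treated as a redundant pair."""
    # With check_cat=True, features from different categories are never compared.
    if check_cat and feat_a["category"] != feat_b["category"]:
        return False
    overlap = position_overlap(feat_a["positions"], feat_b["positions"])
    return scale_cor > max_cor and overlap > max_overlap


# Category and positions as shown for rows 3 and 4 of df_red.
feat_3 = {"category": "Shape", "positions": "32,33"}
feat_4 = {"category": "ASA/Volume", "positions": "32,33"}
hypothetical_cor = 0.8  # made-up scale correlation, for illustration only

print(is_redundant(feat_3, feat_4, hypothetical_cor))                   # check_cat=True
print(is_redundant(feat_3, feat_4, hypothetical_cor, check_cat=False))  # categories ignored
```

Under these assumptions, the two features cover identical positions but sit in different categories, so they are compared only when check_cat=False. That would be consistent with both appearing in the reduced set when check_cat=True.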

label_test and label_ref give the values in labels that mark the test and reference classes (defaults 1 and 0); change them if your labels use a different encoding. The call below passes the defaults explicitly:

df_lab = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                      label_test=1, label_ref=0)
aa.display_df(df_lab, show_shape=True)
DataFrame shape: (96, 15)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std
1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918
2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000
3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848
4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000
5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000
6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094
7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664
8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803
9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912
10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000
11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460
12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403
13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000
14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785
15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
35 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
70 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
85 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427
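
When labels use a different encoding, counting the samples per class before the call is a quick way to confirm that label_test and label_ref match it. The following is a minimal sketch only; check_label_encoding and labels_alt are hypothetical names for illustration and are not part of aaanalysis:

```python
def check_label_encoding(labels, label_test=1, label_ref=0):
    """Hypothetical helper (not part of aaanalysis).

    Returns (n_test, n_ref, n_other): the number of samples equal to
    label_test, equal to label_ref, and matching neither.
    """
    labels = list(labels)
    n_test = sum(1 for y in labels if y == label_test)
    n_ref = sum(1 for y in labels if y == label_ref)
    n_other = len(labels) - n_test - n_ref
    return n_test, n_ref, n_other


# Assumed toy encoding: test class = 2, reference class = 1
labels_alt = [2, 2, 2, 1, 1]

# Matching encoding: both classes are populated and nothing is left over
print(check_label_encoding(labels_alt, label_test=2, label_ref=1))

# Default encoding (1 / 0) on the same labels: the reference class is empty
print(check_label_encoding(labels_alt))
```

If the second call reports an empty class or leftover samples, as it does here, label_test and label_ref should be set explicitly (label_test=2, label_ref=1 for this toy encoding).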

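return_details=True additionally returns a long-form table of every candidate scale considered for each feature, with the interpretability grades of the original and candidate scales, the candidate's correlation with the original scale, its recomputed std_test, whether it was accepted, the CV score, and the reason for the decision (accepted, max_std_test, or cv_drop in the output below):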

df_simple2, df_candidates = cpp.simplify(df_feat=df_feat, labels=labels,
                                         max_interpret_grade=2, return_details=True)
aa.display_df(df_candidates, show_shape=True)
DataFrame shape: (78, 9)
  feature candidate_scale interpretability_orig interpretability_cand cor std_test accepted cv_score reason
1 TMD_C_JMD_C-Seg...6,9)-TANS770106 RADA880101 8.000000 1.000000 -0.742074 0.208957 False nan max_std_test
2 TMD_C_JMD_C-Seg...6,9)-TANS770106 KOEH090103 8.000000 1.000000 0.737712 0.196374 True 0.907051 accepted
3 TMD-Pattern(C,3...,15)-ANDN920101 FUKS010109 8.000000 1.000000 -0.728856 0.133076 True 0.907051 accepted
4 TMD_C_JMD_C-Pat...,12)-ANDN920101 FUKS010109 8.000000 1.000000 -0.728856 0.172594 True 0.907051 accepted
5 TMD_C_JMD_C-Seg...4,5)-TANS770106 RADA880101 8.000000 1.000000 -0.742074 0.162400 True 0.907051 accepted
6 TMD-Pattern(C,5...,15)-OOBM770105 KARS160117 8.000000 1.000000 -0.959796 0.082837 True 0.907051 accepted
7 TMD-Pattern(C,4,7)-ANDN920101 FUKS010109 8.000000 1.000000 -0.728856 0.165353 True 0.907051 accepted
8 TMD_C_JMD_C-Seg...3,4)-MONM990101 KOEH090102 8.000000 1.000000 0.922232 0.134902 True 0.907051 accepted
9 TMD_C_JMD_C-Pat...3,7)-OOBM770105 KARS160117 8.000000 1.000000 -0.959796 0.117714 True 0.907051 accepted
10 TMD_C_JMD_C-Seg...2,5)-ANDN920101 FUKS010109 8.000000 1.000000 -0.728856 0.107804 True 0.907051 accepted
11 JMD_N_TMD_N-Seg...3,6)-VASM830102 PALJ810111 8.000000 1.000000 -0.754824 0.129405 True 0.907051 accepted
12 TMD_C_JMD_C-Pat...,15)-MONM990101 KOEH090102 8.000000 1.000000 0.922232 0.103456 True 0.907051 accepted
13 TMD_C_JMD_C-Pat...,10)-TANS770106 RADA880101 8.000000 1.000000 -0.742074 0.178993 True 0.907051 accepted
14 JMD_N_TMD_N-Pat...,11)-ANDN920101 FUKS010109 8.000000 1.000000 -0.728856 0.133383 True 0.907051 accepted
15 JMD_N_TMD_N-Pat...,11)-VASM830102 PALJ810111 8.000000 1.000000 -0.754824 0.125346 True 0.907051 accepted
16 TMD_C_JMD_C-Seg...3,4)-PRAM820102 LINS030101 7.000000 1.000000 -0.756652 0.121842 True 0.907051 accepted
17 TMD_C_JMD_C-Pat...,12)-FAUJ880108 BULH740102 7.000000 4.000000 -0.740983 0.130978 True 0.907051 accepted
18 TMD_C_JMD_C-Seg...2,3)-PRAM820102 LINS030101 7.000000 1.000000 -0.756652 0.098523 True 0.907051 accepted
19 TMD-Pattern(C,5...,12)-PRAM820102 LINS030101 7.000000 1.000000 -0.756652 0.110439 True 0.907051 accepted
20 JMD_N_TMD_N-Seg...,15)-FAUJ880101 RACS820111 7.000000 1.000000 0.796898 0.194309 True 0.907051 accepted
21 JMD_N_TMD_N-Seg...3,8)-FAUJ880101 RACS820111 7.000000 1.000000 0.796898 0.163724 True 0.907051 accepted
22 JMD_N_TMD_N-Seg...2,4)-MEIH800101 MIYS850101 7.000000 1.000000 -0.956832 0.096040 True 0.907051 accepted
23 JMD_N_TMD_N-Seg...2,6)-MEIH800101 MIYS850101 7.000000 1.000000 -0.956832 0.117602 True 0.907051 accepted
24 TMD_C_JMD_C-Pat...4,8)-MEIH800101 MIYS850101 7.000000 1.000000 -0.956832 0.153526 True 0.907051 accepted
25 TMD_C_JMD_C-Seg...4,8)-MEIH800101 MIYS850101 7.000000 1.000000 -0.956832 0.127012 True 0.907051 accepted
26 JMD_N_TMD_N-Per...4,3)-QIAN880138 QIAN880112 6.000000 5.000000 -0.774206 0.089768 True 0.907051 accepted
27 JMD_N_TMD_N-Per...4,3)-QIAN880138 QIAN880112 6.000000 5.000000 -0.774206 0.090360 True 0.907051 accepted
28 JMD_N_TMD_N-Pat...,11)-QIAN880127 OOBM850105 6.000000 1.000000 -0.812942 0.118719 True 0.907051 accepted
29 TMD_C_JMD_C-Pat...,10)-QIAN880138 QIAN880112 6.000000 5.000000 -0.774206 0.101748 True 0.907051 accepted
30 TMD_C_JMD_C-Seg...2,3)-CHOP780212 PALJ810106 5.000000 1.000000 0.800921 0.109387 True 0.907051 accepted
31 TMD_C_JMD_C-Pat...,12)-FINA770101 AURR980113 5.000000 1.000000 0.848669 0.103029 True 0.907051 accepted
32 TMD_C_JMD_C-Pat...,12)-CHOP780212 PALJ810106 5.000000 1.000000 0.800921 0.156745 True 0.907051 accepted
33 TMD-Pattern(N,1...4,7)-FINA770101 AURR980113 5.000000 1.000000 0.848669 0.151936 True 0.907051 accepted
34 JMD_N_TMD_N-Pat...,12)-FINA770101 AURR980113 5.000000 1.000000 0.848669 0.151448 True 0.907051 accepted
35 TMD-Pattern(C,5...,12)-MAXF760105 FAUJ880107 5.000000 1.000000 -0.771112 0.173044 True 0.907051 accepted
36 TMD-Pattern(N,2...,11)-RACS820101 CIDH920104 5.000000 1.000000 -0.716346 0.128234 True 0.907051 accepted
37 TMD_C_JMD_C-Pat...,13)-CHOP780212 PALJ810106 5.000000 1.000000 0.800921 0.158145 True 0.907051 accepted
38 JMD_N_TMD_N-Pat...,11)-RACS820101 CIDH920104 5.000000 1.000000 -0.716346 0.126433 True 0.907051 accepted
39 TMD-Pattern(C,4...,11)-FINA770101 AURR980113 5.000000 1.000000 0.848669 0.127339 True 0.907051 accepted
40 TMD_C_JMD_C-Pat...,12)-QIAN880114 CIDH920103 5.000000 1.000000 -0.763299 0.124914 True 0.907051 accepted
41 TMD_C_JMD_C-Seg...6,9)-PALJ810113 PALJ810105 5.000000 1.000000 0.732768 0.089871 True 0.907051 accepted
42 JMD_N_TMD_N-Seg...2,4)-QIAN880114 CIDH920103 5.000000 1.000000 -0.763299 0.095751 True 0.907051 accepted
43 TMD_C_JMD_C-Pat...3,7)-MAXF760105 FAUJ880107 5.000000 1.000000 -0.771112 0.211477 False nan max_std_test
44 JMD_N_TMD_N-Seg...3,9)-KOEP990102 GRAR740102 5.000000 1.000000 0.773220 0.156564 True 0.907051 accepted
45 TMD_C_JMD_C-Seg...3,4)-HUTJ700102 LINS030101 4.000000 1.000000 0.869506 0.121842 True 0.907051 accepted
46 TMD_C_JMD_C-Seg...3,4)-JANJ790102 KOEH090106 4.000000 1.000000 -1.000000 0.159718 True 0.907051 accepted
47 TMD_C_JMD_C-Seg...6,9)-DESM900102 OOBM770101 4.000000 1.000000 -0.949505 0.202111 False nan max_std_test
48 TMD_C_JMD_C-Seg...6,9)-DESM900102 LINS030107 4.000000 1.000000 -0.948654 0.193808 True 0.907051 accepted
49 TMD_C_JMD_C-Seg...6,9)-RICJ880113 WILM950101 4.000000 1.000000 -0.708204 0.109849 True 0.907051 accepted
50 TMD_C_JMD_C-Pat...5,8)-RADA880104 EISD840101 4.000000 1.000000 0.908505 0.052593 True 0.907051 accepted
51 TMD-Pattern(C,4,7)-RADA880104 EISD840101 4.000000 1.000000 0.908505 0.052593 True 0.907051 accepted
52 TMD_C_JMD_C-Pat...4,8)-JANJ790102 KOEH090106 4.000000 1.000000 -1.000000 0.181777 True 0.907051 accepted
53 JMD_N_TMD_N-Pat...,10)-AURR980116 QIAN880110 4.000000 1.000000 0.753792 0.153325 True 0.907051 accepted
54 TMD_C_JMD_C-Seg...4,5)-RICJ880113 WILM950101 4.000000 1.000000 -0.708204 0.102916 True 0.907051 accepted
55 TMD-Pattern(N,4,7)-AURR980116 QIAN880110 4.000000 1.000000 0.753792 0.167930 True 0.914744 accepted
56 TMD_C_JMD_C-Seg...4,5)-YUTK870103 EISD860102 4.000000 3.000000 -0.838651 0.180052 True 0.914744 accepted
57 TMD_C_JMD_C-Pat...,15)-YUTK870101 GUYH850105 4.000000 1.000000 -0.840600 0.126437 True 0.914744 accepted
58 TMD_C_JMD_C-Seg...2,3)-HUTJ700102 LINS030101 4.000000 1.000000 0.869506 0.098523 True 0.914744 accepted
59 TMD_C_JMD_C-Seg...2,3)-JANJ790102 KOEH090106 4.000000 1.000000 -1.000000 0.095746 True 0.914744 accepted
60 TMD_C_JMD_C-Seg...4,5)-DESM900102 OOBM770101 4.000000 1.000000 -0.949505 0.180330 True 0.914744 accepted
61 TMD_C_JMD_C-Seg...2,2)-RICJ880113 WILM950101 4.000000 1.000000 -0.708204 0.067646 True 0.914744 accepted
62 TMD-Pattern(C,5...,12)-HUTJ700102 LINS030101 4.000000 1.000000 0.869506 0.110439 True 0.914744 accepted
63 TMD-PeriodicPat...3,1)-COHE430101 OOBM850101 4.000000 1.000000 0.759615 0.052251 True 0.914744 accepted
64 JMD_N_TMD_N-Pat...,13)-RICJ880107 ROSG850101 4.000000 1.000000 0.796390 0.101561 True 0.914744 accepted
65 TMD_C_JMD_C-Pat...4,8)-CORJ870107 CORJ870108 4.000000 1.000000 -0.996278 0.167914 True 0.914744 accepted
66 TMD-Pattern(C,4...,11)-RICJ880107 ROSG850101 4.000000 1.000000 0.796390 0.124610 False 0.907051 cv_drop
67 TMD-Pattern(C,4...,11)-RICJ880107 CHOP780203 4.000000 1.000000 -0.792356 0.138664 True 0.922436 accepted
68 TMD_C_JMD_C-Seg...,11)-COHE430101 OOBM850101 4.000000 1.000000 0.759615 0.084763 False 0.914744 cv_drop
69 TMD_C_JMD_C-Seg...,11)-COHE430101 PALJ810106 4.000000 1.000000 -0.717945 0.167896 False 0.914744 cv_drop
70 TMD_C_JMD_C-Seg...,11)-COHE430101 LIFS790102 4.000000 1.000000 0.713730 0.183876 True 0.922436 accepted
71 JMD_N_TMD_N-Seg...,10)-RICJ880111 BHAR880101 4.000000 1.000000 -0.812868 0.178100 True 0.922436 accepted
72 TMD_C_JMD_C-Pat...,14)-HUTJ700102 LINS030101 4.000000 1.000000 0.869506 0.165586 True 0.922436 accepted
73 JMD_N_TMD_N-Pat...,11)-HUTJ700102 LINS030101 4.000000 1.000000 0.869506 0.137245 True 0.922436 accepted
74 TMD_C_JMD_C-Seg...2,3)-YUTK870103 EISD860102 4.000000 3.000000 -0.838651 0.108394 True 0.922436 accepted
75 JMD_N_TMD_N-Pat...,12)-RICJ880111 BHAR880101 4.000000 1.000000 -0.812868 0.164544 True 0.922436 accepted
76 TMD_C_JMD_C-Seg...4,5)-FAUJ880109 GUYH850105 3.000000 1.000000 0.926858 0.157550 True 0.922436 accepted
77 TMD_C_JMD_C-Seg...4,6)-FAUJ880109 GUYH850105 3.000000 1.000000 0.926858 0.162605 True 0.922436 accepted
78 TMD_C_JMD_C-Seg...2,2)-FAUJ880109 GUYH850105 3.000000 1.000000 0.926858 0.086797 True 0.922436 accepted
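
The details table can be summarized with ordinary pandas operations, for example to see why candidates were rejected. The sketch below rebuilds a five-row excerpt of the output above by hand (feature names are kept as truncated in the display) so that it runs without aaanalysis; with a real result, df_candidates from the call above would be used directly:

```python
import pandas as pd

# Hand-copied excerpt of the details table shown above (rows 1, 2, 43, 66, 67);
# feature names are the display-truncated strings, not full feature IDs.
df_candidates = pd.DataFrame({
    "feature": [
        "TMD_C_JMD_C-Seg...6,9)-TANS770106",
        "TMD_C_JMD_C-Seg...6,9)-TANS770106",
        "TMD_C_JMD_C-Pat...3,7)-MAXF760105",
        "TMD-Pattern(C,4...,11)-RICJ880107",
        "TMD-Pattern(C,4...,11)-RICJ880107",
    ],
    "candidate_scale": ["RADA880101", "KOEH090103", "FAUJ880107",
                        "ROSG850101", "CHOP780203"],
    "accepted": [False, True, False, False, True],
    "reason": ["max_std_test", "accepted", "max_std_test", "cv_drop", "accepted"],
})

# How often each outcome occurred
reason_counts = df_candidates["reason"].value_counts()
print(reason_counts)

# Rejected candidates only, with the reason for each rejection
df_rejected = df_candidates[~df_candidates["accepted"]]
print(df_rejected[["feature", "candidate_scale", "reason"]])

# Number of candidates tried per feature (more than 1 means a fallback was tested)
n_tried = df_candidates.groupby("feature", sort=False).size()
print(n_tried)
```

In this excerpt, two features needed a second candidate: one after a max_std_test rejection and one after a cv_drop rejection.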

CPPPlot().feature_map, the signature CPP visualization, makes the simplification visible: it lays out the per-feature mean differences along the sequence, groups the rows by subcategory, and shows feature importance as a bar track. With 150 features the map is unreadable, so we show the top 40 features by importance on a tall canvas. The original feature set already carries feat_importance:

import matplotlib.pyplot as plt
cpp_plot = aa.CPPPlot()
aa.plot_settings(weight_bold=False)
df_feat_top = df_feat.sort_values("feat_importance", ascending=False).head(40)
cpp_plot.feature_map(df_feat=df_feat_top, figsize=(8, 14))
plt.show()
../_images/cpp_simplify_1_output_25_0.png

The simplified set is plotted the same way. Swapped features carry no importance, so we re-attach it with TreeModel. The resulting map speaks in fewer, more interpretable subcategories, with the original (most interpretable) features protected:

# Load the scales and rebuild the feature matrix for the simplified feature set
df_scales_all = aa.load_scales()
X = sf.feature_matrix(features=list(df_simple["feature"]), df_parts=df_parts,
                      df_scales=df_scales_all)
# Re-attach feature importance with TreeModel, then keep the first 40 rows after sorting
tm = aa.TreeModel().fit(X, labels=labels)
df_simple_imp = tm.add_feat_importance(df_feat=df_simple, drop=True, sort=True)
df_simple_top = df_simple_imp.head(40)
cpp_plot.feature_map(df_feat=df_simple_top, figsize=(8, 14))
plt.show()
../_images/cpp_simplify_2_output_27_1.png

Finally, an overview of how the subcategory vocabulary shifts as max_interpret_grade is tightened from 10 (keep everything) down to 1 (only the best, grade-1 tier). Each grade level is one colored series (aa.plot_get_clist(n_colors=10)); as the grade tightens, features in worse-graded subcategories are replaced and migrate into the most interpretable ones:

import matplotlib.pyplot as plt
import pandas as pd

levels = list(range(1, 11))
colors = aa.plot_get_clist(n_colors=10)
counts = {g: cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=g)
          ["subcategory"].value_counts() for g in levels}
df_levels = pd.DataFrame(counts).fillna(0).astype(int)
print(df_levels)

top_subcats = df_levels[10].sort_values(ascending=False).head(8).index
df_plot = df_levels.loc[top_subcats]

aa.plot_settings(weight_bold=False)
ax = df_plot.plot(kind="barh", figsize=(7, 6), color=colors, width=0.85, legend=False)
ax.invert_yaxis()
ax.set_xlabel("number of features")
ax.set_ylabel("subcategory")
ax.legend(title="max_interpret_grade", labels=[str(g) for g in levels],
          loc="upper center", bbox_to_anchor=(0.5, -0.12), ncol=5, frameon=False)
plt.tight_layout()
plt.show()
                               1   2   3   4   5   6   7   8   9   10
subcategory
AA composition                  5   5   5   5   5   5   5   0   0   0
Accessible surface area (ASA)   6   6   6   6   6   6   6   6   6   6
Amphiphilicity                  0   3   3   3   3   3   3   3   3   3
Amphiphilicity (α-helix)        0   0   0   3   3   3   3   3   3   3
Backbone-dynamics (-CH)         2   2   2   2   2   2   2   7   7   7
Buried                          5   5   5   5   5   5   5   5   5   5
Charge                          3   3   3   3   3   3   3   3   3   3
Coil                            5   5   5   5   5   5   5   5   5   5
Coil (C-term)                   0   0   0   0   0   3   3   3   3   3
Coil (N-term)                   0   0   0   0   0   1   1   1   1   1
Electron-ion interaction pot.   3   3   3   3   3   3   4   4   4   4
Entropy                         0   0   0   5   5   5   5   5   5   5
Flexibility                     1   1   1   1   1   1   1   1   1   1
Free energy (unfolding)         0   0   0   8   8   8   8   8   8   8
Hydrophilicity                  4   4   4   2   1   1   1   0   0   0
Hydrophobicity                  7   7   7   6   6   6   6   6   6   6
Hydrophobicity (interface)      4   4   4   4   4   4   4   4   4   4
Isoelectric point               0   0   3   3   3   3   3   3   3   3
Non-bonded energy               0   0   0   0   0   0   0   4   4   4
Partial specific volume         1   1   1   2   2   2   2   2   2   2
Reduced distance                0   0   0   0   0   0   4   4   4   4
Shape and Surface               1   1   1   1   1   1   4   4   4   4
Side chain length               8   7   7   7   7   7   7   7   7   7
Stability                       6   6   6   5   4   4   4   4   4   4
Stability (helix-coil)          0   0   0   0   4   4   4   4   4   4
Steric parameter                0   0   0   0   0   0   2   2   2   2
Volume                          7   7   7   6   6   6   5   5   5   5
α-helix                         5   5   5   4   4   4   4   4   4   4
α-helix (C-cap)                 3   3   3   8   8   8   8   8   8   8
α-helix (C-term, out)           3   3   3   3   3   3   3   3   3   3
α-helix (left-handed)           1   1   1   1   3   3   3   3   3   3
β-sheet                         2   2   2   2   2   2   2   1   1   1
β-sheet (C-term)                1   1   1   1   4   4   4   4   4   4
β-sheet (N-term)                0   0   0   0   5   5   5   5   5   5
β-strand                        9   9   9   8   8   8   8   8   8   8
β-turn (TM helix)               0   0   0   0   0   0   0   5   5   5
β/α-bridge                      1   1   1   1   1   1   1   1   1   1
π-helix                         1   1   1   5   5   5   5   5   5   5
../_images/cpp_simplify_3_output_29_1.png
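
The shift in vocabulary can also be read off numerically. The sketch below is an illustration on a hand-copied subset of seven rows from the printed table above (not the full result): it counts how many subcategories are in use at each grade and lists those that are emptied once the grade is tightened to 1. With a real result, df_levels from the code above would be used instead of the toy table:

```python
import pandas as pd

grades = list(range(1, 11))

# Subset of the printed df_levels table (rows copied verbatim, columns = grades 1..10)
rows = {
    "AA composition":          [5, 5, 5, 5, 5, 5, 5, 0, 0, 0],
    "Charge":                  [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    "Entropy":                 [0, 0, 0, 5, 5, 5, 5, 5, 5, 5],
    "Free energy (unfolding)": [0, 0, 0, 8, 8, 8, 8, 8, 8, 8],
    "Hydrophilicity":          [4, 4, 4, 2, 1, 1, 1, 0, 0, 0],
    "Non-bonded energy":       [0, 0, 0, 0, 0, 0, 0, 4, 4, 4],
    "β-turn (TM helix)":       [0, 0, 0, 0, 0, 0, 0, 5, 5, 5],
}
df_levels = pd.DataFrame.from_dict(rows, orient="index", columns=grades)

# Number of subcategories with at least one feature, per max_interpret_grade
n_subcats = (df_levels > 0).sum(axis=0)
print(n_subcats)

# Subcategories present at grade 10 (nothing swapped) but empty at grade 1
emptied = df_levels.index[(df_levels[10] > 0) & (df_levels[1] == 0)].tolist()
print(emptied)

# Subcategories that only appear after swapping (empty at grade 10, used at grade 1)
gained = df_levels.index[(df_levels[10] == 0) & (df_levels[1] > 0)].tolist()
print(gained)
```

In this subset, four subcategories are emptied at grade 1 while two (AA composition and Hydrophilicity) only appear through swaps, which matches the migration visible in the bar chart.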