CPP.simplify
- CPP.simplify(df_feat=None, labels=None, strategy='greedy', candidate_search='exact', max_interpret_grade=None, min_cor=0.7, ml_model='svm', ml_metric='balanced_accuracy', ml_th=0.0, ml_cv=5, allow_drop=True, on_unimprovable='keep', redundancy_tie_break='interpretability', label_test=1, label_ref=0, max_std_test=0.2, max_cor=0.5, max_overlap=0.5, check_cat=True, return_details=False)
Simplify a feature set by swapping scales for more interpretable correlated ones.
For each feature (PART-SPLIT-SCALE), an alternative scale from a more interpretable AAontology subcategory (interpretability grade 1-10, 1 = best; per-subcategory grades are in load_scales(name='subcat')) that correlates with the original scale is substituted, keeping PART-SPLIT. The swapped feature’s statistics are recomputed; a swap is accepted only if it passes CPP’s per-feature filtering (max_std_test) and a cross-validation gate (performance not worse than the current set, within ml_th). The swapped set is then redundancy-reduced, yielding a more interpretable and ideally smaller df_feat. The candidate pool (the full rated AAontology scale set) is loaded internally.
Added in version 1.1.0.
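The candidate filter described above can be illustrated with a minimal, library-independent sketch. It is not the aaanalysis implementation: the helper names (`pearson`, `rank_candidates`), the scale names, and the toy values are all hypothetical. It only shows the idea of keeping candidates that are better graded (lower grade) and reach an absolute Pearson correlation of at least `min_cor`, ordered by interpretability first and correlation strength second.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_candidates(original, grade_original, candidates, min_cor=0.7):
    """Return (name, grade, r) for better-graded, sufficiently correlated candidates.

    `candidates` maps a hypothetical scale name to (grade, values).
    """
    eligible = []
    for name, (grade, values) in candidates.items():
        r = pearson(original, values)
        # Lower grade = more interpretable; anti-correlation counts via abs()
        if grade < grade_original and abs(r) >= min_cor:
            eligible.append((name, grade, r))
    # Best interpretability first (lowest grade), then strongest |r|
    return sorted(eligible, key=lambda c: (c[1], -abs(c[2])))

# Toy values over five residues (illustrative only, not real AAontology scales)
original = [0.1, 0.4, 0.5, 0.8, 1.0]
candidates = {
    "SCALE_A": (2, [0.0, 0.3, 0.6, 0.7, 0.9]),  # better grade, strongly correlated
    "SCALE_B": (1, [1.0, 0.7, 0.5, 0.3, 0.0]),  # best grade, anti-correlated
    "SCALE_C": (8, [0.1, 0.4, 0.5, 0.8, 1.0]),  # identical values but worse grade
    "SCALE_D": (1, [0.5, 0.1, 0.9, 0.2, 0.6]),  # well graded but uncorrelated
}
ranked = rank_candidates(original, grade_original=6, candidates=candidates)
print([name for name, _, _ in ranked])
```

In this toy setup only SCALE_B and SCALE_A survive the filter; the anti-correlated SCALE_B ranks first because it has the better grade.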
- Parameters:
df_feat (pd.DataFrame, shape (n_features, n_feature_info)) – Feature DataFrame from run() (the standardized CPP output schema).
labels (array-like, shape (n_samples,)) – Class labels for samples in sequence DataFrame (typically, test=1, reference=0).
strategy ({'greedy', 'consolidate', 'swap_all'}, default='greedy') – How candidate swaps are chosen and validated (see Notes for full behavior):
'greedy': per-feature. Each targeted feature is swapped to its best candidate that keeps the cross-validation (CV) score within ml_th.
'consolidate': set-level. Funnels features into the fewest interpretable subcategories, keeping each batch swap only if the set CV score holds.
'swap_all': apply every eligible best-candidate swap with no CV gate (fastest; ml_model/ml_metric/ml_th/ml_cv ignored).
candidate_search ({'exact', 'fast'}, default='exact') – How many candidate scales are evaluated per feature. 'exact' (default) tests every eligible candidate and reproduces the original result exactly. 'fast' is an approximate speed-up that caps the search to the most promising candidates per feature (highest interpretability, then strongest correlation); it can change which features are kept and so is most useful on large scale pools. If none of the searched candidates is accepted, the feature keeps its original scale (it is never dropped for this reason), so 'fast' may leave a feature un-simplified that 'exact' would have swapped. The speed-up is concentrated in strategy='greedy' (one cross-validation per candidate tried); 'consolidate' gains less and 'swap_all' is unaffected (it already stops at the first viable candidate). With return_details=True the df_candidates report is correspondingly shorter under 'fast'.
max_interpret_grade (int, optional) – The maximum (worst) interpretability grade kept (1-10, where grade 1 is the best / most interpretable, so lower is better). Every feature whose scale subcategory is graded worse (higher) than this is targeted for replacement. If None (default), every improvable feature is attempted.
min_cor (float, default=0.7) – Minimum absolute Pearson correlation between a candidate scale and the original scale (between 0 and 1); anti-correlation is allowed via the absolute value.
ml_model (str or sklearn estimator, default='svm') – Model for the cross-validation gate ('greedy'/'consolidate'). A string preset 'svm' (default; fast), 'rf' (recommended for non-linear feature relationships, but slower), or 'log_reg' (fastest); or any configured scikit-learn classifier instance (e.g. SVC(kernel='poly', C=0.1)), used as-is.
ml_metric (str, default='balanced_accuracy') – Scoring metric for the CV gate (any scikit-learn classification scorer name).
ml_th (float, default=0.0) – CV-gate tolerance: a swap is accepted if its CV score is at least baseline - ml_th (>=0).
ml_cv (int, default=5) – Number of cross-validation folds (>=2, <= smallest class count).
allow_drop (bool, default=True) – Whether simplify may drop features. If False, it only swaps scales and never removes a feature, so the output keeps every input feature 1:1 (the redundancy reduction is skipped and on_unimprovable is forced to 'keep').
on_unimprovable (str, default='keep') – What to do with a targeted feature that cannot be improved: 'keep' (retain the original), 'drop' (remove it), or 'drop_if_perf_allows' (remove only if the CV score does not drop). The last feature is never dropped.
redundancy_tie_break (str, default='interpretability') – When two swapped features are redundant, keep the 'interpretability'-best (then abs_auc) or the 'performance'-best (abs_auc).
label_test (int, default=1) – Class label of the test group in labels.
label_ref (int, default=0) – Class label of the reference group in labels.
max_std_test (float, default=0.2) – Per-feature pre-filter threshold a swapped feature must satisfy (between 0 and 1).
max_cor (float, default=0.5) – Redundancy correlation threshold for the post-swap reduction (between 0 and 1).
max_overlap (float, default=0.5) – Redundancy position-overlap threshold for the post-swap reduction (between 0 and 1).
check_cat (bool, default=True) – Whether the redundancy reduction only compares features within the same scale category.
return_details (bool, default=False) – If True, also return a long-form df_candidates reporting every candidate considered (scale, interpretability, correlation, recomputed std, accepted-flag).
- Returns:
df_feat (pd.DataFrame) – The simplified feature DataFrame (CPP output schema), with swapped scales, recomputed statistics, and redundant features removed.
df_candidates (pd.DataFrame) – Returned only if return_details=True: one row per candidate considered.
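The acceptance rule stated for ml_th and the fold constraint stated for ml_cv can be written down directly. The following is a minimal sketch of those two documented conditions only; the function names `passes_cv_gate` and `check_ml_cv` are hypothetical helpers and are not part of aaanalysis.

```python
from collections import Counter

def passes_cv_gate(score_candidate, score_baseline, ml_th=0.0):
    """Accept a swap if its CV score is at least baseline - ml_th."""
    if ml_th < 0:
        raise ValueError("ml_th should be >= 0")
    return score_candidate >= score_baseline - ml_th

def check_ml_cv(labels, ml_cv=5):
    """Check that 2 <= ml_cv <= size of the smallest class."""
    smallest = min(Counter(labels).values())
    if not 2 <= ml_cv <= smallest:
        raise ValueError(f"ml_cv should be in [2, {smallest}], got {ml_cv}")
    return ml_cv

# Toy labels: 6 test samples (1) and 8 reference samples (0)
labels = [1] * 6 + [0] * 8
n_folds = check_ml_cv(labels, ml_cv=5)

# A candidate scoring 0.79 against a baseline of 0.80
strict = passes_cv_gate(0.79, 0.80, ml_th=0.0)     # rejected with no tolerance
tolerant = passes_cv_gate(0.79, 0.80, ml_th=0.02)  # accepted within tolerance
print(n_folds, strict, tolerant)
```

With the default ml_th=0.0 a swap must score at least as well as the baseline; a small positive tolerance trades a bounded score loss for interpretability.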
Notes
The CV-gate model is seeded from the CPP instance’s random_state; set it once via aa.CPP(..., random_state=...) for a reproducible result.
Redundancy reduction protects original features: it never drops a feature the user already had, it only removes a swapped feature when the swap made it redundant with a kept feature (using signed correlation, matching run()).
The strategy controls how swaps are chosen and validated:
- 'greedy': per-feature. Each targeted feature is swapped to its best correlated candidate that keeps the cross-validation score within ml_th of the current set; otherwise the next candidate is tried. Each swap is individually justified.
- 'consolidate': set-level. Interpretable subcategories are taken best-first, and every targeted feature that can move into the current subcategory is swapped as one batch, which is kept only if the set CV score stays within ml_th. Funnels features into the fewest subcategories.
- 'swap_all': apply every eligible best-candidate swap with no cross-validation (fastest); ml_model/ml_metric/ml_th/ml_cv are ignored. A pure interpretability transform to evaluate yourself afterwards.
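As a rough illustration of the per-feature 'greedy' loop, the sketch below tries each feature's candidates in order and accepts the first one whose score stays within ml_th of the current set. It is a toy model, not the library's code: `greedy_simplify`, `score_fn`, the feature names, and the `QUALITY` table are hypothetical, the scoring function stands in for a real cross-validation run, and updating the baseline after each accepted swap is an assumption about what "the current set" means.

```python
def greedy_simplify(features, candidates, score_fn, ml_th=0.0):
    """Swap each feature to its first candidate that passes the score gate."""
    current = list(features)
    baseline = score_fn(current)
    for i, feat in enumerate(features):
        # Candidates are assumed to be pre-sorted best-first
        for cand in candidates.get(feat, []):
            trial = current[:i] + [cand] + current[i + 1:]
            score = score_fn(trial)
            if score >= baseline - ml_th:
                # Assumption: the accepted set becomes the new reference
                current, baseline = trial, score
                break
    return current

# Hypothetical per-feature quality; the mean stands in for a CV score
QUALITY = {"f1": 0.8, "f1_a": 0.6, "f1_b": 0.85, "f2": 0.7, "f2_a": 0.5, "f3": 0.9}

def score_fn(feature_set):
    return sum(QUALITY[f] for f in feature_set) / len(feature_set)

features = ["f1", "f2", "f3"]
candidates = {"f1": ["f1_a", "f1_b"], "f2": ["f2_a"]}
simplified = greedy_simplify(features, candidates, score_fn)
print(simplified)
```

Here f1 falls through its first candidate to the second, f2 keeps its original scale because its only candidate lowers the score, and f3 has no candidates at all.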
Features whose scale is not a rated AAontology scale (e.g. run_num pseudo-scales or unclassified scales) carry no interpretability grade and are skipped. If no feature is rated, df_feat is returned unchanged with a RuntimeWarning.
An anti-correlated swap flips the sign of mean_dif (the feature still discriminates); the correlation sign is reported in df_candidates.
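The sign flip under an anti-correlated swap can be checked on toy data. The sketch below assumes that mean_dif is the mean feature value of the test group minus that of the reference group, and that a feature value is the average scale value over a sequence part; the scale values, the sequences, and the helper `mean_dif` are hypothetical and chosen only for illustration.

```python
def mean_dif(scale, seqs_test, seqs_ref):
    """Mean feature value of the test group minus that of the reference group."""
    def feature_value(seq):
        # Average scale value over the residues of one sequence part
        return sum(scale[aa] for aa in seq) / len(seq)
    mean_test = sum(feature_value(s) for s in seqs_test) / len(seqs_test)
    mean_ref = sum(feature_value(s) for s in seqs_ref) / len(seqs_ref)
    return mean_test - mean_ref

# Toy min-max normalized scale and its perfectly anti-correlated counterpart
scale = {"A": 0.2, "L": 0.9, "K": 0.1, "G": 0.4}
scale_anti = {aa: 1.0 - value for aa, value in scale.items()}

seqs_test = ["LLA", "LAL"]
seqs_ref = ["KGK", "GKA"]

d_orig = mean_dif(scale, seqs_test, seqs_ref)
d_anti = mean_dif(scale_anti, seqs_test, seqs_ref)
print(round(d_orig, 3), round(d_anti, 3))
```

The magnitude of the difference is unchanged and only its sign reverses, which is why such a swapped feature still separates the two groups.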
See also
run() for the feature DataFrame produced and its schema.
load_scales() for the interpretability-tiered explainable scale sets (top_explain_n).
Examples
CPP().simplify() rewrites a fitted df_feat into a more interpretable and ideally smaller one. For each feature (PART-SPLIT-SCALE) it swaps the scale for a correlated scale from a better-graded AAontology subcategory (interpretability grade 1-10, where grade 1 is the best, so lower is better), recomputes the feature statistics, and accepts the swap only if it keeps passing CPP filtering and does not reduce a cross-validation score. The swapped set is then redundancy-reduced without dropping any original feature (only a swapped feature that became redundant is removed). We start from a precomputed DOM_GSEC feature set (see [Breimann25]), which already carries feat_importance:

    import aaanalysis as aa
    aa.options["verbose"] = False
    df_feat = aa.load_features(name="DOM_GSEC")
    df_seq = aa.load_dataset(name="DOM_GSEC")
    labels = df_seq["label"].to_list()
    sf = aa.SequenceFeature()
    df_parts = sf.get_df_parts(df_seq=df_seq)
    # Reproducibility: the CV-gate model is seeded from the CPP instance's random_state
    cpp = aa.CPP(df_parts=df_parts, random_state=0)
    aa.display_df(df_feat, n_rows=5, show_shape=True)

DataFrame shape: (150, 15)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions | feat_importance | feat_importance_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_C_JMD_C-Seg...3,4)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.244000 | 0.103666 | 0.103666 | 0.106692 | 0.110506 | 0.000000 | 0.000000 | 31,32,33,34,35 | 0.970400 | 1.438918 |
| 2 | TMD_C_JMD_C-Seg...3,4)-FINA910104 | Conformation | α-helix (C-cap) | α-helix termination | Helix terminati...n et al., 1991) | 0.243000 | 0.085064 | 0.085064 | 0.098774 | 0.096946 | 0.000000 | 0.000000 | 31,32,33,34,35 | 0.000000 | 0.000000 |
| 3 | TMD_C_JMD_C-Seg...6,9)-LEVM760105 | Shape | Side chain length | Side chain length | Radius of gyrat... (Levitt, 1976) | 0.233000 | 0.137044 | 0.137044 | 0.161683 | 0.176964 | 0.000000 | 0.000001 | 32,33 | 1.554800 | 2.109848 |
| 4 | TMD_C_JMD_C-Seg...3,4)-HUTJ700102 | Energy | Entropy | Entropy | Absolute entrop...Hutchens, 1970) | 0.229000 | 0.098224 | 0.098224 | 0.106865 | 0.124608 | 0.000000 | 0.000001 | 31,32,33,34,35 | 3.111200 | 3.109955 |
| 5 | TMD_C_JMD_C-Seg...6,9)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.223000 | 0.095071 | 0.095071 | 0.114758 | 0.132829 | 0.000000 | 0.000002 | 32,33 | 0.000000 | 0.000000 |

With only df_feat and labels, simplify runs the default greedy strategy with an SVM cross-validation gate and returns the simplified feature set:

    df_simple = cpp.simplify(df_feat=df_feat, labels=labels)
    aa.display_df(df_simple, show_shape=True)

DataFrame shape: (94, 15)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions | feat_importance | feat_importance_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_C_JMD_C-Seg...3,4)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.244000 | 0.103666 | 0.103666 | 0.106692 | 0.110506 | 0.000000 | 0.000000 | 31,32,33,34,35 | 0.970400 | 1.438918 |
| 2 | TMD_C_JMD_C-Seg...3,4)-FINA910104 | Conformation | α-helix (C-cap) | α-helix termination | Helix terminati...n et al., 1991) | 0.243000 | 0.085064 | 0.085064 | 0.098774 | 0.096946 | 0.000000 | 0.000000 | 31,32,33,34,35 | 0.000000 | 0.000000 |
| 3 | TMD_C_JMD_C-Seg...6,9)-LEVM760105 | Shape | Side chain length | Side chain length | Radius of gyrat... (Levitt, 1976) | 0.233000 | 0.137044 | 0.137044 | 0.161683 | 0.176964 | 0.000000 | 0.000001 | 32,33 | 1.554800 | 2.109848 |
| 4 | TMD_C_JMD_C-Seg...6,9)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.223000 | 0.095071 | 0.095071 | 0.114758 | 0.132829 | 0.000000 | 0.000002 | 32,33 | 0.000000 | 0.000000 |
| 5 | TMD_C_JMD_C-Seg...2,3)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.222000 | 0.058671 | 0.058671 | 0.064895 | 0.069547 | 0.000000 | 0.000001 | 27,28,29,30,31,32,33 | 0.000000 | 0.000000 |
| 6 | TMD_C_JMD_C-Seg...3,4)-JANJ780101 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Average accessi...n et al., 1978) | 0.215000 | 0.124317 | 0.124317 | 0.166309 | 0.153364 | 0.000000 | 0.000004 | 31,32,33,34,35 | 1.080400 | 1.296094 |
| 7 | TMD_C_JMD_C-Seg...,10)-WILM950103 | Polarity | Hydrophobicity (interface) | Hydrophobicity (interface) | Hydrophobicity ...e et al., 1995) | 0.212000 | 0.141305 | -0.141305 | 0.168603 | 0.217235 | 0.000000 | 0.000005 | 33,34 | 1.747200 | 2.150664 |
| 8 | TMD_C_JMD_C-Seg...6,9)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.211000 | 0.125350 | 0.125350 | 0.160819 | 0.174121 | 0.000000 | 0.000005 | 32,33 | 1.788800 | 2.700803 |
| 9 | TMD_C_JMD_C-Seg...2,3)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.211000 | 0.077355 | 0.077355 | 0.102965 | 0.107453 | 0.000000 | 0.000005 | 27,28,29,30,31,32,33 | 3.048800 | 3.623912 |
| 10 | TMD_C_JMD_C-Seg...6,9)-CHOC760103 | ASA/Volume | Buried | Buried | Proportion of r...(Chothia, 1976) | 0.205000 | 0.125868 | -0.125868 | 0.172165 | 0.188333 | 0.000000 | 0.000009 | 32,33 | 0.000000 | 0.000000 |
| 11 | TMD_C_JMD_C-Seg...4,5)-LEVM760105 | Shape | Side chain length | Side chain length | Radius of gyrat... (Levitt, 1976) | 0.204000 | 0.105513 | 0.105513 | 0.132849 | 0.145219 | 0.000000 | 0.000009 | 33,34,35,36 | 1.992000 | 2.929460 |
| 12 | JMD_N_TMD_N-Seg...1,2)-KARP850101 | Structure-Activity | Flexibility | Flexibility (0 ...igid neighbors) | Flexibility par...s-Schulz, 1985) | 0.196000 | 0.062671 | 0.062671 | 0.083456 | 0.090427 | 0.000000 | 0.000023 | 1,2,3,4,5,6,7,8,9,10 | 1.574400 | 1.835403 |
| 13 | TMD_C_JMD_C-Seg...4,5)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.193000 | 0.076770 | 0.076770 | 0.092804 | 0.114150 | 0.000000 | 0.000027 | 33,34,35,36 | 0.000000 | 0.000000 |
| 14 | TMD_C_JMD_C-Seg...,11)-LIFS790102 | Conformation | β-strand | β-strand | Conformational ...n-Sander, 1979) | 0.189000 | 0.125674 | 0.125674 | 0.183876 | 0.218813 | 0.000001 | 0.000039 | 28,29 | 4.729200 | 4.776785 |
| 15 | TMD_C_JMD_C-Seg...6,9)-KOEH090103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Hydrophobicity ...r et al. (2009) | 0.323000 | 0.248255 | 0.248255 | 0.196374 | 0.181558 | 0.000000 | 0.000000 | 32,33 | nan | nan |
| 16 | TMD_C_JMD_C-Seg...4,5)-CHOC760103 | ASA/Volume | Buried | Buried | Proportion of r...(Chothia, 1976) | 0.185000 | 0.105474 | -0.105474 | 0.157535 | 0.163039 | 0.000001 | 0.000059 | 33,34,35,36 | 0.000000 | 0.000000 |
| 17 | TMD_C_JMD_C-Pat...,15)-FINA910104 | Conformation | α-helix (C-cap) | α-helix termination | Helix terminati...n et al., 1991) | 0.184000 | 0.062096 | 0.062096 | 0.078809 | 0.091271 | 0.000000 | 0.000017 | 26,30,33 | 0.147200 | 0.345306 |
| 18 | JMD_N_TMD_N-Seg...2,4)-ZHOH040101 | Structure-Activity | Stability | Stability | The stability s...hou-Zhou, 2004) | 0.183000 | 0.063902 | -0.063902 | 0.090842 | 0.101427 | 0.000002 | 0.000068 | 6,7,8,9,10 | 0.823200 | 1.404583 |
| 19 | TMD-Pattern(C,3...,15)-FUKS010109 | Composition | AA composition | Proteins of thermophiles (INT) | Entire chain co...ishikawa, 2001) | 0.198000 | 0.109532 | 0.109532 | 0.133076 | 0.159918 | 0.000122 | 0.000122 | 16,20,24,28 | nan | nan |
| 20 | TMD-Pattern(C,3...,15)-LIFS790102 | Conformation | β-strand | β-strand | Conformational ...n-Sander, 1979) | 0.182000 | 0.096246 | 0.096246 | 0.160859 | 0.159538 | 0.000002 | 0.000070 | 16,20,24,28 | 0.508400 | 0.738667 |
| 21 | JMD_N_TMD_N-Seg...2,4)-CIDH920102 | Polarity | Hydrophobicity | Hydrophobicity | Normalized hydr...d et al., 1992) | 0.182000 | 0.066394 | -0.066394 | 0.097857 | 0.103426 | 0.000002 | 0.000070 | 6,7,8,9,10 | 0.000000 | 0.000000 |
| 22 | TMD_C_JMD_C-Seg...2,3)-JANJ780101 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Average accessi...n et al., 1978) | 0.182000 | 0.063819 | 0.063819 | 0.101691 | 0.105987 | 0.000002 | 0.000071 | 27,28,29,30,31,32,33 | 0.000000 | 0.000000 |
| 23 | TMD_C_JMD_C-Seg...,11)-QIAN880134 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.181000 | 0.057287 | -0.057287 | 0.072234 | 0.106512 | 0.000002 | 0.000076 | 28,29 | 1.919600 | 2.094497 |
| 24 | TMD_C_JMD_C-Pat...,12)-FUKS010109 | Composition | AA composition | Proteins of thermophiles (INT) | Entire chain co...ishikawa, 2001) | 0.220000 | 0.147418 | 0.147418 | 0.172594 | 0.195572 | 0.000020 | 0.000020 | 25,29,32 | nan | nan |
| 25 | TMD-PeriodicPat...3,4)-VELV850101 | Energy | Electron-ion interaction pot. | Electron-ion in...ction potential | Electron-ion in...c et al., 1985) | 0.180000 | 0.069277 | -0.069277 | 0.094949 | 0.119524 | 0.000002 | 0.000082 | 13,16,20,23,27 | 1.818000 | 2.308293 |
| 26 | JMD_N_TMD_N-Pat...,15)-CHAM830104 | Shape | Side chain length | n atoms in side chain (2+1) | The number of a...-Charton, 1983) | 0.179000 | 0.115042 | -0.115042 | 0.151938 | 0.189623 | 0.000002 | 0.000068 | 6,9,12,15 | 0.648400 | 1.061142 |
| 27 | TMD-Pattern(C,4,7)-LIFS790102 | Conformation | β-strand | β-strand | Conformational ...n-Sander, 1979) | 0.176000 | 0.120892 | 0.120892 | 0.198986 | 0.216030 | 0.000004 | 0.000113 | 24,27 | 0.714800 | 1.118149 |
| 28 | TMD_C_JMD_C-Pat...4,8)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.176000 | 0.087846 | 0.087846 | 0.140464 | 0.157561 | 0.000004 | 0.000113 | 24,28 | 2.704000 | 4.076269 |
| 29 | TMD_C_JMD_C-Pat...,12)-BULH740102 | ASA/Volume | Partial specific volume | Partial specific volume | Apparent partia...l-Breese, 1974) | 0.262000 | 0.153437 | 0.153437 | 0.130978 | 0.164028 | 0.000000 | 0.000000 | 21,24,28,32 | nan | nan |
| 30 | TMD-Pattern(C,4,7)-QIAN880134 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.176000 | 0.056675 | -0.056675 | 0.099355 | 0.114698 | 0.000004 | 0.000113 | 24,27 | 0.372000 | 0.882270 |
| 31 | TMD_C_JMD_C-Seg...2,3)-WILM950103 | Polarity | Hydrophobicity (interface) | Hydrophobicity (interface) | Hydrophobicity ...e et al., 1995) | 0.175000 | 0.055597 | -0.055597 | 0.089100 | 0.105827 | 0.000005 | 0.000126 | 27,28,29,30,31,32,33 | 0.664000 | 1.089536 |
| 32 | TMD_C_JMD_C-Pat...,11)-QIAN880122 | Conformation | β-strand | β-sheet | Weights for bet...ejnowski, 1988) | 0.173000 | 0.056328 | 0.056328 | 0.067428 | 0.094795 | 0.000006 | 0.000147 | 25,28,31 | 0.483200 | 0.913371 |
| 33 | JMD_N_TMD_N-Per...3,2)-CHAM830104 | Shape | Side chain length | n atoms in side chain (2+1) | The number of a...-Charton, 1983) | 0.172000 | 0.087470 | -0.087470 | 0.135114 | 0.144731 | 0.000005 | 0.000137 | 2,5,8,11,14,17,20 | 0.444000 | 0.721620 |
| 34 | TMD_C_JMD_C-Seg...2,3)-LINS030101 | ASA/Volume | Volume | Accessible surface area (ASA) | Total accessibl...s et al., 2003) | 0.239000 | 0.099677 | 0.099677 | 0.098523 | 0.119762 | 0.000004 | 0.000004 | 27,28,29,30,31,32,33 | nan | nan |
| 35 | TMD_C_JMD_C-Seg...2,3)-LINS030101 | ASA/Volume | Volume | Accessible surface area (ASA) | Total accessibl...s et al., 2003) | 0.239000 | 0.099677 | 0.099677 | 0.098523 | 0.119762 | 0.000004 | 0.000004 | 27,28,29,30,31,32,33 | nan | nan |
| 36 | TMD_C_JMD_C-Seg...2,3)-KOEH090106 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Hydrophobicity ...r et al. (2009) | 0.224000 | 0.088025 | 0.088025 | 0.095746 | 0.124611 | 0.000014 | 0.000014 | 27,28,29,30,31,32,33 | nan | nan |
| 37 | TMD_C_JMD_C-Seg...4,5)-OOBM770101 | Polarity | Hydrophilicity | Non-bonded energy per atom | Average non-bon...take-Ooi, 1977) | 0.277000 | 0.217063 | 0.217063 | 0.180330 | 0.208994 | 0.000000 | 0.000000 | 33,34,35,36 | nan | nan |
| 38 | TMD_C_JMD_C-Pat...,14)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.168000 | 0.086323 | 0.086323 | 0.121405 | 0.138577 | 0.000000 | 0.000030 | 30,34 | 0.140400 | 0.391229 |
| 39 | TMD_C_JMD_C-Seg...4,5)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.166000 | 0.081797 | 0.081797 | 0.121170 | 0.149555 | 0.000013 | 0.000239 | 33,34,35,36 | 1.295200 | 2.225137 |
| 40 | TMD-Pattern(C,4,7)-VELV850101 | Energy | Electron-ion interaction pot. | Electron-ion in...ction potential | Electron-ion in...c et al., 1985) | 0.165000 | 0.121210 | -0.121210 | 0.143560 | 0.207767 | 0.000015 | 0.000254 | 24,27 | 1.302000 | 1.466618 |
| 41 | TMD_C_JMD_C-Pat...5,8)-VELV850101 | Energy | Electron-ion interaction pot. | Electron-ion in...ction potential | Electron-ion in...c et al., 1985) | 0.165000 | 0.119568 | -0.119568 | 0.143560 | 0.205817 | 0.000014 | 0.000253 | 25,28 | 0.000000 | 0.000000 |
| 42 | TMD_C_JMD_C-Seg...5,7)-TANS770108 | Conformation | β/α-bridge | β/α-bridge | Normalized freq...Scheraga, 1977) | 0.164000 | 0.079708 | 0.079708 | 0.135324 | 0.137910 | 0.000016 | 0.000271 | 32,33,34 | 0.462400 | 0.706967 |
| 43 | TMD-PeriodicPat...3,1)-OOBM850101 | Structure-Activity | Stability | Stability (extended-coil) | Optimized beta-...e et al., 1985) | 0.197000 | 0.046799 | 0.046799 | 0.052251 | 0.070467 | 0.000133 | 0.000133 | 12,15,18,21,24,27,30 | nan | nan |
| 44 | TMD-Pattern(C,4,7)-FUKS010109 | Composition | AA composition | Proteins of thermophiles (INT) | Entire chain co...ishikawa, 2001) | 0.300000 | 0.236619 | 0.236619 | 0.165353 | 0.219458 | 0.000000 | 0.000000 | 24,27 | nan | nan |
| 45 | TMD_C_JMD_C-Pat...,11)-EISD860101 | Polarity | Hydrophobicity | Solvation free energy | Solvation free ...cLachlan, 1986) | 0.162000 | 0.083936 | -0.083936 | 0.143338 | 0.147948 | 0.000021 | 0.000304 | 30,33,37 | 0.330400 | 0.377566 |
| 46 | TMD_C_JMD_C-Pat...5,8)-QIAN880130 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.162000 | 0.070292 | -0.070292 | 0.096915 | 0.128362 | 0.000020 | 0.000302 | 21,25,28 | 1.528400 | 2.418922 |
| 47 | TMD-Pattern(C,4...,11)-QIAN880130 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.161000 | 0.068424 | -0.068424 | 0.096915 | 0.126975 | 0.000024 | 0.000332 | 20,24,27 | 0.000000 | 0.000000 |
| 48 | JMD_N_TMD_N-Seg...2,4)-BIOV880101 | ASA/Volume | Buried | Buriability | Information val...u et al., 1988) | 0.161000 | 0.058976 | -0.058976 | 0.096823 | 0.114647 | 0.000025 | 0.000335 | 6,7,8,9,10 | 0.000000 | 0.000000 |
| 49 | JMD_N_TMD_N-Pat...,11)-PRAM820103 | Shape | Shape and Surface | Correlation coe...t in regression | Correlation coe...nnuswamy, 1982) | 0.161000 | 0.057828 | 0.057828 | 0.088362 | 0.106085 | 0.000024 | 0.000328 | 1,5,8,11 | 1.304400 | 1.657101 |
| 50 | TMD_C_JMD_C-Seg...5,7)-ROSM880103 | Structure-Activity | Backbone-dynamics (-CH) | Loss of hydropa...helix formation | Loss of Side ch...(Roseman, 1988) | 0.160000 | 0.059281 | -0.059281 | 0.100693 | 0.120806 | 0.000027 | 0.000359 | 32,33,34 | 0.757200 | 1.471249 |
| 51 | TMD_C_JMD_C-Pat...4,8)-FINA910104 | Conformation | α-helix (C-cap) | α-helix termination | Helix terminati...n et al., 1991) | 0.159000 | 0.103808 | 0.103808 | 0.140977 | 0.179008 | 0.000014 | 0.000248 | 33,37 | 0.233200 | 0.593921 |
| 52 | JMD_N_TMD_N-Seg...,13)-ZHOH040101 | Structure-Activity | Stability | Stability | The stability s...hou-Zhou, 2004) | 0.157000 | 0.127895 | -0.127895 | 0.151304 | 0.258491 | 0.000035 | 0.000420 | 5,6 | 0.833200 | 1.360696 |
| 53 | TMD_C_JMD_C-Pat...4,8)-JANJ780101 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Average accessi...n et al., 1978) | 0.155000 | 0.110281 | 0.110281 | 0.178578 | 0.202098 | 0.000046 | 0.000486 | 33,37,40 | 0.272400 | 0.623809 |
| 54 | JMD_N_TMD_N-Pat...,12)-AURR980113 | Conformation | α-helix | α-helix | Normalized posi...ora-Rose, 1998) | 0.208000 | 0.105550 | -0.105550 | 0.151448 | 0.143693 | 0.000055 | 0.000055 | 9,12,15 | nan | nan |
| 55 | JMD_N_TMD_N-Pat...,15)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.155000 | 0.059593 | -0.059593 | 0.104862 | 0.110749 | 0.000050 | 0.000508 | 6,9,12,15 | 0.482000 | 0.672000 |
| 56 | JMD_N_TMD_N-Pat...,11)-ARGP820101 | Polarity | Hydrophobicity | Hydrophobicity | Hydrophobicity ...s et al., 1982) | 0.154000 | 0.092099 | -0.092099 | 0.142836 | 0.171547 | 0.000052 | 0.000520 | 4,7,11 | 1.065200 | 1.916900 |
| 57 | TMD_C_JMD_C-Seg...,10)-CHAM820102 | Polarity | Hydrophobicity (interface) | Free energy (interface) | Free energy of ...-Charton, 1982) | 0.154000 | 0.082300 | -0.082300 | 0.136264 | 0.177551 | 0.000050 | 0.000508 | 33,34 | 0.366800 | 0.691767 |
| 58 | TMD-Pattern(C,5...,12)-FAUJ880107 | Structure-Activity | Stability | α-CH chemical s...kbone-dynamics) | N.m.r. chemical...e et al., 1988) | 0.123000 | 0.065435 | -0.065435 | 0.173044 | 0.140726 | 0.017378 | 0.017378 | 19,22,26 | nan | nan |
| 59 | TMD_C_JMD_C-Pat...,11)-BIOV880101 | ASA/Volume | Buried | Buriability | Information val...u et al., 1988) | 0.153000 | 0.085041 | -0.085041 | 0.135864 | 0.161279 | 0.000059 | 0.000561 | 30,33,37 | 0.473600 | 0.930690 |
| 60 | TMD_C_JMD_C-Pat...,15)-WILM950103 | Polarity | Hydrophobicity (interface) | Hydrophobicity (interface) | Hydrophobicity ...e et al., 1995) | 0.153000 | 0.069595 | -0.069595 | 0.107314 | 0.134698 | 0.000060 | 0.000566 | 26,29,33 | 0.770800 | 1.299178 |
| 61 | TMD_C_JMD_C-Pat...,15)-LINS030106 | ASA/Volume | Accessible surface area (ASA) | Hydrophilic ASA...olded proteins) | Hydrophilic med...s et al., 2003) | 0.151000 | 0.071208 | 0.071208 | 0.136279 | 0.155749 | 0.000078 | 0.000657 | 26,30,33 | 0.326400 | 0.451202 |
| 62 | TMD-Pattern(C,3...,14)-TANS770102 | Conformation | α-helix (C-term, out) | α-helix (C-terminal, outside) | Normalized freq...Scheraga, 1977) | 0.150000 | 0.056439 | -0.056439 | 0.094520 | 0.108682 | 0.000084 | 0.000685 | 17,20,24,28 | 0.684400 | 0.941892 |
| 63 | TMD_C_JMD_C-Pat...,10)-LEVM760105 | Shape | Side chain length | Side chain length | Radius of gyrat... (Levitt, 1976) | 0.149000 | 0.073526 | 0.073526 | 0.133612 | 0.157088 | 0.000090 | 0.000714 | 31,34,38 | 2.050800 | 2.338278 |
| 64 | JMD_N_TMD_N-Seg...2,6)-ARGP820101 | Polarity | Hydrophobicity | Hydrophobicity | Hydrophobicity ...s et al., 1982) | 0.148000 | 0.076361 | -0.076361 | 0.140513 | 0.148387 | 0.000108 | 0.000790 | 4,5,6 | 0.537200 | 1.041739 |
| 65 | JMD_N_TMD_N-Pat...,15)-BROC820101 | Polarity | Hydrophobicity | Hydrophobicity ...on coefficient) | Retention Coeff...e et al., 1982) | 0.148000 | 0.067069 | -0.067069 | 0.120409 | 0.137261 | 0.000103 | 0.000768 | 6,9,12,15 | 0.106400 | 0.249766 |
| 66 | TMD_C_JMD_C-Pat...4,8)-CORJ870108 | Polarity | Hydrophilicity | TOTLS index | TOTLS index (Co...e et al., 1987) | 0.251000 | 0.170889 | -0.170889 | 0.167914 | 0.219014 | 0.000001 | 0.000001 | 24,28 | nan | nan |
| 67 | TMD_C_JMD_C-Seg...2,5)-FUKS010109 | Composition | AA composition | Proteins of thermophiles (INT) | Entire chain co...ishikawa, 2001) | 0.260000 | 0.141016 | 0.141016 | 0.107804 | 0.160336 | 0.000000 | 0.000000 | 25,26,27,28 | nan | nan |
| 68 | TMD_C_JMD_C-Seg...,11)-LIFS790102 | Conformation | β-strand | β-strand | Conformational ...n-Sander, 1979) | 0.333000 | 0.246278 | 0.246278 | 0.183876 | 0.212529 | 0.000000 | 0.000000 | 28,29 | nan | nan |
| 69 | JMD_N_TMD_N-Pat...,11)-BIGC670101 | ASA/Volume | Volume | Volume | Residue volume (Bigelow, 1967) | 0.143000 | 0.067181 | -0.067181 | 0.141579 | 0.135502 | 0.000184 | 0.001045 | 5,8,11 | 0.382000 | 0.675082 |
| 70 | JMD_N_TMD_N-Pat...,11)-CIDH920102 | Polarity | Hydrophobicity | Hydrophobicity | Normalized hydr...d et al., 1992) | 0.142000 | 0.070908 | -0.070908 | 0.135389 | 0.144272 | 0.000190 | 0.001062 | 5,8,11 | 0.384400 | 0.570074 |
| 71 | JMD_N_TMD_N-Pat...,14)-BAEK050101 | Conformation | β-strand | Linker index (n...AA long region) | Linker index (B...e et al., 2005) | 0.142000 | 0.058743 | -0.058743 | 0.117342 | 0.120311 | 0.000187 | 0.001056 | 6,10,14 | 0.197200 | 0.344958 |
| 72 | TMD-Pattern(N,4,7)-QIAN880113 | Conformation | π-helix | α-helix (C-terminal) | Weights for alp...ejnowski, 1988) | 0.141000 | 0.070553 | -0.070553 | 0.164819 | 0.154840 | 0.000217 | 0.001151 | 14,17 | 0.634800 | 0.816456 |
| 73 | JMD_N_TMD_N-Seg...3,6)-PALJ810111 | Conformation | β-sheet | β-sheet | Normalized freq...u et al., 1981) | 0.108000 | 0.053468 | -0.053468 | 0.129405 | 0.141614 | 0.035900 | 0.035900 | 7,8,9,10 | nan | nan |
| 74 | TMD_C_JMD_C-Pat...4,8)-KARS160103 | Shape | Side chain length | Graph (weighted degree) | Total weighted ...-Knisley, 2016) | 0.153000 | 0.052948 | 0.052948 | 0.106553 | 0.121516 | 0.002988 | 0.002988 | 33,37,40 | nan | nan |
| 75 | JMD_N_TMD_N-Seg...2,7)-ZHOH040101 | Structure-Activity | Stability | Stability | The stability s...hou-Zhou, 2004) | 0.139000 | 0.070195 | -0.070195 | 0.113589 | 0.146944 | 0.000259 | 0.001276 | 3,4,5 | 0.498800 | 0.924962 |
| 76 | JMD_N_TMD_N-Seg...2,6)-RACS770103 | ASA/Volume | Accessible surface area (ASA) | Side chain orientation | Side chain orie...Scheraga, 1977) | 0.138000 | 0.069674 | 0.069674 | 0.151437 | 0.143090 | 0.000308 | 0.001398 | 4,5,6 | 0.214400 | 0.501327 |
| 77 | TMD_C_JMD_C-Seg...4,8)-MIYS850101 | Polarity | Hydrophobicity | Effective partition energy | Effective parti...Jernigan, 1985) | 0.215000 | 0.123198 | 0.123198 | 0.127012 | 0.166940 | 0.000030 | 0.000030 | 28,29,30 | nan | nan |
| 78 | JMD_N_TMD_N-Seg...7,8)-KARS160114 | Shape | Side chain length | Eccentricity (average) | Average weighte...-Knisley, 2016) | 0.137000 | 0.056352 | -0.056352 | 0.122287 | 0.122893 | 0.000322 | 0.001432 | 16,17 | 1.170800 | 1.925978 |
| 79 | TMD_C_JMD_C-Seg...,14)-ROSM880103 | Structure-Activity | Backbone-dynamics (-CH) | Loss of hydropa...helix formation | Loss of Side ch...(Roseman, 1988) | 0.136000 | 0.080537 | -0.080537 | 0.194254 | 0.165343 | 0.000150 | 0.000932 | 26,27 | 0.638000 | 0.796859 |
| 80 | TMD_C_JMD_C-Seg...,10)-BAEK050101 | Conformation | β-strand | Linker index (n...AA long region) | Linker index (B...e et al., 2005) | 0.136000 | 0.072267 | -0.072267 | 0.142246 | 0.173638 | 0.000352 | 0.001527 | 33,34 | 1.013200 | 1.315181 |
| 81 | TMD_C_JMD_C-Pat...5,8)-LINS030107 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | % total accessi...s et al., 2003) | 0.136000 | 0.064864 | -0.064864 | 0.078387 | 0.131618 | 0.000367 | 0.001565 | 25,28 | 0.842000 | 0.904274 |
| 82 | JMD_N_TMD_N-Pat...6,9)-ZHOH040101 | Structure-Activity | Stability | Stability | The stability s...hou-Zhou, 2004) | 0.135000 | 0.062723 | -0.062723 | 0.120282 | 0.141044 | 0.000396 | 0.001638 | 3,6,9 | 0.696800 | 1.062095 |
| 83 | TMD-Pattern(N,1...4,7)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.135000 | 0.058024 | -0.058024 | 0.115415 | 0.124556 | 0.000385 | 0.001610 | 11,14,17 | 0.244400 | 0.503183 |
| 84 | JMD_N_TMD_N-Seg...,10)-CRAJ730102 | Conformation | β-sheet | β-sheet | Normalized freq...d et al., 1973) | 0.134000 | 0.096792 | -0.096792 | 0.182935 | 0.210285 | 0.000461 | 0.001775 | 5,6 | 0.485600 | 0.792949 |
| 85 | JMD_N_TMD_N-Pat...,14)-QIAN880134 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.133000 | 0.071020 | 0.071020 | 0.161372 | 0.138873 | 0.000491 | 0.001836 | 6,10,14 | 0.000000 | 0.000000 |
| 86 | JMD_N_TMD_N-Seg...7,9)-BURA740102 | Conformation | β-strand | Extended | Normalized freq...s et al., 1974) | 0.132000 | 0.056043 | 0.056043 | 0.119813 | 0.123454 | 0.000562 | 0.001981 | 14,15 | 0.231600 | 0.356019 |
| 87 | TMD-Segment(3,8)-TANS770102 | Conformation | α-helix (C-term, out) | α-helix (C-terminal, outside) | Normalized freq...Scheraga, 1977) | 0.132000 | 0.055783 | -0.055783 | 0.129933 | 0.133383 | 0.000558 | 0.001977 | 16,17 | 0.502400 | 0.761626 |
| 88 | TMD-Pattern(N,1...,10)-QIAN880124 | Conformation | β-sheet (C-term) | β-sheet (C-terminal) | Weights for bet...ejnowski, 1988) | 0.131000 | 0.069857 | 0.069857 | 0.157078 | 0.159138 | 0.000580 | 0.002008 | 11,14,17,20 | 0.502800 | 0.811308 |
| 89 | JMD_N_TMD_N-Pat...,11)-FUKS010109 | Composition | AA composition | Proteins of thermophiles (INT) | Entire chain co...ishikawa, 2001) | 0.178000 | 0.109835 | 0.109835 | 0.133383 | 0.179887 | 0.000587 | 0.000587 | 10,13,17 | nan | nan |
| 90 | JMD_N_TMD_N-Pat...,14)-BIOV880101 | ASA/Volume | Buried | Buriability | Information val...u et al., 1988) | 0.130000 | 0.067433 | -0.067433 | 0.133237 | 0.146065 | 0.000642 | 0.002130 | 7,11,14 | 0.306800 | 0.574245 |
| 91 | TMD_C_JMD_C-Pat...3,7)-MAXF760105 | Conformation | α-helix (left-handed) | α-helix (left-handed) | Normalized freq...Scheraga, 1976) | 0.129000 | 0.071374 | 0.071374 | 0.180851 | 0.152571 | 0.000727 | 0.002285 | 23,27 | 0.000000 | 0.000000 |
| 92 | TMD_C_JMD_C-Pat...,11)-TANS770102 | Conformation | α-helix (C-term, out) | α-helix (C-terminal, outside) | Normalized freq...Scheraga, 1977) | 0.128000 | 0.062708 | -0.062708 | 0.113629 | 0.123346 | 0.000767 | 0.002362 | 25,28,31 | 0.407200 | 0.686822 |
| 93 | JMD_N_TMD_N-Seg...2,3)-CHAM830105 | Shape | Side chain length | n atoms in side chain (3+1) | The number of a...-Charton, 1983) | 0.128000 | 0.057140 | -0.057140 | 0.128493 | 0.130946 | 0.000672 | 0.002187 | 7,8,9,10,11,12,13 | 0.121600 | 0.273037 |
| 94 | JMD_N_TMD_N-Seg...3,9)-ISOY800102 | Conformation | β-strand | Extended | Normalized rela...i et al., 1980) | 0.126000 | 0.079975 | -0.079975 | 0.169167 | 0.182954 | 0.000926 | 0.002636 | 5,6 | 1.002000 | 1.075427 |

max_interpret_grade caps the worst interpretability grade allowed to remain (1 = best). With max_interpret_grade=2 every feature graded worse than 2 is targeted for replacement; if it is None (default) every improvable feature is attempted:

    df_grade = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2)
    aa.display_df(df_grade, show_shape=True)

DataFrame shape: (96, 15)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions | feat_importance | feat_importance_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_C_JMD_C-Seg...3,4)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.244000 | 0.103666 | 0.103666 | 0.106692 | 0.110506 | 0.000000 | 0.000000 | 31,32,33,34,35 | 0.970400 | 1.438918 |
| 2 | TMD_C_JMD_C-Seg...3,4)-FINA910104 | Conformation | α-helix (C-cap) | α-helix termination | Helix terminati...n et al., 1991) | 0.243000 | 0.085064 | 0.085064 | 0.098774 | 0.096946 | 0.000000 | 0.000000 | 31,32,33,34,35 | 0.000000 | 0.000000 |
| 3 | TMD_C_JMD_C-Seg...6,9)-LEVM760105 | Shape | Side chain length | Side chain length | Radius of gyrat... (Levitt, 1976) | 0.233000 | 0.137044 | 0.137044 | 0.161683 | 0.176964 | 0.000000 | 0.000001 | 32,33 | 1.554800 | 2.109848 |
| 4 | TMD_C_JMD_C-Seg...6,9)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.223000 | 0.095071 | 0.095071 | 0.114758 | 0.132829 | 0.000000 | 0.000002 | 32,33 | 0.000000 | 0.000000 |
| 5 | TMD_C_JMD_C-Seg...2,3)-KLEP840101 | Energy | Charge | Charge | Net charge (Kle...n et al., 1984) | 0.222000 | 0.058671 | 0.058671 | 0.064895 | 0.069547 | 0.000000 | 0.000001 | 27,28,29,30,31,32,33 | 0.000000 | 0.000000 |
| 6 | TMD_C_JMD_C-Seg...3,4)-JANJ780101 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Average accessi...n et al., 1978) | 0.215000 | 0.124317 | 0.124317 | 0.166309 | 0.153364 | 0.000000 | 0.000004 | 31,32,33,34,35 | 1.080400 | 1.296094 |
| 7 | TMD_C_JMD_C-Seg...,10)-WILM950103 | Polarity | Hydrophobicity (interface) | Hydrophobicity (interface) | Hydrophobicity ...e et al., 1995) | 0.212000 | 0.141305 | -0.141305 | 0.168603 | 0.217235 | 0.000000 | 0.000005 | 33,34 | 1.747200 | 2.150664 |
| 8 | TMD_C_JMD_C-Seg...6,9)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.211000 | 0.125350 | 0.125350 | 0.160819 | 0.174121 | 0.000000 | 0.000005 | 32,33 | 1.788800 | 2.700803 |
| 9 | TMD_C_JMD_C-Seg...2,3)-AURR980110 | Conformation | α-helix | α-helix (middle) | Normalized posi...ora-Rose, 1998) | 0.211000 | 0.077355 | 0.077355 | 0.102965 | 0.107453 | 0.000000 | 0.000005 | 27,28,29,30,31,32,33 | 3.048800 | 3.623912 |
| 10 | TMD_C_JMD_C-Seg...6,9)-CHOC760103 | ASA/Volume | Buried | Buried | Proportion of r...(Chothia, 1976) | 0.205000 | 0.125868 | -0.125868 | 0.172165 | 0.188333 | 0.000000 | 0.000009 | 32,33 | 0.000000 | 0.000000 |
| 11 | TMD_C_JMD_C-Seg...4,5)-LEVM760105 | Shape | Side chain length | Side chain length | Radius of gyrat... (Levitt, 1976) | 0.204000 | 0.105513 | 0.105513 | 0.132849 | 0.145219 | 0.000000 | 0.000009 | 33,34,35,36 | 1.992000 | 2.929460 |
| 12 | JMD_N_TMD_N-Seg...1,2)-KARP850101 | Structure-Activity | Flexibility | Flexibility (0 ...igid neighbors) | Flexibility par...s-Schulz, 1985) | 0.196000 | 0.062671 | 0.062671 | 0.083456 | 0.090427 | 0.000000 | 0.000023 | 1,2,3,4,5,6,7,8,9,10 | 1.574400 | 1.835403 |
| 13 | TMD_C_JMD_C-Seg...4,5)-RADA880106 | ASA/Volume | Volume | Accessible surface area (ASA) | Accessible surf...olfenden, 1988) | 0.193000 | 0.076770 | 0.076770 | 0.092804 | 0.114150 | 0.000000 | 0.000027 | 33,34,35,36 | 0.000000 | 0.000000 |
| 14 | TMD_C_JMD_C-Seg...,11)-LIFS790102 | Conformation | β-strand | β-strand | Conformational ...n-Sander, 1979) | 0.189000 | 0.125674 | 0.125674 | 0.183876 | 0.218813 | 0.000001 | 0.000039 | 28,29 | 4.729200 | 4.776785 |
| 15 | TMD_C_JMD_C-Seg...6,9)-KOEH090103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Hydrophobicity ...r et al. (2009) | 0.323000 | 0.248255 | 0.248255 | 0.196374 | 0.181558 | 0.000000 | 0.000000 | 32,33 | nan | nan |
| 16 | TMD_C_JMD_C-Seg...4,5)-CHOC760103 | ASA/Volume | Buried | Buried | Proportion of r...(Chothia, 1976) | 0.185000 | 0.105474 | -0.105474 | 0.157535 | 0.163039 | 0.000001 | 0.000059 | 33,34,35,36 | 0.000000 | 0.000000 |
| 17 | TMD_C_JMD_C-Seg...6,9)-MITS020101 | Polarity | Amphiphilicity | Amphiphilicity | Amphiphilicity ...u et al., 2002) | 0.185000 | 0.101798 | 0.101798 | 0.145676 | 0.155096 | 0.000001 | 0.000054 | 32,33 | 0.000000 | 0.000000 |
| 18 | TMD_C_JMD_C-Pat...,15)-FINA910104 | Conformation | α-helix (C-cap) | α-helix termination | Helix terminati...n et al., 1991) | 0.184000 | 0.062096 | 0.062096 | 0.078809 | 0.091271 | 0.000000 | 0.000017 | 26,30,33 | 0.147200 | 0.345306 |
| 19 | JMD_N_TMD_N-Seg...2,4)-ZHOH040101 | Structure-Activity | Stability | Stability | The stability s...hou-Zhou, 2004) | 0.183000 | 0.063902 | -0.063902 | 0.090842 | 0.101427 | 0.000002 | 0.000068 | 6,7,8,9,10 | 0.823200 | 1.404583 |
| 20 | TMD-Pattern(C,3...,15)-FUKS010109 | Composition | AA composition | Proteins of thermophiles (INT) | Entire chain co...ishikawa, 2001) | 0.198000 | 0.109532 | 0.109532 | 0.133076 | 0.159918 | 0.000122 | 0.000122 | 16,20,24,28 | nan | nan |
| 21 | TMD-Pattern(C,3...,15)-LIFS790102 | Conformation | β-strand | β-strand | Conformational ...n-Sander, 1979) | 0.182000 | 0.096246 | 0.096246 | 0.160859 | 0.159538 | 0.000002 | 0.000070 | 16,20,24,28 | 0.508400 | 0.738667 |
| 22 | JMD_N_TMD_N-Seg...2,4)-CIDH920102 | Polarity | Hydrophobicity | Hydrophobicity | Normalized hydr...d et al., 1992) | 0.182000 | 0.066394 | -0.066394 | 0.097857 | 0.103426 | 0.000002 | 0.000070 | 6,7,8,9,10 | 0.000000 | 0.000000 |
| 23 | TMD_C_JMD_C-Seg...2,3)-JANJ780101 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Average accessi...n et al., 1978) | 0.182000 | 0.063819 | 0.063819 | 0.101691 | 0.105987 | 0.000002 | 0.000071 | 27,28,29,30,31,32,33 | 0.000000 | 0.000000 |
| 24 | TMD_C_JMD_C-Seg...,11)-QIAN880134 | Conformation | Coil | Coil | Weights for coi...ejnowski, 1988) | 0.181000 | 0.057287 | -0.057287 | 0.072234 | 0.106512 | 0.000002 | 0.000076 | 28,29 | 1.919600 | 2.094497 |
25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles
(INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan 26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293 27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142 28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149 29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269 30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan 31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270 32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536 33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371 34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620 35 
TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan 38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan 39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229 40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535 41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137 42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618 43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. 
Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000 44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967 45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan 46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan 47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566 48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922 49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000 50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000 51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101 52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 
1.471249 53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921 54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696 55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809 56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan 57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000 58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900 59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767 60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. 
chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan 61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690 62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178 63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202 64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892 65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... 
(Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278 66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739 67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766 68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan 69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan 70 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan 71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082 72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074 73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958 74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456 75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 
0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan 76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640 77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962 78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327 79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan 80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978 81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859 82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181 83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274 84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095 85 
TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183 86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949 87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000 88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019 89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626 90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308 91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan 92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245 93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000 94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 
0.000767 0.002362 25,28,31 0.407200 0.686822 95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037 96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

The strategy parameter controls how swaps are chosen and validated. 'greedy' swaps feature by feature behind the CV gate; 'consolidate' batches features by subcategory toward the fewest subcategories; 'swap_all' applies every eligible swap with no cross-validation (fastest):

    import pandas as pd

    # Run each strategy and record the size and subcategory diversity of the result
    rows = []
    for strategy in ["greedy", "consolidate", "swap_all"]:
        out = cpp.simplify(df_feat=df_feat, labels=labels, strategy=strategy)
        rows.append([strategy, len(out), out["subcategory"].nunique()])
    df_strategies = pd.DataFrame(rows, columns=["strategy", "n_features", "n_subcategories"])
    aa.display_df(df_strategies)
       strategy     n_features  n_subcategories
    1  greedy               94               25
    2  consolidate          92               25
    3  swap_all             94               25

The candidate_search mode trades exactness for speed. 'exact' (default) tests every eligible candidate per feature and reproduces the result exactly; 'fast' is an opt-in heuristic that caps the search to the most promising candidates (best interpretability, then strongest correlation), which speeds up the search over large pools of scales. It mainly accelerates 'greedy' ('swap_all' already stops at the first viable candidate). The kept-feature set stays close to that of 'exact':

    df_exact = cpp.simplify(df_feat=df_feat, labels=labels, candidate_search="exact")
    df_fast = cpp.simplify(df_feat=df_feat, labels=labels, candidate_search="fast")
    # Jaccard similarity of the kept feature names (1.0 means identical sets)
    kept_exact, kept_fast = set(df_exact["feature"]), set(df_fast["feature"])
    jaccard = len(kept_exact & kept_fast) / len(kept_exact | kept_fast)
    df_cs = pd.DataFrame([["exact", len(df_exact)], ["fast", len(df_fast)]],
                         columns=["candidate_search", "n_features"])
    print(f"kept-feature Jaccard (fast vs exact): {jaccard:.2f}")
    aa.display_df(df_cs)
kept-feature Jaccard (fast vs exact): 1.00

       candidate_search  n_features
    1  exact                     94
    2  fast                      94

The cross-validation gate ('greedy'/'consolidate') decides whether a swap is kept. ml_model selects the classifier: either a preset, namely 'svm' (default; fast), 'rf' (recommended for non-linear relationships; slower), or 'log_reg' (fastest), or a custom scikit-learn estimator instance. ml_metric is the scoring metric, ml_cv the number of folds, and ml_th the tolerated CV-score drop (a swap is kept if its score is at least baseline - ml_th):

    # Random forest gate, 3-fold accuracy, tolerating a CV-score drop of up to 0.05
    df_rf = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2,
                         ml_model="rf", ml_metric="accuracy", ml_cv=3, ml_th=0.05)
    aa.display_df(df_rf, show_shape=True)
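As an aside to the call above, the acceptance rule (keep a swap if its score is at least baseline - ml_th) can be illustrated with a minimal, library-free sketch. The helper pick_swap, the scale names, and the scores are hypothetical and are not part of AAanalysis; the sketch also assumes that candidates arrive ordered best-first and that the CV scores were already computed elsewhere.

```python
from typing import Optional, Sequence, Tuple


def pick_swap(baseline: float,
              candidates: Sequence[Tuple[str, float]],
              ml_th: float = 0.0) -> Optional[str]:
    """Return the first candidate whose CV score passes the gate, else None.

    `candidates` is assumed to be ordered best-first (most interpretable first);
    each entry is (scale_name, cv_score_of_the_set_after_the_swap).
    """
    for scale_name, score in candidates:
        # Gate from the text: keep the swap if score >= baseline - ml_th
        if score >= baseline - ml_th:
            return scale_name
    return None  # no candidate passed; this sketch leaves the feature unchanged


# Hypothetical scores for illustration only (not produced by AAanalysis)
candidates = [("scale_A", 0.75), ("scale_B", 0.80), ("scale_C", 0.83)]
strict = pick_swap(0.82, candidates, ml_th=0.0)     # no drop tolerated
tolerant = pick_swap(0.82, candidates, ml_th=0.05)  # drop of up to 0.05 tolerated
print(strict, tolerant)
```

With these toy numbers, the strict gate only accepts the candidate that matches or beats the baseline, while the tolerant gate already accepts an earlier (more interpretable) candidate, which is the trade-off ml_th controls. The DataFrame printed next is the output of the cpp.simplify call above, not of this sketch.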
DataFrame shape: (96, 15)
feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std 1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918 2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000 3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848 4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000 5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000 6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094 7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664 8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803 9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 
27,28,29,30,31,32,33 3.048800 3.623912 10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000 11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460 12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403 13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000 14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785 15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan 16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000 17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000 18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306 19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583 20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan 21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667 22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000 23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000 24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497 25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles 
(INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan 26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293 27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142 28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149 29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269 30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan 31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270 32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536 33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371 34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620 35 
TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan 38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan 39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229 40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535 41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137 42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618 43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. 
Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000 44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967 45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan 46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan 47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566 48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922 49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000 50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000 51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101 52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 
1.471249 53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921 54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696 55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809 56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan 57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000 58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900 59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767 60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. 
chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan 61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690 62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178 63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202 64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892 65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... 
(Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278 66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739 67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766 68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan 69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan 70 TMD_C_JMD_C-Seg...,11)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.280000 0.112063 0.112063 0.084763 0.131666 0.000000 0.000000 28,29 nan nan 71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082 72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074 73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958 74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456 75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized 
freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan 76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640 77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962 78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327 79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan 80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978 81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859 82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181 83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274 84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 
1.062095 85 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183 86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949 87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000 88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019 89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626 90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308 91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan 92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245 93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000 94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 
0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822 95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037 96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

A custom estimator instance can be passed directly as ml_model (used as-is):

    from sklearn.svm import SVC
    df_custom = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2, ml_model=SVC(kernel="linear"))
    aa.display_df(df_custom, show_shape=True)

DataFrame shape: (109, 15)
feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std 1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918 2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000 3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848 4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000 5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000 6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094 7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664 8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803 9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 
27,28,29,30,31,32,33 3.048800 3.623912 10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000 11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460 12 TMD_C_JMD_C-Seg...6,9)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.198000 0.138293 0.138293 0.172194 0.198814 0.000000 0.000017 32,33 0.832400 1.383718 13 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403 14 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000 15 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785 16 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan 17 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000 18 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000 19 JMD_N_TMD_N-Pat...,10)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.184000 0.112728 -0.112728 0.166431 0.183800 0.000001 0.000061 11,15 0.857600 1.339550 20 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306 21 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583 22 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan 23 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667 24 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000 25 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000 26 TMD-Pattern(N,4,7)-AURR980116 Conformation α-helix 
(C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.181000 0.118349 -0.118349 0.169282 0.185522 0.000002 0.000078 14,17 1.226400 1.510986 27 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497 28 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan 29 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293 30 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142 31 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149 32 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269 33 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan 34 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270 35 TMD_C_JMD_C-Seg...4,5)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.175000 0.078020 0.078020 0.113536 0.125285 0.000005 0.000129 33,34,35,36 0.000000 0.000000 36 
TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536 37 JMD_N_TMD_N-Per...4,3)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.174000 0.067216 0.067216 0.105047 0.116197 0.000005 0.000133 1,4,8,11,15,18 0.000000 0.000000 38 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371 39 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620 40 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 41 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan 42 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229 43 JMD_N_TMD_N-Pat...,11)-QIAN880127 Conformation Coil (N-term) Coil (N-terminal) Weights for coi...ejnowski, 1988) 0.168000 0.071770 -0.071770 0.116934 0.123667 0.000011 0.000216 4,7,11 0.616400 1.124195 44 TMD_C_JMD_C-Seg...2,2)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.168000 0.067627 0.067627 0.098469 0.110321 0.000011 0.000215 31,32,33,34,35,36,37,38,39,40 1.105200 1.425601 45 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535 46 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137 47 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618 48 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. 
Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000 49 TMD-Pattern(C,5...,12)-HUTJ700102 Energy Entropy Entropy Absolute entrop...Hutchens, 1970) 0.165000 0.063134 -0.063134 0.104624 0.113955 0.000015 0.000258 19,22,26 0.000000 0.000000 50 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967 51 TMD_C_JMD_C-Pat...,12)-CHOP780212 Conformation β-sheet (C-term) β-turn (1st residue) Frequency of th...-Fasman, 1978b) 0.164000 0.076207 -0.076207 0.125506 0.147002 0.000016 0.000267 24,28,32 1.095600 1.575630 52 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan 53 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566 54 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922 55 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000 56 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000 57 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101 58 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity 
Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249 59 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921 60 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696 61 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809 62 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan 63 JMD_N_TMD_N-Pat...,13)-RICJ880107 Conformation π-helix α-helix Relative prefer...chardson, 1988) 0.155000 0.066867 -0.066867 0.105803 0.129430 0.000047 0.000496 3,6,9,13 0.335200 0.649905 64 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000 65 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900 66 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767 67 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. 
chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan 68 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690 69 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178 70 TMD-Pattern(N,2...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.153000 0.062678 -0.062678 0.109868 0.123054 0.000061 0.000570 12,15,18,21 0.333600 0.598524 71 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202 72 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892 73 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... 
(Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278 74 JMD_N_TMD_N-Pat...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.149000 0.063073 -0.063073 0.107731 0.126806 0.000091 0.000716 10,13,16,19 0.000000 0.000000 75 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739 76 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766 77 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan 78 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan 79 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan 80 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082 81 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074 82 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958 83 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix 
(C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456 84 JMD_N_TMD_N-Seg...3,6)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.141000 0.067593 0.067593 0.146572 0.140332 0.000225 0.001173 7,8,9,10 0.484800 0.832789 85 TMD_C_JMD_C-Pat...,14)-FUKS010111 Composition AA composition Proteins of mesophiles (EXT) Entire chain co...ishikawa, 2001) 0.003000 0.015230 -0.015230 0.140051 0.227922 0.959141 0.959141 30,34 nan nan 86 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640 87 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962 88 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327 89 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan 90 TMD_C_JMD_C-Pat...,10)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.137000 0.065719 -0.065719 0.114425 0.146722 0.000312 0.001404 31,35,39 0.360800 0.882718 91 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978 92 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 
26,27 0.638000 0.796859 93 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181 94 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274 95 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095 96 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183 97 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949 98 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000 99 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019 100 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626 101 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308 102 JMD_N_TMD_N-Pat...,11)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et 
al., 1992) 0.130000 0.087733 -0.087733 0.180612 0.187328 0.000674 0.002190 10,13,17 0.420000 0.643453 103 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245 104 JMD_N_TMD_N-Pat...,11)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.129000 0.077724 0.077724 0.148907 0.164954 0.000708 0.002247 4,8,11 0.160400 0.302939 105 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000 106 JMD_N_TMD_N-Seg...3,9)-KOEP990102 Conformation β-sheet (N-term) Extended (designed β-sheet) Beta-sheet prop...l-Levitt, 1999) 0.128000 0.086726 0.086726 0.184173 0.184291 0.000769 0.002364 5,6 0.565600 0.778424 107 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822 108 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037 109 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

Candidate eligibility and per-feature filtering.

min_cor is the minimum absolute correlation a candidate scale must have with the original scale (anti-correlation allowed), and max_std_test is the CPP per-feature pre-filter threshold the recomputed swapped feature must satisfy:

    df_strict = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2, min_cor=0.8, max_std_test=0.2)
    aa.display_df(df_strict, show_shape=True)

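As a rough illustration of the min_cor rule described above, the following pure-Python sketch keeps only those candidate scales whose absolute correlation with the original scale reaches the threshold, so anti-correlated scales also qualify. It is not the library's internal implementation. The helper names pearson and eligible_candidates and all toy values are hypothetical. Using Pearson correlation is itself an assumption, since the text only says "absolute correlation".

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def eligible_candidates(original, candidates, min_cor=0.7):
    """Return names of candidates with |r| >= min_cor against the original."""
    return [
        name
        for name, values in candidates.items()
        if abs(pearson(original, values)) >= min_cor
    ]


# Made-up toy values for illustration only (not real AAontology scales).
original = [0.1, 0.4, 0.5, 0.9, 1.0]
candidates = {
    "toy_correlated": [0.2, 0.3, 0.6, 0.8, 1.0],
    "toy_anticorrelated": [1.0, 0.7, 0.5, 0.2, 0.0],
    "toy_unrelated": [0.5, 0.1, 0.9, 0.2, 0.6],
}

# Both the correlated and the anti-correlated toy scale pass the threshold.
kept = eligible_candidates(original, candidates, min_cor=0.8)
```

Here the correlated and the anti-correlated toy scales both pass min_cor=0.8, while the unrelated one is dropped. The output of the df_strict call above follows.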
DataFrame shape: (120, 15)
feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std 1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918 2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000 3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848 4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000 5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000 6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094 7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664 8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803 9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 
27,28,29,30,31,32,33 3.048800 3.623912 10 TMD_C_JMD_C-Seg...3,4)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.289000 0.193943 0.193943 0.159718 0.184043 0.000000 0.000000 31,32,33,34,35 nan nan 11 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000 12 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460 13 TMD_C_JMD_C-Seg...3,4)-PRAM820102 Shape Shape and Surface Slope in Regression Slope in Regres...nnuswamy, 1982) 0.199000 0.073023 -0.073023 0.087336 0.107750 0.000000 0.000017 31,32,33,34,35 0.616000 0.847660 14 TMD_C_JMD_C-Seg...6,9)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.198000 0.138293 0.138293 0.172194 0.198814 0.000000 0.000017 32,33 0.832400 1.383718 15 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403 16 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000 17 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785 18 TMD_C_JMD_C-Seg...6,9)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.189000 0.093759 0.093759 0.136715 0.137320 0.000001 0.000039 32,33 0.000000 0.000000 19 TMD_C_JMD_C-Pat...4,8)-KOEH090106 Polarity Hydrophilicity Polarity 
(hydrophilicity) Hydrophobicity ...r et al. (2009) 0.283000 0.239889 0.239889 0.181777 0.235674 0.000000 0.000000 33,37 nan nan 20 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000 21 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000 22 JMD_N_TMD_N-Pat...,10)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.184000 0.112728 -0.112728 0.166431 0.183800 0.000001 0.000061 11,15 0.857600 1.339550 23 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306 24 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583 25 TMD_C_JMD_C-Seg...4,5)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.182000 0.121315 0.121315 0.147184 0.184212 0.000002 0.000070 33,34,35,36 0.865200 1.553379 26 TMD-Pattern(C,3...,15)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.182000 0.098529 -0.098529 0.141641 0.162412 0.000002 0.000072 16,20,24,28 0.221200 0.519240 27 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667 28 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 
0.000000 29 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000 30 TMD-Pattern(N,4,7)-AURR980116 Conformation α-helix (C-cap) α-helix (C-terminal, C-cap) Normalized posi...ora-Rose, 1998) 0.181000 0.118349 -0.118349 0.169282 0.185522 0.000002 0.000078 14,17 1.226400 1.510986 31 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497 32 TMD_C_JMD_C-Pat...,12)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.180000 0.096784 -0.096784 0.151260 0.170153 0.000002 0.000084 25,29,32 0.356800 0.617224 33 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293 34 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142 35 JMD_N_TMD_N-Per...4,3)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.179000 0.069852 0.069852 0.103576 0.116589 0.000003 0.000093 3,6,10,13,17,20 0.385200 0.555965 36 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149 37 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269 38 TMD_C_JMD_C-Pat...,12)-FAUJ880108 Energy Electron-ion interaction pot. 
Electrical Effect Localized Elect...e et al., 1988) 0.176000 0.064253 -0.064253 0.092619 0.113588 0.000004 0.000113 21,24,28,32 0.826400 1.303426 39 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270 40 TMD_C_JMD_C-Seg...4,5)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.175000 0.078020 0.078020 0.113536 0.125285 0.000005 0.000129 33,34,35,36 0.000000 0.000000 41 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536 42 JMD_N_TMD_N-Per...4,3)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.174000 0.067216 0.067216 0.105047 0.116197 0.000005 0.000133 1,4,8,11,15,18 0.000000 0.000000 43 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371 44 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620 45 TMD_C_JMD_C-Seg...2,3)-PRAM820102 Shape Shape and Surface Slope in Regression Slope in Regres...nnuswamy, 1982) 0.172000 0.056268 -0.056268 0.074692 0.093571 0.000006 0.000151 27,28,29,30,31,32,33 0.303600 0.618242 46 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 47 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan 48 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229 49 TMD_C_JMD_C-Seg...2,2)-RICJ880113 Conformation α-helix (C-cap) α-helix (C-terminal, inside) Relative prefer...chardson, 1988) 0.168000 0.067627 0.067627 0.098469 0.110321 0.000011 0.000215 31,32,33,34,35,36,37,38,39,40 1.105200 1.425601 50 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535 51 TMD-Pattern(C,5...,12)-PRAM820102 Shape Shape and Surface Slope in Regression Slope in Regres...nnuswamy, 1982) 0.167000 0.077343 0.077343 0.135340 0.134263 0.000012 0.000228 19,22,26 1.301600 1.697263 52 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137 53 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618 54 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. 
Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000 55 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967 56 TMD-PeriodicPat...3,1)-COHE430101 ASA/Volume Partial specific volume Partial specific volume Partial specifi...n-Edsall, 1943) 0.164000 0.058745 0.058745 0.092103 0.106413 0.000017 0.000276 12,15,18,21,24,27,30 1.141600 1.375595 57 TMD-Pattern(C,4,7)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.163000 0.128817 -0.128817 0.184672 0.227780 0.000020 0.000293 24,27 0.872800 1.063156 58 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566 59 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922 60 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000 61 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000 62 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101 63 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 
0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249 64 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921 65 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696 66 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809 67 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan 68 JMD_N_TMD_N-Pat...,13)-RICJ880107 Conformation π-helix α-helix Relative prefer...chardson, 1988) 0.155000 0.066867 -0.066867 0.105803 0.129430 0.000047 0.000496 3,6,9,13 0.335200 0.649905 69 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000 70 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900 71 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767 72 TMD-Pattern(C,5...,12)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.154000 0.062226 0.062226 0.144085 0.119863 0.000057 0.000548 19,22,26 0.715200 1.186306 73 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 
0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690 74 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178 75 TMD-Pattern(N,2...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.153000 0.062678 -0.062678 0.109868 0.123054 0.000061 0.000570 12,15,18,21 0.333600 0.598524 76 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202 77 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892 78 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... 
(Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278 79 TMD_C_JMD_C-Pat...,13)-CHOP780212 Conformation β-sheet (C-term) β-turn (1st residue) Frequency of th...-Fasman, 1978b) 0.149000 0.069627 -0.069627 0.113251 0.143949 0.000093 0.000725 26,29,33 0.842800 1.314094 80 JMD_N_TMD_N-Pat...,11)-RACS820101 Conformation β-sheet (N-term) α-helix with fl...α structure (i) Average relativ...Scheraga, 1982) 0.149000 0.063073 -0.063073 0.107731 0.126806 0.000091 0.000716 10,13,16,19 0.000000 0.000000 81 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739 82 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766 83 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan 84 TMD_C_JMD_C-Seg...2,5)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.147000 0.079575 -0.079575 0.145620 0.160200 0.000115 0.000811 25,26,27,28 0.322000 0.559943 85 TMD-Pattern(C,4...,11)-RICJ880107 Conformation π-helix α-helix Relative prefer...chardson, 1988) 0.146000 0.068957 0.068957 0.131400 0.140413 0.000131 0.000868 20,23,27 0.697200 1.056350 86 TMD_C_JMD_C-Seg...,11)-COHE430101 ASA/Volume Partial specific volume Partial specific volume Partial specifi...n-Edsall, 1943) 0.145000 0.124999 0.124999 0.180151 0.242281 0.000145 0.000912 28,29 1.740800 2.317117 87 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082 88 
JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074 89 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958 90 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456 91 JMD_N_TMD_N-Seg...3,6)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.141000 0.067593 0.067593 0.146572 0.140332 0.000225 0.001173 7,8,9,10 0.484800 0.832789 92 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640 93 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962 94 TMD_C_JMD_C-Pat...,12)-QIAN880114 Conformation β-sheet (N-term) β-sheet (N-terminal) Weights for bet...ejnowski, 1988) 0.138000 0.070821 -0.070821 0.121293 0.151868 0.000310 0.001400 24,28,32 0.718800 1.295090 95 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327 96 TMD_C_JMD_C-Pat...,15)-KOEH090102 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.186000 0.073577 0.073577 0.103456 0.126131 0.000305 0.000305 26,30,33 nan nan 97 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan 98 TMD_C_JMD_C-Pat...,10)-QIAN880138 Conformation Coil (C-term) Coil (C-terminal) Weights for coi...ejnowski, 1988) 0.137000 0.065719 -0.065719 0.114425 0.146722 0.000312 0.001404 31,35,39 0.360800 0.882718 99 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978 100 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859 101 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181 102 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274 103 TMD_C_JMD_C-Seg...6,9)-PALJ810113 Conformation α-helix (left-handed) β-turn (α class) Normalized freq...u et al., 1981) 0.135000 0.072992 -0.072992 0.138972 0.165851 0.000412 0.001667 32,33 0.292400 0.546994 104 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095 105 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183 
106 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949 107 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000 108 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019 109 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626 110 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308 111 TMD_C_JMD_C-Pat...,10)-TANS770106 Conformation β-turn (TM helix) β-turn in double bend Normalized freq...Scheraga, 1977) 0.131000 0.056621 0.056621 0.144377 0.128425 0.000597 0.002043 31,34,38 0.726800 0.885807 112 JMD_N_TMD_N-Pat...,11)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.130000 0.087733 -0.087733 0.180612 0.187328 0.000674 0.002190 10,13,17 0.420000 0.643453 113 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245 114 JMD_N_TMD_N-Seg...2,4)-QIAN880114 Conformation β-sheet (N-term) β-sheet (N-terminal) Weights for bet...ejnowski, 1988) 0.130000 0.058210 0.058210 0.127516 0.112411 0.000633 0.002111 6,7,8,9,10 0.140800 0.376807 115 JMD_N_TMD_N-Pat...,11)-VASM830102 Energy Non-bonded energy Free energy (Extended) Relative popula...z et al., 1983) 0.129000 0.077724 
0.077724 0.148907 0.164954 0.000708 0.002247 4,8,11 0.160400 0.302939 116 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000 117 JMD_N_TMD_N-Seg...3,9)-KOEP990102 Conformation β-sheet (N-term) Extended (designed β-sheet) Beta-sheet prop...l-Levitt, 1999) 0.128000 0.086726 0.086726 0.184173 0.184291 0.000769 0.002364 5,6 0.565600 0.778424 118 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822 119 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037 120 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

Unimprovable features. When a targeted feature has no accepted swap, on_unimprovable decides its fate: 'keep' (retain the original, default), 'drop' (remove it), or 'drop_if_perf_allows' (remove only if the CV score does not drop). The last feature is never dropped:

df_drop = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2, on_unimprovable="drop")
aa.display_df(df_drop, show_shape=True)
DataFrame shape: (78, 15)
feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std 1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918 2 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848 3 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000 4 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000 5 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094 6 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803 7 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 27,28,29,30,31,32,33 3.048800 3.623912 8 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000 9 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... 
(Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460 10 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403 11 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000 12 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785 13 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan 14 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000 15 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000 16 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583 17 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan 18 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667 19 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity 
Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000 20 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000 21 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497 22 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan 23 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142 24 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149 25 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269 26 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan 27 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270 28 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371 29 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side 
chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620 30 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 31 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 32 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan 33 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan 34 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229 35 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535 36 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137 37 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan 38 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 
0.000000 24,27 nan nan 39 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566 40 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922 41 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000 42 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000 43 TMD-Pattern(N,1...4,7)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.218000 0.114519 -0.114519 0.151936 0.147686 0.000024 0.000024 11,14,17 nan nan 44 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696 45 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809 46 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan 47 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000 48 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900 49 
TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan 50 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690 51 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202 52 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278 53 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739 54 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766 55 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan 56 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan 57 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan 58 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 
5,8,11 0.382000 0.675082 59 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074 60 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958 61 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan 62 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640 63 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962 64 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327 65 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan 66 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978 67 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181 68 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 
0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274 69 TMD_C_JMD_C-Seg...6,9)-PALJ810105 Conformation β-turn β-turn Normalized freq...u et al., 1981) 0.087000 0.050175 -0.050175 0.089871 0.163195 0.091373 0.091373 32,33 nan nan 70 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095 71 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183 72 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949 73 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000 74 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019 75 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan 76 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245 77 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037 78 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 
0.182954 0.000926 0.002636 5,6 1.002000 1.075427
Redundancy reduction removes only swapped features that became redundant; original features are always protected.
redundancy_tie_break chooses the keeper of a redundant pair ('interpretability' or 'performance'), max_cor and max_overlap are the scale-correlation and position-overlap thresholds, and check_cat restricts comparisons to the same scale category:
df_red = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2, redundancy_tie_break="performance", max_cor=0.5, max_overlap=0.5, check_cat=True)
aa.display_df(df_red, show_shape=True)
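To make the roles of max_cor, max_overlap, and check_cat more concrete, the following is a minimal sketch of a pairwise redundancy test. It is an illustration under stated assumptions, not the library's implementation: the helper names, the "scale_values" key, the toy numbers, and the overlap definition (shared positions divided by the smaller position set) are all hypothetical, and the tie-break step that picks the keeper is not modeled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def position_overlap(pos_a, pos_b):
    """Shared positions as a fraction of the smaller set (assumed definition)."""
    a, b = set(pos_a), set(pos_b)
    return len(a & b) / min(len(a), len(b))


def is_redundant(feat_a, feat_b, max_cor=0.5, max_overlap=0.5, check_cat=True):
    """Flag a pair as redundant when both thresholds are exceeded (assumed rule)."""
    # With check_cat=True, features from different categories are never compared.
    if check_cat and feat_a["category"] != feat_b["category"]:
        return False
    cor = abs(pearson(feat_a["scale_values"], feat_b["scale_values"]))
    overlap = position_overlap(feat_a["positions"], feat_b["positions"])
    return cor > max_cor and overlap > max_overlap


# Toy features with made-up scale values (not taken from any real scale).
feat_a = {"category": "Polarity", "scale_values": [0.1, 0.4, 0.5, 0.9], "positions": [32, 33]}
feat_b = {"category": "Polarity", "scale_values": [0.2, 0.5, 0.4, 1.0], "positions": [32, 33, 34]}
feat_c = {"category": "Energy", "scale_values": [0.2, 0.5, 0.4, 1.0], "positions": [32, 33]}

same_cat = is_redundant(feat_a, feat_b)                       # same category, correlated, overlapping
cross_cat = is_redundant(feat_a, feat_c)                      # skipped because categories differ
cross_cat_unchecked = is_redundant(feat_a, feat_c, check_cat=False)
```

In this sketch, lowering max_cor or max_overlap flags more pairs as redundant, and check_cat=False allows pairs from different categories to be compared. The output of the cpp.simplify call above follows.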
DataFrame shape: (96, 15)
feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std 1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918 2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000 3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848 4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000 5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000 6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094 7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664 8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803 9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 
27,28,29,30,31,32,33 3.048800 3.623912 10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000 11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460 12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403 13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000 14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785 15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan 16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000 17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000 18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306 19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583 20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan 21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667 22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000 23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000 24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497 25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles 
(INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan 26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293 27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142 28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149 29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269 30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan 31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270 32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536 33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371 34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620 35 
TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan 37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan 38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan 39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229 40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535 41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137 42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618 43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. 
Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000 44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967 45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan 46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan 47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566 48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922 49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000 50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000 51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101 52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 
1.471249 53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921 54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696 55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809 56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan 57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000 58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900 59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767 60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. 
chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan 61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690 62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178 63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202 64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892 65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... 
(Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278 66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739 67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766 68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan 69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan 70 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan 71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082 72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074 73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958 74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456 75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 
0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan 76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640 77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962 78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327 79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan 80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978 81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859 82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181 83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274 84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095 85 
TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183 86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949 87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000 88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019 89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626 90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308 91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan 92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245 93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000 94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 
0.000767 0.002362 25,28,31 0.407200 0.686822 95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037 96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427
label_test / label_ref name the test and reference classes in labels (default 1 / 0); set them to match a different encoding:
df_lab = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2, label_test=1, label_ref=0)
aa.display_df(df_lab, show_shape=True)
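When labels use an encoding other than 1 / 0, it can help to confirm which values represent the test and reference classes before passing label_test and label_ref. The helper below is a hypothetical pre-check written for this example (it is not part of the library, and the library may validate labels differently); the 2 / 1 encoding is an assumed example.

```python
def check_label_encoding(labels, label_test=1, label_ref=0):
    """Return (n_test, n_ref) and reject values outside the two classes."""
    allowed = {label_test, label_ref}
    unexpected = sorted(set(labels) - allowed)
    if unexpected:
        raise ValueError(
            f"labels contain values other than {label_test}/{label_ref}: {unexpected}"
        )
    n_test = sum(1 for v in labels if v == label_test)
    n_ref = sum(1 for v in labels if v == label_ref)
    return n_test, n_ref


# Assumed alternative encoding: test class = 2, reference class = 1.
labels_alt = [2, 2, 1, 1, 1]
n_test, n_ref = check_label_encoding(labels_alt, label_test=2, label_ref=1)
```

With such an encoding, the same values would be passed on as label_test=2 and label_ref=1. The output of the df_lab call above follows.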
DataFrame shape: (96, 15)
feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions feat_importance feat_importance_std 1 TMD_C_JMD_C-Seg...3,4)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.244000 0.103666 0.103666 0.106692 0.110506 0.000000 0.000000 31,32,33,34,35 0.970400 1.438918 2 TMD_C_JMD_C-Seg...3,4)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.243000 0.085064 0.085064 0.098774 0.096946 0.000000 0.000000 31,32,33,34,35 0.000000 0.000000 3 TMD_C_JMD_C-Seg...6,9)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.233000 0.137044 0.137044 0.161683 0.176964 0.000000 0.000001 32,33 1.554800 2.109848 4 TMD_C_JMD_C-Seg...6,9)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.223000 0.095071 0.095071 0.114758 0.132829 0.000000 0.000002 32,33 0.000000 0.000000 5 TMD_C_JMD_C-Seg...2,3)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.222000 0.058671 0.058671 0.064895 0.069547 0.000000 0.000001 27,28,29,30,31,32,33 0.000000 0.000000 6 TMD_C_JMD_C-Seg...3,4)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.215000 0.124317 0.124317 0.166309 0.153364 0.000000 0.000004 31,32,33,34,35 1.080400 1.296094 7 TMD_C_JMD_C-Seg...,10)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.212000 0.141305 -0.141305 0.168603 0.217235 0.000000 0.000005 33,34 1.747200 2.150664 8 TMD_C_JMD_C-Seg...6,9)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.125350 0.125350 0.160819 0.174121 0.000000 0.000005 32,33 1.788800 2.700803 9 TMD_C_JMD_C-Seg...2,3)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.211000 0.077355 0.077355 0.102965 0.107453 0.000000 0.000005 
27,28,29,30,31,32,33 3.048800 3.623912 10 TMD_C_JMD_C-Seg...6,9)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.205000 0.125868 -0.125868 0.172165 0.188333 0.000000 0.000009 32,33 0.000000 0.000000 11 TMD_C_JMD_C-Seg...4,5)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.204000 0.105513 0.105513 0.132849 0.145219 0.000000 0.000009 33,34,35,36 1.992000 2.929460 12 JMD_N_TMD_N-Seg...1,2)-KARP850101 Structure-Activity Flexibility Flexibility (0 ...igid neighbors) Flexibility par...s-Schulz, 1985) 0.196000 0.062671 0.062671 0.083456 0.090427 0.000000 0.000023 1,2,3,4,5,6,7,8,9,10 1.574400 1.835403 13 TMD_C_JMD_C-Seg...4,5)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.193000 0.076770 0.076770 0.092804 0.114150 0.000000 0.000027 33,34,35,36 0.000000 0.000000 14 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.189000 0.125674 0.125674 0.183876 0.218813 0.000001 0.000039 28,29 4.729200 4.776785 15 TMD_C_JMD_C-Seg...6,9)-KOEH090103 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. 
(2009) 0.323000 0.248255 0.248255 0.196374 0.181558 0.000000 0.000000 32,33 nan nan
16 TMD_C_JMD_C-Seg...4,5)-CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976) 0.185000 0.105474 -0.105474 0.157535 0.163039 0.000001 0.000059 33,34,35,36 0.000000 0.000000
17 TMD_C_JMD_C-Seg...6,9)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.185000 0.101798 0.101798 0.145676 0.155096 0.000001 0.000054 32,33 0.000000 0.000000
18 TMD_C_JMD_C-Pat...,15)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.184000 0.062096 0.062096 0.078809 0.091271 0.000000 0.000017 26,30,33 0.147200 0.345306
19 JMD_N_TMD_N-Seg...2,4)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.183000 0.063902 -0.063902 0.090842 0.101427 0.000002 0.000068 6,7,8,9,10 0.823200 1.404583
20 TMD-Pattern(C,3...,15)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.198000 0.109532 0.109532 0.133076 0.159918 0.000122 0.000122 16,20,24,28 nan nan
21 TMD-Pattern(C,3...,15)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.182000 0.096246 0.096246 0.160859 0.159538 0.000002 0.000070 16,20,24,28 0.508400 0.738667
22 JMD_N_TMD_N-Seg...2,4)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.182000 0.066394 -0.066394 0.097857 0.103426 0.000002 0.000070 6,7,8,9,10 0.000000 0.000000
23 TMD_C_JMD_C-Seg...2,3)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.182000 0.063819 0.063819 0.101691 0.105987 0.000002 0.000071 27,28,29,30,31,32,33 0.000000 0.000000
24 TMD_C_JMD_C-Seg...,11)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.181000 0.057287 -0.057287 0.072234 0.106512 0.000002 0.000076 28,29 1.919600 2.094497
25 TMD_C_JMD_C-Pat...,12)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.220000 0.147418 0.147418 0.172594 0.195572 0.000020 0.000020 25,29,32 nan nan
26 TMD-PeriodicPat...3,4)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.180000 0.069277 -0.069277 0.094949 0.119524 0.000002 0.000082 13,16,20,23,27 1.818000 2.308293
27 JMD_N_TMD_N-Pat...,15)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.179000 0.115042 -0.115042 0.151938 0.189623 0.000002 0.000068 6,9,12,15 0.648400 1.061142
28 TMD-Pattern(C,4,7)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.176000 0.120892 0.120892 0.198986 0.216030 0.000004 0.000113 24,27 0.714800 1.118149
29 TMD_C_JMD_C-Pat...4,8)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.176000 0.087846 0.087846 0.140464 0.157561 0.000004 0.000113 24,28 2.704000 4.076269
30 TMD_C_JMD_C-Pat...,12)-BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 0.262000 0.153437 0.153437 0.130978 0.164028 0.000000 0.000000 21,24,28,32 nan nan
31 TMD-Pattern(C,4,7)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.176000 0.056675 -0.056675 0.099355 0.114698 0.000004 0.000113 24,27 0.372000 0.882270
32 TMD_C_JMD_C-Seg...2,3)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.175000 0.055597 -0.055597 0.089100 0.105827 0.000005 0.000126 27,28,29,30,31,32,33 0.664000 1.089536
33 TMD_C_JMD_C-Pat...,11)-QIAN880122 Conformation β-strand β-sheet Weights for bet...ejnowski, 1988) 0.173000 0.056328 0.056328 0.067428 0.094795 0.000006 0.000147 25,28,31 0.483200 0.913371
34 JMD_N_TMD_N-Per...3,2)-CHAM830104 Shape Side chain length n atoms in side chain (2+1) The number of a...-Charton, 1983) 0.172000 0.087470 -0.087470 0.135114 0.144731 0.000005 0.000137 2,5,8,11,14,17,20 0.444000 0.721620
35 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
36 TMD_C_JMD_C-Seg...2,3)-LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003) 0.239000 0.099677 0.099677 0.098523 0.119762 0.000004 0.000004 27,28,29,30,31,32,33 nan nan
37 TMD_C_JMD_C-Seg...2,3)-KOEH090106 Polarity Hydrophilicity Polarity (hydrophilicity) Hydrophobicity ...r et al. (2009) 0.224000 0.088025 0.088025 0.095746 0.124611 0.000014 0.000014 27,28,29,30,31,32,33 nan nan
38 TMD_C_JMD_C-Seg...4,5)-OOBM770101 Polarity Hydrophilicity Non-bonded energy per atom Average non-bon...take-Ooi, 1977) 0.277000 0.217063 0.217063 0.180330 0.208994 0.000000 0.000000 33,34,35,36 nan nan
39 TMD_C_JMD_C-Pat...,14)-KLEP840101 Energy Charge Charge Net charge (Kle...n et al., 1984) 0.168000 0.086323 0.086323 0.121405 0.138577 0.000000 0.000030 30,34 0.140400 0.391229
40 TMD_C_JMD_C-Seg...4,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.167000 0.080568 0.080568 0.128898 0.128726 0.000011 0.000218 33,34,35,36 1.299200 2.159535
41 TMD_C_JMD_C-Seg...4,5)-AURR980110 Conformation α-helix α-helix (middle) Normalized posi...ora-Rose, 1998) 0.166000 0.081797 0.081797 0.121170 0.149555 0.000013 0.000239 33,34,35,36 1.295200 2.225137
42 TMD-Pattern(C,4,7)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.121210 -0.121210 0.143560 0.207767 0.000015 0.000254 24,27 1.302000 1.466618
43 TMD_C_JMD_C-Pat...5,8)-VELV850101 Energy Electron-ion interaction pot. Electron-ion in...ction potential Electron-ion in...c et al., 1985) 0.165000 0.119568 -0.119568 0.143560 0.205817 0.000014 0.000253 25,28 0.000000 0.000000
44 TMD_C_JMD_C-Seg...5,7)-TANS770108 Conformation β/α-bridge β/α-bridge Normalized freq...Scheraga, 1977) 0.164000 0.079708 0.079708 0.135324 0.137910 0.000016 0.000271 32,33,34 0.462400 0.706967
45 TMD-PeriodicPat...3,1)-OOBM850101 Structure-Activity Stability Stability (extended-coil) Optimized beta-...e et al., 1985) 0.197000 0.046799 0.046799 0.052251 0.070467 0.000133 0.000133 12,15,18,21,24,27,30 nan nan
46 TMD-Pattern(C,4,7)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.300000 0.236619 0.236619 0.165353 0.219458 0.000000 0.000000 24,27 nan nan
47 TMD_C_JMD_C-Pat...,11)-EISD860101 Polarity Hydrophobicity Solvation free energy Solvation free ...cLachlan, 1986) 0.162000 0.083936 -0.083936 0.143338 0.147948 0.000021 0.000304 30,33,37 0.330400 0.377566
48 TMD_C_JMD_C-Pat...5,8)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.162000 0.070292 -0.070292 0.096915 0.128362 0.000020 0.000302 21,25,28 1.528400 2.418922
49 TMD-Pattern(C,4...,11)-QIAN880130 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.161000 0.068424 -0.068424 0.096915 0.126975 0.000024 0.000332 20,24,27 0.000000 0.000000
50 JMD_N_TMD_N-Seg...2,4)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.161000 0.058976 -0.058976 0.096823 0.114647 0.000025 0.000335 6,7,8,9,10 0.000000 0.000000
51 JMD_N_TMD_N-Pat...,11)-PRAM820103 Shape Shape and Surface Correlation coe...t in regression Correlation coe...nnuswamy, 1982) 0.161000 0.057828 0.057828 0.088362 0.106085 0.000024 0.000328 1,5,8,11 1.304400 1.657101
52 TMD_C_JMD_C-Seg...5,7)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.160000 0.059281 -0.059281 0.100693 0.120806 0.000027 0.000359 32,33,34 0.757200 1.471249
53 TMD_C_JMD_C-Pat...4,8)-FINA910104 Conformation α-helix (C-cap) α-helix termination Helix terminati...n et al., 1991) 0.159000 0.103808 0.103808 0.140977 0.179008 0.000014 0.000248 33,37 0.233200 0.593921
54 JMD_N_TMD_N-Seg...,13)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.157000 0.127895 -0.127895 0.151304 0.258491 0.000035 0.000420 5,6 0.833200 1.360696
55 TMD_C_JMD_C-Pat...4,8)-JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978) 0.155000 0.110281 0.110281 0.178578 0.202098 0.000046 0.000486 33,37,40 0.272400 0.623809
56 JMD_N_TMD_N-Pat...,12)-AURR980113 Conformation α-helix α-helix Normalized posi...ora-Rose, 1998) 0.208000 0.105550 -0.105550 0.151448 0.143693 0.000055 0.000055 9,12,15 nan nan
57 JMD_N_TMD_N-Pat...,15)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.155000 0.059593 -0.059593 0.104862 0.110749 0.000050 0.000508 6,9,12,15 0.482000 0.672000
58 JMD_N_TMD_N-Pat...,11)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.154000 0.092099 -0.092099 0.142836 0.171547 0.000052 0.000520 4,7,11 1.065200 1.916900
59 TMD_C_JMD_C-Seg...,10)-CHAM820102 Polarity Hydrophobicity (interface) Free energy (interface) Free energy of ...-Charton, 1982) 0.154000 0.082300 -0.082300 0.136264 0.177551 0.000050 0.000508 33,34 0.366800 0.691767
60 TMD-Pattern(C,5...,12)-FAUJ880107 Structure-Activity Stability α-CH chemical s...kbone-dynamics) N.m.r. chemical...e et al., 1988) 0.123000 0.065435 -0.065435 0.173044 0.140726 0.017378 0.017378 19,22,26 nan nan
61 TMD_C_JMD_C-Pat...,11)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.153000 0.085041 -0.085041 0.135864 0.161279 0.000059 0.000561 30,33,37 0.473600 0.930690
62 TMD_C_JMD_C-Pat...,15)-WILM950103 Polarity Hydrophobicity (interface) Hydrophobicity (interface) Hydrophobicity ...e et al., 1995) 0.153000 0.069595 -0.069595 0.107314 0.134698 0.000060 0.000566 26,29,33 0.770800 1.299178
63 TMD_C_JMD_C-Pat...,15)-LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003) 0.151000 0.071208 0.071208 0.136279 0.155749 0.000078 0.000657 26,30,33 0.326400 0.451202
64 TMD-Pattern(C,3...,14)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.150000 0.056439 -0.056439 0.094520 0.108682 0.000084 0.000685 17,20,24,28 0.684400 0.941892
65 TMD_C_JMD_C-Pat...,10)-LEVM760105 Shape Side chain length Side chain length Radius of gyrat... (Levitt, 1976) 0.149000 0.073526 0.073526 0.133612 0.157088 0.000090 0.000714 31,34,38 2.050800 2.338278
66 JMD_N_TMD_N-Seg...2,6)-ARGP820101 Polarity Hydrophobicity Hydrophobicity Hydrophobicity ...s et al., 1982) 0.148000 0.076361 -0.076361 0.140513 0.148387 0.000108 0.000790 4,5,6 0.537200 1.041739
67 JMD_N_TMD_N-Pat...,15)-BROC820101 Polarity Hydrophobicity Hydrophobicity ...on coefficient) Retention Coeff...e et al., 1982) 0.148000 0.067069 -0.067069 0.120409 0.137261 0.000103 0.000768 6,9,12,15 0.106400 0.249766
68 TMD_C_JMD_C-Pat...4,8)-CORJ870108 Polarity Hydrophilicity TOTLS index TOTLS index (Co...e et al., 1987) 0.251000 0.170889 -0.170889 0.167914 0.219014 0.000001 0.000001 24,28 nan nan
69 TMD_C_JMD_C-Seg...2,5)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.260000 0.141016 0.141016 0.107804 0.160336 0.000000 0.000000 25,26,27,28 nan nan
70 TMD_C_JMD_C-Seg...,11)-LIFS790102 Conformation β-strand β-strand Conformational ...n-Sander, 1979) 0.333000 0.246278 0.246278 0.183876 0.212529 0.000000 0.000000 28,29 nan nan
71 JMD_N_TMD_N-Pat...,11)-BIGC670101 ASA/Volume Volume Volume Residue volume (Bigelow, 1967) 0.143000 0.067181 -0.067181 0.141579 0.135502 0.000184 0.001045 5,8,11 0.382000 0.675082
72 JMD_N_TMD_N-Pat...,11)-CIDH920102 Polarity Hydrophobicity Hydrophobicity Normalized hydr...d et al., 1992) 0.142000 0.070908 -0.070908 0.135389 0.144272 0.000190 0.001062 5,8,11 0.384400 0.570074
73 JMD_N_TMD_N-Pat...,14)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.142000 0.058743 -0.058743 0.117342 0.120311 0.000187 0.001056 6,10,14 0.197200 0.344958
74 TMD-Pattern(N,4,7)-QIAN880113 Conformation π-helix α-helix (C-terminal) Weights for alp...ejnowski, 1988) 0.141000 0.070553 -0.070553 0.164819 0.154840 0.000217 0.001151 14,17 0.634800 0.816456
75 JMD_N_TMD_N-Seg...3,6)-PALJ810111 Conformation β-sheet β-sheet Normalized freq...u et al., 1981) 0.108000 0.053468 -0.053468 0.129405 0.141614 0.035900 0.035900 7,8,9,10 nan nan
76 TMD_C_JMD_C-Pat...4,8)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.140000 0.066859 0.066859 0.130397 0.147129 0.000229 0.001185 33,37,40 0.334800 0.632640
77 JMD_N_TMD_N-Seg...2,7)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.139000 0.070195 -0.070195 0.113589 0.146944 0.000259 0.001276 3,4,5 0.498800 0.924962
78 JMD_N_TMD_N-Seg...2,6)-RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977) 0.138000 0.069674 0.069674 0.151437 0.143090 0.000308 0.001398 4,5,6 0.214400 0.501327
79 TMD_C_JMD_C-Seg...4,8)-MIYS850101 Polarity Hydrophobicity Effective partition energy Effective parti...Jernigan, 1985) 0.215000 0.123198 0.123198 0.127012 0.166940 0.000030 0.000030 28,29,30 nan nan
80 JMD_N_TMD_N-Seg...7,8)-KARS160114 Shape Side chain length Eccentricity (average) Average weighte...-Knisley, 2016) 0.137000 0.056352 -0.056352 0.122287 0.122893 0.000322 0.001432 16,17 1.170800 1.925978
81 TMD_C_JMD_C-Seg...,14)-ROSM880103 Structure-Activity Backbone-dynamics (-CH) Loss of hydropa...helix formation Loss of Side ch...(Roseman, 1988) 0.136000 0.080537 -0.080537 0.194254 0.165343 0.000150 0.000932 26,27 0.638000 0.796859
82 TMD_C_JMD_C-Seg...,10)-BAEK050101 Conformation β-strand Linker index (n...AA long region) Linker index (B...e et al., 2005) 0.136000 0.072267 -0.072267 0.142246 0.173638 0.000352 0.001527 33,34 1.013200 1.315181
83 TMD_C_JMD_C-Pat...5,8)-LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003) 0.136000 0.064864 -0.064864 0.078387 0.131618 0.000367 0.001565 25,28 0.842000 0.904274
84 JMD_N_TMD_N-Pat...6,9)-ZHOH040101 Structure-Activity Stability Stability The stability s...hou-Zhou, 2004) 0.135000 0.062723 -0.062723 0.120282 0.141044 0.000396 0.001638 3,6,9 0.696800 1.062095
85 TMD-Pattern(N,1...4,7)-RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988) 0.135000 0.058024 -0.058024 0.115415 0.124556 0.000385 0.001610 11,14,17 0.244400 0.503183
86 JMD_N_TMD_N-Seg...,10)-CRAJ730102 Conformation β-sheet β-sheet Normalized freq...d et al., 1973) 0.134000 0.096792 -0.096792 0.182935 0.210285 0.000461 0.001775 5,6 0.485600 0.792949
87 JMD_N_TMD_N-Pat...,14)-QIAN880134 Conformation Coil Coil Weights for coi...ejnowski, 1988) 0.133000 0.071020 0.071020 0.161372 0.138873 0.000491 0.001836 6,10,14 0.000000 0.000000
88 JMD_N_TMD_N-Seg...7,9)-BURA740102 Conformation β-strand Extended Normalized freq...s et al., 1974) 0.132000 0.056043 0.056043 0.119813 0.123454 0.000562 0.001981 14,15 0.231600 0.356019
89 TMD-Segment(3,8)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.132000 0.055783 -0.055783 0.129933 0.133383 0.000558 0.001977 16,17 0.502400 0.761626
90 TMD-Pattern(N,1...,10)-QIAN880124 Conformation β-sheet (C-term) β-sheet (C-terminal) Weights for bet...ejnowski, 1988) 0.131000 0.069857 0.069857 0.157078 0.159138 0.000580 0.002008 11,14,17,20 0.502800 0.811308
91 JMD_N_TMD_N-Pat...,11)-FUKS010109 Composition AA composition Proteins of thermophiles (INT) Entire chain co...ishikawa, 2001) 0.178000 0.109835 0.109835 0.133383 0.179887 0.000587 0.000587 10,13,17 nan nan
92 JMD_N_TMD_N-Pat...,14)-BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988) 0.130000 0.067433 -0.067433 0.133237 0.146065 0.000642 0.002130 7,11,14 0.306800 0.574245
93 TMD_C_JMD_C-Pat...3,7)-MAXF760105 Conformation α-helix (left-handed) α-helix (left-handed) Normalized freq...Scheraga, 1976) 0.129000 0.071374 0.071374 0.180851 0.152571 0.000727 0.002285 23,27 0.000000 0.000000
94 TMD_C_JMD_C-Pat...,11)-TANS770102 Conformation α-helix (C-term, out) α-helix (C-terminal, outside) Normalized freq...Scheraga, 1977) 0.128000 0.062708 -0.062708 0.113629 0.123346 0.000767 0.002362 25,28,31 0.407200 0.686822
95 JMD_N_TMD_N-Seg...2,3)-CHAM830105 Shape Side chain length n atoms in side chain (3+1) The number of a...-Charton, 1983) 0.128000 0.057140 -0.057140 0.128493 0.130946 0.000672 0.002187 7,8,9,10,11,12,13 0.121600 0.273037
96 JMD_N_TMD_N-Seg...3,9)-ISOY800102 Conformation β-strand Extended Normalized rela...i et al., 1980) 0.126000 0.079975 -0.079975 0.169167 0.182954 0.000926 0.002636 5,6 1.002000 1.075427

return_details=True additionally returns a long-form table of every candidate scale considered for each feature, with its interpretability grade, correlation with the original scale, recomputed std_test, and whether it was accepted:

    df_simple2, df_candidates = cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=2, return_details=True)
    aa.display_df(df_candidates, show_shape=True)
DataFrame shape: (78, 9)

    feature                            candidate_scale  interpretability_orig  interpretability_cand        cor  std_test  accepted  cv_score  reason
 1  TMD_C_JMD_C-Seg...6,9)-TANS770106  RADA880101       8.000000               1.000000               -0.742074  0.208957  False     nan       max_std_test
 2  TMD_C_JMD_C-Seg...6,9)-TANS770106  KOEH090103       8.000000               1.000000                0.737712  0.196374  True      0.907051  accepted
 3  TMD-Pattern(C,3...,15)-ANDN920101  FUKS010109       8.000000               1.000000               -0.728856  0.133076  True      0.907051  accepted
 4  TMD_C_JMD_C-Pat...,12)-ANDN920101  FUKS010109       8.000000               1.000000               -0.728856  0.172594  True      0.907051  accepted
 5  TMD_C_JMD_C-Seg...4,5)-TANS770106  RADA880101       8.000000               1.000000               -0.742074  0.162400  True      0.907051  accepted
 6  TMD-Pattern(C,5...,15)-OOBM770105  KARS160117       8.000000               1.000000               -0.959796  0.082837  True      0.907051  accepted
 7  TMD-Pattern(C,4,7)-ANDN920101      FUKS010109       8.000000               1.000000               -0.728856  0.165353  True      0.907051  accepted
 8  TMD_C_JMD_C-Seg...3,4)-MONM990101  KOEH090102       8.000000               1.000000                0.922232  0.134902  True      0.907051  accepted
 9  TMD_C_JMD_C-Pat...3,7)-OOBM770105  KARS160117       8.000000               1.000000               -0.959796  0.117714  True      0.907051  accepted
10  TMD_C_JMD_C-Seg...2,5)-ANDN920101  FUKS010109       8.000000               1.000000               -0.728856  0.107804  True      0.907051  accepted
11  JMD_N_TMD_N-Seg...3,6)-VASM830102  PALJ810111       8.000000               1.000000               -0.754824  0.129405  True      0.907051  accepted
12  TMD_C_JMD_C-Pat...,15)-MONM990101  KOEH090102       8.000000               1.000000                0.922232  0.103456  True      0.907051  accepted
13  TMD_C_JMD_C-Pat...,10)-TANS770106  RADA880101       8.000000               1.000000               -0.742074  0.178993  True      0.907051  accepted
14  JMD_N_TMD_N-Pat...,11)-ANDN920101  FUKS010109       8.000000               1.000000               -0.728856  0.133383  True      0.907051  accepted
15  JMD_N_TMD_N-Pat...,11)-VASM830102  PALJ810111       8.000000               1.000000               -0.754824  0.125346  True      0.907051  accepted
16  TMD_C_JMD_C-Seg...3,4)-PRAM820102  LINS030101       7.000000               1.000000               -0.756652  0.121842  True      0.907051  accepted
17  TMD_C_JMD_C-Pat...,12)-FAUJ880108  BULH740102       7.000000               4.000000               -0.740983  0.130978  True      0.907051  accepted
18  TMD_C_JMD_C-Seg...2,3)-PRAM820102  LINS030101       7.000000               1.000000               -0.756652  0.098523  True      0.907051  accepted
19  TMD-Pattern(C,5...,12)-PRAM820102  LINS030101       7.000000               1.000000               -0.756652  0.110439  True      0.907051  accepted
20  JMD_N_TMD_N-Seg...,15)-FAUJ880101  RACS820111       7.000000               1.000000                0.796898  0.194309  True      0.907051  accepted
21  JMD_N_TMD_N-Seg...3,8)-FAUJ880101  RACS820111       7.000000               1.000000                0.796898  0.163724  True      0.907051  accepted
22  JMD_N_TMD_N-Seg...2,4)-MEIH800101  MIYS850101       7.000000               1.000000               -0.956832  0.096040  True      0.907051  accepted
23  JMD_N_TMD_N-Seg...2,6)-MEIH800101  MIYS850101       7.000000               1.000000               -0.956832  0.117602  True      0.907051  accepted
24  TMD_C_JMD_C-Pat...4,8)-MEIH800101  MIYS850101       7.000000               1.000000               -0.956832  0.153526  True      0.907051  accepted
25  TMD_C_JMD_C-Seg...4,8)-MEIH800101  MIYS850101       7.000000               1.000000               -0.956832  0.127012  True      0.907051  accepted
26  JMD_N_TMD_N-Per...4,3)-QIAN880138  QIAN880112       6.000000               5.000000               -0.774206  0.089768  True      0.907051  accepted
27  JMD_N_TMD_N-Per...4,3)-QIAN880138  QIAN880112       6.000000               5.000000               -0.774206  0.090360  True      0.907051  accepted
28  JMD_N_TMD_N-Pat...,11)-QIAN880127  OOBM850105       6.000000               1.000000               -0.812942  0.118719  True      0.907051  accepted
29  TMD_C_JMD_C-Pat...,10)-QIAN880138  QIAN880112       6.000000               5.000000               -0.774206  0.101748  True      0.907051  accepted
30  TMD_C_JMD_C-Seg...2,3)-CHOP780212  PALJ810106       5.000000               1.000000                0.800921  0.109387  True      0.907051  accepted
31  TMD_C_JMD_C-Pat...,12)-FINA770101  AURR980113       5.000000               1.000000                0.848669  0.103029  True      0.907051  accepted
32  TMD_C_JMD_C-Pat...,12)-CHOP780212  PALJ810106       5.000000               1.000000                0.800921  0.156745  True      0.907051  accepted
33  TMD-Pattern(N,1...4,7)-FINA770101  AURR980113       5.000000               1.000000                0.848669  0.151936  True      0.907051  accepted
34  JMD_N_TMD_N-Pat...,12)-FINA770101  AURR980113       5.000000               1.000000                0.848669  0.151448  True      0.907051  accepted
35  TMD-Pattern(C,5...,12)-MAXF760105  FAUJ880107       5.000000               1.000000               -0.771112  0.173044  True      0.907051  accepted
36  TMD-Pattern(N,2...,11)-RACS820101  CIDH920104       5.000000               1.000000               -0.716346  0.128234  True      0.907051  accepted
37  TMD_C_JMD_C-Pat...,13)-CHOP780212  PALJ810106       5.000000               1.000000                0.800921  0.158145  True      0.907051  accepted
38  JMD_N_TMD_N-Pat...,11)-RACS820101  CIDH920104       5.000000               1.000000               -0.716346  0.126433  True      0.907051  accepted
39  TMD-Pattern(C,4...,11)-FINA770101  AURR980113       5.000000               1.000000                0.848669  0.127339  True      0.907051  accepted
40  TMD_C_JMD_C-Pat...,12)-QIAN880114  CIDH920103       5.000000               1.000000               -0.763299  0.124914  True      0.907051  accepted
41  TMD_C_JMD_C-Seg...6,9)-PALJ810113  PALJ810105       5.000000               1.000000                0.732768  0.089871  True      0.907051  accepted
42  JMD_N_TMD_N-Seg...2,4)-QIAN880114  CIDH920103       5.000000               1.000000               -0.763299  0.095751  True      0.907051  accepted
43  TMD_C_JMD_C-Pat...3,7)-MAXF760105  FAUJ880107       5.000000               1.000000               -0.771112  0.211477  False     nan       max_std_test
44  JMD_N_TMD_N-Seg...3,9)-KOEP990102  GRAR740102       5.000000               1.000000                0.773220  0.156564  True      0.907051  accepted
45  TMD_C_JMD_C-Seg...3,4)-HUTJ700102  LINS030101       4.000000               1.000000                0.869506  0.121842  True      0.907051  accepted
46  TMD_C_JMD_C-Seg...3,4)-JANJ790102  KOEH090106       4.000000               1.000000               -1.000000  0.159718  True      0.907051  accepted
47  TMD_C_JMD_C-Seg...6,9)-DESM900102  OOBM770101       4.000000               1.000000               -0.949505  0.202111  False     nan       max_std_test
48  TMD_C_JMD_C-Seg...6,9)-DESM900102  LINS030107       4.000000               1.000000               -0.948654  0.193808  True      0.907051  accepted
49  TMD_C_JMD_C-Seg...6,9)-RICJ880113  WILM950101       4.000000               1.000000               -0.708204  0.109849  True      0.907051  accepted
50  TMD_C_JMD_C-Pat...5,8)-RADA880104  EISD840101       4.000000               1.000000                0.908505  0.052593  True      0.907051  accepted
51  TMD-Pattern(C,4,7)-RADA880104      EISD840101       4.000000               1.000000                0.908505  0.052593  True      0.907051  accepted
52  TMD_C_JMD_C-Pat...4,8)-JANJ790102  KOEH090106       4.000000               1.000000               -1.000000  0.181777  True      0.907051  accepted
53  JMD_N_TMD_N-Pat...,10)-AURR980116  QIAN880110       4.000000               1.000000                0.753792  0.153325  True      0.907051  accepted
54  TMD_C_JMD_C-Seg...4,5)-RICJ880113  WILM950101       4.000000               1.000000               -0.708204  0.102916  True      0.907051  accepted
55  TMD-Pattern(N,4,7)-AURR980116      QIAN880110       4.000000               1.000000                0.753792  0.167930  True      0.914744  accepted
56  TMD_C_JMD_C-Seg...4,5)-YUTK870103  EISD860102       4.000000               3.000000               -0.838651  0.180052  True      0.914744  accepted
57  TMD_C_JMD_C-Pat...,15)-YUTK870101  GUYH850105       4.000000               1.000000               -0.840600  0.126437  True      0.914744  accepted
58  TMD_C_JMD_C-Seg...2,3)-HUTJ700102  LINS030101       4.000000               1.000000                0.869506  0.098523  True      0.914744  accepted
59  TMD_C_JMD_C-Seg...2,3)-JANJ790102  KOEH090106       4.000000               1.000000               -1.000000  0.095746  True      0.914744  accepted
60  TMD_C_JMD_C-Seg...4,5)-DESM900102  OOBM770101       4.000000               1.000000               -0.949505  0.180330  True      0.914744  accepted
61  TMD_C_JMD_C-Seg...2,2)-RICJ880113  WILM950101       4.000000               1.000000               -0.708204  0.067646  True      0.914744  accepted
62  TMD-Pattern(C,5...,12)-HUTJ700102  LINS030101       4.000000               1.000000                0.869506  0.110439  True      0.914744  accepted
63  TMD-PeriodicPat...3,1)-COHE430101  OOBM850101       4.000000               1.000000                0.759615  0.052251  True      0.914744  accepted
64  JMD_N_TMD_N-Pat...,13)-RICJ880107  ROSG850101       4.000000               1.000000                0.796390  0.101561  True      0.914744  accepted
65  TMD_C_JMD_C-Pat...4,8)-CORJ870107  CORJ870108       4.000000               1.000000               -0.996278  0.167914  True      0.914744  accepted
66  TMD-Pattern(C,4...,11)-RICJ880107  ROSG850101       4.000000               1.000000                0.796390  0.124610  False     0.907051  cv_drop
67  TMD-Pattern(C,4...,11)-RICJ880107  CHOP780203       4.000000               1.000000               -0.792356  0.138664  True      0.922436  accepted
68  TMD_C_JMD_C-Seg...,11)-COHE430101  OOBM850101       4.000000               1.000000                0.759615  0.084763  False     0.914744  cv_drop
69  TMD_C_JMD_C-Seg...,11)-COHE430101  PALJ810106       4.000000               1.000000               -0.717945  0.167896  False     0.914744  cv_drop
70  TMD_C_JMD_C-Seg...,11)-COHE430101  LIFS790102       4.000000               1.000000                0.713730  0.183876  True      0.922436  accepted
71  JMD_N_TMD_N-Seg...,10)-RICJ880111  BHAR880101       4.000000               1.000000               -0.812868  0.178100  True      0.922436  accepted
72  TMD_C_JMD_C-Pat...,14)-HUTJ700102  LINS030101       4.000000               1.000000                0.869506  0.165586  True      0.922436  accepted
73  JMD_N_TMD_N-Pat...,11)-HUTJ700102  LINS030101       4.000000               1.000000                0.869506  0.137245  True      0.922436  accepted
74  TMD_C_JMD_C-Seg...2,3)-YUTK870103  EISD860102       4.000000               3.000000               -0.838651  0.108394  True      0.922436  accepted
75  JMD_N_TMD_N-Pat...,12)-RICJ880111  BHAR880101       4.000000               1.000000               -0.812868  0.164544  True      0.922436  accepted
76  TMD_C_JMD_C-Seg...4,5)-FAUJ880109  GUYH850105       3.000000               1.000000                0.926858  0.157550  True      0.922436  accepted
77  TMD_C_JMD_C-Seg...4,6)-FAUJ880109  GUYH850105       3.000000               1.000000                0.926858  0.162605  True      0.922436  accepted
78  TMD_C_JMD_C-Seg...2,2)-FAUJ880109  GUYH850105       3.000000               1.000000                0.926858  0.086797  True      0.922436  accepted

CPPPlot().feature_map, the signature CPP visualization, makes the simplification visible: it lays the per-feature mean differences out along the sequence, groups the rows by subcategory, and shows feature importance as the bar track. With 150 features the map is unreadable, so we show the top 40 features by importance on a tall canvas. The original feature set already carries feat_importance:

    import matplotlib.pyplot as plt
    cpp_plot = aa.CPPPlot()
    aa.plot_settings(weight_bold=False)
    df_feat_top = df_feat.sort_values("feat_importance", ascending=False).head(40)
    cpp_plot.feature_map(df_feat=df_feat_top, figsize=(8, 14))
    plt.show()
And now the simplified set. Swapped features carry no importance, so we re-attach it with TreeModel. The result speaks in fewer, more interpretable subcategories, with the original (most interpretable) features protected:

    df_scales_all = aa.load_scales()
    X = sf.feature_matrix(features=list(df_simple["feature"]), df_parts=df_parts, df_scales=df_scales_all)
    df_simple_imp = aa.TreeModel().fit(X, labels=labels).add_feat_importance(df_feat=df_simple, drop=True, sort=True)
    df_simple_top = df_simple_imp.head(40)
    cpp_plot.feature_map(df_feat=df_simple_top, figsize=(8, 14))
    plt.show()
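The candidate table returned with return_details=True can be tallied with plain pandas. The following is a minimal sketch, assuming only the column names visible in the printed table (feature, candidate_scale, accepted, reason); the four rows are copied from that output and serve as a toy stand-in for the full df_candidates:

```python
import pandas as pd

# Toy stand-in for df_candidates (assumption: same column names as the
# printed table; rows 1, 2, 66 and 67 of that output, reduced to four columns).
df_candidates = pd.DataFrame({
    "feature": [
        "TMD_C_JMD_C-Seg...6,9)-TANS770106",
        "TMD_C_JMD_C-Seg...6,9)-TANS770106",
        "TMD-Pattern(C,4...,11)-RICJ880107",
        "TMD-Pattern(C,4...,11)-RICJ880107",
    ],
    "candidate_scale": ["RADA880101", "KOEH090103", "ROSG850101", "CHOP780203"],
    "accepted": [False, True, False, True],
    "reason": ["max_std_test", "accepted", "cv_drop", "accepted"],
})

# How often each outcome occurred
reason_counts = df_candidates["reason"].value_counts()

# How many candidates were tried per feature before one was accepted
n_tried = df_candidates.groupby("feature")["candidate_scale"].count()

# The scale that finally replaced the original one, per feature
accepted_scale = (
    df_candidates[df_candidates["accepted"]]
    .set_index("feature")["candidate_scale"]
)

print(reason_counts)
print(n_tried)
print(accepted_scale)
```

Applied to the full 78-row table printed earlier, the same tally gives 72 accepted candidates, 3 rejected by max_std_test, and 3 rejected as cv_drop.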
Finally, an overview of how the subcategory vocabulary shifts as max_interpret_grade is tightened from 10 (keep everything) down to 1 (only the best, grade-1 tier). Each grade level is one colored series (aa.plot_get_clist(n_colors=10)); as the grade tightens, features in worse-graded subcategories are replaced and migrate into the most interpretable ones:

    import matplotlib.pyplot as plt
    import pandas as pd
    levels = list(range(1, 11))
    colors = aa.plot_get_clist(n_colors=10)
    counts = {g: cpp.simplify(df_feat=df_feat, labels=labels, max_interpret_grade=g)["subcategory"].value_counts()
              for g in levels}
    df_levels = pd.DataFrame(counts).fillna(0).astype(int)
    print(df_levels)
    top_subcats = df_levels[10].sort_values(ascending=False).head(8).index
    df_plot = df_levels.loc[top_subcats]
    aa.plot_settings(weight_bold=False)
    ax = df_plot.plot(kind="barh", figsize=(7, 6), color=colors, width=0.85, legend=False)
    ax.invert_yaxis()
    ax.set_xlabel("number of features")
    ax.set_ylabel("subcategory")
    ax.legend(title="max_interpret_grade", labels=[str(g) for g in levels], loc="upper center",
              bbox_to_anchor=(0.5, -0.12), ncol=5, frameon=False)
    plt.tight_layout()
    plt.show()
                                 1  2  3  4  5  6  7  8  9  10
subcategory
AA composition                   5  5  5  5  5  5  5  0  0   0
Accessible surface area (ASA)    6  6  6  6  6  6  6  6  6   6
Amphiphilicity                   0  3  3  3  3  3  3  3  3   3
Amphiphilicity (α-helix)         0  0  0  3  3  3  3  3  3   3
Backbone-dynamics (-CH)          2  2  2  2  2  2  2  7  7   7
Buried                           5  5  5  5  5  5  5  5  5   5
Charge                           3  3  3  3  3  3  3  3  3   3
Coil                             5  5  5  5  5  5  5  5  5   5
Coil (C-term)                    0  0  0  0  0  3  3  3  3   3
Coil (N-term)                    0  0  0  0  0  1  1  1  1   1
Electron-ion interaction pot.    3  3  3  3  3  3  4  4  4   4
Entropy                          0  0  0  5  5  5  5  5  5   5
Flexibility                      1  1  1  1  1  1  1  1  1   1
Free energy (unfolding)          0  0  0  8  8  8  8  8  8   8
Hydrophilicity                   4  4  4  2  1  1  1  0  0   0
Hydrophobicity                   7  7  7  6  6  6  6  6  6   6
Hydrophobicity (interface)       4  4  4  4  4  4  4  4  4   4
Isoelectric point                0  0  3  3  3  3  3  3  3   3
Non-bonded energy                0  0  0  0  0  0  0  4  4   4
Partial specific volume          1  1  1  2  2  2  2  2  2   2
Reduced distance                 0  0  0  0  0  0  4  4  4   4
Shape and Surface                1  1  1  1  1  1  4  4  4   4
Side chain length                8  7  7  7  7  7  7  7  7   7
Stability                        6  6  6  5  4  4  4  4  4   4
Stability (helix-coil)           0  0  0  0  4  4  4  4  4   4
Steric parameter                 0  0  0  0  0  0  2  2  2   2
Volume                           7  7  7  6  6  6  5  5  5   5
α-helix                          5  5  5  4  4  4  4  4  4   4
α-helix (C-cap)                  3  3  3  8  8  8  8  8  8   8
α-helix (C-term, out)            3  3  3  3  3  3  3  3  3   3
α-helix (left-handed)            1  1  1  1  3  3  3  3  3   3
β-sheet                          2  2  2  2  2  2  2  1  1   1
β-sheet (C-term)                 1  1  1  1  4  4  4  4  4   4
β-sheet (N-term)                 0  0  0  0  5  5  5  5  5   5
β-strand                         9  9  9  8  8  8  8  8  8   8
β-turn (TM helix)                0  0  0  0  0  0  0  5  5   5
β/α-bridge                       1  1  1  1  1  1  1  1  1   1
π-helix                          1  1  1  5  5  5  5  5  5   5
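
The printed counts can be condensed into one number per grade level: how many subcategories are still in use. The following is a minimal sketch, assuming df_levels has the layout printed above (subcategories as rows, max_interpret_grade values as columns); five rows are copied from that output and serve as a toy stand-in for the full table:

```python
import pandas as pd

# Toy stand-in for df_levels (assumption: five rows copied from the printed
# table; columns are the max_interpret_grade values 1..10).
rows = {
    "AA composition":    [5, 5, 5, 5, 5, 5, 5, 0, 0, 0],
    "Amphiphilicity":    [0, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    "Entropy":           [0, 0, 0, 5, 5, 5, 5, 5, 5, 5],
    "Non-bonded energy": [0, 0, 0, 0, 0, 0, 0, 4, 4, 4],
    "β-strand":          [9, 9, 9, 8, 8, 8, 8, 8, 8, 8],
}
df_levels = pd.DataFrame.from_dict(rows, orient="index", columns=list(range(1, 11)))

# Vocabulary size: number of subcategories holding at least one feature
n_subcats = (df_levels > 0).sum(axis=0)

# Subcategories present at grade 10 that are emptied once the grade is set to 1
vanished = df_levels.index[(df_levels[10] > 0) & (df_levels[1] == 0)]

print(n_subcats)
print(list(vanished))
```

Running the same two expressions on the full df_levels shows at a glance which subcategories are emptied and which absorb the migrated features.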