aaanalysis.CPP.run_num
- CPP.run_num(dict_num_parts=None, labels=None, label_test=1, label_ref=0, n_filter=100, n_pre_filter=None, pct_pre_filter=5, max_std_test=0.2, max_overlap=0.5, max_cor=0.5, check_cat=True, parametric=False, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10, n_jobs=None, vectorized=True, n_batches=None, return_stats=False)
Numerical-mode CPP: same algorithm as run(), but per-residue values come from a pre-sliced numerical tensor (dict_num_parts) instead of an AA→scale lookup. Use it for PLM embeddings, DSSP one-hots, PTM dummies, or any other per-residue numerical representation.

The pipeline (pre-filter stats, pre-filter, recompute, add_stat, redundancy filter) and the output schema are the same as for run(). The constructor-bound df_scales / df_cat provide the dimension names and categories for the D axis of dict_num_parts; the per-AA values they would normally provide are unused, because dict_num_parts is the value source.

Added in version 1.1.0.
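To make the change of value source concrete, here is a minimal NumPy sketch. It is not the library's implementation: it assumes that a Segment-style feature is the NaN-aware mean of one D dimension over a positional slice of a part tensor, and that mean_dif is test minus reference. The helper name segment_feature, the toy tensor, and the chosen slice are hypothetical.

```python
import numpy as np

def segment_feature(X_part, d, start, stop):
    """Mean of dimension d over positions [start, stop), ignoring NaN padding.

    X_part: NaN-padded array of shape (n_samples, L_part_max, D).
    Returns an array of shape (n_samples,).
    """
    return np.nanmean(X_part[:, start:stop, d], axis=1)

rng = np.random.default_rng(0)
n_samples, L_part_max, D = 6, 8, 3
X_part = rng.random((n_samples, L_part_max, D))  # toy values, already in [0, 1]
X_part[0, 6:, :] = np.nan                        # sample 0 is shorter, so it is padded
labels = np.array([1, 1, 1, 0, 0, 0])            # 1 = test group, 0 = reference group

# One feature value per sample, read from the tensor (no AA-to-scale lookup)
values = segment_feature(X_part, d=1, start=4, stop=8)

# Pre-filter style statistics on that feature (assumed definitions)
mean_dif = values[labels == 1].mean() - values[labels == 0].mean()
std_test = values[labels == 1].std()
passes_std = std_test <= 0.2  # compare against max_std_test
```

Because padded positions are NaN, shorter parts contribute only their real residues to the mean.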
- Parameters:
dict_num_parts (dict[str, np.ndarray], required) – Per-part NaN-padded numerical tensors, produced by NumericalFeature.get_parts(). Each value has shape (n_samples, L_part_max, D), aligned row-for-row with self.df_parts. Keys must match self.df_parts.columns. D must equal len(self.df_scales.columns) (each D dimension names a “scale”).
labels (Union[Sequence[Union[int, float]], ndarray, Series]) – See run(). Same semantics, same defaults.
label_test (int) – See run(). Same semantics, same defaults.
n_pre_filter (Optional[int]) – See run(). Same semantics, same defaults.
pct_pre_filter (int) – See run(). Same semantics, same defaults.
max_std_test (float) – See run(). Same semantics, same defaults.
max_overlap (float) – See run(). Same semantics, same defaults.
check_cat (bool) – See run(). Same semantics, same defaults.
parametric (bool) – See run(). Same semantics, same defaults.
n_jobs (Optional[int]) – See run(). Same semantics, same defaults.
vectorized (bool) – See run(). Same semantics, same defaults.
n_batches (Optional[int]) – See run(). Same semantics, same defaults.
- Returns:
df_feat – Same schema as run().
- Return type:
pd.DataFrame, shape (n_features, n_feature_info)
- Raises:
ValueError – If dict_num_parts is None (use run() for seq-mode), or if its shape / part names / D don’t align with the constructor’s self.df_parts and self.df_scales.
NotImplementedError – If n_batches is supplied (batched orchestration over the D axis is not yet implemented for numerical mode; pass n_batches=None).
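The alignment conditions listed under ValueError can be checked before calling run_num. The following is a sketch of such a check using only NumPy and pandas. The function name check_dict_num_parts and the error messages are hypothetical; the library's own validation may differ in order and wording.

```python
import numpy as np
import pandas as pd

def check_dict_num_parts(dict_num_parts, df_parts, df_scales):
    """Raise ValueError if the tensors do not align with df_parts / df_scales."""
    if dict_num_parts is None:
        raise ValueError("dict_num_parts is None; use run() for sequence mode")
    if set(dict_num_parts) != set(df_parts.columns):
        raise ValueError("keys of dict_num_parts must match df_parts.columns")
    n_samples, n_dims = len(df_parts), len(df_scales.columns)
    for part, X in dict_num_parts.items():
        if X.ndim != 3:
            raise ValueError(f"{part}: expected a 3D array, got ndim={X.ndim}")
        if X.shape[0] != n_samples:
            raise ValueError(f"{part}: {X.shape[0]} rows, expected {n_samples}")
        if X.shape[2] != n_dims:
            raise ValueError(f"{part}: D={X.shape[2]}, expected {n_dims}")

# Toy inputs: two samples, one part, two named dimensions
df_parts = pd.DataFrame({"tmd": ["AAAA", "CCC"]}, index=["P1", "P2"])
df_scales = pd.DataFrame(np.zeros((20, 2)), columns=["dim_0", "dim_1"])

good = {"tmd": np.full((2, 4, 2), 0.5)}
check_dict_num_parts(good, df_parts, df_scales)  # aligned, so no exception

bad = {"tmd": np.full((2, 4, 3), 0.5)}           # D=3 but only two dimension names
try:
    check_dict_num_parts(bad, df_parts, df_scales)
    message = None
except ValueError as err:
    message = str(err)
```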
Notes
Raw PLM embeddings are not directly usable; normalize them first. Per-residue values are expected in [0, 1] (the StructurePreprocessor / AnnotationPreprocessor normalization convention), since the default max_std_test=0.2 pre-filter is calibrated for that range. Raw embeddings (unbounded floats) must be passed through EmbeddingPreprocessor.encode() to obtain a [0, 1]-normalized {entry: (L, D)} dict_num before NumericalFeature.get_parts(). (EmbeddingPreprocessor.build_scales / build_cat serve the other, AA-scale path via run(); they are not a per-residue value source here.)

Three arms, one entry point. Structure-only (dict_num from StructurePreprocessor), embedding (EmbeddingPreprocessor.encode), and fused (concatenate sources with aaanalysis.combine_dict_nums() first) all flow through get_parts → run_num; only the dict_num differs.
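The two ideas in these notes, scaling unbounded values into [0, 1] and fusing sources along the D axis, can be illustrated with plain NumPy. This is a sketch under stated assumptions, not what EmbeddingPreprocessor.encode() or aaanalysis.combine_dict_nums() actually do: it uses a global per-dimension min-max as one possible normalization, and the helpers min_max_normalize and concat_dict_nums are hypothetical names.

```python
import numpy as np

def min_max_normalize(dict_num):
    """Scale each D dimension to [0, 1] using min/max over all residues."""
    stacked = np.concatenate(list(dict_num.values()), axis=0)  # (sum of L, D)
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant dimensions: avoid 0/0
    return {entry: (X - lo) / span for entry, X in dict_num.items()}

def concat_dict_nums(*dict_nums):
    """Concatenate several {entry: (L, D_i)} dicts along the D axis."""
    entries = list(dict_nums[0])
    return {e: np.concatenate([d[e] for d in dict_nums], axis=1) for e in entries}

rng = np.random.default_rng(1)
# Toy "raw embeddings": unbounded floats, two proteins of length 5 and 7
raw_emb = {"P1": rng.normal(size=(5, 4)), "P2": rng.normal(size=(7, 4))}
# Toy structure features, assumed to be in [0, 1] already
structure = {"P1": rng.random((5, 2)), "P2": rng.random((7, 2))}

emb01 = min_max_normalize(raw_emb)
fused = concat_dict_nums(structure, emb01)  # per entry: shape (L, 2 + 4)
```

Whichever normalization is used, the point is that every dimension ends up on the scale that max_std_test=0.2 was calibrated for.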
See also
run(): sequence-mode equivalent (no dict_num_parts).
NumericalFeature.get_parts(): produces (df_parts, dict_num_parts) from raw df_seq + dict_num.
EmbeddingPreprocessor, StructurePreprocessor, AnnotationPreprocessor: the per-residue dict_num sources (PLM embeddings / structure / annotations), combinable via aaanalysis.combine_dict_nums().
Examples
To demonstrate the CPP().run() method, we load the DOM_GSEC example dataset (see [Breimann25a]):

    import aaanalysis as aa
    aa.options["verbose"] = False
    df_seq = aa.load_dataset(name="DOM_GSEC")
    labels = df_seq["label"].to_list()
    sf = aa.SequenceFeature()
    df_parts = sf.get_df_parts(df_seq=df_seq)

You just need to provide df_parts to the CPP object and run the algorithm with its respective labels using the CPP().run() method:

    cpp = aa.CPP(df_parts=df_parts)
    # Create >500,000 features and filter them down to 100 features
    df_feat = cpp.run(labels=labels)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (100, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_C_JMD_C-Seg...2,3)-QIAN880106 | Conformation | α-helix | α-helix (middle) | Weights for alp...ejnowski, 1988) | 0.387000 | 0.118000 | 0.118000 | 0.068000 | 0.080000 | 0.000000 | 0.000000 | 27,28,29,30,31,32,33 |
| 2 | TMD_C_JMD_C-Pat...,14)-CRAJ730103 | Conformation | β-turn | β-turn | Normalized freq...d et al., 1973) | 0.377000 | 0.285000 | -0.285000 | 0.164000 | 0.177000 | 0.000000 | 0.000000 | 27,31 |
| 3 | TMD_C_JMD_C-Seg...6,9)-FAUJ880104 | Shape | Side chain length | Steric parameter | STERIMOL length...e et al., 1988) | 0.367000 | 0.263000 | 0.263000 | 0.161000 | 0.168000 | 0.000000 | 0.000000 | 32,33 |
| 4 | TMD_C_JMD_C-Seg...6,9)-ONEK900101 | Others | Unclassified (Others) | ΔG values in peptides | Delta G values ...-DeGrado, 1990) | 0.366000 | 0.111000 | 0.111000 | 0.070000 | 0.114000 | 0.000000 | 0.000000 | 32,33 |
| 5 | TMD_C_JMD_C-Pat...,15)-QIAN880107 | Conformation | α-helix | α-helix (middle) | Weights for alp...ejnowski, 1988) | 0.363000 | 0.162000 | 0.162000 | 0.091000 | 0.118000 | 0.000000 | 0.000000 | 24,28,32,35 |
| 6 | TMD_C_JMD_C-Seg...3,4)-HUTJ700103 | Energy | Entropy | Entropy | Entropy of form...Hutchens, 1970) | 0.360000 | 0.187000 | 0.187000 | 0.115000 | 0.128000 | 0.000000 | 0.000000 | 31,32,33,34,35 |
| 7 | TMD_C_JMD_C-Seg...2,3)-WOLS870103 | Others | PC 4 | Principal Component 3 (Wold) | Principal prope...d et al., 1987) | 0.359000 | 0.159000 | -0.159000 | 0.090000 | 0.130000 | 0.000000 | 0.000000 | 27,28,29,30,31,32,33 |
| 8 | TMD_C_JMD_C-Pat...,12)-CRAJ730103 | Conformation | β-turn | β-turn | Normalized freq...d et al., 1973) | 0.352000 | 0.227000 | -0.227000 | 0.150000 | 0.170000 | 0.000000 | 0.000000 | 24,28,32 |
| 9 | TMD_C_JMD_C-Seg...6,9)-MUNV940102 | Energy | Free energy (folding) | Free energy (α-helix) | Free energy in ...-Serrano, 1994) | 0.350000 | 0.129000 | -0.129000 | 0.079000 | 0.124000 | 0.000000 | 0.000000 | 32,33 |
| 10 | TMD_C_JMD_C-Seg...3,4)-WOLS870103 | Others | PC 4 | Principal Component 3 (Wold) | Principal prope...d et al., 1987) | 0.341000 | 0.214000 | -0.214000 | 0.128000 | 0.177000 | 0.000000 | 0.000000 | 31,32,33,34,35 |

Adjust Parts, Splits, and Scales as follows:

    df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=["tmd_jmd"])
    split_kws = sf.get_split_kws(split_types=["Segment"], n_split_min=1, n_split_max=5)
    # Load one of the provided top scale datasets
    df_scales = aa.load_scales(top60_n=38)
    # Create ~700 features and filter them down to 19 features
    cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, df_scales=df_scales)
    df_feat = cpp.run(labels=labels)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (19, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 3 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 4 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 5 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 6 | TMD_JMD-Segment...5,5)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.289000 | 0.178000 | 0.178000 | 0.159000 | 0.163000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 7 | TMD_JMD-Segment...4,5)-FUKS010106 | Composition | Membrane proteins (MPs) | Proteins of mesophiles (INT) | Interior compos...ishikawa, 2001) | 0.277000 | 0.123000 | 0.123000 | 0.104000 | 0.127000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 8 | TMD_JMD-Segment...4,4)-WOLR790101 | Polarity | Hydrophobicity (surrounding) | Hydration potential | Hydrophobicity ...n et al., 1979) | 0.267000 | 0.105000 | -0.105000 | 0.100000 | 0.113000 | 0.000000 | 0.000001 | 31,32,33,34,35,36,37,38,39,40 |
| 9 | TMD_JMD-Segment...2,2)-CEDJ970105 | Composition | AA composition | Nuclear proteins | Composition of ...o et al., 1997) | 0.263000 | 0.062000 | 0.062000 | 0.062000 | 0.069000 | 0.000000 | 0.000001 | 21,22,23,24,25,...,36,37,38,39,40 |
| 10 | TMD_JMD-Segment...5,5)-MITS020101 | Polarity | Amphiphilicity | Amphiphilicity | Amphiphilicity ...u et al., 2002) | 0.262000 | 0.073000 | 0.073000 | 0.071000 | 0.086000 | 0.000000 | 0.000001 | 33,34,35,36,37,38,39,40 |

The maximum number of final features can be adjusted using the n_filter (default=100) parameter. The actual number of features may be less, depending on: (a) the initial number of features generated (defined by the part-split-scale combinations), and (b) the strictness of both pre-filtering and filtering criteria.

    # Create ~700 features and filter them down to 10 features
    df_feat = cpp.run(labels=labels, n_filter=10)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (10, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 3 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 4 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 5 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 6 | TMD_JMD-Segment...5,5)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.289000 | 0.178000 | 0.178000 | 0.159000 | 0.163000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 7 | TMD_JMD-Segment...4,5)-FUKS010106 | Composition | Membrane proteins (MPs) | Proteins of mesophiles (INT) | Interior compos...ishikawa, 2001) | 0.277000 | 0.123000 | 0.123000 | 0.104000 | 0.127000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 8 | TMD_JMD-Segment...4,4)-WOLR790101 | Polarity | Hydrophobicity (surrounding) | Hydration potential | Hydrophobicity ...n et al., 1979) | 0.267000 | 0.105000 | -0.105000 | 0.100000 | 0.113000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 9 | TMD_JMD-Segment...5,5)-MIYS990104 | Composition | MPs (anchor) | Partition energy | Optimized relat...Jernigan, 1999) | 0.243000 | 0.103000 | 0.103000 | 0.095000 | 0.126000 | 0.000002 | 0.000004 | 33,34,35,36,37,38,39,40 |
| 10 | TMD_JMD-Segment...4,5)-ANDN920101 | Structure-Activity | Backbone-dynamics (-CH) | α-CH chemical s...kbone-dynamics) | alpha-CH chemic...n et al., 1992) | 0.229000 | 0.102000 | -0.102000 | 0.097000 | 0.125000 | 0.000009 | 0.000012 | 25,26,27,28,29,30,31,32 |

In the initial CPP pre-filtering step, you can either set the number of retained features using n_pre_filter or define a percentage of initial features with pct_pre_filter (default: 5%). Additionally, adjust the maximum standard deviation allowed in the test dataset for each feature via max_std_test:

    # Pre-filtering by allowing 50% with 0.5 maximum std in the test set
    # Create ~700 features and filter them down to 26 features
    df_feat = cpp.run(labels=labels, pct_pre_filter=50, max_std_test=0.5)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (26, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 3 | TMD_JMD-Segment...2,2)-ONEK900101 | Others | Unclassified (Others) | ΔG values in peptides | Delta G values ...-DeGrado, 1990) | 0.310000 | 0.041000 | 0.041000 | 0.028000 | 0.044000 | 0.000000 | 0.000000 | 21,22,23,24,25,...,36,37,38,39,40 |
| 4 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 5 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000001 | 33,34,35,36,37,38,39,40 |
| 6 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000001 | 33,34,35,36,37,38,39,40 |
| 7 | TMD_JMD-Segment...5,5)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.289000 | 0.178000 | 0.178000 | 0.159000 | 0.163000 | 0.000000 | 0.000001 | 33,34,35,36,37,38,39,40 |
| 8 | TMD_JMD-Segment...4,5)-FUKS010106 | Composition | Membrane proteins (MPs) | Proteins of mesophiles (INT) | Interior compos...ishikawa, 2001) | 0.277000 | 0.123000 | 0.123000 | 0.104000 | 0.127000 | 0.000000 | 0.000001 | 25,26,27,28,29,30,31,32 |
| 9 | TMD_JMD-Segment...3,4)-WOLR790101 | Polarity | Hydrophobicity (surrounding) | Hydration potential | Hydrophobicity ...n et al., 1979) | 0.274000 | 0.052000 | 0.052000 | 0.034000 | 0.060000 | 0.000000 | 0.000001 | 21,22,23,24,25,26,27,28,29,30 |
| 10 | TMD_JMD-Segment...1,2)-WEBA780101 | Others | Mutability | RF value | RF value in hig...er-Lacey, 1978) | 0.268000 | 0.042000 | 0.042000 | 0.039000 | 0.046000 | 0.000000 | 0.000002 | 1,2,3,4,5,6,7,8...,16,17,18,19,20 |

For the final CPP filtering step, you can use the following three parameters: max_overlap sets the allowed maximum positional overlap of similar features (the higher, the less strict), max_cor defines the allowed maximum Pearson correlation for scales of similar features (the higher, the less strict), and check_cat sets whether redundancy of scale categories should be considered or not (setting it to False results in stricter filtering, since features across all categories are compared):

    # Disable filtering by setting max_overlap and max_cor to 1
    # Create ~700 features and filter them down to 100 features
    df_feat = cpp.run(labels=labels, max_overlap=1, max_cor=1)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (100, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 3 | TMD_JMD-Segment...3,3)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.304000 | 0.069000 | 0.069000 | 0.051000 | 0.073000 | 0.000000 | 0.000000 | 27,28,29,30,31,...,36,37,38,39,40 |
| 4 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 5 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 6 | TMD_JMD-Segment...2,2)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.292000 | 0.058000 | 0.058000 | 0.045000 | 0.054000 | 0.000000 | 0.000000 | 21,22,23,24,25,...,36,37,38,39,40 |
| 7 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 8 | TMD_JMD-Segment...4,4)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.291000 | 0.127000 | 0.127000 | 0.097000 | 0.121000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 9 | TMD_JMD-Segment...5,5)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.289000 | 0.178000 | 0.178000 | 0.159000 | 0.163000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 10 | TMD_JMD-Segment...4,4)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.288000 | 0.164000 | 0.164000 | 0.135000 | 0.145000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |

    # Perform stricter filtering by setting check_cat=False
    # Create ~700 features and filter them down to 11 features
    df_feat = cpp.run(labels=labels, check_cat=False)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (11, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 31,32,33,34,35,36,37,38,39,40 |
| 3 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 25,26,27,28,29,30,31,32 |
| 4 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 5 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000000 | 33,34,35,36,37,38,39,40 |
| 6 | TMD_JMD-Segment...2,2)-CEDJ970105 | Composition | AA composition | Nuclear proteins | Composition of ...o et al., 1997) | 0.263000 | 0.062000 | 0.062000 | 0.062000 | 0.069000 | 0.000000 | 0.000001 | 21,22,23,24,25,...,36,37,38,39,40 |
| 7 | TMD_JMD-Segment...5,5)-MITS020101 | Polarity | Amphiphilicity | Amphiphilicity | Amphiphilicity ...u et al., 2002) | 0.262000 | 0.073000 | 0.073000 | 0.071000 | 0.086000 | 0.000000 | 0.000001 | 33,34,35,36,37,38,39,40 |
| 8 | TMD_JMD-Segment...1,2)-SIMZ760101 | Polarity | Hydrophobicity | Transfer free e...TFE) to outside | Transfer free e...-Charton (1982) | 0.259000 | 0.064000 | -0.064000 | 0.069000 | 0.072000 | 0.000001 | 0.000002 | 1,2,3,4,5,6,7,8...,16,17,18,19,20 |
| 9 | TMD_JMD-Segment...4,5)-ANDN920101 | Structure-Activity | Backbone-dynamics (-CH) | α-CH chemical s...kbone-dynamics) | alpha-CH chemic...n et al., 1992) | 0.229000 | 0.102000 | -0.102000 | 0.097000 | 0.125000 | 0.000009 | 0.000017 | 25,26,27,28,29,30,31,32 |
| 10 | TMD_JMD-Segment...4,4)-YUTK870103 | Energy | Free energy (unfolding) | Free energy (unfolding) | Activation Gibb...i et al., 1987) | 0.201000 | 0.084000 | -0.084000 | 0.115000 | 0.118000 | 0.000103 | 0.000143 | 31,32,33,34,35,36,37,38,39,40 |

The residue positions can be adjusted using the start, tmd_len, jmd_n_len, and jmd_c_len parameters:

    # Shift positions by 10 residues
    df_feat = cpp.run(labels=labels, start=11)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (19, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 35,36,37,38,39,40,41,42 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 41,42,43,44,45,46,47,48,49,50 |
| 3 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 35,36,37,38,39,40,41,42 |
| 4 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000000 | 43,44,45,46,47,48,49,50 |
| 5 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000000 | 43,44,45,46,47,48,49,50 |
| 6 | TMD_JMD-Segment...5,5)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.289000 | 0.178000 | 0.178000 | 0.159000 | 0.163000 | 0.000000 | 0.000000 | 43,44,45,46,47,48,49,50 |
| 7 | TMD_JMD-Segment...4,5)-FUKS010106 | Composition | Membrane proteins (MPs) | Proteins of mesophiles (INT) | Interior compos...ishikawa, 2001) | 0.277000 | 0.123000 | 0.123000 | 0.104000 | 0.127000 | 0.000000 | 0.000000 | 35,36,37,38,39,40,41,42 |
| 8 | TMD_JMD-Segment...4,4)-WOLR790101 | Polarity | Hydrophobicity (surrounding) | Hydration potential | Hydrophobicity ...n et al., 1979) | 0.267000 | 0.105000 | -0.105000 | 0.100000 | 0.113000 | 0.000000 | 0.000001 | 41,42,43,44,45,46,47,48,49,50 |
| 9 | TMD_JMD-Segment...2,2)-CEDJ970105 | Composition | AA composition | Nuclear proteins | Composition of ...o et al., 1997) | 0.263000 | 0.062000 | 0.062000 | 0.062000 | 0.069000 | 0.000000 | 0.000001 | 31,32,33,34,35,...,46,47,48,49,50 |
| 10 | TMD_JMD-Segment...5,5)-MITS020101 | Polarity | Amphiphilicity | Amphiphilicity | Amphiphilicity ...u et al., 2002) | 0.262000 | 0.073000 | 0.073000 | 0.071000 | 0.086000 | 0.000000 | 0.000001 | 43,44,45,46,47,48,49,50 |

    # Increase TMD length from 20 to 50
    df_feat = cpp.run(labels=labels, tmd_len=50)
    aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (19, 13)
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TMD_JMD-Segment...4,5)-ROBB760113 | Conformation | β-turn | β-turn | Information mea...n-Suzuki, 1976) | 0.316000 | 0.137000 | -0.137000 | 0.102000 | 0.108000 | 0.000000 | 0.000000 | 43,44,45,46,47,...,52,53,54,55,56 |
| 2 | TMD_JMD-Segment...4,4)-ZIMJ680104 | Energy | Isoelectric point | Isoelectric point | Isoelectric poi...n et al., 1968) | 0.312000 | 0.099000 | 0.099000 | 0.069000 | 0.095000 | 0.000000 | 0.000000 | 53,54,55,56,57,...,66,67,68,69,70 |
| 3 | TMD_JMD-Segment...4,5)-KANM800103 | Conformation | α-helix | α-helix | Average relativ...sa-Tsong, 1980) | 0.297000 | 0.086000 | 0.086000 | 0.077000 | 0.068000 | 0.000000 | 0.000000 | 43,44,45,46,47,...,52,53,54,55,56 |
| 4 | TMD_JMD-Segment...5,5)-LINS030104 | ASA/Volume | Accessible surface area (ASA) | ASA (folded protein) | Total median ac...s et al., 2003) | 0.295000 | 0.141000 | 0.141000 | 0.115000 | 0.130000 | 0.000000 | 0.000000 | 57,58,59,60,61,...,66,67,68,69,70 |
| 5 | TMD_JMD-Segment...5,5)-JANJ780102 | ASA/Volume | Buried | Buried | Percentage of b...n et al., 1978) | 0.291000 | 0.130000 | -0.130000 | 0.099000 | 0.124000 | 0.000000 | 0.000000 | 57,58,59,60,61,...,66,67,68,69,70 |
| 6 | TMD_JMD-Segment...5,5)-ZIMJ680103 | Polarity | Hydrophilicity | Polarity (hydrophilicity) | Polarity (Zimme...n et al., 1968) | 0.289000 | 0.178000 | 0.178000 | 0.159000 | 0.163000 | 0.000000 | 0.000000 | 57,58,59,60,61,...,66,67,68,69,70 |
| 7 | TMD_JMD-Segment...4,5)-FUKS010106 | Composition | Membrane proteins (MPs) | Proteins of mesophiles (INT) | Interior compos...ishikawa, 2001) | 0.277000 | 0.123000 | 0.123000 | 0.104000 | 0.127000 | 0.000000 | 0.000000 | 43,44,45,46,47,...,52,53,54,55,56 |
| 8 | TMD_JMD-Segment...4,4)-WOLR790101 | Polarity | Hydrophobicity (surrounding) | Hydration potential | Hydrophobicity ...n et al., 1979) | 0.267000 | 0.105000 | -0.105000 | 0.100000 | 0.113000 | 0.000000 | 0.000001 | 53,54,55,56,57,...,66,67,68,69,70 |
| 9 | TMD_JMD-Segment...2,2)-CEDJ970105 | Composition | AA composition | Nuclear proteins | Composition of ...o et al., 1997) | 0.263000 | 0.062000 | 0.062000 | 0.062000 | 0.069000 | 0.000000 | 0.000001 | 36,37,38,39,40,...,66,67,68,69,70 |
| 10 | TMD_JMD-Segment...5,5)-MITS020101 | Polarity | Amphiphilicity | Amphiphilicity | Amphiphilicity ...u et al., 2002) | 0.262000 | 0.073000 | 0.073000 | 0.071000 | 0.086000 | 0.000000 | 0.000001 | 57,58,59,60,61,...,66,67,68,69,70 |

Multiprocessing can be enabled by using the n_jobs parameter, which is set to the maximum number of cores if n_jobs=None. However, this is only recommended for more than ~1000 features per core due to potential process management overhead.

    import time

    # Run without multiprocessing
    time_start = time.time()
    df_feat = cpp.run(labels=labels, n_jobs=1)
    time_no_mp = round(time.time() - time_start, 2)
    print(f"Time without multiprocessing: {time_no_mp} seconds")

    # Run with multiprocessing
    time_start = time.time()
    df_feat = cpp.run(labels=labels, n_jobs=None)
    time_mp = round(time.time() - time_start, 2)
    print(f"Time with multiprocessing: {time_mp} seconds")

    Time without multiprocessing: 0.09 seconds
    Time with multiprocessing: 2.55 seconds
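The positions column in the Segment examples above follows a simple pattern when start and tmd_len change. The sketch below reproduces it with integer arithmetic. The formula is inferred from the positions shown in the example tables, not taken from the library source, and the helper name segment_positions is hypothetical.

```python
def segment_positions(i, n, part_start, part_len):
    """1-based residue positions of the i-th of n segments of a part.

    part_start: position of the part's first residue.
    part_len:   number of residues in the part.
    """
    lo = (part_len * (i - 1)) // n  # 0-based offset of the segment start
    hi = (part_len * i) // n        # 0-based offset one past the segment end
    return list(range(part_start + lo, part_start + hi))

# Defaults (start=1, jmd_n_len=10, tmd_len=20, jmd_c_len=10): tmd_jmd spans 40 residues
default_seg = segment_positions(4, 5, part_start=1, part_len=40)    # 25, 26, ..., 32

# start=11 shifts every position by 10
shifted_seg = segment_positions(4, 4, part_start=11, part_len=40)   # 41, 42, ..., 50

# tmd_len=50 stretches the part to 10 + 50 + 10 = 70 residues
long_tmd_seg = segment_positions(5, 5, part_start=1, part_len=70)   # 57, 58, ..., 70
```

Only the reported positions change with start, tmd_len, jmd_n_len, and jmd_c_len; the feature statistics in the shifted tables are identical to those of the unshifted run.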
- Parameters:
return_stats (bool)