aaanalysis.CPP.run_num

CPP.run_num(dict_num_parts=None, labels=None, label_test=1, label_ref=0, n_filter=100, n_pre_filter=None, pct_pre_filter=5, max_std_test=0.2, max_overlap=0.5, max_cor=0.5, check_cat=True, parametric=False, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10, n_jobs=None, vectorized=True, n_batches=None, return_stats=False)

Numerical-mode CPP: same algorithm as run(), but per-residue values come from a pre-sliced numerical tensor (dict_num_parts) instead of an AA→scale lookup. Use for PLM embeddings, DSSP one-hots, PTM dummies, or any per-residue numerical representation.

Same pipeline (pre-filter stats, pre-filter, recompute, add_stat, redundancy filter) and same output schema as run(). The constructor-bound df_scales / df_cat provide the dimension names and categories for the D axis of dict_num_parts. The per-AA values they would normally provide are unused; dict_num_parts is the value source.

Added in version 1.1.0.

Parameters:
  • dict_num_parts (dict[str, np.ndarray], required) – Per-part NaN-padded numerical tensors, produced by NumericalFeature.get_parts(). Each value has shape (n_samples, L_part_max, D) aligned row-for-row with self.df_parts. Keys must match self.df_parts.columns. D must equal len(self.df_scales.columns) (each D dimension names a “scale”).

  • labels (Union[Sequence[Union[int, float]], ndarray, Series]) – See run(). Same semantics, same defaults.

  • label_test (int) – See run(). Same semantics, same defaults.

  • label_ref (int) – See run(). Same semantics, same defaults.

  • n_filter (int) – See run(). Same semantics, same defaults.

  • n_pre_filter (Optional[int]) – See run(). Same semantics, same defaults.

  • pct_pre_filter (int) – See run(). Same semantics, same defaults.

  • max_std_test (float) – See run(). Same semantics, same defaults.

  • max_overlap (float) – See run(). Same semantics, same defaults.

  • max_cor (float) – See run(). Same semantics, same defaults.

  • check_cat (bool) – See run(). Same semantics, same defaults.

  • parametric (bool) – See run(). Same semantics, same defaults.

  • start (int) – See run(). Same semantics, same defaults.

  • tmd_len (int) – See run(). Same semantics, same defaults.

  • jmd_n_len (int) – See run(). Same semantics, same defaults.

  • jmd_c_len (int) – See run(). Same semantics, same defaults.

  • n_jobs (Optional[int]) – See run(). Same semantics, same defaults.

  • vectorized (bool) – See run(). Same semantics, same defaults.

  • n_batches (Optional[int]) – See run(). Same semantics, same defaults.

Returns:

df_feat – Same schema as run().

Return type:

pd.DataFrame, shape (n_features, n_feature_info)

Raises:
  • ValueError – If dict_num_parts is None (use run() for seq-mode), or if its shape / part names / D don’t align with the constructor’s self.df_parts and self.df_scales.

  • NotImplementedError – If n_batches is supplied (batched orchestration over the D axis is not yet implemented for numerical mode; pass n_batches=None).
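
The alignment conditions listed under Raises can be checked before calling run_num. The following is a minimal sketch using NumPy only; check_dict_num_parts is a hypothetical helper written for illustration, not an aaanalysis function, and the library's own checks may differ in detail. The part names and sizes are toy values.

```python
import numpy as np

def check_dict_num_parts(dict_num_parts, part_names, n_samples, n_dims):
    """Hypothetical helper (not part of aaanalysis): validate per-part tensors."""
    if dict_num_parts is None:
        raise ValueError("dict_num_parts is required in numerical mode")
    if set(dict_num_parts) != set(part_names):
        raise ValueError(
            f"Part names {sorted(dict_num_parts)} do not match {sorted(part_names)}"
        )
    for name, arr in dict_num_parts.items():
        # Each tensor is expected as (n_samples, L_part_max, D)
        if arr.ndim != 3:
            raise ValueError(f"'{name}' must be 3-dimensional, got {arr.ndim}")
        if arr.shape[0] != n_samples:
            raise ValueError(f"'{name}' has {arr.shape[0]} rows, expected {n_samples}")
        if arr.shape[2] != n_dims:
            raise ValueError(f"'{name}' has D={arr.shape[2]}, expected {n_dims}")

# Toy tensors: 4 samples, 3 dimensions, two parts of different maximum length
rng = np.random.default_rng(0)
n_samples, n_dims = 4, 3
tmd = rng.random((n_samples, 20, n_dims))
tmd[0, 18:, :] = np.nan  # NaN padding for a sample whose part is shorter
jmd_n = rng.random((n_samples, 10, n_dims))
dict_num_parts = {"tmd": tmd, "jmd_n": jmd_n}

# Passes silently when names, row count, and D all line up
check_dict_num_parts(dict_num_parts, ["tmd", "jmd_n"], n_samples, n_dims)
```

In real use, part_names, n_samples, and n_dims would be taken from df_parts.columns, len(df_parts), and len(df_scales.columns), respectively.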

Notes

  • Raw PLM embeddings are not directly usable — normalize them first. Per-residue values are expected in [0, 1] (the StructurePreprocessor / AnnotationPreprocessor normalization convention), since the default max_std_test=0.2 pre-filter is calibrated for that range. Raw embeddings (unbounded floats) must be passed through EmbeddingPreprocessor.encode() to obtain a [0, 1]-normalized {entry: (L, D)} dict_num before NumericalFeature.get_parts(). (EmbeddingPreprocessor.build_scales / build_cat serve the other, AA-scale path via run(); they are not a per-residue value source here.)

  • Three arms, one entry point. Structure-only (dict_num from StructurePreprocessor), embedding (EmbeddingPreprocessor.encode), and fused (concatenate sources with aaanalysis.combine_dict_nums() first) all flow through get_parts → run_num; only the dict_num differs.
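
To illustrate what "[0, 1]-normalized" and "fused" mean for a {entry: (L, D)} dict_num, here is a minimal NumPy sketch. It is an assumption-laden stand-in: min_max_normalize and concat_dict_nums are hypothetical helpers, the pooled min-max scaling is one possible normalization and not necessarily what EmbeddingPreprocessor.encode() does, and the concatenation only mimics the idea behind aaanalysis.combine_dict_nums(). Entry names and sizes are toy values.

```python
import numpy as np

def min_max_normalize(dict_num):
    """Hypothetical helper: scale each of the D dimensions to [0, 1],
    using min/max pooled over all residues of all entries."""
    stacked = np.concatenate(list(dict_num.values()), axis=0)  # (sum of L, D)
    lo = np.nanmin(stacked, axis=0)
    hi = np.nanmax(stacked, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return {key: (arr - lo) / span for key, arr in dict_num.items()}

def concat_dict_nums(*dict_nums):
    """Hypothetical helper: concatenate sources along the D axis.
    All sources must share entry keys and per-entry lengths."""
    keys = dict_nums[0].keys()
    return {key: np.concatenate([d[key] for d in dict_nums], axis=1) for key in keys}

rng = np.random.default_rng(0)
# Unbounded "embedding-like" values and already bounded "structure-like" values
raw_emb = {"P1": rng.normal(size=(30, 8)), "P2": rng.normal(size=(45, 8))}
struct = {"P1": rng.random((30, 3)), "P2": rng.random((45, 3))}

emb01 = min_max_normalize(raw_emb)
fused = concat_dict_nums(struct, emb01)
print({key: arr.shape for key, arr in fused.items()})
# {'P1': (30, 11), 'P2': (45, 11)}
```

After fusing, D grows to the sum of the source dimensions, so the df_scales passed to the constructor would need one column per fused dimension.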

Examples

To demonstrate the CPP().run() method, we load the DOM_GSEC example dataset (see [Breimann25a]):

import aaanalysis as aa
aa.options["verbose"] = False
df_seq = aa.load_dataset(name="DOM_GSEC")
labels = df_seq["label"].to_list()
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)

You just need to provide df_parts to the CPP object and run the algorithm with its respective labels using the CPP().run() method:

cpp = aa.CPP(df_parts=df_parts)
# Create >500,000 features and filter them down to 100 features
df_feat = cpp.run(labels=labels)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (100, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_C_JMD_C-Seg...2,3)-QIAN880106 Conformation α-helix α-helix (middle) Weights for alp...ejnowski, 1988) 0.387000 0.118000 0.118000 0.068000 0.080000 0.000000 0.000000 27,28,29,30,31,32,33
2 TMD_C_JMD_C-Pat...,14)-CRAJ730103 Conformation β-turn β-turn Normalized freq...d et al., 1973) 0.377000 0.285000 -0.285000 0.164000 0.177000 0.000000 0.000000 27,31
3 TMD_C_JMD_C-Seg...6,9)-FAUJ880104 Shape Side chain length Steric parameter STERIMOL length...e et al., 1988) 0.367000 0.263000 0.263000 0.161000 0.168000 0.000000 0.000000 32,33
4 TMD_C_JMD_C-Seg...6,9)-ONEK900101 Others Unclassified (Others) ΔG values in peptides Delta G values ...-DeGrado, 1990) 0.366000 0.111000 0.111000 0.070000 0.114000 0.000000 0.000000 32,33
5 TMD_C_JMD_C-Pat...,15)-QIAN880107 Conformation α-helix α-helix (middle) Weights for alp...ejnowski, 1988) 0.363000 0.162000 0.162000 0.091000 0.118000 0.000000 0.000000 24,28,32,35
6 TMD_C_JMD_C-Seg...3,4)-HUTJ700103 Energy Entropy Entropy Entropy of form...Hutchens, 1970) 0.360000 0.187000 0.187000 0.115000 0.128000 0.000000 0.000000 31,32,33,34,35
7 TMD_C_JMD_C-Seg...2,3)-WOLS870103 Others PC 4 Principal Component 3 (Wold) Principal prope...d et al., 1987) 0.359000 0.159000 -0.159000 0.090000 0.130000 0.000000 0.000000 27,28,29,30,31,32,33
8 TMD_C_JMD_C-Pat...,12)-CRAJ730103 Conformation β-turn β-turn Normalized freq...d et al., 1973) 0.352000 0.227000 -0.227000 0.150000 0.170000 0.000000 0.000000 24,28,32
9 TMD_C_JMD_C-Seg...6,9)-MUNV940102 Energy Free energy (folding) Free energy (α-helix) Free energy in ...-Serrano, 1994) 0.350000 0.129000 -0.129000 0.079000 0.124000 0.000000 0.000000 32,33
10 TMD_C_JMD_C-Seg...3,4)-WOLS870103 Others PC 4 Principal Component 3 (Wold) Principal prope...d et al., 1987) 0.341000 0.214000 -0.214000 0.128000 0.177000 0.000000 0.000000 31,32,33,34,35

Adjust Parts, Splits, and Scales as follows:

df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=["tmd_jmd"])
split_kws = sf.get_split_kws(split_types=["Segment"], n_split_min=1, n_split_max=5)
# Load one of the provided top scale datasets
df_scales = aa.load_scales(top60_n=38)
# Create ~700 features and filter them down to 19 features
cpp = aa.CPP(df_parts=df_parts, split_kws=split_kws, df_scales=df_scales)
df_feat = cpp.run(labels=labels)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (19, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 25,26,27,28,29,30,31,32
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
3 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 25,26,27,28,29,30,31,32
4 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000000 33,34,35,36,37,38,39,40
5 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000000 33,34,35,36,37,38,39,40
6 TMD_JMD-Segment...5,5)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.289000 0.178000 0.178000 0.159000 0.163000 0.000000 0.000000 33,34,35,36,37,38,39,40
7 TMD_JMD-Segment...4,5)-FUKS010106 Composition Membrane proteins (MPs) Proteins of mesophiles (INT) Interior compos...ishikawa, 2001) 0.277000 0.123000 0.123000 0.104000 0.127000 0.000000 0.000000 25,26,27,28,29,30,31,32
8 TMD_JMD-Segment...4,4)-WOLR790101 Polarity Hydrophobicity (surrounding) Hydration potential Hydrophobicity ...n et al., 1979) 0.267000 0.105000 -0.105000 0.100000 0.113000 0.000000 0.000001 31,32,33,34,35,36,37,38,39,40
9 TMD_JMD-Segment...2,2)-CEDJ970105 Composition AA composition Nuclear proteins Composition of ...o et al., 1997) 0.263000 0.062000 0.062000 0.062000 0.069000 0.000000 0.000001 21,22,23,24,25,...,36,37,38,39,40
10 TMD_JMD-Segment...5,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.262000 0.073000 0.073000 0.071000 0.086000 0.000000 0.000001 33,34,35,36,37,38,39,40

The maximum number of final features can be adjusted using the n_filter (default=100) parameter. The actual number of features may be less, depending on: (a) the initial number of features generated (defined by the part-split-scale combinations), and (b) the strictness of both pre-filtering and filtering criteria.

# Create ~700 features and filter them down to 10 features
df_feat = cpp.run(labels=labels, n_filter=10)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (10, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 25,26,27,28,29,30,31,32
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
3 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 25,26,27,28,29,30,31,32
4 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000000 33,34,35,36,37,38,39,40
5 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000000 33,34,35,36,37,38,39,40
6 TMD_JMD-Segment...5,5)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.289000 0.178000 0.178000 0.159000 0.163000 0.000000 0.000000 33,34,35,36,37,38,39,40
7 TMD_JMD-Segment...4,5)-FUKS010106 Composition Membrane proteins (MPs) Proteins of mesophiles (INT) Interior compos...ishikawa, 2001) 0.277000 0.123000 0.123000 0.104000 0.127000 0.000000 0.000000 25,26,27,28,29,30,31,32
8 TMD_JMD-Segment...4,4)-WOLR790101 Polarity Hydrophobicity (surrounding) Hydration potential Hydrophobicity ...n et al., 1979) 0.267000 0.105000 -0.105000 0.100000 0.113000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
9 TMD_JMD-Segment...5,5)-MIYS990104 Composition MPs (anchor) Partition energy Optimized relat...Jernigan, 1999) 0.243000 0.103000 0.103000 0.095000 0.126000 0.000002 0.000004 33,34,35,36,37,38,39,40
10 TMD_JMD-Segment...4,5)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.229000 0.102000 -0.102000 0.097000 0.125000 0.000009 0.000012 25,26,27,28,29,30,31,32
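
Point (a), the number of initially generated features, is the product of the part, split, and scale counts. The sketch below assumes that Segment splits enumerate every i-th of n segments for n between n_split_min and n_split_max, which matches the Segment(i,n) feature names shown in the outputs. Both functions are hypothetical helpers, and n_scales=50 is an illustrative placeholder, not the size of any scale set loaded here.

```python
def n_segment_splits(n_split_min, n_split_max):
    """Count Segment(i, n) splits: n segments for each n in the range."""
    return sum(range(n_split_min, n_split_max + 1))

def n_candidate_features(n_parts, n_splits, n_scales):
    """Each feature is one part-split-scale combination."""
    return n_parts * n_splits * n_scales

n_splits = n_segment_splits(1, 5)
print(n_splits)  # 15
# One part, 15 Segment splits, placeholder scale count
print(n_candidate_features(n_parts=1, n_splits=n_splits, n_scales=50))  # 750
```

If fewer candidates than n_filter survive pre-filtering and redundancy filtering, the returned DataFrame is correspondingly shorter.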

In the initial CPP pre-filtering step, you can either set the number of retained features using n_pre_filter or define a percentage of the initial features with pct_pre_filter (default: 5%). Additionally, you can adjust the maximum standard deviation allowed in the test dataset for each feature via max_std_test:

# Pre-filtering by allowing 50% with 0.5 maximum std in the test set
# Create ~700 features and filter them down to 26 features
df_feat = cpp.run(labels=labels, pct_pre_filter=50, max_std_test=0.5)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (26, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 25,26,27,28,29,30,31,32
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
3 TMD_JMD-Segment...2,2)-ONEK900101 Others Unclassified (Others) ΔG values in peptides Delta G values ...-DeGrado, 1990) 0.310000 0.041000 0.041000 0.028000 0.044000 0.000000 0.000000 21,22,23,24,25,...,36,37,38,39,40
4 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 25,26,27,28,29,30,31,32
5 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000001 33,34,35,36,37,38,39,40
6 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000001 33,34,35,36,37,38,39,40
7 TMD_JMD-Segment...5,5)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.289000 0.178000 0.178000 0.159000 0.163000 0.000000 0.000001 33,34,35,36,37,38,39,40
8 TMD_JMD-Segment...4,5)-FUKS010106 Composition Membrane proteins (MPs) Proteins of mesophiles (INT) Interior compos...ishikawa, 2001) 0.277000 0.123000 0.123000 0.104000 0.127000 0.000000 0.000001 25,26,27,28,29,30,31,32
9 TMD_JMD-Segment...3,4)-WOLR790101 Polarity Hydrophobicity (surrounding) Hydration potential Hydrophobicity ...n et al., 1979) 0.274000 0.052000 0.052000 0.034000 0.060000 0.000000 0.000001 21,22,23,24,25,26,27,28,29,30
10 TMD_JMD-Segment...1,2)-WEBA780101 Others Mutability RF value RF value in hig...er-Lacey, 1978) 0.268000 0.042000 0.042000 0.039000 0.046000 0.000000 0.000002 1,2,3,4,5,6,7,8...,16,17,18,19,20
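
The interplay of n_pre_filter, pct_pre_filter, and max_std_test can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: pre_filter is a hypothetical helper, the statistics are random toy values, and ranking the surviving features by abs_mean_dif is an assumed criterion.

```python
import random

def pre_filter(stats, n_pre_filter=None, pct_pre_filter=5, max_std_test=0.2):
    """Hypothetical sketch: drop features that vary too much in the test
    group, then keep the top-ranked remainder."""
    if n_pre_filter is None:
        # Derive the retained count from the percentage of initial features
        n_pre_filter = int(len(stats) * pct_pre_filter / 100)
    kept = [s for s in stats if s["std_test"] <= max_std_test]
    # Assumed ranking criterion: largest absolute mean difference first
    kept.sort(key=lambda s: s["abs_mean_dif"], reverse=True)
    return kept[:n_pre_filter]

random.seed(0)
stats = [
    {
        "feature": f"feat_{i}",
        "abs_mean_dif": random.random(),
        "std_test": 0.4 * random.random(),  # toy values below 0.4
    }
    for i in range(200)
]

strict = pre_filter(stats)  # defaults: at most 5% of 200 = 10 features
loose = pre_filter(stats, pct_pre_filter=50, max_std_test=0.5)
print(len(strict) <= 10, len(loose))  # True 100
```

Raising pct_pre_filter and max_std_test together lets more candidates reach the final filtering step, which is why the example above ends with 26 features instead of 19.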

For the final CPP filtering step, you can use the following three parameters: max_overlap setting the allowed maximum positional overlap of similar features (the higher, the less strict), max_cor defining the allowed maximum Pearson correlation for scales of similar features (the higher, the less strict), and check_cat setting whether redundancy of scale categories should be considered or not (setting it to False will result in stricter filtering since features across all categories are compared):

# Disable filtering by setting max_overlap and max_cor to 1
# Create ~700 features and filter them down to 100 features
df_feat = cpp.run(labels=labels, max_overlap=1, max_cor=1)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (100, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 25,26,27,28,29,30,31,32
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
3 TMD_JMD-Segment...3,3)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.304000 0.069000 0.069000 0.051000 0.073000 0.000000 0.000000 27,28,29,30,31,...,36,37,38,39,40
4 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 25,26,27,28,29,30,31,32
5 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000000 33,34,35,36,37,38,39,40
6 TMD_JMD-Segment...2,2)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.292000 0.058000 0.058000 0.045000 0.054000 0.000000 0.000000 21,22,23,24,25,...,36,37,38,39,40
7 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000000 33,34,35,36,37,38,39,40
8 TMD_JMD-Segment...4,4)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.291000 0.127000 0.127000 0.097000 0.121000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
9 TMD_JMD-Segment...5,5)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.289000 0.178000 0.178000 0.159000 0.163000 0.000000 0.000000 33,34,35,36,37,38,39,40
10 TMD_JMD-Segment...4,4)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.288000 0.164000 0.164000 0.135000 0.145000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40

# Perform stricter filtering by setting check_cat=False
# Create ~700 features and filter them down to 11 features
df_feat = cpp.run(labels=labels, check_cat=False)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (11, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 25,26,27,28,29,30,31,32
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 31,32,33,34,35,36,37,38,39,40
3 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 25,26,27,28,29,30,31,32
4 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000000 33,34,35,36,37,38,39,40
5 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000000 33,34,35,36,37,38,39,40
6 TMD_JMD-Segment...2,2)-CEDJ970105 Composition AA composition Nuclear proteins Composition of ...o et al., 1997) 0.263000 0.062000 0.062000 0.062000 0.069000 0.000000 0.000001 21,22,23,24,25,...,36,37,38,39,40
7 TMD_JMD-Segment...5,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.262000 0.073000 0.073000 0.071000 0.086000 0.000000 0.000001 33,34,35,36,37,38,39,40
8 TMD_JMD-Segment...1,2)-SIMZ760101 Polarity Hydrophobicity Transfer free e...TFE) to outside Transfer free e...-Charton (1982) 0.259000 0.064000 -0.064000 0.069000 0.072000 0.000001 0.000002 1,2,3,4,5,6,7,8...,16,17,18,19,20
9 TMD_JMD-Segment...4,5)-ANDN920101 Structure-Activity Backbone-dynamics (-CH) α-CH chemical s...kbone-dynamics) alpha-CH chemic...n et al., 1992) 0.229000 0.102000 -0.102000 0.097000 0.125000 0.000009 0.000017 25,26,27,28,29,30,31,32
10 TMD_JMD-Segment...4,4)-YUTK870103 Energy Free energy (unfolding) Free energy (unfolding) Activation Gibb...i et al., 1987) 0.201000 0.084000 -0.084000 0.115000 0.118000 0.000103 0.000143 31,32,33,34,35,36,37,38,39,40
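
How max_overlap, max_cor, and check_cat interact can be sketched as a greedy redundancy filter. The code is a hypothetical illustration, not the CPP implementation: it assumes features arrive sorted best first, measures positional overlap as a Jaccard index, and treats a candidate as redundant only when both thresholds are exceeded against an already selected feature. The library may combine the criteria differently. Scale names and correlations are toy values.

```python
def position_overlap(pos_a, pos_b):
    """Jaccard overlap between two sets of residue positions."""
    a, b = set(pos_a), set(pos_b)
    return len(a & b) / len(a | b)

def get_cor(scale_cor, scale_a, scale_b):
    """Symmetric lookup of a scale-scale correlation (toy table)."""
    if scale_a == scale_b:
        return 1.0
    return scale_cor.get((scale_a, scale_b), scale_cor.get((scale_b, scale_a), 0.0))

def redundancy_filter(features, scale_cor, n_filter=100,
                      max_overlap=0.5, max_cor=0.5, check_cat=True):
    """Hypothetical greedy filter; features are assumed sorted best first."""
    selected = []
    for cand in features:
        redundant = False
        for kept in selected:
            # With check_cat=True, only features of the same category compete
            if check_cat and cand["category"] != kept["category"]:
                continue
            overlap = position_overlap(cand["positions"], kept["positions"])
            cor = get_cor(scale_cor, cand["scale"], kept["scale"])
            if overlap > max_overlap and cor > max_cor:
                redundant = True
                break
        if not redundant:
            selected.append(cand)
        if len(selected) == n_filter:
            break
    return selected

features = [
    {"name": "f1", "category": "Conformation", "scale": "S1", "positions": range(25, 33)},
    {"name": "f2", "category": "Conformation", "scale": "S2", "positions": range(25, 33)},
    {"name": "f3", "category": "Energy", "scale": "S3", "positions": range(25, 33)},
    {"name": "f4", "category": "Conformation", "scale": "S2", "positions": range(1, 9)},
]
scale_cor = {("S1", "S2"): 0.9, ("S1", "S3"): 0.8}

sel_default = [f["name"] for f in redundancy_filter(features, scale_cor)]
sel_no_cat = [f["name"] for f in redundancy_filter(features, scale_cor, check_cat=False)]
sel_off = [f["name"] for f in redundancy_filter(features, scale_cor, max_overlap=1, max_cor=1)]
print(sel_default)  # ['f1', 'f3', 'f4']
print(sel_no_cat)   # ['f1', 'f4']
print(sel_off)      # ['f1', 'f2', 'f3', 'f4']
```

In this toy run, check_cat=False removes f3 because it is now compared against f1 across categories, mirroring the stricter result described above.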

The residue positions can be adjusted using the start, tmd_len, jmd_n_len, and jmd_c_len parameters:

# Shift positions by 10 residues
df_feat = cpp.run(labels=labels, start=11)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (19, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 35,36,37,38,39,40,41,42
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 41,42,43,44,45,46,47,48,49,50
3 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 35,36,37,38,39,40,41,42
4 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000000 43,44,45,46,47,48,49,50
5 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000000 43,44,45,46,47,48,49,50
6 TMD_JMD-Segment...5,5)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.289000 0.178000 0.178000 0.159000 0.163000 0.000000 0.000000 43,44,45,46,47,48,49,50
7 TMD_JMD-Segment...4,5)-FUKS010106 Composition Membrane proteins (MPs) Proteins of mesophiles (INT) Interior compos...ishikawa, 2001) 0.277000 0.123000 0.123000 0.104000 0.127000 0.000000 0.000000 35,36,37,38,39,40,41,42
8 TMD_JMD-Segment...4,4)-WOLR790101 Polarity Hydrophobicity (surrounding) Hydration potential Hydrophobicity ...n et al., 1979) 0.267000 0.105000 -0.105000 0.100000 0.113000 0.000000 0.000001 41,42,43,44,45,46,47,48,49,50
9 TMD_JMD-Segment...2,2)-CEDJ970105 Composition AA composition Nuclear proteins Composition of ...o et al., 1997) 0.263000 0.062000 0.062000 0.062000 0.069000 0.000000 0.000001 31,32,33,34,35,...,46,47,48,49,50
10 TMD_JMD-Segment...5,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.262000 0.073000 0.073000 0.071000 0.086000 0.000000 0.000001 43,44,45,46,47,48,49,50

# Increase TMD length from 20 to 50
df_feat = cpp.run(labels=labels, tmd_len=50)
aa.display_df(df_feat, n_rows=10, show_shape=True)
DataFrame shape: (19, 13)
  feature category subcategory scale_name scale_description abs_auc abs_mean_dif mean_dif std_test std_ref p_val_mann_whitney p_val_fdr_bh positions
1 TMD_JMD-Segment...4,5)-ROBB760113 Conformation β-turn β-turn Information mea...n-Suzuki, 1976) 0.316000 0.137000 -0.137000 0.102000 0.108000 0.000000 0.000000 43,44,45,46,47,...,52,53,54,55,56
2 TMD_JMD-Segment...4,4)-ZIMJ680104 Energy Isoelectric point Isoelectric point Isoelectric poi...n et al., 1968) 0.312000 0.099000 0.099000 0.069000 0.095000 0.000000 0.000000 53,54,55,56,57,...,66,67,68,69,70
3 TMD_JMD-Segment...4,5)-KANM800103 Conformation α-helix α-helix Average relativ...sa-Tsong, 1980) 0.297000 0.086000 0.086000 0.077000 0.068000 0.000000 0.000000 43,44,45,46,47,...,52,53,54,55,56
4 TMD_JMD-Segment...5,5)-LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003) 0.295000 0.141000 0.141000 0.115000 0.130000 0.000000 0.000000 57,58,59,60,61,...,66,67,68,69,70
5 TMD_JMD-Segment...5,5)-JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978) 0.291000 0.130000 -0.130000 0.099000 0.124000 0.000000 0.000000 57,58,59,60,61,...,66,67,68,69,70
6 TMD_JMD-Segment...5,5)-ZIMJ680103 Polarity Hydrophilicity Polarity (hydrophilicity) Polarity (Zimme...n et al., 1968) 0.289000 0.178000 0.178000 0.159000 0.163000 0.000000 0.000000 57,58,59,60,61,...,66,67,68,69,70
7 TMD_JMD-Segment...4,5)-FUKS010106 Composition Membrane proteins (MPs) Proteins of mesophiles (INT) Interior compos...ishikawa, 2001) 0.277000 0.123000 0.123000 0.104000 0.127000 0.000000 0.000000 43,44,45,46,47,...,52,53,54,55,56
8 TMD_JMD-Segment...4,4)-WOLR790101 Polarity Hydrophobicity (surrounding) Hydration potential Hydrophobicity ...n et al., 1979) 0.267000 0.105000 -0.105000 0.100000 0.113000 0.000000 0.000001 53,54,55,56,57,...,66,67,68,69,70
9 TMD_JMD-Segment...2,2)-CEDJ970105 Composition AA composition Nuclear proteins Composition of ...o et al., 1997) 0.263000 0.062000 0.062000 0.062000 0.069000 0.000000 0.000001 36,37,38,39,40,...,66,67,68,69,70
10 TMD_JMD-Segment...5,5)-MITS020101 Polarity Amphiphilicity Amphiphilicity Amphiphilicity ...u et al., 2002) 0.262000 0.073000 0.073000 0.071000 0.086000 0.000000 0.000001 57,58,59,60,61,...,66,67,68,69,70
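
The positions column in the outputs above can be reproduced with a few lines of integer arithmetic. This sketch assumes that the TMD_JMD part spans JMD-N, TMD, and JMD-C in that order and that segment boundaries are floored; segment_positions is a hypothetical helper, and the rounding rule was chosen because it matches the displayed positions, not taken from the library source.

```python
def segment_positions(i_th, n_split, start=1, tmd_len=20, jmd_n_len=10, jmd_c_len=10):
    """Positions covered by the i-th of n_split segments of the TMD_JMD part."""
    total = jmd_n_len + tmd_len + jmd_c_len
    first = total * (i_th - 1) // n_split  # 0-based start index (floored)
    last = total * i_th // n_split         # 0-based end index (exclusive)
    return [start + p for p in range(first, last)]

# Segment(4,5) with the default layout (10 + 20 + 10 residues)
print(segment_positions(4, 5))
# [25, 26, 27, 28, 29, 30, 31, 32]

# start=11 shifts every position by 10
print(segment_positions(4, 5, start=11))
# [35, 36, 37, 38, 39, 40, 41, 42]

# tmd_len=50 stretches the part to 70 residues, so each segment grows
seg = segment_positions(4, 5, tmd_len=50)
print(seg[0], seg[-1])  # 43 56
```

Note that these parameters only relabel positions; the feature statistics in the two outputs above are unchanged.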

Multiprocessing can be enabled with the n_jobs parameter, which is set to the maximum if n_jobs=None. However, this is only recommended for more than ~1000 features per core because of potential process management overhead.

import time

# Run without multiprocessing
time_start = time.time()
df_feat = cpp.run(labels=labels, n_jobs=1)
time_no_mp = round(time.time() - time_start, 2)
print(f"Time without multiprocessing: {time_no_mp} seconds")

# Run with multiprocessing
time_start = time.time()
df_feat = cpp.run(labels=labels, n_jobs=None)
time_mp = round(time.time() - time_start, 2)
print(f"Time with multiprocessing: {time_mp} seconds")
Time without multiprocessing: 0.09 seconds
Time with multiprocessing: 2.55 seconds
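
The rule of thumb of roughly 1000 features per core can be turned into a small helper for choosing n_jobs. This is a hypothetical convenience function, not part of aaanalysis; the threshold is taken from the recommendation above and n_cores defaults to what os.cpu_count() reports.

```python
import os

def suggest_n_jobs(n_features, min_features_per_core=1000, n_cores=None):
    """Hypothetical helper: use extra cores only when each gets enough work."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return max(1, min(n_cores, n_features // min_features_per_core))

print(suggest_n_jobs(700, n_cores=8))      # 1
print(suggest_n_jobs(500_000, n_cores=8))  # 8
```

For the ~700 candidate features used in the timing example, the helper suggests a single job, in line with the slower multiprocessing run shown above.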
Parameters:
  • return_stats (bool)