aaanalysis.CPPGrid.run

CPPGrid.run(params_parts=None, params_split=None, params_scales=None, params_cpp=None)

Run the configuration grid and return per-combo feature tables plus a sweep summary.

Parameters:
  • params_parts (dict, optional) – get_df_parts / get_parts kwargs (tmd_len, jmd_n_len, jmd_c_len, list_parts, …). List-valued entries are swept.

  • params_split (dict, optional) – SequenceFeature.get_split_kws kwargs (split_types, n_split_max, len_max, steps_pattern, …). List-valued entries are swept.

  • params_scales (pd.DataFrame or list of pd.DataFrame, optional) – A single df_scales or a list of df_scales to sweep. df_cat is resolved internally per scale set.

  • params_cpp (dict, optional) – CPP.run / run_num kwargs (n_filter, max_std_test, max_overlap, max_cor, …). List-valued entries are swept.

Returns:

  • list_df_feat (list of pd.DataFrame or None) – One feature table per configuration (None where that configuration raised at run time), aligned to df_params rows in itertools.product order.

  • df_params (pd.DataFrame) – One row per configuration describing it: scalar axes hold the literal value, object axes (df_scales and any list-valued knob) hold the position index into their candidate list, plus n_warnings and n_errors counts.
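Because failed configurations appear as None in list_df_feat, downstream code usually needs to skip them while keeping the row alignment. The following is a minimal sketch of that pairing, not part of the library: successful_runs is a hypothetical helper, the strings stand in for feature tables, and the dicts stand in for df_params rows (in real use, df_params.to_dict("records") would give rows of this shape). All values are made up for illustration.

```python
def successful_runs(list_df_feat, param_rows):
    """Yield (index, params, table) for configurations that produced a table."""
    for i, (df_feat, row) in enumerate(zip(list_df_feat, param_rows)):
        if df_feat is not None:  # None marks a configuration that raised
            yield i, row, df_feat


# Stand-ins (hypothetical values): strings for tables, dicts for df_params rows
list_df_feat = ["table_0", None, "table_2"]
param_rows = [
    {"n_filter": 10, "n_errors": 0},
    {"n_filter": 25, "n_errors": 1},
    {"n_filter": 50, "n_errors": 0},
]

ok = list(successful_runs(list_df_feat, param_rows))
ok_indices = [i for i, _, _ in ok]  # positions that are safe to index into
```

Keeping the original index in each tuple means a result can still be traced back to its df_params row after the None entries are dropped.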

Notes

  • List = swept axis. A list/tuple value is swept element-wise; a scalar is fixed. To sweep a knob that is itself list-valued (steps_pattern, list_parts), wrap each candidate (steps_pattern=[[3, 4], [2, 5]]); to use one such list as a single fixed value, wrap it once (steps_pattern=[[3, 4]]). A flat list for these knobs (steps_pattern=[3, 4]) is swept as two single values and emits a UserWarning; this is almost always a mistake.

  • Results are also stored on the instance (list_df_feat_, df_params_); eval() ranks the configurations best-first.

  • n_warnings is derived from each run’s filter-funnel counts (sparse-config and filter-shortfall conditions); n_errors counts configurations that raised.

  • Smart sweeping (no redundant CPP runs). Sweeping n_filter does not re-run CPP per value: configurations that differ only in n_filter run CPP once at the largest value, and each smaller value is an exact head(n) slice of that result (the redundancy filter is a greedy top-down pass, so the top-n is invariant). df_parts is built once per parts-config and split_kws once per split-config, then both are reused across the grid; the D3 scale-lookup LRU cache is reused across configs sharing a df_scales.
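The top-n invariance behind the n_filter shortcut can be illustrated with a toy greedy filter. This is a sketch under stated assumptions, not CPP's implementation: greedy_filter and same_group are hypothetical names, the candidates are made-up (name, group) pairs already sorted best-first, and "same group" stands in for whatever redundancy criterion the real filter applies.

```python
def greedy_filter(candidates, n, is_redundant):
    """Keep up to n candidates, scanning best-first and skipping redundant ones."""
    kept = []
    for cand in candidates:  # assumed to be sorted best-first already
        if len(kept) == n:
            break
        # A candidate is kept only if it is not redundant with anything kept so far
        if not any(is_redundant(cand, k) for k in kept):
            kept.append(cand)
    return kept


def same_group(x, y):
    """Toy redundancy criterion (assumption): two candidates sharing a group clash."""
    return x[1] == y[1]


# Made-up candidates as (name, group), ordered best-first
candidates = [("f1", "a"), ("f2", "a"), ("f3", "b"), ("f4", "c"), ("f5", "b"), ("f6", "d")]

top4 = greedy_filter(candidates, 4, same_group)
top2 = greedy_filter(candidates, 2, same_group)
top4_names = [name for name, _ in top4]
```

Each keep-or-skip decision depends only on candidates accepted earlier, so stopping at a smaller n cannot change the earlier picks; here top2 equals top4[:2], which is why one run at the largest n_filter can serve the smaller values.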

Examples

CPPGrid runs a grid sweep of CPP configurations in one call. The dataset (df_seq + labels) is bound at construction; CPPGrid.run takes stage-grouped parameter dicts whose list-valued entries are swept (here, two n_filter values). It returns one feature table per configuration plus a df_params summary (one row per configuration).

import aaanalysis as aa
aa.options["verbose"] = False

df_seq = aa.load_dataset(name="DOM_GSEC", n=10)
labels = df_seq["label"].to_list()
grid = aa.CPPGrid(df_seq=df_seq, labels=labels, n_jobs=1, random_state=0)
list_df_feat, df_params = grid.run(params_cpp={"n_filter": [10, 25]})
df_params
| | n_filter | df_scales | n_warnings | n_errors |
|---|---|---|---|---|
| 0 | 10 | 0 | 0 | 0 |
| 1 | 25 | 0 | 0 | 0 |

Each row of df_params aligns by index to list_df_feat (and to grid.df_params_ / grid.list_df_feat_). To sweep a knob that is itself list-valued (steps_pattern, list_parts) wrap each candidate, e.g. params_split={"steps_pattern": [[3, 4], [2, 5]]}.
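The wrapping rule can be made concrete with a small stand-alone sketch of how a parameter dict might expand into configurations. This is an illustration only, not CPPGrid's code: expand_grid is a hypothetical helper, the n_split_max value is arbitrary, and unlike the real method this sketch emits no UserWarning for the flat-list case.

```python
from itertools import product


def expand_grid(params):
    """Expand a dict into one dict per configuration; list/tuple values are swept."""
    keys = list(params)
    # A list/tuple becomes a swept axis; any other value is a fixed one-element axis
    axes = [list(v) if isinstance(v, (list, tuple)) else [v] for v in params.values()]
    return [dict(zip(keys, combo)) for combo in product(*axes)]


# Each candidate wrapped: two configurations, each with a full steps_pattern list
swept = expand_grid({"steps_pattern": [[3, 4], [2, 5]], "n_split_max": 15})

# Wrapped once: a single configuration with steps_pattern fixed to [3, 4]
fixed = expand_grid({"steps_pattern": [[3, 4]], "n_split_max": 15})

# Flat list: swept as two scalar values (3 and 4), which is rarely what is meant
flat = expand_grid({"steps_pattern": [3, 4], "n_split_max": 15})
```

In this sketch, swept and flat both yield two configurations, but only swept passes a list to steps_pattern in each one, which is the distinction the warning is meant to catch.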

list_df_feat[0].head()
| | feature | category | subcategory | scale_name | scale_description | abs_auc | abs_mean_dif | mean_dif | std_test | std_ref | p_val_mann_whitney | p_val_fdr_bh | positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JMD_N_TMD_N-Segment(1,10)-ZIMJ680101 | Polarity | Hydrophobicity | Hydrophobicity | Hydrophobicity (Zimmerman et al., 1968) | 0.500 | 0.361 | 0.361 | 0.156 | 0.150 | 0.000157 | 1.0 | 1,2 |
| 1 | JMD_N_TMD_N-Pattern(C,2,5,8,12)-PALJ810110 | Conformation | β-sheet | β-sheet | Normalized frequency of beta-sheet in all-beta... | 0.470 | 0.233 | -0.233 | 0.092 | 0.095 | 0.000381 | 1.0 | 9,13,16,19 |
| 2 | TMD-Pattern(N,1,5,8,11)-PALJ810110 | Conformation | β-sheet | β-sheet | Normalized frequency of beta-sheet in all-beta... | 0.470 | 0.233 | -0.233 | 0.092 | 0.095 | 0.000381 | 1.0 | 11,15,18,21 |
| 3 | TMD_C_JMD_C-Pattern(N,4,8,12)-TANS770105 | Conformation | β-turn (C-term) | β-turn (3rd residue) | Normalized frequency of chain reversal S (Tana... | 0.470 | 0.230 | -0.230 | 0.061 | 0.111 | 0.000381 | 1.0 | 24,28,32 |
| 4 | TMD_C_JMD_C-Pattern(C,8,12,15)-AURR980102 | Conformation | Linker (6-14 AA) | α-helix (N-terminal, outside) | Normalized positional residue frequency at hel... | 0.465 | 0.189 | 0.189 | 0.054 | 0.099 | 0.000440 | 1.0 | 26,29,33 |