CPPGrid.run

CPPGrid.run(params_parts=None, params_split=None, params_scales=None, params_cpp=None)

Run the configuration grid and return per-combo feature tables plus a sweep summary.

Expands the four stage-grouped parameter dicts into a Cartesian product of configurations and runs the full parts → splits → scales → CPP.run() pipeline for each one in parallel. Configurations that share all settings except n_filter are executed once at the largest value and sliced, avoiding redundant work.
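The expansion step can be pictured with a few lines of standard-library Python. This is only a minimal sketch of the idea, not the library's implementation: expand_grid is a hypothetical helper, and the value 0.5 for max_cor is an arbitrary illustration.

```python
from itertools import product


def expand_grid(params):
    """Expand a dict of knobs into a list of configurations (hypothetical helper).

    List/tuple values are treated as swept axes; anything else is a fixed scalar.
    """
    keys = list(params)
    # Wrap scalars in a one-element list so every knob becomes an axis.
    axes = [v if isinstance(v, (list, tuple)) else [v] for v in params.values()]
    # itertools.product varies the last axis fastest.
    return [dict(zip(keys, combo)) for combo in product(*axes)]


configs = expand_grid({"n_split_max": [3, 5], "n_filter": [10, 25], "max_cor": 0.5})
for cfg in configs:
    print(cfg)
```

Two swept axes with two values each give four configurations, while the scalar max_cor is carried unchanged into every one of them.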

Added in version 1.1.0.

Parameters:

params_parts (dict, optional) – get_df_parts / get_parts kwargs (tmd_len, jmd_n_len, jmd_c_len, list_parts, …). List-valued entries are swept.
params_split (dict, optional) – SequenceFeature.get_split_kws kwargs (split_types, n_split_max, len_max, steps_pattern, …). List-valued entries are swept.
params_scales (pd.DataFrame or list of pd.DataFrame, optional) – A single df_scales or a list of df_scales to sweep. df_cat is resolved internally per scale set.
params_cpp (dict, optional) – CPP.run / run_num kwargs (n_filter, max_std_test, max_overlap, max_cor, …). List-valued entries are swept.

Returns:

list_df_feat (list of pd.DataFrame or None) – One feature table per configuration (None where that configuration raised at run time), aligned to df_params rows in itertools.product order.
df_params (pd.DataFrame) – One row per configuration describing it: scalar axes hold the literal value, object axes (df_scales and any list-valued knob) hold the position index into their candidate list, plus n_warnings and n_errors counts.

Notes

List = swept axis. A list/tuple value is swept element-wise; a scalar is fixed. To sweep a knob that is itself list-valued (steps_pattern, list_parts), wrap each candidate (steps_pattern=[[3, 4], [2, 5]]); to use one such list as a single fixed value, wrap it once (steps_pattern=[[3, 4]]). A flat list for these knobs (steps_pattern=[3, 4]) is swept as two single values and emits a UserWarning; this is almost always a mistake.
Results are also stored on the instance (list_df_feat_, df_params_); eval() ranks the configurations best-first.
n_warnings is derived from each run’s filter-funnel counts (sparse-config and filter-shortfall conditions); n_errors counts configurations that raised.
Smart sweeping (no redundant CPP runs). Sweeping n_filter does not re-run CPP per value: configurations that differ only in n_filter run CPP once at the largest value, and the smaller ones are exact head(n) slices (the redundancy filter is a greedy top-down pass, so the top-n is invariant). df_parts are built once per parts-config and split_kws once per split-config, then reused across the grid; the D3 scale-lookup LRU is reused across configs sharing a df_scales.
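Why the smaller n_filter results are exact head(n) slices can be checked on a toy example. The sketch below is an assumption-laden stand-in, not CPP's actual filter: greedy_filter, the toy features, and the adjacency-based redundancy rule are all invented for illustration. The point is that whether an item is kept depends only on the items ranked above it, so a smaller cap merely stops the same pass earlier.

```python
def greedy_filter(ranked, is_redundant, n_filter):
    """Keep up to n_filter items from a best-first list, skipping any item
    that is redundant with an already kept one (toy stand-in, not CPP's code)."""
    kept = []
    for item in ranked:
        if len(kept) == n_filter:
            break
        if not any(is_redundant(item, k) for k in kept):
            kept.append(item)
    return kept


# Toy features as (name, position), already sorted best-first.
ranked = [("f1", 1), ("f2", 2), ("f3", 10), ("f4", 11), ("f5", 20), ("f6", 30)]


def clash(a, b):
    """Toy redundancy rule: two features clash if their positions are adjacent."""
    return abs(a[1] - b[1]) <= 1


top4 = greedy_filter(ranked, clash, n_filter=4)
top2 = greedy_filter(ranked, clash, n_filter=2)

# The smaller result is a prefix of the larger one.
assert top2 == top4[:2]
print([name for name, _ in top4])
print([name for name, _ in top2])
```

Under this model, running once at the largest n_filter and slicing gives the same tables as separate runs per value.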

Examples

CPPGrid runs a grid sweep of CPP configurations in one call. The dataset (df_seq + labels) is bound at construction; CPPGrid.run takes stage-grouped parameter dicts whose list-valued entries are swept (here two n_filter values). It returns one feature table per configuration plus a df_params summary (one row per configuration).

import aaanalysis as aa
aa.options["verbose"] = False

df_seq = aa.load_dataset(name="DOM_GSEC", n=10)
labels = df_seq["label"].to_list()
cppg = aa.CPPGrid(df_seq=df_seq, labels=labels, n_jobs=1, random_state=0)
list_df_feat, df_params = cppg.run(params_cpp={"n_filter": [10, 25]})
df_params

	n_filter	df_scales	n_warnings	n_errors
0	10	0	0	0
1	25	0	0	0

Each row of df_params aligns by index to list_df_feat (and to cppg.df_params_ / cppg.list_df_feat_). To sweep a knob that is itself list-valued (steps_pattern, list_parts), wrap each candidate, e.g. params_split={"steps_pattern": [[3, 4], [2, 5]]}.
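The wrapping rule for list-valued knobs can be made concrete with a small stand-alone sketch. It only mimics the documented behavior under stated assumptions: candidates and LIST_VALUED are hypothetical names, and the warning text is invented, not the message the library emits.

```python
import warnings

# Knobs whose single value is itself a list (names taken from the text above).
LIST_VALUED = {"steps_pattern", "list_parts"}


def candidates(name, value):
    """Return the candidate values swept for one knob (illustrative helper only)."""
    if not isinstance(value, (list, tuple)):
        return [value]  # scalar: fixed, one candidate
    if name in LIST_VALUED and not all(isinstance(v, (list, tuple)) for v in value):
        # A flat list here is most likely an unwrapped single value.
        warnings.warn(
            f"{name}={value!r} is a flat list; did you mean [{value!r}]?",
            UserWarning,
        )
    return list(value)


swept = candidates("steps_pattern", [[3, 4], [2, 5]])  # two candidates
fixed = candidates("steps_pattern", [[3, 4]])          # one fixed candidate

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flat = candidates("steps_pattern", [3, 4])         # two scalars plus a warning

print(len(swept), len(fixed), len(flat), len(caught))
```

Wrapping once more is the only difference between a fixed list value and a sweep over its elements, which is why the flat form is flagged.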

list_df_feat[0].head()

	feature	category	subcategory	scale_name	scale_description	abs_auc	abs_mean_dif	mean_dif	std_test	std_ref	p_val_mann_whitney	p_val_fdr_bh	positions
0	JMD_N_TMD_N-Segment(1,10)-ZIMJ680101	Polarity	Hydrophobicity	Hydrophobicity	Hydrophobicity (Zimmerman et al., 1968)	0.500	0.361	0.361	0.156	0.150	0.000157	0.080663	1,2
1	JMD_N_TMD_N-Pattern(C,2,5,8,12)-PALJ810110	Conformation	β-sheet	β-sheet	Normalized frequency of beta-sheet in all-beta...	0.470	0.233	-0.233	0.092	0.095	0.000381	0.080663	9,13,16,19
2	TMD-Pattern(N,1,5,8,11)-PALJ810110	Conformation	β-sheet	β-sheet	Normalized frequency of beta-sheet in all-beta...	0.470	0.233	-0.233	0.092	0.095	0.000381	0.080663	11,15,18,21
3	TMD_C_JMD_C-Pattern(N,4,8,12)-TANS770105	Conformation	β-turn (C-term)	β-turn (3rd residue)	Normalized frequency of chain reversal S (Tana...	0.470	0.230	-0.230	0.061	0.111	0.000381	0.080663	24,28,32
4	TMD_C_JMD_C-Pattern(C,8,12,15)-AURR980102	Conformation	Linker (6-14 AA)	α-helix (N-terminal, outside)	Normalized positional residue frequency at hel...	0.465	0.189	0.189	0.054	0.099	0.000440	0.080663	26,29,33

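Further parameters. CPPGrid.run also accepts params_parts (get_df_parts / get_parts kwargs such as tmd_len, jmd_n_len, jmd_c_len, list_parts, …), params_split (SequenceFeature.get_split_kws kwargs such as split_types, n_split_max, len_max, steps_pattern, …), and params_scales (a single df_scales or a list of df_scales to sweep).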

# Further parameters: parts geometry (params_parts), split settings (params_split), and the
# scale set(s) to sweep (params_scales). A list value is swept; here two scale sets are compared.
df_scales_a = aa.load_scales()
df_scales_b = aa.load_scales(top60_n=38)
list_df_feat, df_params = cppg.run(
    params_parts={"jmd_n_len": 8, "jmd_c_len": 8},
    params_split={"n_split_max": 3},
    params_scales=[df_scales_a, df_scales_b],
)
aa.display_df(df_params, n_rows=10, show_shape=True)

DataFrame shape: (2, 6)

	jmd_n_len	jmd_c_len	n_split_max	df_scales	n_warnings	n_errors
1	8	8	3	0	0	0
2	8	8	3	1	0	0