aaanalysis.CPPGrid
- class aaanalysis.CPPGrid(df_seq=None, labels=None, dict_num=None, accept_gaps=False, verbose=True, random_state=None, n_jobs=-1, backend='threads')
Bases: Tool
Grid-style sweep over CPP configurations (Tool) [Breimann25a].
Runs the full parts → splits → scales → run pipeline across a Cartesian grid of configurations, so a sweep needs one call instead of many manual get_df_parts / get_split_kws / CPP constructions. The dataset (df_seq + labels, plus dict_num for the numerical arm) is bound at construction; run() takes four stage-grouped parameter dictionaries whose list-valued entries are swept.
Added in version 1.1.0.
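The sweep semantics described above can be illustrated with a minimal, library-independent sketch. It assumes that list-valued entries are expanded into a Cartesian product while any other value is held fixed. The helper `expand_grid`, the stage names, and the parameter names are hypothetical and are not part of the aaanalysis API.

```python
from itertools import product


def expand_grid(**stage_dicts):
    """Expand stage-grouped parameter dicts into a list of configurations.

    List-valued entries are swept; any other value is held fixed.
    Hypothetical helper, not the aaanalysis implementation.
    """
    keys, value_lists = [], []
    for stage, params in stage_dicts.items():
        for name, value in params.items():
            keys.append((stage, name))
            # Wrap scalars so every entry contributes at least one value.
            value_lists.append(value if isinstance(value, list) else [value])
    configs = []
    for combo in product(*value_lists):
        config = {stage: {} for stage in stage_dicts}
        for (stage, name), value in zip(keys, combo):
            config[stage][name] = value
        configs.append(config)
    return configs


# Two swept entries (2 x 3 values) and one fixed entry give 6 configurations.
configs = expand_grid(
    stage_a={"param_a": [10, 20], "param_fixed": "x"},
    stage_b={"param_b": [1, 2, 3]},
)
print(len(configs))
print(configs[0])
```

The grid size is the product of the lengths of all swept lists, so each additional list-valued entry multiplies the number of configurations.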
Notes
Inside each configuration, CPP.run / run_num runs serially (n_jobs=1); the grid is parallelized across configurations to avoid nested oversubscription.
The default backend="threads" shares df_seq / df_scales in-process (no dataframe serialization, and it sidesteps the Python 3.14 / macOS __main__-guard spawn footgun). Pass backend="loky" for process-based parallelism.
After run(), the feature tables and the sweep summary are also kept on the instance as list_df_feat_ and df_params_ (aligned by row index), and eval() scores the configurations and returns them best-first.
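The scheme in the notes (serial work inside each configuration, parallelism across configurations, outputs aligned by index) can be sketched with the standard library. This is an illustration only: `run_one_config` and `run_grid` are hypothetical stand-ins, and `ThreadPoolExecutor` is used in place of joblib to keep the example dependency-free.

```python
from concurrent.futures import ThreadPoolExecutor


def run_one_config(config, shared_data):
    """Hypothetical per-configuration task; runs serially inside a worker."""
    # shared_data is read in-process by every thread, so nothing is copied.
    return [x * config["factor"] for x in shared_data]


def run_grid(configs, shared_data, n_workers=4):
    """Run all configurations in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in the order of `configs`, so results[i]
        # corresponds to configs[i].
        results = list(pool.map(lambda c: run_one_config(c, shared_data), configs))
    return results


configs = [{"factor": f} for f in (1, 2, 3)]
results = run_grid(configs, shared_data=[1, 2, 3])
for config, result in zip(configs, results):
    print(config, result)
```

Threads avoid serializing the shared inputs; a process-based pool trades that saving for isolation between workers.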
- Parameters:
Methods
eval([sort_by, ascending]) – Score the swept configurations and return df_params joined to per-config quality, best-first.
run([params_parts, params_split, ...]) – Run the configuration grid and return per-combo feature tables plus a sweep summary.
- __init__(df_seq=None, labels=None, dict_num=None, accept_gaps=False, verbose=True, random_state=None, n_jobs=-1, backend='threads')
- Parameters:
df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences (any format accepted by SequenceFeature.get_df_parts()).
labels (array-like, shape (n_samples,)) – Class labels aligned to the resulting df_parts rows (test vs reference).
dict_num (dict[str, np.ndarray], optional) – Mapping entry -> (L, D) per-residue tensor. If given, the grid runs the numerical arm (NumericalFeature.get_parts → CPP.run_num).
accept_gaps (bool, default=False) – Whether to accept gaps when assigning scale values.
verbose (bool, default=True) – If True, enable verbose output.
random_state (int, optional) – Seed forwarded to each CPP for reproducibility.
n_jobs (int, default=-1) – Number of workers used across configurations (-1 = all cores).
backend ({'threads', 'loky'}, default='threads') – Joblib backend used across configurations.
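The expected input shapes can be sketched with toy data. The sequences, entry names, labels, and the feature dimension D = 4 below are made up for illustration. The 1/0 label encoding and the check that L equals the sequence length are assumptions drawn from the "test vs reference" and "per-residue" descriptions, not documented requirements.

```python
import numpy as np
import pandas as pd

# Toy dataset: one row per protein with the two required columns.
df_seq = pd.DataFrame({
    "entry": ["P1", "P2", "P3"],
    "sequence": ["MKTAYIAKQR", "GSHMLEDPVA", "AAGGLLKKEE"],
})
# One class label per row of df_seq (assumed encoding: 1 = test, 0 = reference).
labels = np.array([1, 0, 1])

# Per-residue numerical input: entry -> array of shape (L, D).
rng = np.random.default_rng(0)
n_dims = 4  # D, chosen arbitrarily for this example
dict_num = {
    entry: rng.random((len(seq), n_dims))
    for entry, seq in zip(df_seq["entry"], df_seq["sequence"])
}

# Basic consistency checks before handing the data to a sweep.
assert df_seq["entry"].is_unique
assert len(labels) == len(df_seq)
for entry, seq in zip(df_seq["entry"], df_seq["sequence"]):
    # Assumption: L matches the sequence length of the same entry.
    assert dict_num[entry].shape == (len(seq), n_dims)
print({entry: arr.shape for entry, arr in dict_num.items()})
```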