aaanalysis.CPPGrid

class aaanalysis.CPPGrid(df_seq=None, labels=None, dict_num=None, accept_gaps=False, verbose=True, random_state=None, n_jobs=-1, backend='threads')

Bases: Tool

Grid-style sweep over CPP configurations (Tool) [Breimann25a].

Runs the full parts → splits → scales → run pipeline across a Cartesian grid of configurations so a sweep needs one call instead of many manual get_df_parts / get_split_kws / CPP constructions. The dataset (df_seq + labels, plus dict_num for the numerical arm) is bound at construction; run() takes four stage-grouped parameter dictionaries whose list-valued entries are swept.
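The Cartesian expansion described above can be sketched in plain Python. This is only an illustration of the idea, not the CPPGrid implementation: the helper expand_grid is hypothetical, the key names jmd_n_len, jmd_c_len, and n_split_max are illustrative and not confirmed as accepted by run(), and the sketch treats every list as a swept axis, which a real implementation could not do for parameters that are lists by nature.

```python
from itertools import product


def expand_grid(params):
    """Expand a dict with list-valued (swept) entries into concrete dicts."""
    keys = list(params)
    # Wrap scalars so every entry contributes exactly one axis to the product.
    value_lists = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Hypothetical stage dictionaries (key names are assumptions for illustration).
params_parts = {"jmd_n_len": [10, 20], "jmd_c_len": 10}
params_split = {"n_split_max": [10, 15, 20]}

# Cross the per-stage expansions to obtain the full configuration grid.
grid = [
    {"parts": p, "split": s}
    for p, s in product(expand_grid(params_parts), expand_grid(params_split))
]

print(len(grid))  # 2 * 1 * 3 = 6 configurations
```

Scalar entries stay fixed across the sweep, so only list-valued entries increase the number of configurations.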

Added in version 1.1.0.

Notes

  • Within each configuration, CPP.run / CPP.run_num runs serially (n_jobs=1); the grid is parallelized across configurations to avoid nested oversubscription.

  • The default backend="threads" shares df_seq / df_scales in-process (no dataframe serialization, and it sidesteps the Python 3.14 / macOS __main__-guard spawn footgun). Pass backend="loky" for process-based parallelism.
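The two notes above describe a common pattern: parallelize the outer loop over configurations and keep each inner run serial. The sketch below shows that pattern with the standard library's ThreadPoolExecutor as a stand-in for a thread-based joblib backend; it is not the code CPPGrid uses, and run_one_config and its return value are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_one_config(config):
    """Stand-in for one serial per-configuration run (hypothetical placeholder)."""
    # A real worker would run the per-configuration pipeline with n_jobs=1 here,
    # so only the outer level spawns workers.
    return {"config": config, "n_features": config["n_split_max"] * 2}


configs = [{"n_split_max": n} for n in (5, 10, 15)]

# Outer level: threads share in-process objects, so nothing has to be pickled.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_one_config, configs))  # map preserves input order

for res in results:
    print(res)
```

Because only the outer level creates workers, the total worker count stays bounded by the outer pool size rather than multiplying with an inner pool.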

After run(), the feature tables and the sweep summary are also kept on the instance as list_df_feat_ and df_params_ (aligned by row index), and eval() scores the configurations and returns them best-first.
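Row-index alignment and best-first ranking can be illustrated without the library. In this sketch plain lists stand in for df_params_ and list_df_feat_ (which are pandas objects in the real class), and the scores are made-up numbers where higher means better; the metric that eval() actually computes is not specified here, so treat the scoring as an assumption.

```python
# Hypothetical sweep output: one summary row and one feature table per configuration.
params_rows = [
    {"n_split_max": 5},
    {"n_split_max": 10},
    {"n_split_max": 15},
]
feature_tables = [["f1", "f2"], ["f1", "f2", "f3", "f4"], ["f1", "f2", "f3"]]
scores = [0.61, 0.83, 0.74]  # made-up quality scores, higher is better

# Rank row indices by score, best first; the same index addresses both containers.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
ranked = [
    {
        "row": i,
        **params_rows[i],
        "score": scores[i],
        "n_features": len(feature_tables[i]),
    }
    for i in order
]

for row in ranked:
    print(row)
```

Keeping the original row index in the ranked output makes it possible to look up the matching feature table after sorting.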

See also

  • CPP: the per-configuration engine this class orchestrates.

  • eval(): score the swept configurations and rank them best-first.

Methods

eval([sort_by, ascending])

Score the swept configurations and return df_params joined to per-config quality, best-first.

run([params_parts, params_split, ...])

Run the configuration grid and return per-combo feature tables plus a sweep summary.

__init__(df_seq=None, labels=None, dict_num=None, accept_gaps=False, verbose=True, random_state=None, n_jobs=-1, backend='threads')
Parameters:
  • df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences (any format accepted by SequenceFeature.get_df_parts()).

  • labels (array-like, shape (n_samples,)) – Class labels aligned to the resulting df_parts rows (test vs reference).

  • dict_num (dict[str, np.ndarray], optional) – Mapping entry -> (L, D) per-residue tensor. If given, the grid runs the numerical arm (NumericalFeature.get_parts → CPP.run_num).

  • accept_gaps (bool, default=False) – Whether to accept gaps when assigning scale values.

  • verbose (bool, default=True) – If True, enable verbose output.

  • random_state (int, optional) – Seed forwarded to each CPP for reproducibility.

  • n_jobs (int, default=-1) – Number of workers used across configurations (-1 = all cores).

  • backend ({'threads', 'loky'}, default='threads') – Joblib backend used across configurations.