aaanalysis.CPPGrid.eval

CPPGrid.eval(sort_by='avg_ABS_AUC', ascending=None)

Score the swept configurations and return df_params joined to per-config quality, best-first.

Aggregates each configuration's feature table (list_df_feat_) into the same discriminative-power columns that CPP.eval() reports and joins them onto df_params; avg_ABS_AUC is the mean of the per-feature abs_auc in that df_feat. The result is sorted so that the best configuration is the first row (df_eval.iloc[0]). Configurations that errored (df_feat is None) get NaN quality values and sort last.
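The aggregation and ordering described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the abs_auc values are made up, and the helper name avg_abs_auc is hypothetical.

```python
import math
from statistics import mean

# Made-up per-feature abs_auc values, one entry per configuration.
# None stands in for a configuration that errored (df_feat is None).
abs_auc_per_config = [
    [0.46, 0.44],          # configuration 0
    None,                  # configuration 1 (errored)
    [0.50, 0.47, 0.44],    # configuration 2
]

def avg_abs_auc(values):
    """Mean of per-feature abs_auc, or NaN when the configuration errored."""
    return math.nan if values is None else mean(values)

scores = [avg_abs_auc(v) for v in abs_auc_per_config]

# Best-first: valid scores in descending order, NaN scores at the end.
order = sorted(
    range(len(scores)),
    key=lambda i: (math.isnan(scores[i]), 0.0 if math.isnan(scores[i]) else -scores[i]),
)
print(order)  # [2, 0, 1]
```

Configuration 2 ranks first because its mean abs_auc is highest, and the errored configuration lands at the end regardless of its position in the sweep.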

Parameters:
  • sort_by (str, default='avg_ABS_AUC') – Quality column to rank by. One of the added columns (avg_ABS_AUC, avg_abs_mean_dif, n_features) or any existing df_params column.

  • ascending (bool, optional) – Sort direction. None (default) picks the sensible direction: descending for the higher-is-better metrics (avg_ABS_AUC, avg_abs_mean_dif), ascending for everything else.
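The default-direction rule for ascending=None can be written out as a small function. This is a sketch of the documented behaviour under the assumption that only the two listed metrics count as higher-is-better; resolve_ascending and HIGHER_IS_BETTER are hypothetical names, not part of the aaanalysis API.

```python
# Metrics the documentation describes as higher-is-better.
HIGHER_IS_BETTER = {"avg_ABS_AUC", "avg_abs_mean_dif"}

def resolve_ascending(sort_by, ascending=None):
    """Hypothetical helper mirroring the documented default sort direction."""
    if ascending is not None:
        # An explicit choice always wins over the default.
        return bool(ascending)
    # Default: descending for higher-is-better metrics, ascending otherwise.
    return sort_by not in HIGHER_IS_BETTER

print(resolve_ascending("avg_ABS_AUC"))                  # False (descending)
print(resolve_ascending("n_features"))                   # True (ascending)
print(resolve_ascending("avg_ABS_AUC", ascending=True))  # True (explicit override)
```

Under this rule, sorting by n_features or by a df_params column such as n_filter puts the smallest value first unless ascending=False is passed explicitly.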

Returns:

df_eval – df_params with appended quality columns, sorted best-first. One row per configuration; the original product-order index is preserved, so self.list_df_feat_[i] still maps to row label i.

Return type:

pd.DataFrame
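Because the returned frame is sorted but keeps its original index, row labels and row positions no longer coincide. The toy pandas example below, with made-up values rather than real grid output, shows the difference between the two.

```python
import pandas as pd

# Toy df_params-like table in product order (row labels 0, 1, 2); values are made up.
df_params = pd.DataFrame({"n_filter": [10, 25, 50]})
df_params["avg_ABS_AUC"] = [0.45, float("nan"), 0.47]  # NaN marks an errored configuration

# Descending sort; pandas places NaN last by default and keeps the row labels.
df_eval = df_params.sort_values("avg_ABS_AUC", ascending=False)

best_label = df_eval.index[0]        # row label of the best configuration
best_row = df_eval.iloc[0]           # first row by position
same_row = df_eval.loc[best_label]   # the same row, looked up by label

print(list(df_eval.index))  # [2, 0, 1]
```

Here best_label is 2, so the matching feature table would be looked up with that label. Indexing list_df_feat_ with the position 0 instead would return the first configuration in product order, not the best-ranked one.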

Notes

  • Call run() first; otherwise a RuntimeError is raised.

  • Redundancy (n_clusters) is not computed here — grid configurations can use different df_parts / df_scales, so the per-set clustering that CPP.eval() performs is not well-defined across the sweep. Use CPP.eval() on a single df_feat if you need it.

See also

  • run(): produces the list_df_feat_ / df_params_ this consumes.

  • CPP.eval(): the per-configuration evaluator whose avg_ABS_AUC this matches.

Examples

After CPPGrid.run(), CPPGrid.eval() scores every configuration and returns df_params joined to per-configuration quality columns (avg_ABS_AUC is the mean of each feature table's abs_auc), sorted best-first so that df_eval.iloc[0] is the best configuration.

import aaanalysis as aa
aa.options["verbose"] = False

df_seq = aa.load_dataset(name="DOM_GSEC", n=10)
labels = df_seq["label"].to_list()
grid = aa.CPPGrid(df_seq=df_seq, labels=labels, n_jobs=1, random_state=0)
grid.run(params_cpp={"n_filter": [10, 25]})
df_eval = grid.eval(sort_by="avg_ABS_AUC")
df_eval
   n_filter  df_scales  n_warnings  n_errors  avg_ABS_AUC  avg_abs_mean_dif  n_features
0        10          0           0         0       0.4680           0.23950          10
1        25          0           0         0       0.4548           0.21852          25

The best configuration’s feature table is then grid.list_df_feat_[df_eval.index[0]].

best_i = df_eval.index[0]
grid.list_df_feat_[best_i].head()
   feature                                     category      subcategory       scale_name                     scale_description                                  abs_auc  abs_mean_dif  mean_dif  std_test  std_ref  p_val_mann_whitney  p_val_fdr_bh  positions
0  JMD_N_TMD_N-Segment(1,10)-ZIMJ680101        Polarity      Hydrophobicity    Hydrophobicity                 Hydrophobicity (Zimmerman et al., 1968)              0.500         0.361     0.361     0.156    0.150            0.000157           1.0  1,2
1  JMD_N_TMD_N-Pattern(C,2,5,8,12)-PALJ810110  Conformation  β-sheet           β-sheet                        Normalized frequency of beta-sheet in all-beta...    0.470         0.233    -0.233     0.092    0.095            0.000381           1.0  9,13,16,19
2  TMD-Pattern(N,1,5,8,11)-PALJ810110          Conformation  β-sheet           β-sheet                        Normalized frequency of beta-sheet in all-beta...    0.470         0.233    -0.233     0.092    0.095            0.000381           1.0  11,15,18,21
3  TMD_C_JMD_C-Pattern(N,4,8,12)-TANS770105    Conformation  β-turn (C-term)   β-turn (3rd residue)           Normalized frequency of chain reversal S (Tana...    0.470         0.230    -0.230     0.061    0.111            0.000381           1.0  24,28,32
4  TMD_C_JMD_C-Pattern(C,8,12,15)-AURR980102   Conformation  Linker (6-14 AA)  α-helix (N-terminal, outside)  Normalized positional residue frequency at hel...    0.465         0.189     0.189     0.054    0.099            0.000440           1.0  26,29,33