aaanalysis.StructurePreprocessor.encode_dssp

StructurePreprocessor.encode_dssp(df_seq=None, pdb_folder=None, features=None, ss_mode='ss3', gap_handling='pad', on_failure='nan', return_df=False, verbose=None)

Run DSSP and the per-feature encoders to build a [0, 1]-normalized dict_dssp.

Parameters:
  • df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences. Pre-computed DSSP columns (from get_dssp()) are reused if present; otherwise DSSP is run inline.

  • pdb_folder (str or pathlib.Path) – Directory containing one <entry>.pdb file per row. Required when df_seq does not already carry the necessary DSSP columns.

  • features (list of str) – Feature keys from the StructurePreprocessor registry that belong to encode_dssp: any subset of {'ss3', 'ss8', 'rasa', 'phi_psi_sincos'}. Each key’s output is normalized to [0, 1] per the table in the class docstring.

  • ss_mode ({'ss3', 'ss8'}, default='ss3') – Forwarded to get_dssp() when DSSP is run inline. The one-hot dimensionality of the output is set by the secondary-structure key in features ('ss3' or 'ss8'), not by this option.

  • gap_handling ({'pad', 'omit'}, default='pad') – Forwarded to get_dssp() when DSSP is run inline.

  • on_failure ({'nan', 'drop', 'raise'}, default='nan') – What to do for entries whose DSSP run failed. 'nan' fills with NaN-only tensors; 'drop' removes failed entries from the output dict; 'raise' raises RuntimeError if any entry failed.

  • return_df (bool, default=False) – If True, return a tuple (dict_dssp, df_seq_out) whose second element is the per-row status DataFrame. If False (default), return only dict_dssp.

  • verbose (bool, optional) – Override instance verbosity for this call only.
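
The three on_failure policies described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper apply_on_failure, its arguments, and the use of nested lists in place of NumPy arrays are all assumptions made for the example, as is the idea that a NaN-only tensor has one row per residue of the failed entry.

```python
import math


def apply_on_failure(dict_feat, failed_entries, lengths, n_dims, on_failure="nan"):
    """Hypothetical stand-in for the documented on_failure policies."""
    if on_failure not in ("nan", "drop", "raise"):
        raise ValueError(f"on_failure must be 'nan', 'drop' or 'raise', got {on_failure!r}")
    # 'raise': any failed entry aborts the whole call.
    if on_failure == "raise" and failed_entries:
        raise RuntimeError(f"DSSP failed for entries: {sorted(failed_entries)}")
    out = dict(dict_feat)
    for entry in failed_entries:
        if on_failure == "drop":
            # 'drop': the failed entry does not appear in the output dict.
            out.pop(entry, None)
        else:
            # 'nan': keep the entry, filled with NaN only
            # (assumed shape: one row per residue, n_dims columns).
            out[entry] = [[math.nan] * n_dims for _ in range(lengths[entry])]
    return out


encoded = {"P1": [[1.0, 0.0, 0.0]]}
kept = apply_on_failure(encoded, {"P2"}, {"P2": 2}, n_dims=3, on_failure="nan")
dropped = apply_on_failure(encoded, {"P2"}, {"P2": 2}, n_dims=3, on_failure="drop")
print(sorted(kept), sorted(dropped))
```

With 'nan' the output keeps one key per input row, which is convenient when downstream code indexes by entry; 'drop' trades that alignment for a dict free of NaN-only entries.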

Returns:

  • dict_dssp (dict[str, np.ndarray]) – {entry: (L_entry, D_total) ndarray} per-residue DSSP features concatenated in the order of features. Values are in [0, 1] (NaN for unresolved positions).

  • df_seq_out (pd.DataFrame) – Returned only when return_df=True. Echo of the (possibly DSSP-augmented) df_seq plus an encode_dssp_ok column flagging per-row success. Rows for failed entries are dropped when on_failure='drop'.
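
Because the per-residue features are concatenated in the order of features, the columns belonging to one feature can be recovered from cumulative widths. The sketch below shows that bookkeeping. The widths in ASSUMED_WIDTHS are assumptions: 3 and 8 follow from the one-hot names 'ss3' and 'ss8', while 1 for 'rasa' and 4 for 'phi_psi_sincos' (sine and cosine of two angles) are guesses that should be checked against the table in the class docstring. The helper feature_slices is hypothetical.

```python
# Assumed column counts per feature key (verify against the class docstring).
ASSUMED_WIDTHS = {"ss3": 3, "ss8": 8, "rasa": 1, "phi_psi_sincos": 4}


def feature_slices(features, widths=ASSUMED_WIDTHS):
    """Map each feature key to its column slice in the concatenated array."""
    slices = {}
    start = 0
    for key in features:
        if key not in widths:
            raise ValueError(f"unknown feature key: {key!r}")
        stop = start + widths[key]
        slices[key] = slice(start, stop)
        start = stop
    # 'start' now equals the total number of columns (D_total).
    return slices, start


slices, d_total = feature_slices(["ss3", "rasa", "phi_psi_sincos"])
print(d_total)          # 8 under the assumed widths
print(slices["rasa"])   # slice(3, 4, None)
```

With a real dict_dssp, an expression such as dict_dssp[entry][:, slices["rasa"]] would then select the relative accessibility column for one entry, provided the assumed widths match the library's.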

Raises:
  • ValueError – On invalid arguments or feature keys not in this method’s registry slice.

  • RuntimeError – If mkdssp is unavailable, or if any entry failed under on_failure='raise'.