aaanalysis.StructurePreprocessor.get_dssp

StructurePreprocessor.get_dssp(df_seq=None, pdb_folder=None, features=None, ss_mode='ss3', gap_handling='pad', verbose=None)

Run DSSP on each entry's PDB file and append per-residue list columns to a copy of df_seq.

Parameters:
  • df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences. entry is used as the PDB-file basename (<entry>.pdb) and sequence is the target sequence to which DSSP output is aligned.

  • pdb_folder (str or pathlib.Path) – Directory containing one <entry>.pdb file per row of df_seq. Missing files emit a UserWarning and produce dssp_ok=False for that row.

  • features (list of str, default=['ss', 'asa', 'phi_psi']) – Which DSSP feature streams to extract. Any subset of {'ss', 'asa', 'phi_psi'}. Only the requested columns are appended; dssp_ok is always appended.

  • ss_mode ({'ss3', 'ss8'}, default='ss3') – Secondary-structure encoding for the ss column: 'ss3' for a three-state encoding, 'ss8' for an eight-state encoding.

  • gap_handling ({'pad', 'omit'}, default='pad') – How to handle positions without DSSP coverage. 'pad' keeps every list aligned in length with df_seq['sequence'] and fills uncovered positions with ut.STR_SS_GAP (secondary structure) or NaN (numeric streams); 'omit' drops those positions from all requested streams simultaneously.

  • verbose (bool, optional) – Override instance verbosity for this call only.
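
The difference between the two gap_handling modes can be illustrated with plain lists. This is only a sketch of the described behaviour, not the library's implementation: the helper apply_gap_handling, the coverage mask, and the '-' placeholder (standing in for ut.STR_SS_GAP, whose actual value is not stated here) are assumptions made for illustration.

```python
import math

GAP = "-"  # assumed placeholder; the real value of ut.STR_SS_GAP is not given here


def apply_gap_handling(sequence, covered, ss, asa, gap_handling="pad"):
    """Hypothetical helper mimicking the documented 'pad' / 'omit' semantics.

    covered[i] is True if DSSP produced values for residue i of `sequence`;
    `ss` and `asa` hold values for the covered residues only, in order.
    """
    if gap_handling not in ("pad", "omit"):
        raise ValueError("gap_handling must be 'pad' or 'omit'")
    if len(covered) != len(sequence):
        raise ValueError("coverage mask must match the sequence length")
    if gap_handling == "omit":
        # Uncovered positions are dropped from every stream at once.
        return list(ss), list(asa)
    # 'pad': keep one value per residue, filling uncovered positions.
    ss_iter, asa_iter = iter(ss), iter(asa)
    ss_out, asa_out = [], []
    for is_covered in covered:
        ss_out.append(next(ss_iter) if is_covered else GAP)
        asa_out.append(next(asa_iter) if is_covered else math.nan)
    return ss_out, asa_out


sequence = "MKTAY"
covered = [False, True, True, True, False]  # first and last residue lack coverage
ss_values = ["H", "H", "C"]
asa_values = [10.0, 25.5, 80.2]

ss_pad, asa_pad = apply_gap_handling(sequence, covered, ss_values, asa_values, "pad")
ss_omit, asa_omit = apply_gap_handling(sequence, covered, ss_values, asa_values, "omit")
```

With 'pad', both lists have the same length as the sequence, so they can be indexed by residue position; with 'omit', the lists are shorter but remain aligned with each other.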

Returns:

df_out – A copy of df_seq with appended list columns for each requested feature stream plus a boolean dssp_ok column.

Return type:

pd.DataFrame

Raises:
  • RuntimeError – If neither the mkdssp nor the dssp executable is found on PATH.

  • ValueError – On invalid arguments or pre-existing output columns in df_seq.
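
Since a missing DSSP executable raises RuntimeError, it may help to check PATH before calling get_dssp. The following is a minimal sketch using only the standard library; find_dssp_executable is a hypothetical helper and not part of aaanalysis, and the order in which the library itself looks up the two names is an assumption.

```python
import shutil


def find_dssp_executable(candidates=("mkdssp", "dssp")):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


dssp_path = find_dssp_executable()
if dssp_path is None:
    print("No DSSP executable found on PATH; get_dssp would raise RuntimeError.")
else:
    print(f"DSSP executable found at {dssp_path}")
```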

Examples
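
A minimal usage sketch follows. It assumes that StructurePreprocessor can be constructed without arguments and that a folder of PDB files exists; the entry identifiers, the sequences, and the folder name "pdb_files" are placeholders, not real data. The call is wrapped in a function so that the input table can be prepared and inspected even where aaanalysis or DSSP is not installed.

```python
import pandas as pd

# Placeholder input: one row per protein, with unique 'entry' identifiers.
# get_dssp looks for <entry>.pdb in pdb_folder for each row.
df_seq = pd.DataFrame({
    "entry": ["PROT_A", "PROT_B"],
    "sequence": ["MKTAYIAK", "GSHMLEDP"],
})


def run_dssp(df_seq, pdb_folder="pdb_files"):
    """Run get_dssp and keep only rows for which DSSP succeeded."""
    import aaanalysis as aa  # imported lazily; requires aaanalysis and DSSP

    sp = aa.StructurePreprocessor()  # assumption: no required constructor arguments
    df_out = sp.get_dssp(
        df_seq=df_seq,
        pdb_folder=pdb_folder,
        features=["ss", "asa"],  # subset of {'ss', 'asa', 'phi_psi'}
        ss_mode="ss3",
        gap_handling="pad",
    )
    # dssp_ok is always appended; it is False for rows with a missing PDB file.
    return df_out[df_out["dssp_ok"]]
```

Calling run_dssp(df_seq) returns the rows of the copy for which dssp_ok is True, with the requested list columns appended.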