aaanalysis.StructurePreprocessor.get_dssp
- StructurePreprocessor.get_dssp(df_seq=None, pdb_folder=None, features=None, ss_mode='ss3', gap_handling='pad', verbose=None)
Run DSSP and append per-residue list columns to df_seq.
- Parameters:
  - df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences. entry is used as the PDB-file basename (<entry>.pdb) and sequence is the target sequence to which DSSP output is aligned.
  - pdb_folder (str or pathlib.Path) – Directory containing one <entry>.pdb file per row of df_seq. Missing files emit a UserWarning and produce dssp_ok=False for that row.
  - features (list of str, default=['ss', 'asa', 'phi_psi']) – Which DSSP feature streams to extract. Any subset of {'ss', 'asa', 'phi_psi'}. Only the requested columns are appended; dssp_ok is always appended.
  - ss_mode ({'ss3', 'ss8'}, default='ss3') – Secondary-structure encoding for the ss column.
  - gap_handling ({'pad', 'omit'}, default='pad') – How to handle positions without DSSP coverage. 'pad' preserves length-alignment to df_seq[sequence] and fills with ut.STR_SS_GAP / NaN; 'omit' drops them across all requested streams simultaneously.
  - verbose (bool, optional) – Override instance verbosity for this call only.
- Returns:
  df_out – A copy of df_seq with appended list columns for each requested feature stream plus a boolean dssp_ok column.
- Return type:
  pd.DataFrame
- Raises:
  - RuntimeError – If mkdssp/dssp is not on PATH.
  - ValueError – On invalid arguments or pre-existing output columns in df_seq.
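The difference between the two gap_handling modes can be illustrated with a small pure-Python sketch. This is not the library's implementation: apply_gap_handling, the covered mask, and the GAP placeholder are hypothetical names introduced here, and the actual value of ut.STR_SS_GAP is not stated in this reference. The sketch only mirrors the documented behaviour, where 'pad' keeps every sequence position and 'omit' drops uncovered positions from all streams at once.

```python
import math

# Placeholder only: the real value of ut.STR_SS_GAP is not given in this reference.
GAP = "-"


def apply_gap_handling(ss, asa, covered, gap_handling="pad"):
    """Illustrate 'pad' vs 'omit' on two per-residue streams (hypothetical helper).

    ss and asa hold one value per sequence position; covered[i] is False
    where DSSP produced no output for position i.
    """
    if gap_handling not in ("pad", "omit"):
        raise ValueError("gap_handling must be 'pad' or 'omit'")
    if gap_handling == "pad":
        # Keep every position; fill uncovered ones with the gap symbol / NaN.
        ss_out = [s if ok else GAP for s, ok in zip(ss, covered)]
        asa_out = [a if ok else math.nan for a, ok in zip(asa, covered)]
    else:
        # Drop uncovered positions from all streams with the same mask,
        # so the streams stay aligned with each other.
        ss_out = [s for s, ok in zip(ss, covered) if ok]
        asa_out = [a for a, ok in zip(asa, covered) if ok]
    return ss_out, asa_out


# Made-up data: positions 2 and 5 have no DSSP coverage.
sequence = "MKTAYI"
covered = [True, True, False, True, True, False]
ss = ["C", "H", None, "H", "E", None]
asa = [120.0, 35.5, None, 12.0, 80.2, None]

ss_pad, asa_pad = apply_gap_handling(ss, asa, covered, "pad")
ss_omit, asa_omit = apply_gap_handling(ss, asa, covered, "omit")
print(len(ss_pad), len(ss_omit))
```

With 'pad', the output lists can be indexed by sequence position; with 'omit', they are shorter than the sequence but contain no filler values.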
Examples
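A minimal usage sketch follows, based only on the signature and parameter descriptions above. Several details are assumptions: that StructurePreprocessor can be constructed without arguments, that the package is imported as aaanalysis, and that the entries prot_a and prot_b (with made-up sequences) have matching prot_a.pdb and prot_b.pdb files in pdb_folder. The helper names dssp_on_path and run_get_dssp_example are hypothetical.

```python
import shutil


def dssp_on_path():
    """Return True if a DSSP executable (mkdssp or dssp) is found on PATH."""
    return shutil.which("mkdssp") is not None or shutil.which("dssp") is not None


def run_get_dssp_example(pdb_folder):
    """Call get_dssp on a tiny two-row DataFrame (hypothetical entries)."""
    # Imported lazily so the sketch loads even where the packages are missing.
    import pandas as pd
    import aaanalysis as aa

    df_seq = pd.DataFrame({
        "entry": ["prot_a", "prot_b"],             # expects prot_a.pdb, prot_b.pdb
        "sequence": ["MKTAYIAKQR", "GSHMLEDPVE"],  # made-up sequences
    })
    sp = aa.StructurePreprocessor()  # assumption: no constructor arguments needed
    df_out = sp.get_dssp(
        df_seq=df_seq,
        pdb_folder=pdb_folder,
        features=["ss", "asa"],   # subset of {'ss', 'asa', 'phi_psi'}
        ss_mode="ss3",
        gap_handling="pad",
    )
    # Keep only rows for which DSSP succeeded.
    return df_out[df_out["dssp_ok"]]


if __name__ == "__main__":
    if dssp_on_path():
        print(run_get_dssp_example("path/to/pdb_folder"))
    else:
        print("mkdssp/dssp not found on PATH; get_dssp would raise RuntimeError")
```

Checking for the executable first avoids the documented RuntimeError, and filtering on dssp_ok removes rows whose PDB file was missing.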