aaanalysis.StructurePreprocessor.encode_pdb
- StructurePreprocessor.encode_pdb(df_seq=None, pdb_folder=None, features=None, plddt_disorder_threshold=70.0, on_failure='nan', return_df=False, verbose=None)
Extract per-residue features from PDB ATOM records into dict_pdb.
- Parameters:
df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences. entry is the PDB-file basename; sequence is the target sequence used for chain selection and alignment.
pdb_folder (str or pathlib.Path) – Directory containing one <entry>.pdb file per row of df_seq.
features (list of str) – Feature keys from the StructurePreprocessor registry that belong to encode_pdb: any subset of {bfactor, depth}. The depth feature requires the external msms binary on PATH; its absence raises RuntimeError with an install hint.
on_failure ({'nan', 'drop', 'raise'}, default='nan') – Failure policy for entries whose PDB load fails (missing file, unparseable structure, no matched chain). 'nan' fills with NaN-only tensors; 'drop' removes those entries; 'raise' re-raises.
return_df (bool, default=False) – If True, also return the per-row status DataFrame as a second element: (dict_pdb, df_seq_out). If False (default), return only dict_pdb.
verbose (bool, optional) – Override instance verbosity for this call only.
plddt_disorder_threshold (float, default=70.0)
- Returns:
dict_pdb (dict[str, np.ndarray]) – {entry: (L_entry, D_total) ndarray} per-residue PDB features concatenated in the order of features.
df_seq_out (pd.DataFrame) – Returned only when return_df=True. Echo of df_seq plus a boolean pdb_ok column.
- Raises:
ValueError – On invalid arguments.
RuntimeError – If msms is not installed and 'depth' is requested, or if any entry failed under on_failure='raise'.
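
The two failure sources described above (a missing <entry>.pdb file, and a missing msms binary when 'depth' is requested) can be checked before calling encode_pdb. The following is a minimal sketch using only the standard library. The helper name preflight_encode_pdb and its which argument are hypothetical and not part of aaanalysis, and the sketch only assumes the file layout and PATH requirement stated in this entry; how the library itself detects msms is not documented here.

```python
import shutil
from pathlib import Path


def preflight_encode_pdb(entries, pdb_folder, features, which=shutil.which):
    """Hypothetical helper (not part of aaanalysis): check inputs for encode_pdb.

    Returns a tuple (missing_entries, usable_features).
    """
    folder = Path(pdb_folder)
    # Entries without a matching <entry>.pdb file would fall under on_failure.
    missing = [e for e in entries if not (folder / f"{e}.pdb").is_file()]
    # 'depth' needs the external msms binary on PATH; drop it if msms is absent.
    has_msms = which("msms") is not None
    usable = [f for f in features if f != "depth" or has_msms]
    return missing, usable
```

When working from a DataFrame, df_seq["entry"] can be passed as entries. The returned feature list can then be passed as features, and the list of missing entries helps decide between on_failure='nan', 'drop', and 'raise'.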