aaanalysis.StructurePreprocessor.get_domains
- StructurePreprocessor.get_domains(df_seq=None, pdb_folder=None, pae_folder=None, tool='afragmenter', chainsaw_path=None, resolution=0.7, threshold=2.0, on_failure='nan', verbose=None)
Run a domain-segmentation tool and append a chopping column.

Mirrors the get_dssp → encode_dssp pattern: this method runs the external tool inline and returns a copy of df_seq with two appended columns, chopping (Merizo/ChainSaw common format) and domain_ok (boolean). The result feeds into encode_domains(), which accepts the in-memory chopping column directly.
- Parameters:
df_seq (pd.DataFrame, shape (n_samples, n_seq_info)) – DataFrame containing an entry column with unique protein identifiers and a sequence column with full protein sequences.
pdb_folder (str or pathlib.Path, optional) – Directory with one <entry>.pdb, .cif, .pdb.gz, or .cif.gz file per row. Required when tool='chainsaw'.
pae_folder (str or pathlib.Path, optional) – Directory with one PAE JSON file per row (same canonical filename resolution as encode_pae()). Required when tool='afragmenter'.
tool ({'chainsaw', 'afragmenter'}, default='afragmenter') – Which segmentation tool to run:
* 'afragmenter': pip-installable PAE-based segmenter. Requires the optional extra (pip install aaanalysis[pro]). The dependency is imported lazily, so the install hint is shown only when this tool is requested. Operates on the PAE matrix from pae_folder.
* 'chainsaw': PDB-based segmenter (ChainSaw). Not on PyPI; clone the repository locally and pass its directory as chainsaw_path. Operates on the PDB/CIF files from pdb_folder via a subprocess.
chainsaw_path (str or pathlib.Path, optional) – Local clone of the ChainSaw repository. Required when tool='chainsaw' (ignored otherwise).
resolution (float, default=0.7) – Resolution parameter of AFragmenter's Leiden clustering (only used when tool='afragmenter').
threshold (float, default=2.0) – AFragmenter PAE threshold for graph edges, in Å (only used when tool='afragmenter').
on_failure ({'nan', 'drop', 'raise'}, default='nan') – What to do for entries where the tool fails or the input file is missing:
* 'nan' fills chopping with an empty string and sets domain_ok=False;
* 'drop' removes the row;
* 'raise' re-raises the error.
verbose (bool, optional) – Override the instance verbosity for this call only.
- Returns:
df_out – A copy of df_seq with two appended columns:
* chopping (str): the Merizo/ChainSaw common-format chopping string, or '' on failure.
* domain_ok (bool): True if the tool returned a non-empty chopping for this entry.
- Return type:
pd.DataFrame
- Raises:
ValueError – On invalid arguments or missing per-tool kwargs.
RuntimeError – If the tool's Python dependency is not installed (AFragmenter, provided by the [pro] extra), if chainsaw_path is invalid, or if any entry failed under on_failure='raise'.
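The per-tool requirements listed under Parameters, together with the ValueError/RuntimeError split under Raises, amount to an argument check before any tool runs. A minimal sketch of such a check follows. The function name check_domain_args is hypothetical and not part of aaanalysis, and treating an invalid chainsaw_path as "not an existing directory" is an assumption (the real check may be stricter).

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


# Hypothetical helper illustrating the documented requirements; not the aaanalysis code.
def check_domain_args(tool: str = "afragmenter",
                      pdb_folder: Optional[PathLike] = None,
                      pae_folder: Optional[PathLike] = None,
                      chainsaw_path: Optional[PathLike] = None) -> None:
    """Raise if the arguments required by the selected tool are missing or invalid."""
    if tool not in {"chainsaw", "afragmenter"}:
        raise ValueError(f"tool must be 'chainsaw' or 'afragmenter', got {tool!r}")
    if tool == "afragmenter" and pae_folder is None:
        raise ValueError("'pae_folder' is required when tool='afragmenter'")
    if tool == "chainsaw":
        if pdb_folder is None:
            raise ValueError("'pdb_folder' is required when tool='chainsaw'")
        if chainsaw_path is None:
            raise ValueError("'chainsaw_path' is required when tool='chainsaw'")
        # Assumption: a usable ChainSaw clone is at least an existing directory.
        if not Path(chainsaw_path).is_dir():
            raise RuntimeError(f"chainsaw_path is not a directory: {chainsaw_path}")
```

Missing arguments raise ValueError, while an unusable clone raises RuntimeError, matching the two exception types listed under Raises.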
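For tool='chainsaw', pdb_folder is expected to hold one <entry>.pdb, .cif, .pdb.gz, or .cif.gz file per row. A sketch of locating that file for a single entry follows, assuming the suffixes are tried in the order listed; the actual precedence when several files exist is not stated in this reference, and find_structure_file is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional, Union

# Assumed lookup order: the order in which the suffixes are listed above.
STRUCTURE_SUFFIXES = (".pdb", ".cif", ".pdb.gz", ".cif.gz")


# Hypothetical helper; not part of aaanalysis.
def find_structure_file(pdb_folder: Union[str, Path], entry: str) -> Optional[Path]:
    """Return the first existing <entry><suffix> file, or None if none exists."""
    folder = Path(pdb_folder)
    for suffix in STRUCTURE_SUFFIXES:
        candidate = folder / f"{entry}{suffix}"
        if candidate.is_file():
            return candidate
    # No file found: this entry would be handled according to on_failure.
    return None
```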
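The AFragmenter dependency is imported lazily, so a missing [pro] extra only surfaces when tool='afragmenter' is requested, as a RuntimeError carrying an install hint. A generic sketch of that pattern using importlib.import_module follows. The helper name lazy_import is hypothetical, and the module name a caller would pass for AFragmenter (for example "afragmenter") is an assumption, since this reference does not state it.

```python
import importlib
from types import ModuleType


# Hypothetical helper showing the lazy-import pattern; not the aaanalysis code.
def lazy_import(module_name: str, install_hint: str) -> ModuleType:
    """Import module_name on demand; convert ImportError into a RuntimeError with a hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is required for this tool. Install it with: {install_hint}"
        ) from exc
```

Calling it only inside the 'afragmenter' branch keeps the base install importable even when the optional dependency is absent.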
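threshold and resolution both act on AFragmenter's PAE graph. A reasonable reading of "graph-edge threshold" is that two residues are linked only when their PAE is below threshold, after which Leiden clustering (tuned by resolution) partitions the graph. The sketch below illustrates only the thresholding step and replaces Leiden with plain connected components, so it has no resolution counterpart. It is not AFragmenter's algorithm, and symmetrising the PAE matrix by taking the larger of the two directions is an assumption.

```python
# Hypothetical illustration of a PAE edge threshold; connected components
# stand in for Leiden clustering, so there is no resolution parameter.
def threshold_components(pae: list[list[float]], threshold: float = 2.0) -> list[int]:
    """Label each residue with the connected component of the PAE-thresholded graph.

    Edge i--j exists when max(pae[i][j], pae[j][i]) < threshold (assumed symmetrisation).
    """
    n = len(pae)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        # Iterative depth-first search over edges that pass the threshold.
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and j != i and max(pae[i][j], pae[j][i]) < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


pae = [[0, 1, 9, 9],
       [1, 0, 9, 9],
       [9, 9, 0, 1.5],
       [9, 9, 1.5, 0]]
print(threshold_components(pae, threshold=2.0))  # [0, 0, 1, 1]
```

Lowering threshold removes edges and splits residues into more groups; raising it merges them.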
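The on_failure modes and the domain_ok flag can be pictured with a small stand-in. This sketch works on a plain dict of per-entry results instead of a DataFrame. The helper name collect_choppings and the use of None (or a missing key) to mark a failed entry are hypothetical. It illustrates the documented semantics, not the aaanalysis implementation.

```python
from typing import Optional


# Hypothetical helper mirroring the documented on_failure behaviour.
def collect_choppings(entries: list[str],
                      results: dict[str, Optional[str]],
                      on_failure: str = "nan") -> list[dict]:
    """Build output rows; results maps entry -> chopping, or None if the tool failed."""
    if on_failure not in {"nan", "drop", "raise"}:
        raise ValueError(f"on_failure must be 'nan', 'drop' or 'raise', got {on_failure!r}")
    rows = []
    for entry in entries:
        chopping = results.get(entry)  # a missing key counts as a failure
        if chopping is None:
            if on_failure == "raise":
                raise RuntimeError(f"Domain segmentation failed for entry {entry!r}")
            if on_failure == "drop":
                continue  # remove the row entirely
            chopping = ""  # 'nan': empty string instead of a chopping
        # domain_ok is True only for a non-empty chopping string.
        rows.append({"entry": entry, "chopping": chopping, "domain_ok": bool(chopping)})
    return rows


print(collect_choppings(["P1", "P2"], {"P1": "1-80"}))
# [{'entry': 'P1', 'chopping': '1-80', 'domain_ok': True}, {'entry': 'P2', 'chopping': '', 'domain_ok': False}]
```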
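Downstream code that inspects the chopping column needs to split the Merizo/ChainSaw common format. A minimal parser sketch follows, assuming the convention used in ChainSaw's output: domains are separated by "_", discontinuous segments of one domain by ",", and each segment is an inclusive "start-end" residue range. The function names parse_chopping and n_domain_residues are hypothetical and not part of aaanalysis.

```python
# Hypothetical helpers; the format conventions below are assumptions.
def parse_chopping(chopping: str) -> list[list[tuple[int, int]]]:
    """Split a chopping string into domains, each a list of (start, end) segments.

    '_' separates domains, ',' separates segments within a domain, and each
    segment is 'start-end'. The empty on-failure value yields no domains.
    """
    if not chopping:
        return []
    domains = []
    for domain in chopping.split("_"):
        segments = []
        for segment in domain.split(","):
            start, end = segment.split("-")
            segments.append((int(start), int(end)))
        domains.append(segments)
    return domains


def n_domain_residues(chopping: str) -> int:
    """Count residues assigned to any domain, treating ranges as inclusive."""
    return sum(end - start + 1
               for segments in parse_chopping(chopping)
               for start, end in segments)


print(parse_chopping("1-103,210-269_104-209"))  # [[(1, 103), (210, 269)], [(104, 209)]]
print(n_domain_residues("1-103,210-269_104-209"))  # 269
```

In this example the first domain is discontinuous (two segments), which is why segments are kept as a list per domain rather than a single range.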