SequenceFeature.get_split_kws
- static SequenceFeature.get_split_kws(split_types=None, n_split_min=1, n_split_max=15, steps_pattern=None, n_min=2, n_max=4, len_max=15, steps_periodicpattern=None)
Create dictionary with kwargs for three split types:
Segment: continuous sub-sequence.
Pattern: non-periodic discontinuous sub-sequence.
PeriodicPattern: periodic discontinuous sub-sequence.
Added in version 0.1.0.
- Parameters:
split_types (list of str, default=[Segment, Pattern, PeriodicPattern]) – Split types for which the parameter dictionary should be generated.
n_split_min (int, default=1) – Number to specify the greatest Segment. Should be > 0.
n_split_max (int, default=15) – Number to specify the smallest Segment. Should be >= n_split_min.
steps_pattern (list of int, default=[3, 4], optional) – Possible step sizes for Pattern. Should contain at least one non-negative integer if the Pattern split_type is used. If None, the default is used.
n_min (int, default=2) – Minimum number of steps for Pattern. Should be <= n_max.
n_max (int, default=4) – Maximum number of steps for Pattern. Should be >= n_min.
len_max (int, default=15) – Maximum length in amino acid positions for Pattern by varying start position. Should be > min(steps_pattern).
steps_periodicpattern (list of int, default=[3, 4], optional) – Size of odd and even steps for PeriodicPattern. Should contain two non-negative integers if the PeriodicPattern split_type is used. If None, the default is used.
- Returns:
split_kws – Nested dictionary with parameters for the chosen split_types:
Segment: {n_split_min: 1, n_split_max: 15}
Pattern: {steps: [3, 4], n_min: 2, n_max: 4, len_max: 15}
PeriodicPattern: {steps: [3, 4]}
- Return type:
dict
Notes
The split bounds returned here are validated for internal consistency (e.g., n_split_min <= n_split_max, n_min <= n_max, len_max > min(steps_pattern)); inconsistent values raise a ValueError.
Beyond that, the feasible maxima are effectively capped by the CPP part lengths used to build df_parts (tmd_len, jmd_n_len, jmd_c_len; defaults 20/10/10): a Segment cannot be split into more pieces than its part has residues, and a Pattern/PeriodicPattern cannot span beyond the part length. Choosing n_split_max, n_max, or len_max larger than a part can accommodate does not raise here; those splits simply yield empty feature buckets downstream.
The one configuration that is always degenerate regardless of part length, an empty Pattern bucket where even the shortest repeat exceeds len_max (n_min * min(steps_pattern) > len_max), emits a UserWarning naming the offending parameters so it can be fixed by raising len_max or lowering steps_pattern/n_min.
Examples
Get default arguments for all split types (Segment, Pattern, PeriodicPattern):
import aaanalysis as aa
sf = aa.SequenceFeature()
split_kws = sf.get_split_kws()
split_kws
{'Segment': {'n_split_min': 1, 'n_split_max': 15}, 'Pattern': {'steps': [3, 4], 'n_min': 2, 'n_max': 4, 'len_max': 15}, 'PeriodicPattern': {'steps': [3, 4]}}
You can also retrieve arguments for specific split types:
split_kws = sf.get_split_kws(split_types=["Segment", "Pattern"])
split_kws
{'Segment': {'n_split_min': 1, 'n_split_max': 15}, 'Pattern': {'steps': [3, 4], 'n_min': 2, 'n_max': 4, 'len_max': 15}}
The arguments for each split type can be adjusted. For Segments, their minimum and maximum length can be changed by the n_split_min (default=1) and n_split_max (default=15) parameters:
split_kws = sf.get_split_kws(split_types="Segment", n_split_min=5, n_split_max=10)
split_kws
{'Segment': {'n_split_min': 5, 'n_split_max': 10}}
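As the Notes point out, an n_split_max larger than a part has residues is not rejected here and only shows up later as empty feature buckets. A caller-side check along the following lines may help catch this early. It is a minimal sketch: the helper name parts_shorter_than and the part names used as dictionary keys are hypothetical and not part of aaanalysis; only the lengths 20/10/10 are taken from the Notes, and the parts actually present in df_parts may differ.

```python
def parts_shorter_than(n_split_max, part_lens):
    """Hypothetical helper: names of parts with fewer residues than n_split_max."""
    return [name for name, length in part_lens.items() if length < n_split_max]


# Lengths quoted in the Notes (defaults 20/10/10); the key names are illustrative only.
part_lens = {"tmd": 20, "jmd_n": 10, "jmd_c": 10}

# Parts that cannot be split into 15 pieces (the default n_split_max)
too_short_default = parts_shorter_than(15, part_lens)

# Parts that cannot be split into 10 pieces (the n_split_max used in the Segment example)
too_short_custom = parts_shorter_than(10, part_lens)

print(too_short_default)
print(too_short_custom)
```

Under these assumed lengths, the default n_split_max=15 exceeds both 10-residue parts, while n_split_max=10 fits every part.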
For PeriodicPattern, the step size of each odd and even step can be specified using the steps_periodicpattern parameter (default=[3, 4]):
split_kws = sf.get_split_kws(split_types="PeriodicPattern", steps_periodicpattern=[5, 10])
split_kws
{'PeriodicPattern': {'steps': [5, 10]}}
And for Patterns, the step size, the minimum and maximum number of steps, and the maximum residue size of the pattern can be adjusted using the steps_pattern (default=[3, 4]), n_min (default=2), n_max (default=4), and len_max (default=15) parameters:
split_kws = sf.get_split_kws(split_types="Pattern", steps_pattern=[5, 10], n_min=3, n_max=5, len_max=30)
split_kws
{'Pattern': {'steps': [5, 10], 'n_min': 3, 'n_max': 5, 'len_max': 30}}
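The consistency rules listed in the Notes can also be mirrored on the caller side, for example to vet a parameter grid before passing it on. The following is a minimal sketch of those rules only; the function check_split_bounds is hypothetical, is not part of aaanalysis, and does not claim to reproduce the library's actual validation or messages.

```python
import warnings


def check_split_bounds(n_split_min=1, n_split_max=15, steps_pattern=(3, 4),
                       n_min=2, n_max=4, len_max=15):
    """Hypothetical caller-side check mirroring the rules stated in the Notes.

    Raises ValueError for inconsistent bounds, warns and returns False for the
    always-degenerate Pattern configuration, and returns True otherwise.
    """
    if n_split_min <= 0:
        raise ValueError("n_split_min should be > 0")
    if n_split_min > n_split_max:
        raise ValueError("n_split_min should be <= n_split_max")
    if n_min > n_max:
        raise ValueError("n_min should be <= n_max")
    if len_max <= min(steps_pattern):
        raise ValueError("len_max should be > min(steps_pattern)")
    # Degenerate case from the Notes: even the shortest repeat exceeds len_max
    if n_min * min(steps_pattern) > len_max:
        warnings.warn(
            "n_min * min(steps_pattern) > len_max: raise len_max or lower "
            "steps_pattern/n_min",
            UserWarning,
        )
        return False
    return True


# The documented defaults and the Pattern example above both pass the check
print(check_split_bounds())
print(check_split_bounds(steps_pattern=[5, 10], n_min=3, n_max=5, len_max=30))
```

Returning False for the warning case, rather than raising, is a choice made for this sketch so that a caller can filter out degenerate configurations without interrupting a larger sweep.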