SequencePreprocessor.get_sliding_aa_window
- static SequencePreprocessor.get_sliding_aa_window(seq, slide_start=0, slide_stop=None, window_size=5, index1=False, gap='-', accept_gap=True)
Extract sliding windows of amino acids from a sequence.
Slides a fixed-length window one position at a time from slide_start to slide_stop, collecting each window as a string. This is the multi-position counterpart of get_aa_window(): instead of one window at a fixed site, it returns a list covering every slide position in the requested range.
Added in version 0.1.0.
- Parameters:
seq (str) – The protein sequence from which to extract the windows.
slide_start (int, default=0) – The starting position (>=0) for sliding window extraction.
slide_stop (int, optional) – The ending position (>=1) for sliding window extraction. If None, all positions from slide_start to the end of the sequence are covered.
window_size (int, default=5) – The size of each window (>=1) to extract.
index1 (bool, default=False) – Whether position indexing starts at 1 (if True) or 0 (if False), so that the first amino acid is at position 1 or 0, respectively.
gap (str, default='-') – The character used to represent gaps.
accept_gap (bool, default=True) – Whether to accept gaps in the window. If True, C-terminal padding is enabled.
- Returns:
list_windows – A list of extracted windows of amino acids.
- Return type:
list
See also
get_aa_window(): extract a single fixed window at a specific position.
Examples
You can obtain multiple defined amino acid windows (each shifted by one residue position towards the C-terminus) from a protein sequence using the SequencePreprocessor().get_sliding_aa_window() method. We first create an example sequence and the SequencePreprocessor() object as follows:
import aaanalysis as aa
seq = "ABCDEFGHIJ"
sp = aa.SequencePreprocessor()
Provide the sequence as the seq parameter and specify a stop position using the slide_stop parameter:
# Get all 6 amino acid windows of size 5
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=0, slide_stop=9)
print(list_windows)
['ABCDE', 'BCDEF', 'CDEFG', 'DEFGH', 'EFGHI', 'FGHIJ']
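As a rough illustration of what this call computes, the plain-Python sketch below reproduces the output above with string slicing. It is not the library's implementation: the helper name windows_between is hypothetical, and it assumes that slide_stop is the 0-based position of the last residue covered by the final window, which is what the example output suggests.

```python
def windows_between(seq, start, stop, size):
    """Hypothetical helper (not part of aaanalysis).

    Returns every window of length `size` that lies entirely within
    seq[start..stop], using 0-based, inclusive positions.
    """
    # Assumption: the last window ends exactly at `stop`.
    n_windows = stop - start - size + 2
    return [seq[i:i + size] for i in range(start, start + n_windows)]


print(windows_between("ABCDEFGHIJ", start=0, stop=9, size=5))
```

Under this assumption, the number of windows is slide_stop - slide_start - window_size + 2, which gives 6 for the call above.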
You can change the start position (default=0) using the slide_start parameter:
# Get all 3 amino acid windows of size 5
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=3, slide_stop=9)
print(list_windows)
['DEFGH', 'EFGHI', 'FGHIJ']
You can adjust the window length using the window_size parameter:
# Get 2 amino acid windows of size 8 (starting from second residue)
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=1, window_size=8)
print(list_windows)
['BCDEFGHI', 'CDEFGHIJ']
If you wish to start counting residue positions from 1 instead of 0, set index1=True:
# Get 3 amino acid windows of size 8 (starting from first residue)
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=1, window_size=8, index1=True)
print(list_windows)
['ABCDEFGH', 'BCDEFGHI', 'CDEFGHIJ']
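The effect of index1 can be pictured as a shift of one position before slicing. The sketch below is only an illustration under that assumption; to_zero_based is a hypothetical helper, not an aaanalysis function.

```python
def to_zero_based(position, index1=False):
    """Hypothetical helper: convert a residue position to a 0-based Python index."""
    # Assumption: with index1=True the first residue is position 1, so subtract 1.
    return position - 1 if index1 else position


seq = "ABCDEFGHIJ"
start = to_zero_based(1, index1=True)  # position 1 is the first residue
print(seq[start:start + 8])  # ABCDEFGH
```

This matches the first window in the output above: with index1=True, slide_start=1 refers to residue A rather than B.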
Selecting windows that are too long can result in gaps (default='-'), which can be disabled by setting accept_gap=False (enabled by default). Sliding windows can contain gaps when slide_stop is greater than the sequence length:
# Get amino acid windows of size 10 (until residue position 12)
list_windows = sp.get_sliding_aa_window(seq=seq, slide_stop=11, window_size=10, accept_gap=True)
print(list_windows)
['ABCDEFGHIJ', 'BCDEFGHIJ-', 'CDEFGHIJ--']
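One way to picture the C-terminal padding is to extend the sequence with gap characters until the requested stop position exists, and then slice as usual. The following is a minimal sketch under that assumption; pad_c_terminal is a hypothetical helper, and the library may implement padding differently.

```python
def pad_c_terminal(seq, stop, gap="-"):
    """Hypothetical helper: append gap characters so that 0-based index `stop` exists."""
    n_missing = max(0, stop - len(seq) + 1)
    return seq + gap * n_missing


padded = pad_c_terminal("ABCDEFGHIJ", stop=11)
print(padded)  # ABCDEFGHIJ--

# Windows of size 10 starting at positions 0, 1, and 2 of the padded sequence
print([padded[i:i + 10] for i in range(3)])
```

The three windows printed here agree with the output above. How the method behaves when accept_gap=False and the range exceeds the sequence is not shown in this section, so the sketch does not cover that case.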