aaanalysis.SequencePreprocessor.get_sliding_aa_window
- static SequencePreprocessor.get_sliding_aa_window(seq=None, slide_start=0, slide_stop=None, window_size=5, index1=False, gap='-', accept_gap=True)
Extract sliding windows of amino acids from a sequence.
- Parameters:
seq (str) – The protein sequence from which to extract the windows.
slide_start (int, default=0) – The starting position (>=0) for sliding window extraction.
slide_stop (int, optional) – The ending position (>=1) for sliding window extraction, i.e., the position of the last residue of the final window. If None, all windows up to the end of the sequence are extracted.
window_size (int, default=5) – The size of each window (>=1) to extract.
index1 (bool, default=False) – Whether the position index starts at 1 (if True) or 0 (if False), i.e., whether the first amino acid is at position 1 or 0, respectively.
gap (str, default='-') – The character used to represent gaps.
accept_gap (bool, default=True) – Whether to accept gaps in the window. If True, C-terminal padding with the gap character is enabled.
- Returns:
list_windows – A list of extracted windows of amino acids.
- Return type:
list
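The parameter descriptions can be condensed into a short plain-Python sketch. It does not import aaanalysis and is not the library's implementation: the function name sliding_windows is hypothetical, and its reading of slide_stop (inclusive position of the last residue of the final window) and of slide_stop=None (stop at the last residue) is inferred from the examples below. What accept_gap=False does when padding would be needed is not shown here, so the sketch simply raises a ValueError as an assumption.

```python
def sliding_windows(seq, slide_start=0, slide_stop=None, window_size=5,
                    index1=False, gap="-", accept_gap=True):
    """Hypothetical stand-in for get_sliding_aa_window, inferred from the examples."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    offset = 1 if index1 else 0
    # Convert user-facing positions to 0-based Python indices
    start = slide_start - offset
    # Assumption: None means the last window ends at the final residue
    stop = len(seq) - 1 if slide_stop is None else slide_stop - offset
    if start < 0:
        raise ValueError("slide_start lies before the first residue")
    # slide_stop is treated as the inclusive end of the last window
    n_windows = stop - start - window_size + 2
    n_pad = max(0, stop + 1 - len(seq))
    if n_pad and not accept_gap:
        # Assumption: the documented method's behavior in this case is not shown
        raise ValueError("windows would extend past the C-terminus")
    padded = seq + gap * n_pad  # C-terminal padding with the gap character
    return [padded[i:i + window_size] for i in range(start, start + n_windows)]


print(sliding_windows("ABCDEFGHIJ", slide_stop=11, window_size=10))
# → ['ABCDEFGHIJ', 'BCDEFGHIJ-', 'CDEFGHIJ--']
```

Because it reproduces the outputs listed in the Examples, it can serve as a rough mental model of how the positional parameters interact.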
Examples
You can obtain multiple amino acid windows of a defined size, each shifted by one residue position towards the C-terminus, from a protein sequence using the SequencePreprocessor().get_sliding_aa_window() method. We first create an example sequence and the SequencePreprocessor() object as follows:
    import aaanalysis as aa
    seq = "ABCDEFGHIJ"
    sp = aa.SequencePreprocessor()
Provide the sequence via the seq parameter and specify a stop position using the slide_stop parameter:
    # Get all 6 amino acid windows of size 5
    list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=0, slide_stop=9)
    print(list_windows)
['ABCDE', 'BCDEF', 'CDEFG', 'DEFGH', 'EFGHI', 'FGHIJ']
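Judging from this output, slide_stop marks the position of the last residue in the final window, so windows start at every position from slide_start up to slide_stop - window_size + 1. The plain-Python sketch below does not call aaanalysis (the variable names are illustrative) and reproduces the result under that assumption:

```python
seq = "ABCDEFGHIJ"
slide_start, slide_stop, window_size = 0, 9, 5
# Window starts run from slide_start to slide_stop - window_size + 1 (inclusive),
# so the last window ends exactly at slide_stop
starts = range(slide_start, slide_stop - window_size + 2)
windows = [seq[i:i + window_size] for i in starts]
print(len(windows), windows)
# → 6 ['ABCDE', 'BCDEF', 'CDEFG', 'DEFGH', 'EFGHI', 'FGHIJ']
```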
You can change the start position (default=0) using the slide_start parameter:
    # Get all 3 amino acid windows of size 5
    list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=3, slide_stop=9)
    print(list_windows)
['DEFGH', 'EFGHI', 'FGHIJ']
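Comparing the two outputs suggests that raising slide_start only removes windows from the N-terminal side, while each remaining window still advances by one residue. A plain-Python check of this reading (windows_from is a hypothetical helper, not part of aaanalysis):

```python
seq = "ABCDEFGHIJ"
window_size, slide_stop = 5, 9


def windows_from(slide_start):
    # Hypothetical helper: all windows whose last residue is at most slide_stop
    return [seq[i:i + window_size] for i in range(slide_start, slide_stop - window_size + 2)]


full = windows_from(0)
shifted = windows_from(3)
# Moving slide_start from 0 to 3 drops the first three windows
print(shifted == full[3:])  # → True
# Consecutive windows overlap by window_size - 1 residues
print(all(a[1:] == b[:-1] for a, b in zip(shifted, shifted[1:])))  # → True
```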
You can adjust the window length using the window_size parameter:
    # Get 2 amino acid windows of size 8 (starting from second residue)
    list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=1, window_size=8)
    print(list_windows)
['BCDEFGHI', 'CDEFGHIJ']
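Here slide_stop is omitted, so extraction runs to the C-terminus. Assuming that slide_stop=None behaves like a stop at the last residue (position len(seq) - 1 with 0-based indexing), the number of windows follows directly, as this plain-Python sketch illustrates:

```python
seq = "ABCDEFGHIJ"
slide_start, window_size = 1, 8
# Assumption: slide_stop=None is equivalent to the last residue (0-based)
slide_stop = len(seq) - 1
# Size of the inclusive range of start positions = number of windows
n_windows = slide_stop - slide_start - window_size + 2
windows = [seq[i:i + window_size] for i in range(slide_start, slide_start + n_windows)]
print(n_windows, windows)  # → 2 ['BCDEFGHI', 'CDEFGHIJ']
```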
If you wish to start counting residue positions from 1 instead of 0, set index1=True:
    # Get 3 amino acid windows of size 8 (starting from first residue)
    list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=1, window_size=8, index1=True)
    print(list_windows)
['ABCDEFGH', 'BCDEFGHI', 'CDEFGHIJ']
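With index1=True, the same slide_start=1 refers to the first residue instead of the second, which is why one extra window appears. A minimal sketch of the position conversion; to_zero_based is a hypothetical helper, not part of aaanalysis:

```python
seq = "ABCDEFGHIJ"


def to_zero_based(position, index1=False):
    """Hypothetical helper: map a user-facing position to a Python index."""
    return position - 1 if index1 else position


for index1 in (False, True):
    start = to_zero_based(1, index1)
    # First window of size 8 for slide_start=1 under each convention
    print(index1, start, seq[start:start + 8])
# → False 1 BCDEFGHI
# → True 0 ABCDEFGH
```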
Selecting windows that are too long can result in gaps (gap='-' by default). Gaps can be disabled by setting accept_gap=False (they are accepted by default). Sliding windows contain gaps when slide_stop points beyond the last residue of the sequence:
    # Get 3 amino acid windows of size 10 (up to position 11, two positions past the last residue)
    list_windows = sp.get_sliding_aa_window(seq=seq, slide_stop=11, window_size=10, accept_gap=True)
    print(list_windows)
['ABCDEFGHIJ', 'BCDEFGHIJ-', 'CDEFGHIJ--']
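The padded windows are consistent with appending gap characters to the C-terminus until position slide_stop exists. A plain-Python sketch of that assumed padding step (an illustration, not the library's code):

```python
seq = "ABCDEFGHIJ"
gap, window_size, slide_stop = "-", 10, 11
# Number of gap characters needed so that index slide_stop exists
n_pad = max(0, slide_stop + 1 - len(seq))
padded = seq + gap * n_pad
windows = [padded[i:i + window_size] for i in range(0, slide_stop - window_size + 2)]
print(n_pad, windows)  # → 2 ['ABCDEFGHIJ', 'BCDEFGHIJ-', 'CDEFGHIJ--']
```

Under this reading, slide_stop=10 would already add one gap, since position 10 lies just past the last residue (position 9).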