aaanalysis.SequencePreprocessor.get_sliding_aa_window

static SequencePreprocessor.get_sliding_aa_window(seq=None, slide_start=0, slide_stop=None, window_size=5, index1=False, gap='-', accept_gap=True)

Extract sliding windows of amino acids from a sequence.

Parameters:
  • seq (str) – The protein sequence from which to extract the windows.

  • slide_start (int, default=0) – The starting position (>=0) for sliding window extraction.

  • slide_stop (int, optional) – The ending position (>=1) for sliding window extraction. If None, extract all possible windows.

  • window_size (int, default=5) – The size of each window (>=1) to extract.

  • index1 (bool, default=False) – Whether positions are counted from 1 (if True) or from 0 (if False), i.e., whether the first amino acid is at position 1 or 0.

  • gap (str, default='-') – The character used to represent gaps.

  • accept_gap (bool, default=True) – Whether to accept gaps in the window. If True, C-terminal padding is enabled.

Returns:

list_windows – A list of extracted windows of amino acids.

Return type:

list of str

Examples

You can obtain multiple defined amino acid windows (each shifted by 1 residue position towards the C-terminus) from a protein sequence using the SequencePreprocessor().get_sliding_aa_window() method. We first create an example sequence and the SequencePreprocessor() object as follows:

import aaanalysis as aa

seq = "ABCDEFGHIJ"
sp = aa.SequencePreprocessor()

Provide the sequence as the seq parameter and specify a stop position using the slide_stop parameter:

# Get all 6 amino acid windows of size 5
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=0, slide_stop=9)
print(list_windows)
['ABCDE', 'BCDEF', 'CDEFG', 'DEFGH', 'EFGHI', 'FGHIJ']
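The window count in this example follows from the positions: if slide_stop is read as the 0-based, inclusive end of the last window (an interpretation inferred from the outputs on this page, not taken from the aaanalysis source), then slide_stop - slide_start - window_size + 2 windows are returned. A minimal pure-Python sketch under that assumption, where sliding_windows is a hypothetical helper and not part of the library:

```python
def sliding_windows(seq, slide_start=0, slide_stop=None, window_size=5):
    """Hypothetical helper: 0-based positions, inclusive slide_stop, no gap handling."""
    if slide_stop is None:
        slide_stop = len(seq) - 1  # assumed default: position of the last residue
    # The last window has to end at slide_stop, so it starts here:
    last_start = slide_stop - window_size + 1
    return [seq[i:i + window_size] for i in range(slide_start, last_start + 1)]


seq = "ABCDEFGHIJ"
windows = sliding_windows(seq, slide_start=0, slide_stop=9)
n_expected = 9 - 0 - 5 + 2  # slide_stop - slide_start - window_size + 2
print(len(windows) == n_expected, windows)
```

With these inputs the sketch yields the same six windows as the call above.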

You can change the start position (default=0) using the slide_start parameter:

# Get all 3 amino acid windows of size 5
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=3, slide_stop=9)
print(list_windows)
['DEFGH', 'EFGHI', 'FGHIJ']

You can adjust the window length using the window_size parameter:

# Get 2 amino acid windows of size 8 (starting from second residue)
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=1, window_size=8)
print(list_windows)
['BCDEFGHI', 'CDEFGHIJ']

If you wish to start counting residue positions from 1 instead of 0, set index1=True:

# Get 3 amino acid windows of size 8 (starting from first residue)
list_windows = sp.get_sliding_aa_window(seq=seq, slide_start=1, window_size=8, index1=True)
print(list_windows)
['ABCDEFGH', 'BCDEFGHI', 'CDEFGHIJ']
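One way to think about index1=True is as a shift of both positions by one before ordinary 0-based slicing. The sketch below illustrates that reading; to_zero_based is a hypothetical helper, and the assumption that an omitted slide_stop defaults to the last residue (position len(seq) when counting from 1) is inferred from the output above rather than from the library code:

```python
def to_zero_based(slide_start, slide_stop, index1):
    """Hypothetical helper: map user-facing positions to 0-based Python indices."""
    if index1:
        return slide_start - 1, slide_stop - 1
    return slide_start, slide_stop


seq = "ABCDEFGHIJ"
window_size = 8
# With index1=True the first residue is position 1 and the last is len(seq)
start, stop = to_zero_based(slide_start=1, slide_stop=len(seq), index1=True)
# stop is treated as the inclusive end of the last window
windows = [seq[i:i + window_size] for i in range(start, stop - window_size + 2)]
print(windows)
```

Under these assumptions the sketch gives the same three windows as the index1=True call.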

Windows that extend beyond the end of the sequence are padded with gaps (default='-'). This padding is enabled by default and can be disabled by setting accept_gap=False. Sliding windows contain gaps when slide_stop lies beyond the end of the sequence:

# Get 3 amino acid windows of size 10 (up to 0-based position 11, i.e., the 12th residue position)
list_windows = sp.get_sliding_aa_window(seq=seq, slide_stop=11, window_size=10, accept_gap=True)
print(list_windows)
['ABCDEFGHIJ', 'BCDEFGHIJ-', 'CDEFGHIJ--']
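The C-terminal padding shown here can be imitated in plain Python by slicing past the end of the string and right-padding each slice with the gap character. This is only an illustrative sketch: pad_windows is a hypothetical helper, positions are assumed to be 0-based with an inclusive slide_stop, and the library's actual behavior for accept_gap=False is not modeled.

```python
def pad_windows(seq, slide_start, slide_stop, window_size, gap="-"):
    """Hypothetical helper: 0-based positions, inclusive slide_stop."""
    windows = []
    for i in range(slide_start, slide_stop - window_size + 2):
        window = seq[i:i + window_size]  # slicing past the end just returns a shorter string
        windows.append(window.ljust(window_size, gap))  # pad on the right (C-terminal side)
    return windows


padded = pad_windows("ABCDEFGHIJ", slide_start=0, slide_stop=11, window_size=10)
print(padded)
```

Note that str.ljust requires a single-character fill, which matches the default gap of '-'.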