aaanalysis.AAlogo.get_conservation

static AAlogo.get_conservation(df_logo_info=None, value_type='mean')

Summarize per-position information content into a single conservation score.

Aggregates the per-position information content from AAlogo.get_df_logo_info() into a single scalar value representing overall sequence conservation.

Parameters:
  • df_logo_info (pd.Series, shape (n_positions,)) – Per-position information content with index name ‘pos’, as returned by AAlogo.get_df_logo_info().

  • value_type ({'min', 'mean', 'median', 'max'}, default='mean') –

    Aggregation method:

    • min: Minimum conservation across all positions.

    • mean: Average conservation across all positions.

    • median: Median conservation across all positions.

    • max: Maximum conservation at any single position.

Returns:

cons_val – Conservation score in bits, ranging from 0 (no conservation) to log2(20) ≈ 4.322 (fully conserved).

Return type:

float

Notes

  • The maximum theoretical information content per position is log2(20) ≈ 4.322 bits, corresponding to a completely conserved amino acid.

  • Use value_type='mean' for an overall conservation estimate and value_type='max' to obtain the score of the most conserved position.
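
To make the upper bound concrete, the following is a minimal sketch of the textbook sequence-logo definition of information content: log2(20) minus the Shannon entropy of the amino acid frequencies observed at one position. It assumes this standard formula without any small-sample correction, so the values computed by AAlogo.get_df_logo_info() may differ in detail. The helper name information_content is hypothetical and not part of aaanalysis.

```python
import math
from collections import Counter

N_AMINO_ACIDS = 20  # size of the canonical amino acid alphabet


def information_content(column):
    """Return log2(20) minus the Shannon entropy of one alignment column, in bits.

    Hypothetical helper for illustration; not part of aaanalysis.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(N_AMINO_ACIDS) - entropy


# A fully conserved column reaches the theoretical maximum
print(f"{information_content('LLLLLLLL'):.3f}")  # 4.322
# A column with all 20 amino acids at equal frequency carries (numerically) zero information
print(abs(information_content('ACDEFGHIKLMNPQRSTVWY')) < 1e-9)
```

Any column between these two extremes yields a value strictly between 0 and log2(20) under this definition.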

Examples

The AAlogo.get_conservation() static method aggregates the per-position information content from AAlogo.get_df_logo_info() into a single float using min, mean, median, or max. It takes a pd.Series with index name 'pos' as input.

import warnings
warnings.filterwarnings('ignore')
import aaanalysis as aa

sf = aa.SequenceFeature()
df_seq = aa.load_dataset(name="DOM_GSEC", n=100)
labels = df_seq["label"].values
df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=["jmd_n", "tmd", "jmd_c"])

aalogo = aa.AAlogo()
df_logo_info = aalogo.get_df_logo_info(df_parts=df_parts)
df_logo_info.head()
pos
0    0.250782
1    0.297012
2    0.358037
3    0.322563
4    0.383782
dtype: float64

The df_logo_info parameter takes the pd.Series returned by get_df_logo_info, with index name 'pos'. The method simply aggregates the values, so it is equivalent to calling df_logo_info.mean() (or min, median, max) directly.

cons = aa.AAlogo.get_conservation(df_logo_info=df_logo_info)
print(f"Conservation score:        {cons:.6f} bits")
print(f"df_logo_info.mean():       {df_logo_info.mean():.6f} bits")
print(f"Are equal: {cons == df_logo_info.mean()}")
Conservation score:        0.800787 bits
df_logo_info.mean():       0.800787 bits
Are equal: True

The value_type parameter controls which aggregation is applied to df_logo_info:

  • 'mean' (default): df_logo_info.mean(), the overall average conservation

  • 'min': df_logo_info.min(), the least conserved position

  • 'median': df_logo_info.median(), the median position conservation

  • 'max': df_logo_info.max(), the most conserved position

for value_type in ["min", "mean", "median", "max"]:
    score = aa.AAlogo.get_conservation(df_logo_info=df_logo_info, value_type=value_type)
    # Confirm it matches direct pandas call
    expected = getattr(df_logo_info, value_type)()
    print(f"{value_type:>8}: {score:.4f} bits  (== df_logo_info.{value_type}(): {score == expected})")
   min: 0.2508 bits  (== df_logo_info.min(): True)
  mean: 0.8008 bits  (== df_logo_info.mean(): True)
median: 0.8627 bits  (== df_logo_info.median(): True)
   max: 1.5553 bits  (== df_logo_info.max(): True)
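
The mapping from value_type to a pandas aggregation can be sketched as a small stand-alone function. This is only an illustration of the documented behaviour, not the library's implementation: the function name aggregate_conservation, the constant VALID_VALUE_TYPES, the error message, and the toy values are all assumptions, and how get_conservation itself validates its input is not shown here.

```python
import pandas as pd

VALID_VALUE_TYPES = ("min", "mean", "median", "max")


def aggregate_conservation(df_logo_info, value_type="mean"):
    """Hypothetical stand-in mirroring the documented behaviour of get_conservation."""
    if value_type not in VALID_VALUE_TYPES:
        raise ValueError(f"'value_type' should be one of {VALID_VALUE_TYPES}")
    # Dispatch to the pandas Series method of the same name
    return float(getattr(df_logo_info, value_type)())


# Made-up information content values, for illustration only
toy = pd.Series([0.2, 0.4, 0.9, 1.5], index=pd.RangeIndex(4, name="pos"))
for vt in VALID_VALUE_TYPES:
    print(f"{vt:>8}: {aggregate_conservation(toy, value_type=vt):.4f} bits")
```

Restricting value_type to a fixed tuple keeps the getattr dispatch from reaching unrelated Series methods.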

get_conservation is most useful for reducing get_df_logo_info output to a single comparable value per group or dataset.

df_info_pos = aalogo.get_df_logo_info(df_parts=df_parts, labels=labels, label_test=1)
df_info_neg = aalogo.get_df_logo_info(df_parts=df_parts, labels=labels, label_test=0)

print(f"{'':12} {'min':>8} {'mean':>8} {'median':>8} {'max':>8}")
print("-" * 48)
for group, df_info in [("All", df_logo_info), ("Positive", df_info_pos), ("Negative", df_info_neg)]:
    scores = {vt: aa.AAlogo.get_conservation(df_logo_info=df_info, value_type=vt)
              for vt in ["min", "mean", "median", "max"]}
    print(f"{group:<12} {scores['min']:>8.4f} {scores['mean']:>8.4f} "
          f"{scores['median']:>8.4f} {scores['max']:>8.4f}")
                  min     mean   median      max
------------------------------------------------
All            0.2508   0.8008   0.8627   1.5553
Positive       0.4080   1.1192   1.1839   1.8965
Negative       0.1741   0.7748   0.7218   1.5067
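
Because each score is equivalent to the matching pandas aggregation, a group comparison like the one above can also be collected into a single table with Series.agg. The sketch below uses made-up per-position values rather than real get_df_logo_info output; the names groups, summary, and delta_mean are hypothetical.

```python
import pandas as pd

# Made-up per-position values for two groups, for illustration only
groups = {
    "Positive": pd.Series([0.5, 1.0, 1.5]),
    "Negative": pd.Series([0.25, 0.5, 0.75]),
}

# One row per group, one column per aggregation
summary = pd.DataFrame(
    {name: s.agg(["min", "mean", "median", "max"]) for name, s in groups.items()}
).T

# Difference in mean conservation between the two groups
delta_mean = summary.loc["Positive", "mean"] - summary.loc["Negative", "mean"]
print(summary.round(4))
print(f"Positive - Negative (mean): {delta_mean:.4f}")
```

A table of this shape makes it straightforward to sort or filter groups by any of the four scores.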