aaanalysis.comp_auc_adjusted
- aaanalysis.comp_auc_adjusted(X=None, labels=None, label_test=1, label_ref=0, n_jobs=None)
Compute an adjusted Area Under the Curve (AUC), ranging over [-0.5, 0.5], to assess the similarity between two groups.
Introduced in [Breimann25a], this adjusted AUC (denoted ‘AUC*’) is computed for each feature in the dataset X, comparing the two groups specified by labels. It is a non-parametric measure of the difference between two groups. The adjustment subtracts 0.5 from the AUC, so AUC* ranges from -0.5 to 0.5. An AUC* of 0 indicates that the feature values are equally distributed in both groups; positive values indicate that test-group values tend to be higher than reference-group values, and negative values that they tend to be lower. This measure is useful for ranking features by their ability to distinguish between the two groups.
Added in version 1.0.0.
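The definition above can be sketched for a single feature in plain Python. This is an illustration only, not the library's implementation: it assumes AUC* equals the ordinary ROC AUC minus 0.5, and it uses the pairwise formulation of the AUC (the fraction of test/reference pairs in which the test value is larger, with ties counted as half). The function name `auc_adjusted_sketch` is hypothetical.

```python
def auc_adjusted_sketch(values, labels, label_test=1, label_ref=0):
    """Adjusted AUC for one feature: P(test > ref) + 0.5 * P(tie) - 0.5.

    Hypothetical helper for illustration; quadratic in the number of samples.
    """
    test = [v for v, lab in zip(values, labels) if lab == label_test]
    ref = [v for v, lab in zip(values, labels) if lab == label_ref]
    if not test or not ref:
        raise ValueError("Both groups must contain at least one sample.")
    # Count pairs where the test value wins; a tie counts as half a win
    wins = sum((t > r) + 0.5 * (t == r) for t in test for r in ref)
    auc = wins / (len(test) * len(ref))
    return auc - 0.5


# All test values are greater than all reference values
print(auc_adjusted_sketch([1, 2, 3, 4], [0, 0, 1, 1]))  # 0.5
# All test values are smaller than all reference values
print(auc_adjusted_sketch([3, 4, 1, 2], [0, 0, 1, 1]))  # -0.5
```

Under this assumption, swapping the two groups flips the sign of the result, which matches the symmetric range of [-0.5, 0.5].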
- Parameters:
X (array-like, shape (n_samples, n_features)) – Feature matrix. ‘Rows’ typically correspond to proteins and ‘columns’ to features.
labels (array-like, shape (n_samples,)) – Dataset labels of samples in X. Should contain only two different integer label values, representing test and reference group (typically, 1 and 0).
label_test (int, default=1) – Class label of test group in labels.
label_ref (int, default=0) – Class label of reference group in labels.
n_jobs (int, None, or -1, default=None) – Number of CPU cores (>=1) used for multiprocessing. If None, the number is optimized automatically. If -1, the number is set to all available cores. Overridden by options['n_jobs'] when set.
- Returns:
auc – Array with one AUC* value per feature, each in the range [-0.5, 0.5]. A value of 0 indicates equal distributions between the two groups for that feature.
- Return type:
array-like, shape (n_features,)
Examples
You can compare the similarity of two distributions (here two normal distributions, group_test and group_ref) using the adjusted Area Under the Curve (AUC*) measure, which ranges from -0.5 to 0.5, as introduced in [Breimann25a]. Provide only the feature matrix X and its respective group labels to the comp_auc_adjusted function:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    import aaanalysis as aa

    # Generate random data for two groups
    group_test = np.random.normal(-2, 0.5, 1000)  # Mean = -2, Std = 0.5, 1000 samples
    group_ref = np.random.normal(2, 0.5, 1000)  # Mean = 2, Std = 0.5, 1000 samples

    # Combine data into a single dataset and reshape it
    X = np.hstack([group_test, group_ref]).reshape(-1, 1)  # Reshape to 2D array
    labels = np.array([1]*1000 + [0]*1000)
    auc_score = aa.comp_auc_adjusted(X=X, labels=labels)[0]

    # Plot
    aa.plot_settings()
    sns.histplot(group_test, color="tab:red", kde=True, label='Test group', alpha=0.5)
    sns.histplot(group_ref, color="tab:gray", kde=True, label='Reference group', alpha=0.5)
    plt.title(f"AUC* = {auc_score} (All test samples are smaller)")
    aa.plot_legend(dict_color=dict(Test="tab:red", Ref="tab:gray"), ncol=1, x=0.85, y=1)
    sns.despine()
    plt.show()

The greater the overlap between both distributions, the closer the auc_score is to 0:

    group_test = np.random.normal(-0.5, 0.5, 1000)
    group_ref = np.random.normal(0.5, 0.5, 1000)
    X = np.hstack([group_test, group_ref]).reshape(-1, 1)  # Reshape to 2D array
    labels = np.array([1]*1000 + [0]*1000)
    auc_score = aa.comp_auc_adjusted(X, labels)[0]

    # Plot
    aa.plot_settings()
    sns.histplot(group_test, color="tab:red", kde=True, label='Test group', alpha=0.5)
    sns.histplot(group_ref, color="tab:gray", kde=True, label='Reference group', alpha=0.5)
    plt.title(f"AUC* = {auc_score} (Most test samples are smaller)")
    sns.despine()
    plt.show()

An auc_score of approximately 0 indicates that both distributions overlap almost completely:

    group_test = np.random.normal(0, 0.5, 1000)
    group_ref = np.random.normal(0, 0.5, 1000)
    X = np.hstack([group_test, group_ref]).reshape(-1, 1)  # Reshape to 2D array
    labels = np.array([1]*1000 + [0]*1000)
    auc_score = aa.comp_auc_adjusted(X, labels)[0]

    # Plot
    aa.plot_settings()
    sns.histplot(group_test, color="tab:red", kde=True, label='Test group', alpha=0.5)
    sns.histplot(group_ref, color="tab:gray", kde=True, label='Reference group', alpha=0.5)
    plt.title(f"AUC* = {auc_score} (Distributions are almost identical)")
    sns.despine()
    plt.show()

If all values from the test group (the samples labeled with label_test, 1 by default) are greater than the values of the reference group, the auc_score is 0.5:

    group_test = np.random.normal(2, 0.5, 1000)
    group_ref = np.random.normal(-2, 0.5, 1000)
    X = np.hstack([group_test, group_ref]).reshape(-1, 1)  # Reshape to 2D array
    labels = np.array([1]*1000 + [0]*1000)
    auc_score = aa.comp_auc_adjusted(X, labels)[0]

    # Plot
    aa.plot_settings()
    sns.histplot(group_test, color="tab:red", kde=True, label='Test group', alpha=0.5)
    sns.histplot(group_ref, color="tab:gray", kde=True, label='Reference group', alpha=0.5)
    plt.title(f"AUC* = {auc_score} (All test samples are greater)")
    sns.despine()
    plt.show()
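
Because AUC* is signed, ranking features by how well they separate the two groups typically means sorting by absolute value, so that strongly negative and strongly positive features both rank high. The sketch below shows one way to do this in plain Python. It is an assumption about usage rather than something the library prescribes; `rank_features_by_auc` is a hypothetical helper, and the AUC* values are made up for illustration rather than computed.

```python
def rank_features_by_auc(auc_values):
    """Return feature indices ordered from most to least discriminative.

    Hypothetical helper: discriminative power is taken to be |AUC*|.
    """
    return sorted(
        range(len(auc_values)),
        key=lambda i: abs(auc_values[i]),
        reverse=True,
    )


# Made-up AUC* values for four features (not real results)
auc_values = [0.02, -0.45, 0.31, -0.10]
ranking = rank_features_by_auc(auc_values)
print(ranking)  # [1, 2, 3, 0]
```

With a NumPy array, as returned by comp_auc_adjusted, `np.argsort(-np.abs(auc))` produces a comparable ordering.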