In this tutorial, we explore how to use the SHAP-IQ package to uncover and visualize feature interactions in machine learning models using Shapley Interaction Indices (SII), building on the foundation of traditional Shapley values.
Shapley values are great for explaining individual feature contributions in AI models, but they fail to capture feature interactions. Shapley interactions go a step further by separating individual effects from interactions, offering deeper insights, such as how longitude and latitude jointly influence house prices. In this tutorial, we'll get started with the shapiq package to compute and explore these Shapley interactions for any model. Check out the Full Codes here.
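To make the difference between individual effects and interactions concrete, here is a minimal sketch that computes exact Shapley values and the pairwise Shapley interaction index for a tiny hand-written game. The three players and the `value` function are hypothetical and chosen only for illustration; the sketch does not use shapiq and is not how the package computes its results internally.

```python
from itertools import combinations
from math import factorial

PLAYERS = (0, 1, 2)  # three hypothetical features


def value(coalition):
    """Toy payoff: additive effects plus a synergy between players 0 and 1."""
    s = set(coalition)
    total = 0.0
    if 0 in s:
        total += 1.0
    if 1 in s:
        total += 2.0
    if 2 in s:
        total += 0.5
    if 0 in s and 1 in s:
        total += 4.0  # pure interaction effect
    return total


def subsets(items):
    """Yield every subset of `items` as a tuple."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def shapley_value(i):
    """Weighted average marginal contribution of player i."""
    n = len(PLAYERS)
    others = [p for p in PLAYERS if p != i]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * (value(s + (i,)) - value(s))
    return total


def shapley_interaction(i, j):
    """Pairwise Shapley interaction index of players i and j."""
    n = len(PLAYERS)
    others = [p for p in PLAYERS if p not in (i, j)]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 2) / factorial(n - 1)
        # Joint effect minus the two individual effects, given coalition s
        delta = value(s + (i, j)) - value(s + (i,)) - value(s + (j,)) + value(s)
        total += weight * delta
    return total


for p in PLAYERS:
    print(f"Shapley value of player {p}: {round(shapley_value(p), 6)}")
for i, j in combinations(PLAYERS, 2):
    print(f"Interaction of players {i} and {j}: {round(shapley_interaction(i, j), 6)}")
```

In this toy game the Shapley values split the synergy of 4 evenly between players 0 and 1, which hides the fact that it only appears when both are present. The interaction index isolates it as a separate term.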
Installing the dependencies
!pip install shapiq overrides scikit-learn pandas numpy
Data Loading and Pre-processing
In this tutorial, we'll use the Bike Sharing dataset from OpenML. After loading the data, we'll split it into training and test sets to prepare it for model training and evaluation.
import shapiq
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np
# Load data
X, y = shapiq.load_bike_sharing(to_numpy=True)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Model Training and Performance Evaluation
# Train model
model = RandomForestRegressor()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"R² Score: {r2:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
Setting up an Explainer
We set up a TabularExplainer using the shapiq package to compute Shapley interaction values based on the k-SII (k-order Shapley Interaction Index) method. By specifying max_order=4, we allow the explainer to consider interactions of up to 4 features at a time, enabling deeper insights into how groups of features jointly impact model predictions.
# set up an explainer with k-SII interaction values up to order 4
explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=4
)
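To get a feel for what max_order=4 implies, the following sketch counts how many distinct feature subsets exist at each interaction order. The feature count of 12 is an assumption made for illustration; substitute the number of columns in your own data.

```python
from math import comb


def count_interactions(n_features, max_order):
    """Number of distinct feature subsets at each order from 1 to max_order."""
    return {order: comb(n_features, order) for order in range(1, max_order + 1)}


# n_features=12 is an assumed value for illustration only
counts = count_interactions(n_features=12, max_order=4)
for order, count in counts.items():
    print(f"order {order}: {count} subsets")
print(f"total: {sum(counts.values())}")
```

The count grows quickly with the order, which is why higher values of max_order make explanations both richer and harder to read.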
Explaining a Local Instance
We select a specific test instance (index 100) to generate local explanations. The code prints the true and predicted values for this instance, followed by a breakdown of its feature values. This helps us understand the exact inputs passed to the model and sets the context for interpreting the Shapley interaction explanations that follow.
# get the feature names from the DataFrame version of the dataset
X_df, _ = shapiq.load_bike_sharing(to_numpy=False)
feature_names = list(X_df.columns)
n_features = len(feature_names)
# select a local instance to be explained
instance_id = 100
x_explain = X_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")
for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")
Analyzing Interaction Values
We use the explainer.explain() method to compute Shapley interaction values for a specific data instance (X[100]) with a budget of 256 model evaluations. Note that X[100] indexes the full dataset, so it is not necessarily the same row as the test instance X_test[100] printed earlier. The call returns an InteractionValues object, which captures how individual features and their combinations influence the model's output. Because the explainer was created with max_order=4, it considers interactions involving up to 4 features.
interaction_values = explainer.explain(X[100], budget=256)
# analyze interaction values
print(interaction_values)
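The budget matters because the number of possible feature coalitions grows exponentially with the number of features. The sketch below shows what share of that space a given budget can cover. It assumes each unit of budget corresponds to one evaluated coalition, and the feature count of 12 is again an assumed value for illustration.

```python
def coalition_coverage(n_features, budget):
    """Return the total number of coalitions and the fraction a budget can cover."""
    total = 2 ** n_features
    evaluated = min(budget, total)
    return total, evaluated / total


# n_features=12 is an assumed value for illustration only
total, fraction = coalition_coverage(n_features=12, budget=256)
print(f"{total} possible coalitions, budget covers {fraction:.2%}")
```

With a budget well below the total, the interaction values are approximations, so a larger budget generally trades computation time for accuracy.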
First-Order Interaction Values
To keep things simple, we compute first-order interaction values, i.e., standard Shapley values that capture only individual feature contributions (no interactions).
By setting max_order=1 in the TreeExplainer, we are saying:
"Tell me how much each feature individually contributes to the prediction, without considering any interaction effects."
These values are the standard Shapley values. For each feature, the Shapley value is its average marginal contribution to the prediction across all possible orderings in which features can be added.
# get the feature names from the DataFrame version of the dataset
feature_names = list(shapiq.load_bike_sharing(to_numpy=False)[0].columns)
explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
si_order = explainer.explain(x=x_explain)
si_order
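The "average marginal contribution across all orderings" definition can be checked by brute force on a small example. The sketch below does this for a hypothetical three-feature value function; the feature names and payoffs are made up for illustration, and this is not the algorithm TreeExplainer uses.

```python
from itertools import permutations

FEATURES = ("temp", "hour", "humidity")  # hypothetical feature names


def toy_model_value(coalition):
    """Made-up model output when only the features in `coalition` are known."""
    s = frozenset(coalition)
    value = 0.0
    if "temp" in s:
        value += 10.0
    if "hour" in s:
        value += 20.0
    if "humidity" in s:
        value -= 4.0
    if "temp" in s and "hour" in s:
        value += 6.0  # synergy between temp and hour
    return value


def shapley_by_permutations():
    """Average each feature's marginal contribution over every ordering."""
    totals = {feature: 0.0 for feature in FEATURES}
    orderings = list(permutations(FEATURES))
    for ordering in orderings:
        included = []
        for feature in ordering:
            before = toy_model_value(included)
            included.append(feature)
            after = toy_model_value(included)
            totals[feature] += after - before
    return {feature: totals[feature] / len(orderings) for feature in FEATURES}


shapley = shapley_by_permutations()
for feature, contribution in shapley.items():
    print(f"{feature}: {round(contribution, 6)}")
```

Enumerating every ordering is only feasible for a handful of features, which is why dedicated explainers exist for real models.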
Plotting a Waterfall Chart
A waterfall chart visually breaks down a model's prediction into individual feature contributions. It starts from the baseline prediction and adds or subtracts each feature's Shapley value to reach the final predicted output.
In our case, we'll use the output of TreeExplainer with max_order=1 (i.e., individual contributions only) to visualize the contribution of each feature.
si_order.plot_waterfall(feature_names=feature_names, show=True)
In our case, the baseline value (i.e., the model's expected output without any feature information) is 190.717.
As we add the contributions from individual features (order-1 Shapley values), we can observe how each one pushes the prediction up or pulls it down:
- Features like Weather and Humidity have a positive contribution, increasing the prediction above the baseline.
- Features like Temperature and Year have a strong negative impact, contributing −35.4 and −45, respectively.
Overall, the waterfall chart helps us understand which features are driving the prediction, and in which direction, providing valuable insight into the model's decision-making.
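The arithmetic behind a waterfall chart is a running sum: start at the baseline and add each contribution in turn. The sketch below reproduces that logic with made-up numbers. The baseline, feature names, and contributions are hypothetical and are not the values from the model above.

```python
def waterfall_steps(baseline, contributions):
    """Return (name, start, end) for each bar, plus the final prediction."""
    steps = []
    running = baseline
    for name, contribution in contributions.items():
        start = running
        running += contribution
        steps.append((name, start, running))
    return steps, running


# Hypothetical baseline and contributions, for illustration only
example_contributions = {"feature_a": 12.5, "feature_b": -30.0, "feature_c": 4.5}
steps, prediction = waterfall_steps(100.0, example_contributions)
for name, start, end in steps:
    print(f"{name}: {start} -> {end}")
print(f"final prediction: {prediction}")
```

Because Shapley values sum to the difference between the prediction and the baseline, the last bar of the chart always ends at the model's output for the explained instance.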