
Tutorial: Exploring SHAP-IQ Visualizations – MarkTechPost


In this tutorial, we’ll explore a variety of SHAP-IQ visualizations that provide insights into how a machine learning model arrives at its predictions. These visuals help break down complex model behavior into interpretable components, revealing both the individual and interactive contributions of features to a specific prediction. Check out the Full Codes here.

Installing the dependencies

!pip install shapiq overrides scikit-learn pandas numpy seaborn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from tqdm.auto import tqdm

import shapiq

print(f"shapiq version: {shapiq.__version__}")

Importing the dataset

In this tutorial, we’ll use the MPG (Miles Per Gallon) dataset, which we’ll load directly from the Seaborn library. This dataset contains information about various car models, including features like horsepower, weight, and origin. Check out the Full Codes here.

import seaborn as sns
df = sns.load_dataset("mpg")
df

Processing the dataset

We use label encoding to convert the categorical column(s) into numeric format, making them suitable for model training.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Drop rows with missing values
df = df.dropna()

# Encode the origin column
le = LabelEncoder()
df.loc[:, "origin"] = le.fit_transform(df["origin"])
df["origin"].unique()
for i, label in enumerate(le.classes_):
    print(f"{label} → {i}")
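As a standalone illustration (separate from the tutorial’s pipeline), here is how LabelEncoder assigns integer codes. The toy list below uses the three origin values that appear in the MPG dataset; note that classes_ is sorted alphabetically:

```python
from sklearn.preprocessing import LabelEncoder

# Toy example: the three origin values found in the MPG dataset
origins = ["usa", "japan", "europe", "usa", "europe"]

le = LabelEncoder()
codes = le.fit_transform(origins)

# classes_ is sorted alphabetically, so europe -> 0, japan -> 1, usa -> 2
print(list(le.classes_))  # ['europe', 'japan', 'usa']
print(list(codes))        # [2, 1, 0, 2, 0]
```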

Splitting the data into training & test subsets

# Select features and target
X = df.drop(columns=["mpg", "name"])
y = df["mpg"]

feature_names = X.columns.tolist()
x_data, y_data = X.values, y.values

# Train-test split
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2, random_state=42)
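With test_size=0.2, train_test_split holds out 20% of the rows for testing (sklearn rounds the test count up). A quick self-contained sketch with dummy arrays shaped like the MPG data after dropna (392 rows, 7 features; the row count here is an assumption about this dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the tutorial's x_data / y_data
x_data = np.zeros((392, 7))
y_data = np.zeros(392)

x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)  # (313, 7) (79, 7)
```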

Model Training

We train a Random Forest Regressor with a maximum depth of 10 and 10 decision trees (n_estimators=10). A fixed random_state ensures reproducibility.

# Train model
model = RandomForestRegressor(random_state=42, max_depth=10, n_estimators=10)
model.fit(x_train, y_train)

Model Evaluation

# Evaluate
mse = mean_squared_error(y_test, model.predict(x_test))
r2 = r2_score(y_test, model.predict(x_test))
print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")
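Both metrics can also be reproduced by hand, which makes their meaning concrete. A minimal sketch with made-up numbers, not the tutorial’s model output:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_pred = np.array([22.0, 28.0, 25.0, 14.0])

# MSE: mean of squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2_manual = 1 - ss_res / ss_tot

assert np.isclose(mse_manual, mean_squared_error(y_true, y_pred))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
print(mse_manual, r2_manual)  # 2.25 0.928
```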

Explaining a Local Instance

We choose a specific test instance (with instance_id = 7) to explore how the model arrived at its prediction. We’ll print the true value, predicted value, and the feature values for this instance. Check out the Full Codes here.

# select a local instance to be explained
instance_id = 7
x_explain = x_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")
for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")

Generating Explanations for Multiple Interaction Orders

We generate Shapley-based explanations for different interaction orders using the shapiq package. Specifically, we compute:

  • Order 1 (Standard Shapley Values): Individual feature contributions
  • Order 2 (Pairwise Interactions): Combined effects of feature pairs
  • Order N (Full Interaction): All interactions up to the total number of features
# create explanations for different orders
feature_names = list(X.columns)  # get the feature names
n_features = len(feature_names)

si_order: dict[int, shapiq.InteractionValues] = {}
for order in tqdm([1, 2, n_features]):
    index = "k-SII" if order > 1 else "SV"  # will also be set automatically by the explainer
    explainer = shapiq.TreeExplainer(model=model, max_order=order, index=index)
    si_order[order] = explainer.explain(x=x_explain)
si_order
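To get a sense of how quickly the number of explanation terms grows with the interaction order: an order-k explanation assigns a value to every non-empty feature subset of size at most k. (Exactly which terms shapiq stores depends on the chosen index, so this is just a subset count under that assumption.)

```python
from math import comb

n_features = 7  # the MPG features left after dropping "mpg" and "name"

for order in (1, 2, n_features):
    # number of non-empty feature subsets of size <= order
    n_terms = sum(comb(n_features, k) for k in range(1, order + 1))
    print(f"order {order}: {n_terms} terms")
# order 1: 7, order 2: 7 + 21 = 28, order 7: 2^7 - 1 = 127
```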

1. Force Chart

The force plot is a powerful visualization tool that helps us understand how a machine learning model arrived at a specific prediction. It displays the baseline prediction (i.e., the expected value of the model before seeing any features), and then shows how each feature “pushes” the prediction higher or lower.

In this plot:

  • Red bars represent features or interactions that increase the prediction.
  • Blue bars represent those that decrease it.
  • The length of each bar corresponds to the magnitude of its effect.

When using Shapley interaction values, the force plot can visualize not just individual contributions but also interactions between features. This makes it especially insightful when interpreting complex models, since it visually decomposes how combinations of features work together to influence the outcome. Check out the Full Codes here.

sv = si_order[1]  # get the SV
si = si_order[2]  # get the 2-SII
mi = si_order[n_features]  # get the Moebius transform

sv.plot_force(feature_names=feature_names, show=True)
si.plot_force(feature_names=feature_names, show=True)
mi.plot_force(feature_names=feature_names, show=True)

From the first plot, we can see that the base value is 23.5. Features like Weight, Cylinders, Horsepower, and Displacement have a positive influence on the prediction, pushing it above the baseline. On the other hand, Model Year and Acceleration have a negative impact, pulling the prediction downward.

2. Waterfall Chart

Similar to the force plot, the waterfall plot is another popular way to visualize Shapley values, originally introduced with the shap library. It shows how different features push the prediction higher or lower compared to the baseline. One key advantage of the waterfall plot is that it automatically groups features with very small impacts into an “other” category, making the chart cleaner and easier to understand. Check out the Full Codes here.

sv.plot_waterfall(feature_names=feature_names, show=True)
si.plot_waterfall(feature_names=feature_names, show=True)
mi.plot_waterfall(feature_names=feature_names, show=True)

3. Network Plot

The network plot shows how features interact with one another using first- and second-order Shapley interactions. Node size reflects individual feature impact, while edge width and color show interaction strength and direction. It’s especially useful when dealing with many features, revealing complex interactions that simpler plots might miss. Check out the Full Codes here.

si.plot_network(feature_names=feature_names, show=True)
mi.plot_network(feature_names=feature_names, show=True)

4. SI Graph Plot

The SI graph plot extends the network plot by visualizing all higher-order interactions as hyper-edges connecting multiple features. Node size shows individual feature impact, while edge width, color, and transparency reflect the strength and direction of interactions. It provides a comprehensive view of how features collectively influence the model’s prediction. Check out the Full Codes here.

# we abbreviate the feature names since they are plotted inside the nodes
abbrev_feature_names = shapiq.plot.utils.abbreviate_feature_names(feature_names)
sv.plot_si_graph(
    feature_names=abbrev_feature_names,
    show=True,
    size_factor=2.5,
    node_size_scaling=1.5,
    plot_original_nodes=True,
)
si.plot_si_graph(
    feature_names=abbrev_feature_names,
    show=True,
    size_factor=2.5,
    node_size_scaling=1.5,
    plot_original_nodes=True,
)
mi.plot_si_graph(
    feature_names=abbrev_feature_names,
    show=True,
    size_factor=2.5,
    node_size_scaling=1.5,
    plot_original_nodes=True,
)

5. Bar Plot

The bar plot is tailored for global explanations. While other plots can be used both locally and globally, the bar plot summarizes the overall importance of features (or feature interactions) by showing the mean absolute Shapley (or interaction) values across all instances. In shapiq, it highlights which feature interactions contribute most on average. Check out the Full Codes here.

explanations = []
explainer = shapiq.TreeExplainer(model=model, max_order=2, index="k-SII")
for instance_id in tqdm(range(20)):
    x_explain = x_test[instance_id]
    si = explainer.explain(x=x_explain)
    explanations.append(si)
shapiq.plot.bar_plot(explanations, feature_names=feature_names, show=True)

“Displacement” and “Horsepower” are the most influential features overall, meaning they have the strongest individual impact on the model’s predictions. This is evident from their high mean absolute Shapley interaction values in the bar plot.

Additionally, when looking at second-order interactions (i.e., how two features interact together), the combinations “Horsepower × Weight” and “Displacement × Horsepower” show significant joint influence. Their combined attribution is around 1.4, indicating that these interactions play an important role in shaping the model’s predictions beyond what each feature contributes individually. This highlights the presence of non-linear relationships between features in the model.


Check out the Full Codes here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


I’m a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.
