Applied Data Analysis in Python

This answer page shows the results of trying different values of noise and n_neighbors when fitting k-NN to a dummy data set. For you to complete the exercise I would just expect you to maually change the values and rerun the cells to look at the differences. On this page I have done something a little more complicated in order to visualise all the combinations in one plot.

In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.inspection import DecisionBoundaryDisplay

We'll loop over a range of neighbour counts:

In [2]:
neighbours = [1, 5, 10, 100, 150]

Start by grabbing the data:

In [3]:
data = pd.read_csv("https://milliams.com/courses/applied_data_analysis/moons.csv")
X = data[["x1", "x2"]]
y = data["y"]

train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

We then loop over the range of values we want, fit the model and plot the result for each.

In [4]:
# We'll plot the results in a grid of subplots
fig, axs = plt.subplots(
    nrows=len(neighbours),
    ncols=1,
    figsize=(8, 30),
    constrained_layout=True,
    sharex=True,
    sharey=True
)

for row, n_neighbors in enumerate(neighbours):
    # Fit and score the model (uses the `n_neighbors` variable)
    model = KNeighborsClassifier(n_neighbors=n_neighbors).fit(train_X, train_y)
    score = model.score(test_X, test_y)

    # Plot the results in the grid of subplots
    ax = axs[row]
    ax.set_xlim(-2, 3)
    ax.set_ylim(-1.5, 2)
    ax.set_title(f"k: {n_neighbors}, score={score:.2}")
    DecisionBoundaryDisplay.from_estimator(model, X, cmap="PRGn", ax=ax)
    sns.scatterplot(data=X, x="x1", y="x2", hue=y, ax=ax, palette="Dark2")
    ax.get_legend().set_visible(False)