Implementing Polynomial Regression in Python: Random Data Generation and Visualization

Chanchala Gorale
2 min read · Jun 11, 2024

--

Let’s start by generating random data and then build a polynomial regression model in Python. Here’s the plan:

  1. Generate Random Data: We’ll generate a set of random data points that follow a polynomial relationship.
  2. Fit Polynomial Regression Model: We’ll fit a polynomial regression model to this data.
  3. Visualize the Results: We’ll visualize the original data points and the fitted polynomial curve.

Let’s begin by generating the random data.

Step 1: Generate Random Data

We’ll create a dataset where y is a polynomial function of x with some added random noise. For simplicity, let's use a quadratic function (second-degree polynomial) as our base function.

Step 2: Fit Polynomial Regression Model

We’ll use the numpy and scikit-learn libraries to fit a polynomial regression model.

Step 3: Visualize the Results

We’ll use matplotlib to visualize the data and the fitted polynomial curve.

Here is the complete code to perform these steps:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Step 1: Generate random data
np.random.seed(0) # For reproducibility
x = np.random.rand(100, 1) * 10 # Random x values in range [0, 10]
y = 2 + 3 * x + x**2 + np.random.randn(100, 1) * 10 # Quadratic relationship with noise

# Step 2: Fit polynomial regression model
degree = 2 # Degree of the polynomial
polyreg = make_pipeline(PolynomialFeatures(degree), LinearRegression())
polyreg.fit(x, y)
y_pred = polyreg.predict(x)

# Step 3: Visualize the results
plt.scatter(x, y, color='blue', label='Data Points')
order = np.argsort(x.ravel()) # Sort by x so the curve is drawn left to right
plt.plot(x[order], y_pred[order], color='red', label='Polynomial Fit')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Polynomial Regression (Degree 2)')
plt.legend()
plt.show()

Explanation:

  1. Data Generation:
  • We generate 100 random x values between 0 and 10.
  • The corresponding y values are generated using a quadratic function y = 2 + 3 * x + x**2 with added Gaussian noise.

  2. Polynomial Regression:

  • We use PolynomialFeatures from scikit-learn to transform our features into polynomial features.
  • We create a pipeline that first transforms the data into polynomial features and then applies linear regression.
  • We fit this model to our data.
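To make the transformation concrete, here is a minimal sketch of what PolynomialFeatures produces for a single input value: with degree 2 it emits a bias column, the original feature, and its square.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# For a single feature x = 3, a degree-2 expansion yields [1, x, x^2]
X = np.array([[3.0]])
expanded = PolynomialFeatures(degree=2).fit_transform(X)
print(expanded)  # [[1. 3. 9.]]
```

The linear regression step then fits one weight per expanded column, which is why the pipeline can capture a quadratic relationship with an otherwise linear model.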

  3. Visualization:

  • We plot the original data points.
  • We plot the polynomial regression curve using the fitted model.
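One detail worth noting: instead of reordering the training points for plotting, a common alternative is to predict on an evenly spaced grid, which guarantees a smooth curve no matter how x is ordered. A short sketch, reusing the same data generation as above:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Same data generation as in the main example
np.random.seed(0)
x = np.random.rand(100, 1) * 10
y = 2 + 3 * x + x**2 + np.random.randn(100, 1) * 10

polyreg = make_pipeline(PolynomialFeatures(2), LinearRegression())
polyreg.fit(x, y)

# Predict on an evenly spaced grid rather than the scattered training points
x_grid = np.linspace(0, 10, 200).reshape(-1, 1)
y_grid = polyreg.predict(x_grid)
# plt.plot(x_grid, y_grid, color='red') would then draw the smooth curve
```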


Here is the generated plot showing the polynomial regression results:

  • Blue dots represent the original data points.
  • Red curve represents the fitted polynomial regression model (degree 2).

As the plot shows, the red curve follows the quadratic trend in the data closely, indicating that the polynomial regression model has captured the underlying relationship between x and y.
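As a quick sanity check (a sketch, not part of the original code), we can inspect the fitted coefficients and the R² score. Since the data was generated from y = 2 + 3x + x², the recovered intercept and the weights on x and x² should land near 2, 3, and 1:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Same data generation as in the main example
np.random.seed(0)
x = np.random.rand(100, 1) * 10
y = 2 + 3 * x + x**2 + np.random.randn(100, 1) * 10

polyreg = make_pipeline(PolynomialFeatures(2), LinearRegression())
polyreg.fit(x, y)

lin = polyreg.named_steps['linearregression']
print("intercept:", lin.intercept_)   # true value: 2
print("coefficients:", lin.coef_)     # columns: [bias, x, x^2]; true values: 3 and 1
print("R^2:", polyreg.score(x, y))    # closer to 1 means a better fit
```

Note that the bias column added by PolynomialFeatures gets a zero weight, because LinearRegression fits its own intercept.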
