Data Visualization in Machine Learning

Chanchala Gorale
Jun 11, 2024

In machine learning, visualizations play a crucial role in understanding data, evaluating models, and interpreting results. Many libraries are available for creating plots, each with its own strengths and specialties. Here’s an overview of some popular plotting libraries, the types of plots they offer, and when to use each one.

Popular Plotting Libraries

Matplotlib

  • Strengths: Versatility and customization.
  • Usage: Basic plots, detailed customization.

Seaborn

  • Strengths: Statistical visualizations with attractive defaults; built on top of Matplotlib.
  • Usage: High-level interface for drawing attractive statistical graphics.

Plotly

  • Strengths: Interactive plots.
  • Usage: Dashboards, interactive web applications.

ggplot2 (via plotnine in Python)

  • Strengths: Grammar of graphics.
  • Usage: Declarative plotting, layered graphics.

Bokeh

  • Strengths: Interactive plots for web applications.
  • Usage: Large and complex visualizations, interactive web plots.

Altair

  • Strengths: Declarative statistical visualization.
  • Usage: Quickly creating interactive visualizations.

Types of Plots and Their Usage

Line Plot

  • Description: Displays data points connected by a line.
  • Usage: Trend analysis over time or continuous data.
  • Library: Matplotlib, Seaborn, Plotly, Altair.
  • Example:
import matplotlib.pyplot as plt 
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.ylabel('some numbers')
plt.show()
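In ML work, the most common line plot is probably the learning curve: loss per epoch for the training and validation sets, where a widening gap between the two hints at overfitting. Below is a minimal sketch using Matplotlib's object-oriented API (plt.subplots); the helper name plot_loss_curves is hypothetical and the loss values are synthetic, made up only to show the shape of such a plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_loss_curves(train_loss, val_loss):
    """Plot training and validation loss against the epoch number (hypothetical helper)."""
    epochs = np.arange(1, len(train_loss) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_loss, label="train loss")
    ax.plot(epochs, val_loss, label="validation loss", linestyle="--")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    return fig, ax


# Synthetic curves for illustration only: training loss keeps falling,
# validation loss bottoms out and then creeps back up.
epochs = np.arange(1, 11)
train = 1.0 / epochs
val = 1.0 / epochs + 0.02 * epochs
fig, ax = plot_loss_curves(train, val)
# fig.savefig("loss_curves.png")  # or plt.show() in an interactive session
```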

Scatter Plot

  • Description: Uses dots to plot pairs of values from two variables, one variable on each axis.
  • Usage: Relationship between two variables, detecting correlations.
  • Library: Matplotlib, Seaborn, Plotly, Altair.
  • Example:
import matplotlib.pyplot as plt 
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])
plt.ylabel('some numbers')
plt.show()
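A scatter plot shows the shape of a relationship; a correlation coefficient summarizes how linear that relationship is. The sketch below uses a hypothetical helper, scatter_with_correlation, that computes Pearson's r with np.corrcoef and puts it in the title. For the example data above (y = x²), r comes out just under 1, because the relationship is perfectly monotonic but not perfectly linear.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def scatter_with_correlation(x, y):
    """Scatter y against x and report Pearson's r in the title (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(f"Pearson r = {r:.3f}")
    return fig, ax, r


fig, ax, r = scatter_with_correlation([1, 2, 3, 4], [1, 4, 9, 16])
print(round(r, 3))  # 0.984
```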

Histogram

  • Description: Represents the distribution of a dataset by showing the frequency of data points in successive numerical intervals (bins).
  • Usage: Distribution of a single variable, frequency analysis.
  • Library: Matplotlib, Seaborn, Plotly, Altair.
  • Example:
import matplotlib.pyplot as plt 
plt.hist([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], bins=4)
plt.show()
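The bar heights in a histogram come from binning the data. Axes.hist (like plt.hist) returns the per-bin counts and the bin edges along with the drawn bars, and those numbers match what np.histogram computes without plotting, which is handy when you need the values rather than just the picture. A short sketch with the same data as above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

fig, ax = plt.subplots()
# hist returns (counts per bin, bin edges, container of drawn bar patches)
counts, edges, patches = ax.hist(data, bins=4, edgecolor="black")

# The same binning computed without drawing anything
np_counts, np_edges = np.histogram(data, bins=4)

print(counts.tolist())  # [1.0, 2.0, 3.0, 4.0]
print(edges.tolist())   # [1.0, 1.75, 2.5, 3.25, 4.0]
```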

Box Plot

  • Description: Summarizes the distribution of a dataset by showing the median, quartiles, and potential outliers.
  • Usage: Identifying outliers, understanding data spread.
  • Library: Seaborn, Matplotlib, Plotly.
  • Example:
import matplotlib.pyplot as plt
import seaborn as sns
sns.boxplot(data=[1, 2, 3, 4, 5, 6, 7, 8, 9])  # a single vertical box
plt.show()
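Which points count as "potential outliers" depends on the whisker rule. Matplotlib's boxplot defaults to whis=1.5: the whiskers reach the most extreme points within 1.5 × IQR (interquartile range) of the quartiles, and anything beyond is drawn as a separate flier. The sketch below computes that rule by hand with np.percentile and compares it with the fliers Matplotlib draws. The helper iqr_outliers is hypothetical, and the value 30 is an artificial outlier added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def iqr_outliers(values, whis=1.5):
    """Return Q1, median, Q3 and the points outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return q1, median, q3, v[(v < low) | (v > high)]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # 30 is an artificial outlier
q1, median, q3, outliers = iqr_outliers(data)
print(q1, median, q3, outliers)  # 3.25 5.5 7.75 [30.]

fig, ax = plt.subplots()
bp = ax.boxplot(data)  # returns a dict of the drawn artists
flier_values = bp["fliers"][0].get_ydata()  # points Matplotlib drew as outliers
```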

Heatmap

  • Description: Represents data in a matrix form, where individual values are represented as colors.
  • Usage: Correlation matrices, feature importance, confusion matrices.
  • Library: Seaborn, Matplotlib, Plotly.
  • Example:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
data = np.random.rand(10, 12)  # 10 x 12 matrix of random values in [0, 1)
sns.heatmap(data)
plt.show()
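Confusion matrices, one of the uses listed for heatmaps, map naturally onto this plot: rows are true classes, columns are predicted classes, and color encodes the count. The sketch below builds the matrix with plain NumPy (scikit-learn's confusion_matrix uses the same row/column convention) and draws it with Matplotlib's imshow, writing each count into its cell. The labels and the helper name confusion_counts are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Hypothetical labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_counts(y_true, y_pred, n_classes=3)

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")  # annotate each cell
ax.set_xlabel("predicted label")
ax.set_ylabel("true label")
fig.colorbar(im, ax=ax)
```

With Seaborn, sns.heatmap(cm, annot=True, fmt="d") produces a similar annotated plot in one call.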

Bar Plot

  • Description: Represents categorical data with rectangular bars.
  • Usage: Comparing quantities of different categories.
  • Library: Matplotlib, Seaborn, Plotly, Altair.
  • Example:
import matplotlib.pyplot as plt 
plt.bar([1, 2, 3, 4], [1, 4, 9, 16])
plt.show()
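A typical ML use of a bar plot is checking class balance before training. plt.bar and Axes.bar accept category names directly as x values, and Axes.bar_label (available in Matplotlib 3.4 and later) writes each bar's height above it. The label list below is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from collections import Counter

# Hypothetical training labels
labels = ["cat", "dog", "dog", "bird", "dog", "cat"]
counts = Counter(labels)
categories = sorted(counts)                # ['bird', 'cat', 'dog']
heights = [counts[c] for c in categories]  # [1, 2, 3]

fig, ax = plt.subplots()
bars = ax.bar(categories, heights)  # string x values become a categorical axis
texts = ax.bar_label(bars)          # print each count above its bar
ax.set_ylabel("number of samples")
```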

Pair Plot

  • Description: Creates a matrix of scatter plots for all pairs of variables.
  • Usage: Exploring relationships between multiple variables.
  • Library: Seaborn.
  • Example:
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset("iris")  # downloads the sample dataset on first use
sns.pairplot(iris)
plt.show()
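sns.load_dataset fetches the iris data over the network, so the example above needs an internet connection the first time it runs. To see what a pair plot is doing, here is a rough Matplotlib-only version on synthetic data: a k × k grid with a scatter plot for every pair of features off the diagonal and a histogram of each single feature on the diagonal, roughly the layout sns.pairplot gives when no hue is set. The helper pair_grid and the feature names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def pair_grid(data, names, bins=15):
    """Scatter every pair of columns; histogram each column on the diagonal."""
    k = data.shape[1]
    fig, axes = plt.subplots(k, k, figsize=(2.5 * k, 2.5 * k))
    for i in range(k):      # row i -> feature on the y axis
        for j in range(k):  # column j -> feature on the x axis
            ax = axes[i, j]
            if i == j:
                ax.hist(data[:, i], bins=bins)
            else:
                ax.scatter(data[:, j], data[:, i], s=5)
            if i == k - 1:
                ax.set_xlabel(names[j])
            if j == 0:
                ax.set_ylabel(names[i])
    fig.tight_layout()
    return fig, axes


# Synthetic data: feature_b depends on feature_a, feature_c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2 * a + rng.normal(scale=0.5, size=200)
c = rng.normal(size=200)
fig, axes = pair_grid(np.column_stack([a, b, c]), ["feature_a", "feature_b", "feature_c"])
```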

Violin Plot

  • Description: Combines a box plot with a kernel density estimate, showing the shape of the distribution mirrored on both sides of a central axis.
  • Usage: Distribution of the data across different categories.
  • Library: Seaborn, Matplotlib.
  • Example:
import matplotlib.pyplot as plt
import seaborn as sns
sns.violinplot(data=[1, 2, 3, 4, 5, 6, 7, 8, 9])  # a single violin
plt.show()
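The example above draws a single violin, while the stated use is comparing distributions across categories. Here is a sketch of that with Matplotlib's own Axes.violinplot, passing one array per group. The group names and the synthetic data are hypothetical; one group is deliberately bimodal, a shape a violin makes visible but a box plot would summarize as one wide box.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
groups = {
    "group_a": rng.normal(0, 1, 300),
    "group_b": rng.normal(1, 0.5, 300),
    # Bimodal: two separate clusters of values
    "group_c": np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)]),
}

fig, ax = plt.subplots()
positions = np.arange(1, len(groups) + 1)
parts = ax.violinplot(list(groups.values()), positions=positions, showmedians=True)
ax.set_xticks(positions)
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("value")
```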

When to Use Specific Plots

  • Line Plot: When analyzing trends over time or continuous data.
  • Scatter Plot: When exploring relationships or correlations between two variables.
  • Histogram: When examining the distribution and frequency of a single variable.
  • Box Plot: When identifying outliers and understanding the spread of data.
  • Heatmap: When visualizing matrix-like data, such as correlations or feature importance.
  • Bar Plot: When comparing quantities across different categories.
  • Pair Plot: When exploring pairwise relationships in a dataset.
  • Violin Plot: When analyzing the distribution of data across different categories with a focus on density.

These plots and libraries enable machine learning practitioners to gain insights into data, evaluate models, and communicate findings effectively.
