Can You Color Scatter Plots Based on Specific Column Values in Pandas with Matplotlib?

Published on 2024-11-08

Coloring Scatter Plots by Column Values Using Pandas and Matplotlib

Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. This article shows how to color the points of a scatter plot according to the values in one column of a Pandas DataFrame.

Imports and Data

To begin, we import the necessary libraries: Matplotlib (as plt), Pandas (as pd), and NumPy (as np). We then generate a sample DataFrame ("df") with three columns: "Height (cm)," "Weight (kg)," and "Gender."

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(0)
N = 37
_genders = ["Female", "Male", "Non-binary", "No Response"]
df = pd.DataFrame({
    "Height (cm)": np.random.uniform(low=130, high=200, size=N),
    "Weight (kg)": np.random.uniform(low=30, high=100, size=N),
    "Gender": np.random.choice(_genders, size=N),
})

Update (August 2021)

As of seaborn 0.11.0, figure-level functions such as seaborn.relplot are recommended over using FacetGrid directly.

import seaborn as sns

sns.relplot(data=df, x="Weight (kg)", y="Height (cm)", hue="Gender", hue_order=_genders, aspect=1.61)
plt.show()
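
relplot creates its own figure. When the plot has to be drawn on an existing Matplotlib Axes, the axes-level function seaborn.scatterplot takes the same data, x, y, hue, and hue_order arguments plus an ax argument. The sketch below assumes the same sample data as above; the Agg backend line is an assumption so that the example runs without a display, and can be removed for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to show the plot on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Recreate the sample data so this sketch runs on its own.
np.random.seed(0)
N = 37
_genders = ["Female", "Male", "Non-binary", "No Response"]
df = pd.DataFrame({
    "Height (cm)": np.random.uniform(low=130, high=200, size=N),
    "Weight (kg)": np.random.uniform(low=30, high=100, size=N),
    "Gender": np.random.choice(_genders, size=N),
})

# Draw onto an Axes we created ourselves instead of a seaborn-managed figure.
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="Weight (kg)", y="Height (cm)",
                hue="Gender", hue_order=_genders, ax=ax)
```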

Old Answer (2015)

If you wish to use Matplotlib directly, you need to map each category in the DataFrame column to a color yourself and pass the result to Matplotlib's scatter function. To do this:

  • Create a dictionary with unique categories from the column and colors.
  • Add a new "Color" column to the DataFrame, assigning each category a corresponding color.
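  • Use the scatter function to plot the data, specifying the color column as the "c" argument.

def dfScatter(df, xcol='Weight (kg)', ycol='Height (cm)', catcol='Gender'):
    fig, ax = plt.subplots()
    # Assign each unique category an evenly spaced value in [0, 1].
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))

    # Note: this adds a "Color" column to the DataFrame that was passed in.
    df["Color"] = df[catcol].map(colordict)
    # Matplotlib turns the numeric values into colors via its default colormap.
    ax.scatter(df[xcol], df[ycol], c=df["Color"])
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    return fig

fig = dfScatter(df)
fig.savefig('fig1.png')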
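
The function above colors the points but does not say which color belongs to which category. One possible variation, sketched here under the same sample-data assumptions, draws one scatter series per category with groupby so that each category gets its own legend entry. The function name scatter_by_category is hypothetical, and the Agg backend line is an assumption so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to show the plot on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_by_category(df, xcol, ycol, catcol):
    """Plot one scatter series per category so each gets a legend entry."""
    fig, ax = plt.subplots()
    # groupby yields (category, sub-DataFrame) pairs in sorted key order;
    # each ax.scatter call takes the next color from Matplotlib's color cycle.
    for name, group in df.groupby(catcol):
        ax.scatter(group[xcol], group[ycol], label=name)
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    ax.legend(title=catcol)
    return fig, ax


# Recreate the sample data so this sketch runs on its own.
np.random.seed(0)
N = 37
_genders = ["Female", "Male", "Non-binary", "No Response"]
df = pd.DataFrame({
    "Height (cm)": np.random.uniform(low=130, high=200, size=N),
    "Weight (kg)": np.random.uniform(low=30, high=100, size=N),
    "Gender": np.random.choice(_genders, size=N),
})

fig, ax = scatter_by_category(df, "Weight (kg)", "Height (cm)", "Gender")
```

Unlike dfScatter, this variation leaves the DataFrame unchanged, at the cost of one scatter call per category.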

By following these steps, you can easily color scatter plots based on column values using Pandas and Matplotlib.
