For my chart, I’m using an Olympic Historical Dataset from Olympedia.org, which Joseph Cheng shared on Kaggle with a public domain license.
It contains event- to athlete-level Olympic Games results from Athens 1896 to Beijing 2022. After an EDA (Exploratory Data Analysis), I transformed it into a dataset that details the number of female athletes in each sport/event per year. My bubble chart idea is to show which sports have a 50/50 female-to-male athlete ratio and how that has evolved over time.
My plotting data consists of two different datasets, one for each year: 2020 and 1996. For each dataset I’ve computed the total sum of athletes that participated in each event (athlete_sum) and how much that sum represents compared with the total number of athletes (male + female) (difference). See a screenshot of the data below:
This is my approach to visualizing it:
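The aggregation described above could look roughly like the following pandas sketch. The column names (`sport`, `sex`, `athletes`, `female_sum`, `female_share`) and all the numbers are invented for illustration, and I'm assuming the ratio is the female share of all athletes in a sport; the real dataset may be laid out differently.

```python
import pandas as pd

# Toy data: one row per sport and sex (values are invented for illustration)
raw = pd.DataFrame({
    "sport": ["Athletics", "Athletics", "Hockey", "Hockey", "Baseball"],
    "sex": ["F", "M", "F", "M", "M"],
    "athletes": [900, 1000, 200, 200, 150],
})

# Total athletes per sport (male + female)
totals = raw.groupby("sport")["athletes"].sum().rename("athlete_sum")

# Female athletes per sport; sports with no women get 0
females = (
    raw[raw["sex"] == "F"]
    .groupby("sport")["athletes"].sum()
    .reindex(totals.index, fill_value=0)
    .rename("female_sum")
)

# One row per sport with the total, the female count and the female share
plot_df = pd.concat([totals, females], axis=1).reset_index()
plot_df["female_share"] = plot_df["female_sum"] / plot_df["athlete_sum"]
print(plot_df)
```

Running the same steps once per year (1996 and 2020) would give the two plotting datasets.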
- Size proportion. Using the radius of the bubbles to compare the number of athletes per sport. Bigger bubbles will represent highly competitive events, such as Athletics.
- Multi-variable interpretation. Using colors to represent female representation. Light green bubbles will represent events with a 50/50 split, such as Hockey.
Here is my starting point (using the code and approach from above):
Some easy fixes: increasing the figure size and changing labels to empty if the size isn’t over 250, to avoid having words outside the bubbles.
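The two encodings above can be sketched as a pair of small helper functions. This is only an illustration under my own assumptions: the bin edges and hex colors in `PALETTE` are placeholders, and sizing bubbles so that their area (rather than their radius) tracks the athlete count is a common convention, not necessarily what the final chart does.

```python
import math

# Hypothetical palette: upper bound of each female-share bin -> color
PALETTE = [
    (0.0, "#1b7837"),  # no female athletes (dark green)
    (0.2, "#fc8d62"),
    (0.4, "#8da0cb"),
    (0.6, "#a6d854"),  # roughly a 50/50 split (light green)
    (1.0, "#e78ac3"),
]

def bubble_radius(athletes):
    """Radius such that the circle's area equals the athlete count."""
    return math.sqrt(athletes / math.pi)

def bubble_color(female_share):
    """Return the color of the first bin whose upper bound covers the share."""
    for upper, color in PALETTE:
        if female_share <= upper:
            return color
    raise ValueError("female_share must be between 0 and 1")

print(round(bubble_radius(1900), 2), bubble_color(0.5))
```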
fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
# Labels edited directly in dataset
Well, now at least it’s readable. But why is Athletics pink and Boxing blue? Let’s add a legend to illustrate the relationship between colors and female representation.
Because it’s not your regular barplot chart, plt.legend() doesn’t do the trick here.
Using matplotlib's AnnotationBbox we can create rectangles (or circles) to show the meaning behind each color. We can also do the same thing to show a bubble scale.
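Blanking the labels of small bubbles in the dataset can be done in one line. A minimal sketch, assuming a pandas DataFrame with hypothetical `sport` and `athlete_sum` columns and made-up values:

```python
import pandas as pd

# Hypothetical plotting data; the numbers are invented for illustration
plot_df = pd.DataFrame({
    "sport": ["Athletics", "Swimming", "Hockey", "Baseball"],
    "athlete_sum": [1900, 900, 400, 150],
})

MIN_LABEL_SIZE = 250  # threshold mentioned in the text

# Keep the sport name where the bubble is big enough, otherwise use ""
plot_df["label"] = plot_df["sport"].where(plot_df["athlete_sum"] > MIN_LABEL_SIZE, "")
print(plot_df["label"].tolist())
```

The `label` column can then be passed to the plotting code in place of the raw sport names.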
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (AnnotationBbox, DrawingArea,
                                  TextArea, HPacker)
from matplotlib.patches import Circle, Rectangle

# This is an example for one section of the legend
# Define where the annotation (legend) will be
xy = [50, 128]
# Create your colored rectangle or circle
da = DrawingArea(20, 20, 0, 0)
p = Rectangle((10, 10), 10, 10, color="#fc8d62ff")
da.add_artist(p)
# Add text
text = TextArea("20%", textprops=dict(color="#fc8d62ff", size=14, fontweight="bold"))
# Combine rectangle and text
vbox = HPacker(children=[da, text], align="top", pad=0, sep=3)
# Annotate both in a box (change alpha if you want to see the box)
ab = AnnotationBbox(vbox, xy,
                    xybox=(1.005, xy[1]),
                    xycoords="data",
                    boxcoords=("axes fraction", "data"),
                    box_alignment=(0.2, 0.5),
                    bboxprops=dict(alpha=0)
                    )
# Add to your bubble chart
ax.add_artist(ab)
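To build the whole legend rather than a single entry, the same pattern can be wrapped in a helper and called in a loop. The sketch below is one possible arrangement, not the exact code behind the final chart: `legend_entry` is a hypothetical helper, and the labels, colors, positions and circle sizes are placeholders. Note that sizes inside a `DrawingArea` are in display units, so the circles here only hint at a bubble scale; matching them exactly to bubble radii in data units is left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, TextArea, HPacker
from matplotlib.patches import Circle, Rectangle

def legend_entry(ax, y, patch, box_size, label, color):
    """Place one patch + text row just right of the axes at data height y."""
    da = DrawingArea(box_size, box_size, 0, 0)
    da.add_artist(patch)
    text = TextArea(label, textprops=dict(color=color, size=14, fontweight="bold"))
    row = HPacker(children=[da, text], align="center", pad=0, sep=3)
    ab = AnnotationBbox(row, (50, y), xybox=(1.005, y), xycoords="data",
                        boxcoords=("axes fraction", "data"),
                        box_alignment=(0.0, 0.5), bboxprops=dict(alpha=0))
    ax.add_artist(ab)
    return ab

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.set_xlim(0, 150)  # the anchor point (50, y) must lie inside the axes limits
ax.set_ylim(0, 150)

entries = []

# Color legend: label -> color (placeholder values)
colors = {"20%": "#fc8d62", "50%": "#a6d854"}
for i, (label, color) in enumerate(colors.items()):
    square = Rectangle((5, 5), 10, 10, color=color)
    entries.append(legend_entry(ax, 128 - 15 * i, square, 20, label, color))

# Bubble scale: grey circles of increasing radius (placeholder sizes)
scale = {"100 athletes": 5, "500 athletes": 11}
for i, (label, radius) in enumerate(scale.items()):
    size = 2 * radius + 2
    circle = Circle((size / 2, size / 2), radius, color="grey")
    entries.append(legend_entry(ax, 80 - 30 * i, circle, size, label, "grey"))

fig.canvas.draw()  # render once to check that everything draws without errors
```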
I’ve also added a subtitle and a text description below the chart just by using plt.text()
Simple and user-friendly interpretations of the graph:
- The majority of bubbles are light green → green means 50% females → the majority of Olympic competitions have an even 50/50 female-to-male split (yay🙌)
- Only one sport (Baseball), in dark green, has no female participation.
- 3 sports have only female participation, but the number of athletes is fairly low.
- The biggest sports in terms of number of athletes (Swimming, Athletics and Gymnastics) are very close to having a 50/50 split
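For reference, placing a subtitle and a description with `plt.text()` might look like the sketch below. The title, wording, coordinates and font sizes are placeholders of mine; passing `transform=ax.transAxes` is one way to position text relative to the axes (0 to 1) instead of in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8))
ax.set_title("Placeholder chart title", fontsize=18, pad=30)

# With transform=ax.transAxes, (0, 0) is the bottom-left corner of the axes
# and (1, 1) the top-right, so y > 1 sits above the plot and y < 0 below it
subtitle = plt.text(0.5, 1.02, "Bubble size = number of athletes per sport",
                    transform=ax.transAxes, ha="center", va="bottom", fontsize=12)
caption = plt.text(0.0, -0.08, "Placeholder description below the chart.",
                   transform=ax.transAxes, ha="left", va="top", fontsize=10)

fig.canvas.draw()
```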