Data Literacy

Data literacy is the ability to understand, use, and communicate data

Common charts

This page provides brief descriptions and samples of commonly used chart types. The example images were taken from module 4 of "See Numbers in Data", a course created by John MacInnes on the Sage Campus platform. The Leatherby Libraries provides access to Sage Campus for Chapman students and employees.

Pie charts

Pie charts use the arc length of each segment to represent a quantity. This poses a specific challenge: the human eye is worse at estimating the length of a curved line than the length of a straight line or the area of a rectangle. Pie charts work best for comparing a small number of categories with clearly different sizes. It is much harder to tell 35% from 40% in a pie chart than in a bar chart.
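To see why 35% and 40% are hard to tell apart, it can help to work out the slice angles. The following is a minimal Python sketch; the function name `pie_angles` and the category labels are hypothetical, made up for illustration only.

```python
def pie_angles(values):
    """Return the angle in degrees of each pie slice, keyed by category."""
    total = sum(values.values())
    # Each slice takes the same share of the 360-degree circle as its
    # value's share of the total.
    return {name: value * 360 / total for name, value in values.items()}


# Hypothetical percentages, chosen to match the 35% vs 40% comparison.
shares = {"A": 35, "B": 40, "C": 25}
angles = pie_angles(shares)

for name, angle in angles.items():
    print(f"{name}: {shares[name]}% -> {angle:.0f} degrees")
```

In this sketch the 35% and 40% slices differ by only 18 degrees of arc, which is the small difference the eye has to judge along a curve.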

Variations of the pie chart include donut charts, exploded pie charts, and complex polar area and ring charts. Pie charts can be controversial and are often misused, but are familiar to many and have their place in the ecosystem of data visualization.

Can you tell which of "Holidays" and "Clothing" is larger?

Bar charts

In a bar chart, the height or length of each bar is proportional to a data value. Bars can be placed side by side to compare categories, or stacked on top of each other to show how a larger category divides into smaller subcategories. Clustered bar charts can show the results for multiple variables. Bar charts can also show the value that each case in a data set takes for a variable, rather than the distribution of cases across the categories of a variable.

Histograms

A cousin of bar charts, histograms visualize the distribution of continuous data values, like income, test scores, or physical measurements. The x-axis shows the range of values and the y-axis shows the frequency or proportion of cases within each value range. To create a histogram, the continuous data is divided into bins or groups.  
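The binning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `histogram_counts` is hypothetical, the test scores are invented, and the bin width of 10 is an arbitrary choice rather than a recommendation.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        # Index of the bin this value falls into, then that bin's left edge.
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    # Bins that contain no values are simply absent from the result.
    return dict(sorted(counts.items()))


# Hypothetical test scores, invented for illustration only.
scores = [52, 57, 61, 64, 66, 68, 71, 73, 75, 78, 82, 85, 91]
counts = histogram_counts(scores, bin_width=10, start=50)

# Print a rough text histogram: one '#' per case in each bin.
for left_edge, n in counts.items():
    print(f"{left_edge}-{left_edge + 9}: {'#' * n}")
```

Changing `bin_width` changes the shape of the histogram, so it is worth trying a few widths before settling on one.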

Box plots

Box plots present the distribution of data visually. The box captures the middle 50% of the data, the range between the 25th and 75th percentiles, and a line within the box marks the median. The whiskers extend from the box to the minimum and maximum data values, excluding any outliers. Outliers are data points that deviate significantly from the rest of the distribution; in some cases they are drawn as dots beyond the whiskers.

In this example box plot, the box shows that for the middle half of the countries in the data set, between 50% and 70% of the population lives in rural areas. The whiskers indicate that the full range runs from a 13% rural population in Gabon up to 88% in Burundi. Gabon and Burundi are identified as the extremes, with particularly low and high rural population percentages compared to other countries. Overall, box plots are a compact way to understand the distribution of data, quickly identifying the median, spread, range, and any outliers.
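The elements of a box plot described above can be computed with Python's standard library. This is a minimal sketch, not a definitive method: the function name `box_plot_summary` and the sample values are hypothetical, and the rule that flags points more than 1.5 times the box width beyond the box as outliers is a common convention assumed here, not something this page specifies. Different tools also calculate quartiles slightly differently, so results may not match a given charting package exactly.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Compute the numbers a box plot draws (a sketch, not a standard API)."""
    data = sorted(values)
    # Quartiles: 25th percentile, median, 75th percentile.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1  # width of the box (middle 50% of the data)
    # Assumed outlier rule: points more than whisker_factor * IQR
    # beyond the box are treated as outliers.
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the most extreme values that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical values, invented for illustration only.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 30]
summary = box_plot_summary(sample)
print(summary)
```

For this sample the value 30 is flagged as an outlier, so the upper whisker stops at 12 instead of stretching to the maximum.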

Scatterplots and bubbleplots

A scatterplot shows two continuous variables for the same set of cases, with one variable on the x-axis and the other on the y-axis. Each dot, or coordinate, represents a single case. In this scatterplot, average life expectancy is on the y-axis and average income is on the x-axis, with each dot representing a country.

Additional variables can be shown by changing the appearance of the dots, for example their color or size. In this second version of the scatterplot, color shows which region each country is in, and the size of each bubble corresponds to the country's population.

Line graphs

Line graphs resemble scatterplots, except that consecutive points are joined by a line, which makes them well suited to displaying trends over time. They show steady changes clearly, but if values vary too much from one time period to the next, the line becomes a confused zigzag that is difficult to interpret.

This line chart reports carbon dioxide emissions from the consumption and flaring of fossil fuels in three regions from 1980 to 2006. Emissions in Europe and the US remained fairly steady over the period, while the Asia and Oceania region saw a steep increase.