Stacked charts offer us a parts-to-whole bird’s-eye view of data by stacking data series to show their cumulative total. We may use different types of stacked charts depending on the scenario, the properties that we want to highlight, and how we choose to achieve the right balance between accuracy and engagement for the visualization. Each type of stacked chart has its advantages and shortcomings and highlights different aspects of the scenario. In this article, we look at five different types of stacked charts, along with their common uses.
Stacked area charts are a variant of the area chart, where different data series, represented as areas, are stacked on top of each other to sum to a whole. The top curve of the graph represents the evolution of the cumulative total of all the series. Each of the series has a moving baseline provided by the cumulative total of all the series below it, while the lowest series uses the horizontal axis as a baseline. The contributions can be plotted as absolute numbers, as in the case of a standard stacked area chart, or as a percentage of the whole, as in a 100% stacked area chart.
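The moving baseline described above can be sketched in a few lines of Python. This is a minimal illustration rather than any charting library's implementation: the function names `stack_series` and `to_percent` and the sample numbers are hypothetical, and every time point is assumed to have a non-zero total.

```python
def stack_series(series):
    """Return a (baseline, top) pair for each series, stacked in the given order."""
    n_points = len(series[0])
    baseline = [0.0] * n_points  # the lowest series sits on the horizontal axis
    layers = []
    for values in series:
        top = [b + v for b, v in zip(baseline, values)]
        layers.append((baseline, top))
        baseline = top  # the next series starts where this one ends
    return layers


def to_percent(series):
    """Rescale each time point so that the series sum to 100 (assumes non-zero totals)."""
    totals = [sum(column) for column in zip(*series)]
    return [[100.0 * v / t for v, t in zip(values, totals)] for values in series]


# Hypothetical data: three series observed at four time points.
series = [
    [2.0, 3.0, 4.0, 5.0],
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 2.0, 2.0, 3.0],
]

layers = stack_series(series)
top_curve = layers[-1][1]  # cumulative total of all series
percent_layers = stack_series(to_percent(series))
```

The top edge of the last layer is the cumulative total, and in the percentage version that edge stays at 100 for every time point.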
Time series evolution
Stacked area charts are primarily used to show how a total composed of different categories evolves over a continuous period of time. The continuous nature of an area chart intuitively conveys a continuous period (whereas something like a column chart intuitively conveys discrete points in time).
These charts provide a rough idea of how the part-to-whole relationship changes over time, rather than an accurate estimate of the numbers. This is because of two factors – one, our brains do not process area calculations well, and two, each of the series has a moving baseline, making accurate estimations and comparisons between different series difficult.
In the example below, global coffee production from 1961 to 2020 is plotted, divided by region. We see the increase in the total over time and note that South America has produced most of the world’s coffee during the entire period visualized. We also notice that coffee production in Asia has increased over time to account for a significant proportion of the total in 2020. As mentioned above, this chart allows us to have a rough and quick picture of the trends. Accurate estimates of production in each region are particularly difficult – notice how the volatility in the production in South America affects the graphs of all of the series above it. Between 1980 and 1990, for example, the spikes that we see are transmitted across all the regions, even though coffee production during this period was relatively smooth in the other regions.
Stacked column charts break up the totals shown in a standard column chart into contributing categories. The height of each column represents the total, while the height of each segment within it represents the magnitude of that series.
Column charts in general facilitate accurate comparison of the totals by using height to represent numbers. Stacked column charts allow the same function while simultaneously providing an idea about how different categories contributed to this total. The focus in these types of charts is on the individual columns, unlike an area chart, where the focus is on the trend.
Snapshot in time + time series
These charts can be used in cases when we want to show changes over time. This is facilitated by the intuitive understanding of progression provided by using the horizontal axis to represent time. However, stacked column charts can also provide interesting insights into data that is captured at a specific point in time by showing comparisons between nominal categories. This gives us a snapshot in time, rather than a progression over time.
The chart above shows time series data on the number of hours spent with digital media in the US, divided by the medium used. We can track the increasing trend over time, while also noticing that the surge in mobile usage has largely contributed to the overall growth.
Let us also look at an example illustrating a nominal comparison. The following chart shows the reported frequency of loneliness by age in England, progressing from “always” to “never”. This is a snapshot in time as this data comes from 2017 and we may compare responses across age cohorts.
Stacked bar charts are similar to stacked column charts but use horizontal bars instead of columns to represent the totals. The length of each segment once again represents the magnitude of the respective contributing series.
Since the bars representing the total are placed from top to bottom, we may order the bars in decreasing order of length to show the rankings of different categories. The different stacks aid in understanding why the rankings come to be by showcasing the contributions of the different categories.
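As a rough sketch of this ordering step, the bars can be sorted by the sum of their segments before plotting. The helper name `rank_by_total` and the category data are hypothetical and serve only as an illustration.

```python
def rank_by_total(stacks):
    """Sort categories by the sum of their segments, largest total first."""
    return sorted(stacks.items(), key=lambda item: sum(item[1]), reverse=True)


# Hypothetical data: each category maps to the values of its stacked segments.
stacks = {
    "A": [3, 1],
    "B": [5, 4],
    "C": [2, 2, 1],
}

ranked = rank_by_total(stacks)
ranked_labels = [label for label, _ in ranked]
```

The segment values travel with each category, so the stacks still explain how each total in the ranking is made up.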
Since the primary categorical variable is plotted on the vertical axis, these charts are better suited to nominal comparisons rather than time series visualizations.
The chart above depicts the greenhouse gas emissions produced by different food sources, divided into segments by the emissions from different parts of the supply chain. This chart illustrates how stacked bar charts can be used for rankings as well as nominal comparisons. In particular, we note that beef from beef herd ranks the highest in emissions, and that emissions from land use change and emissions from the farm are responsible for the majority of the total.
Diverging bar charts are a modification of the standard stacked bar chart. The central axis in a diverging bar chart indicates zero, and the stacks diverge from this axis to show positive values on one side and negative values on the other. Note that, unlike in most stacked charts, the total length of the bar no longer corresponds to the value of the total. This is because the “net” value of the total is the difference between the positive and negative contributing values.
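The difference between the drawn length of a bar and its net value can be made concrete with a small sketch. The function name `diverging_segments` and the numbers are hypothetical; the inputs are assumed to be magnitudes, with the sign implied by the side of the axis.

```python
def diverging_segments(positive, negative):
    """Lay out segments on either side of a zero axis.

    `positive` and `negative` are lists of magnitudes. Returns the
    (start, end) extent of every segment, the drawn bar length, and the net value.
    """
    segments = []
    right = 0.0
    for value in positive:  # positive segments grow away from zero to the right
        segments.append((right, right + value))
        right += value
    left = 0.0
    for value in negative:  # negative segments grow away from zero to the left
        segments.append((left - value, left))
        left -= value
    bar_length = right - left  # what the eye sees
    net = right + left         # positive total minus negative total
    return segments, bar_length, net


# Hypothetical data: two positive and two negative contributions.
segments, bar_length, net = diverging_segments([120.0, 80.0], [90.0, 60.0])
```

With these sample values the bar spans 350 units while the net value is only 50, which is why the net is usually drawn as a separate marker.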
Positive and negative values
Diverging bar charts are used when datasets have both positive and negative values. A standard stacked bar chart, by contrast, requires all values to have the same sign (either all positive or all negative).
This chart’s ability to accommodate opposite values on different sides of the central axis makes it a great choice to visualize positive and negative sentiment. If there is a neutral category, this is usually centered around the zero axis with positive and negative opinions on either side.
The diverging bar chart above shows the revenue and the costs for a furniture retailer by month. The costs are shown as negative values, and the revenue for each month is shown as a positive value. This is an example where the total length of the bar is meaningless. Instead, the profit is calculated as the difference between the revenue and the costs, which can be shown as a marked line in the same chart.
Waterfall charts are not strictly stacked charts but belong to the family of charts showing part-to-whole representations. They are a variation on the stacked column and diverging column charts with the individual segments unstacked and horizontally staggered, while the total is shown as a separate column. The baseline for the first segment is the x-axis, and the baseline for each subsequent segment is the top of the previous segment. The following chart shows a timeline of the fateful maiden voyage of the Titanic with respect to the number of people on board the ship. We see the stops the Titanic made where passengers embarked or disembarked, and then the effect of the collision and the subsequent rescue.
Positive and negative values + totals
The main advantage of this chart is that it can be used to show positive and negative values (unlike a standard stacked column chart) and totals (unlike a diverging column chart) simultaneously.
Breakdown of totals
Waterfall charts are designed for the eye to follow the gains and losses made by the different segments to arrive at the total. They provide an intuitive representation of how the total can be broken down by category.
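The staggered baselines that make this work can be sketched as a running total, where each segment starts at the level the previous one reached. The function name `waterfall_segments` and the figures are hypothetical, and gains and losses are assumed to be given as signed numbers.

```python
def waterfall_segments(changes):
    """Compute the floating extent of each segment and the final total.

    `changes` is a list of (label, signed_change) pairs. Each segment starts
    at the level reached by the previous one.
    """
    running = 0.0
    segments = []
    for label, change in changes:
        start, end = running, running + change
        # Store (bottom, top) so that losses are drawn downward from the running total.
        segments.append((label, min(start, end), max(start, end)))
        running = end
    return segments, running


# Hypothetical data: two gains followed by two losses.
changes = [
    ("Product revenue", 60.0),
    ("Services revenue", 40.0),
    ("Fixed costs", -30.0),
    ("Variable costs", -45.0),
]

segments, total = waterfall_segments(changes)
```

The final running value is what would be drawn as the separate total column.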
The following waterfall chart shows company profit as the difference between revenue, shown as the sum of product revenue and services revenue, and costs, shown as the sum of fixed costs and variable costs.
Streamgraphs use an organic, flowing shape to represent time series data. They are similar to stacked area charts in that they use stacked areas to show categories that add up to the total. However, the layers extend on both sides of a central axis that represents time, rather than rising from a zero baseline, and the baseline of each category is given by the layer below it. As with a stacked area chart, the thicknesses of the individual layers add up to the thickness of the total. Critically, the order of layering is chosen to reduce the “wiggle”, making the graph more legible for large data sets. Byron and Wattenberg give a detailed discussion of this approach.
Large datasets with high volatility data
Streamgraphs work well for data with high volatility because of the lack of a baseline representing zero. In a stacked area chart, for example, the peaks and troughs are exaggerated compared to a streamgraph because the baseline represents zero, and peaks and troughs in series below accumulate to distort the form of the series stacked above. This is partially corrected in a streamgraph by the lack of a zero baseline and by choosing the order of layering to minimize distortion, also making them ideal for large data sets with many contributing series.
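To illustrate the effect of dropping the zero baseline, here is a minimal sketch that centers the stack on the time axis by starting the lowest layer at minus half the total. This is only the simplest symmetric baseline, chosen as an assumption for illustration: it does not implement the wiggle-minimizing baseline or the layer ordering that Byron and Wattenberg discuss, and the function name `symmetric_layers` and the data are hypothetical.

```python
def symmetric_layers(series):
    """Stack series around a central axis instead of a zero baseline."""
    totals = [sum(column) for column in zip(*series)]
    baseline = [-t / 2.0 for t in totals]  # shift the whole stack down by half its total
    layers = []
    for values in series:
        top = [b + v for b, v in zip(baseline, values)]
        layers.append((baseline, top))
        baseline = top
    return layers


# Hypothetical data: a volatile series below a flat one.
series = [
    [2.0, 6.0, 2.0],
    [2.0, 2.0, 2.0],
]

layers = symmetric_layers(series)
bottom_edge = layers[0][0]
top_edge = layers[-1][1]
```

In this example the spike in the lower series is split between the bottom and top edges, so the flat series above it is displaced by half as much as it would be in a stacked area chart.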
High engagement rather than accuracy
Data visualization is a balancing act between aesthetic concerns and accuracy, and the more accurate choice is not always the better one. A streamgraph, while somewhat compromising on accuracy, provides a visual synopsis of a large dataset in an engaging manner that draws readers in to investigate further. Eye-catching graphics can be a significant advantage in the digital age with the prevalence of fast-paced reporting.
The chart above from Storytelling with Data shows music sales over the years in the US, divided by format. We notice that there is no y-axis scale to show precise numbers, but we see the rise and fall of the different formats over the years, and the novelty and colors make the graph very engaging.
- By Hamsini Sukumar