Inforiver

Treemaps 101 - A detailed introduction   

Treemaps are most intuitively understood when seen rather than described, so here’s an example of a treemap! 

treemap-visualization-of-global-biomass-distribution

As seen here, treemaps use rectangles divided into sections to represent categories that add up to a whole, with the area of each section in proportion to the value of the corresponding category.  They can be thought of as a “squarified” pie chart that can show parts-to-whole relationships and can, in addition, represent hierarchies using nesting. The chart above is a parts-to-whole representation of the distribution of global biomass among the different types of organisms.

The area of each rectangle corresponds to the proportion of global biomass that it represents. For example, we see that plants far outweigh every other category, and in fact weigh more than all the other taxa put together! In this article, we will look at the hierarchical data structure needed for treemaps, then discuss their uses, advantages, and drawbacks. We will then examine some best practices that will help you make effective treemaps, and conclude with a brief discussion of related charts.

historical-treemap-occupations

Treemaps in their present form were developed in the 1990s by Ben Shneiderman from the University of Maryland as an alternative to tree diagrams to represent hierarchical information, specifically to visualize file directories on a computer. The image above shows such a chart, with the nested rectangles representing the hierarchy of folders and the files therein.  Older “treemap-like” representations do exist, like this chart below from the 1870 Statistical Atlas of the United States, showing the occupational patterns of various states. 

hierarchical-data-structure-treemap

Hierarchical data structures and trees 

A hierarchy is a nested list with multiple levels. An example of a hierarchical list is shown here, depicting the hierarchy at a company. 

  • Manager X
    • Team 1 Lead
      • Employee 1
      • Employee 2
      • Employee 3
    • Team 2 Lead
      • Employee 4
      • Employee 5
      • Employee 6
      • Employee 7
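
As a rough sketch, a hierarchy like this one could be held in a nested structure where each node lists its children. The dictionary layout and the `levels` helper below are illustrative assumptions, not a format required by any particular charting tool.

```python
# Hypothetical nested representation: each key is a node, each value holds its children.
hierarchy = {
    "Manager X": {
        "Team 1 Lead": {"Employee 1": {}, "Employee 2": {}, "Employee 3": {}},
        "Team 2 Lead": {
            "Employee 4": {},
            "Employee 5": {},
            "Employee 6": {},
            "Employee 7": {},
        },
    }
}


def levels(tree):
    """Group node names by depth, top level first."""
    result = []
    current = [tree]
    while current:
        names = [name for node in current for name in node]
        if not names:
            break
        result.append(names)
        # Move one level down: collect the children of every node on this level.
        current = [child for node in current for child in node.values()]
    return result


for depth, names in enumerate(levels(hierarchy)):
    print(depth, names)
```

Each entry returned by `levels` corresponds to one level of the hierarchy: the manager, then the team leads, then the employees.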

This type of hierarchy is usually represented as a tree diagram (which is different from a treemap!) as follows: 

treemap-limited-entities-plotted

We may represent this hierarchy using nested rectangles in the following steps. The top level of the hierarchy (Manager X) is represented by the whole rectangular area, as in the image below.

company-hierarchy-tree-diagram

The second level of hierarchy consists of the two team leads that are under the manager. The rectangles representing these can be nested inside the area of the rectangle representing Manager X. 

nested-rectangles-company-hierarchy

Now, the last level of hierarchy showing the employees in each team can be nested once again inside the rectangles representing the leads. 

treemap-hierarchy-level2-team-leads

Thus, treemaps use recursive nesting to represent hierarchies. In reality, each of these rectangles would be sized according to the value that it represents. The data structure for a treemap should have the form shown below, with the value for each higher level usually calculated as the sum of the values in the level below it.

sunburst-chart-hard-drive-folder-hierarchy
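
As a rough illustration of the structure and nesting described above, the sketch below stores values only on the leaves, derives each higher level as the sum of its children, and then nests rectangles with a simple slice-and-dice split that alternates direction at each level. The node layout, the employee values, and the function names `total` and `layout` are assumptions made for illustration; charting tools typically use more refined layout algorithms.

```python
# Hypothetical tree: leaves carry a "value"; inner nodes carry "children".
tree = {
    "name": "Manager X",
    "children": [
        {"name": "Team 1 Lead", "children": [
            {"name": "Employee 1", "value": 30},
            {"name": "Employee 2", "value": 20},
            {"name": "Employee 3", "value": 10},
        ]},
        {"name": "Team 2 Lead", "children": [
            {"name": "Employee 4", "value": 15},
            {"name": "Employee 5", "value": 15},
            {"name": "Employee 6", "value": 5},
            {"name": "Employee 7", "value": 5},
        ]},
    ],
}


def total(node):
    """Value of a node: its own value for a leaf, else the sum of its children."""
    if "children" not in node:
        return node["value"]
    return sum(total(child) for child in node["children"])


def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: split the rectangle among the children in
    proportion to their totals, alternating the split direction per level."""
    if out is None:
        out = []
    out.append((node["name"], depth, x, y, w, h))
    node_total = total(node)
    offset = 0.0
    for child in node.get("children", []):
        share = total(child) / node_total
        if depth % 2 == 0:  # even depth: split along the x direction
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:  # odd depth: split along the y direction
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


rects = layout(tree, 0, 0, 100, 60)
for name, depth, x, y, w, h in rects:
    print("  " * depth + f"{name}: x={x:.1f} y={y:.1f} w={w:.1f} h={h:.1f}")
```

Because every child receives a share of its parent's rectangle equal to its share of the parent's total, the area of each leaf rectangle stays proportional to its value.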

What makes treemaps a good choice? 

To summarize, treemaps are ideal for datasets with the following properties: 

  1. Part-to-whole relationships 

As discussed earlier, treemaps can be seen as a rectangular version of the pie chart, and like a pie chart, they can be used to see how categories add up to a total. In fact, it may be easier for us to estimate rectangular areas than the slices or angles of a pie chart. The following chart, for example, shows how different counties in Texas voted during the 2016 presidential election. Each county is sized by population, and we see a parts-to-whole view of how the populace voted.

treemap-part-to-whole-relationship

  2. Hierarchical relationships

Treemaps were originally made for hierarchies, as in the chart below that divides global carbon emissions first into continents, each of which is colored differently, and then into countries within each continent, showing how each entity has contributed to emissions. This represents two levels of hierarchy: the top level with continents, and the second level with continents divided into countries according to their contribution.

treemap-global-carbon-emissions-hierarchy

  3. Compactness for large trees

One of the biggest advantages of a treemap is its ability to represent a very large hierarchy in a very compact space thanks to the nested rectangles. For example, we may want to know how the space on a hard drive is divided into different folders. This can be represented using a tree as shown, but we see that the tree is so large that we can barely read the labels. 

treemap-estimation-&-comparing-challenge

Instead, we may represent the same data using a treemap as follows. Notice how nesting the different levels of the hierarchy keeps the map very compact.

treemap-c-drive

  4. Hierarchies + numerical values

In the above example, we also see that while the tree only showed us the hierarchy, the treemap can also represent the size of each folder using the area of the corresponding rectangle. For the tree, we would instead need to print the value of each category, which is much more cumbersome to read and compare. Few other visualization techniques allow us to visualize both hierarchies and quantities.

  5. The overall pattern takes precedence over accurate reading of values

Treemaps work well when the focus is on the overall patterns, rather than an accurate reading of each value. The chart below shows the causes of child mortality, and how the number of deaths from each cause has changed between 1990 and 2017. The darker rectangles represent the data from 1990, while the lighter rectangles represent 2017. The message of this chart is clear: in almost every category, childhood mortality has decreased since 1990.

treemap-child-mortality-causes

Disadvantages of treemaps 

Treemaps also present significant difficulties: 

  1. Difficulty in estimating and comparing areas 

The primary disadvantage of using a treemap is our limited ability to estimate areas accurately because of how our visual perception works. This means that any chart that uses area as a way of encoding values is read much less accurately than one that, for instance, uses height measured from a common baseline.  Try to answer the following questions about the chart below. Which area is larger – that of C or E? Two of these rectangles are of the same size as each other, but smaller than all the other values. Which ones are they and how much smaller are they? 

reading-areas

You will likely find it difficult to answer these questions. In fact, C and E are of the same size, as are all the rectangles except A and B, which are smaller than the rest by 10%!  
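
To see why such comparisons are hard, it can help to work through the arithmetic. The sketch below uses made-up numbers (not the values behind the chart above) to show that rectangles with very different shapes can have identical areas, while a rectangle that is 10% smaller in area differs only slightly in its dimensions. The `rectangle` helper and the sample shapes are assumptions for illustration.

```python
import math


def rectangle(area, aspect):
    """Width and height of a rectangle with the given area and width:height ratio."""
    height = math.sqrt(area / aspect)
    width = aspect * height
    return width, height


# Hypothetical shapes: "C" and "E" share the same area; "A" is 10% smaller.
shapes = {"C": (100, 1.0), "E": (100, 4.0), "A": (90, 2.0)}
for label, (area, aspect) in shapes.items():
    w, h = rectangle(area, aspect)
    print(f"{label}: {w:.1f} x {h:.1f} (area {w * h:.0f})")
```

A square and a long, flat rectangle of equal area look quite different, which is one reason readers struggle to rank areas by eye.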

  2. Unfamiliarity in reading hierarchies

The hierarchies shown in a treemap may also be unfamiliar to many readers and may be harder to read than in a tree diagram, where we see each level of hierarchy clearly, with parent and child nodes clearly indicated. 

Best practices for treemaps 

  1. Aspect ratio of rectangles 

There are many algorithms that generate treemaps from a given dataset, but “squarifying” algorithms that try to make the rectangles as close to a square as possible are popular. This is because labeling and reading long and thin rectangular areas is much more difficult than relatively square areas. In general, it is better to stick to an aspect ratio that is as close to 1:1 as possible. 
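
The sketch below illustrates the general idea behind squarifying layouts with a greedy row-filling heuristic: areas are added to the current row only while doing so does not worsen the row's worst aspect ratio. It is a simplified, single-level sketch under stated assumptions (positive values, sorted from largest to smallest, already scaled so that they sum to the area of the rectangle), and the names `worst_ratio` and `squarify` are hypothetical rather than taken from any specific tool.

```python
def worst_ratio(row, side):
    """Worst aspect ratio (>= 1) among the areas in `row` stacked along `side`."""
    total = sum(row)
    return max(
        max(side * side * a / (total * total), total * total / (side * side * a))
        for a in row
    )


def squarify(areas, x, y, w, h):
    """Lay out positive areas (largest first, summing to w * h) as rectangles.

    Returns a list of (x, y, width, height) tuples in input order.
    """
    remaining = list(areas)
    rects = []
    while remaining:
        side = min(w, h)  # rows are stacked along the shorter side
        row = [remaining.pop(0)]
        # Keep adding areas while that does not worsen the row's worst ratio.
        while remaining and worst_ratio(row + [remaining[0]], side) <= worst_ratio(row, side):
            row.append(remaining.pop(0))
        thickness = sum(row) / side
        if w >= h:  # fill a column at the current x, then shrink the free width
            cy = y
            for a in row:
                rects.append((x, cy, thickness, a / thickness))
                cy += a / thickness
            x, w = x + thickness, w - thickness
        else:  # fill a horizontal strip at the current y, then shrink the free height
            cx = x
            for a in row:
                rects.append((cx, y, a / thickness, thickness))
                cx += a / thickness
            y, h = y + thickness, h - thickness
    return rects


# Hypothetical values, already sorted from largest to smallest.
values = [6, 6, 4, 3, 2, 2, 1]
rects = squarify(values, 0, 0, 6, 4)
for value, (rx, ry, rw, rh) in zip(values, rects):
    ratio = max(rw / rh, rh / rw)
    print(value, round(rx, 2), round(ry, 2), round(rw, 2), round(rh, 2), "ratio", round(ratio, 2))
```

For comparison, slicing the same 6 x 4 rectangle into seven side-by-side strips would leave the smallest value in a strip 0.25 units wide and 4 units tall, an aspect ratio of 16:1.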

  2. Sorting by size

Sorting the rectangles by size allows the reader to parse through information quickly and find the major contributors to a metric. The chart below, for example, sorts the counties in Texas by population and places the largest ones on the top left, tapering down to smaller counties at the bottom right. This gives us a bird's-eye view of how Texas voted during the 2016 presidential election, with larger counties leaning more towards Clinton than smaller ones.

treemap-part-to-whole-relationship

  3. Color for categorical encoding

Color can be an important tool for secondary encoding by category (while the primary encoding of value is given by the area of the rectangles). This chart, for example, classifies the causes of child mortality, with causes grouped into broad categories, each of which is shaded in a different color.   

treemap-child-mortality-causes

The same technique is used in the Texas voting example, where a blue and red color scheme shows the political inclination of each county. Here, the primary encoding by size follows population, and the secondary encoding by color follows voting percentages.

treemap-part-to-whole-relationship

  4. Limit the number of entities plotted

As discussed earlier, treemaps can handle many more entities than a pie chart in a parts-to-whole representation. Despite this, too many small categories can make reading the chart very difficult in addition to making labeling of each category hard, as in the chart below. One solution to the latter problem is to use tooltips for the smaller areas, but smaller values may still remain obscure. Limit the number of categories plotted as well as the number of levels of hierarchy to keep your chart legible and clean. 

treemap-part-to-whole-relationship
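
One simple way to limit the number of categories is to merge the smallest ones into a single bucket before plotting. The sketch below assumes a 5% cutoff and an "Other" label; both are arbitrary choices for illustration, and the `group_small` helper and sample data are hypothetical.

```python
def group_small(values, min_share=0.05, other_label="Other"):
    """Merge categories below `min_share` of the total into one bucket.

    Assumes positive values and that `other_label` is not already a category.
    """
    total = sum(values.values())
    kept = {}
    other = 0
    for name, value in values.items():
        if value / total >= min_share:
            kept[name] = value
        else:
            other += value
    if other:
        kept[other_label] = other
    return kept


# Hypothetical category values.
data = {"A": 50, "B": 30, "C": 12, "D": 4, "E": 3, "F": 1}
print(group_small(data))
```

The total is preserved, so the parts-to-whole reading of the chart is unchanged; the detail of the merged categories can still be offered through tooltips or a drill-down.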

Alternatives to treemaps

  1. Marimekko chart 

A Marimekko chart can be thought of as a variation on the bar chart in which the widths of the bars are scaled according to a second data value. The following chart is a variation of the 100% stacked bar chart: the x-axis shows different countries, with bar widths scaled to their populations, and the y-axis shows the percentage of each country's population that lives on less than or more than $30 per day. We thus get a parts-to-whole view of the economic circumstances within each country's population, as well as an understanding of how the global population is divided between these categories.

treemap-compactness-example
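
The geometry of such a chart can be sketched in a few lines. The country names, populations, and shares below are invented for illustration, and the `marimekko` helper is a hypothetical name; the point is only that bar width follows population while the split within each bar follows the percentage.

```python
# Hypothetical data: (name, population in millions, share living on less than $30 per day).
countries = [
    ("Country A", 200, 0.90),
    ("Country B", 100, 0.50),
    ("Country C", 100, 0.20),
]


def marimekko(rows, chart_width=100.0, chart_height=100.0):
    """Return one (name, x, width, below_height, above_height) tuple per bar."""
    total_pop = sum(pop for _, pop, _ in rows)
    bars = []
    x = 0.0
    for name, pop, share_below in rows:
        width = chart_width * pop / total_pop  # width scales with population
        below = chart_height * share_below     # lower segment: share below the line
        bars.append((name, x, width, below, chart_height - below))
        x += width
    return bars


bars = marimekko(countries)
for name, x, width, below, above in bars:
    print(f"{name}: x={x:.0f} width={width:.0f} below={below:.0f} above={above:.0f}")

# The area of all lower segments, as a fraction of the chart, is the overall share.
overall = sum(width * below for _, _, width, below, _ in bars) / (100.0 * 100.0)
print(f"overall share below the line: {overall:.3f}")
```

Because both width and height carry data, the area of each segment is proportional to the number of people it represents.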
  2. Sunburst chart

Sunburst charts use concentric rings to represent hierarchies, with the highest level towards the center and the lower levels expanding outwards. Each ring is divided into sections whose angle (and hence area within that ring) is proportional to the value of the corresponding category. The chart below shows the same information as discussed earlier on how a hard drive is divided into folders.

treemap-data-structure
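
As a minimal sketch of how such a chart is laid out, the code below computes the start and end angle of every section for a two-level hierarchy, assuming each section's angular sweep is proportional to its value. The folder names, sizes, and the `sunburst_angles` helper are hypothetical.

```python
# Hypothetical folder sizes in GB, two levels deep.
drive = {
    "Users": {"Documents": 40, "Pictures": 60},
    "Programs": {"Games": 70, "Tools": 30},
    "System": {"Core": 50},
}


def sunburst_angles(tree):
    """Return (name, ring, start_deg, end_deg) for a two-level hierarchy.

    Ring 1 is the inner ring (top-level folders); ring 2 is the outer ring.
    """
    total = sum(sum(children.values()) for children in tree.values())
    arcs = []
    angle = 0.0
    for parent, children in tree.items():
        parent_start = angle
        for child, size in children.items():
            sweep = 360.0 * size / total
            arcs.append((child, 2, angle, angle + sweep))
            angle += sweep
        # The parent spans exactly the angles covered by its children.
        arcs.append((parent, 1, parent_start, angle))
    return arcs


for name, ring, start, end in sunburst_angles(drive):
    print(f"ring {ring} {name}: {start:.1f} to {end:.1f} degrees")
```

Each parent section sits directly inside the sections of its children, which is how the chart conveys the hierarchy.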

By - Hamsini Sukumar
