Multimedia Learning

By: Richard Mayer



A lot of things are scientifically proven in this book – like the fact that you learn better with the combination of images and text rather than text alone, and that we all have rather limited capacities for processing information. Most importantly, if you are ever in a position to give a presentation where you want to make a lasting impression, you’ll walk away from this summary with the tools to do it.

What is Learning?

If we are going to unpack how people can learn better using multimedia principles, we need to understand how you and I learn. Learning, as defined by Mayer, is “a change in knowledge attributable to experience”, and it is a uniquely personal experience. For instance, what you take out of this summary and what your colleague takes out of it would differ in some way.

What we learn is broken down into five different buckets:

  • Facts – knowledge of a place or event. For instance, Prince William and Kate Middleton got married on a dreary London day on April 29th, 2011.
  • Concepts – knowledge of categories, principles, or models such as knowing how a car engine works.
  • Procedures – knowledge of step-by-step procedures such as performing open heart surgery or putting on your pants in the morning (one leg, then the other).
  • Strategies – knowledge of general methods for orchestrating your knowledge to achieve your goals. For instance, knowing which shots to hit on a very windy day on the golf course.
  • Beliefs – thoughts about yourself or how your learning works. For instance, thinking that “I’m no good at English”.

Learning always involves a change in what you know. However, you don’t always have to acquire new knowledge; you may just rearrange existing knowledge and beliefs. The funny thing about knowledge is that there is little we can do to measure it directly. We can only make inferences based on your behaviour, which usually means your performance on tests designed to measure what you recall and understand.

Recall tests how much we remembered. In the real world, this type of learning is not very useful, because the profits don’t go to those who remember the most stuff. Understanding is the golden ticket, and happens when we can create a mental model from the information that was presented.

Creating mental models and then using them to your advantage is what the best and the brightest in business do. This is the type of learning that these multimedia design principles aim to enable.

How we process information

Now that we know what learning is, we can explore how we process information. These are the assumptions that the building blocks of learning are built upon, and they might surprise you.

First, we need to understand that we process visual information and auditory information separately. The distinction between visual information and auditory information is in how we take it in.

If we start processing the information with our eyes – which we do with images, on-screen text and video – we use our visual information pathways. If we start processing the information with our ears – which we do with narration, background music and other sounds – we use our auditory information pathways.

Some scientists would include text on a page in the auditory information pathways, but the crux of the argument is that we process visual and auditory information differently.

Second, we have a limited capacity to process information. For anybody who struggles with remembering names and phone numbers, this should come as no surprise. In particular, we are limited to a certain amount of information in each channel. For instance, when we are reading a book, we are only able to keep a few words at a time in our working memory.

The same goes for images. The most common test for how much we are able to hold in our memories is the digit span test, where we are presented with randomly selected digits and are asked to repeat them back, in order. A similar test can be done with images.

The average person can remember between 5 and 7 chunks of information at a time. Because of this limited ability to process information, we are constantly making decisions on what to focus our attention on, and what not to.

Lastly, we actively process information in order to try and make sense of it. In essence, we are attempting to make mental models of the information so that it becomes useful to us. There are many different ways that we process incoming information.

We might try and put the information into a process so we can follow step-by-step instructions in the future (the hip bone is connected to the thigh bone).

We might try and generalise the information so that we can apply it to more than the current situation.

Or we might compare what we are learning to something else that we already know.

These three assumptions of how we process information – dual channel, limited capacity and active processing – play a huge role in how we develop presentations that help people learn.

How to Reduce Extraneous Processing

The first step in crafting a multimedia presentation that is scientifically proven to help people learn better is to reduce extraneous processing – or simply put, to get rid of what isn’t necessary. There are five principles involved in this.

  • Coherence – In order to help people learn, we need to get rid of unneeded text, graphics, images and sounds. There is evidence that people will learn more from a summary than they will from a full text – and not only do they remember more, they better understand the topics. Any elaboration should come after the learner has developed a mental model of the material. There are some arguments for adding interesting information, because it helps people enjoy learning more and thus makes them more likely to want to learn again. However, given an equal desire to learn, these elements do not add to the learning experience.
  • Signalling – People learn better when cues that highlight the organisation of the material are added. Things like chapter outlines before the topic is presented, headings to separate different sections of material and vocal emphasis on important words are helpful here.
  • Redundancy – People learn better from graphics and narration than they do from graphics, narration and printed text. Essentially, if you are going to do a presentation with a voice-over and slides, you shouldn’t put the text you are speaking on the slides as well. This overloads the visual channel, which has to take in both the images and the text, and confusion ensues. So, the next time you see somebody presenting a PowerPoint and reading directly from the slide, tell them about the redundancy principle.
  • Spatial Contiguity – People learn better when text and images are positioned close together rather than far apart.
  • Temporal Contiguity – People learn better when text and images are presented at the same time, rather than in succession.

How to Manage Essential Processing

Now that we’ve dealt with getting rid of the unnecessary, we can focus on how to maximise how people learn the necessary information. Essential processing is taking what you are learning and representing it in your working memory, where you try and make sense of what you are learning based on what you already know.

There are three principles that will help us create effective presentations here.

Segmenting – A multimedia presentation that can be broken down into user-paced segments should be. For instance, a presentation that has five parts should be broken down into five sections, and the user should control when the presentation moves on to the next one. With video, this is simple, as the user can pause the video and re-watch anything that they don’t completely understand.

However, other presentations such as a live webinar or seminar don’t offer this flexibility. So, a good practice would be to make recordings of these events available afterwards. There are limits to where this principle actually helps people learn better: it applies mainly when the material is complex or the learner has very little prior knowledge.

Pre-training – People learn better from a multimedia message when they already know the names and characteristics of the main concepts. This is why you need to take an algebra class before you take an advanced statistics class: the statistics class won’t slow down to teach you all of the algebra you need to know beforehand.

In these situations, you are building a model on top of another model. Not understanding the algebra is going to lead to brain overload, and you won’t learn anything. Again, however, in a multimedia presentation where somebody has the ability to pause and then look up the gaps in their knowledge (on Google, for instance), this becomes less of an issue.

Modality – People learn more deeply from images and narration than they do from images and text. This creates three different levels of learning: text alone, then text with images, and then narration with images, with the last being by far the most effective.

How to Foster Generative Processing

Generative processing is the act of taking the information you are processing and making sense of it. You take the information and make mental models, and you then see how that model relates to what you already know.

This is the last and essential step to truly understanding new material, and one of the biggest challenges to it is generative processing underutilisation. If you are bored, get lost in the material or don’t like the presentation in general, you will typically tune out and not attempt to use your generative processing powers. And this, my friends, is a bad thing.

There are two principles that will help us foster this important last step.

Multimedia – People learn better from words and images than from words alone. Thinking back to the fact that we have two ways to process information (auditory and visual), words and pictures together allow us to create two working models of the material simultaneously and build connections between the two. The verbal and visual information will never be equivalent.

For instance, no matter how much detail I go into, if I’m explaining a beautiful sunset to you verbally, both you and I are going to form different pictures in our minds. And these, in turn, would be different from the actual image I’m describing.

So pictures might not be worth a thousand words, but they are certainly different than the words themselves. Keep in mind that as human beings, we haven’t always had language to process, but we’ve always had imagery to process.

Personalisation – People learn better from multimedia when words are in a conversational style rather than a formal style. If you want to do this in your own presentations, simply use “I” and “you” in the presentation frequently, because it makes it seem like you are talking directly to the audience.

Why does this work on us so well? We have a social response embedded in us that makes us pay attention when somebody is talking directly to us. Have you ever turned around when somebody said your name in a crowded place, even though they were clearly talking to somebody else?

That’s the social response in action. Not only do we pay attention more, but we also make a concerted effort to make sense of what is being said to us. Overall, you just learn a lot better.

NOTE: a computerised voice does not produce this effect, so don’t use one in place of a human voice.


So there you have it – the rules for creating a multimedia presentation that truly allows you to impart information in a way that is remembered, and also understood. How can you use this information?

Anything to do with training your employees can be greatly enhanced with this method. If you give presentations, this could be extremely helpful. It would even be beneficial in complex sales presentations where your prospect needs to understand your product before they can buy it.

Just make sure to use your newfound powers for good, and not evil.