Kirkpatrick Level 3 - Free Evaluation Examples

2022.09.22 Jonathan Deller

The Kirkpatrick Model is one of the world’s most popular and effective training evaluation systems. It was invented by Don Kirkpatrick and introduced via a series of journal articles in 1959. It grew in popularity following the publication of Kirkpatrick's 1994 book, called Evaluating Training Programs. As you may have heard, the Kirkpatrick model helps organizations of all sizes to evaluate the effectiveness of their training courses and programs.

The Kirkpatrick approach to training evaluation is divided into four levels:

Level 1: Reaction
Level 2: Learning
Level 3: Behavior
Level 4: Results

In this post, we’ll be looking at examples of evaluation that you can use when conducting Level 3 - Behavior assessments using the Kirkpatrick model.

Why use the Kirkpatrick model?

The Kirkpatrick system is one of the most widely used training evaluation models because it helps companies and organizations of all sizes to quickly find out whether a particular training has met its goals.

The Kirkpatrick Model shows you, at a glance:

How the trainees responded to the training
Whether the training produced learning
Whether the learning translated into workplace changes
To what extent the stakeholders’ expectations were met

What are the four levels of the Kirkpatrick Model?

It’s worth briefly recapping the four levels to give an overview of the model and see how the Level 3 evaluation fits into the overall picture.

Level 1: Reaction

The Reaction level tells you what the trainees (the course participants) thought of the training. How did they feel about it? What were their overall impressions?

Level 2: Learning

The Learning evaluations show you what, if any, learning took place. Did the participants learn something from the training? If so, what?

Level 3: Behavior

The Behavior level aims to find out whether the training produced on-the-job changes. In other words, did the participants use the knowledge and skills from the training when they went back to work?

Level 4: Results

The final level, Results, looks at whether the stakeholders’ expectations were met. The stakeholders are the people overseeing the organization, such as management. What were their goals for the training? Were these goals met?

Kirkpatrick is one of the most popular training evaluation models because it simplifies the process of analyzing training.

If training isn’t producing results, the Kirkpatrick model helps you see where things are going wrong.

If the training is failing to meet stakeholders’ expectations, you can quickly see what needs to be improved to remedy the situation.

What are the main Kirkpatrick Level 3 evaluation strategies?

Now that we’ve seen where Level 3 evaluations fit into the overall picture, it’s time to look at the actual strategies. A Level 3 evaluation strategy is an approach that helps you discover whether the training produced changes in the workplace.

In simple terms:

Have the trainees started using the knowledge, understanding or skills that they gained during training to help with their job?

Some examples of Level 3 evaluation strategies include:

Workplace observations
Peer observations
Self-reflections
Pre- and post-training assessments and assessments on the job
Pre- and post-training self-assessments and self-assessments on the job

When should Kirkpatrick Level 3 evaluation strategies be used?

There is no hard and fast rule about when to use these strategies. In general, Level 3 evaluations should take place a while after the training has finished. This issue was specifically addressed by Don Kirkpatrick’s son James and James’ wife Wendy when they co-authored an updated version of the Kirkpatrick model, The Kirkpatrick Four Levels: A Fresh Look after 50 Years 1959-2009. They wrote that there is no set time period after which the training should be evaluated. It depends on many factors, including:

The design of the evaluation
The measurement indicators
The data collection sources and methods

The latest revised “new” Kirkpatrick Evaluation Model notes that behavioral changes take time to materialize. The speed at which the changes take effect in the workplace will depend on the nature of the training and the demands it places on trainees. Collect your indicators as soon as you reasonably can after the training, but not so early that behavior has not yet had time to change. Unless you decide to change it, Kodo Survey’s default is three months after the training took place.
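The three-month default is easy to turn into a concrete follow-up date. Below is a minimal sketch, assuming a hypothetical helper called add_months (it is not part of Kodo Survey or of Python's standard library) and an invented training end date:

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (e.g. 30 Nov + 3 months), the day is
    clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Hypothetical training end date; three months is the default named in the text.
training_end = date(2022, 9, 22)
follow_up = add_months(training_end, 3)
print(follow_up)  # 2022-12-22
```

If behavior in your setting changes faster or slower, pass a different number of months rather than treating three as fixed.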


Which factors determine whether a Level 3 evaluation will be successful?

The success of a Level 3 evaluation strategy depends largely on the design of the evaluation.

For example:

Which evaluation approach is being used?
How was the assessment designed?
Which data collection sources and methods are being focused on?
Which measurement indicators are being used?
What type of data analysis and reporting is used?

The success of a level 3 evaluation also depends on the drivers of the strategy. Drivers are key factors that will determine the type of evaluation that is possible.

For instance:

Interest
What is the specific area of interest?
Resources
Which resources do you have available?
Cost
What is your budget and how much would the evaluation cost?
Business needs
What are the needs of your business?
Time
How much time can you devote to the evaluation?

As you can see, there is no single best type of evaluation that suits every organization. Every organization, no matter its size, is constrained by a number of key drivers such as cost, time and resources.

Look:

The best type of evaluation may be outside of your budget and therefore unattainable. You may have to settle for a different type of evaluation that is more cost-effective to implement.

Possible evaluation approaches for assessing behavior change

To help you develop your own Kirkpatrick Level 3 evaluation strategy, we’ve created a simple checklist.

It will help you consider the four different categories of evaluation:

Design
Data Collection Source
Data Collection Method
Metrics/Indicators

Here, we'll briefly explain what these categories are.

Design
This is how you conduct the evaluation. Do you assess it once after the training or in some other way?
Data Collection Source
This means where the data is coming from. It could be sourced from the trainees themselves, the management or someone else such as a trained evaluator. In other words, who is doing the evaluation?
Data Collection Method
This category means what method you’ll use to collect the data.
Metrics/Indicators
Lastly, this category looks at which metrics you’ll use.

Unless otherwise noted, the approaches in the following checklist are applicable to any field setting, from everyday workplaces and real-life events to simulations and exercises.

Category #1. Design

Possible Level 3 evaluation options:

Post-training only
Pre- and Post-training
Pre- and Post-training and on the job
Multiple Repeat Measures
Non-equivalent Comparison Group
Randomized Control Trial*

Category #2. Data Collection Source

Possible Level 3 evaluation options:

Self-report
Peer-Evaluation
Supervisor Evaluation
Evaluator or Trainer Observer

Category #3. Data Collection Method

Possible Level 3 evaluation options:

Survey
Observation
Interview
Embedded in Training*

Category #4. Metrics/Indicators

Possible Level 3 evaluation options:

KAIB (Knowledge, Attitude, Intention and Behavior)
Goal-based behaviors
Competency-based behaviors
Intention goals and beliefs**

* Not possible in a real-life or workplace setting
** Intention goals are goals that the participants plan to make using the knowledge and skills they acquired during their training (Basarab, 2011).
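If you want to keep evaluation plans consistent across teams, the checklist can be written down as a small data structure. The sketch below is only one way to do it: the option names are copied from the checklist, while CHECKLIST, NOT_FOR_WORKPLACE and check_plan are hypothetical names introduced for illustration.

```python
# Checklist options copied from the article; the structure is an assumption.
CHECKLIST = {
    "Design": [
        "Post-training only",
        "Pre- and Post-training",
        "Pre- and Post-training and on the job",
        "Multiple Repeat Measures",
        "Non-equivalent Comparison Group",
        "Randomized Control Trial",
    ],
    "Data Collection Source": [
        "Self-report",
        "Peer-Evaluation",
        "Supervisor Evaluation",
        "Evaluator or Trainer Observer",
    ],
    "Data Collection Method": [
        "Survey",
        "Observation",
        "Interview",
        "Embedded in Training",
    ],
    "Metrics/Indicators": [
        "KAIB",
        "Goal-based behaviors",
        "Competency-based behaviors",
        "Intention goals and beliefs",
    ],
}

# Options marked with * in the checklist: not possible in a workplace setting.
NOT_FOR_WORKPLACE = {"Randomized Control Trial", "Embedded in Training"}


def check_plan(plan, workplace=True):
    """Return a list of problems with a plan; an empty list means it is fine."""
    problems = []
    for category, options in CHECKLIST.items():
        choice = plan.get(category)
        if choice is None:
            problems.append(f"{category}: no option chosen")
        elif choice not in options:
            problems.append(f"{category}: '{choice}' is not on the checklist")
        elif workplace and choice in NOT_FOR_WORKPLACE:
            problems.append(f"{category}: '{choice}' is not possible in a workplace setting")
    return problems


# The simplest option from each category.
simple_plan = {
    "Design": "Post-training only",
    "Data Collection Source": "Self-report",
    "Data Collection Method": "Survey",
    "Metrics/Indicators": "KAIB",
}
print(check_plan(simple_plan))  # []
```

A plan that picks one valid option per category passes; anything missing, misspelled or marked with an asterisk is reported back as a problem.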

Free Kirkpatrick Level 3 Evaluation Examples

So, what does it look like when you put everything in the above checklist into practice? Here are four different examples of how you’d fit everything together.

Example 1 is a simple, cost-effective evaluation
Example 2 is a slightly more complex evaluation
Example 3 is a complex and costly evaluation
Example 4 is a middle-ground approach that looks at both learning and behavioral development

Example #1: A post-test survey

The simplest and least expensive option is a post-test survey given to participants after a training, exercise or simulation.

As you can see, we simply go through the four categories in the checklist above and pick the simplest option for each one:

Design: Post-training only
Data Collection Source: Self-report
Data Collection Method: Survey
Metrics/Indicators: KAIB

To conduct this type of Level 3 evaluation, you’d write the questions for the survey and email them to the participants. You’d then collate the data and draw conclusions. As this survey is self-reported, there’s no need for additional staff such as evaluators.

Advantages: Simple, cost-effective, uncomplicated

Disadvantages: It’s self-reported, so it relies on trust, and the accuracy of the responses isn’t independently verified.

Some example questions:

For a simple post-test survey, you might ask the participants questions such as:

Please write the name of the training you completed recently

Describe how the training related to your job responsibilities.

Have you used any aspects of the training in the workplace?

Look at the following KAIBs (Knowledge, Attitudes, Intentions and Behaviors)
(Insert KAIBs)
Please indicate your level of competence using the following scale:

1 no knowledge
2 not mastered
3 requires more knowledge or supervision
4 requires little supervision
5 mastery level
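Collating the answers can be as simple as averaging the ratings for each KAIB. Here is a minimal sketch, assuming each response arrives as a mapping from a KAIB name to a rating on the 1 to 5 scale above; the KAIB names and ratings are invented for illustration, and summarize is a hypothetical helper.

```python
from statistics import mean

# The five-point competence scale from the survey.
SCALE = {
    1: "no knowledge",
    2: "not mastered",
    3: "requires more knowledge or supervision",
    4: "requires little supervision",
    5: "mastery level",
}


def summarize(responses):
    """Average the 1-5 ratings for each KAIB across all respondents."""
    ratings = {}
    for response in responses:
        for kaib, rating in response.items():
            if rating not in SCALE:
                raise ValueError(f"rating {rating!r} is not on the 1-5 scale")
            ratings.setdefault(kaib, []).append(rating)
    return {kaib: mean(values) for kaib, values in ratings.items()}


# Invented example data: two participants rating two hypothetical KAIBs.
responses = [
    {"Gives structured feedback": 4, "Documents incidents": 2},
    {"Gives structured feedback": 5, "Documents incidents": 3},
]
summary = summarize(responses)
for kaib, average in summary.items():
    print(f"{kaib}: {average}")
```

Averages are only a starting point; with few respondents it is worth reading the individual ratings as well.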

Example #2: A peer-observation

A slightly more complex example of Level 3 evaluation is a peer-observation. It has the following features:

Design: Post-training only
Data Collection Source: Peer-Evaluation
Data Collection Method: Observation
Metrics/Indicators: Goal-based behaviors and/or Competency-based behaviors

To conduct this type of evaluation, you would need to create a peer-observation form. Ask the trainees to observe each other in the workplace and complete this form based on the skills and competencies they observe being put into practice.

‘Peer-observation’ can be a fairly flexible term; it could be colleagues in the same department or an inter-departmental arrangement. You would need to collate the responses and use this data to draw conclusions about the effectiveness of the training.

Advantages: Relatively simple, and if the right peers are selected for the evaluation it’s more objective than self-reported evaluations.

Disadvantages: Pulling employees away from their daily routines and responsibilities to conduct peer observations can be disruptive. Unlike a trained evaluator, colleagues may not be able to accurately identify KAIBs.

Some example questions:

For a peer-observation form, you may want to include questions such as:

Write the name and title of the person you are observing and the date and time the observation took place.

According to the competencies and/or KAIBs, indicate which you observe being used in the workplace.
(Insert KAIBs and use the 5-point scale above)

Example #3: A behavioral assessment conducted by a trained evaluator

By far the most costly, time-intensive and complex type of evaluation is one conducted by a trained evaluator or a Trainer Observer. This type of workplace assessment offers some of the most accurate results but is the least cost-effective option.

Here are the details:

Design: Post-training only, or Pre- and Post-training
Data Collection Source: Evaluator or Trainer Observer
Data Collection Method: Observation and/or interview
Metrics/Indicators: Goal-based behaviors and/or Competency-based behaviors

In terms of cost, there are various ways to make this type of evaluation more economical. For example, you could have the evaluator carry out just one post-training observation. If you add a pre-training evaluation and/or multiple repeat measures, the cost and complexity rise considerably.

Likewise, both an observation and an interview would cost more than just choosing one or the other.

Advantages: Highly accurate data, precise approach, little need for oversight.

Disadvantages: Expensive to implement, may disrupt normal working schedules.

Some example questions:

If you decide to hire an evaluator, they could assist you in writing the questions. They would simply need to know the areas of competence and/or the KAIBs that you are looking to evaluate. They can assist in identifying behaviors in the workplace and recording employees’ levels of competence.

Here are some example questions for the evaluator to ask:

Has the trainee under observation participated in any training over the past six months that may have helped them perform their responsibilities to a higher standard?

Did the training provide any lessons related to their role and responsibilities? If so, please describe them.

Is the trainee able to perform their tasks and duties? (You can list the specific duties here).

Look at the following KAIBs (Knowledge, Attitudes, Intentions and Behaviors) and indicate the trainee’s level of competence in each. (Include the KAIBs and the 5-point scale above.)

Example #4: Looking at both learning and behavioral development

This is the middle-ground approach. The direct cost is low since it is test-based, and the indirect costs are also low since it relies on self-evaluation rather than evaluation by others. Still, the quality is adequate: not as low as in our first example and not as high as in our third.

It uses the following features:

Design: Pre- and post-training test as well as one or several tests on the job
Data Collection Source: Self-report
Data Collection Method: Survey
Metrics/Indicators: KAIB

You need to write the questions for the survey and email them to the participants. You’d then collate the data and draw conclusions. As it is self-reported, it takes up nobody’s time but the learners’. Since it is survey-based, you can also make the process very efficient with reminders, analysis and so on.

Advantages: Simple, cost-effective, uncomplicated

Disadvantages: It’s self-reported, although the responses can be checked by comparing learning retention with reported behavior; a mismatch may indicate, for example, social desirability effects.

The questions:

You need to create a survey measuring KAIB before the training, straight after it and a while later, on the job. To make the answers more objective, you don’t simply ask whether the participants felt that the training helped them perform their job responsibilities. That would give you subjective data to analyze. To measure whether the participants have used the training, you instead work with the hypothesis that if they haven’t, they will have forgotten most of what they learned.

Working from that hypothesis, you ask, for example, single-choice or multiple-choice questions before and after the training to see whether the participants have learned anything. On the job, three or so months later, you ask the same questions again to see whether they still remember the material. If not, it’s likely they never used it.
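The comparison between the three test rounds can be expressed as a simple ratio. The sketch below is one possible way to score it, not the method used by Kodo Survey: retention_ratio and likely_used are hypothetical helpers, the 0.5 threshold is an arbitrary assumption, and the scores are invented.

```python
def retention_ratio(pre, post, follow_up):
    """Share of the learning gain (post - pre) still present at follow-up.

    Returns None when the participant showed no gain to retain.
    """
    gain = post - pre
    if gain <= 0:
        return None
    return (follow_up - pre) / gain


def likely_used(pre, post, follow_up, threshold=0.5):
    """Apply the hypothesis from the text: what is still remembered on the job
    was probably used. The 0.5 threshold is an assumption, not a standard."""
    ratio = retention_ratio(pre, post, follow_up)
    return ratio is not None and ratio >= threshold


# Invented test scores in percent: (before, straight after, on the job).
scores = {
    "Participant A": (40, 90, 85),
    "Participant B": (50, 80, 55),
    "Participant C": (70, 70, 70),
}
for name, (pre, post, follow_up) in scores.items():
    print(name, retention_ratio(pre, post, follow_up), likely_used(pre, post, follow_up))
```

Participant A keeps most of the gain and is flagged as likely to have used the training; Participant B has lost most of it; Participant C showed no learning gain, so retention cannot be judged.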

With Kodo Survey you measure the entire KAIB (knowledge, attitude, intention and behavior) using a method like this.

Conclusion

In this post, we’ve offered four evaluation examples for Level 3 of the Kirkpatrick model. Our checklist should help you determine the best type of evaluation for your budget, time frame, and needs.

But there’s more to learn!

Many training managers and L&D Directors report that they either lack access to data that enables them to measure learning impact or they don’t know how to analyze the data they have.

That’s why we created a white paper specifically about determining and optimizing the impact of training and development. Download it now and learn how to take action that increases the impact of your training and development portfolio while decreasing costs!
