Scheduling Reinforcement

Have you ever wondered how our behavior is conditioned? How does the timing of punishments and rewards affect our learning? In this lesson, we'll take a look at how reward scheduling can influence how fast we learn a behavior and how strongly it's reinforced.

Operant conditioning is usually based on the idea that we reward or punish specific behaviors. In real-world applications, we can't reward or punish behavior every time. Even if we could, research suggests that rewarding every instance of a positive behavior can actually be less effective than rewarding more selectively. So for training purposes, we have four different ways of spacing out the rewards and punishments:

fixed ratio
variable ratio
fixed interval
variable interval

	FIXED	VARIABLE
Ratio	after 3 times	after 2, 3 or 4 times
Interval	every 5 minutes	every 5, 10 or 15 minutes

Ratio refers to how many successes you need before you get a reward, or how many failures before you get a punishment.

With the fixed ratio, we deliver the reward or punishment after a set number of successes or failures. Let's say three. Every third time you do the behavior, we give you a reward or punishment.

With the variable ratio, we change it up. Sometimes we'll reward you twice in a row; sometimes you'll have to wait through a bunch of successes.
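To make the two ratio schedules concrete, here is a minimal Python sketch. The function names `fixed_ratio` and `variable_ratio` are made up for illustration, and the variable version assumes the required count is drawn at random from the 2, 3 or 4 responses shown in the table above; a real training program could vary the count in other ways.

```python
import random

def fixed_ratio(n):
    """Return a function that rewards every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0          # start counting toward the next reward
            return True
        return False
    return respond

def variable_ratio(choices, rng=None):
    """Return a function that rewards after a randomly chosen number of
    responses; the required number is redrawn after every reward."""
    rng = rng or random.Random()
    count = 0
    target = rng.choice(choices)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.choice(choices)  # the learner can't predict this
            return True
        return False
    return respond

fr3 = fixed_ratio(3)
print([fr3() for _ in range(9)])
# [False, False, True, False, False, True, False, False, True]

vr = variable_ratio([2, 3, 4])
print([vr() for _ in range(9)])  # pattern differs from run to run
```

With the fixed ratio, the reward lands on every third response like clockwork. With the variable ratio, the gap between rewards keeps changing, which is the unpredictability described above.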

Interval refers to the time between rewards. If that amount of time is predetermined and the same every time, it's a fixed interval. If, however, it's changing and unpredictable, it's a variable interval.
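The interval schedules can be sketched the same way in Python. This is only an illustration under a couple of assumptions: the names `fixed_interval` and `variable_interval` are hypothetical, time is passed in as a plain number of minutes rather than read from a clock, and the reward goes to the first response made once the waiting period is over.

```python
import random

def fixed_interval(period):
    """Reward the first response made at least `period` minutes
    after the previous reward."""
    next_available = period
    def respond(now):
        nonlocal next_available
        if now >= next_available:
            next_available = now + period   # same wait every time
            return True
        return False
    return respond

def variable_interval(periods, rng=None):
    """Like fixed_interval, but the wait is redrawn at random from
    `periods` after every reward."""
    rng = rng or random.Random()
    next_available = rng.choice(periods)
    def respond(now):
        nonlocal next_available
        if now >= next_available:
            next_available = now + rng.choice(periods)  # unpredictable wait
            return True
        return False
    return respond

# The learner responds once per minute for 15 minutes.
fi5 = fixed_interval(5)
print([t for t in range(1, 16) if fi5(t)])
# [5, 10, 15]

vi = variable_interval([5, 10, 15])
print([t for t in range(1, 61) if vi(t)])  # reward times differ from run to run
```

Notice that in both cases extra responses during the waiting period earn nothing. Only the passage of time makes the next reward available.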

These four methods are used for different types of operant conditioning. All of them lead to longer-lasting results than a continuous reinforcement schedule, where you're rewarded every time. With continuous reinforcement, you get used to being rewarded every time, so as soon as the rewards stop coming, the conditioned behavior quickly goes to extinction.

Now let's look at some examples of reward schedules.

A fixed ratio might be shown by training a seal to do tricks, where it has to do three tricks in a row to get a reward. So the seal balances a ball on its nose, jumps through a hoop and claps its flippers, all for one fish at the end.

Now, a variable ratio is best exemplified by the slot machine. With the slot machine, we never know when we're going to win, but we know we won't win if we stop pulling that handle. That's what keeps us playing in the hopes of hitting the jackpot.

Fixed interval reinforcement is like your paycheck: you go to work every day, and on a set schedule, you're rewarded with a sum of money. A variable interval, by contrast, is like the random bonuses that your boss gives out every so often.

These are the main ways of reinforcing behavior. The fastest way to condition a behavior is to reward it every time. But since the reward is expected, that behavior is unlikely to continue once the reward stops. The different types of fixed and variable schedules take longer to condition, but the effects are longer-lasting. This is especially true for variable ratios because they keep us guessing whether the next try will be the one that wins the jackpot.