Operant Conditioning
It has long been known that behavior is affected by its
consequences. We reward and punish people, for example, so that
they will behave in different ways. A more specific effect of a
consequence was first studied experimentally by Edward L.
Thorndike in a well-known experiment. A cat enclosed in a box
struggled to escape and eventually moved the latch which opened
the door. When repeatedly enclosed in a box, the cat gradually
ceased to do those things which had proved ineffective ("errors")
and eventually made the successful response very quickly.
In operant conditioning, behavior is also affected by its
consequences, but the process is not trial-and-error learning. It
can best be explained with an example. A hungry rat is placed in a
semi-soundproof box. For several days bits of food are
occasionally delivered into a tray by an automatic dispenser. The
rat soon goes to the tray immediately upon hearing the sound of
the dispenser. A small horizontal section of a lever protruding
from the wall has been resting in its lowest position, but it is
now raised slightly so that when the rat touches it, it moves
downward. In doing so it closes an electric circuit and operates
the food dispenser. Immediately after eating the delivered food
the rat begins to press the lever fairly rapidly. The behavior has
been strengthened or reinforced by a single consequence. The rat
was not "trying" to do anything when it first touched the lever
and it did not learn from "errors."
To a hungry rat, food is a natural reinforcer, but the reinforcer
in this example is the sound of the food dispenser, which was
conditioned as a reinforcer when it was repeatedly followed by the
delivery of food before the lever was pressed. In fact, the sound
of that one operation of the dispenser would have had an
observable effect even if no food had been delivered on that
occasion, but when food no longer follows pressing the lever, the
rat eventually stops pressing. The behavior is said to have been
extinguished.
An operant can come under the control of a stimulus. If pressing
the lever is reinforced when a light is on but not when it is off,
responses continue to be made in the light but seldom, if at all,
in the dark. The rat has formed a discrimination between light and
dark. When one turns on the light, a response occurs, but that is
not a reflex response.
The lever can be pressed with different amounts of force, and if
only strong responses are reinforced, the rat presses more and
more forcefully. If only weak responses are reinforced, it
eventually responds only very weakly. The process is called
differentiation.
A response must first occur for other reasons before it is
reinforced and becomes an operant. It may seem as if a very
complex response would never occur to be reinforced, but complex
responses can be shaped by reinforcing their component parts
separately and putting them together in the final form of the
operant.
Operant reinforcement not only shapes the topography of behavior
but also maintains it in strength long after an operant has been formed.
Schedules of reinforcement are important in maintaining behavior.
If a response has been reinforced for some time only once every
five minutes, for example, the rat soon stops responding
immediately after reinforcement but responds more and more rapidly
as the time for the next reinforcement approaches. (That is called
a fixed-interval schedule of reinforcement.) If a response has
been reinforced on the average every five minutes but
unpredictably, the rat responds at a steady rate. (That is a
variable-interval schedule of reinforcement.) If the average
interval is short, the rate is high; if it is long, the rate is
low.
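The two interval schedules just described can be sketched in code. The
following Python is a minimal illustration and not part of the original
account: the class names FixedInterval and VariableInterval, the respond
method, and the choice of exponentially distributed waits for the variable
schedule are all assumptions made for the example. It models only the rule
that decides whether a response is reinforced, not the behavior of the rat.

```python
import random


class FixedInterval:
    """Reinforce the first response made at least `interval` seconds
    after the previous reinforcement (hypothetical helper class)."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reinforced = 0.0  # the session starts at time 0

    def respond(self, t):
        """Return True if a response at time t (seconds) is reinforced."""
        if t - self.last_reinforced >= self.interval:
            self.last_reinforced = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the required wait is redrawn after each
    reinforcement so that it averages `mean_interval` seconds.
    Assumption: waits are drawn from an exponential distribution."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.last_reinforced = 0.0
        self.wait = self._draw()

    def _draw(self):
        # expovariate takes the rate, which is 1 / mean
        return self.rng.expovariate(1.0 / self.mean_interval)

    def respond(self, t):
        if t - self.last_reinforced >= self.wait:
            self.last_reinforced = t
            self.wait = self._draw()
            return True
        return False


if __name__ == "__main__":
    fi = FixedInterval(300)  # once every five minutes
    for t in (100, 299, 300, 310, 600):
        print(t, fi.respond(t))
```

Under this sketch, with a 300-second fixed interval, responses at 100, 299,
300, 310, and 600 seconds are reinforced only at 300 and 600. On the
variable schedule the next reinforced moment cannot be predicted from the
last, which fits the steady rate of responding described above.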
If a response is reinforced when a given number of responses has
been emitted, the rat responds more and more rapidly as the
required number is approached. (That is a fixed-ratio schedule of
reinforcement.) The number can be increased by easy stages up to a
very high value; the rat will continue to respond even though a
response is only very rarely reinforced. "Piece-rate pay" in
industry is an example of a fixed-ratio schedule, and employers
are sometimes tempted to "stretch" it by increasing the amount of
work required for each unit of payment. When reinforcement occurs
after an average number of responses but unpredictably, the
schedule is called variable-ratio. It is familiar in gambling
devices and systems which arrange occasional but unpredictable
payoffs. The required number of responses can easily be stretched,
and in a gambling enterprise such as a casino the average ratio
must be such that the gambler loses in the long run if the casino
is to make a profit.
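The ratio schedules and the casino arithmetic can be sketched in the same
way. This is an illustrative Python example under stated assumptions, not a
description of any actual apparatus: the names FixedRatio, VariableRatio,
and expected_net_per_play are hypothetical, the stake and payoff figures
are invented for the example, and the variable schedule is simplified to a
constant probability of reinforcement on every response (sometimes called a
random-ratio schedule).

```python
import random


class FixedRatio:
    """Reinforce every `ratio`-th response (hypothetical helper class)."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0  # responses since the last reinforcement

    def respond(self):
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce unpredictably, on average once per `mean_ratio` responses.
    Assumption: each response has the same probability 1 / mean_ratio."""

    def __init__(self, mean_ratio, rng):
        self.p = 1.0 / mean_ratio
        self.rng = rng

    def respond(self):
        return self.rng.random() < self.p


def expected_net_per_play(mean_ratio, stake, payoff):
    """Gambler's average gain per play: the stake is paid on every play
    and the payoff arrives on average once in `mean_ratio` plays."""
    return payoff / mean_ratio - stake


if __name__ == "__main__":
    fr = FixedRatio(3)
    print([fr.respond() for _ in range(7)])

    # Invented figures: stake 1 unit, payoff 9 units, mean ratio 12.
    print(expected_net_per_play(mean_ratio=12, stake=1, payoff=9))
```

With the invented figures, the gambler's expected result is 9/12 - 1 =
-0.25 units per play, so the casino gains a quarter of a unit per play on
average. In this sketch, "stretching" the schedule corresponds to raising
ratio or mean_ratio while leaving the payoff unchanged.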
Reinforcers may be positive or negative. A positive reinforcer
reinforces when it is presented; a negative reinforcer reinforces
when it is withdrawn. Negative reinforcement is not punishment.
Reinforcers always strengthen behavior; that is what "reinforced"
means. Punishment is used to suppress behavior. It consists of
removing a positive reinforcer or presenting a negative one. It
often seems to operate by conditioning negative reinforcers. The
punished person henceforth acts in ways which reduce the threat of
punishment and which are incompatible with, and hence take the
place of, the behavior punished.
The human species is distinguished by the fact that its vocal
responses can be easily conditioned as operants. There are many
kinds of verbal operants because the behavior must be reinforced
only through the mediation of other people, and they do many
different things. The reinforcing practices of a given culture
compose what is called a language. The practices are responsible
for most of the extraordinary achievements of the human species.
Other species acquire behavior from each other through imitation
and modelling (they show each other what to do), but they cannot
tell each other what to do. We acquire most of our behavior with
that kind of help. We take advice, heed warnings, observe rules,
and obey laws, and our behavior then comes under the control of
consequences which would otherwise not be effective. Most of our
behavior is too complex to have occurred for the first time
without such verbal help. By taking advice and following rules we
acquire a much more extensive repertoire than would be possible
through a solitary contact with the environment.
Responding because behavior has had reinforcing consequences is
very different from responding by taking advice, following rules,
or obeying laws. We do not take advice because of the particular
consequence that will follow; we take it only when taking other
advice from similar sources has already had reinforcing
consequences. In general, we are much more strongly inclined to do
things if they have had immediate reinforcing consequences than if
we have been merely advised to do them.
The innate behavior studied by ethologists is shaped and
maintained by its contribution to the survival of the individual
and species. Operant behavior is shaped and maintained by its
consequences for the individual. Both processes have controversial
features. Neither one seems to have any place for a prior plan or
purpose. In both, selection replaces creation.
Personal freedom also seems threatened. It is only the feeling of
freedom, however, which is affected. Those who respond because
their behavior has had positively reinforcing consequences usually
feel free. They seem to be doing what they want to do. Those who
respond because the reinforcement has been negative and who are
therefore avoiding or escaping from punishment are doing what they
have to do and do not feel free. These distinctions do not involve
the fact of freedom.
The experimental analysis of operant behavior has led to a
technology often called behavior modification. It usually consists
of changing the consequences of behavior, removing consequences
which have caused trouble, or arranging new consequences for
behavior which has lacked strength. Historically, people have been
controlled primarily through negative reinforcement; that is, they
have been punished when they have not done what is reinforcing to
those who could punish them. Positive reinforcement has been less
often used, partly because its effect is slightly deferred, but it
can be as effective as negative reinforcement and has many fewer
unwanted byproducts. For example, students who are punished when
they do not study may study, but they may also stay away from
school (truancy), vandalize school property, attack teachers, or
stubbornly do nothing. Redesigning school systems so that what
students do is more often positively reinforced can make a great
difference.