Discriminative stimulus training and selective stimulus control in rats
Name: Mark Miller
ID number: 3330061
Course: PSYC 111
Lab Group: 103
Tutor: Jacqueline Harris
Date Due: 9 May 2002
Abstract
The aim of the experiment was to show that rats demonstrate stimulus discrimination and selective stimulus control during operant conditioning. The first hypothesis was that the subject would learn to discriminate between the VR16 conditions, which signal reinforcement, and the EXT conditions. It was also hypothesised that the stimulus used to discriminate between VR16 and EXT would be either the light or the tone, not a combination of the two.
The participant in this experiment was a 16-month-old female Sprague-Dawley albino rat, randomly selected from a group of 20. The apparatus was an operant chamber, which delivered two stimuli (a light and a tone) to the subject, with diluted condensed milk as the reinforcer. During the first week of experimentation the subject underwent discrimination training; this was followed by a series of probe trials in the second week.
The results from the first week showed that the subject learned that no reinforcement was given during EXT, because its rate of responding decreased. The second week's results showed that the high tone was the stimulus used to discriminate between the two conditions. These results supported both hypotheses, and it was concluded that rats do demonstrate stimulus discrimination and selective stimulus control.
Introduction
The major theorists in the development of operant conditioning were Edward Thorndike (1910), John Watson (1914), and Burrhus Skinner (1938) (Huitt & Hummel, 1997). They proposed that learning results from the application of consequences following overt behaviour; that is, subjects begin to connect certain responses with certain stimuli. This led Thorndike to conclude that the probability of a specific response recurring changes according to the consequences that follow it, and he labelled this learning conditioning (Carlson & Buskist, 1997; Huitt & Hummel, 1997).
In 1910, Thorndike used the notion of 'consequences' to teach cats and dogs to manipulate a latch in a "puzzle-box" in order to open a door and escape (Huitt & Hummel, 1997). The consequence was either punishment or reward (Carlson & Buskist, 1997). Thorndike measured the time it took an animal to escape over successive trials, and noted that the animal's latency to escape decreased consistently until it would operate the latch immediately after being placed in the box (Huitt & Hummel, 1997). The reward of being freed from the box somehow strengthened the association between the stimulus of being in the box and the appropriate action (Huitt & Hummel, 1997); Thorndike concluded that the reward strengthened stimulus-response associations (Carlson & Buskist, 1997). He went on to formulate his "law of effect", which can be summarised as follows: an animal is more likely to repeat a response if the result is favourable, and less likely to repeat it if the consequences are not (Carlson & Buskist, 1997). There are two possible consequences of a behaviour, reinforcement or punishment, and each can be divided into two sub-categories, positive (sometimes called pleasant) and negative (sometimes called aversive). These can be added to or removed from the environment in order to change the probability of a given response occurring again (Carlson & Buskist, 1997; Kentridge, 2001). Punishment decreases the repetition of a behaviour, and reinforcement usually increases the likelihood of a response being repeated.
A stimulus that acts as an indicator to the subject that a reinforcer is available is said to be a discriminative stimulus (Gleitman, 1995). A discriminative stimulus affects the subject's behaviour considerably (Gleitman, 1995), as it influences the likelihood of a response occurring (Carlson & Buskist, 1997). Reynolds (1961) conducted experiments in which two pigeons learned to peck a red key on which a white triangle was displayed. To determine which feature was the discriminative stimulus, he tested the two birds with either a plain red key or a plain key showing just a white triangle. Reynolds (1961) found that the first bird used the red key as the discriminative stimulus and the second bird used the white triangle. This experiment is also an example of selective stimulus control, where each pigeon selected the stimulus it treated as responsible for producing the reinforcer.
To study effectively how a subject behaves in a given environment and in response to certain stimuli, it is necessary to establish a schedule of reinforcement: a set of rules determining how often the subject is reinforced (Gleitman, 1995). Consequences can be delivered according to a schedule from one of two categories, continuous or intermittent (Gleitman, 1995), or not at all (extinction). Continuous reinforcement simply means that the behaviour is followed by a consequence each time it occurs. Intermittent schedules are based either on the passage of time (interval schedules) or on the number of correct responses emitted (ratio schedules). The consequence can be delivered after the same amount of time or the same number of correct responses each cycle (fixed), or after an amount of time or a number of responses that varies around a particular value (variable). This yields four classes of intermittent schedule: fixed interval (FI), fixed ratio (FR), variable interval (VI), and variable ratio (VR) (Gleitman, 1995). (Continuous reinforcement is in fact a special case of a fixed ratio schedule, with only one response required before a consequence occurs.) The final schedule is extinction, during which the subject is no longer reinforced for producing a previously reinforced response. Because there is no reward for responding, the frequency of the response decreases until it stops altogether (Carlson & Buskist, 1997; Huitt & Hummel, 1997; Gleitman, 1995).
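As a rough illustration, the ratio schedules and extinction described above can be sketched in Python. This is a hypothetical simulation, not the Psychology Department's actual control software, and the uniform distribution of response requirements is an assumption (real VR schedules can use other distributions with the same mean).

```python
import random

def vr_schedule(mean_ratio, rng):
    """Variable-ratio schedule: the number of responses required for a
    reinforcer varies uniformly around mean_ratio (an assumed distribution).
    Yields True for each reinforced response, False otherwise."""
    while True:
        required = rng.randint(1, 2 * mean_ratio - 1)  # mean equals mean_ratio
        for i in range(required):
            yield i == required - 1  # only the final response is reinforced

def ext_schedule():
    """Extinction: no response is ever reinforced."""
    while True:
        yield False

# On a VR16 schedule, roughly 1 in every 16 responses earns a reinforcer;
# on EXT, none do.
rng = random.Random(0)
vr16 = vr_schedule(16, rng)
reinforcers = sum(next(vr16) for _ in range(10_000))  # close to 10_000 / 16
```

Over a long run the reinforcement rate converges on one reinforcer per 16 responses, which is what makes the schedule unpredictable on any single trial yet stable on average.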
For the purposes of this experiment we used two alternating schedules of consequence (Lab Manual Psychology 111/112, 2002): a variable ratio of 16 (VR16), where a reinforcer was given after an average of 16 responses, and extinction (EXT). A VR schedule was chosen because variable ratios are thought to be the best schedules for maintaining behaviour (Kentridge, 2001).
The aim of the experiment was to demonstrate stimulus discrimination and selective stimulus control in rats, and in turn, give support to past research indicating that learning comes from experience.
The subject for this experiment was a female albino rat, approximately 16 months old. The rat was placed in the operant chamber and presented with two stimuli, a light and a tone. VR16 was paired with a dull light and a high tone (1000 Hz), and EXT was paired with a bright light and a low tone (500 Hz) (Lab Manual Psychology 111/112, 2002). From a review of past research, two hypotheses were formulated. The first was that the subject would learn to discriminate between the VR16 conditions that signal reinforcement and the EXT conditions, so that rates of responding during VR16 would be higher than during EXT. It was also hypothesised that the stimulus the rat used to discriminate would be either the light or the tone, not a combination (selective stimulus control).
Method
Participants
The subjects used for this experiment were 20 female albino Sprague-Dawley rats. The rats were raised in standard laboratory conditions and were approximately 16 months old at the beginning of the experimental process, which formally started on March 4, 2002. The subjects were not experimentally naïve, as they had participated in previous laboratory experiments. Each subject's weight was monitored daily and kept at 85% of its free-feeding weight, which is approximately what its weight in the wild would be (Lab Manual Psychology 111/112, 2002). All subjects began their discrimination program of alternating between VR16 and EXT on April 6, 2002; for the two weeks leading up to this date they had been trained on VR16 alone.
For the purposes of this experiment, we studied subject 15.
It was assumed that this participant's auditory and visual perception was sufficient to hear and see the tone and light stimuli, and that there were no other physical impairments that might hinder the results.
Apparatus
The study of the subject's responses took place in a standard operant chamber (see Figure 1), which is used to minimise interference from outside sources (Lab Manual Psychology 111/112, 2002). The chamber contained a lever on which the participant could make responses, a tap for dispensing the diluted condensed milk (the reward), and a speaker that played one of two tones, a high tone of 1000 Hz or a low tone of 500 Hz, depending on the schedule. The tones were combined with a light. Both the speaker and the light source were mounted directly above the lever, in the hope that the subject would associate the stimuli with pressing the lever more rapidly (Brembs, 2001). There were also two pilot lights, which were constantly on in the chamber, to make it easier to observe the subject.
Figure 1: An example of an operant chamber, with example cumulative output from a subject (the reward in our experiment was diluted condensed milk, not food pellets).
An IBM computer was connected via cabling to the lever in the chamber and to the University of Otago's computer network, and thus had access to the appropriate Psychology Department software. The computer was used for the collection and collation of data.
Procedure
The preparation of the subjects began six weeks before the recording of results. The subjects were randomly assigned to experimenters; for the purpose of this report, rat 15 was studied.
Over the first week of recording (discrimination training), the subject underwent 20 32-minute sessions. Each session alternated between 1-minute intervals of a variable ratio schedule (VR16), in which reinforcement followed an average of 16 lever presses and which was accompanied by a dull light and a high tone (1000 Hz), and extinction (EXT), in which the subject received a bright light and a low tone but was not reinforced regardless of the number of presses. The manipulated (independent) variables for the discrimination training were the two schedules of reinforcement, VR16 and EXT. The measured (dependent) variable was the mean number of responses per minute under the reinforced and non-reinforced conditions. The hypothesis for the first week of training was that the rat would learn to discriminate between the VR16 conditions, which signal reinforcement, and the EXT conditions, which signal non-reinforcement.
In the second week the rat was put through the same procedure as in discrimination training, 20 32-minute sessions, with each session broken into two 16-minute parts. Each block's alternating pattern of VR16 and EXT was randomly interrupted with a series of four probe trials that changed the combination of the light and tone stimuli. Probe 1 was a bright light and a low tone (the EXT pairing); probe 2 was a dull light and a low tone; probe 3 was a bright light and a high tone; and probe 4 was a dull light and a high tone (the VR16 pairing). Each probe lasted 1 minute, and no reinforcement was given during any probe trial. It was hoped that these probe trials would test the hypothesis that only one stimulus, either the light or the tone and not a combination, would be used to discriminate between reinforcement and non-reinforcement. The manipulated (independent) variables for this part of the experiment were the stimuli and the combinations in which they were presented. The measured (dependent) variable was the average response rate for each of the four probe trials.
The results for the 20 sessions in the first week were averaged, as were the probe trial results in the second week.
The design of this experiment was within-subjects, as all of the participants were exposed to the same stimuli and expected to perform the same task, in this case lever pressing.
Results
The measured variables for the discrimination training were the mean numbers of responses per minute for VR16 and EXT. The collected results were the mean number of responses per minute for each session; these were averaged over the 20 32-minute sessions, as shown in Table 1. The mean number of presses per minute per session over the week during VR16 was 71.705; during EXT it was 68.795 (standard deviations were not calculated).
Table 1
Average Number of Responses per Minute for Each Session for VR16 and EXT

Session    1      2      3      4      5      6      7      8      9      10
VR         56.7   60.5   62.4   78.1   86.4   97.2   80.9   62.3   64.2   79.6
EXT        99.5   91.5   107.6  93.0   94.5   100.7  58.0   53.3   59.3   56.8

Session    11     12     13     14     15     16     17     18     19     20
VR         69.5   81.5   76.8   94.1   79.2   77.5   46.0   65.7   52.0   63.5
EXT        79.5   59.0   60.7   51.7   62.5   39.0   74.0   55.7   40.6   39.0
Figure 2 shows the mean number of responses per minute for the participant. In this graph it can be seen that the average response rate during VR16 remains roughly constant over the week, but during EXT the rate of responding decreases from 99.5 to 39 presses per minute; the trend is identified more clearly in Table 2.
Figure 2: Mean number of responses per minute for rat 15, in an alternating 1-minute block schedule of VR16 and EXT.
Table 2
Average Number of Responses per Minute for VR16 and EXT, Grouped into Blocks of Four Sessions

        Sessions 1-4   Sessions 5-8   Sessions 9-12   Sessions 13-16   Sessions 17-20
VR      64.43          81.70          73.70           81.90            56.80
EXT     97.90          76.63          63.65           53.48            52.33
Table 2 shows that the subject increased responding through the middle sessions of VR16, but from beginning to end the VR16 results stayed fairly consistent. Under EXT, however, there is a steady decline from 97.90 down to 52.33.
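The weekly means and four-session block means reported above can be recomputed directly from the session data in Table 1. The following Python sketch is our own arithmetic check, not part of the original analysis:

```python
# Session-by-session mean responses per minute, taken from Table 1.
vr  = [56.7, 60.5, 62.4, 78.1, 86.4, 97.2, 80.9, 62.3, 64.2, 79.6,
       69.5, 81.5, 76.8, 94.1, 79.2, 77.5, 46.0, 65.7, 52.0, 63.5]
ext = [99.5, 91.5, 107.6, 93.0, 94.5, 100.7, 58.0, 53.3, 59.3, 56.8,
       79.5, 59.0, 60.7, 51.7, 62.5, 39.0, 74.0, 55.7, 40.6, 39.0]

def block_means(xs, size=4):
    """Mean of each consecutive block of `size` sessions (as in Table 2)."""
    return [sum(xs[i:i + size]) / size for i in range(0, len(xs), size)]

vr_week_mean = sum(vr) / len(vr)     # 71.705, as reported in the Results
ext_week_mean = sum(ext) / len(ext)  # 68.795
vr_blocks = block_means(vr)          # matches the VR row of Table 2
ext_blocks = block_means(ext)        # matches the EXT row of Table 2
```

The block means make the trend easier to read than the raw sessions: the VR16 values stay in a narrow band while the EXT values fall monotonically after the first block.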
During the probe trials in the second week of the experiment, the measured variable was the average response rate over all sessions for each probe trial. Table 3 and Figure 3 below show the averaged results for the probe trials. The lowest result (34.66) was obtained when the rat was presented with a bright light and a low tone, the same arrangement of stimuli as in extinction. The highest average response rate was for the bright light and high tone (55.8). The mean for the probe that matched VR16 was 47, which ranked second highest.
Table 3
The Combination of Light and Tone Used in the Four Probe Trials

Probe   Light/Tone Combination    Response Rate per Minute   Type
1       Bright light/low tone     34.66                      (EXT)
2       Dull light/low tone       45.53
3       Bright light/high tone    55.80
4       Dull light/high tone      47.00                      (VR16)
Figure 3: Probe trials for rat 15, using different combinations of the two stimuli. Results are the averages over all observed sessions.
Discussion
During the discrimination training period there is a distinct decline in responding under the extinction schedule, from 99.5 to 39 responses per minute, while responding under VR16 remained approximately stable (56.7 to 63.5). This can be seen clearly in the four-session block averages, where EXT decreases from 97.90 to 52.33. These results support the first hypothesis, as the subject was evidently learning that no reinforcement was available during the extinction schedule.
The probe trials show that the probe type best attended to by the subject was probe 3 (55.8), the bright light and high tone. The lowest number of responses occurred for probe 1 (34.66), the bright light and low tone, which matched the conditions of the extinction schedule; this is why those results were low. The second-ranked probe was probe 4 (47), which had the same stimuli as VR16 (dull light and high tone). The stimulus common to both probe 3 and probe 4 is the high tone, so we can conclude that the subject used the high tone to determine whether reinforcement was available. This concurs with the second hypothesis, that only one stimulus would be used to discriminate between reinforcement and non-reinforcement.
These results follow those of Reynolds (1961), but we did not go on to confirm that the tone was the selected stimulus, which could have been done by offering the subject probe trials of only a bright or dull light, or only a high or low tone. This would confirm whether selective stimulus control was taking place, and would be the obvious next phase of this experiment.
Reviewing the experiment, there appear to be some limitations, the most important being the use of pilot lights in the chamber itself. The lights were in place to aid the experimenter in observing the subject; however, as one of the stimuli used was a light, the pilot lights would seem to be an obvious confound. It is also worth considering how easily the rat could differentiate between a bright and a dull light, and between the high and low tones. The final source of error was the rats themselves: it was assumed that the subjects were physically able to participate (not deaf, for example), and it was also assumed that, although the rats were not experimentally naïve, the results would not be affected.
The results of this experiment have real-world implications, as humans try to find ways of controlling animal behaviours and ultimately the animals themselves. Sanjiv Talwar of the State University of New York recently developed a radio-controlled rat (ABC News, 2002). The rat has three wires implanted in different parts of its brain: two responsible for directional control, and a third in the rat's medial forebrain bundle, an area responsible for relaying feelings of happiness or reward (ABC News, 2002). On a level much closer to home, using operant conditioning with positive reinforcement to teach the family dog to stop barking, or using extinction to manage a disruptive child's behaviour in a classroom, are everyday applications of this knowledge.
The conclusion of this experiment is that rats do demonstrate stimulus discrimination in operant conditioning situations, and there is evidence that selective stimulus control is present in these situations.
References
ABC News (2002). Scientists develop remote-controlled rats. Available at http://more.abcnews.go.com/sections/scitech/DailyNews/rats020501.html
Brembs, B. (2001). Operant conditioning. Available at http://www.brembs.net/learning/operant.html
Carlson, N., & Buskist, W. (1997). Psychology: The science of behaviour (5th ed.). Boston: Allyn and Bacon.
Gleitman, H. (1995). Psychology (4th ed.). New York: Norton.
Huitt, W., & Hummel, J. (1997). Lecture notes from Valdosta State University, Georgia, USA. Available at http://chiron.valdosta.edu/whuitt/col/behsys/operant.html
Kentridge, R. W. (2001). Lecture notes from the University of Würzburg, Germany. Available at http://www.biozentrum.uni-wuerzburg.de/genetics/behavior/learning/behaviorism.html
Laboratory Manual Psychology 111/112 (2002). Dunedin: Department of Psychology, University of Otago.
Reynolds, G. S. (1961). Attention in the pigeon. Journal of the Experimental Analysis of Behavior, 4, 203-208.