Sponsor
Portland State University. Department of Electrical Engineering.
First Advisor
George G. Lendaris
Term of Graduation
Spring 1997
Date of Publication
5-1997
Document Type
Thesis
Degree Name
Master of Science (M.S.) in Electrical and Computer Engineering
Department
Electrical Engineering
Language
English
Subjects
Neural networks (Computer science), Heuristic programming, Algorithms
DOI
10.15760/etd.7663
Physical Description
1 online resource (2, viii, 79 pages)
Abstract
This thesis discusses strategies for and details of training procedures for the Dual Heuristic Programming (DHP) methodology. This and other approximate dynamic programming approaches (HDP, DHP, GDHP) have been discussed in some detail in the literature, all being members of the Adaptive Critic Design (ACD) family. The example applications used here are the inverted pendulum problem and a fully nonlinear constant velocity bicycle steering model. The inverted pendulum has been successfully controlled using DHP, as reported in the literature. This thesis suggests and investigates several alternative DHP training procedures and compares their performance with respect to convergence speed and quality of resulting controller design. A promising modification is to introduce a real copy of the criticNN (criticNN#2) for making the "desired output" calculations, and very importantly, this criticNN#2 is trained differently than is criticNN#1. The idea is to provide the "desired outputs" from a stable platform during an epoch while adapting the criticNN#1. Then at the end of the epoch, criticNN#2 is made identical to the then-current adapted state of criticNN#1, and a new epoch starts. In this way, both the criticNN#1 and the actionNN can be simultaneously trained on-line during each epoch, with a faster overall convergence than the older approach. Further, the measures used herein suggest that a "better" controller design (the actionNN) results.
The learning strategy with the fastest learning was used to design a controller for a fully nonlinear, constant-velocity bicycle steering model. The controller's task here is to steer the car along a given trajectory on the road. The performance accomplished by the controller demonstrates the applicability of that learning strategy to highly nonlinear, complex plants.
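The two-critic scheme described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a hypothetical scalar linear plant x(t+1) = a*x(t) with the control law already folded into the dynamics (so no actionNN is trained), a utility U = x^2, and a one-weight linear critic that estimates the derivative of the cost-to-go, lambda(x) = w*x. The function name and all parameter values are illustrative assumptions. What it shows is only the epoch structure: criticNN#2 stays frozen while criticNN#1 adapts, and is overwritten with criticNN#1 at the end of each epoch.

```python
def train_critic_with_frozen_copy(a=0.5, gamma=0.9, lr=0.1, epochs=50, passes=20):
    """Toy DHP-style critic training with a frozen target copy (illustrative only)."""
    states = [-1.0, -0.5, 0.5, 1.0]  # assumed training states, not from the thesis
    w1 = 0.0  # weight of criticNN#1, adapted during the epoch
    w2 = w1   # weight of criticNN#2, held fixed during the epoch

    for _ in range(epochs):
        for _ in range(passes):
            for x in states:
                x_next = a * x  # assumed plant model
                # "Desired output" from the frozen copy:
                # dU/dx + gamma * (dx_next/dx) * lambda_2(x_next), with U = x**2
                target = 2.0 * x + gamma * a * (w2 * x_next)
                error = w1 * x - target
                # Gradient step on 0.5 * error**2 with respect to w1
                w1 -= lr * error * x
        # End of epoch: make criticNN#2 identical to the adapted criticNN#1
        w2 = w1
    return w1
```

Under these assumptions the consistent critic weight is 2 / (1 - gamma * a**2), so calling `train_critic_with_frozen_copy()` with the defaults should return a value close to 2 / 0.775. Holding `w2` fixed within an epoch keeps the targets from shifting while `w1` is being adapted, which is the "stable platform" idea the abstract describes.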
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/36440
Recommended Citation
Paintz, Christian Peter, "Training Strategies for Critic and Action Neural Networks in Dual Heuristic Programming Method" (1997). Dissertations and Theses. Paper 5792.
https://doi.org/10.15760/etd.7663
Comments
If you are the rightful copyright holder of this dissertation or thesis and wish to have it removed from the Open Access Collection, please submit a request to pdxscholar@pdx.edu and include clear identification of the work, preferably with its URL.