Portland State University. Department of Electrical Engineering
George G. Lendaris
Master of Science (M.S.) in Electrical and Computer Engineering
Electrical and Computer Engineering
Neural networks (Computer science), Heuristic programming, Algorithms
1 online resource (2, viii, 79 pages)
This thesis discusses strategies for and details of training procedures for the Dual Heuristic Programming (DHP) methodology. This and other approximate dynamic programming approaches (HDP, DHP, GDHP) have been discussed in some detail in the literature, all being members of the Adaptive Critic Design (ACD) family. The example applications used here are the inverted pendulum problem and a fully nonlinear constant velocity bicycle steering model. The inverted pendulum has been successfully controlled using DHP, as reported in the literature. This thesis suggests and investigates several alternative DHP training procedures and compares their performance with respect to convergence speed and quality of resulting controller design. A promising modification is to introduce a real copy of the criticNN (criticNN#2) for making the "desired output" calculations, and very importantly, this criticNN#2 is trained differently than is criticNN#1. The idea is to provide the "desired outputs" from a stable platform during an epoch while adapting the criticNN#1. Then at the end of the epoch, criticNN#2 is made identical to the then-current adapted state of criticNN#1, and a new epoch starts. In this way, both the criticNN#1 and the actionNN can be simultaneously trained on-line during each epoch, with a faster overall convergence than the older approach. Further, the measures used herein suggest that a "better" controller design (the actionNN) results.
The learning strategy with the fastest learning was used to design a controller for a fully nonlinear, constant-velocity bicycle steering model. The controller's task here is to steer the car along a given trajectory on the road. The performance accomplished by the controller demonstrates the applicability of that learning strategy to highly nonlinear, complex plants.
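The two-critic idea described in the abstract (criticNN#2 supplies the "desired outputs" and is held fixed during an epoch, then overwritten with criticNN#1 at the end of the epoch) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the thesis: the scalar linear plant `x_next = a*x + b*u`, the fixed linear policy `u = -k*x` standing in for the actionNN (which is not trained here), the quadratic utility, the one-weight linear critics `w1` and `w2`, and all parameter values. In this toy setting the DHP critic estimates the derivative of the cost-to-go with respect to the state, and its fixed point has a closed form to compare against.

```python
import random


def train_critic_with_frozen_copy(a=0.9, b=0.5, k=0.4, r=0.1, gamma=0.9,
                                  lr=0.1, epochs=200, steps_per_epoch=50,
                                  seed=0):
    """Toy DHP-style critic training with a frozen copy for the targets.

    Hypothetical setup (not from the thesis):
      plant   x_next = a*x + b*u
      policy  u = -k*x            (fixed; stands in for the actionNN)
      utility U = x**2 + r*u**2
      critic  lambda(x) = w*x     (estimate of dJ/dx)
    """
    rng = random.Random(seed)
    c = a - b * k          # closed-loop dynamics: x_next = c*x
    q = 1.0 + r * k * k    # under the fixed policy, U(x) = q*x**2

    w1 = 0.0  # critic #1: adapted at every step
    w2 = 0.0  # critic #2: frozen within an epoch, used only for targets

    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = rng.uniform(-1.0, 1.0)
            x_next = c * x
            # Desired output: dU/dx + gamma * lambda(x_next) * dx_next/dx,
            # with lambda(x_next) evaluated by the frozen critic #2.
            target = 2.0 * q * x + gamma * c * (w2 * x_next)
            error = w1 * x - target
            w1 -= lr * error * x   # gradient step on 0.5*error**2 w.r.t. w1
        w2 = w1  # end of epoch: copy critic #1 into critic #2

    # Closed-form weight for this toy problem: J(x) = q*x**2 / (1 - gamma*c**2)
    w_exact = 2.0 * q / (1.0 - gamma * c * c)
    return w1, w_exact


if __name__ == "__main__":
    w_learned, w_exact = train_critic_with_frozen_copy()
    print(w_learned, w_exact)
```

Because the targets do not move while `w1` is being adapted, each epoch is an ordinary regression toward a fixed function. This is one way to read the "stable platform" described in the abstract. The sketch says nothing about the convergence speed or controller quality reported in the thesis.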
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Paintz, Christian Peter, "Training Strategies for Critic and Action Neural Networks in Dual Heuristic Programming Method" (1997). Dissertations and Theses. Paper 5792.