First Advisor

George G. Lendaris

Term of Graduation

Spring 1997

Date of Publication


Document Type


Degree Name

Master of Science (M.S.) in Electrical and Computer Engineering


Electrical Engineering




Neural networks (Computer science), Heuristic programming, Algorithms



Physical Description

1 online resource (2, viii, 79 pages)


This thesis discusses strategies for, and details of, training procedures for the Dual Heuristic Programming (DHP) methodology. This and other approximate dynamic programming approaches (HDP, DHP, GDHP) have been discussed in some detail in the literature, all being members of the Adaptive Critic Design (ACD) family. The example applications used here are the inverted pendulum problem and a fully nonlinear constant-velocity bicycle steering model. The inverted pendulum has been successfully controlled using DHP, as reported in the literature. This thesis suggests and investigates several alternative DHP training procedures and compares their performance with respect to convergence speed and quality of the resulting controller design. A promising modification is to introduce a real copy of the criticNN (criticNN#2) for making the "desired output" calculations; very importantly, this criticNN#2 is trained differently from criticNN#1. The idea is to provide the "desired outputs" from a stable platform during an epoch while adapting criticNN#1. Then, at the end of the epoch, criticNN#2 is made identical to the then-current adapted state of criticNN#1, and a new epoch starts. In this way, both criticNN#1 and the actionNN can be trained simultaneously on-line during each epoch, with faster overall convergence than the older approach. Further, the measures used herein suggest that a "better" controller design (the actionNN) results.
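The epoch schedule described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Everything specific in it is an assumption made for illustration: a scalar linear plant x' = a*x + b*u, a fixed linear action u = -k*x (the actionNN is not trained here), a quadratic utility U = x^2 + r*u^2, and one-weight linear "critics" that estimate the derivative of the cost-to-go with respect to the state. The function name `train_two_critic` and all parameter values are hypothetical. The sketch shows only the schedule: critic #2 supplies the desired outputs and stays frozen within an epoch, critic #1 adapts, and critic #2 is overwritten with critic #1 at the end of each epoch.

```python
import random


def train_two_critic(a=0.9, b=0.5, k=0.4, r=0.1, gamma=0.95,
                     epochs=200, steps_per_epoch=50, lr=0.05, seed=0):
    """Toy two-critic DHP-style schedule on an assumed scalar linear plant.

    Critic #1 (weight w1) is adapted at every step.
    Critic #2 (weight w2) is frozen within an epoch and only supplies
    the "desired output"; it is synchronised to w1 at the epoch's end.
    """
    rng = random.Random(seed)
    w1 = 0.0   # critic #1: lambda_1(x) = w1 * x
    w2 = w1    # critic #2 starts as an exact copy of critic #1

    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = rng.uniform(-1.0, 1.0)   # sampled training state
            u = -k * x                   # fixed action (assumption)
            x_next = a * x + b * u       # assumed plant model

            # Total derivative of U = x^2 + r*u^2 w.r.t. x along the policy.
            dU_dx = 2.0 * x + 2.0 * r * u * (-k)
            # Total derivative of x_next w.r.t. x along the policy.
            dxnext_dx = a + b * (-k)

            # Desired output computed from the frozen critic #2.
            target = dU_dx + gamma * (w2 * x_next) * dxnext_dx

            # Gradient step on the squared error of critic #1 only.
            w1 += lr * (target - w1 * x) * x

        # End of epoch: make critic #2 identical to the adapted critic #1.
        w2 = w1

    return w1


if __name__ == "__main__":
    print(train_two_critic())
```

For this toy problem the fixed point is available in closed form, w* = 2(1 + r*k^2) / (1 - gamma*(a - b*k)^2), which gives a simple check that the schedule converges. Whether the same behaviour carries over to nonlinear plants and neural-network critics is the question the thesis itself investigates.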

The fastest-learning strategy was then used to design a controller for a fully nonlinear, constant-velocity bicycle steering model. The controller's task here is to steer the car along a given trajectory on the road. The performance achieved by the controller demonstrates the applicability of that learning strategy to highly nonlinear, complex plants.


In Copyright. URI:

This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).



Persistent Identifier