Training Strategies for Critic and Action Neural Networks in Dual Heuristic Programming Method
This paper discusses strategies for, and details of, training procedures for the dual heuristic programming (DHP) methodology. DHP and other approximate dynamic programming approaches have been discussed in the literature; all are members of the adaptive critic design family. The paper suggests and investigates several alternative procedures and compares their performance with respect to convergence speed and the quality of the resulting controller design. One modification is to introduce a separate, real copy of the critic network (criticNN 2) for calculating the "desired outputs", and to train criticNN 2 differently from criticNN 1. The idea is to provide the "desired outputs" from a stable platform during an epoch while criticNN 1 adapts. At the end of the epoch, criticNN 2 is made identical to the then-current adapted state of criticNN 1, and a new epoch starts. In this way, criticNN 1 and the actionNN can both be trained online, simultaneously, during each epoch, with faster overall convergence than the older approach. The measures used also suggest that this yields a "better" controller design (the actionNN).
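The two-critic scheme can be illustrated on a toy problem. The following is a minimal Python sketch under stated assumptions that do not come from the paper: a scalar linear plant x' = a*x + b*u, a quadratic utility U = q*x**2 + r*u**2, and linear "networks" with a single weight each (critic λ(x) = w*x, action u = k*x). The function names `train_dhp_two_critics` and `discounted_lqr_reference` are hypothetical. Critic 2 is frozen during each epoch and supplies λ(t+1) for the DHP desired output; critic 1 and the action weight are both adapted at every step; at the end of each epoch critic 2 is overwritten with critic 1. Using critic 2 for the action update as well is a further assumption. Because the toy problem is linear-quadratic, the learned weights can be checked against the discounted Riccati solution, for which λ(x) = 2*p*x.

```python
import random


def discounted_lqr_reference(a, b, q, r, gamma, iters=2000):
    """Scalar discounted Riccati iteration; returns (p, k) with J(x) = p*x**2, u = k*x."""
    p = 0.0
    for _ in range(iters):
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    k = -gamma * a * b * p / (r + gamma * b * b * p)
    return p, k


def train_dhp_two_critics(a=0.9, b=0.5, q=1.0, r=1.0, gamma=0.9,
                          epochs=200, steps_per_epoch=50,
                          lr_critic=0.1, lr_action=0.05, seed=0):
    """Toy DHP in which a frozen 'critic 2' supplies desired outputs during each epoch.

    Assumed toy setup (not the paper's plant or network architecture):
      plant:    x' = a*x + b*u
      utility:  U  = q*x**2 + r*u**2
      action:   u  = k*x
      critics:  lambda_i(x) = w_i*x, an estimate of dJ/dx
    """
    rng = random.Random(seed)
    w1 = 0.0  # critic 1: adapted online at every step
    w2 = w1   # critic 2: frozen copy that provides the "desired outputs"
    k = 0.0   # action weight, also adapted online at every step
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = rng.uniform(-1.0, 1.0)
            u = k * x
            x_next = a * x + b * u
            lam_next = w2 * x_next  # lambda(t+1) taken from the stable platform
            # DHP desired output: total derivative of U(t) + gamma*J(t+1) w.r.t. x(t).
            target = 2 * q * x + 2 * r * u * k + gamma * lam_next * (a + b * k)
            # Critic 1: gradient step on 0.5*(target - w1*x)**2.
            w1 += lr_critic * (target - w1 * x) * x
            # Action: gradient step on U(t) + gamma*J(t+1) w.r.t. k (chain rule via u = k*x).
            dq_du = 2 * r * u + gamma * lam_next * b
            k -= lr_action * dq_du * x
        w2 = w1  # end of epoch: critic 2 is made identical to critic 1
    return w1, k
```

For example, `w, k = train_dhp_two_critics()` can be compared with `2 * p` and `k` from `discounted_lqr_reference(0.9, 0.5, 1.0, 1.0, 0.9)`. The linear-quadratic setting is chosen only because it has a known optimum to check against; it is not meant to reproduce the paper's experiments or its convergence measurements.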
Lendaris, G. G., & Paintz, C. (1997, June). Training strategies for critic and action neural networks in dual heuristic programming method. In Proceedings of International Conference on Neural Networks (ICNN'97) (Vol. 2, pp. 712-717). IEEE.