First Advisor

Marek Perkowski

Term of Graduation

Summer 2020

Date of Publication


Document Type


Degree Name

Doctor of Philosophy (Ph.D.) in Electrical and Computer Engineering


Department

Electrical and Computer Engineering



Physical Description

1 online resource (xxxiii, 365 pages)


Humanoid robots are expected to communicate with expressive gestures at the same level of proficiency as humans. However, creating expressive gestures for humanoid robots is difficult and time-consuming due to the high number of degrees of freedom (DOF) and the many iterations needed to achieve the desired expressiveness.

Current robot motion editing software varies in sophistication, ranging from basic text-only tools to graphical user interfaces (GUIs) that incorporate advanced features such as curve editors and inverse kinematics. These tools enable users to create simple motions, but creating expressive motions is laborious and demands considerable patience as well as technical and artistic skill from the user. As a result, most humanoid robots have a limited repertoire of expressive motions, with little variety and executed the same way each time. Future robots should be able to generate expressive gestures on the fly during interaction with humans.

This work presents several new methods for creating expressive motions in humanoid robots that I have not seen in other robot motion editors. The first is a new method of composing robot gestures and behaviors as algebraic expressions, built from probabilistic operators that extend simple algebraic operators such as concatenation, union, repetition, and subtraction.
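To illustrate the idea, probabilistic composition operators over behaviors can be sketched in Python. The operator and behavior names below are hypothetical illustrations, not the actual REBeL syntax:

```python
import random

# Hypothetical sketch of expression-based behavior composition; the names
# below are illustrative and are NOT the actual REBeL API.
# Here, a behavior is a function returning a list of primitive gesture names.

def concat(a, b):
    """Sequential composition: perform behavior a, then behavior b."""
    return lambda: a() + b()

def union(a, b, p=0.5):
    """Probabilistic choice: perform a with probability p, otherwise b."""
    return lambda: a() if random.random() < p else b()

def repeat(a, n_min=1, n_max=3):
    """Repeat behavior a a random number of times in [n_min, n_max]."""
    return lambda: [g for _ in range(random.randint(n_min, n_max)) for g in a()]

# Primitive behaviors.
wave = lambda: ["raise_arm", "wave_hand", "lower_arm"]
nod = lambda: ["nod_head"]

# Hierarchical reuse: a composite greeting built from simpler behaviors.
greeting = concat(union(wave, nod, p=0.7), repeat(nod, 1, 2))
print(greeting())
```

Because `union` and `repeat` are probabilistic, each evaluation of `greeting` can yield a different gesture sequence, which is what lets a single expression produce varied executions.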

This method also allows hierarchical composition by reusing previously defined behaviors in other expressions, enabling the generation of highly complex behaviors. I implemented this method in a tool called the Robot Expressive Behavior Language (REBeL). Its utility is demonstrated by creating various behaviors for the HROS-1 mini humanoid robot and the adult-human-sized robot Mr. Jeeves. The second method analyzes MIDI music to extract timing information, allowing motions to be executed with more rhythmic and dynamic variety without requiring the user to manually specify and edit the motion data. The third method uses the tension, bias, and continuity parameters of Kochanek-Bartels interpolation to produce the follow-through, overlapping, and anticipation effects of traditional animation principles. Additionally, I employ multiresolution analysis using wavelets to filter motion data in two tasks. The first task was to reduce jerk when concatenating two or more motion clips whose joins are discontinuous.
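For reference, the tension, bias, and continuity parameters enter through the standard Kochanek-Bartels tangent formulas; a minimal sketch for a single joint trajectory (with all three parameters defaulting to zero, which reduces to a Catmull-Rom spline) might look like:

```python
def kb_tangents(p_prev, p, p_next, tension=0.0, bias=0.0, continuity=0.0):
    """Kochanek-Bartels incoming/outgoing tangents at keyframe p.

    Tension flattens or tightens the curve, bias shifts it toward the
    previous or next key (overshoot/anticipation effects), and continuity
    allows corners at the key. All defaults of 0 give a Catmull-Rom spline.
    """
    t, b, c = tension, bias, continuity
    d_out = (0.5 * (1 - t) * (1 + b) * (1 + c) * (p - p_prev)
             + 0.5 * (1 - t) * (1 - b) * (1 - c) * (p_next - p))
    d_in = (0.5 * (1 - t) * (1 + b) * (1 - c) * (p - p_prev)
            + 0.5 * (1 - t) * (1 - b) * (1 + c) * (p_next - p))
    return d_in, d_out

def hermite(p0, p1, m0, m1, s):
    """Cubic Hermite interpolation between keys p0 and p1 for s in [0, 1]."""
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1
```

Each spline segment between two keyframes is then evaluated with `hermite`, using the outgoing tangent of the first key and the incoming tangent of the second.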

I found that jerk was reduced not only at the joins but everywhere else in the motion data, without deviating much from the original data. The second task was to create various motion expressions. By reconstructing the motion data using only a subset of the filterbanks obtained from the multiresolution analysis, I found that different wavelets produced different effects, such as tremors, stuttering, and a 'lazy' version of the motion. In addition, the Haar wavelet was used to discretize values from the motion dataset for machine learning. This discretization reduces the number of unique values in the dataset from over two million to just over two hundred bins, which makes the data suitable for one-hot encoding when training machine learning models.
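A single level of the Haar analysis and synthesis underlying such multiresolution filtering can be sketched as follows. This is a simplified stand-in for a full wavelet filterbank, not the thesis's implementation; zeroing the detail band before reconstruction acts as a low-pass smoothing step:

```python
import numpy as np

def haar_level(x):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) bands, each half the input length;
    the input length is assumed even.
    """
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level; reconstruction is exact when both bands are kept."""
    y = np.empty(2 * len(approx))
    y[0::2] = (approx + detail) / np.sqrt(2)
    y[1::2] = (approx - detail) / np.sqrt(2)
    return y

# A toy joint-angle trajectory with a discontinuous jump in the middle.
motion = np.array([0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3])
a, d = haar_level(motion)

# Reconstructing from the approximation band only replaces each sample
# pair with its mean, attenuating high-frequency content (and jerk).
smoothed = haar_inverse(a, np.zeros_like(d))
```

Repeating `haar_level` on the approximation band yields the multilevel filterbank; reconstructing from different subsets of bands produces the different motion effects described above.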

In this work, I also present results from using multiple machine learning models in two tasks. The first task was to evaluate the quality of robot motions by training classifiers to discriminate between motion capture data and robot motion data. For this task, I used a binary decision tree, naïve Bayes, a support vector machine (SVM), and a long short-term memory (LSTM) network as classifier models. A robot motion sample is considered 'good' if the classifier misclassifies it as motion capture data. The second task was to produce new motion samples using generative models. I built three generative models: an LSTM network, a variational autoencoder (VAE) network, and a generative adversarial network (GAN) based on the deep convolutional GAN (DCGAN) model. My results show that all three generative models can produce data that are better than random noise and that share characteristics with motion capture data. The LSTM model especially benefits from the one-hot encoding of the training data. However, the generated motions do not always correspond to meaningful communicative gestures. I also describe how I built a motion dataset for training these models from samples of motion capture data and existing robot motion data.
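The evaluation criterion in the first task can be illustrated with a toy stand-in, using synthetic features and a minimal nearest-centroid classifier in place of the actual motion features and the decision-tree/SVM/LSTM models:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for feature windows of mocap and robot motion data;
# the robot distribution is deliberately shifted to be distinguishable.
mocap = rng.normal(0.0, 1.0, size=(200, 16))
robot = rng.normal(0.8, 1.0, size=(200, 16))

# Minimal nearest-centroid classifier trained on the first half of each set.
c_mocap = mocap[:100].mean(axis=0)
c_robot = robot[:100].mean(axis=0)

def looks_like_mocap(x):
    """True if the sample is closer to the mocap centroid."""
    return np.linalg.norm(x - c_mocap) < np.linalg.norm(x - c_robot)

# A robot sample counts as 'good' when the classifier mistakes it for
# motion capture; this fooling rate summarizes motion quality.
fool_rate = np.mean([looks_like_mocap(x) for x in robot[100:]])
print(f"robot samples judged 'good': {fool_rate:.0%}")
```

A higher fooling rate means the robot motions are harder to tell apart from motion capture, which is the sense of 'good' used above.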


In Copyright. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

robot cleaned dataset.csv (321631 kB)
robot cleaned dataset

mocap cleaned dataset.csv (500691 kB)
mocap cleaned dataset