Authors: Budakova D. V., Petrova-Dimitrova V. S., Dakovski L. G.
Title: Intelligent virtual agent, learning how to reach a goal by making the least number of compromises
Keywords: Intelligent system, Reinforcement learning, Smart shopping-cart learning agents

Abstract: Learning in the Q-learning algorithm is driven by maximizing a single numerical reward signal. However, some tasks impose complex requirements on how a goal is reached. This paper proposes a modification to the Q-learning algorithm. So that the Q-learning agent can find the optimal path to the goal while meeting particular complex criteria, a measures model (a model of the environment criteria), represented as a new memory matrix, is introduced. If the goal cannot be reached while following the pre-set criteria, the learning agent can compromise on a given criterion. The agent makes the fewest possible trade-offs needed to reach the goal. If the criteria are ranked by importance, the agent can instead choose a larger number of more acceptable compromises. The aim of the modification is to let the learning agent control how it reaches a goal. The modified algorithm has been applied to training smart agents.
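
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the layout of the measures matrix or the search procedure, so everything below is an assumption. Here the measures model is a hypothetical dictionary `VIOLATES` that maps a transition to the criterion it breaks, the environment is a small hypothetical directed graph `EDGES`, and the agent retrains plain tabular Q-learning with progressively larger sets of permitted compromises until the goal becomes reachable. Ties between equally small sets are broken by the order of `CRITERIA`, assumed to run from least to most important.

```python
import itertools
import random

def train_q(edges, violates, allowed, goal, n_states, episodes=500, gamma=0.8, seed=0):
    """Tabular Q-learning on a deterministic graph (learning rate 1).
    Transitions that break a criterion not in `allowed` are masked out."""
    rng = random.Random(seed)
    q = [[0.0] * n_states for _ in range(n_states)]

    def actions(s):
        # A transition is usable if it breaks no criterion or only a permitted one.
        return [t for t in edges.get(s, [])
                if violates.get((s, t)) is None or violates[(s, t)] in allowed]

    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(4 * n_states):  # cap on episode length
            if s == goal:
                break
            acts = actions(s)
            if not acts:
                break
            t = rng.choice(acts)  # pure random exploration
            reward = 100.0 if t == goal else 0.0
            q[s][t] = reward + gamma * max(q[t])
            s = t
    return q, actions

def greedy_path(q, actions, start, goal, n_states):
    """Follow the highest Q-value from `start`; return None if the goal is not reached."""
    path, s = [start], start
    while s != goal and len(path) <= n_states:
        acts = actions(s)
        if not acts:
            return None
        s = max(acts, key=lambda t: q[s][t])
        path.append(s)
    return path if s == goal else None

def least_compromise_path(edges, violates, criteria, start, goal, n_states):
    """Try 0, 1, 2, ... permitted compromises and return the first path found."""
    for k in range(len(criteria) + 1):
        for allowed in itertools.combinations(criteria, k):
            q, actions = train_q(edges, violates, set(allowed), goal, n_states)
            path = greedy_path(q, actions, start, goal, n_states)
            if path is not None:
                return path, set(allowed)
    return None, None

# Hypothetical environment: state 0 is the start, state 4 is the goal.
EDGES = {0: [1, 2], 1: [4], 2: [3], 3: [4]}
# Hypothetical measures model: the criterion each transition would break.
VIOLATES = {(1, 4): "noise", (2, 3): "price", (3, 4): "detour"}
CRITERIA = ["price", "detour", "noise"]  # assumed order: least to most important

path, compromised = least_compromise_path(EDGES, VIOLATES, CRITERIA, 0, 4, 5)
print(path, compromised)
```

In this toy graph no route satisfies every criterion. The route through state 1 breaks one criterion and the route through states 2 and 3 breaks two, so the sketch settles on the former with a single compromise. It covers only the "fewest compromises" behaviour; the variant in which importance ranking lets the agent accept a larger number of more acceptable compromises is not modelled.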




TechSys 2020, vol. 878, 2020, Bulgaria, IOP Publishing, ISSN 1757-899X (online), 1757-8981 (print)

Copyright IOP Conference Series: Materials Science and Engineering


Type: publication in an international forum, publication in a refereed edition, indexed in Scopus