Authors: Slavova, A. V., Hristov, V. D.
Title: Policy Interpretation for Deep Reinforcement Learning
Keywords: Deep learning, Deep reinforcement learning, Interpretability, Policy visualization, XAI
Abstract: Reinforcement Learning (RL) aims to train an autonomous intelligent agent to build interaction behavior with a predefined environment, usually through training with deep neural networks. Deep Reinforcement Learning (DRL) methods are rapidly entering domains such as robotics, autonomous cars, industrial production, and other systems deployed in human environments. Because of the black-box nature of this training approach, it is important to understand the behavior of the RL agent, especially in areas that pose risk to human health. Understanding decision-making models allows building reliable and stable intelligent systems, finding drawbacks, and optimizing the learning algorithm. In this work, we develop a method for interpreting RL agent behavior based on agglomerative clustering of Shapley-value embeddings of states, gaining insight into the agent's policy within the resulting groups through the RuleFit method. The results obtained point out parameter importance, state thresholds for actions, and marginal state values.
References
- L. S. Shapley, "A Value for n-Person Games." Contributions to the Theory of Games, 2, pp. 307-317, 1953.
- M. Jullum, A. Redelmeier, and K. Aas, "groupShapley: Efficient Prediction Explanation with Shapley Values for Feature Groups." 10.48550/arXiv.2106.12228, 2021.
- J. H. Friedman and B. E. Popescu, "Predictive Learning via Rule Ensembles." Annals of Applied Statistics, 2(3):916-954, 2008.
- T. Zahavy, N. Ben-Zrihem, and S. Mannor, "Graying the Black Box: Understanding DQNs." International Conference on Machine Learning, pp. 1899-1908, 2016.
- L. van der Maaten and G. Hinton, "Visualizing Data Using t-SNE." Journal of Machine Learning Research, 9(Nov):2579-2605, 2008.
- V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning." International Conference on Machine Learning, pp. 1928-1937, 2016.
- S. H. Huang, K. Bhatia, P. Abbeel, and A. D. Dragan, "Establishing Appropriate Trust via Critical States." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
- Silva, A.; Gombolay, M.; Killian, T.; Jimenez, I.; and Son, S.-H. "Optimization Methods for Interpretable Differentiable Decision Trees Applied to Reinforcement Learning", 2020
- A. Verma, V. Murali, R. Singh, P. Kohli, and S. Chaudhuri, "Programmatically Interpretable Reinforcement Learning." International Conference on Machine Learning, pp. 5045-5054, PMLR, 2018.
- O. Bastani, Y. Pu, and A. Solar-Lezama, "Verifiable Reinforcement Learning via Policy Extraction." Advances in Neural Information Processing Systems, 31, 2018.
- R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction." MIT Press, ISBN-10: 0262039249, 1998.
- Gymnasium documentation, CartPole environment, https://gymnasium.farama.org/environments/classic_control/cart_pole/
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation." Advances in Neural Information Processing Systems, pp. 1057-1063, 2000.
- T. Wang, "Learning Reinforcement Learning by Learning REINFORCE." University of Toronto, https://www.cs.toronto.edu/~tingwuwang/REINFORCE.pdf
- D. Beechey, T. M. S. Smith, and O. Simsek, "Explaining Reinforcement Learning with Shapley Values." ICML'23: Proceedings of the 40th International Conference on Machine Learning, 84, pp. 2003-2014, 2023.
- S. Lipovetsky and M. Conklin, "Analysis of Regression in Game Theory Approach." Applied Stochastic Models in Business and Industry, 17(4):319-330, 2001.
- L. McInnes and J. Healy, "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction." Journal of Open Source Software, 3(29), 861, https://doi.org/10.21105/joss.00861, 2018.
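The pipeline described in the abstract (Shapley-value embeddings of states, agglomerative clustering of those embeddings, then rule extraction per group) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the synthetic CartPole-like states, the toy surrogate policy, the mean-state baseline for Shapley marginalization, and the shallow decision tree standing in for RuleFit are all assumptions made for the sketch; the paper works with a trained DRL agent and the actual RuleFit algorithm.

```python
import numpy as np
from itertools import combinations
from math import factorial
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic CartPole-like states (position, velocity, angle, angular velocity)
# and a toy surrogate policy that pushes toward the side the pole leans to.
X = rng.normal(size=(100, 4))
y = (X[:, 2] + 0.3 * X[:, 3] > 0).astype(int)
policy = GradientBoostingClassifier(random_state=0).fit(X, y)

def f(z):
    """Probability the policy selects action 1 in a single state z."""
    return policy.predict_proba(z.reshape(1, -1))[0, 1]

def shapley_embedding(x, baseline):
    """Exact Shapley values of f at state x; absent features are
    marginalized by replacing them with a baseline (the mean state)."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                z = baseline.copy()
                z[list(S)] = x[list(S)]       # coalition S present
                without_i = f(z)
                z[i] = x[i]                   # add feature i
                phi[i] += w * (f(z) - without_i)
    return phi

baseline = X.mean(axis=0)
embeddings = np.array([shapley_embedding(x, baseline) for x in X])

# Group states whose Shapley embeddings share the same attribution structure.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(embeddings)

# Per-group surrogate rules; a depth-2 tree stands in for RuleFit here.
for c in range(3):
    mask = labels == c
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[mask], y[mask])
    print(f"group {c} ({mask.sum()} states):")
    print(export_text(tree, feature_names=["pos", "vel", "angle", "ang_vel"]))
```

Because the Shapley values are computed exactly over all coalitions, each embedding satisfies the efficiency property: its components sum to f(x) minus f(baseline), so the clustering operates on a faithful decomposition of the policy's action probability.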
Issue
ICARAI 2025 - International Conference Automatics, Robotics and Artificial Intelligence, Proceedings, 2025, Bulgaria, https://doi.org/10.1109/ICARAI67046.2025.11137898