# inverted-pendulum

A survey of reinforcement learning solutions to the inverted pendulum problem.

# Introduction

The inverted pendulum problem can be stated concisely: build a system that autonomously balances a rotating pendulum attached to a cart on a rail, using actuators to move the cart along the rail and sensors to reveal the state of the cart and pendulum. Given a specific inverted pendulum system, solving the problem amounts to choosing which sensors to use, optionally formulating a deterministic or stochastic model that approximates the physics involved, and, most importantly, finding a control strategy that works.

It is a classic problem in control theory and dynamics, and it serves as a good testing ground for the development of real-time control algorithms. The problem is worth understanding because it is prevalent in both nature and the man-made world. For example, every person who is upright must constantly make adjustments to keep from falling over, so all of us repeatedly solve a much more difficult version of this problem when sitting or moving about.

# Problem Formulation

A pendulum of length **l**, with a mass **m** on one end, is attached by a hinge to a cart of mass **M** and rotates when a force **F** is applied to the cart.

![Alt text](/img/pendulum.png?raw=true "Schematic of the problem setup. Graphic by Krishnavedala, Wikimedia commons (CC0 1.0).")

By assumption, the friction of the cart on the ground and the friction of the pendulum on the cart are both ignored. The cart is limited to motion along the horizontal x-axis, while the pendulum rotates freely in the x-y plane, making an angle of **θ** with the y-axis. It is also assumed that the force vector **F** has no component in the y direction.

A controller attempts to balance the pendulum by applying a finite force to the cart, which moves it left or right with some acceleration.
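The setup described above can be simulated in a few lines. The following is a minimal sketch, not code from this repository: it assumes a point mass **m** on a massless rod, **θ** measured from the upward y-axis and positive when the pendulum leans toward +x, and a simple explicit-Euler integrator. The function name `cart_pole_step`, the default parameter values, and the time step `dt` are illustrative assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def cart_pole_step(state, force, M=1.0, m=0.1, l=0.5, dt=0.01):
    """Advance the frictionless cart-pendulum by one explicit-Euler step.

    state is (x, x_dot, theta, theta_dot); theta is measured from the
    upward vertical. All default parameter values are hypothetical.
    """
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Accelerations from the Lagrangian of the frictionless system:
    #   (M + m) x'' + m l theta'' cos(theta) - m l theta'^2 sin(theta) = F
    #   l theta'' + x'' cos(theta) - g sin(theta) = 0
    x_acc = (force + m * sin_t * (l * theta_dot ** 2 - G * cos_t)) / (M + m * sin_t ** 2)
    theta_acc = (G * sin_t - x_acc * cos_t) / l

    return (
        x + dt * x_dot,
        x_dot + dt * x_acc,
        theta + dt * theta_dot,
        theta_dot + dt * theta_acc,
    )
```

With no force applied, an exactly upright pendulum stays put, while any small initial tilt grows from step to step, which is why an active controller is needed. Explicit Euler is chosen here only for brevity; a smaller `dt` or a higher-order integrator would be more accurate.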
After some amount of time **∆T** has passed, the controller fails if either the angle of the pendulum deviates by more than **±∆θ** or the position of the cart reaches the bounds **±L**. The problem can now be stated explicitly: develop a controller that balances the pendulum under these constraints.

# References

The following list contains background research material.

[1] Stimac, Andrew K. Standup and stabilization of the inverted pendulum. Diss. Massachusetts Institute of Technology, 1999.

[2] Russell, Stuart, and Peter Norvig. "Artificial intelligence: a modern approach." (1995).

[3] Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. Vol. 1. No. 1. Cambridge: MIT press, 1998.

[4] Melo, Francisco S., Sean P. Meyn, and M. Isabel Ribeiro. "An analysis of reinforcement learning with function approximation." Proceedings of the 25th international conference on Machine learning. ACM, 2008.

[5] Kober, J. & Peters, J. (2012). Reinforcement learning in robotics: A survey. In Reinforcement Learning, Vol. 12 of Adaptation, Learning, and Optimization, pp. 579-610. Springer Berlin Heidelberg.

[6] Anderson, Charles W. "Learning to control an inverted pendulum using neural networks." Control Systems Magazine, IEEE 9.3 (1989): 31-37.

[7] Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.

[8] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

[9] Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).