About

The Cart-Pole environment is a classic testbed in reinforcement learning, where the objective is to balance a pole on a moving cart. The challenge involves controlling the cart's movement to prevent the pole from falling over.

How to play

When the game starts, you control the cart. Your task is to balance the pole for 30 seconds, pressing A to move left and D to move right. To see the agent in action, click the checkbox in the top-left corner.

How it works

In this project, I used linear function approximation to train an agent over 40,000 episodes. The agent can take three actions: stay still, move left, or move right. It chooses among them based on its current state, which is defined by four continuous variables: cart position, cart velocity, pole angle, and pole angular velocity.
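To make "linear function approximation" concrete, here is a minimal sketch rather than the project's actual code. It assumes the state has already been turned into a feature vector, keeps one weight vector per action, and scores each action with a dot product. The page does not say which update rule was used, so the Q-learning-style update, the step size alpha, the discount gamma, and the 8-element feature vector are all assumptions made for illustration.

```python
import numpy as np

ACTIONS = ("stay", "left", "right")  # the three actions described above


def q_values(weights, features):
    """Linear action values: one weight row per action, dotted with the features."""
    return weights @ features


def greedy_action(weights, features):
    """Index of the action with the highest estimated value."""
    return int(np.argmax(q_values(weights, features)))


def td_update(weights, features, action, reward, next_features, done,
              alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning step (assumed rule; alpha and gamma are illustrative)."""
    target = reward
    if not done:
        target += gamma * np.max(q_values(weights, next_features))
    td_error = target - q_values(weights, features)[action]
    # For a linear model, the gradient with respect to the chosen action's
    # weights is just the feature vector itself.
    weights[action] += alpha * td_error * features
    return td_error


# Tiny demo with a hypothetical 8-element binary feature vector.
weights = np.zeros((len(ACTIONS), 8))
x = np.zeros(8)
x[[1, 5]] = 1.0
x_next = np.zeros(8)
x_next[[2, 5]] = 1.0

err = td_update(weights, x, action=2, reward=1.0, next_features=x_next, done=False)
print(err, q_values(weights, x), ACTIONS[greedy_action(weights, x)])
```

After this single update, only the weights for the chosen action ("right") change, and only at the positions where the feature vector is active.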

State Aggregation: Tile Coding

Because these four state variables are continuous, each can take any value within its range, including decimal values. To turn them into a more manageable discrete form, I used tile coding, a form of state aggregation. Tile coding maps a wide continuous range onto a small set of discrete tiles. For instance, within a range of 0 to 10, nearby values like 2.4512 and 2.4513, which would otherwise count as different states, are grouped together. In this example, both would be classified as 2, along with every other value close to 2. This simplifies the state representation and makes it easier for the agent to learn, because experience gained in one state carries over to its neighbors.

However, this approach involves a trade-off. The more broadly states are aggregated, the less detail is retained. Take the pole angle, and suppose it ranges from 0 to 90 degrees. If it is aggregated into just two tiles, one covering 0 to 45 and the other 45 to 90, a significant amount of information is lost. A pole angle of 44 degrees, close to losing balance, would be treated the same as an upright, stable position. Choosing the number of tiles is therefore crucial. Too few tiles oversimplify the state, while too many make it unnecessarily complex and slow down learning. The goal is to keep enough detail for effective learning without overwhelming the agent.

In this project, I use several overlapping tilings rather than a single one. Each tiling covers the same range but shifts the tile boundaries slightly. With a single tiling, a 45-degree angle might fall in one tile and a 46-degree angle in another, so the two are treated as completely distinct even though they are only 1 degree apart. With multiple shifted tilings, the same value falls into a different tile in each tiling, so nearby values still share most of their tiles while values that are far apart share few or none.

This is akin to looking at the same scene from slightly different angles to get a fuller picture. It is especially useful for recognizing subtle but important differences between similar states, and it gives the agent a layered view of the environment rather than a flat, one-dimensional one.
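The effect of shifted tilings can be sketched for a single variable. This is a simplified illustration, not the project's implementation: the function names, the evenly spaced offsets, and the choice of 2 tiles and 3 tilings are assumptions, and a full tile coder would tile all four state variables jointly.

```python
import math


def tile_indices(value, low, high, n_tiles, n_tilings):
    """Return the active tile index in each tiling for one continuous variable.

    Every tiling splits [low, high] into bins of equal width, but tiling t is
    shifted by t / n_tilings of a bin width, so the bin edges do not line up.
    """
    width = (high - low) / n_tiles
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings  # shift applied to tiling t
        idx = math.floor((value - low + offset) / width)
        # Shifting needs one extra tile at the top edge, so indices run 0..n_tiles.
        indices.append(min(max(idx, 0), n_tiles))
    return indices


def one_hot_features(value, low, high, n_tiles, n_tilings):
    """Binary feature vector with exactly one active entry per tiling."""
    per_tiling = n_tiles + 1
    features = [0] * (n_tilings * per_tiling)
    for t, idx in enumerate(tile_indices(value, low, high, n_tiles, n_tilings)):
        features[t * per_tiling + idx] = 1
    return features


# Pole angle in degrees, 2 tiles per tiling.
print(tile_indices(5, 0, 90, 2, 1), tile_indices(44, 0, 90, 2, 1))  # [0] [0]
print(tile_indices(5, 0, 90, 2, 3))   # [0, 0, 0]
print(tile_indices(44, 0, 90, 2, 3))  # [0, 1, 1]
print(tile_indices(46, 0, 90, 2, 3))  # [1, 1, 1]
print(one_hot_features(44, 0, 90, 2, 3))
```

With one tiling, 5 degrees and 44 degrees are indistinguishable. With three tilings, they differ in two of three tiles, while 44 and 46 degrees still share two of three, so the agent can tell the risky angle from the safe one and still generalize between close neighbors.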

