Simon Pageaud
Mail : simon.pageaud@liris.cnrs.fr
PhD in SMA Team 
LIRIS-CNRS, University Claude Bernard Lyon 1
Generic architecture for urban policy co-construction using multiagent reinforcement learning
The Smart City allows stakeholders with different backgrounds to take part in urban policy making. However, doing so requires suitable tools to integrate the needs and feedback of these stakeholders. Multiagent simulation provides virtual environments in which to study reactions to political decisions through changes in global behaviors. To be considered relevant, the decisions need to be applied to realistic populations. In this PhD thesis, we propose a generic architecture to help policymakers integrate the continuous flow of data from the future smart city and build relevant urban policies with autonomous and adaptive multiagent simulations. Our approach is twofold. First, we describe a formal model to build the complex ecosystem of the policy setting, with agents able to react to political decisions. Second, we rely on a multi-level coupling that learns specific environment information to build a relevant urban policy. We provide a multiagent, multi-level simulation to adapt urban policies using reinforcement learning. A fixed set of local learner agents is distributed over the environment to collect information without any prior knowledge. They are gathered into clusters managed by a variable number of control agents that decide which actions to apply. Through trust-score attribution, control agents are able to identify each local learner's contribution to their global payoff, which allows deep reinforcement learning with a reduced impact of non-stationarity on experience replay in Deep Q-Learning. These two contributions are merged to provide a complete model for co-constructing urban policies.
Smart environment, urban policy making, participatory simulation, reinforcement learning, multiagent learning, coordination
Extended Abstract Multiagent Learning and Coordination using Clustered Deep Q-Network
Existing decentralized learning methods suffer from scalability issues due to the number of agents involved. In the Independent Q-Learning approach, each agent learns its own action values. One drawback of this method is that the non-stationarity it introduces limits the use of the experience replay memory needed in deep reinforcement learning methods such as Deep Q-Network. This paper presents a multiagent, multi-level solution named Clustered Deep Q-Network (CDQN) to overcome this issue.
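As a point of reference for the Independent Q-Learning baseline mentioned above, here is a minimal tabular sketch in Python. It is not the CDQN method and not code from the paper: the class `IndependentQLearner`, the function `run_coordination_game`, the two-action coordination game and all hyperparameters are assumptions chosen for illustration. Each agent keeps a private Q-table and treats the other agent as part of the environment, which is the source of the non-stationarity discussed in the abstract.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """One agent with its own private Q-table (hypothetical helper class)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.n_actions = n_actions
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration rate (assumed value)
        self.rng = rng or random.Random()
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state):
        # Epsilon-greedy choice over this agent's own action values.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update; the other agents' actions
        # are not observed, so the environment looks non-stationary.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def run_coordination_game(episodes=2000, seed=0):
    """Two independent learners are rewarded when they pick the same action."""
    rng = random.Random(seed)
    agents = [IndependentQLearner(2, rng=rng) for _ in range(2)]
    state = 0  # single-state toy game
    for _ in range(episodes):
        actions = [agent.act(state) for agent in agents]
        reward = 1.0 if actions[0] == actions[1] else 0.0
        for agent, action in zip(agents, actions):
            agent.update(state, action, reward, state)
    return agents


if __name__ == "__main__":
    for i, agent in enumerate(run_coordination_game()):
        print(f"agent {i} action values: {agent.q[0]}")
```

Because each learner's reward depends on what the other learner currently does, transitions stored earlier in training may no longer reflect the present joint behavior, which is why a replay memory becomes problematic in this setting.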
Full Paper Co-construction of Adaptive Public Policies Using SmartGov (Available on request)
Designing a public urban policy is a demanding process that requires both time and money, with no guarantee of its efficiency. It involves knowledge about the purpose of urban design, the behaviors of users and their needs in terms of mobility. We believe that in the near future, decision makers will have to react and adapt public policies more frequently, based on the huge amount of available data and on feedback from both target users and stakeholders. In this paper, we propose a generic agent-based architecture to model and simulate urban policies, which could facilitate the co-design and assessment of public policies in a specific environment. Two agent-based models are coupled through a micro-macro dynamic loop, and they can be adapted either by the system using reinforcement learning or by the stakeholders using simulation results. A generic formalism is elaborated to represent urban policies, which can be instantiated in a co-design approach between the policymaker and our system. An experiment is conducted on an urban mobility policy related to the configuration of the parking pricing system in a downtown area. The agents' behavior and environment are developed to be as realistic as possible, based on real-world modeling sources. However, our architecture has been designed to be generic, exploiting infrastructure data from any city using available community data (OpenStreetMap). The parking pricing scenario shows that the system learns post-policy behaviors and can propose adjustments (e.g., specific actions to apply, when and how) to better meet the stakeholders' objectives (e.g., maximize parking gains). The policymaker can then choose to validate the proposed policies or modify them for additional simulations.
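To give a feel for the learning loop in the parking pricing scenario, the following toy sketch in Python lets a learner pick among a few price levels and keep a running estimate of the gain for each. It is an illustration only, not SmartGov code: the price levels in `PRICES`, the linear occupancy curve in `simulated_revenue` (a made-up stand-in for the agent-based simulation) and the function `learn_price` are all assumptions.

```python
import random

PRICES = [1.0, 2.0, 2.5, 3.0, 4.0]  # hypothetical hourly price levels


def simulated_revenue(price):
    """Toy stand-in for the simulation: occupancy falls linearly with price."""
    occupancy = max(0.0, 1.0 - price / 5.0)  # made-up demand curve
    return price * occupancy


def learn_price(steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learner over price levels with sample-average estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(PRICES)
    counts = [0] * len(PRICES)
    for step in range(steps):
        if step < len(PRICES):
            arm = step  # try every price once before exploiting
        elif rng.random() < epsilon:
            arm = rng.randrange(len(PRICES))  # occasional exploration
        else:
            arm = max(range(len(PRICES)), key=estimates.__getitem__)
        reward = simulated_revenue(PRICES[arm])
        counts[arm] += 1
        # Incremental mean of the observed gains for this price level.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    best = max(range(len(PRICES)), key=estimates.__getitem__)
    return PRICES[best], estimates


if __name__ == "__main__":
    price, estimates = learn_price()
    print("proposed price:", price)
    print("estimated gains per price level:", estimates)
```

In the architecture described above, the proposed price would be a suggestion that the policymaker can validate or modify before further simulations, rather than a final decision.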
LIRIS
Bâtiment Nautibus
Université Claude Bernard Lyon 1
43 Boulevard du 11 Novembre 1918 
69622 Villeurbanne Cedex