
Commissioned coursework: CS 7642 Reinforcement Learning and Decision

Date: 2024-04-21  Source: 合肥網 (hfw.cc)  Author: hfw.cc



CS 7642: Reinforcement Learning and Decision Making Project #3 Overcooked
1 Problem
1.1 Description
For the final project of this course, you have to bring together everything you have learned thus far and solve the multi-agent Overcooked environment (modeled after the popular video game). In this environment, you have control over 2 chefs in a restaurant kitchen who have to collaborate to cook onion soups. To cook a soup, the agents need to put 3 onions into a cooking pot, initiate cooking, wait for the soup to cook, put the soup into a dish, and serve the dish at a serving area. This project serves as a capstone to the course and as such we expect much of the project to be open-ended and self-directed. Your primary goal is to maximize the number of soups delivered within an episode on a variety of layouts ranging from fairly easy to extremely difficult. In your quest to solve these layouts you may discover auxiliary goals or metrics that are worth analyzing.
Our expectation is that you have learned what is significant to include in this type of report from the previous projects and the material we have covered so far. It is thus up to you to define:
• The direction of your project including which aspect(s) you aim to focus upon.
• How you specify and measure such aspects.
• How to train your agents.
• How to structure your report and what graphs to include (in addition to the mandatory graphs discussed later).
Your focus should be on demonstrating your understanding of the algorithm(s)/solution(s), clarifying the rationale behind your experiments, and analyzing their results. Your main goal is to develop an algorithm to solve the environment but you can also use everything else studied in the course such as reward and policy shaping. The environment provides a reward shaping data structure that you are free to use. You may also design your own reward shaping in place of, or in addition to, this default setup. However, all algorithms and solutions used to solve the environment should be your own. We encourage you to start off this project with your Project 2 solution and see how far that model takes you. This will provide context for why multi-agent methods may be necessary for this environment. It will also help to ease your transition into this environment by utilizing an algorithm you’ve already gotten to work.
Figure 1: Visualization of the Overcooked environment. Carroll et al. 2019
1.2 Environment and Task
In this project, you will be training a team of 2 agents to cook onion soups in a kitchen. The objective is always to deliver as many soups as possible within a 400-timestep episode. Each soup takes 20 timesteps to cook and delivering a soup successfully yields a +20 reward. Cooking a soup with fewer than 3 onions, dropping a soup on the ground, or serving the soup on the counter (instead of the designated serving area) yields no reward but hinders progress as agents lose precious time (and starve customers). Episodes are truncated to a 400-step horizon with no termination conditions. You are not permitted to increase the 400-step horizon. You are provided with 5 layouts of varying difficulty, [cramped_room, asymmetric_advantages, coordination_ring, forced_coordination, counter_circuit_0_1order], as shown in Figure 2 (see footnote 1).
Your task is to achieve a mean soup delivery count of ≥ 7 per episode across all layouts using a single approach. This means a single algorithm and a single reward-shaping function (if you utilize reward shaping). This also means a single set of hyperparameters. The idea is to build an agent that can solve any layout that is thrown at it, and not just these 5. Having a constant set of parameters also makes reproducibility much easier (something we gained an appreciation for in Project 1). Note that some layouts can be solved by a single-agent algorithm and don’t require any collaboration. Other layouts benefit significantly from collaboration and some may only be solvable via collaboration. This means that a successful approach to solving all 5 layouts will likely require an explicit multi-agent approach. We also expect you to develop your approaches and analyze the results by explicitly looking at multi-agent metrics (see Section 1.8).
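The success criterion above is simple arithmetic, so it can be checked mechanically. The following is a minimal sketch, assuming that the only sparse reward in an episode comes from deliveries (+20 each) and reading the "≥ 7" requirement as applying to each layout separately; the function names, layout keys, and return values are hypothetical and purely illustrative.

```python
from statistics import mean

SOUP_REWARD = 20        # reward per delivered soup (from the text)
TARGET_MEAN_SOUPS = 7   # required mean soups per episode (from the text)


def soups_delivered(sparse_return):
    """Convert an episode's total sparse reward into a soup count."""
    return sparse_return // SOUP_REWARD


def meets_target(returns_by_layout):
    """Map each layout to True if its mean soups per episode is >= 7."""
    return {
        layout: mean(soups_delivered(r) for r in returns) >= TARGET_MEAN_SOUPS
        for layout, returns in returns_by_layout.items()
    }


# Hypothetical evaluation returns (made-up numbers, for illustration only).
example_returns = {
    "layout_a": [140, 160, 140],   # 7, 8 and 7 soups
    "layout_b": [120, 140, 100],   # 6, 7 and 5 soups
}
result = meets_target(example_returns)
```

Here `layout_a` averages above 7 soups and passes, while `layout_b` averages 6 and does not.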
Figure 2: The 5 layouts you are tasked to solve. From left to right they are named [cramped_room, asymmetric_advantages, coordination_ring, forced_coordination, counter_circuit_0_1order]. Carroll et al. 2019
1.3 State Space
This is a fully-observable MDP and both agents have access to the full observation. Therefore, the state and observation spaces are equivalent. By default, the observations are provided as a 96-element vector, customized for each agent. The encoding for player i ∈ {0, 1} contains a player-centric featurized view for the ith player, and is as follows:
[player_i_features, other_player_features, player_i_dist_to_other_player, player_i_position]
The first component, player_i_features, has length 46 and is detailed below. Note that if you add all the feature lengths in the specification below, you will get 36 instead of the expected 46. This is because the five features related to the pot (having a combined length of 10) occur twice, once for each pot, and are concatenated together. Note also that none of our layouts contain tomatoes, so the features corresponding to tomatoes will always be 0. Finally, layouts containing only one cooking pot will have the second pot’s features zeroed out as well.
• p_i_orientation: one-hot encoding of direction currently facing (length 4)
• p_i_obj: one-hot encoding of object currently being held ([onion, soup, dish, tomato]) (all 0s if no object held) (length 4)
• p_i_closest_onion|tomato|dish|soup: (dx, dy) where dx = x dist to item, dy = y dist to item. (0, 0) if item is currently held (length 8)
• p_i_closest_soup_n_onions|tomatoes: int value for number of this ingredient in closest soup (length 2)
• p_i_closest_serving_area|empty_counter: (dx, dy) where dx = x dist to item, dy = y dist to item (length 4)
• p_i_closest_pot_j_exists: {0, 1} depending on whether jth closest pot is found. If 0, then all other pot features are 0. Note: can be 0 even if there are more than j pots on layout, if the pot is not reachable by player i (length 1)
• p_i_closest_pot_j_is_empty|is_full|is_cooking|is_ready: {0, 1} depending on boolean value for jth closest pot (length 4)
• p_i_closest_pot_j_num_onions|num_tomatoes: int value for number of this ingredient in jth closest pot (length 2)
• p_i_closest_pot_j_cook_time: int value for seconds remaining on soup. 0 if no soup is cooking (length 1)
• p_i_closest_pot_j: (dx, dy) to jth closest pot from player i location (length 2)
• p_i_wall_j: {0, 1} boolean value of whether player i has a wall immediately in direction j (length 4)
Footnote 1: The Overcooked environment has dozens of layouts, but for this project we will focus only on these 5.
The remaining components of the observation vector are as follows:
• other_player_features (length 46): ordered concatenation of player_j_features for j ≠ i
• player_i_dist_to_other_player (length 2): [player_j.pos - player_i.pos for j ≠ i]
• player_i_position (length 2)
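The length bookkeeping above (36 vs. 46, and the 96-element total) can be verified with a few lines of code. This is a minimal sketch that only follows the lengths and ordering stated in this section; the dictionary keys and the `split_observation` helper are hypothetical names, not part of the environment's API.

```python
# Lengths of the non-pot, non-wall feature groups listed above.
BASE_FEATURES = {
    "orientation": 4,
    "held_object": 4,
    "closest_items": 8,             # onion, tomato, dish, soup: (dx, dy) each
    "closest_soup_ingredients": 2,
    "serving_area_and_counter": 4,
}
# Lengths of the five pot-related features (repeated once per pot).
POT_FEATURES = {"exists": 1, "status": 4, "ingredients": 2, "cook_time": 1, "offset": 2}
WALL_FEATURES = 4
NUM_POTS = 2

NAIVE_SUM = sum(BASE_FEATURES.values()) + sum(POT_FEATURES.values()) + WALL_FEATURES
PLAYER_LEN = (
    sum(BASE_FEATURES.values()) + NUM_POTS * sum(POT_FEATURES.values()) + WALL_FEATURES
)
OBS_LEN = 2 * PLAYER_LEN + 2 + 2    # both players, relative distance, position


def split_observation(obs):
    """Slice one agent's flat observation into its four top-level components."""
    if len(obs) != OBS_LEN:
        raise ValueError(f"expected {OBS_LEN} elements, got {len(obs)}")
    other_end = 2 * PLAYER_LEN
    return {
        "player_features": obs[:PLAYER_LEN],
        "other_player_features": obs[PLAYER_LEN:other_end],
        "dist_to_other_player": obs[other_end:other_end + 2],
        "position": obs[other_end + 2:],
    }


parts = split_observation(list(range(OBS_LEN)))
```

Slicing works the same way on a list or an array, so the helper can be applied directly to whatever sequence type the observation arrives in.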
1.4 Action Space
The action space is discrete with six possible actions: up, down, left, right, stay, and ”interact,” which is a contextual action determined by the tile the player is facing (e.g. placing an onion when facing a counter). Each layout has one or more onion dispensers and dish dispensers, which provide an unlimited supply of onions and dishes respectively.
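To make the six-action space concrete, the sketch below defines a hypothetical `ChefAction` enumeration and an epsilon-greedy selector that picks one action per chef. The integer ordering shown is an assumption for illustration only; check the environment's own action list before relying on any particular index mapping.

```python
import random
from enum import IntEnum


class ChefAction(IntEnum):
    # Hypothetical index order; the environment's real ordering may differ.
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3
    STAY = 4
    INTERACT = 5


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return ChefAction(rng.randrange(len(ChefAction)))
    best_index = max(range(len(q_values)), key=q_values.__getitem__)
    return ChefAction(best_index)


def joint_action(q_values_per_agent, epsilon, rng):
    """Select one action per chef; the environment steps on the pair."""
    return tuple(epsilon_greedy(q, epsilon, rng) for q in q_values_per_agent)


rng = random.Random(0)
greedy_pair = joint_action(
    [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0]], epsilon=0.0, rng=rng
)
```

With `epsilon=0.0` the selection is purely greedy, so the first chef chooses `INTERACT` and the second chooses `UP`.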
1.5 Installation Notes
The environment is officially supported on Python 3.7 and is installed via pip install overcooked-ai. We recommend you run in Anaconda. We require the use of PyTorch if using deep learning methods. You absolutely do not need a GPU to solve any of the layouts in less than 10 hours (in fact, GPUs typically slow RL algorithms down). To help you with getting started, we are providing you with a Jupyter notebook. You may create a copy of this notebook in order to run the starting code. This notebook demonstrates installing, building, interacting with, and visualizing the environment. You are not required to use this notebook in your project, but we encourage you to use it as a companion to this document to better understand the environment.
1.6 IMPORTANT: Reward Shaping Addendum
If you plan on using reward shaping, take a look at how the default shaped rewards are swapped by the agent index in the provided notebook. Upon episode reset, agents are assigned randomly to one of the 2 starting positions. This assignment is only reflected in the official observation that is returned to you by the environment’s step method. Any state variable you obtain from the Overcooked environment that is not in this observation variable (including anything in the info dictionary or the base environment) needs to be similarly swapped. Failure to do this means you will be assigning credit to the wrong agent roughly half the time, crippling your algorithm.
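The swap described above can be sketched as a small helper. This is an illustration under stated assumptions rather than the notebook's code: `agent_idx` is assumed to be the per-episode index (0 or 1) that the environment exposes for the random position assignment, and `shaped_by_position` stands for any two-element per-position quantity taken from outside the observation (for example, shaped rewards read from the info dictionary). Both helper names are hypothetical.

```python
def align_to_agents(per_position_values, agent_idx):
    """Reorder a 2-element per-position sequence into per-agent order.

    When agent_idx is 1, the agents were assigned to the opposite starting
    positions this episode, so the two entries must be swapped.
    """
    if agent_idx not in (0, 1):
        raise ValueError("agent_idx must be 0 or 1")
    first, second = per_position_values
    return (second, first) if agent_idx == 1 else (first, second)


def total_rewards(sparse_reward, shaped_by_position, agent_idx, shaping_weight=1.0):
    """Combine the shared sparse reward with correctly attributed shaped rewards."""
    shaped = align_to_agents(shaped_by_position, agent_idx)
    return tuple(sparse_reward + shaping_weight * s for s in shaped)


# Hypothetical step: a shaped reward of 3 was recorded in position slot 0.
unswapped = total_rewards(0, (3, 0), agent_idx=0)
swapped = total_rewards(0, (3, 0), agent_idx=1)
```

Calling the same helper on every out-of-observation quantity keeps credit assignment consistent within an episode, whichever way the positions were drawn.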
For more details on installation and operation, refer to the GitHub repository: https://github.com/HumanCompatibleAI/overcooked_ai
1.7 Strategy Recommendations
You are free to pursue any multi-agent RL strategies in your soup-cooking quest. For instance, you may pursue a novel reward-shaping technique; however, make sure that the method chosen is relevant to multi-agent RL problems. We strongly recommend that you (1) start with your Project 2 solution adapted to this problem and (2) start with the cramped_room and asymmetric_advantages layouts. Below are further examples of strategies worth pursuing:

• using reward shaping techniques for improving multi-agent considerations such as collaboration and credit assignment;
• asynchronous methods (Mnih et al. 2016);
• centralizing training and decentralizing execution (Lowe et al. 2017; J. N. Foerster et al. 2017);
• value factorisation (Rashid, Samvelyan, De Witt, et al. 2020);
• employing curriculum learning (some single-agent ideas in this dissertation may be interesting and easy to extend to the multi-agent case, e.g., Narvekar 2017);
• adding communication protocols (J. Foerster et al. 2016);
• improving multi-agent credit assignment (J. N. Foerster et al. 2017; Zhou et al. 2020);
• improving multi-agent exploration (Iqbal and Sha 2019; Wang et al. 2019);
• finding better inductive biases (i.e., choosing the function space for policy/value function approximation) to handle the exponential complexity of multi-agent learning, e.g., graph neural networks (Battaglia et al. 2018; Naderializadeh et al. 2020).
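As one illustration of the value-factorisation idea in the list above, the sketch below uses the simplest additive form, where the team value is the sum of per-agent values. This is a deliberately simplified stand-in and not necessarily the method of the cited work; the Q-values are made-up numbers. It shows why factorisation helps: each agent can act greedily on its own values, and the resulting pair still maximises the team value without searching all 6 × 6 joint actions.

```python
from itertools import product


def q_total(q_per_agent, joint_action):
    """Additive factorisation: Q_tot(a_1, a_2) = Q_1(a_1) + Q_2(a_2)."""
    return sum(q[a] for q, a in zip(q_per_agent, joint_action))


def decentralised_greedy(q_per_agent):
    """Each agent independently picks the argmax of its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_per_agent)


def centralised_greedy(q_per_agent):
    """Brute-force argmax of Q_tot over every joint action."""
    joint_actions = product(*(range(len(q)) for q in q_per_agent))
    return max(joint_actions, key=lambda ja: q_total(q_per_agent, ja))


# Hypothetical per-agent Q-values over the six actions (illustrative only).
q_values = [
    [0.1, 0.5, 0.2, 0.0, 0.3, 0.9],
    [0.7, 0.2, 0.4, 0.1, 0.0, 0.6],
]
local_choice = decentralised_greedy(q_values)
global_choice = centralised_greedy(q_values)
```

Under an additive decomposition the two choices coincide, which is what allows centralized training to be paired with decentralized execution.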
1.8 Procedure
This problem is more sophisticated than anything you have seen so far in this course. Make sure you reserve enough time to consider what an appropriate approach might involve and, of course, enough time to build and train it.
• Clearly define the direction of your project and which aspect(s) you aim to improve upon over your Project 2 baseline, assuming that that baseline was unable to solve all of the layouts. For example, do you want to improve collaboration among your agents?
– This includes why you think your algorithm/procedure will accomplish this and whether or not your results demonstrate success.
• Implement a solution that produces such improvements.
– Use any algorithms/strategy as inspiration for your solution.
– The focus of this project is to try new algorithms/solutions, rather than to simply improve hyper-parameters of the algorithms already implemented. Further, avoid searching for random seeds that happen to work the best as this is inconsequential analysis. Remember that the algorithm/reward-shaping/hyperparameters must be fixed across all 5 layouts.
– Justify the choice of that solution and explain why you expect it to produce these improvements.
– Even if your solution does not solve all of the layouts, you still have the ability to write a solid paper.
– Upload/maintain your code in your private repo at https://github.gatech.edu/gt-omscs-rldm.
• Describe your experiments and create graphs that demonstrate the success/failure of your solution.
– You must provide one graph demonstrating the number of soups made across all five layouts during training. You can combine all five layouts’ plots onto one graph if you wish. Displaying a simple moving average for each layout’s training run is suggested to help with clarity.
– You must provide one graph demonstrating performance of your trained agent on each layout over at least 100 consecutive episodes. Again, you can combine all five layouts’ plots into one graph. If all five of these graphs are flat lines (a possible consequence of using a deterministic algorithm on a deterministic environment), then a bar graph is ok.
– Additionally, you must provide at least two graphs using metrics you decided on that are significant for your hypothesis/goal.
– Analyze your results and explain the reasons for the success/failure of your solution.

– Since graphs are largely decided by you, they should have clear axes, labels, and captions. You will lose points for graphs that do not have any description or label of the information being displayed.
– Example metrics you might consider are number of dish pickups, dropped dishes, incorrect deliveries, or picked up onions. These example metrics and more are built into the environment and are accessible via the info variable at the end of an episode. In your report you should clearly motivate why you are interested in a particular metric. See the provided notebook.
• We’ve created a private Georgia Tech GitHub repository for your code. Push your code to the personal repository found here: https://github.gatech.edu/gt-omscs-rldm.
• The quality of the code is not graded. You do not have to spend countless hours adding comments, etc. However, the TAs will examine code during grading.
• Make sure to include a README.md file for your repository that we can use to run your code.
– Include thorough and detailed instructions on how to run your source code in the README.md.
– If you work in a notebook, like Jupyter, include an export of your code in a .py file along with your notebook.
– The README.md file should be placed in the project 3 folder in your repository.
• You will be penalized by 25 points if you:
– Do not have any code or do not submit your full code to the GitHub repository; or
– Do not include the git hash for your last commit in your paper.
• Write a paper describing your agents and the experiments you ran.
– Include the hash for your last commit to the GitHub repository in the header on the first page of your paper.
– Make sure your graphs are legible and you cite sources properly. While it is not required, we recommend you use a conference paper format. For example: https://www.ieee.org/conferences/publishing/templates.html.
– 5 pages maximum—really, you will lose points for longer papers.
– Explain your algorithm(s).
– Explain your training implementation and experiments.
– An ablation study would be an interesting way to find out which components of the algorithm contribute to your metric. (See J. N. Foerster et al. 2017.)
– Graphs highlighting your implementation’s successes and/or failures.
– Explanation of algorithms used: what worked best? what didn’t work? what could have worked better?
– Justify your choices.
∗ Unlike Project 1, there are multiple ways of solving this problem and you have a lot of discretion over the general approach you take as well as experimental design decisions. Explain to the reader why, from amongst the multiple alternatives, you chose the ones you did.
∗ Your focus should be on justifying the algorithm/techniques you implemented.
– Explanation of pitfalls and problems you encountered.
– What would you try if you had more time?
– Save this paper in PDF format.
– Submit to Canvas!
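For the mandatory training graph described in the procedure above, a simple moving average is suggested to smooth each layout's curve. The following is a minimal sketch of that smoothing step in plain Python; the soup counts are made-up numbers and `moving_average` is a hypothetical helper name.

```python
def moving_average(values, window):
    """Simple moving average over full windows only.

    Returns len(values) - window + 1 points; the first point averages
    values[0:window], so the curve starts once a full window is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]   # drop the value leaving the window
        if i >= window - 1:
            smoothed.append(running_sum / window)
    return smoothed


# Hypothetical soups-per-episode log for one layout during training.
soups_per_episode = [0, 0, 1, 2, 2, 4, 5, 7, 8, 8]
smoothed_curve = moving_average(soups_per_episode, window=5)
```

Plotting `smoothed_curve` against the episode index (offset by `window - 1`) for each of the five layouts on shared axes gives one line per layout; the raw counts can be drawn faintly underneath if the variance is worth showing.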
1.9 Resources
1.9.1 Lectures
• Lesson 11A: Game Theory
• Lesson 11B: Game Theory Reloaded
• Lesson 11C: Game Theory Revolutions

1.9.2 Readings
• J. N. Foerster et al. 2017
• Lowe et al. 2017
• Rashid, Samvelyan, Witt, et al. 2018
1.9.3 Talks
• Factored Value Functions for Cooperative Multi-Agent Reinforcement Learning
• Counterfactual Multi-Agent Policy Gradients
• Learning to Communicate with Deep Multi-Agent Reinforcement Learning
• Automatic Curricula in Deep Multi-Agent Reinforcement Learning
1.10 Submission Details
The due date is indicated on the Canvas page for this assignment. Make sure you have set your timezone in Canvas to ensure the deadline is accurate.
Due Date: Indicated as “Due” on Canvas
Late Due Date [20 point penalty per day]: Indicated as “Until” on Canvas
The submission consists of:
• Your written report in PDF format (Make sure to include the git hash of your last commit.)
• Your source code
To complete the assignment, submit your written report to Project 3 under your Assignments on Canvas (https://gatech.instructure.com) and submit your source code to your personal repository on Georgia Tech’s private GitHub.
You may submit the assignment as many times as you wish up to the due date, but we will only consider your last submission for grading purposes. Late submissions will receive a cumulative 20 point penalty per day. That is, any projects submitted after midnight AOE on the due date will receive a 20 point penalty. Any projects submitted after midnight AOE the following day will receive another 20 point penalty (a 40 point penalty in total) and so on. No project will receive a score lower than zero no matter what the penalty. Any projects more than 4 days late and any missing submissions will receive a 0.
Please be aware, if Canvas marks your assignment as late, you will be penalized. This means one second late is treated the same as three hours late, and will receive the same penalty as described in the breakdown above. Additionally, if you resubmit your project and your last submission is late, you will incur the penalty corresponding to the time of your last submission. Submit early and often.
Finally, if you have received an exception from the Dean of Students for a personal or medical emergency we will consider accepting your project up to 7 days after the initial due date with no penalty. Students requiring more time should consider taking an incomplete for this semester as we will not be able to grade their project.
1.11 Grading and Regrading
When your assignments, projects, and exams are graded, you will receive feedback explaining your successes and errors in some level of detail. This feedback is for your benefit, both on this assignment and for future assignments. It is considered a part of your learning goals to internalize this feedback. This is one of many learning goals for this course, such as: understanding game theory, random variables, and noise.
If you are convinced that your grade is in error in light of the feedback, you may request a regrade within a week of the grade and feedback being returned to you. A regrade request is only valid if it includes an explanation of where the grader made an error. Create a private Ed Discussion post titled “[Request] Regrade Project 3”. In the Details add sufficient explanation as to why you think the grader made a mistake. Be concrete and specific. We will not consider requests that do not follow these directions.

1.12 Words of Encouragement
We understand this is a daunting project with many possible design directions to consider. As Graduate Students in Computer Science, projects that allow you to challenge and expand your skills in a practical and low-stakes manner are crucial. These projects are ideal for testing the knowledge you have garnered throughout the course and applying yourself to a difficult problem commonly faced when applying reinforcement learning in industry. After completing the course, a project like this can be valuable to highlight during interviews, to demonstrate your newfound knowledge to current employers, or to add a (new) section on your resume. Historically, many students have reported back the positive interactions encountered when discussing their projects, sometimes leading to job offers or promotions. However, please remember not to publicly post your report or code. The project is a good talking point and you would be within the bounds of the GT Honor Code if you were to share it privately with a potential employer (if you so desire); however, making any part of this project publicly available would be a violation of the GT Honor Code.
We encourage you to start early and dive head-first into the project to try as many options as possible. We strongly believe the more successes and failures you experience, the greater your growth and learning will be.
The teaching staff is dedicated to helping as much as possible. We are excited to see how you will approach the problem and have many resources available to help. Over the next several Office Hours, we will be discussing various approaches in detail, as well as diving deeper into approaches on Ed Discussion. We are here to help you and want to see you succeed! With all that said:
Good luck and happy coding!
