Reinforcement Learning with PyTorch and Unity ML-Agents

By Min Kyu-sik et al. | Wikibooks
  • Registered: 2022-09-26
  • File format: pdf
  • File size: 32 MB
  • Supported devices: iPhone, iPad, Android, tablet, PC
  • Rating: not yet rated

About the Book

Unity ML-Agents is a welcome tool that turns simulation environments built with the Unity game engine into environments for reinforcement learning. However, reference material on ML-Agents is still scarce, especially for versions after ML-Agents 2.0, which has made ML-Agents difficult to use. This book covers the range of topics needed to use Unity ML-Agents, including Unity, ML-Agents, and deep reinforcement learning. It is also a revised edition of "Reinforcement Learning with TensorFlow and Unity ML-Agents", published in 2020, and covers the latest version of ML-Agents.

About the Author

The author earned a Ph.D. from the Department of Automotive Engineering at Hanyang University and currently works as an AI engineer at Kakao. The author is an organizer of Reinforcement Learning Korea, a Facebook group on reinforcement learning, and was a member of the 3rd through 5th cohorts of Unity Masters, a group of Unity experts certified by Unity Korea.

Table of Contents

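▣ Chapter 1: Overview of Reinforcement Learning

1.1 What Is Reinforcement Learning?

___1.1.1 What Is Machine Learning?

___1.1.2 Achievements of Reinforcement Learning

1.2 Basic Terminology of Reinforcement Learning

1.3 Basic Theory of Reinforcement Learning

___1.3.1 The Bellman Equation

___1.3.2 Exploration and Exploitation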



▣ Chapter 2: A Look at Unity ML-Agents

2.1 Unity and ML-Agents

___2.1.1 Unity

___2.1.2 ML-Agents

2.2 Installing Unity and Basic Operation

___2.2.1 Downloading and Installing Unity Hub

___2.2.2 Activating a Unity License

___2.2.3 Installing the Unity Editor

___2.2.4 Creating a Unity Project

___2.2.5 The Unity Interface

___2.2.6 Basic Unity Operations

2.3 Installing ML-Agents

___2.3.1 Downloading the ML-Agents Files

___2.3.2 Installing ML-Agents in Unity

___2.3.3 Installing the ML-Agents Python Packages

2.4 Components of ML-Agents

___2.4.1 Behavior Parameters

___2.4.2 Agent Script

___2.4.3 Decision Requester, Model Overrider

___2.4.4 Building the Environment

2.5 Using ML-Agents with mlagents-learn

___2.5.1 Reinforcement Learning Algorithms Provided by ML-Agents

___2.5.2 Training Methods Provided by ML-Agents

___2.5.3 Training the 3DBall Environment with the PPO Algorithm

2.6 Using ML-Agents with the Python-API

___2.6.1 Random Agent Control via the Python-API



▣ Chapter 3: Building the GridWorld Environment

3.1 Starting the Project

3.2 The GridWorld Scripts Explained

3.3 Adding Vector Observations and Building the Environment

3.4 Extra: Optimizing the Code



▣ Chapter 4: Deep Q Network (DQN)

4.1 Background of the DQN Algorithm

___4.1.1 Value-Based Reinforcement Learning

___4.1.2 Overview of the DQN Algorithm

4.2 Techniques of the DQN Algorithm

___4.2.1 Experience Replay

___4.2.2 Target Network

4.3 DQN Training

4.4 DQN Code

___4.4.1 Importing Libraries and Setting Parameter Values

___4.4.2 Model Class

___4.4.3 Agent Class

___4.4.4 Main Function

___4.4.5 Training Results



▣ Chapter 5: Advantage Actor Critic (A2C)

5.1 Overview of the A2C Algorithm

5.2 Structure of the Actor-Critic Network

5.3 Training Process of the A2C Algorithm

5.4 The Overall A2C Training Process

5.5 A2C Code

___5.5.1 Importing Libraries and Setting Parameter Values

___5.5.2 Model Class

___5.5.3 Agent Class

___5.5.4 Main Function

___5.5.5 Training Results



▣ Chapter 6: Building the Drone Environment

6.1 Starting the Project

6.2 Importing the Drone Asset & Adding Objects

___6.2.1 Downloading the Drone Asset from the Asset Store

___6.2.2 Creating the Drone Environment

6.3 The Scripts Explained

___6.3.1 DroneSetting Script

___6.3.2 DroneAgent Script

6.4 Running and Building the Drone Environment



▣ Chapter 7: Deep Deterministic Policy Gradient (DDPG)

7.1 Overview of the DDPG Algorithm

7.2 Techniques of the DDPG Algorithm

___7.2.1 Experience Replay

___7.2.2 Target Network

___7.2.3 Soft Target Update

___7.2.4 OU Noise (Ornstein-Uhlenbeck Noise)

7.3 DDPG Training

___7.3.1 Updating the Critic Network

___7.3.2 Updating the Actor Network

7.4 DDPG Code

___7.4.1 Importing Libraries and Setting Parameter Values

___7.4.2 OU Noise Class

___7.4.3 Actor Class

___7.4.4 Critic Class

___7.4.5 Agent Class

___7.4.6 Main Function

___7.4.7 Training Results



▣ Chapter 8: Building the Kart Racing Environment

8.1 Starting the Project

8.2 Setting Up the Kart Racing Environment

8.3 Writing the Scripts and Building



▣ Chapter 9: Behavioral Cloning (BC)

9.1 Overview of the Behavioral Cloning Algorithm

9.2 Techniques of the Behavioral Cloning Algorithm

___9.2.1 Excluding Data with Negative Rewards

9.3 Behavioral Cloning Training

9.4 Behavioral Cloning Algorithm Code

___9.4.1 Importing Libraries and Setting Parameter Values

___9.4.2 Model Class

___9.4.3 Agent Class

___9.4.4 Main Function

___9.4.5 Training Results

9.5 Using the Built-in Imitation Learning in ML-Agents

___9.5.1 The Behavioral Cloning Algorithm Provided by ML-Agents

___9.5.2 The GAIL Algorithm Provided by ML-Agents

___9.5.3 Configuring the Config File for Imitation Learning

___9.5.4 Imitation Learning Results in ML-Agents



▣ Chapter 10: Wrapping Up

10.1 Summary of the Basics Volume

10.2 Further Learning Resources

___10.2.1 Unity

___10.2.2 Unity ML-Agents

___10.2.3 Reinforcement Learning

10.3 What the Applications Volume Will Cover
