Revisiting the Game 2048, AI Methods, Origins and Endings (7): A Python Implementation of the TDL Algorithm for 2048
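Play 2048 online:
The question of how to solve 2048 goes back to a discussion thread on an overseas site. That thread was the earliest discussion of how to solve the game, the "origin" so to speak: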
What is the optimal algorithm for the game 2048?
I have already written about this game in earlier posts:
Revisiting the Game 2048, AI Methods, Origins and Endings (1): Running the Game Automatically in Firefox
===========================================
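I found a C++ implementation of the TDL approach to 2048 online: https://github.com/moporgic/TDL2048-Demo
This post describes a Python rewrite of that implementation, that is, the TDL algorithm re-implemented in Python.
The Python code is at: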
https://gitee.com/devilmaycry812839668/tdl2048-python-demo
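The Python code is a rewrite of a community member's TDL solution for 2048, and its performance cannot compete with the original C++ implementation. The value of this repository lies in demonstrating the code logic, not in its runtime performance. The TDL algorithm used here follows the papers listed in the README, but note that it is not a strict implementation: only part of the algorithm described in the papers is implemented.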
=============================================
Results of running the original C++ implementation (https://github.com/moporgic/TDL2048-Demo):
================================================
The code rewritten in Python, following the logic and parameter settings of the C++ version, is at:
https://gitee.com/devilmaycry812839668/tdl2048-python-demo
Results:
Run 1:
90000 mean = 51467 max = 171436
128 100.0% ( 0.6%)
256 99.4% ( 0.5%)
512 98.9% ( 2.2%)
1024 96.7% (17.5%)
2048 79.2% (33.6%)
4096 45.6% (39.5%)
8192 6.1% ( 6.1%)
91000 mean = 52003 max = 175620
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 0.5%)
512 99.3% ( 3.0%)
1024 96.3% (16.7%)
2048 79.6% (33.3%)
4096 46.3% (39.7%)
8192 6.6% ( 6.6%)
92000 mean = 51713 max = 179520
64 100.0% ( 0.1%)
256 99.9% ( 0.6%)
512 99.3% ( 3.4%)
1024 95.9% (16.3%)
2048 79.6% (33.0%)
4096 46.6% (40.3%)
8192 6.3% ( 6.3%)
93000 mean = 51767 max = 184252
64 100.0% ( 0.1%)
128 99.9% ( 0.3%)
256 99.6% ( 1.1%)
512 98.5% ( 3.6%)
1024 94.9% (16.2%)
2048 78.7% (32.7%)
4096 46.0% (39.2%)
8192 6.8% ( 6.8%)
94000 mean = 52040 max = 177072
64 100.0% ( 0.4%)
128 99.6% ( 0.3%)
256 99.3% ( 1.4%)
512 97.9% ( 4.0%)
1024 93.9% (16.9%)
2048 77.0% (30.6%)
4096 46.4% (38.7%)
8192 7.7% ( 7.7%)
95000 mean = 53317 max = 177396
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.1%)
256 99.6% ( 0.5%)
512 99.1% ( 1.8%)
1024 97.3% (14.8%)
2048 82.5% (33.8%)
4096 48.7% (41.6%)
8192 7.1% ( 7.1%)
96000 mean = 55690 max = 174236
256 100.0% ( 0.4%)
512 99.6% ( 1.8%)
1024 97.8% (12.1%)
2048 85.7% (34.7%)
4096 51.0% (43.7%)
8192 7.3% ( 7.3%)
97000 mean = 52997 max = 177124
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 1.0%)
512 98.8% ( 3.0%)
1024 95.8% (16.9%)
2048 78.9% (31.9%)
4096 47.0% (39.5%)
8192 7.5% ( 7.5%)
98000 mean = 52491 max = 178280
128 100.0% ( 0.6%)
256 99.4% ( 1.1%)
512 98.3% ( 2.8%)
1024 95.5% (15.3%)
2048 80.2% (33.1%)
4096 47.1% (40.0%)
8192 7.1% ( 7.1%)
99000 mean = 52579 max = 231680
128 100.0% ( 0.2%)
256 99.8% ( 1.5%)
512 98.3% ( 1.9%)
1024 96.4% (17.0%)
2048 79.4% (32.3%)
4096 47.1% (39.8%)
8192 7.3% ( 7.2%)
16384 0.1% ( 0.1%)
100000 mean = 54323 max = 176672
64 100.0% ( 0.2%)
256 99.8% ( 0.4%)
512 99.4% ( 2.1%)
1024 97.3% (15.5%)
2048 81.8% (31.6%)
4096 50.2% (42.8%)
8192 7.4% ( 7.4%)
real 299m7.984s
user 299m2.076s
sys 0m4.188s
Run 2:
90000 mean = 51416 max = 169020
128 100.0% ( 0.3%)
256 99.7% ( 1.0%)
512 98.7% ( 3.9%)
1024 94.8% (17.8%)
2048 77.0% (30.5%)
4096 46.5% (39.9%)
8192 6.6% ( 6.6%)
91000 mean = 52111 max = 176432
32 100.0% ( 0.1%)
256 99.9% ( 0.7%)
512 99.2% ( 2.8%)
1024 96.4% (14.9%)
2048 81.5% (33.0%)
4096 48.5% (43.0%)
8192 5.5% ( 5.5%)
92000 mean = 52641 max = 175848
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.7%)
512 99.0% ( 2.9%)
1024 96.1% (15.2%)
2048 80.9% (34.0%)
4096 46.9% (40.4%)
8192 6.5% ( 6.5%)
93000 mean = 53224 max = 177336
64 100.0% ( 0.1%)
128 99.9% ( 0.3%)
256 99.6% ( 0.9%)
512 98.7% ( 3.0%)
1024 95.7% (14.4%)
2048 81.3% (32.5%)
4096 48.8% (41.6%)
8192 7.2% ( 7.2%)
94000 mean = 53501 max = 181752
128 100.0% ( 0.2%)
256 99.8% ( 0.6%)
512 99.2% ( 2.3%)
1024 96.9% (15.5%)
2048 81.4% (32.1%)
4096 49.3% (42.9%)
8192 6.4% ( 6.4%)
95000 mean = 54450 max = 173708
64 100.0% ( 0.3%)
128 99.7% ( 0.2%)
256 99.5% ( 0.5%)
512 99.0% ( 2.8%)
1024 96.2% (15.3%)
2048 80.9% (29.9%)
4096 51.0% (43.8%)
8192 7.2% ( 7.2%)
96000 mean = 55262 max = 226556
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.2%)
256 99.5% ( 0.4%)
512 99.1% ( 2.6%)
1024 96.5% (15.6%)
2048 80.9% (30.3%)
4096 50.6% (42.3%)
8192 8.3% ( 8.2%)
16384 0.1% ( 0.1%)
97000 mean = 53725 max = 177588
128 100.0% ( 0.1%)
256 99.9% ( 0.1%)
512 99.8% ( 3.4%)
1024 96.4% (14.9%)
2048 81.5% (32.4%)
4096 49.1% (42.0%)
8192 7.1% ( 7.1%)
98000 mean = 54402 max = 176952
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 1.0%)
512 98.7% ( 2.4%)
1024 96.3% (15.6%)
2048 80.7% (31.6%)
4096 49.1% (40.9%)
8192 8.2% ( 8.2%)
99000 mean = 55552 max = 176284
64 100.0% ( 0.2%)
128 99.8% ( 0.2%)
256 99.6% ( 0.9%)
512 98.7% ( 2.2%)
1024 96.5% (14.7%)
2048 81.8% (29.5%)
4096 52.3% (44.5%)
8192 7.8% ( 7.8%)
100000 mean = 55576 max = 174112
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.6%)
512 99.1% ( 1.4%)
1024 97.7% (15.5%)
2048 82.2% (31.2%)
4096 51.0% (43.1%)
8192 7.9% ( 7.9%)
real 306m48.606s
user 306m42.380s
sys 0m4.352s
Run 3:
90000 mean = 48991 max = 158096
parameter mean = 1.5322926465899345 min = -12611.286509352407 max = 5502.92779377629
128 100.0% ( 0.1%)
256 99.9% ( 0.9%)
512 99.0% ( 4.1%)
1024 94.9% (15.4%)
2048 79.5% (36.0%)
4096 43.5% (38.2%)
8192 5.3% ( 5.3%)
91000 mean = 49475 max = 175724
parameter mean = 1.5583559250936798 min = -12611.286509352407 max = 5561.9897442909605
128 100.0% ( 0.2%)
256 99.8% ( 1.0%)
512 98.8% ( 3.7%)
1024 95.1% (15.7%)
2048 79.4% (34.6%)
4096 44.8% (40.2%)
8192 4.6% ( 4.6%)
92000 mean = 47784 max = 162568
parameter mean = 1.5773082791168676 min = -12611.286509352407 max = 5678.766629401056
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.2%)
256 99.5% ( 1.3%)
512 98.2% ( 3.5%)
1024 94.7% (18.3%)
2048 76.4% (36.3%)
4096 40.1% (34.3%)
8192 5.8% ( 5.8%)
93000 mean = 50350 max = 156552
parameter mean = 1.6007116787981626 min = -12611.286509352407 max = 5777.117118806266
128 100.0% ( 0.1%)
256 99.9% ( 0.9%)
512 99.0% ( 3.8%)
1024 95.2% (16.1%)
2048 79.1% (33.4%)
4096 45.7% (41.2%)
8192 4.5% ( 4.5%)
94000 mean = 50930 max = 162044
parameter mean = 1.624377595843311 min = -12611.286509352407 max = 5785.769207357467
128 100.0% ( 0.1%)
256 99.9% ( 0.8%)
512 99.1% ( 2.6%)
1024 96.5% (15.9%)
2048 80.6% (35.1%)
4096 45.5% (39.8%)
8192 5.7% ( 5.7%)
95000 mean = 51721 max = 176832
parameter mean = 1.644080028640483 min = -12611.286509352407 max = 5738.999075391805
128 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 3.3%)
1024 96.0% (15.3%)
2048 80.7% (33.7%)
4096 47.0% (39.5%)
8192 7.5% ( 7.5%)
96000 mean = 51399 max = 163788
parameter mean = 1.6663415442828855 min = -12611.286509352407 max = 5845.513382853591
128 100.0% ( 0.3%)
256 99.7% ( 0.5%)
512 99.2% ( 2.5%)
1024 96.7% (16.1%)
2048 80.6% (34.8%)
4096 45.8% (39.7%)
8192 6.1% ( 6.1%)
97000 mean = 51230 max = 166544
parameter mean = 1.686732530926021 min = -12611.286509352407 max = 5774.039565288521
128 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 2.9%)
1024 96.4% (15.1%)
2048 81.3% (34.7%)
4096 46.6% (40.9%)
8192 5.7% ( 5.7%)
98000 mean = 52740 max = 160900
parameter mean = 1.7095415726931256 min = -12563.946177760263 max = 5843.970362728986
128 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 2.8%)
1024 96.5% (13.5%)
2048 83.0% (33.7%)
4096 49.3% (44.0%)
8192 5.3% ( 5.3%)
99000 mean = 52669 max = 169220
parameter mean = 1.7307740083579728 min = -12476.152646821502 max = 5989.542236493472
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.5%)
512 99.2% ( 3.6%)
1024 95.6% (16.2%)
2048 79.4% (30.9%)
4096 48.5% (41.7%)
8192 6.8% ( 6.8%)
100000 mean = 50804 max = 177192
parameter mean = 1.7455929446279508 min = -12425.416359450286 max = 5934.462230068022
128 100.0% ( 0.2%)
256 99.8% ( 0.8%)
512 99.0% ( 3.2%)
1024 95.8% (16.5%)
2048 79.3% (33.0%)
4096 46.3% (41.1%)
8192 5.2% ( 5.2%)
real 296m31.332s
user 296m26.503s
sys 0m3.752s
Run 4:
90000 mean = 52461 max = 173552
parameter mean = 1.6053481665698048 min = -7208.176952683409 max = 6427.410230296034
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 0.7%)
512 99.1% ( 2.3%)
1024 96.8% (15.1%)
2048 81.7% (33.0%)
4096 48.7% (42.5%)
8192 6.2% ( 6.2%)
91000 mean = 55241 max = 178784
parameter mean = 1.6274640457672958 min = -7259.0587871490425 max = 6477.305533296316
256 100.0% ( 0.5%)
512 99.5% ( 2.6%)
1024 96.9% (13.5%)
2048 83.4% (33.3%)
4096 50.1% (42.8%)
8192 7.3% ( 7.3%)
92000 mean = 52429 max = 177220
parameter mean = 1.6318069277652643 min = -7355.596804147162 max = 6134.214002995824
64 100.0% ( 0.1%)
128 99.9% ( 0.3%)
256 99.6% ( 0.8%)
512 98.8% ( 3.4%)
1024 95.4% (13.0%)
2048 82.4% (34.8%)
4096 47.6% (41.7%)
8192 5.9% ( 5.9%)
93000 mean = 53196 max = 176908
parameter mean = 1.6571834176438884 min = -7395.123183171931 max = 6442.175016066894
128 100.0% ( 0.1%)
256 99.9% ( 0.4%)
512 99.5% ( 3.5%)
1024 96.0% (13.4%)
2048 82.6% (32.6%)
4096 50.0% (44.1%)
8192 5.9% ( 5.9%)
94000 mean = 54382 max = 173860
parameter mean = 1.675727360442215 min = -7541.959140000994 max = 6517.354366059809
64 100.0% ( 0.1%)
128 99.9% ( 0.1%)
256 99.8% ( 1.3%)
512 98.5% ( 2.7%)
1024 95.8% (12.9%)
2048 82.9% (33.4%)
4096 49.5% (41.6%)
8192 7.9% ( 7.9%)
95000 mean = 43978 max = 154944
parameter mean = 1.6698778814922732 min = -7572.789701277566 max = 6304.535102266191
64 100.0% ( 0.1%)
128 99.9% ( 0.9%)
256 99.0% ( 1.9%)
512 97.1% ( 7.9%)
1024 89.2% (19.2%)
2048 70.0% (32.4%)
4096 37.6% (32.7%)
8192 4.9% ( 4.9%)
96000 mean = 54040 max = 160380
parameter mean = 1.6912278785732446 min = -7719.388457403322 max = 6510.281642123573
128 100.0% ( 0.3%)
256 99.7% ( 0.4%)
512 99.3% ( 3.2%)
1024 96.1% (12.8%)
2048 83.3% (32.1%)
4096 51.2% (44.6%)
8192 6.6% ( 6.6%)
97000 mean = 54919 max = 168684
parameter mean = 1.707997573418896 min = -7894.983875587813 max = 6280.380637503087
64 100.0% ( 0.1%)
128 99.9% ( 0.2%)
256 99.7% ( 0.8%)
512 98.9% ( 2.5%)
1024 96.4% (12.7%)
2048 83.7% (34.6%)
4096 49.1% (40.5%)
8192 8.6% ( 8.6%)
98000 mean = 54004 max = 177120
parameter mean = 1.7254613897551536 min = -7987.297183766859 max = 6452.3736882617795
64 100.0% ( 0.1%)
128 99.9% ( 0.5%)
256 99.4% ( 0.6%)
512 98.8% ( 2.6%)
1024 96.2% (13.8%)
2048 82.4% (31.6%)
4096 50.8% (43.0%)
8192 7.8% ( 7.8%)
99000 mean = 57217 max = 177216
parameter mean = 1.7490571072457777 min = -8006.657356429137 max = 6541.148384830181
128 100.0% ( 0.1%)
256 99.9% ( 0.6%)
512 99.3% ( 1.5%)
1024 97.8% (12.0%)
2048 85.8% (31.9%)
4096 53.9% (46.0%)
8192 7.9% ( 7.9%)
100000 mean = 54750 max = 176264
parameter mean = 1.764116240567329 min = -8017.693119600374 max = 6523.890065745666
256 100.0% ( 0.9%)
512 99.1% ( 1.5%)
1024 97.6% (15.3%)
2048 82.3% (32.6%)
4096 49.7% (41.9%)
8192 7.8% ( 7.8%)
real 316m59.817s
user 316m55.019s
sys 0m3.848s
Run 5:
90000 mean = 52291 max = 179004
parameter mean = 1.577609627851082 min = -6410.535830712946 max = 6268.392610115872
32 100.0% ( 0.1%)
64 99.9% ( 0.2%)
128 99.7% ( 0.2%)
256 99.5% ( 0.5%)
512 99.0% ( 2.5%)
1024 96.5% (16.7%)
2048 79.8% (33.5%)
4096 46.3% (39.2%)
8192 7.1% ( 7.1%)
91000 mean = 50069 max = 177416
parameter mean = 1.58602477127092 min = -6416.8667830648565 max = 6116.279207138224
32 100.0% ( 0.1%)
64 99.9% ( 0.1%)
128 99.8% ( 0.2%)
256 99.6% ( 1.3%)
512 98.3% ( 3.4%)
1024 94.9% (15.7%)
2048 79.2% (34.8%)
4096 44.4% (37.9%)
8192 6.5% ( 6.5%)
92000 mean = 51593 max = 173324
parameter mean = 1.607900775327343 min = -6564.512670668733 max = 6032.862918894087
128 100.0% ( 0.3%)
256 99.7% ( 1.0%)
512 98.7% ( 3.0%)
1024 95.7% (18.2%)
2048 77.5% (32.4%)
4096 45.1% (37.3%)
8192 7.8% ( 7.8%)
93000 mean = 51460 max = 176452
parameter mean = 1.6219107678292823 min = -6678.20840805842 max = 6023.411176000316
128 100.0% ( 0.4%)
256 99.6% ( 0.7%)
512 98.9% ( 3.6%)
1024 95.3% (15.1%)
2048 80.2% (34.0%)
4096 46.2% (39.6%)
8192 6.6% ( 6.6%)
94000 mean = 52758 max = 173096
parameter mean = 1.6468634622310205 min = -6747.213382051944 max = 6295.77740079765
64 100.0% ( 0.2%)
256 99.8% ( 0.4%)
512 99.4% ( 3.2%)
1024 96.2% (15.7%)
2048 80.5% (32.1%)
4096 48.4% (41.9%)
8192 6.5% ( 6.5%)
95000 mean = 47877 max = 182596
parameter mean = 1.6551993956598714 min = -6748.069998331208 max = 6276.521022738931
128 100.0% ( 0.7%)
256 99.3% ( 0.7%)
512 98.6% ( 5.7%)
1024 92.9% (18.3%)
2048 74.6% (32.0%)
4096 42.6% (37.5%)
8192 5.1% ( 5.1%)
96000 mean = 52882 max = 182724
parameter mean = 1.6750486373931794 min = -6784.023377349516 max = 6130.883932820633
256 100.0% ( 0.6%)
512 99.4% ( 1.8%)
1024 97.6% (16.6%)
2048 81.0% (34.6%)
4096 46.4% (38.8%)
8192 7.6% ( 7.6%)
97000 mean = 52465 max = 181272
parameter mean = 1.6954002504581211 min = -6944.510544426434 max = 6366.379778545424
128 100.0% ( 0.4%)
256 99.6% ( 0.8%)
512 98.8% ( 3.4%)
1024 95.4% (13.9%)
2048 81.5% (33.9%)
4096 47.6% (40.7%)
8192 6.9% ( 6.9%)
98000 mean = 54046 max = 180240
parameter mean = 1.7180765658270194 min = -6851.223610376787 max = 6475.389397704871
128 100.0% ( 0.1%)
256 99.9% ( 0.1%)
512 99.8% ( 2.8%)
1024 97.0% (17.3%)
2048 79.7% (29.2%)
4096 50.5% (43.0%)
8192 7.5% ( 7.5%)
99000 mean = 54522 max = 178304
parameter mean = 1.7391720879705754 min = -6844.314693502977 max = 6668.951758086201
256 100.0% ( 0.3%)
512 99.7% ( 2.7%)
1024 97.0% (15.0%)
2048 82.0% (31.6%)
4096 50.4% (42.5%)
8192 7.9% ( 7.9%)
100000 mean = 53481 max = 171572
parameter mean = 1.756473951551881 min = -6983.806168335259 max = 6474.6341022093275
64 100.0% ( 0.1%)
256 99.9% ( 1.0%)
512 98.9% ( 3.1%)
1024 95.8% (15.8%)
2048 80.0% (34.1%)
4096 45.9% (37.2%)
8192 8.7% ( 8.7%)
real 299m16.055s
user 299m11.208s
sys 0m3.960s
Run 6:
90000 mean = 52543 max = 174820
parameter mean = 1.6059263026057993 min = -6550.597576404195 max = 6261.313263951085
64 100.0% ( 0.2%)
128 99.8% ( 0.2%)
256 99.6% ( 0.5%)
512 99.1% ( 2.7%)
1024 96.4% (16.3%)
2048 80.1% (31.8%)
4096 48.3% (42.5%)
8192 5.8% ( 5.8%)
91000 mean = 55358 max = 175220
parameter mean = 1.6251463892495772 min = -6702.8250139300135 max = 6310.60832493676
256 100.0% ( 0.7%)
512 99.3% ( 2.7%)
1024 96.6% (14.0%)
2048 82.6% (31.7%)
4096 50.9% (42.6%)
8192 8.3% ( 8.3%)
92000 mean = 56273 max = 236784
parameter mean = 1.6486751777321125 min = -6906.108255742565 max = 6371.227224105752
128 100.0% ( 0.2%)
256 99.8% ( 0.3%)
512 99.5% ( 2.3%)
1024 97.2% (14.0%)
2048 83.2% (30.0%)
4096 53.2% (45.4%)
8192 7.8% ( 7.7%)
16384 0.1% ( 0.1%)
93000 mean = 53268 max = 220724
parameter mean = 1.660701492927984 min = -7052.671176869955 max = 6368.542040877368
128 100.0% ( 0.2%)
256 99.8% ( 0.8%)
512 99.0% ( 4.3%)
1024 94.7% (14.4%)
2048 80.3% (30.3%)
4096 50.0% (43.9%)
8192 6.1% ( 6.0%)
16384 0.1% ( 0.1%)
94000 mean = 53880 max = 174016
parameter mean = 1.6784254950564135 min = -7083.17129906737 max = 6476.117469214218
64 100.0% ( 0.2%)
128 99.8% ( 0.1%)
256 99.7% ( 0.5%)
512 99.2% ( 3.1%)
1024 96.1% (14.7%)
2048 81.4% (31.7%)
4096 49.7% (42.0%)
8192 7.7% ( 7.7%)
95000 mean = 54367 max = 183256
parameter mean = 1.6994809013610876 min = -7219.06370794949 max = 6534.607678601115
64 100.0% ( 0.2%)
256 99.8% ( 0.5%)
512 99.3% ( 2.3%)
1024 97.0% (16.6%)
2048 80.4% (30.1%)
4096 50.3% (41.9%)
8192 8.4% ( 8.4%)
96000 mean = 54792 max = 176636
parameter mean = 1.7165451311948388 min = -7323.116142364778 max = 6542.238758733516
64 100.0% ( 0.1%)
256 99.9% ( 0.8%)
512 99.1% ( 2.2%)
1024 96.9% (14.1%)
2048 82.8% (33.2%)
4096 49.6% (40.9%)
8192 8.7% ( 8.7%)
97000 mean = 47666 max = 177616
parameter mean = 1.716929942156352 min = -7406.573279954264 max = 6257.871826491511
64 100.0% ( 0.4%)
128 99.6% ( 0.3%)
256 99.3% ( 1.8%)
512 97.5% ( 3.7%)
1024 93.8% (17.9%)
2048 75.9% (35.4%)
4096 40.5% (35.5%)
8192 5.0% ( 5.0%)
98000 mean = 54499 max = 177628
parameter mean = 1.743440598860856 min = -7603.90460458944 max = 6365.018691973732
128 100.0% ( 0.3%)
256 99.7% ( 0.7%)
512 99.0% ( 3.1%)
1024 95.9% (14.2%)
2048 81.7% (32.1%)
4096 49.6% (41.2%)
8192 8.4% ( 8.4%)
99000 mean = 55059 max = 178140
parameter mean = 1.7682309894805486 min = -7638.215075618973 max = 6565.107432410588
32 100.0% ( 0.1%)
64 99.9% ( 0.1%)
128 99.8% ( 0.2%)
256 99.6% ( 0.5%)
512 99.1% ( 2.3%)
1024 96.8% (16.2%)
2048 80.6% (30.2%)
4096 50.4% (42.4%)
8192 8.0% ( 8.0%)
100000 mean = 55167 max = 169020
parameter mean = 1.7892723967535384 min = -7792.923437823282 max = 6686.816646242725
256 100.0% ( 0.1%)
512 99.9% ( 2.8%)
1024 97.1% (15.3%)
2048 81.8% (31.8%)
4096 50.0% (42.3%)
8192 7.7% ( 7.7%)
real 319m53.779s
user 319m48.061s
sys 0m4.412s
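On the statistics of the n-tuple network parameters: after 100000 training episodes the mean of the parameters has not changed much, but the maximum and minimum are in the positive and negative thousands, so the gap between the largest and smallest parameter is very large. I have previously reproduced another TDL implementation:
Revisiting the Game 2048, AI Methods, Origins and Endings (5): The First Reinforcement Learning Method for 2048, "Temporal Difference Learning of N-Tuple Networks for the Game 2048"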
Code: https://gitee.com/devilmaycry812839668/td-tuple-net-for-2048
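That earlier reproduction performed poorly largely because its parameters overflowed and underflowed. This reproduction still shows a large gap between the maximum and minimum parameter values, but since an episode of this game is long (thousands or even tens of thousands of steps), that is normal. As for why the parameters overflowed before but stay within a controllable range this time, my view is that the main improvement is the following: in this reproduction each feature is rotated and mirrored into 8 isomorphic features, and all 8 share a single lookup table. As a result, every update is spread across the lookup table instead of being concentrated in a few entries. This avoids the situation early in training where similar board states occur frequently and cause certain lookup table entries to be updated too many times, and therefore avoids overflow of the values in the lookup table. In other words, turning one feature into 8 isomorphic features that share one lookup table is the key to making the TDL algorithm work.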
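The idea of 8 isomorphic features sharing one lookup table can be sketched as follows. This is a minimal illustration, not the code from either repository: the names `rotate`, `mirror`, `isomorphisms`, and `SharedPattern` are hypothetical, the board is assumed to be a row-major list of 16 tile exponents, the table is a sparse `dict` rather than a preallocated array, and splitting each correction evenly over the 8 views is an assumption.

```python
def rotate(idx):
    """Cell index after rotating the 4x4 board 90 degrees clockwise."""
    row, col = divmod(idx, 4)
    return col * 4 + (3 - row)


def mirror(idx):
    """Cell index after flipping the 4x4 board left to right."""
    row, col = divmod(idx, 4)
    return row * 4 + (3 - col)


def isomorphisms(pattern):
    """Return the 8 rotated/mirrored variants of a tuple of cell indices."""
    variants = []
    for flipped in (False, True):
        cells = [mirror(i) for i in pattern] if flipped else list(pattern)
        for _ in range(4):
            variants.append(tuple(cells))
            cells = [rotate(i) for i in cells]
    return variants


class SharedPattern:
    """One n-tuple feature: 8 isomorphic views, a single shared weight table."""

    def __init__(self, pattern):
        self.isos = isomorphisms(pattern)
        self.weights = {}  # sparse lookup table: packed index -> weight

    def index(self, board, cells):
        # Pack the tile exponents (0..15) under the cells into one integer key.
        key = 0
        for shift, cell in enumerate(cells):
            key |= board[cell] << (4 * shift)
        return key

    def estimate(self, board):
        # Sum the shared table's entries over all 8 views of the board.
        return sum(self.weights.get(self.index(board, c), 0.0) for c in self.isos)

    def update(self, board, delta):
        # Assumption: the correction is split evenly over the 8 views.
        share = delta / len(self.isos)
        for cells in self.isos:
            key = self.index(board, cells)
            self.weights[key] = self.weights.get(key, 0.0) + share


pattern = SharedPattern((0, 1, 2, 3))  # hypothetical 4-tuple: the top row
board = list(range(16))                # tile exponents, row-major
pattern.update(board, 8.0)
print(len(pattern.weights), pattern.estimate(board))
```

For an asymmetric board like this one, a single update touches eight different entries of the same table, each receiving one eighth of the correction, which is the spreading effect described above.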
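The TDL algorithm here is really just TD(0), or it can be viewed as an algorithm of the same form as Q-learning. In 2048 the state is represented as a flat list of integers such as [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], and state features of this kind are hard to use efficiently with neural networks. For 2048, therefore, the state-of-the-art approach is to use an N-Tuple Network as the value representation of the game state.
Note:
For details of the TDL algorithm for 2048, see the paper "Temporal Difference Learning of N-Tuple Networks for the Game 2048".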
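A TD(0) update over an n-tuple value function can be sketched as below. This is a simplified illustration under stated assumptions, not the repository's code: `NTupleValue` and `td0_backward` are hypothetical names, the value is learned on afterstates (the board right after a move, before the random tile appears), each tuple has its own sparse table, the episode is replayed backward after the game ends, and `alpha=0.1` is an arbitrary learning rate.

```python
class NTupleValue:
    """Value function V(board) as a sum of lookup-table entries, one per tuple."""

    def __init__(self, patterns):
        self.patterns = [tuple(p) for p in patterns]
        self.tables = [{} for _ in self.patterns]  # sparse tables: key -> weight

    def _key(self, board, pattern):
        # The tile exponents under the tuple's cells identify the table entry.
        return tuple(board[i] for i in pattern)

    def estimate(self, board):
        return sum(
            table.get(self._key(board, pattern), 0.0)
            for pattern, table in zip(self.patterns, self.tables)
        )

    def update(self, board, delta):
        # Split the correction evenly over the tuples (an assumption).
        share = delta / len(self.patterns)
        for pattern, table in zip(self.patterns, self.tables):
            key = self._key(board, pattern)
            table[key] = table.get(key, 0.0) + share


def td0_backward(value, trajectory, alpha=0.1):
    """TD(0) on one finished episode.

    trajectory: list of (afterstate, reward) in play order, where reward is
    the score gained by the move that produced that afterstate.
    """
    target = 0.0  # the value after the final afterstate is taken to be 0
    for after, reward in reversed(trajectory):
        error = target - value.estimate(after)       # TD error
        value.update(after, alpha * error)           # V(s) += alpha * error
        target = reward + value.estimate(after)      # target for the earlier step


# Hypothetical feature set: the four rows of the board as 4-tuples.
rows = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15)]
value = NTupleValue(rows)
first = [1, 0, 0, 0] + [0] * 12    # afterstate containing a single 2-tile
second = [2, 0, 0, 0] + [0] * 12   # afterstate after merging into a 4-tile
td0_backward(value, [(first, 0.0), (second, 4.0)])
print(value.estimate(first))
```

Each afterstate is moved toward the reward of the next move plus the estimated value of the next afterstate, which is the one-step bootstrapped target that makes this TD(0).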
=====================================================
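Judging from the results of the C++ and Python versions, this implementation is quite successful: the algorithm logic and parameter settings are kept consistent, and the results are comparable. The only difference is the total running time. The original C++ version takes 60 minutes, that is, one hour, while the Python version here takes about 300 minutes, that is, five hours, so the total running time is 5 times longer. Given the nature of Python, this running time is entirely acceptable. Although the C++ version needs only one fifth of the Python version's time, the C++ code is hard to understand; I can read C++ code but find it hard to write fluently. The Python version is hardly usable for real applications, but it still has some reference value: it is the only Python implementation of the TDL algorithm for 2048 that I have been able to find online so far, and that absence is exactly why I wrote it in Python myself.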
=========================================