Polaris 2.0 Defeats Stoxpoker Team in Man-Machine Poker Championship Rematch
Following last summer's narrow loss to the humans in the First Man-Machine Poker Championship held in Vancouver, Canada, the University of Alberta's Computer Poker Research Group (CPRG) had a year to improve on their collection of poker-playing programs, collectively dubbed Polaris, for this summer's rematch. Their efforts were rewarded, as Polaris 2.0 defeated a team of human competitors in a series of duplicate limit hold'em matches.
The competition primarily took place at the Rio All-Suite Casino Hotel in Las Vegas, Nevada during the Gaming Life Expo at the World Series of Poker, with one match taking place each day from July 3-6. The human team was made up of members of the Stoxpoker poker coaching website. Unlike last summer, when the humans were represented by just two competitors, Phil Laak and Ali Eslami, this time seven players took part: Nick Grudzien, Kyle Hendon, Rich McRoberts, Victor Acosta, Mark Newhouse, IJay Palansky, and Matt Hawrilenko.
As was the case last summer, the competition consisted of four matches in which two human competitors simultaneously (and separately) played hands of "duplicate" limit hold'em against the computer, this time with $500/$1,000 blinds and $1,000/$2,000 limits. Following the rules of duplicate poker, the cards dealt to the human player in one session were identical to the ones dealt to the computer in the other session, and vice versa, with the community cards for each hand being identical as well. The format minimizes the luck of the draw, since in the end both the humans and Polaris 2.0 were dealt the same cards and encountered the same situations throughout.
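The way the duplicate format cancels out card luck can be illustrated with a toy model. The sketch below is not how the matches were actually run or scored hand by hand; it simply assumes that each hand's cards swing the result toward one seat by some random amount, and that the human side has a fixed per-hand skill edge (which may be negative). The function name, the per-hand edge, and the size of the random swing are all hypothetical.

```python
import random

def play_duplicate(num_hands, human_edge, seed=0):
    """Toy model of one duplicate match (all parameters are assumptions).

    At table one the human holds the "seat A" cards; at table two the
    computer holds those same cards and the human holds "seat B".
    """
    rng = random.Random(seed)
    table_one = 0  # human result when holding the seat A cards
    table_two = 0  # human result when holding the seat B cards
    for _ in range(num_hands):
        # Dollars the cards alone push toward seat A on this hand.
        card_luck = rng.randint(-5000, 5000)
        table_one += card_luck + human_edge
        table_two += -card_luck + human_edge
    return table_one, table_two

table_one, table_two = play_duplicate(num_hands=500, human_edge=-50)
team_total = table_one + table_two
print(table_one, table_two, team_total)
```

In this model each table's result can swing widely on the cards, but the combined team total depends only on the skill edge (here 2 x 500 x -50 = -50,000), which is the point of adding the two sessions together.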
At the end of each match, the totals from the two simultaneously played 500-hand sessions were added together to determine that match's victor. It was decided that if the combined total represented a difference of less than 25 small bets (i.e., $25,000), the match would be declared a draw, while a greater difference would give a win to the team with the higher total.
In the first match, Nick Grudzien and Kyle Hendon took on Polaris 2.0. Hendon finished his 500 hands ahead by $37,000, but Grudzien ended down $42,000. As the total difference was only $5,000 or five small bets, the first match was declared a draw.
The humans took the second match, thanks largely to the efforts of Rich McRoberts, who finished up $89,500 against the computer. His partner, Victor Acosta, didn't fare as well, ending down $39,500. The humans' net profit of $50,000 was more than enough to assure them the win.
Polaris 2.0 was able to mount a comeback, however, taking both the third and fourth matches. In the third, Mark Newhouse ended his 500 hands up $251,500, by far the most successful showing of any of the human competitors. However, his partner, IJay Palansky, finished down $307,500, meaning the computer finished the match with an overall profit of $56,000 and took the win. Clearly in this match the cards heavily favored Newhouse and the computer vs. Palansky; however, Polaris 2.0 managed to lose less with the same cards vs. Newhouse than Palansky did vs. the computer.
The fourth match saw the computer win against both of its human competitors, finishing $60,500 ahead of Matt Hawrilenko and $29,000 ahead of Palansky. That gave Polaris 2.0 two wins, one loss, and one draw in the matches played at the Gaming Life Expo. Two earlier matches taking place elsewhere involving Stoxpoker players were also counted toward the overall standings; the humans won one and the computer the other. Thus, in the end, Polaris 2.0 won three matches, lost two, and tied one.
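The scoring rule and the four Las Vegas results reported above can be checked with a few lines of arithmetic. This is only a sketch of the rule as described in this article; the function and variable names are made up for illustration, and because the article does not say how a difference of exactly 25 small bets would be treated, the code assumes it counts as a win rather than a draw.

```python
SMALL_BET = 1_000              # dollars, from the $1,000/$2,000 limits
DRAW_MARGIN = 25 * SMALL_BET   # totals closer than this are a draw

def score_match(human_results):
    """Return the winner given each human's dollar result vs. the computer."""
    human_total = sum(human_results)
    if abs(human_total) < DRAW_MARGIN:
        return "draw"
    # Assumption: a difference of exactly 25 small bets is not a draw.
    return "humans" if human_total > 0 else "Polaris 2.0"

# Human results for each match, as reported in this article.
matches = {
    "Match 1": [37_000, -42_000],     # Hendon, Grudzien
    "Match 2": [89_500, -39_500],     # McRoberts, Acosta
    "Match 3": [251_500, -307_500],   # Newhouse, Palansky
    "Match 4": [-60_500, -29_000],    # Hawrilenko, Palansky
}

results = {name: score_match(totals) for name, totals in matches.items()}
for name, winner in results.items():
    print(f"{name}: humans net {sum(matches[name]):+,} -> {winner}")
```

The net human totals work out to -$5,000, +$50,000, -$56,000, and -$89,500, which gives the draw, human win, and two Polaris 2.0 wins described above.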
According to Professor Michael Bowling, one of those supervising the University of Alberta graduate students who helped program Polaris 2.0, significant improvements made since last summer have made it more difficult for humans to exploit the program's weaknesses.
Most significantly, Bowling explains that programmers were able to add "an element of learning, where Polaris identifies which common poker strategy a human is using and switches its own strategy to counter." This meant the computer did not employ the same tactics against all of the humans, but followed different strategies against each, making it harder for the humans to adjust to the computer's changing strategies during a given session and/or compare notes with one another between sessions on how Polaris 2.0 played.
Polaris 2.0 also learned from its own mistakes, employing an algorithm intriguingly named "counter-factual regret" by which it was able to keep track of the humans' play during hands it had lost, then adjust its own play when similar circumstances arose.
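For readers curious about what "regret" means in this setting, here is a minimal sketch of regret matching, the basic update that counterfactual regret methods build on. It is not the CPRG's code and says nothing about how Polaris 2.0 was actually implemented. The toy game (rock-paper-scissors against a fixed opponent) and every name in it are assumptions chosen to keep the example small.

```python
ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[a][b]: our payoff when we play action a and the opponent plays b.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # no regret yet: uniform
    return [p / total for p in positive]

def train(opponent, iterations):
    """Run regret matching against a fixed opponent mixed strategy."""
    regrets = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the opponent.
        values = [sum(PAYOFF[a][b] * opponent[b] for b in range(len(ACTIONS)))
                  for a in range(len(ACTIONS))]
        expected = sum(s * v for s, v in zip(strategy, values))
        for a in range(len(ACTIONS)):
            # Regret: how much better action a would have done than our mix.
            regrets[a] += values[a] - expected
            strategy_sum[a] += strategy[a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# Hypothetical opponent who always plays rock.
average_strategy = train(opponent=[1.0, 0.0, 0.0], iterations=1000)
print(dict(zip(ACTIONS, average_strategy)))
```

Against an opponent who always plays rock, the accumulated regret quickly concentrates on paper, so the average strategy ends up playing paper almost all of the time. Full poker programs face a far larger game and apply this kind of update at many decision points, which this sketch does not attempt.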
The CPRG says it intends to broaden its research to move beyond heads-up limit hold'em to take on more complicated poker games. The group intends to apply its findings in the area of artificial intelligence to non-poker contexts as well.