Hold'em Killer

Hold'em Killer is my poker bot blog and guide to writing your own poker bot. It is associated with my free online Texas hold'em bot at http://www.holdemkiller.com. These are chapters from a book I was writing last year. I don't know of any poker programming books out there, so maybe I'll get to finishing this someday and get it published.

Monday, April 18, 2005

Beyond the Sklansky Preflop

(continued from previous post)

Poker programs are very different from chess programs. In chess, the program is strong at the opening and end-games, where it can work from a ‘book’, or do a full forward simulation in the endgame; chess programs are weakest in the middle game, where despite complete information the game is not fully computable. With poker bots it’s the exact opposite: bots are strongest in the middle game, when the game is most dominated by calculable (but not humanly calculable) statistics of the cards to come; in the beginning and end-games, the standard play is simple enough that the bots have no advantage, and the psychological factors surrounding the unknown information dominate.

That said, it really seems like a bot should be able to play the preflop right and be able to adjust to the game.

Earlier I wrote that, rather than trust existing preflop models like Sklansky’s, with all their ambiguities and possible errors, I wanted to come up with a new, comprehensive and deterministic preflop strategy.

One of the biggest mistakes that new players make in the preflop is to decide ahead of time which hands to play, and then ignore position and opponent bets. The odds impact of a single bet to the right is huge. If you’re in middle position, a Sklansky 5 hand that is raisable with no bets to the right usually becomes a fold with one bet to the right. So any preflop model that doesn’t take in the entire situation is lost from the outset.

To keep things well-defined, we’re still talking about honest play. There is no consideration of playing loose to create a table image for the benefit of future hands, bluffing, or anything like that. We’re talking about honest play against modeled (but not necessarily honest!) opponents.

The incredible thing is that given each completely specified situation, there does exist a precise statistically correct action within the limits of the available information. Any lookup table or software library that takes your hole cards, position, and opponent bets (plus any available opponent history) and produces *the* correct preflop action is easily a million dollar asset. But as far as I know, no one has ever publicly produced one. I’m going to take a partial stab here today.

It is not possible to run a forward simulation of holdem from the preflop through the river in real-time.

And even pregenerating the game, with all its possible variations, is not practical. You may be able to precompute all card combinations through the river, but once you add in all possible opponent actions and modeling (all of which really matter), that’s too many parameters to pre-generate (we’re potentially talking about years of supercomputer time to produce that).

The middle game is a sweet spot for the bot because holdem is thoroughly computable in real time at the flop. What we want to do is produce a scheme by which we can precompute enough data to make the preflop computable in a real game.


FlopHand:

Since it’s not possible to pregenerate a comprehensive lookup table for the preflop, let’s define an intermediate goal: the situation at the flop. I’ll call the cards visible to you at the flop a FlopHand. A FlopHand is simply a pair of hole cards plus a set of 3 table cards. This is not to be confused with a 5-card hand. There is a big difference between aces in your hole and aces on the board. The cards of a FlopHand are un-ordered within the hole and table card groups, but the groups themselves are ordered. So whereas there are about 2.6 million 5-card hands, there are about 26 million FlopHands. For each of the 1326 sets of hole cards, however, there are only 19600 FlopHands, making the set of FlopHands for your preflop trivially available in a lookup table.
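The counts in this paragraph are easy to check. Here is a minimal Python sketch; the card encoding and the flop_hands() helper are my own illustrative choices, not part of any existing library:

```python
from itertools import combinations
from math import comb

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards, e.g. "As", "Td"

def flop_hands(hole):
    """Yield every FlopHand (hole cards, flop) for one pair of hole cards."""
    remaining = [c for c in DECK if c not in hole]
    for flop in combinations(remaining, 3):
        # Cards are un-ordered inside each group; the two groups stay distinct.
        yield (frozenset(hole), frozenset(flop))

n_hole = comb(52, 2)                                   # sets of hole cards
n_flops = sum(1 for _ in flop_hands(("As", "Ah")))     # FlopHands per hole pair
n_five_card = comb(52, 5)                              # plain 5-card hands

print(n_hole, n_flops, n_hole * n_flops, n_five_card)
# prints: 1326 19600 25989600 2598960
```

A lookup table keyed by the frozenset pair would hold 19600 entries per preflop hand, which is small enough to load on demand.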

You’ve probably seen tables that order all preflop card pairs from best to worst. These are usually based on simulations that just deal all cards through the river. Some actually take the number of opponents into account and order the pairs by (trivially estimated) expected winnings; in this case the ordering changes depending on the number of opponents. In reality, of course, the rankings depend on the situation; if you put the opponent on a high pair, that diminishes the value of you having a medium pair vs. e.g. flush cards.

The FlopHands are also orderable. A FlopHand has a trivial rank, which can be computed by dealing all combinations of the remaining 2 cards and ordering the hands by #wins. In reality, however, the opponent hand is not random. Flops that benefit commonly played hole cards (according to the opponent model) are worse than those that don’t, independent of how they directly benefit your hand. Therefore, the rank of a FlopHand is conditional on the game situation.


FlopConfig:

Now let’s define a FlopConfig. A FlopConfig is a FlopHand plus the players’ positions and preflop actions and the opponent preflop model (previous segment). A FlopConfig is the fully specified situation at the flop, encompassing the pot size, the expected types of cards that the opponents may have, and the resulting Reflective Hand Odds (RHO).


Expected Value:

What we’re working towards is a computable Expected Value (EV) for a FlopConfig. If we can calculate an expected win/loss for each possible FlopConfig, then at the preflop we can take the weighted average across the possible FlopConfigs and determine the correct action. Once again it comes down to Reflective filtering and math.

What we want is a formula where we can efficiently take a FlopHand and some precomputed parameters for it, plug in the extra FlopConfig info, and produce the RHO and Implied Odds (IO), which we plug into the simple F() function (from earlier segment) to determine if the FlopConfig is “bettable”. If it’s not bettable, we will fold, so its cost is the cost of seeing the flop; otherwise, its EV will be some positive amount that we want to generate from the computed RHO.


p(FlopConfig):

All possible FlopHands are equally likely (ignoring minor variation for dependencies between opponent hole cards and the deck), so

p(FlopHand) = 1/19600

The FlopConfig entails opponent bets, etc. Part of that information is available in the form of bets to the right. The remainder has to come from your preflop model. In the last segment I talked about scaling callers behind and ahead based on preflop modeling. The missing information for the FlopConfigs is comprised of each possible action ahead, weighted according to the preflop model.

p(FlopConfig) = p(FlopHand) * p(actions ahead that contribute to this FlopConfig)

Although each possible FlopHand is equally likely, the FlopConfigs are not.

The later your position, the fewer possible FlopConfigs there are per possible FlopHand. There are always multiple FlopConfigs/FlopHand, however, since your preflop action is part of the state space.
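As a rough sketch of the p(FlopConfig) formula above: the action labels and their probabilities below are hypothetical stand-ins for whatever the preflop model produces, chosen only so the arithmetic is visible.

```python
from math import comb

N_FLOPHANDS = comb(50, 3)        # 19600 FlopHands for a given pair of hole cards
P_FLOPHAND = 1 / N_FLOPHANDS     # each FlopHand is (approximately) equally likely

# Hypothetical preflop-model output: probability of each pattern of actions ahead.
actions_ahead = {"all fold": 0.55, "one caller": 0.30, "one raiser": 0.15}

def p_flop_config(p_actions):
    """p(FlopConfig) = p(FlopHand) * p(actions ahead leading to this FlopConfig)."""
    return P_FLOPHAND * p_actions

# Summed over every FlopHand and every action pattern, the probabilities total 1.
total = N_FLOPHANDS * sum(p_flop_config(p) for p in actions_ahead.values())
print(round(total, 9))
```

The FlopConfigs for "all fold" are each more likely than those for "one raiser", even though the underlying FlopHands are equally likely.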


EV(FlopConfig):

The EV of an unbettable FlopConfig is zero, because we will fold it. We don’t count the negative cost of seeing the flop. Below is the raw formula for EV of a FlopConfig. If the raw formula produces a negative value, we decide that the FlopConfig is unbettable, making the EV zero. Therefore, EV(FlopConfig) is always non-negative.

EVraw(FlopConfig) = p(win)*($ExpectedPot - $ExpectedBets - $rake) - p(lose)*($ExpectedBets),
where:
p(win) = RHO
p(lose) = 1-RHO
$ExpectedBets and $ExpectedPot are taken from the Implied Odds calculation (earlier segment)
$rake is given

EV(FlopConfig) = max(EVraw(FlopConfig), 0)
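A direct Python transcription of these two formulas might look like the following. The function and parameter names are mine; RHO, the expected pot, and the expected bets are assumed to come from the calculations in the earlier segments, and the sample numbers are arbitrary.

```python
def ev_raw(rho, expected_pot, expected_bets, rake):
    """EVraw(FlopConfig): win the pot net of our bets and the rake, or lose our bets."""
    p_win = rho
    p_lose = 1.0 - rho
    return p_win * (expected_pot - expected_bets - rake) - p_lose * expected_bets

def ev_flop_config(rho, expected_pot, expected_bets, rake):
    """EV(FlopConfig): an unbettable config is folded, so its EV is clamped at zero."""
    return max(ev_raw(rho, expected_pot, expected_bets, rake), 0.0)

# Arbitrary sample values, in chips.
strong = ev_flop_config(rho=0.5, expected_pot=10.0, expected_bets=2.0, rake=0.5)
weak = ev_flop_config(rho=0.1, expected_pot=10.0, expected_bets=2.0, rake=0.5)
print(strong, weak)
```

With these inputs the strong config is worth 2.75 chips, while the weak one has a negative EVraw and is therefore counted as zero.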


Now, it’s true that in late position, our evaluation of the FlopConfig changes as opponents check/call/raise behind, and we may for example end up betting a FlopConfig (honestly, no bluff) that in early position would be unbettable. But we are talking about the FlopConfig as anticipated from the preflop. We only want to assign positive values to FlopConfigs that would be initially bettable by us (against n opponents as stipulated by the FlopConfig).

The fact that EV(FlopConfig) >= 0 has a huge impact on the IO for the preflop. It invalidates the full forward simulations that rank preflop hands on how they fare through the river. It also rewards preflop hands with highly polarized sets of outcomes, e.g. suited connectors, where the strong hands depend on getting the right flop and the other flops cause us to fold. Now the preflop hands will be ranked by having the highest average FlopConfig, and the only negative factor is the cost of seeing the flop.

It’s well known that 22 has >50% odds of winning over AKs. Most of us know intuitively that AKs is the preferred hand, but why? You can see the answer if you think about the FlopConfigs. 22 gives a large number of FlopConfigs with EVraw slightly greater than zero (because you have at least a pair), and only very few with large EV (when you hit a set). Therefore the average EVraw of the bettable FlopConfigs is low. With AKs, you get a smaller number of bettable (EVraw > 0) FlopConfigs, when you hit a pair or flush draw; but the average of those bettable ones is much higher; the EV is then much higher because you strip away much more of the downside in the EVraw -> EV conversion than you do for 22. That’s true in heads-up, and it’s even more true in multiway due to the higher positive EVraw’s for AKs.
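To see the effect of the EVraw -> EV conversion in isolation, here is a toy comparison. The two outcome distributions are made-up numbers, not computed from real 22 or AKs data; they are chosen so that both hands have the same average EVraw.

```python
def mean_ev(outcomes, clamp):
    """Weighted average of EVraw over (probability, EVraw) pairs.

    With clamp=True, negative EVraw values count as zero (we fold those flops).
    """
    total = 0.0
    for p, raw in outcomes:
        total += p * (max(raw, 0.0) if clamp else raw)
    return total

# Hypothetical "small pair" profile: almost always slightly positive, rarely big.
small_pair_like = [(0.9, 0.5), (0.1, 5.5)]
# Hypothetical "polarized" profile: often a clear fold, otherwise big.
polarized = [(0.6, -2.0), (0.4, 5.5)]

for name, outcomes in [("small pair", small_pair_like), ("polarized", polarized)]:
    print(name, round(mean_ev(outcomes, clamp=False), 3),
          round(mean_ev(outcomes, clamp=True), 3))
```

Both profiles average 1.0 before the clamp. After it, the small-pair profile is unchanged at 1.0 while the polarized one rises to 2.2, because folding strips away its entire downside.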

So these simplistic rankings of hole cards that we see so often are flawed; the real value of a preflop hand comes from the weighted average of its EV(FlopConfig)’s, which depends on the game situation.



Computing EV(FlopConfig):

The only complex part of the equation is the RHO. The RHO for a specific FlopConfig is computable in a few seconds. Pregenerating RHO for even one FlopConfig for each of the 26 million FlopHands is already months of processing time.

What we want, though, is to precompute and cache enough parameters with each FlopHand that we can compute the EV for all possible FlopConfigs (of which there are many more than the 19600 possible FlopHands for a given PreflopHand) in real time.

To make it worse, recall also, from the first posting, that even if you precompute hand odds against each opponent type in each position for each FlopHand, you cannot then simply multiply those together to get hand odds against all opponents (to briefly restate: the probability of beating one opponent is highly dependent on the probability of beating the other opponents, especially in draw situations).
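A deliberately extreme toy case shows why the per-opponent odds cannot be multiplied. The scenario and its 35% figure are hypothetical: you are on a draw, and you beat both opponents exactly when the draw hits.

```python
# Each outcome is (probability, beats_opponent_A, beats_opponent_B).
outcomes = [
    (0.35, True, True),     # draw hits: you beat both opponents
    (0.65, False, False),   # draw misses: you lose to both
]

p_beat_a = sum(p for p, a, b in outcomes if a)
p_beat_b = sum(p for p, a, b in outcomes if b)
p_beat_both = sum(p for p, a, b in outcomes if a and b)

naive_product = p_beat_a * p_beat_b   # treats the two results as independent
print(round(p_beat_both, 4), round(naive_product, 4))
```

The true chance of beating both is 0.35, but the product of the individual odds is only 0.1225. Real situations are less clear-cut, but the correlation runs in the same direction.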

The good news is that most FlopConfigs are known to be non-bettable from a simple heuristic (e.g. no pair, draw, or strong overcards). Their EV’s are a constant zero, regardless of other factors.
Also, an even larger set of FlopConfigs are ignorable because you would never (in honest play) arrive at the given flop situation (e.g. having called with a clearly unplayable set of hole cards following a bet on the right).

I haven’t thought it through much further than this. Suffice it to say that as the bot plays, its model of each opponent stabilizes over time; the bot starts with a neutral model for each player, then chooses a stereotype based on the first few hands, then slowly develops a more accurate model. It can then use its downtime to slowly update its ‘book’ of possible FlopConfigs based on those models, and assign RHO to each possible situation. Given a preflop hand, it can then quickly compute the weighted average of the EV(FlopConfig)’s, and decide if the hole cards are playable.


EV(PreflopHand):

Now we want to take the statistics we have for the spectrum of possible FlopConfigs, and use them to evaluate the preflop hand.

EV for a PreflopHand is influenced by your preflop action. Your raise increases your investment and also influences opponent folds and bets. That choice produces different sets of possible FlopConfigs.

EV(PreflopHand w/ call) = Sum( p(FlopConfig) * EV(FlopConfig) ) – $CallAmount

EV(PreflopHand w/ raise) = p(no flop)*$PotAmount + Sum( p(FlopConfig) * EV(FlopConfig) ) – $RaiseAmount (**)

EV(PreflopHand w/ fold) = 0

The honest play is to take the preflop action with the highest EV, which is most often zero (fold).

Note that the formula is comprehensive. E.g., since EV(FlopConfig) is always non-negative, checking the big blind (i.e. $CallAmount = 0) never has negative EV.

(**) If you raise, everyone may fold and there may not be a flop. Stated in math terms, in the EV(PreflopHand w/ raise) equation, the p(FlopConfig)’s don’t add up to 1. Therefore, you fill in the missing probability component for the “null” FlopConfig, whose EV is the pot amount before your raise.
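Putting the three preflop EVs side by side, a sketch in Python could look like this. It assumes the p(FlopConfig) values are true probabilities (summing to 1 for a call, and to 1 - p(no flop) for a raise), so the probability-weighted sum is already the average. All names and sample numbers are illustrative.

```python
def expected_flop_value(flop_configs):
    """Probability-weighted sum of EV(FlopConfig) over (p, ev) pairs."""
    return sum(p * ev for p, ev in flop_configs)

def ev_call(flop_configs, call_amount):
    return expected_flop_value(flop_configs) - call_amount

def ev_raise(flop_configs, p_no_flop, pot_amount, raise_amount):
    # The "null" FlopConfig (everyone folds) pays the pot as it stood before the raise.
    return p_no_flop * pot_amount + expected_flop_value(flop_configs) - raise_amount

def best_preflop_action(call_configs, call_amount,
                        raise_configs, p_no_flop, pot_amount, raise_amount):
    """Return (action, EV per action); ties resolve to the earlier entry, i.e. fold."""
    options = {
        "fold": 0.0,
        "call": ev_call(call_configs, call_amount),
        "raise": ev_raise(raise_configs, p_no_flop, pot_amount, raise_amount),
    }
    return max(options, key=options.get), options

# Hypothetical inputs: two lumped FlopConfig groups per action, amounts in chips.
action, options = best_preflop_action(
    call_configs=[(0.7, 0.0), (0.3, 4.0)], call_amount=1.0,
    raise_configs=[(0.5, 0.0), (0.3, 5.0)], p_no_flop=0.2,
    pot_amount=1.5, raise_amount=2.0,
)
print(action, {k: round(v, 3) for k, v in options.items()})
```

With these made-up inputs the call is worth 0.2 chips, the raise -0.2, and the fold 0, so the sketch picks the call.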



A Reductive look at the Preflop:

The preflop is very hard to wrap your head around. Even at this initial stage there is all this mixing of factors from the starting hand, position, implied odds, bets behind, and opponent model. How do we make sense of it all?

Since we can’t humanly rationalize all these factors, let’s start with a game that we can thoroughly analyze. Bear with me as we build on this simple model:


1-Chip Zero-Blind Hold’em:

1-Chip Zero-Blind holdem (1C0B) is a simple imaginary game where you may bet 1 chip at the preflop. There is no more betting thereafter, and if you win on the river you don’t win the pot – you just get back your chip plus 1 more chip. The betting order is the same (starting with UTG), but the blind amounts are zero. This eliminates all pot odds and implied odds; you’re simply betting even money that you have the best hand on the preflop.

To make it even more unreal, let’s assume no one is bluffing. What does it take then to bet first in 1C0B? We said earlier that preflop hands are not orderable; but for the first bet they pretty much are. So we can define this in terms of the top %-ile preflop hand needed to bet first.

If you’re in the “SB” and no one has bet, you’re just betting that you have a better hand than the BB, which is random; so trivially you just need an above-average hand. When OTB, you’re betting that both of the random hands ahead are lower. The required %-ile is the solution to (1-x)^2 = ½, i.e. ~30%. In seat 9, it’s the solution to (1-x)^3 = ½, etc.

Seat   Required %-ile preflop hand to bet first
UTG    (1-x)^9 = ½                              7%
4      (1-x)^8 = ½                              8%
5      (1-x)^7 = ½                              9%
6      (1-x)^6 = ½                              11%
7      (1-x)^5 = ½                              13%
8      (1-x)^4 = ½ ...                          16%
9      (1-x)^3 = ½ ; x = 1 – 1/cuberoot(2)      21%
OTB    (1-x)^2 = ½ ; x = 1 – 1/sqrt(2)          29%
SB     1/2                                      50%
BB     1/1                                      100%
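The 1C0B table can be regenerated by solving (1-x)^n = ½ for each seat, where n is the number of random hands still to act. A short Python sketch; the function name and the seat list (a 10-handed table) are my own framing:

```python
def first_bet_percentile(opponents_left, win_prob_needed=0.5):
    """Solve (1 - x)^n = win_prob_needed for x, the top fraction of hands to bet."""
    if opponents_left == 0:
        return 1.0  # BB with no bet: nothing left to beat, any hand qualifies
    return 1.0 - win_prob_needed ** (1.0 / opponents_left)

# (seat label, number of random hands still to act behind you)
SEATS = [("UTG", 9), ("4", 8), ("5", 7), ("6", 6), ("7", 5),
         ("8", 4), ("9", 3), ("OTB", 2), ("SB", 1), ("BB", 0)]

table = {seat: round(100 * first_bet_percentile(n)) for seat, n in SEATS}
for seat, pct in table.items():
    print(f"{seat:>3}  {pct}%")
```

The rounded output matches the table: 7, 8, 9, 11, 13, 16, 21, 29, 50 and 100 percent.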




As you can see, the set of hands playable as the first bet grows almost exponentially in later position. Your exact seat position matters big-time.


1-Chip Zero-Blind Hold’em w/ Pot Odds:

Now let’s add in the blinds and the resulting Pot Odds, just for the preflop. There is still no betting after the flop. The stake is now 1 ½ chips instead of 1 chip. And the small blind only bets ½ chip; note that whereas the first 8 bettors are risking 1 chip to win 1 ½ (2-to-3), the SB is risking ½ to win 1 ½ (1-to-3), a huge difference. There’s still no bluffing. Call this 1CwB.

The first 8 seats are now getting 2-to-3 pot odds, so they need a 40% chance of winning: risking 1 to win 1 ½ breaks even at 1 / (1 + 1 ½) = 2/5. By the same arithmetic the SB needs only ½ / (½ + 1 ½) = 25%.

Seat   Required %-ile preflop hand to bet first
UTG    (1-x)^9 = 2/5                              10%
4      (1-x)^8 = 2/5                              11%
5      (1-x)^7 = 2/5                              12%
6      (1-x)^6 = 2/5                              14%
7      (1-x)^5 = 2/5                              17%
8      (1-x)^4 = 2/5 ...                          20%
9      (1-x)^3 = 2/5 ; x = 1 – cuberoot(2/5)      26%
OTB    (1-x)^2 = 2/5 ; x = 1 – sqrt(2/5)          37%
SB     3/4                                        75% *
BB     1/1                                        100%
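The same calculation with pot odds folded in: first turn the risk and the reward into a break-even win probability, then solve for the percentile as before. A Python sketch with names of my own choosing:

```python
def break_even_win_prob(risk, win):
    """Win probability at which risking `risk` chips to win `win` chips breaks even."""
    return risk / (risk + win)

def first_bet_percentile(opponents_left, win_prob_needed):
    """Solve (1 - x)^n = win_prob_needed for x, the top fraction of hands to bet."""
    return 1.0 - win_prob_needed ** (1.0 / opponents_left)

full_bet = break_even_win_prob(risk=1.0, win=1.5)    # 2-to-3: the first 8 seats
small_blind = break_even_win_prob(risk=0.5, win=1.5) # 1-to-3: the SB

SEATS = [("UTG", 9), ("4", 8), ("5", 7), ("6", 6), ("7", 5),
         ("8", 4), ("9", 3), ("OTB", 2)]
table = {seat: round(100 * first_bet_percentile(n, full_bet)) for seat, n in SEATS}
table["SB"] = round(100 * first_bet_percentile(1, small_blind))

for seat, pct in table.items():
    print(f"{seat:>3}  {pct}%")
```

This reproduces the 1CwB column: 10, 11, 12, 14, 17, 20, 26 and 37 percent for the full-bet seats, and 75 percent for the SB.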



That covers the first bet. What quality hand is required to call a first bet, still assuming no bluffing and only a 1 ½ chip stake? To call a single bet, you need a 40% chance of beating a hand in the range that’s bettable for the opponent behind who bet first. The order of the preflop hands is now re-sorted by win rate against hole cards in the bettable range of the opponent who first bet (in a future segment I will be producing my 169x169 table showing win rate of each generic hand against each other generic hand; so this is well defined).

Seat N   Required %-ile preflop hand (re-sorted) to CALL a bet by opponent in seat N
UTG      10%*(1-40%) = 6%
4        11%*(1-40%) = 6.6%
...etc ...



Now we can add in the Pot Odds for the extra bet that the first opponent put in. So instead of a 1 ½ chip stake (2-to-3), we have a 2 ½ chip stake (2-to-5). The first bettor still needs a 40% chance to win. To call him, the next bettor has 2-to-5 odds, so he only needs a 29% chance of winning (1 / (1 + 2 ½) ≈ 29%).

Seat N   Required %-ile preflop hand to bet first in seat N   Required %-ile preflop hand to CALL a bet by opponent in seat N
UTG      (1-x)^9 = 2/5 ; 10%                                   10%*(1-29%) = 7.1%
4        (1-x)^8 = 2/5 ; 11%                                   11%*(1-29%) = 7.8%
...      ...
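The calling thresholds follow the same pattern in code. This sketch just reproduces the arithmetic of the tables, using the rounded first-bet percentiles as inputs; the function names are mine, and the re-sorting of hands against the bettor's range is not modeled here.

```python
def break_even_win_prob(risk, win):
    """Win probability at which risking `risk` chips to win `win` chips breaks even."""
    return risk / (risk + win)

def call_percentile(first_bet_percentile, win_prob_needed):
    """Top fraction of (re-sorted) hands that can call a first bet.

    The bettor's range is the top `first_bet_percentile` of hands; the caller
    must beat a hand from that range with probability `win_prob_needed`.
    """
    return first_bet_percentile * (1.0 - win_prob_needed)

caller_odds = break_even_win_prob(risk=1.0, win=2.5)   # 2-to-5, about 0.286
print(round(caller_odds, 2))

# Using the rounded table values for a UTG bettor (10%) and a seat-4 bettor (11%).
print(round(100 * call_percentile(0.10, 0.40), 1))   # 1 ½ chip stake
print(round(100 * call_percentile(0.10, 0.29), 1))   # 2 ½ chip stake
print(round(100 * call_percentile(0.11, 0.29), 1))
```

Feeding in unrounded percentiles and odds would shift the last digit slightly; the tables use the rounded figures.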



I’ll leave it to someone else to add in Implied Odds for the entire hand.


Bluffing on the Preflop:

Now we can consider bluffing. You may think that bluffing will throw a wrench in the entire model. But not really. Let’s think about what it means for an opponent behind to be bluffing when he bets first on the preflop. I would argue that for any competent player, any regular bluffing in that case would appear as looseness in your model. An opponent has no reason not to bet his profitably bettable cards. So if he bluffs, he will be playing more than his average. You may not be able to predict which way he is bluffing, but you’ll know that he is bluffing with some known frequency. You can then adjust the strength of the hand needed to call/raise him accordingly.

Your bot itself should also bluff. Sklansky writes that the optimum bluffing frequency is such that it makes your opponent’s Hand Odds equal his Implied Odds.


Summary

Lots of people view the preflop as a simple and automatic part of the game. But it is definitely not, due largely to the effect of opponent actions and modeling on the odds. And, because a good bot is so strong on the flop, correct preflop play will get the bot into the right set of flops and put it over the top. The bot doesn’t have the human element to recognize opponent play patterns from just a few hands, but it can compensate by analyzing all available data to a level that is not humanly possible. Once the bot is in the right set of flops, it is in the zone.

At the end of the day, all the bot has to do is beat the rake. If it does that, it can keep its head above water for hours and occasionally capitalize on an opponent’s mistake. That’s all that it needs to be a winner. Unless it is up against other bots ;-)




---------------
Ervin Peretz is a software engineer at a major web search company.
He is author of HoldemKiller, a free online poker bot.
