Secrets of the Early Rounds
by Stephen W. Custer, PhD
This column is usually devoted to blackjack. But it’s March. Even the most fervent blackjack player can’t help but be infected by the madness at the sports book. Since the NCAA adopted a 64-team format for its post-season basketball tournament, the three-week playoff has become the biggest event in sports, both in fan interest and dollars wagered. If you can’t make it to one of the playoff sites, come to Vegas. Here you can watch eight games simultaneously while betting on point spreads, first-half margins, and outstanding players. Come to think of it, forget that playoff site; it’s better in Vegas.
It’s been 21 years since the tournament was expanded from 48 to 64 teams. (Yes, I know the last few years it’s been 65 teams. But that play-in game is more like a final tournament game to see who gets to boogie at the big dance, than the opening song of the grand ball.) That’s 1323 games. Can we apply some blackjack-like statistical analysis to all that data?
I can’t give much help in filling out those brackets; it’s pretty much a crapshoot. From a statistical point of view, your best bet is to pick all favorites. Higher-seeded teams have won 73% of the time. But the odds are against you. You will probably finish above average, but out of the money. Some idiot who picked Liberty to upset Duke will get lucky and win the thing.
Secret of the First Round—But I Regress:
So, do I have any words of wisdom, and better yet, of profitability for you? Might. By studying the past 20 years of tournament data, I believe there are some opportunities in betting the money line on individual games. How can history help pick this year’s games? To project the future from the past requires some thread of consistency. What does a Michael Jordan-led, Dean Smith-coached team of the ’80s have to do with North Carolina’s chances this year? Not much. College teams turn over at least every four years, and with early entry into the NBA, frequently much faster.
The thread of consistency is the Selection Committee. The members change from year to year, but the criteria and methodology pretty much stay the same. Rather than study the individual teams, I study the seeding match-ups. Table 1 shows the percentage of upsets in the first round of the tournament by seeding. There is a high degree of consistency in these numbers, with the percentage of upsets increasing as the teams are more closely seeded. But it’s not perfect. I can’t come up with any reason why nine seeds should win a majority of the games against eight seeds, or 14 seeds win almost as often as 13 seeds. These anomalies are probably just random fluctuations. After all, we only have 80 data points for each pairing.
What statisticians do in cases like this is find a mathematical model that fits the data but smooths out the anomalies. After several trials I came up with the following model:
P(Upset) is the probability of an upset,
Seed(F) is the seeding of the favorite, and
Seed(D) is the seeding of the underdog.
a and b are constants.
Using a mathematical technique called “regression,” we determine the values of the constants a and b that best fit the data. Here a = -0.08, and b = 1.23.
Table 2 converts the model probabilities to the money line odds used by sports books. Plus values are for dogs and negative values for favorites. For example, if a dog is shown as +150, your $100 bet wins $150 should the dog prevail (total payback $250: your bet of $100 plus $150 profit). For this type of bet the dog must win the game outright; no points are involved. A line of –150 means you must bet $150 on the favorite to get a profit of $100.
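The conversion from a probability to a money line can be sketched in a few lines of Python. This is only an illustration, assuming a "fair" line with no casino margin built in; the function names and the 40 percent and 60 percent probabilities are made up for the example and do not come from Table 2.

```python
def fair_moneyline(p_win: float) -> int:
    """Convert a win probability into a no-margin money line (assumed fair odds)."""
    if not 0.0 < p_win < 1.0:
        raise ValueError("p_win must be strictly between 0 and 1")
    if p_win >= 0.5:
        # Favorite: the stake needed to win $100, shown as a negative number.
        return -round(100 * p_win / (1 - p_win))
    # Underdog: the profit on a $100 stake, shown as a positive number.
    return round(100 * (1 - p_win) / p_win)


def profit_if_win(line: int, stake: float) -> float:
    """Profit (not total payback) on a winning bet at the given money line."""
    if line > 0:
        return stake * line / 100
    return stake * 100 / -line


# Hypothetical probabilities, chosen only to match the +150 / -150 example.
dog_line = fair_moneyline(0.40)        # a dog given a 40% chance
fav_line = fair_moneyline(0.60)        # a favorite given a 60% chance
print(dog_line, profit_if_win(dog_line, 100))   # 150 150.0
print(fav_line, profit_if_win(fav_line, 150))   # -150 100.0
```

A dog given a 40 percent chance comes out at +150, and the matching 60 percent favorite at –150, which is the pair of lines used in the example above.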
If the casino betting line is more favorable than the line in Table 2 (larger plus number for the dog or smaller negative number for the favorite) you may have a good bet.
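One way to check whether a casino line beats the model is to compute the expected profit of the bet using the model's probability. The sketch below assumes the model probability is correct, which is exactly what is in question; the function names and all the sample numbers are hypothetical.

```python
def profit_per_dollar(line: int) -> float:
    """Profit per $1 staked on a winning bet at a money line."""
    return line / 100 if line > 0 else 100 / -line


def expected_profit(p_win: float, casino_line: int, stake: float = 100.0) -> float:
    """Expected profit of one bet, taking the model probability at face value."""
    win_part = p_win * profit_per_dollar(casino_line)
    lose_part = 1.0 - p_win          # the whole stake is lost on a loss
    return stake * (win_part - lose_part)


# Hypothetical dog with a 40% model chance (fair line +150).
print(expected_profit(0.40, 170) > 0)   # True: +170 beats +150
print(expected_profit(0.40, 130) > 0)   # False: +130 is worse than +150

# Hypothetical favorite with a 60% model chance (fair line -150).
print(expected_profit(0.60, -130) > 0)  # True: -130 is a smaller negative number
print(expected_profit(0.60, -170) > 0)  # False
```

A positive result corresponds to the rule above: a larger plus number for the dog, or a smaller negative number for the favorite, than the model's own line.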
Last year I tested this model. The result was one of those glass half-full or half-empty things. Well, really more like three-quarters empty. Of the 32 first-round games, there were 17 betting opportunities: 5 underdogs and 12 favorites. I bet $100 on each game. One of the dogs and 7 of the favorites won, giving me a net loss of $43. This sure looks like an empty glass. So how can I see it as one-quarter full? Well, $43 on $1,700 wagered is a loss of 2.5 percent, one-half the casino’s rake of 5 percent. So I did beat the odds, just not enough to cover the casino’s cut.
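For anyone who wants to check the arithmetic, here it is as a short Python sketch. The figures are the ones quoted above; the variable names are mine, and the 5 percent rake is simply the number used in this column.

```python
bets = 17            # betting opportunities found in the first round
stake = 100          # dollars bet on each game
net_result = -43     # dollars won (negative means lost) over all 17 bets
casino_rake = 0.05   # the 5 percent cut assumed in this column

total_wagered = bets * stake                 # 1700
return_rate = net_result / total_wagered     # about -0.0253

print(f"Wagered ${total_wagered}, return {return_rate:.1%}")
print("Lost less than the rake:", -return_rate < casino_rake)
```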
The bottom line is that the volatility is so large, and any advantage, if there is one, so small, that one year is not enough data. I’ve not been able to find the money lines for past years in order to do a larger test. Evidently this data is not archived. If any reader has this data, or knows where I can get it, please let me know.
Secret of the Second Round—Do the Correlation Hop:
The good news is this secret has been tested with my own money, and I did very well. The bad news is, it may not be available every year. You’ll have to be patient.
The average number of upsets in the first round has been 7.9, with highs of 13 in 2001 and 12 in 1989, and a low of three in 2000. The average number of upsets in the second round is 4.9, with a high of nine in 2000, compared with none in 1991, one in 1989, and three in 2001. Notice anything? The years with a high number of upsets in round 1 had a low number of upsets in round 2, and vice versa.
This phenomenon is not just for the years mentioned, but holds throughout the 20-year history. Table 3 shows the number of upsets in round 1 vs. upsets in round 2, sorted in ascending order. Although it’s not perfect, you can see that as the number of upsets in round 1 increases, the number of upsets in round 2 decreases.
We can test this trend by doing a correlation analysis. Correlation analysis compares two sets of data to see if there is a relationship. The analysis generates a Correlation Coefficient between plus and minus 1. A value near plus 1 means the data are positively correlated, i.e., high values in one series correspond to high values in the other series, and low values correspond to low values. A value near minus 1 means when one series rises, the other falls. Anything near zero means there is little or no relationship between the series.
In the case of first and second round upsets the Correlation Coefficient is –0.62. That’s a fairly strong negative correlation.
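For readers who want to see how a Correlation Coefficient is computed, here is a minimal Python sketch of the standard Pearson formula. It uses only the four years for which this column quotes both rounds (1989, 2000, 2001, and 2004), not the full 20-year table, so the result will not match the –0.62 figure above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)


# Upsets quoted in this column: 1989, 2000, 2001, 2004 (a partial sample only).
round1 = [12, 3, 13, 4]
round2 = [1, 9, 3, 7]

print(round(pearson(round1, round2), 2))   # about -0.94 for these four years
```

These four years are extreme ones, picked because the column mentions them, so they show a much stronger negative relationship than the full history does.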
How can you take advantage of this? If there is a high number of upsets in the first round (say 12 or more), bet the money line on the favorites in the second round. If there are few upsets in the first round (say 4 or fewer), take the dogs in round 2. What if there is an average number of upsets in the first round? Keep your money; there are no betting opportunities. Remember, I told you you’d have to be patient.
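The rule above boils down to a simple lookup, sketched here in Python. The cutoffs of 12 and 4 are the "say" values from this column, not tested optimums, and the function name is made up for the example.

```python
def second_round_play(first_round_upsets: int,
                      high_cutoff: int = 12,
                      low_cutoff: int = 4) -> str:
    """Suggest a second-round money line play from the first-round upset count."""
    if first_round_upsets >= high_cutoff:
        return "favorites"     # many round 1 upsets: expect few in round 2
    if first_round_upsets <= low_cutoff:
        return "underdogs"     # few round 1 upsets: expect more in round 2
    return "no bet"            # near the 7.9 average: keep your money


for upsets in (13, 8, 4):
    print(upsets, "->", second_round_play(upsets))
# 13 -> favorites
# 8 -> no bet
# 4 -> underdogs
```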
I first noted this negative correlation after the 2001 tournament. 2000 and 2001 were years with big swings in the number of upsets between round 1 and round 2. The numbers of first-round upsets in 2002 and 2003 were 7 and 8 respectively, too close to the average of 7.9 to give any second-round betting opportunities.
In 2004 there were just four upsets in round 1, near the low. I headed for the Stardust sports book and bet the money line on all 16 dogs in the second round. There were seven upsets, more than doubling my money.
Last year there were eight first round upsets—again no betting opportunity.
Do professional handicappers know these secrets? They probably do; not much gets by them. But the pros are handicapping not the teams but the betting public. They set the lines to balance the books, i.e., to get balanced amounts bet on each side of the line. The betting public does not know these secrets; it bets its hunches, likes, and dislikes, and thinks it can out-handicap the pros who do it for a living. Until the general public catches on, the handicappers won’t show these secrets in their lines.
First Round Upsets
Pairing % Upsets