|
Experiments in computer chess
The value of an evaluation function
What's the value of an evaluation function (EVAL) in a chess program? What's the value of each EVAL component, and how much Elo does each one represent? This page might give some insight.
For this purpose we took ProDeo 1.74 and added a new parameter.
| [EVAL = MATERIAL ONLY] | EVAL returns only the material value following the 1/3/3/5/9 rule (pawn/knight/bishop/rook/queen) |
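As a rough illustration of what a material-only EVAL amounts to, here is a minimal Python sketch. It is not ProDeo's code: the FEN-based input, the function name `material_eval` and the `PIECE_VALUES` table are assumptions made for the example.

```python
# Hypothetical piece letters and values for illustration only;
# ProDeo's internal representation is not shown in this article.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_eval(fen: str) -> int:
    """Return the material balance (White minus Black) in pawn units."""
    placement = fen.split()[0]  # first FEN field: piece placement
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.upper())
        if value is None:
            continue  # digits, '/' and kings carry no material value
        score += value if ch.isupper() else -value
    return score


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_eval(START))  # 0: both sides have equal material
```

An evaluation like this cannot tell a good position from a bad one as long as the piece count is level, which is what the setting is meant to isolate.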
Then we ran engine-engine matches of 2000 to 4000 fast games, and more when the histogram called for it.
Part I - Starting with no evaluation at all, just counting material.
| Match | Base Engine | Setting | Win | Draw | Loss | Perc | Remarks |
| 1.1 | ProDeo 1.74 | Material only | 341 | 5 | 1 | 1.0% | It made no sense to continue, the picture is clear, a chess program can't do without an EVAL. |
Note that the percentages on this page are true for ProDeo only. It is assumed that other chess engines will produce similar percentages, but it would be interesting to know whether that assumption holds.
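To relate a match result to Elo, the usual logistic Elo model can be used. The sketch below is an illustration, not part of the original experiment. It assumes that the Win and Loss columns are counted from the side of the unmodified engine, which is the reading under which the Perc column can be reproduced. The function names are made up for the example.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Score of the stripped engine as a fraction between 0 and 1.

    Assumption: 'wins' and 'losses' are the table's Win and Loss columns,
    taken from the base engine's side, so the stripped engine's own wins
    are the 'losses' argument. Draws count as half a point.
    """
    games = wins + draws + losses
    return (losses + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo gap implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


s = score_fraction(wins=341, draws=5, losses=1)  # match 1.1
print(f"{100 * s:.1f}% -> {elo_difference(s):+.0f} Elo")
```

Scores as lopsided as the one in match 1.1 give only a very rough Elo figure, because the curve is steep near 0% and 100% and the match was stopped early.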
Part II - Stripping each major component from the evaluation, one at a time, to measure its impact.
| Match | Base Engine | Setting | Win | Draw | Loss | Perc | Remarks |
| 2.1 | ProDeo 1.74 | minus Piece Square | 888 | 641 | 488 | 40.1% | Apparently PST's are worth a lot. 2000 games. Histogram. |
| 2.2 | ProDeo 1.74 | minus King Safety | 1053 | 665 | 737 | 43.5% | After 2400 games a score drop of 6.5%. Histogram. |
| 2.3 | ProDeo 1.74 | minus Mobility | 1103 | 536 | 427 | 33.6% | After 2000 games an incredible score drop of 16.4%. Apparently Mobility is a very dominant EVAL ingredient. Histogram. |
| 2.4 | ProDeo 1.74 | minus Pawn Eval | 745 | 597 | 666 | 48.0% | After 2400 games only a score drop of 2%. Histogram. |
| 2.5 | ProDeo 1.74 | minus Passed Pawns | 1099 | 353 | 553 | 36.4% | After 2000 games an incredible score drop of 13.6%. Apparently Passed Pawns is a very dominant EVAL ingredient. Histogram. |
| 2.6 | ProDeo 1.74 | minus moderate stuff: Bishop Pair, Knight Outposts, Center Control, Double attack eval | 762 | 672 | 569 | 45.2% | After 2000 games still a score drop of 4.8% for the somewhat less dominant EVAL components. Histogram. |
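When judging a small drop such as the 2% of match 2.4, it helps to know the statistical margin of a match of this size. The following sketch estimates a 95% margin from the Win, Draw and Loss counts. It is an illustration under assumptions: games are treated as independent, a normal approximation is used, and the Win and Loss columns are read from the base engine's side. The function name is hypothetical.

```python
import math


def score_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, margin) of the stripped engine as fractions.

    Assumption: 'wins' and 'losses' are counted from the base engine's
    side, so the stripped engine scores 1 for each entry in 'losses'.
    """
    games = wins + draws + losses
    score = (losses + 0.5 * draws) / games
    # Per-game variance of the results 1, 0.5 and 0 around the mean score.
    variance = (
        losses * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + wins * (0.0 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)  # z = 1.96 for about 95%
    return score, margin


score, margin = score_with_margin(745, 597, 666)  # match 2.4, minus Pawn Eval
print(f"{100 * score:.1f}% +/- {100 * margin:.1f}%")
```

For matches of about 2000 games the margin comes out somewhat under two percentage points, so the larger drops in this table are well outside it while the smallest one is close to it.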
Part III - Stripping the evaluation step by step from its major components until only some minor stuff is left.
| Match | Base Engine | Setting | Win | Draw | Loss | Perc | Remarks |
| 3.1 | ProDeo 1.74 | minus King Safety, Mobility | 1389 | 491 | 522 | 31.9% | After 2400 games the score is on its way back to the base: Material evaluation only plus some minor stuff. Histogram. |
| 3.2 | ProDeo 1.74 | minus King Safety, Mobility, Pawn Eval, Passed Pawns | 1472 | 314 | 372 | 24.5% | Down to the bottom, we are already close. 2100 games. Histogram. |
| 3.3 | ProDeo 1.74 | minus King Safety, Mobility, Pawn Eval, Passed Pawns, Bishop Pair, Knight Outposts, Center Control, Double attack eval | 1475 | 255 | 298 | 20.9% | 2000 games. Histogram. |
| 3.4 | ProDeo 1.74 | minus King Safety, Mobility, Pawn Eval, Passed Pawns, Bishop Pair, Knight Outposts, Center Control, Double attack eval, Piece Square | 1780 | 153 | 126 | 9.8% | And finally we remove the Piece Square evaluation. Histogram. |
List of the minor stuff that has remained in EVAL and is responsible for the 9.8%:
- Tuned piece values for the middle game and end game.
- Pin evaluation.
- Subtraction threats evaluation.
- Xray evaluation.
- Blocking a weak pawn with a knight.
- Rook on 7th rank evaluation.
- Trapped Bishop on a7|h7 and a6|h6.
- Trapped Rook in case of lost castling rights.
- Quadrant rule in pawn endings.
- Unequal Bishop ending evaluation.
- All kinds of standard draw evaluation (such as KRPKR).
- Bad Bishop evaluation (bishop caught in its own pawns)
- Connecting rooks evaluation.
- Doubled rooks on (semi) open file evaluation.
- Queen mobility. Her majesty has a separate (very minor) mobility evaluation.
- Material Imbalance - encourage (avoid) piece exchange when ahead (down) in material.
- Pawn ending evaluation (when 1 pawn up usually wins by 90%).
- KPK win or draw evaluation.
- Avoid early queen development.
- Castling evaluation.
All these minor evaluations go into one variable. Measuring their individual impact on EVAL would require 20 new, separate variables, which would slow down the evaluation function and falls outside the scope of this experiment. The 20-point list is convincing enough to explain the remaining 9.8%.
Part IV - Remarks and analysis
- In general the results are as expected, with King Safety, Mobility and Pawn evaluation being the dominant components in EVAL, and yet there were a couple of surprises for me.
a) The unexpectedly high impact of PST, approx 10%, as demonstrated in matches 2.1 and 3.4.
b) The relatively low impact of King Safety (6.5%) in match 2.2; I expected a lot more.
c) The incredibly low impact of Pawn Eval (doubled pawns, isolated pawns, backward pawns, pawn pressure, pawn formation) of only 2% (see match 2.4), in contrast to passed pawn EVAL at 13.6% (see match 2.5).
- Perhaps it makes sense to have a good look at Pawn Eval again because of its low impact (2%), and perhaps to re-evaluate the PST values because of their unexpected impact (10%), also because these values haven't changed since the 80's.
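The percentages quoted in these remarks can be re-derived from the Perc columns of Part II and Part III. The short sketch below does that arithmetic. The 50% baseline is an assumption (two identical engines are expected to score 50%), and the variable names are made up for the example.

```python
# Perc values copied from the Part II and Part III tables.
PART_II = {
    "Piece Square": 40.1,
    "King Safety": 43.5,
    "Mobility": 33.6,
    "Pawn Eval": 48.0,
    "Passed Pawns": 36.4,
    "Moderate stuff": 45.2,
}
PART_III = [("3.1", 31.9), ("3.2", 24.5), ("3.3", 20.9), ("3.4", 9.8)]

BASELINE = 50.0  # assumption: identical engines score 50% against each other

# Part II: drop caused by removing a single component.
single_drops = {name: round(BASELINE - perc, 1) for name, perc in PART_II.items()}

# Part III: further drop added by each cumulative removal step.
step_drops = {}
previous = BASELINE
for match, perc in PART_III:
    step_drops[match] = round(previous - perc, 1)
    previous = perc

for name, drop in single_drops.items():
    print(f"minus {name}: {drop}% drop")
for match, drop in step_drops.items():
    print(f"match {match}: {drop}% further drop")
```

The last step of Part III (removing Piece Square in match 3.4) and the single removal in match 2.1 land close to each other, which is where the "approx 10%" for PST comes from.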
|