Wisdom is knowing how little we know -- Socrates

About Company

Started in 1980 and retired in 2004, REBEL was rebaptized ProDeo, Latin for gratis according to Dutch tradition.


Experiments in computer chess

 

The value of an evaluation function

 

 

What's the value of an evaluation function (EVAL) in a chess program? What's the value of each EVAL component, and how much Elo does each represent? This page might give some insight.

 

For this purpose we took ProDeo 1.74 and added a new parameter.

 

 [EVAL = MATERIAL ONLY]  EVAL returns only the material value following the 1|3|3|5|9 rule
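To make the material-only setting concrete, here is a minimal sketch of such an evaluation in Python. It is not ProDeo's code: the FEN input, the function name `material_only_eval` and the choice to score from White's point of view are assumptions made for illustration. Only the 1|3|3|5|9 values come from the text above.

```python
# Piece values from the 1|3|3|5|9 rule: pawn, knight, bishop, rook, queen.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_only_eval(fen: str) -> int:
    """Material balance in pawn units, positive when White is ahead.

    Only the piece-placement field of the FEN string is read; kings and
    every positional feature are ignored (assumption: White's viewpoint).
    """
    placement = fen.split()[0]
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits, '/' separators and kings carry no value
        score += value if ch.isupper() else -value
    return score


# Standard starting position: material is balanced.
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_only_eval(START))
```

An engine restricted to this function sees every position with equal material as equal, which is what match 1.1 puts to the test.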

 

Then we ran engine-engine matches of 2000 - 4000 fast games, and more when the histogram required it.
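How many games are enough can also be judged from the statistical error margin of a match score. The sketch below is one common way to estimate it, assuming independent games and a normal approximation. It does not reproduce the histogram check used on this page, and the name `score_with_margin` and the example figures are made up for illustration.

```python
import math


def score_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, margin) for one side of a match, both as fractions.

    z = 1.96 corresponds to a roughly 95% interval under the normal
    approximation (an assumption of this sketch).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)
    return score, margin


# Made-up example: 3000 games split evenly between wins, draws and losses.
score, margin = score_with_margin(1000, 1000, 1000)
print(f"{100 * score:.1f}% +/- {100 * margin:.1f}%")
```

With these made-up numbers the margin comes out at about 1.5 percentage points, so a difference of a few percent between two settings needs a match of this size before it can be trusted.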

 

Part I - Starting with no evaluation at all, just counting material.

 

 Match  Base Engine  Setting                Win  Draw  Loss  Perc
 1.1    ProDeo 1.74  Material only          341     5     1  1.0%
        Remarks: It made no sense to continue, the picture is clear, a chess program can't do without an EVAL.
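The Perc column appears to be the score of the stripped setting, with Win, Draw and Loss counted from the side of the full ProDeo 1.74: (Loss + Draw/2) / games reproduces the 1.0% of match 1.1. That reading is an inference from the numbers, not something the page states. The sketch below computes it and converts a score into an Elo difference with the standard logistic Elo formula; the function names are hypothetical.

```python
import math


def stripped_score(full_wins: int, draws: int, full_losses: int) -> float:
    """Score fraction of the stripped setting (assumed meaning of Perc)."""
    games = full_wins + draws + full_losses
    return (full_losses + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo gap implied by an expected score under the logistic Elo model.

    Negative values mean the scoring side is the weaker one. The formula
    is undefined for scores of exactly 0 or 1.
    """
    return 400.0 * math.log10(score / (1.0 - score))


# Match 1.1: Win 341, Draw 5, Loss 1.
s = stripped_score(341, 5, 1)
print(f"{100 * s:.1f}%  {elo_difference(s):+.0f} Elo")
```

For match 1.1 this gives a gap close to 800 Elo, although the conversion becomes unreliable at such lopsided scores.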

 

Note that the percentages on this page are true for ProDeo only. While it is assumed that other chess engines will produce similar percentages, it would be interesting to know whether that assumption holds.

 

 

Part II - Stripping the evaluation of each of its major components in turn to measure its impact.

 

 Match  Base Engine  Setting                Win  Draw  Loss  Perc
 2.1    ProDeo 1.74  minus Piece Square     888   641   488  40.1%
        Remarks: Apparently PSTs are worth a lot. 2000 games. Histogram.

 2.2    ProDeo 1.74  minus King Safety     1053   665   737  43.5%
        Remarks: After 2400 games a score drop of 6.5%. Histogram.

 2.3    ProDeo 1.74  minus Mobility        1103   536   427  33.6%
        Remarks: After 2000 games an incredible score drop of 16.4%. Apparently Mobility is a very dominant EVAL ingredient. Histogram.

 2.4    ProDeo 1.74  minus Pawn Eval        745   597   666  48.0%
        Remarks: After 2400 games only a score drop of 2%. Histogram.

 2.5    ProDeo 1.74  minus Passed Pawns    1099   353   553  36.4%
        Remarks: After 2000 games an incredible score drop of 13.6%. Apparently Passed Pawns is a very dominant EVAL ingredient. Histogram.

 2.6    ProDeo 1.74  Minus Moderate stuff   762   672   569  45.2%
                     - Bishop Pair
                     - Knight Outposts
                     - Center Control
                     - Double attack eval
        Remarks: After 2000 games still a score drop of 4.8% for the somewhat less dominant EVAL components. Histogram.
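The score drops quoted in the remarks are the distance of each Perc value from 50%, the score an engine would be expected to reach against an identical copy of itself. A small sketch that ranks the Part II components this way; the 50% baseline is the assumption behind the remarks, and the names `PART_II_PERC` and `score_drops` are made up for illustration.

```python
# Perc values copied from the Part II table (score of the stripped engine).
PART_II_PERC = {
    "Piece Square": 40.1,
    "King Safety": 43.5,
    "Mobility": 33.6,
    "Pawn Eval": 48.0,
    "Passed Pawns": 36.4,
    "Moderate stuff": 45.2,
}


def score_drops(perc_by_component, baseline=50.0):
    """Return (component, drop) pairs sorted from largest to smallest drop."""
    drops = {
        name: round(baseline - perc, 1)
        for name, perc in perc_by_component.items()
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


for name, drop in score_drops(PART_II_PERC):
    print(f"{name:<15} {drop:>5.1f}%")
```

Sorted this way, Mobility and Passed Pawns lead the list and Pawn Eval closes it.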

 

 

Part III - Stripping the evaluation step by step from its major components until only some minor stuff is left.

 

 Match  Base Engine  Setting                Win  Draw  Loss  Perc
 3.1    ProDeo 1.74  - King Safety         1389   491   522  31.9%
                     - Mobility
        Remarks: After 2400 games the score is on its way back to the base: Material evaluation only plus some minor stuff. Histogram.

 3.2    ProDeo 1.74  - King Safety         1472   314   372  24.5%
                     - Mobility
                     - Pawn Eval
                     - Passed Pawns
        Remarks: Down to the bottom, we are already close. 2100 games. Histogram.

 3.3    ProDeo 1.74  - King Safety         1475   255   298  20.9%
                     - Mobility
                     - Pawn Eval
                     - Passed Pawns
                     - Bishop Pair
                     - Knight Outposts
                     - Center Control
                     - Double attack eval
        Remarks: 2000 games. Histogram.

 3.4    ProDeo 1.74  - King Safety         1780   153   126  9.8%
                     - Mobility
                     - Pawn Eval
                     - Passed Pawns
                     - Bishop Pair
                     - Knight Outposts
                     - Center Control
                     - Double attack eval
                     - Piece Square
        Remarks: And finally we remove the Piece Square evaluation. Histogram.
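Because Part III removes components cumulatively, the difference between consecutive Perc values gives a rough idea of what each removal step costs. The sketch below lists those steps, assuming a 50% starting point for the full EVAL and using match 1.1 as the floor. Each match was played against the full engine rather than against the previous setting, so the steps are indirect comparisons and only indicative.

```python
# (label, Perc) pairs: Part III matches, framed by two reference points.
STEPS = [
    ("full EVAL (assumed 50% baseline)", 50.0),
    ("3.1 minus King Safety, Mobility", 31.9),
    ("3.2 minus Pawn Eval, Passed Pawns", 24.5),
    ("3.3 minus moderate stuff", 20.9),
    ("3.4 minus Piece Square", 9.8),
    ("1.1 material only", 1.0),
]


def marginal_drops(steps):
    """Score lost at each step relative to the step before it."""
    return [
        (label, round(prev - cur, 1))
        for (_, prev), (label, cur) in zip(steps, steps[1:])
    ]


for label, drop in marginal_drops(STEPS):
    print(f"{label:<36} {drop:>5.1f}%")
```

The first step (18.1%) is smaller than the sum of the separate King Safety and Mobility drops from Part II (6.5% + 16.4%), a hint that score percentages of individual components do not simply add up.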

   

 

 

List of the minor stuff that has remained in EVAL and is responsible for the 9.8%

 

  1. Tuned piece values for the middle game and end game.
  2. Pin evaluation.
  3. Subtraction threats evaluation.
  4. Xray evaluation.
  5. Blocking a weak pawn with a knight.
  6. Rook on 7th rank evaluation.
  7. Trapped Bishop on a7|h7 and a6|h6
  8. Trapped Rook in case of lost castling rights.
  9. Quadrant rule in pawn endings.
  10. Unequal Bishop ending evaluation.
  11. All kinds of standard draw evaluation (such as KRPKR)
  12. Bad Bishop evaluation (bishop caught in its own pawns)
  13. Connecting rooks evaluation.
  14. Doubled rooks on (semi) open file evaluation.
  15. Queen mobility. Her majesty has a separate (very minor) mobility evaluation.
  16. Material Imbalance - encourage (avoid) piece exchange when ahead (down) in material.
  17. Pawn ending evaluation - being 1 pawn up usually wins (90%).
  18. KPK win or draw evaluation.
  19. Avoid early queen development.
  20. Castling evaluation.

 

All these minor evaluations go into one variable. Measuring their individual impact on EVAL would require 20 new, separate variables, which would slow down the evaluation function and falls outside the scope of this experiment. The 20-point list is convincing enough to explain the remaining 9.8%.
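For illustration only, this is roughly what keeping every term in its own named slot looks like, which is the precondition for switching terms off one by one. It is a toy sketch, not ProDeo's EVAL: the dictionary-based position, the two toy terms and the `evaluate` signature are all assumptions.

```python
from typing import Callable, Dict, Iterable, Tuple

# A term maps a position to a score; positions are plain dicts in this toy.
Term = Callable[[dict], int]

# Toy terms for illustration only; real terms would inspect the board.
TERMS: Dict[str, Term] = {
    "material": lambda pos: pos["material"],
    "mobility": lambda pos: pos["mobility"],
}


def evaluate(
    position: dict,
    terms: Dict[str, Term],
    disabled: Iterable[str] = (),
) -> Tuple[int, Dict[str, int]]:
    """Sum all enabled terms and also return the per-term breakdown."""
    off = set(disabled)
    parts = {
        name: term(position)
        for name, term in terms.items()
        if name not in off
    }
    return sum(parts.values()), parts


position = {"material": 100, "mobility": 12}
print(evaluate(position, TERMS))
print(evaluate(position, TERMS, disabled=["mobility"]))
```

The per-term bookkeeping is exactly the overhead mentioned above: a single accumulator is cheaper, at the price of not being able to isolate the 20 minor terms.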

 


 

Part IV - Remarks and analysis

 

  1. In general the results are as expected, with King Safety, Mobility and Pawn evaluation being the dominant components in EVAL. Yet there were a couple of surprises for me.
      
    a)   The unexpectedly high impact of using PST, approx. 10%, as demonstrated in matches 2.1 and 3.4.
     
    b)   The relatively low impact of King Safety (6.5%) in match 2.2; I expected a lot more.

    c)   The incredibly low impact of Pawn Eval (doubled pawns, isolated pawns, backward pawns,
          pawn pressure, pawn formation) of only 2% (see match 2.4), in contrast to passed pawn EVAL
          (13.6%, see match 2.5).
        
  2. Perhaps it makes sense to have a good look at Pawn Eval again because of its low impact (2%), and perhaps to re-evaluate the PST values because of their unexpected impact (10%), especially since these values haven't changed since the 80's.
       


  

 

 

 

 

 

 

 

 

Copyright © 2012  Ed Schröder