(Project with INTEL research) Hash Functions in Networking – Performance Evaluation Kit

Contact: Jean-Yves Le Boudec

Hashing is used extensively in contemporary networks, to improve performance of various lookup and memory access schemes, to provide for load balancing among multiple targets, and for various other tasks. Although selected publications exist, when designing a particular application, it is often not easy to determine which hash function suits best the particular task and the particular input data distribution.

In fact, it is generally impossible to evaluate a hash function on a theoretical basis only, as for any deterministic hash function, such input data patterns can be constructed that would cause the hash function to perform poorly. Thus, a hash function can only be evaluated against data exhibiting similar properties to the data that the application is destined for.

The goal of this project is to develop a framework for quick testing and evaluating any hash function against a representative set of network traces, being rapidly able to compare the following criteria:

– computation overhead;
– number of collisions in various table sizes;
– entropy of input and output data;
– comparison to the best known hash functions.

References:

[1] D. E. Knuth, “The Art of Computer Programming – Vol. 3, Sorting and searching”, 2nd edition, Addison-Wesley, 1998, Chapter 6.4 “Hashing”, pp. 513-558.

[2] Z. Cao, Z. Wang and E. W. Zegura, “Performance of Hashing-Based Schemes for Internet Load Balancing”, Proceedings of the INFOCOM 2000 Conference, pp. 332-341.

[3] R. Russo, L. Kencl, B. Metzler, P. Droz, “Scalable and Adaptive Load Balancing on IBM PowerNP”, Technical report No. RZ-3431, IBM Research Report, July 2002, Chapter 3.2.5, “Pseudorandom functions with uniform distributions”, pp. 47-61.

Benefits: State of the art research

Domain: Network performance analysis

Industry: The project is proposed by the Intel research lab and would be supervised remotely and locally at EPFL. 1-2 trips to Intel Research Cambridge would be required (costs covered by Intel).

Co-supervisors:
L. Kencl (Intel Research Cambridge)
A. Moore (Cambridge University)