Universal prediction and alpha-mutual information

Universal prediction is a problem at the intersection of information theory and learning. It concerns the estimation of a data sequence when only limited information about the distribution generating the data is available. More precisely, the only available information is that the true distribution belongs to a given class, and the goal is to estimate the probability of any data sequence generated by the source with the smallest possible error, measured in terms of log loss.
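As a rough sketch of the standard formulation (the notation here is generic and not taken from our paper): given a class of sources {p_theta}, the log loss of an estimator q on a sequence x^n and the worst-case regret that the estimator tries to minimize can be written as

    \ell(q, x^n) = -\log q(x^n), \qquad
    R_n(q) = \max_{x^n} \left[ \log \max_{\theta} p_\theta(x^n) - \log q(x^n) \right],

that is, the excess log loss compared to the best distribution in the class chosen with hindsight, evaluated at the worst possible sequence.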

In the worst-case (minimax) sense, the optimal estimator is the Normalized Maximum Likelihood (NML) estimator, but it has some practical drawbacks. A good alternative for i.i.d. data sequences is the Krichevsky-Trofimov (KT) estimator. In our conference paper (available at https://arxiv.org/pdf/2202.12737.pdf) we studied a class of alternative estimators that interpolate between KT and NML. We called them Alpha-NML predictors.
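For concreteness, in the binary i.i.d. (Bernoulli) case both estimators have well-known explicit forms (these are standard formulas, not specific to our paper): writing n_1 for the number of ones and n_0 = n - n_1 for the number of zeros in x^n,

    P_{\mathrm{KT}}(x^n) = \frac{\Gamma(n_1 + 1/2)\,\Gamma(n_0 + 1/2)}{\pi\,\Gamma(n + 1)}, \qquad
    P_{\mathrm{NML}}(x^n) = \frac{\max_\theta \theta^{n_1} (1 - \theta)^{n_0}}{\sum_{y^n} \max_\theta p_\theta(y^n)},

where the sum in the NML normalizer runs over all binary sequences of length n. Equivalently, KT predicts the next symbol sequentially, assigning probability (n_1 + 1/2)/(n + 1) to seeing a one.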

A project on this topic would revolve around some experimental work (with MATLAB or Python) to study the performance of our class of predictors. In particular, the following topics are of interest:

1. In the paper linked above we derived closed-form formulas for our predictor in the i.i.d. case, expressed in terms of gamma functions. Using these formulas, we were able to compute the probabilities of data sequences of length up to 100. We believe that by manipulating the gamma terms it is possible to push this length further (see the first sketch after this list).

2. It would be interesting to investigate whether similar closed-form formulas can be derived for classes of sources beyond the i.i.d. case (e.g., Markov sources).

3. In the paper, we studied the performance of our predictor in the worst case (that is, for the worst possible data sequence). It would be interesting to also assess its average-case performance and compare it to KT and NML (see the second sketch after this list).
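Regarding point 1, the sketch below illustrates, under simplified assumptions, the kind of numerical trick involved: work with log-gamma values (scipy.special.gammaln) instead of the gamma functions themselves, so that nothing overflows or underflows for long sequences. For illustration it uses the standard KT probability of a binary i.i.d. sequence; it does not reproduce the Alpha-NML formulas from the paper.

    import numpy as np
    from scipy.special import gammaln

    def log_kt_probability(n1, n0):
        """Log of the Krichevsky-Trofimov probability of a binary sequence with
        n1 ones and n0 zeros:
        log P_KT = log Gamma(n1 + 1/2) + log Gamma(n0 + 1/2) - log Gamma(n + 1) - log(pi).
        Everything stays in the log domain, so nothing overflows."""
        n = n1 + n0
        return gammaln(n1 + 0.5) + gammaln(n0 + 0.5) - gammaln(n + 1) - np.log(np.pi)

    # Evaluating the gamma functions directly overflows well before n = 1000,
    # while the log-domain version remains finite for very long sequences.
    for n in (100, 1000, 10**6):
        print(n, log_kt_probability(n // 2, n - n // 2))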
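Regarding point 3, one possible starting point for the average-case comparison is a small Monte Carlo experiment: fix a source, draw many sequences from it, and average the regret of each predictor over the draws. The sketch below does this for KT versus NML on Bernoulli sequences; this is a toy setup chosen here for illustration, and the Alpha-NML predictor would be plugged in in the same way.

    import numpy as np
    from scipy.special import gammaln

    rng = np.random.default_rng(0)

    def log_ml(n1, n0):
        """Log of the maximized Bernoulli likelihood (n1/n)^n1 * (n0/n)^n0, with 0*log(0) = 0."""
        n = n1 + n0
        out = 0.0
        if n1 > 0:
            out += n1 * np.log(n1 / n)
        if n0 > 0:
            out += n0 * np.log(n0 / n)
        return out

    def log_kt(n1, n0):
        """Log of the KT (Jeffreys mixture) probability of a sequence with n1 ones and n0 zeros."""
        return gammaln(n1 + 0.5) + gammaln(n0 + 0.5) - gammaln(n1 + n0 + 1) - np.log(np.pi)

    def log_shtarkov(n):
        """Log of the NML normalizer: sum over k of C(n, k) * (k/n)^k * ((n-k)/n)^(n-k)."""
        terms = np.array([gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1) + log_ml(k, n - k)
                          for k in range(n + 1)])
        m = terms.max()
        return m + np.log(np.exp(terms - m).sum())

    def average_regret(theta0, n, num_samples, log_q):
        """Monte Carlo estimate of the expected regret of predictor log_q under Bernoulli(theta0)."""
        regrets = []
        for _ in range(num_samples):
            n1 = rng.binomial(n, theta0)  # KT and NML depend on the sequence only through n1
            regrets.append(log_ml(n1, n - n1) - log_q(n1, n - n1))
        return float(np.mean(regrets))

    n, num_samples = 100, 2000
    log_c = log_shtarkov(n)
    log_nml = lambda n1, n0: log_ml(n1, n0) - log_c
    for theta0 in (0.1, 0.5, 0.9):
        print(theta0, average_regret(theta0, n, num_samples, log_kt),
              average_regret(theta0, n, num_samples, log_nml))

Note that the regret of NML is the same for every sequence (the log of the normalizer), so its average-case and worst-case regret coincide, which makes it a natural baseline for this comparison.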


Contact: Marco Bondaschi – [email protected]