Modern CNN-based human pose estimators are built as regression architectures that output a pose prediction. This formulation does not naturally provide a confidence for the estimate. One possible solution is to regress the pose refinement directly with the CNN [2], which gives a slight improvement over ordinary regression-based techniques. However, it still suffers from a large number of hyperparameters and a very complex training procedure.
Recent advances in energy-based learning propose confidence-based regression, which predicts a confidence value for each input-target pair (x, y) [3, 4, 5]. One approach [1] iteratively optimizes the candidate target y, which is part of the network input, by gradient ascent on the network output, with the aim of increasing the confidence value. This approach has demonstrated impressive results in object detection.
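The gradient-ascent refinement described above can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: `confidence` is a hand-written Gaussian bump standing in for a trained network f(x, y), the "image" x is reduced to the 2D keypoint it encodes, and the gradient is taken by finite differences rather than by backpropagation. The names `SIGMA`, `grad_wrt_y` and `refine` are hypothetical.

```python
import math

SIGMA = 10.0  # assumed width (in pixels) of the toy confidence peak


def confidence(x, y):
    """Toy stand-in for a learned confidence network f(x, y).

    x: the "image", reduced here to the true 2D keypoint it encodes.
    y: a candidate 2D keypoint. The value peaks at 1.0 when y == x.
    """
    d2 = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-d2 / (2.0 * SIGMA ** 2))


def grad_wrt_y(f, x, y, eps=1e-5):
    """Central finite-difference gradient of f(x, y) with respect to y."""
    grad = []
    for i in range(len(y)):
        hi = list(y)
        lo = list(y)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(x, hi) - f(x, lo)) / (2.0 * eps))
    return grad


def refine(f, x, y_init, step=50.0, n_steps=50):
    """Gradient ascent on the candidate y to increase the confidence f(x, y)."""
    y = list(y_init)
    for _ in range(n_steps):
        g = grad_wrt_y(f, x, y)
        y = [yi + step * gi for yi, gi in zip(y, g)]
    return y


x_true = [100.0, 60.0]   # keypoint "seen" in the image
y_init = [103.0, 64.0]   # imperfect initial prediction
y_refined = refine(confidence, x_true, y_init)
print(confidence(x_true, y_init), confidence(x_true, y_refined))
```

In a real pipeline the confidence function would be a trained network and the gradient would come from automatic differentiation (for example in PyTorch); the step size and the number of steps used here are arbitrary and would be among the hyperparameters to tune.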
Our goal is to obtain a proof of concept for this idea [1] in the pose estimation setting. The main steps are replicating the main result of the target paper [1] (the replication is meaningful only if the HPE pipeline stays very close to it) and extending it to the pose estimation task, probably for 2D poses only. At present, the least clear point is dataset preparation for the pose estimation task, because it is not obvious what a “bad” but feasible pose is ([6] may provide a possible solution to this issue).
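To make the dataset question concrete, here is one naive way to build (candidate pose, confidence label) pairs. It is purely an assumption for discussion and is not taken from any of the cited works: ground-truth 2D keypoints are jittered with Gaussian noise, and each candidate is labelled with a simple similarity score in (0, 1]. The helper names, the noise levels and the `scale` parameter are all hypothetical, and plain jitter does not guarantee that the perturbed pose is anatomically feasible, which is exactly the open issue noted above.

```python
import math
import random


def perturb_pose(pose, noise_std, rng):
    """Return a copy of a 2D pose with Gaussian jitter added to every joint."""
    return [(x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
            for x, y in pose]


def pose_similarity(pose, gt_pose, scale):
    """Mean over joints of exp(-d^2 / (2 * scale^2)); equals 1.0 for identical poses."""
    total = 0.0
    for (x, y), (gx, gy) in zip(pose, gt_pose):
        d2 = (x - gx) ** 2 + (y - gy) ** 2
        total += math.exp(-d2 / (2.0 * scale ** 2))
    return total / len(gt_pose)


def make_training_pairs(gt_pose, n_samples, noise_stds, scale, seed=0):
    """Sample perturbed poses and label each with its similarity to the ground truth."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = []
    for _ in range(n_samples):
        std = rng.choice(noise_stds)  # mix mild and severe corruptions
        candidate = perturb_pose(gt_pose, std, rng)
        pairs.append((candidate, pose_similarity(candidate, gt_pose, scale)))
    return pairs


# Hypothetical three-joint pose in pixel coordinates.
gt_pose = [(100.0, 60.0), (110.0, 90.0), (95.0, 120.0)]
pairs = make_training_pairs(gt_pose, n_samples=5,
                            noise_stds=(1.0, 5.0, 15.0), scale=10.0)
for candidate, label in pairs:
    print(round(label, 3))
```

Whether such labels are good training targets for a confidence network is itself part of what the project would need to investigate.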
The candidate should have programming experience, ideally in Python. Previous experience with machine learning and computer vision is a plus, and experience with PyTorch is a strong plus. The main requirements are curiosity to learn new things and willingness to overcome difficulties.
[1] “Acquisition of Localization Confidence for Accurate Object Detection”, Jiang et al., 2018
[2] “Human Pose Estimation with Iterative Error Feedback”, Carreira et al., 2016
[3] “How to Train Your Energy-Based Model for Regression”, Gustafsson et al., 2020
[4] “Energy-Based Models for Deep Probabilistic Regression”, Gustafsson et al., 2020
[5] “A Tutorial on Energy-Based Learning”, LeCun et al., 2006
[6] “PoseFix: Model-agnostic General Human Pose Refinement Network”, Moon et al., 2019