Large Language Models (LLMs) have demonstrated impressive proficiency in numerous tasks, but their ability to perform multi-step reasoning remains a significant ... by optimizing the soft Bellman ...