Prompt optimization representation/optimizer open questions research batch

Date: 2026-04-11
Collector: Hermes Agent
Method: arXiv API (export.arxiv.org) and OpenAlex API lookups via python3 in the local workspace.
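
The lookups named in the method line can be sketched roughly as follows. This is a minimal sketch, not the collector's actual script: all function names are hypothetical. It assumes only the arXiv `id_list` query parameter, the OpenAlex `/works/doi:` path form, and the `cited_by_count` field of an OpenAlex work record. Parsing is kept separate from fetching so it can be checked without network access.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(arxiv_id: str) -> str:
    """Build an arXiv API query URL for a single paper ID."""
    query = urllib.parse.urlencode({"id_list": arxiv_id})
    return f"http://export.arxiv.org/api/query?{query}"


def openalex_work_url(doi: str) -> str:
    """Build an OpenAlex works URL for a bare DOI such as 10.18653/v1/..."""
    return f"https://api.openalex.org/works/doi:{doi}"


def parse_arxiv_title(atom_xml: str) -> str:
    """Extract the first entry title from an arXiv Atom response."""
    root = ET.fromstring(atom_xml)
    title = root.find("atom:entry/atom:title", ATOM_NS)
    if title is None or title.text is None:
        raise ValueError("no entry title in response")
    # arXiv wraps long titles across lines; collapse the whitespace.
    return " ".join(title.text.split())


def parse_cited_by_count(work_json: str) -> int:
    """Read cited_by_count from an OpenAlex work record."""
    return int(json.loads(work_json)["cited_by_count"])


def fetch(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL as text (network call; not exercised offline)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Citation counts drift over time, so the counts recorded in this batch are a snapshot as of the collection date.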

Scope

Prompt optimization and prompting-system literature centered on representation, abstraction, and optimizer design.
Focus on prompt programs, DSPy, symbolic prompt-program search, textual gradients, RL prompt optimization, assertions, and module decomposition.
Explicitly excludes papers whose main contribution is a new base model architecture.

Core representation / abstraction papers

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (2023)
- URL: https://arxiv.org/abs/2310.03714
- OpenAlex cited_by_count: 50
- Abstract note: abstracts LM pipelines as text-transformation graphs with declarative modules and a compiler that optimizes the pipeline against a metric.
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines (2023)
- URL: https://arxiv.org/abs/2312.13382
- OpenAlex cited_by_count: 2
- Abstract note: adds assertion-like computational constraints plus compilation and inference-time self-refinement strategies for more reliable LM pipelines.
Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization / SAMMO (2024)
- URL: https://arxiv.org/abs/2404.02319
- DOI: https://doi.org/10.18653/v1/2024.findings-emnlp.37
- OpenAlex cited_by_count: 2
- Abstract note: treats prompts as symbolic prompt programs and searches over compile-time transformations rather than assuming fixed prompt structure.
AutoDSPy: Automating Modular Prompt Design with Reinforcement Learning for Small and Large Language Models (2025)
- DOI: https://doi.org/10.18653/v1/2025.emnlp-industry.192
- OpenAlex cited_by_count: 0
- Abstract note: uses RL to automate DSPy pipeline construction by selecting reasoning modules, signatures, and execution strategies.
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (2024)
- URL: https://arxiv.org/abs/2407.10930
- DOI: https://doi.org/10.18653/v1/2024.emnlp-main.597
- OpenAlex cited_by_count: 7
- Abstract note: argues modular LM pipelines should sometimes alternate prompt optimization and weight adaptation rather than choosing only one.
Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs (2024/2025)
- URL: https://arxiv.org/abs/2411.18564
- DOI: https://doi.org/10.1016/j.neunet.2025.108022
- OpenAlex cited_by_count: 4
- Abstract note: uses DSPy as a modular substrate in a neural-symbolic pipeline, showing that decomposition itself can be performance-critical.
Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy (2025)
- URL: https://arxiv.org/abs/2507.03620
- OpenAlex cited_by_count: 0
- Abstract note: reports mixed but real gains from DSPy-style optimization across guardrails, routing, hallucination detection, and prompt-evaluation tasks.
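
The declarative-module and assertion ideas in the entries above can be illustrated with a toy in plain Python. It does not use the DSPy library or its API: `Module`, `run_with_assertion`, and the stub `fake_lm` are hypothetical names, and the retry-with-feedback loop is only one simple reading of "inference-time self-refinement".

```python
from dataclasses import dataclass
from typing import Callable, Optional

LM = Callable[[str], str]  # stand-in for a language model call


@dataclass
class Module:
    """A declarative step: named input, named output, and an instruction."""
    input_name: str
    output_name: str
    instruction: str

    def render(self, value: str, feedback: Optional[str] = None) -> str:
        prompt = f"{self.instruction}\n{self.input_name}: {value}\n"
        if feedback:
            prompt += f"Previous attempt failed: {feedback}\n"
        return prompt + f"{self.output_name}:"


def run_with_assertion(
    module: Module,
    lm: LM,
    value: str,
    check: Callable[[str], bool],
    message: str,
    max_retries: int = 2,
) -> str:
    """Call the module; if check fails, retry with the failure message added."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        output = lm(module.render(value, feedback))
        if check(output):
            return output
        feedback = message
    raise ValueError(f"assertion still failing after retries: {message}")


def fake_lm(prompt: str) -> str:
    # Stub model: gives a short answer only once it sees failure feedback.
    if "Previous attempt failed" in prompt:
        return "Paris"
    return "The capital of France is Paris."


qa = Module("question", "answer", "Answer the question.")
answer = run_with_assertion(
    qa,
    fake_lm,
    "What is the capital of France?",
    check=lambda out: len(out.split()) <= 3,
    message="answer must be at most three words",
)
print(answer)
```

The point of the sketch is that the constraint lives outside the prompt string, so an optimizer or runtime can act on it separately from the instruction text.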

Core optimizer-design papers

RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning (2022)
- URL: https://arxiv.org/abs/2205.12548
- DOI: https://doi.org/10.18653/v1/2022.emnlp-main.222
- OpenAlex cited_by_count: 135
- Abstract note: optimizes discrete prompts with RL, but the resulting prompts are often ungrammatical gibberish, raising representation questions.
TEMPERA: Test-Time Prompting via Reinforcement Learning (2022)
- URL: https://arxiv.org/abs/2211.11890
- OpenAlex cited_by_count: 8
- Abstract note: edits prompts at test time using an action space over instructions, exemplars, and verbalizers.
Large Language Models Are Human-Level Prompt Engineers / APE (2022)
- URL: https://arxiv.org/abs/2211.01910
- OpenAlex cited_by_count: 297
- Abstract note: treats instructions as programs and uses proposal-and-selection search over natural-language candidates.
Reflexion: Language Agents with Verbal Reinforcement Learning (2023)
- URL: https://arxiv.org/abs/2303.11366
- OpenAlex cited_by_count: 259
- Abstract note: replaces weight updates with linguistic feedback plus episodic memory, showing that optimizer state can itself be textual.
Automatic Prompt Optimization with “Gradient Descent” and Beam Search / APO or ProTeGi (2023)
- URL: https://arxiv.org/abs/2305.03495
- DOI: https://doi.org/10.18653/v1/2023.emnlp-main.494
- OpenAlex cited_by_count: 131
- Abstract note: turns critique into textual gradients and prompt edits guided by beam search and bandit selection.
Large Language Models as Optimizers / OPRO (2023)
- URL: https://arxiv.org/abs/2309.03409
- OpenAlex cited_by_count: 91
- Abstract note: uses the LM itself as a black-box optimizer that proposes candidate instructions from prior solution-value histories.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (2023)
- URL: https://arxiv.org/abs/2309.16797
- OpenAlex cited_by_count: 27
- Abstract note: evolves both task prompts and mutation prompts, making the optimizer prompt itself part of the search object.
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization (2023)
- URL: https://arxiv.org/abs/2310.16427
- OpenAlex cited_by_count: 5
- Abstract note: casts prompt optimization as Monte-Carlo-tree-search-style strategic planning over prompt states.
TextGrad: Automatic “Differentiation” via Text (2024)
- URL: https://arxiv.org/abs/2406.07496
- OpenAlex cited_by_count: 12
- Abstract note: extends textual-gradient ideas to computation graphs and compound AI systems, not just single prompts.
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025)
- URL: https://arxiv.org/abs/2507.19457
- OpenAlex cited_by_count: 0
- Abstract note: argues natural-language reflection and Pareto-style evolution can outperform GRPO and MIPROv2 with far fewer rollouts.
Scaling Textual Gradients via Sampling-Based Momentum (2025)
- URL: https://arxiv.org/abs/2506.00400
- OpenAlex cited_by_count: 0
- Abstract note: explicitly studies instability and context-wall effects in textual-gradient optimization, proposing momentum-style sampling over prompt histories.
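
Several of the entries above share a propose, score, and select skeleton. The sketch below follows the OPRO-style description (proposing candidates from a history of scored solutions), with a deterministic stand-in where an LM proposer would go. `optimize`, `toy_propose`, `toy_score`, and the keyword task are hypothetical; no real LM or benchmark is involved.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (candidate instruction, score)


def optimize(
    seed: str,
    propose: Callable[[History], List[str]],
    score: Callable[[str], float],
    steps: int = 5,
    top_k: int = 3,
) -> Tuple[str, float]:
    """Propose-and-score loop over a history of (solution, value) pairs."""
    history: History = [(seed, score(seed))]
    for _ in range(steps):
        # Show the proposer only the best-scoring entries so far.
        best = sorted(history, key=lambda p: p[1], reverse=True)[:top_k]
        seen = {text for text, _ in history}
        for candidate in propose(best):
            if candidate not in seen:
                seen.add(candidate)
                history.append((candidate, score(candidate)))
    return max(history, key=lambda p: p[1])


# --- stand-ins for illustration only ---
KEYWORDS = ["step by step", "check your work", "cite the table"]


def toy_score(instruction: str) -> float:
    """Pretend metric: fraction of the target phrases present."""
    return sum(k in instruction for k in KEYWORDS) / len(KEYWORDS)


def toy_propose(best: History) -> List[str]:
    """Pretend LM: extend each top instruction with one missing phrase."""
    out = []
    for text, _ in best:
        for k in KEYWORDS:
            if k not in text:
                out.append(f"{text} Then {k}.")
    return out


best_text, best_score = optimize("Answer the question.", toy_propose, toy_score)
print(best_score, best_text)
```

Swapping `toy_propose` for critique-driven edits, evolutionary mutation, or tree search changes the optimizer family while leaving the scored-history loop intact, which is one way to read the comparison across these papers.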

Empirical comparison / evaluation papers

A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation (2024)
- URL: https://arxiv.org/abs/2412.15298
- OpenAlex cited_by_count: 0
- Note: compares COPRO, MIPRO, BootstrapFewShot, BootstrapFewShot with Optuna, and KNN few-shot on hallucination detection judged against human labels.
Analyzing LLM Instruction Optimization for Tabular Fact Verification (2026)
- URL: https://arxiv.org/abs/2602.17937
- DOI: https://doi.org/10.18653/v1/2026.findings-eacl.161
- OpenAlex cited_by_count: 0
- Note: compares COPRO, MIPROv2, and SIMBA across direct prediction, CoT, ReAct+SQL, and CodeAct+Python, finding optimizer performance depends on the prompting regime.

Working synthesis

The open technical question is no longer just “how do we find a better prompt string?” but “what is the right representation of an LM program, and which optimizer should operate over which parts of that representation?”
DSPy and SAMMO push the representation side: typed/module-level prompt programs, symbolic transformations, and explicit constraints.
RLPrompt, TEMPERA, OPRO, Promptbreeder, PromptAgent, TextGrad, and GEPA push different optimizer families over roughly the same object class.
The evaluation papers suggest optimizer ranking is highly regime-dependent, which in turn suggests the field still lacks a good abstraction for optimizer selection itself.
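
What "regime-dependent ranking" means operationally can be shown with a small sketch. The scores below are placeholders invented for illustration, with deliberately generic optimizer and regime names; they are not results from any paper listed here.

```python
from typing import Dict, List

# Placeholder scores, invented for illustration; not results from any paper.
scores: Dict[str, Dict[str, float]] = {
    "direct": {"opt_a": 0.62, "opt_b": 0.58, "opt_c": 0.60},
    "cot": {"opt_a": 0.64, "opt_b": 0.71, "opt_c": 0.66},
    "tool_use": {"opt_a": 0.55, "opt_b": 0.57, "opt_c": 0.69},
}


def rank(regime_scores: Dict[str, float]) -> List[str]:
    """Optimizers ordered from best to worst within one regime."""
    return sorted(regime_scores, key=lambda name: regime_scores[name], reverse=True)


def rankings_agree(table: Dict[str, Dict[str, float]]) -> bool:
    """True only if every regime yields the same ordering."""
    orders = [tuple(rank(s)) for s in table.values()]
    return len(set(orders)) == 1


per_regime_best = {regime: rank(s)[0] for regime, s in scores.items()}
print(per_regime_best)
print(rankings_agree(scores))
```

When `rankings_agree` is false, a single global leaderboard hides the information needed to choose an optimizer, so selection has to be conditioned on the regime.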
