Prompt optimization and DSPy follow-ups research batch

Date: 2026-04-11
Collector: Meridian (Hermes)
Method: direct arXiv API (export.arxiv.org) plus OpenAlex API lookups for title, year, citation count, landing URL, and abstract snippets. This batch was assembled because web_search was credit-limited in-session.

Scope

  • Research on RL-based training of prompts and prompting systems, focusing on the prompt/program/workflow surface rather than the underlying model architecture.
  • Research that directly extends, evaluates, or meaningfully follows up on DSPy.

RL training prompts and prompting systems

  1. Prefix-Tuning: Optimizing Continuous Prompts for Generation (2021)

  2. Black-Box Tuning for Language-Model-as-a-Service (2022)

    • URL: https://arxiv.org/abs/2201.03514
    • OpenAlex cited_by_count: 56
    • Notes: prompt optimization under API-only access; useful for black-box prompt search settings.
  3. RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning (2022)

  4. TEMPERA: Test-Time Prompting via Reinforcement Learning (2022)

  5. Large Language Models Are Human-Level Prompt Engineers (2022)

    • URL: https://arxiv.org/abs/2211.01910
    • OpenAlex cited_by_count: 297
    • Notes: Automatic Prompt Engineer (APE); not RL, but a major step toward automated prompt search.
  6. Active Prompting with Chain-of-Thought for Large Language Models (2023)

    • URL: https://arxiv.org/abs/2302.12246
    • OpenAlex cited_by_count: 44
    • Notes: prompt selection policy over which examples receive CoT annotation and enter the prompt.
  7. Reflexion: Language Agents with Verbal Reinforcement Learning (2023)

    • URL: https://arxiv.org/abs/2303.11366
    • OpenAlex cited_by_count: 259
    • Notes: replaces weight updates with textual self-critique and memory; operationally a prompting-system paper.
  8. Automatic Prompt Optimization with “Gradient Descent” and Beam Search (2023)

  9. Large Language Models as Optimizers (2023)

    • URL: https://arxiv.org/abs/2309.03409
    • OpenAlex cited_by_count: 91
    • Notes: OPRO frames prompt optimization as iterative black-box optimization with the LM itself as the optimizer.
  10. Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (2023)

  11. PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization (2023)

  12. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (2023)

    • URL: https://arxiv.org/abs/2310.03714
    • OpenAlex cited_by_count: 50
    • Notes: prompt-program framework; moves optimization target from a single prompt to an LM program/pipeline.
  13. TextGrad: Automatic “Differentiation” via Text (2024)

    • URL: https://arxiv.org/abs/2406.07496
    • OpenAlex cited_by_count: 12
    • Notes: optimization framework for compound AI systems via textual feedback; adjacent to DSPy.
  14. Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (2024)

  15. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025)

    • URL: https://arxiv.org/abs/2507.19457
    • OpenAlex cited_by_count: 0
    • Notes: explicit recent claim that reflective/evolutionary prompt optimization can beat RL in some settings.
  16. Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization (2024)
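
Several entries above (APE, OPRO, Promptbreeder) share one loop: propose candidate prompts, score them against a small eval set, keep the winner, repeat. A minimal, library-free sketch of that loop; the scoring and mutation functions are toy stand-ins for LM calls, and every name here is illustrative rather than taken from any of the papers:

```python
import random

def score(prompt: str, evalset: list[tuple[str, str]]) -> float:
    """Toy stand-in for 'run the LM with this prompt and grade the outputs'.
    Here: reward prompts that mention a few task keywords."""
    keywords = {"step", "reason", "answer"}
    hits = sum(1 for k in keywords if k in prompt.lower())
    return hits / len(keywords)

def mutate(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for an LM-proposed rewrite (OPRO/Promptbreeder style)."""
    edits = [" Think step by step.", " Reason carefully.", " Give the final answer."]
    return prompt + rng.choice(edits)

def optimize(seed_prompt: str, evalset, rounds: int = 8, width: int = 4, seed: int = 0) -> str:
    """Greedy black-box prompt search: keep the best-scoring candidate each round.
    Including the incumbent among the candidates makes the score monotone."""
    rng = random.Random(seed)
    best = seed_prompt
    for _ in range(rounds):
        candidates = [mutate(best, rng) for _ in range(width)] + [best]
        best = max(candidates, key=lambda p: score(p, evalset))
    return best
```

Swapping `mutate` for an LM call that is shown the score history is essentially the OPRO framing; swapping it for crossover/mutation over a population is the Promptbreeder framing.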

Direct extensions and evaluations

  1. DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines (2023)

    • URL: https://arxiv.org/abs/2312.13382
    • OpenAlex cited_by_count: 2
    • Notes: extends DSPy with assertions / computational constraints and compilation strategies for more reliable pipelines.
  2. A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation (2024)

    • URL: https://arxiv.org/abs/2412.15298
    • OpenAlex cited_by_count: 0
    • Notes: empirical comparison of the COPRO, MIPRO, BootstrapFewShot, BootstrapFewShotWithOptuna, and KNN few-shot teleprompters within the DSPy framework.
  3. DSPy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs (2024)

  4. Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy (2025)

    • URL: https://arxiv.org/abs/2507.03620
    • OpenAlex cited_by_count: 0
    • Notes: multi-use-case study applying DSPy to guardrails, hallucination detection, code generation, routing, and prompt evaluation.
  5. Analyzing LLM Instruction Optimization for Tabular Fact Verification (2026)

  6. Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning (2026)

    • URL: https://arxiv.org/abs/2604.04869
    • OpenAlex cited_by_count: 0
    • Notes: recent DSPy-centered prompt engineering paper; appears more application-facing than foundational.
  7. AutoDSPy: Automating Modular Prompt Design with Reinforcement Learning for Small and Large Language Models (2025)

    • URL: not resolved by OpenAlex landing-page field in-session
    • OpenAlex cited_by_count: 0
    • Notes: explicitly positions itself as the first framework to automate DSPy pipeline construction with reinforcement learning.
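
Across these follow-ups, the DSPy idea the notes keep returning to (declare the program, compile the prompts) can be sketched without the library. The sketch below mimics the shape of a BootstrapFewShot-style teleprompter, but none of the names are DSPy's actual API, and the LM and metric are mocked:

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """One LM call: a fixed instruction plus compiled few-shot demos."""
    instruction: str
    demos: list[tuple[str, str]] = field(default_factory=list)

    def prompt(self, question: str) -> str:
        lines = [self.instruction]
        for q, a in self.demos:
            lines.append(f"Q: {q}\nA: {a}")
        lines.append(f"Q: {question}\nA:")
        return "\n\n".join(lines)

def bootstrap_compile(module, lm, metric, trainset, max_demos=2):
    """Bootstrapped few-shot compilation: run the uncompiled module on
    training inputs and keep only the traces the metric accepts as demos."""
    compiled = Module(module.instruction, [])
    for question, gold in trainset:
        answer = lm(module.prompt(question))
        if metric(answer, gold):
            compiled.demos.append((question, answer))
            if len(compiled.demos) >= max_demos:
                break
    return compiled
```

The point of the sketch is the reframed optimization object: the artifact being tuned is the module's compiled demo set (and, in richer teleprompters, its instruction), not a hand-written prompt string.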

Adjacent systems reacting to the same problem

  1. TextGrad: Automatic “Differentiation” via Text (2024)

  2. Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization (2024)

  3. Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (2024)

  4. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025)

Provisional synthesis

  • The field has moved from static prompt templates to optimizing prompts, then to optimizing prompt programs and multi-stage LM pipelines.
  • Strictly RL-based prompt optimization exists, but a large share of the strongest recent work uses textual gradients, evolutionary search, strategic planning, or compiler-like rewrites instead.
  • DSPy matters because it reframes the optimization object: not “find one good string”, but “compile and tune an LM program”.
  • The research literature directly following DSPy is still thinner than the library-and-tooling ecosystem around DSPy itself. The most visible paper trail so far covers assertions, teleprompter/optimizer evaluations, application studies, and RL-driven automation attempts such as AutoDSPy.
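
The reflective/textual-feedback family named above (Reflexion, TextGrad, GEPA) swaps a numeric gradient for a critique string that an editor applies to the prompt. A toy sketch with both roles mocked; the critique strings and function names are invented for illustration:

```python
def critic(prompt: str, failure: str) -> str:
    """Toy stand-in for an LM critiquing a failed attempt in natural language."""
    if "units" in failure:
        return "State the expected units explicitly."
    return "Ask for a short, direct answer."

def editor(prompt: str, critique: str) -> str:
    """Toy stand-in for an LM applying the critique to the prompt."""
    return f"{prompt} ({critique})"

def reflective_refine(prompt: str, failures: list[str]) -> str:
    """One pass of critique-then-edit per observed failure: the 'textual
    gradient' is the critique string, and the editor is the update rule."""
    for failure in failures:
        prompt = editor(prompt, critic(prompt, failure))
    return prompt
```

No weights move anywhere in this loop, which is why these systems read, operationally, as prompting-system papers rather than RL papers.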