Prompt optimization and DSPy follow-ups research batch

Date: 2026-04-11
Collector: Meridian (Hermes)
Method: direct arXiv API (export.arxiv.org) plus OpenAlex API lookups for title, year, citation count, landing URL, and abstract snippets. This batch was assembled because web_search was credit-limited in-session.

Scope

  • Research on RL-based training of prompts and prompting systems, focusing on the prompt/program/workflow surface rather than the underlying model architecture.
  • Research that directly extends, evaluates, or meaningfully follows up on DSPy.

RL training prompts and prompting systems

  1. Prefix-Tuning: Optimizing Continuous Prompts for Generation (2021)

  2. Black-Box Tuning for Language-Model-as-a-Service (2022)

    • URL: https://arxiv.org/abs/2201.03514
    • OpenAlex cited_by_count: 56
    • Notes: prompt optimization under API-only access; useful for black-box prompt search settings.
  3. RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning (2022)

  4. TEMPERA: Test-Time Prompting via Reinforcement Learning (2022)

  5. Large Language Models Are Human-Level Prompt Engineers (2022)

    • URL: https://arxiv.org/abs/2211.01910
    • OpenAlex cited_by_count: 297
    • Notes: Automatic Prompt Engineer (APE); not RL, but a major step toward automated prompt search.
  6. Active Prompting with Chain-of-Thought for Large Language Models (2023)

    • URL: https://arxiv.org/abs/2302.12246
    • OpenAlex cited_by_count: 44
    • Notes: prompt selection policy over which examples receive CoT annotation and enter the prompt.
  7. Reflexion: Language Agents with Verbal Reinforcement Learning (2023)

    • URL: https://arxiv.org/abs/2303.11366
    • OpenAlex cited_by_count: 259
    • Notes: replaces weight updates with textual self-critique and memory; operationally a prompting-system paper.
  8. Automatic Prompt Optimization with “Gradient Descent” and Beam Search (2023)

  9. Large Language Models as Optimizers (2023)

    • URL: https://arxiv.org/abs/2309.03409
    • OpenAlex cited_by_count: 91
    • Notes: OPRO frames prompt optimization as iterative black-box optimization with the LM itself as the optimizer.
  10. Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (2023)

  11. PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization (2023)

  12. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (2023)

    • URL: https://arxiv.org/abs/2310.03714
    • OpenAlex cited_by_count: 50
    • Notes: prompt-program framework; moves optimization target from a single prompt to an LM program/pipeline.
  13. TextGrad: Automatic “Differentiation” via Text (2024)

    • URL: https://arxiv.org/abs/2406.07496
    • OpenAlex cited_by_count: 12
    • Notes: optimization framework for compound AI systems via textual feedback; adjacent to DSPy.
  14. Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (2024)

  15. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025)

    • URL: https://arxiv.org/abs/2507.19457
    • OpenAlex cited_by_count: 0
    • Notes: explicit recent claim that reflective/evolutionary prompt optimization can beat RL in some settings.
  16. Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization (2024)
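
Several entries above (APE, OPRO, Promptbreeder) share one loop: propose candidate prompts, score them against a small eval set, keep the winner, repeat. A minimal, library-free sketch of that loop; the scoring and mutation functions are toy stand-ins for LM calls, and every name here is illustrative rather than taken from any of the papers:

```python
import random

def score(prompt: str, evalset: list[tuple[str, str]]) -> float:
    """Toy stand-in for 'run the LM with this prompt and grade the outputs'.
    Here: reward prompts that mention a few task keywords."""
    keywords = {"step", "reason", "answer"}
    hits = sum(1 for k in keywords if k in prompt.lower())
    return hits / len(keywords)

def mutate(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for an LM-proposed rewrite (OPRO/Promptbreeder style)."""
    edits = [" Think step by step.", " Reason carefully.", " Give the final answer."]
    return prompt + rng.choice(edits)

def optimize(seed_prompt: str, evalset, rounds: int = 8, width: int = 4, seed: int = 0) -> str:
    """Greedy black-box prompt search: keep the best-scoring candidate each round.
    Including the incumbent among the candidates makes the score monotone."""
    rng = random.Random(seed)
    best = seed_prompt
    for _ in range(rounds):
        candidates = [mutate(best, rng) for _ in range(width)] + [best]
        best = max(candidates, key=lambda p: score(p, evalset))
    return best
```

Swapping `mutate` for an LM call that is shown the score history is essentially the OPRO framing; swapping it for crossover/mutation over a population is the Promptbreeder framing.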

Direct extensions and evaluations

  1. DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines (2023)

    • URL: https://arxiv.org/abs/2312.13382
    • OpenAlex cited_by_count: 2
    • Notes: extends DSPy with assertions / computational constraints and compilation strategies for more reliable pipelines.
  2. A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation (2024)

    • URL: https://arxiv.org/abs/2412.15298
    • OpenAlex cited_by_count: 0
    • Notes: empirical comparison of the COPRO, MIPRO, BootstrapFewShot, BootstrapFewShotWithOptuna, and KNN few-shot teleprompters within the DSPy framework.
  3. DSPy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs (2024)

  4. Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy (2025)

    • URL: https://arxiv.org/abs/2507.03620
    • OpenAlex cited_by_count: 0
    • Notes: multi-use-case study applying DSPy to guardrails, hallucination detection, code generation, routing, and prompt evaluation.
  5. Analyzing LLM Instruction Optimization for Tabular Fact Verification (2026)

  6. Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning (2026)

    • URL: https://arxiv.org/abs/2604.04869
    • OpenAlex cited_by_count: 0
    • Notes: recent DSPy-centered prompt engineering paper; appears more application-facing than foundational.
  7. AutoDSPy: Automating Modular Prompt Design with Reinforcement Learning for Small and Large Language Models (2025)

    • URL: not resolved by OpenAlex landing-page field in-session
    • OpenAlex cited_by_count: 0
    • Notes: explicitly positions itself as the first framework to automate DSPy pipeline construction with reinforcement learning.
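
Across these follow-ups, the DSPy idea the notes keep returning to (declare the program, compile the prompts) can be sketched without the library. The sketch below mimics the shape of a BootstrapFewShot-style teleprompter, but none of the names are DSPy's actual API, and the LM and metric are mocked:

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """One LM call: a fixed instruction plus compiled few-shot demos."""
    instruction: str
    demos: list[tuple[str, str]] = field(default_factory=list)

    def prompt(self, question: str) -> str:
        lines = [self.instruction]
        for q, a in self.demos:
            lines.append(f"Q: {q}\nA: {a}")
        lines.append(f"Q: {question}\nA:")
        return "\n\n".join(lines)

def bootstrap_compile(module, lm, metric, trainset, max_demos=2):
    """Bootstrapped few-shot compilation: run the uncompiled module on
    training inputs and keep only the traces the metric accepts as demos."""
    compiled = Module(module.instruction, [])
    for question, gold in trainset:
        answer = lm(module.prompt(question))
        if metric(answer, gold):
            compiled.demos.append((question, answer))
            if len(compiled.demos) >= max_demos:
                break
    return compiled
```

The point of the sketch is the reframed optimization object: the artifact being tuned is the module's compiled demo set (and, in richer teleprompters, its instruction), not a hand-written prompt string.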

Adjacent systems reacting to the same problem

  1. TextGrad: Automatic “Differentiation” via Text (2024)

  2. Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization (2024)

  3. Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (2024)

  4. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025)

Provisional synthesis

  • The field has moved from static prompt templates to optimizing prompts, then to optimizing prompt programs and multi-stage LM pipelines.
  • Strictly RL-based prompt optimization exists, but a large share of the strongest recent work uses textual gradients, evolutionary search, strategic planning, or compiler-like rewrites instead.
  • DSPy matters because it reframes the optimization object: not “find one good string”, but “compile and tune an LM program”.
  • The research literature directly following DSPy is still thinner than the library-and-tooling ecosystem around DSPy itself. The most visible paper trail so far covers assertions, teleprompter/optimizer evaluations, application studies, and RL-driven automation attempts such as AutoDSPy.
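
The reflective/textual-feedback family named above (Reflexion, TextGrad, GEPA) swaps a numeric gradient for a critique string that an editor applies to the prompt. A toy sketch with both roles mocked; the critique strings and function names are invented for illustration:

```python
def critic(prompt: str, failure: str) -> str:
    """Toy stand-in for an LM critiquing a failed attempt in natural language."""
    if "units" in failure:
        return "State the expected units explicitly."
    return "Ask for a short, direct answer."

def editor(prompt: str, critique: str) -> str:
    """Toy stand-in for an LM applying the critique to the prompt."""
    return f"{prompt} ({critique})"

def reflective_refine(prompt: str, failures: list[str]) -> str:
    """One pass of critique-then-edit per observed failure: the 'textual
    gradient' is the critique string, and the editor is the update rule."""
    for failure in failures:
        prompt = editor(prompt, critic(prompt, failure))
    return prompt
```

No weights move anywhere in this loop, which is why these systems read, operationally, as prompting-system papers rather than RL papers.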