    The rise of prompt ops: Tackling hidden AI costs from bad inputs and context bloat

By Techurz | June 29, 2025 | 8 min read

    This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

    Model providers continue to roll out increasingly sophisticated large language models (LLMs) with longer context windows and enhanced reasoning capabilities. 

    This allows models to process and “think” more, but it also increases compute: The more a model takes in and puts out, the more energy it expends and the higher the costs. 

    Couple this with all the tinkering involved with prompting — it can take a few tries to get to the intended result, and sometimes the question at hand simply doesn’t need a model that can think like a PhD — and compute spend can get out of control. 

    This is giving rise to prompt ops, a whole new discipline in the dawning age of AI. 

    “Prompt engineering is kind of like writing, the actual creating, whereas prompt ops is like publishing, where you’re evolving the content,” Crawford Del Prete, IDC president, told VentureBeat. “The content is alive, the content is changing, and you want to make sure you’re refining that over time.”

    The challenge of compute use and cost

Compute use and cost are two “related but separate concepts” in the context of LLMs, explained David Emerson, applied scientist at the Vector Institute. Generally, the price users pay scales based on both the number of input tokens (what the user prompts) and the number of output tokens (what the model delivers). However, they are not charged for behind-the-scenes actions like meta-prompts, steering instructions or retrieval-augmented generation (RAG).
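As a rough illustration of how that billing works, here is a minimal cost sketch; the per-token rates below are hypothetical placeholders, not any provider’s actual prices:

```python
# Minimal sketch of per-request LLM cost. The rates are hypothetical
# placeholders, not real provider pricing.
PRICE_PER_1K_INPUT = 0.005   # USD per 1K input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1K output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are metered separately."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Same prompt, terse vs. verbose answer: the 900-token response costs
# 4.5x as much on the output side as the 200-token one.
print(request_cost(1200, 200))  # 0.009
print(request_cost(1200, 900))  # 0.0195
```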

    While longer context allows models to process much more text at once, it directly translates to significantly more FLOPS (a measurement of compute power), he explained. Some aspects of transformer models even scale quadratically with input length if not well managed. Unnecessarily long responses can also slow down processing time and require additional compute and cost to build and maintain algorithms to post-process responses into the answer users were hoping for.
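To see where that quadratic term comes from, consider a deliberately simplified sketch of self-attention, ignoring real-world optimizations such as flash or sparse attention:

```python
# Simplified view of self-attention cost: every token attends to every
# token, so the score matrix grows with the square of the context length.
def attention_pairs(context_len: int) -> int:
    return context_len ** 2

print(attention_pairs(4_000))  # 16,000,000 pairwise scores
print(attention_pairs(8_000))  # 64,000,000 -- 2x the context, 4x the work
```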

Typically, longer context environments incentivize providers to deliberately deliver verbose responses, said Emerson. Many heavier reasoning models (OpenAI’s o3 or o1, for instance) will often provide long responses to even simple questions, incurring heavy computing costs.

    Here’s an example:

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have?

    Output: If I eat 1, I only have 1 left. I would have 5 apples if I buy 4 more.

The model not only generated more tokens than it needed to, but also buried its answer. An engineer may then have to design a programmatic way to extract the final answer or ask follow-up questions like ‘What is your final answer?’ that incur even more API costs.
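To make that concrete, here is a sketch of the kind of extraction heuristic an engineer might bolt on, and how it misfires on exactly this output:

```python
import re

# Naive post-processing: assume the final number in the response is the
# answer. Applied to the verbose output above, it grabs the wrong one.
response_text = ("If I eat 1, I only have 1 left. "
                 "I would have 5 apples if I buy 4 more.")

numbers = re.findall(r"\d+", response_text)
final_answer = numbers[-1] if numbers else None
print(final_answer)  # "4" -- the buried answer was actually 5
```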

    Alternatively, the prompt could be redesigned to guide the model to produce an immediate answer. For instance: 

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Start your response with “The answer is”…

    Or: 

Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Wrap your final answer in bold tags (<b></b>).

    “The way the question is asked can reduce the effort or cost in getting to the desired answer,” said Emerson. He also pointed out that techniques like few-shot prompting (providing a few examples of what the user is looking for) can help produce quicker outputs. 
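Here is a minimal sketch of that few-shot idea applied to the apple problem; the two worked examples are invented for illustration:

```python
# Few-shot prompt: two worked examples establish a terse answer format
# before the real question is asked.
few_shot_prompt = """Q: I have 3 pens and buy 2 more. How many pens do I have?
A: The answer is 5.

Q: I have 10 eggs and use 4 of them. How many eggs do I have?
A: The answer is 6.

Q: If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have?
A:"""
```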

    One danger is not knowing when to use sophisticated techniques like chain-of-thought (CoT) prompting (generating answers in steps) or self-refinement, which directly encourage models to produce many tokens or go through several iterations when generating responses, Emerson pointed out. 

Not every query requires a model to analyze and re-analyze before providing an answer, he emphasized; models are often perfectly capable of answering correctly when instructed to respond directly. Additionally, misconfigured API settings (such as running OpenAI’s o3 at a high reasoning effort when a lower-effort, cheaper request would suffice) incur unnecessary costs.
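As a sketch of that configuration point: OpenAI’s Python SDK exposes a reasoning_effort parameter on its reasoning models, and dialing it down for simple queries is one such lever. The model choice and prompt here are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Low reasoning effort for a simple arithmetic question; "high" would
# spend far more (billed) reasoning tokens on the same answer.
response = client.chat.completions.create(
    model="o3-mini",  # illustrative reasoning-model choice
    reasoning_effort="low",
    messages=[{
        "role": "user",
        "content": "If I have 2 apples and buy 4 more after eating 1, "
                   "how many apples do I have? Start with 'The answer is'.",
    }],
)
print(response.choices[0].message.content)
```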

    “With longer contexts, users can also be tempted to use an ‘everything but the kitchen sink’ approach, where you dump as much text as possible into a model context in the hope that doing so will help the model perform a task more accurately,” said Emerson. “While more context can help models perform tasks, it isn’t always the best or most efficient approach.”
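The alternative to kitchen-sink context is to filter before prompting. A toy sketch, using crude word overlap as a stand-in for a real relevance score such as embedding similarity:

```python
# Keep only the k chunks most relevant to the query instead of dumping
# the whole corpus into the context window. Word overlap is a crude
# stand-in for a real scorer (e.g., embedding cosine similarity).
def overlap(chunk: str, query: str) -> int:
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def trim_context(chunks: list[str], query: str, k: int = 3) -> str:
    top = sorted(chunks, key=lambda c: overlap(c, query), reverse=True)[:k]
    return "\n\n".join(top)  # bounded input tokens, regardless of corpus size
```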

    Evolution to prompt ops

    It’s no big secret that AI-optimized infrastructure can be hard to come by these days; IDC’s Del Prete pointed out that enterprises must be able to minimize the amount of GPU idle time and fill more queries into idle cycles between GPU requests. 

“How do I squeeze more out of these very, very precious commodities?” he noted. “Because I’ve got to get my system utilization up, because I just don’t have the benefit of simply throwing more capacity at the problem.”

Prompt ops can go a long way towards addressing this challenge, as it ultimately manages the lifecycle of the prompt. While prompt engineering is about the quality of the prompt, prompt ops is where you iterate, Del Prete explained.

    “It’s more orchestration,” he said. “I think of it as the curation of questions and the curation of how you interact with AI to make sure you’re getting the most out of it.” 

    Models can tend to get “fatigued,” cycling in loops where quality of outputs degrades, he said. Prompt ops help manage, measure, monitor and tune prompts. “I think when we look back three or four years from now, it’s going to be a whole discipline. It’ll be a skill.”

While it’s still very much an emerging field, early providers include QueryPal, Promptable, Rebuff and TrueLens. As prompt ops evolves, these platforms will continue to iterate, improve and provide real-time feedback to give users more capacity to tune prompts over time, Del Prete noted.

    Eventually, he predicted, agents will be able to tune, write and structure prompts on their own. “The level of automation will increase, the level of human interaction will decrease, you’ll be able to have agents operating more autonomously in the prompts that they’re creating.”

    Common prompting mistakes

    Until prompt ops is fully realized, there is ultimately no perfect prompt. Some of the biggest mistakes people make, according to Emerson: 

    • Not being specific enough about the problem to be solved. This includes how the user wants the model to provide its answer, what should be considered when responding, constraints to take into account and other factors. “In many settings, models need a good amount of context to provide a response that meets users’ expectations,” said Emerson.
    • Not taking into account the ways a problem can be simplified to narrow the scope of the response. Should the answer be within a certain range (0 to 100)? Should the answer be phrased as a multiple choice problem rather than something open-ended? Can the user provide good examples to contextualize the query? Can the problem be broken into steps for separate and simpler queries?
    • Not taking advantage of structure. LLMs are very good at pattern recognition, and many can understand code. While using bullet points, itemized lists or bold indicators (****) may seem “a bit cluttered” to human eyes, Emerson noted, these callouts can be beneficial for an LLM. Asking for structured outputs (such as JSON or Markdown) can also help when users are looking to process responses automatically. 
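On the structured-output point, a minimal sketch of why machine-readable responses are easier to handle downstream; the response text is inlined here rather than fetched from a live model:

```python
import json

# Asking for JSON turns brittle text-scraping into a one-line parse.
prompt = (
    "Answer the following math problem. If I have 2 apples and I buy 4 "
    "more at the store after eating 1, how many apples do I have? "
    'Respond with JSON only, in the form {"answer": <number>}.'
)

response_text = '{"answer": 5}'  # stand-in for a real model response
answer = json.loads(response_text)["answer"]
print(answer)  # 5
```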

    There are many other factors to consider in maintaining a production pipeline, based on engineering best practices, Emerson noted. These include: 

    • Making sure that the throughput of the pipeline remains consistent; 
    • Monitoring the performance of the prompts over time (potentially against a validation set);
    • Setting up tests and early warning detection to identify pipeline issues.
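A toy sketch of the last two points, assuming a hypothetical ask_model callable that wraps whatever model API is in use:

```python
# Regression-test a prompt against a small labeled validation set so a
# prompt or model change that hurts accuracy trips an early warning.
validation_set = [
    {"question": "2 apples, buy 4 more after eating 1. Total?",
     "expected": "5"},
    # ... more labeled cases
]

def passes_checks(ask_model, threshold: float = 0.9) -> bool:
    correct = sum(
        case["expected"] in ask_model(case["question"])
        for case in validation_set
    )
    return correct / len(validation_set) >= threshold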

    Users can also take advantage of tools designed to support the prompting process. For instance, the open-source DSPy can automatically configure and optimize prompts for downstream tasks based on a few labeled examples. While this may be a fairly sophisticated example, there are many other offerings (including some built into tools like ChatGPT, Google and others) that can assist in prompt design. 
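For a flavor of what that looks like in practice, here is a minimal DSPy sketch; the model choice, training examples and metric are assumptions for illustration, not taken from the article:

```python
import dspy

# Configure a language model (illustrative choice) and a simple QA program.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
qa = dspy.Predict("question -> answer")

# A few labeled examples for the optimizer to bootstrap from.
trainset = [
    dspy.Example(question="2 apples, buy 4 more after eating 1. Total?",
                 answer="5").with_inputs("question"),
    # ... a handful more
]

def exact_match(example, pred, trace=None):
    return example.answer.strip() == pred.answer.strip()

# BootstrapFewShot assembles effective few-shot prompts automatically.
optimized_qa = dspy.BootstrapFewShot(metric=exact_match).compile(
    qa, trainset=trainset
)
```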

    And ultimately, Emerson said, “I think one of the simplest things users can do is to try to stay up-to-date on effective prompting approaches, model developments and new ways to configure and interact with models.” 
