The challenge with gen AI consulting services is that the technology is moving faster than the consulting frameworks built around it. A gen AI roadmap designed in January 2024 using the models and benchmarks available at that time may already be optimising for the wrong architecture by the time implementation begins.
The Model Selection Decision That Cannot Be Deferred
Foundation model selection for enterprise gen AI applications is not a generic best-model question. GPT-4o, Claude Sonnet, Gemini Pro, Llama, and domain-specific models each have different performance profiles across specific task types: long-context document processing, structured data extraction, multi-step reasoning, code generation, multilingual performance. A gen AI consulting engagement that does not benchmark multiple models against your specific use case and data before recommending one is substituting market familiarity for evidence. The model that leads general benchmarks is not always the right model for your specific documents, your specific query distribution, and your specific quality bar.
The Difference Between a Gen AI Use Case and a Gen AI Application
A use case is an activity AI can improve: summarising support tickets, drafting sales emails, extracting structured data from documents. An application is the system that delivers that improvement, connected to real data sources, integrated into real workflows, with proper access controls and quality monitoring. Gen AI consulting that remains at the use case level without designing the application architecture is producing a list of things to build, not a plan for building them. The consulting engagement that moves from use case identification through application design and into implementation planning is the one that produces builds.
Prompt Engineering as a Strategic Capability
For organisations building gen AI applications on foundation model APIs, prompt design is a strategic capability, not a junior task. System prompts that define the model’s behaviour, output format requirements, and quality constraints – combined with few-shot examples that demonstrate the desired output pattern – produce dramatically more consistent and accurate results than minimal prompts. Gen AI consulting that includes prompt engineering methodology transfer – teaching the client’s team how to design, test, and version prompts with the same discipline applied to code – creates long-term internal capability rather than perpetual consulting dependency.
Cost Modeling for Gen AI at Scale
Gen AI applications have variable operating costs that scale with usage: inference cost per API call, embedding cost for RAG pipelines, storage cost for vector databases, and monitoring cost for production observability. Gen AI consulting that produces a build recommendation without a cost model for operating at the expected query volume is producing an incomplete recommendation. A gen AI application that costs $0.01 per query sounds cheap until it is processing 500,000 queries per month. Cost modeling should account for expected usage growth over the first year, not just current volume.
Build vs Buy vs Wrap: The Architecture Decision That Shapes Everything
Most enterprise gen AI applications fall into three categories: wrapping a foundation model API with a thin application layer (fastest to build, highest dependency on external model changes), fine-tuning a foundation model on domain-specific data (higher quality for specialised tasks, higher implementation cost), or building a RAG system over company documents (best for knowledge-intensive applications, requires data pipeline investment). Gen AI consulting that recommends one of these approaches for all use cases regardless of their specific requirements is applying a preferred architecture rather than matching architecture to need.

