
How to Talk to an AI (And Make It Listen): APIs, Prompt Engineering, and Grounding AI in Your Data

  • Writer: Aparajita Sihag
  • Mar 21
  • 13 min read

This is Part 2 of a three-part series for professionals building AI tools without a deep technical background. In Part 1, we covered what LLMs are, how they work, and why they hallucinate. This piece builds directly on that foundation.


Knowing what an AI is turns out to be only half the battle.


The other half is knowing how to work with one - how to connect it to your systems, how to instruct it precisely enough that it behaves the way your use case demands, and how to give it access to your actual company knowledge rather than letting it wing it from general training.


These three capabilities - connecting, instructing, and grounding - are what separate an impressive demo from a tool people actually trust and use. This piece covers all three.


Part One: APIs - How AI Connects to Everything Else


The Waiter Analogy

API stands for Application Programming Interface. It's one of the most frequently used acronyms in technology and one of the most poorly explained. The formal definition is almost useless to non-technical people. Here's a better one.


An API is a waiter in a restaurant.


You're sitting at the table. The kitchen - with all the ingredients, equipment, and skill to make your meal - is in the back. You don't walk into the kitchen yourself. You tell the waiter what you want, in a format the kitchen understands (a menu order), and the waiter returns with what you asked for. The kitchen and the dining room never directly interact. The waiter is the intermediary.


In software terms: your HR copilot is the diner. Claude or GPT-4 is the kitchen. The API is the waiter - a defined, structured channel that lets two systems exchange information without either one needing to know how the other works internally.


When your HR chatbot needs to answer a question, it doesn't "open" Claude like an application. It sends a precisely formatted request to Claude's API over the internet and receives a structured response back. That exchange takes milliseconds and costs a fraction of a cent.


What Actually Travels Back and Forth

The exchange is simpler than most people expect. Your system sends something like:

"Here are my instructions for how to behave. Here is the conversation history so far. Here is the user's new message. Please respond."


Claude's API sends back the response text, along with some metadata about how many tokens were used. That's essentially the entire interaction. There's no magic. There's no ongoing connection. It's a single request and a single response - repeated for every message.
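To make the request-and-response shape concrete, here is a minimal Python sketch of the payload one such call carries. The field names mirror the style of Anthropic's Messages API, but the model name and values are illustrative, not a working integration:

```python
import json

def build_chat_request(system_prompt, history, user_message):
    """Assemble the payload for one chat API call.

    The API itself is stateless: every call resends the full
    conversation so far, and 'memory' is just the history we include.
    """
    messages = history + [{"role": "user", "content": user_message}]
    return {
        "model": "claude-sonnet-4-5",  # model name is illustrative
        "max_tokens": 500,
        "system": system_prompt,       # instructions for how to behave
        "messages": messages,          # history plus the new message
    }

payload = build_chat_request(
    system_prompt="You are Asha, an HR assistant for Acme Corp.",
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    user_message="How many leave days do I get?",
)
print(json.dumps(payload, indent=2))
```

The response that comes back is an equally plain structure: the generated text plus token-usage metadata.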


This matters because it explains something important about cost. You're not paying for time. You're not paying per conversation. You're paying per token - per chunk of text processed. A typical HR policy Q&A exchange uses somewhere between 500 and 1,000 tokens. For a tool used by a few hundred employees, monthly API costs typically run in the tens to low hundreds of dollars - very manageable for an internal enterprise tool.


It also explains a less obvious cost driver: every API call typically resends the entire conversation history. The longer a conversation gets, the more tokens you're sending with each new message. A naive implementation of a long-running chatbot can accumulate significant costs simply because the conversation history keeps growing. Your IT team will want to think about this when designing any tool intended for extended employee interactions.
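The growth is easy to underestimate because it compounds. A back-of-envelope sketch, assuming a flat 200 tokens per conversational turn:

```python
def tokens_sent(tokens_per_turn, turns):
    """Cumulative tokens sent when each new message resends all
    previous turns (a naive, no-truncation implementation)."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn
        total += history  # each call sends the whole history so far
    return total

# 20 turns at ~200 tokens/turn: the final call alone sends 4,000
# tokens, and the conversation as a whole has sent 42,000 -- more
# than ten times what 20 isolated messages would cost.
print(tokens_sent(200, 20))
```

Common mitigations are truncating old turns or summarising them into a short running recap before each call.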

An API provides a structured way for two different systems to talk to each other. If you think of the two systems as two people, the API is like the phone system that lets them connect, and the API key is like a SIM card or phone number that identifies who is calling and keeps track of usage for billing.

API Keys: Your Identity and Your Liability

When you sign up for the OpenAI or Anthropic API, you receive an API key - a long string of characters that looks something like `sk-ant-abc123xyz...`. This key does two things simultaneously: it identifies your account and it authorises billing to it.


The security implication: If someone else obtains your API key, they can make requests charged to your account. There is no secondary authentication. The key is the credential.

In any production application, API keys must be stored in what developers call environment variables - secure server-side storage that the application reads at runtime but that never appears in the code itself, in configuration files, or anywhere else a human might casually encounter it. A surprisingly common and genuinely dangerous mistake is for a developer to hard-code an API key into a script and then upload that script to a public code repository. The key gets discovered, often within hours, and the associated account gets drained.
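In practice, reading the key from the environment takes only a few lines. A minimal sketch -- the variable name `ANTHROPIC_API_KEY` is conventional but your deployment may use a different one:

```python
import os

def load_api_key(var_name="ANTHROPIC_API_KEY"):
    """Fetch the API key from the server environment at runtime.

    The key never appears in source code, so it cannot leak via a
    code repository. Failing fast here gives a clear error instead
    of a confusing authentication failure later.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Configure it in the deployment "
            "environment; never hard-code it or commit it to git."
        )
    return key
```
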


When working with your IT team on any AI integration, the question "how are we storing and rotating API keys?" is a reasonable one to ask.


APIs Are Everywhere - Not Just for AI

Here's the insight that unlocks the full power of what you're learning: it's not just LLMs that have APIs. Virtually every modern software service - your HRMS, your learning management system, your email platform, your document storage - communicates via API using the same two universal standards: HTTP and JSON.


HTTP is the protocol - the same one your browser uses to load any website. A request goes out, a response comes back.


JSON is the data format - a simple, human-readable way to structure information. It looks like `{"employee_name": "Priya Sharma", "department": "Finance", "leave_balance": 12}`.

This shared language is what enables Zapier and Power Automate to exist and why they're so useful. They're essentially visual interfaces for sending HTTP requests with JSON data, without you needing to write the underlying code yourself. Your AI tool sits in the middle of a network of these connections - reading from one system, reasoning with the LLM, and writing to another.


The performance review workflow, for instance, involves at least three APIs: one to read the review from wherever it's stored (SharePoint, your HRMS), one to send it to Claude for analysis, and one to write the output back to your learning platform. Each connection speaks the same language. The logic that stitches them together is your workflow.
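The stitching logic itself can be surprisingly small. In this sketch every helper function is a hypothetical stand-in for one real HTTP+JSON call -- the names and return shapes are invented for illustration:

```python
# Hypothetical stand-ins for the three real HTTP+JSON calls;
# each would be a GET or POST to the relevant system's API.
def fetch_review(review_id):
    return {"id": review_id, "text": "Exceeds expectations in analysis."}

def ask_llm(instruction, document):
    return f"Summary of review {document['id']}: strong analytical skills."

def write_to_lms(review_id, recommendation):
    return {"status": "ok", "review_id": review_id}

def run_review_workflow(review_id):
    """The workflow logic that stitches three APIs together:
    read from storage, reason with the LLM, write to the LMS."""
    review = fetch_review(review_id)                     # API call 1: read
    analysis = ask_llm("Summarise strengths.", review)   # API call 2: reason
    return write_to_lms(review_id, analysis)             # API call 3: write

print(run_review_workflow("R-104"))
```

Tools like Zapier and Power Automate let you assemble exactly this chain visually, without writing the code.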

What can you connect via API?

Part Two: Prompt Engineering - How You Shape What AI Does


The Real Power in Your Hands

If the API is how you connect AI to your systems, prompt engineering is how you control what the AI actually does once it's connected. It is, without exaggeration, the most important practical skill for anyone building AI products - and it requires no coding whatsoever.

Here's the right framing: you now know that an LLM was trained on an almost incomprehensible range of human text. That means it has absorbed, in some statistical sense, an enormous range of different "voices" and "personalities" - a lawyer, a teacher, a comedian, a doctor, an HR manager, a financial analyst. Prompt engineering is how you summon the right one and keep it there consistently.


A well-prompted model feels like a carefully designed product. A poorly-prompted one will frustrate your users, give inconsistent answers, wander off-topic, and occasionally embarrass your organisation. The difference between those two outcomes lives entirely in the prompt.


The Three Layers of Every LLM Interaction

Every exchange with an LLM has up to three layers:

The system prompt is a set of instructions the model receives before the user types anything. It's invisible to the end user. This is where you define the model's role, its scope, its tone, its constraints, and its output format. Think of it as the job description you hand the AI before it starts work.


The conversation history is every message exchanged so far in the current session - both the user's messages and the model's previous responses. The model receives the full history with each new message, which is how it maintains context within a conversation.


The user's current message is whatever the person typed just now.


Of these three, the system prompt is your most powerful lever. It determines the baseline behaviour of everything that follows. A thoughtfully written system prompt is worth more than any amount of post-hoc correction.

Every interaction with an LLM has three prompt layers: System prompt, conversation history, and user message

Six Techniques That Separate Good Prompts from Great Ones

  1. Give it a specific role. Vague instructions produce vague output. Instead of "answer HR questions," try: "You are a senior HR business partner at Acme Corp with ten years of experience. You are warm, professional, and always cite the relevant policy section when answering." The more specific the role, the more consistent and appropriate the behaviour.


  2. Set hard boundaries. LLMs will attempt to be helpful about almost anything unless you explicitly tell them not to. For an HR policy chatbot, this means specifying: "Only answer questions based on the Acme Corp policy documents provided to you. If asked anything outside this scope - personal advice, competitor information, general knowledge - politely decline and redirect to HR." Without this instruction, your HR bot might start offering investment advice or discussing competitors. This has happened.


  3. Show, don't just tell - few-shot examples. One of the most powerful prompting techniques is providing the model with an example of a perfect input-output pair rather than describing in the abstract what you want. If you want the model to always cite the policy section number, show it exactly what that looks like: "User: How many annual leave days am I entitled to? Assistant: As per Section 4.2 of the Leave Policy, full-time employees receive 18 days of annual leave per year, accrued monthly." The model will mirror the format, tone, and citation style in every subsequent response.


  4. Ask for step-by-step reasoning. For complex analytical tasks - summarising a performance review, evaluating a job application, generating a learning path recommendation - adding "Think through this step by step before giving your final answer" to the prompt significantly improves output quality. This forces the model to work through the problem before committing to a conclusion, rather than jumping to the first plausible-sounding answer. It is a simple instruction that produces measurably better results on reasoning tasks.


  5. Control the output format explicitly. For any AI tool that feeds its output into another system - a dashboard, a workflow, a database - you must specify exactly how the response should be structured. "Always respond in the following format: Summary: [one sentence]. Policy reference: [section number]. Full answer: [two to three paragraphs]. Recommended next step: [one action item]." When a workflow receives this response, it can reliably extract each field. When it receives a free-flowing paragraph, it cannot. Format control is the bridge between AI reasoning and system automation.


  6. Plan for the "I don't know" case. Every AI system will eventually encounter a question it cannot answer well. The default behaviour - without explicit instruction - is often to generate a plausible-sounding response anyway. For enterprise tools, this is unacceptable. Your system prompt must explicitly handle this case: "If the answer is not in the documents provided to you, say so clearly and direct the employee to contact HR at hr@company.com. Do not guess." That single instruction prevents an entire class of harmful hallucinations.
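Technique 3 is worth seeing in the shape it actually takes on the wire. In the message-list format used by the major chat APIs, a few-shot example is simply a perfect question-and-answer pair placed in the history before the real user turn. This is a sketch; the wording of the example pair comes from the citation example above:

```python
# A few-shot example embedded in the conversation: one ideal
# user/assistant pair shown to the model before the real question.
few_shot_messages = [
    {"role": "user", "content": "How many annual leave days am I entitled to?"},
    {"role": "assistant", "content": (
        "As per Section 4.2 of the Leave Policy, full-time employees "
        "receive 18 days of annual leave per year, accrued monthly."
    )},
    # The real question goes last; the model mirrors the format,
    # tone, and citation style of the pair above.
    {"role": "user", "content": "What is the notice period for resignation?"},
]
```
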


What a Real Production System Prompt Looks Like

Here's a complete example for an HR policy chatbot, incorporating all six techniques:


You are Asha, an HR assistant for Acme Corp. You help employees understand company HR policies clearly and professionally.

Rules:

Only answer questions based on the policy documents provided to you in each conversation.

Always cite the policy section number and document name when answering. If the answer is not in the provided documents, respond with: "I don't have that information in my current documents - please contact HR directly at hr@acmecorp.com."

Never guess or infer policy details that are not explicitly stated in the documents.

Keep answers concise - under 150 words unless the question genuinely requires more detail.

Tone: warm, clear, and professional. Avoid jargon.

Format every response as follows:

Answer: [your response]

Policy reference: [section number and document name]


Notice what this prompt accomplishes before the user types a single word: it establishes a persona, sets hard scope boundaries, handles the fallback case explicitly, constrains length, defines tone, and enforces output format. This single block of text - which took perhaps twenty minutes to write and refine - determines the majority of how the product will feel to every employee who uses it.
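The enforced format is what makes the output machine-readable downstream. A minimal sketch of the extraction step a workflow might run on Asha's replies -- the function name and regexes are illustrative, not part of any library:

```python
import re

def parse_asha_response(text):
    """Extract the two labelled fields the system prompt enforces.

    This only works because the prompt pins the output format;
    a free-flowing paragraph would make reliable parsing impossible.
    """
    answer = re.search(r"Answer:\s*(.+?)(?=\nPolicy reference:|\Z)", text, re.S)
    ref = re.search(r"Policy reference:\s*(.+)", text)
    return {
        "answer": answer.group(1).strip() if answer else None,
        "policy_reference": ref.group(1).strip() if ref else None,
    }

sample = (
    "Answer: Full-time employees receive 18 days of annual leave per year.\n"
    "Policy reference: Section 4.2, Leave Policy"
)
print(parse_asha_response(sample))
```

Each extracted field can then be written to a dashboard, a ticket, or a database column.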


Part Three: RAG - Giving AI Access to What It Couldn't Have Learned


The Problem RAG Solves

You now understand that an LLM's knowledge is baked into its weights during training - and that the training process ends at a specific point in time (unless the AI company decides to retrain its model periodically). The model knows nothing that happened after its training cutoff. More importantly for enterprise use: it knows nothing about your company at all.

Your leave policy, your internal processes, your product documentation, your proprietary research, your employees' performance records - none of this exists in the model's weights. If you ask a standard LLM about your company's notice period, it will make something up. It will do so confidently, helpfully, and incorrectly.


RAG - Retrieval Augmented Generation - is the architectural pattern that solves this problem. Instead of relying on what the model learned during training, you inject the relevant document text directly into the prompt at the moment of each query. The model reads the document and answers from what it just read - not from memory.


The result: an HR chatbot that answers from your actual policies, every single time. An L&D advisor that recommends from your actual course catalogue. A performance review analyser that references your actual competency framework.


RAG supplies the LLM with context from your unique environment - information that isn't available in its training data

The Mechanism: Embeddings and Vector Databases

RAG sounds straightforward until you ask the obvious question: if your HR manual is 300 pages long, how does the system know which three pages are relevant to "how do I apply for paternity leave"? You can't stuff all 300 pages into every prompt - context windows have limits, and even if they didn't, sending irrelevant content to the model wastes tokens and dilutes the quality of the answer.


The answer lies in two concepts that work together: embeddings and vector databases.

An embedding is a way of converting a piece of text into a list of numbers that captures its meaning - not just its words, but what those words are about. The remarkable property of embeddings is that texts with similar meanings end up as similar numbers. "How many vacation days do I get?" and "What is my annual leave entitlement?" would produce nearly identical embeddings, even though the words are different, because they mean the same thing.


During the setup phase, every chunk of your company's documents - each paragraph, each policy section - gets converted into an embedding. These embeddings are stored in a vector database, which is a specialised storage system optimised for finding numerical similarity at speed.


When an employee asks a question, that question also gets converted into an embedding. The system then searches the vector database for document chunks whose embeddings are most similar to the question embedding - and retrieves those chunks. Those chunks are then injected into the prompt alongside the question, and the model answers from them.

A rough analogy: imagine your HR manual as a library, and the embedding process as creating a comprehensive index that captures meaning rather than just keywords. "Maternity leave" and "parental leave" and "leave for new mothers" all point to the same section, because the index understands they're about the same thing. When a question arrives, the system checks the meaning-index and retrieves the most relevant pages.
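The "similar meanings, similar numbers" idea reduces to simple arithmetic. A toy sketch using three-number embeddings (real ones have hundreds of dimensions, and the values here are invented to illustrate the comparison):

```python
import math

def cosine_similarity(a, b):
    """How close two embeddings point in the same direction
    (1.0 = identical meaning, near 0 = unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for two stored policy chunks.
chunks = {
    "Section 4.2: Annual leave entitlement": [0.9, 0.1, 0.2],
    "Section 7.1: Expense reimbursement":    [0.1, 0.8, 0.3],
}
# Toy embedding for "How many vacation days do I get?" -- close to
# the leave chunk despite sharing no words with it.
question = [0.85, 0.15, 0.25]

best = max(chunks, key=lambda name: cosine_similarity(question, chunks[name]))
print(best)  # the leave-policy chunk wins
```

A vector database does exactly this comparison, just optimised to run across millions of chunks in milliseconds.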


Three Things That Make RAG Work Well in Practice

Understanding the concept is one thing. Deploying it well is another. Three factors separate a reliable RAG system from a frustrating one.


  1. Chunking strategy. Documents need to be split into chunks before embedding. Split too small, and each chunk lacks enough context to be meaningful. Split too large, and you lose specificity. The wrong split can cut a sentence in half and destroy the meaning of a policy clause. A thoughtful chunking strategy - splitting by paragraph, section heading, or logical topic rather than by fixed character count - makes a significant difference in answer quality. Modern RAG tools handle this automatically, but knowing the concept means you can have an intelligent conversation with whoever is implementing it.


  2. Document freshness. RAG is only as good as the documents in the vector database. If your leave policy changes in January and nobody updates the database until March, your copilot will spend three months confidently giving employees outdated information. RAG is not a one-time setup - it requires a process for keeping the document library current. This is a people and process problem as much as a technical one. The team that owns the policies needs to own the process of updating the system when those policies change.


  3. Handling the "not in the documents" case. Even with comprehensive document coverage, some questions won't have answers in the available content. Without explicit instruction, the model will fill that gap with a plausible-sounding generated response. As covered in the prompt engineering section, your system prompt must explicitly instruct the model to acknowledge when information isn't available and redirect the employee accordingly.
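The chunking strategy in point 1 can be sketched in a few lines. This splits on section headings rather than at a fixed character count, so no clause is cut mid-sentence; the heading pattern is an assumption you would adjust to your own documents:

```python
import re

def chunk_by_heading(document):
    """Split a policy document at section headings, keeping each
    section intact, instead of slicing every N characters.
    Assumes headings look like 'Section 4.2: ...'."""
    # Zero-width lookahead keeps the heading attached to its section.
    parts = re.split(r"(?=Section \d+(?:\.\d+)?:)", document)
    return [p.strip() for p in parts if p.strip()]

manual = (
    "Section 4.1: Sick leave. Employees receive 10 paid sick days.\n"
    "Section 4.2: Annual leave. Full-time employees receive 18 days."
)
print(chunk_by_heading(manual))
```

Fixed-size splitting would happily cut "18 days" away from "annual leave", which is exactly the failure mode heading-aware chunking avoids.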


Training vs. RAG: An Important Distinction

A common point of confusion is worth addressing directly. When you use a tool like ChatGPT and the toggle says "your chats may be used to improve our models," does that mean the conversations are being fed back into the weights? And if so, does your sensitive data risk becoming part of the model's future responses to other users?

The answer requires understanding the difference between two very different processes.

Training is how the model builds its weights - the slow, expensive, once-in-a-while process where the model learns from vast amounts of text. When you interact with Claude or ChatGPT in your daily use, you are not retraining the model. The weights are fixed. You're just sending input through them and getting output back.


RAG is a retrieval process - it happens in real time, for each individual query, and only affects what the model knows in that specific conversation. The retrieved document chunks are added to the prompt. They are not written into the weights. When the conversation ends, the model retains no memory of what it read.


So when a consumer platform says it may use your chats for training: that means your conversations could influence future versions of the model through the standard training pipeline - a slow, large-scale process, not an immediate response feed. It does not mean other users can currently access your conversations.


However - and this is the important nuance - "opting out of training" is not the same as "your data doesn't exist on their servers." Your conversations are still transmitted to their infrastructure, potentially stored, and potentially reviewed by human employees. For genuinely sensitive enterprise data, the only truly safe architecture is either a contractually protected enterprise agreement or a self-hosted model where the data never leaves your own infrastructure at any point.


What You Can Now Do

You've covered a lot of ground across these two pieces. Let's be explicit about what it unlocks.


You can now explain to a vendor - precisely and confidently - why their claim that "our model is trained on HR data so it knows everything about HR" is insufficient for your use case. Training data is static. It cannot include each employee's specific performance history, your current course catalogue, or the version of the leave policy that took effect last quarter. RAG is essential not because the model is bad, but because your data is private, specific, and constantly changing.


You can write a system prompt that gives an AI tool a role, sets boundaries, handles fallback cases, and enforces output format - before your IT team builds a single line of integration code. Doing this well before technical development begins saves significant rework.

You can ask the right security questions. Where does the data go? Who can read it? Is this an enterprise agreement or a consumer plan? If our data can't leave our servers, what's the deployment path?


And you understand, at a conceptual level, how the retrieval mechanism works - which means you can contribute meaningfully to decisions about document chunking, database updates, and fallback handling.


In Part 3, we bring everything together into the practical question: how do you actually build this? We'll map out the no-code and low-code tool landscape, show you how to design multi-step AI workflows, and give you the architectural patterns that cover the vast majority of enterprise AI use cases.


Two individuals in VR headsets and futuristic jackets enjoy a vibrant digital backdrop | Photo by Darlene Alderson: https://www.pexels.com/photo/two-young-women-wearing-vr-goggles-4389741/

 
 
 
