~ Proving Your AWS Skills in the Generative AI Era ~
Table of Contents
- Introduction: What This Certification Means and How to Approach It
- Exam Overview and Domains
- Key Knowledge by Domain
- 3.1 Leveraging and Selecting Foundation Models
- 3.2 Prompt Engineering
- 3.3 RAG (Retrieval-Augmented Generation) Architecture
- 3.4 Fine-Tuning and Model Customization
- 3.5 Building AI Agents
- 3.6 Security and Responsible AI
- 3.7 AgentCore (New Service)
- What You'll Be Able to Do in Practice
- Study Methods and Resource Guide
- Pre-Exam Checklist
1. Introduction: What This Certification Means and How to Approach It
This document was created as a study guide for the AWS Certified Generative AI Developer – Professional exam (commonly referred to as AIP).
What This Certification Proves
This certification demonstrates your skills in developing and optimizing generative AI applications on AWS. As of 2026, demand for engineers with hands-on generative AI experience is growing rapidly. Earning this certification clearly signals to the market that you can work with AWS's generative AI stack at a professional level.
Core Study Philosophy
Advice: This exam tests understanding, not memorization. Always keep asking yourself "Why does this service exist?" and "What problem does it solve?" as you study.
2. Exam Overview and Domains
| Item | Details |
|---|---|
| Exam Name | AWS Certified Generative AI Developer – Professional |
| Official Launch | April 2026 onward (beta prior to that) |
| Question Format | Multiple choice (single and multiple answer) |
| Exam Duration | 180 minutes |
| Passing Score | 750 / 1000 (scaled score) |
| Exam Fee | ¥44,000 (tax included) |
Main Exam Domains
| Domain | Key Themes |
|---|---|
| Domain 1 | Selecting and leveraging foundation models |
| Domain 2 | Prompt engineering |
| Domain 3 | Designing and building RAG architectures |
| Domain 4 | Fine-tuning and customization |
| Domain 5 | Building AI agents |
| Domain 6 | Security, governance, and responsible AI |
3. Key Knowledge by Domain
3.1 Leveraging and Selecting Foundation Models
★ What the Exam Tests
You'll need to understand the characteristics of Foundation Models (FMs) accessible through Amazon Bedrock and be able to select the optimal model for a given use case.
Models You Need to Know
| Model Family | Provider | Key Characteristics / Strengths |
|---|---|---|
| Claude | Anthropic | Long-form comprehension, logical reasoning, safety-focused |
| Titan | Amazon | Text generation, embeddings, image generation. AWS-native |
| Llama | Meta | Open-source lineage, highly customizable |
| Mistral | Mistral AI | Lightweight and fast, cost-efficient |
| Stable Diffusion | Stability AI | Specialized for image generation |
| Command/Embed | Cohere | Strong at text generation and embeddings |
Selection Criteria (Frequently Tested)
Factors to consider when selecting a model:
- Task type: Text generation, summarization, code generation, image generation, embeddings, etc.
- Accuracy requirements: Is high precision required, or is "good enough" acceptable?
- Latency requirements: Is real-time response needed?
- Cost: Input/output token pricing, inference costs
- Context window: Maximum number of tokens the model can accept as input
- Multimodal support: Is combined text + image processing needed?
Exam Tip: Expect many questions about the trade-off between "cost optimization" and "accuracy." For questions asking "What is the most cost-efficient approach?", the correct answer usually involves starting with a smaller model and scaling up only if needed.
Key Amazon Bedrock Features
| Feature | Description |
|---|---|
| Model Access | API-based access to FMs from multiple providers |
| Playground | GUI-based test environment to try out models |
| Knowledge Bases | Managed service for building RAG pipelines |
| Agents | Autonomous task execution with external tool integration |
| Guardrails | Filtering for harmful content |
| Model Evaluation | Performance comparison across models |
| Customization | Fine-tuning and continued pre-training |
3.2 Prompt Engineering
★ What the Exam Tests
You'll be tested on the name, characteristics, and appropriate use of each prompting technique. Questions in the format "Which prompting technique is best for this situation?" appear frequently.
Core Prompting Techniques
Zero-Shot Prompting
Giving instructions only, with no examples. Relies entirely on the model's pre-trained knowledge.
Please summarize the following text in 3 lines:
[text]
When to use: Simple tasks where the model's general capabilities are sufficient
Few-Shot Prompting
Providing a few input–output examples before presenting the actual task.
Review: "This product is amazing!" → Sentiment: Positive
Review: "It broke and is unusable." → Sentiment: Negative
Review: "It's okay." → Sentiment:
When to use: When you need the model to follow a specific format or classification criteria
Chain-of-Thought (CoT) Prompting
A technique that guides the model through a step-by-step reasoning process. Add instructions like "Think through this step by step."
Problem: A store has 12 apples. 8 are sold, then 5 more arrive.
How many apples are there? Think through this step by step.
When to use: Complex tasks requiring mathematical reasoning or logical thinking
System Prompts
A prompt that defines the model's role, constraints, and behavior. This part is not visible to the end user.
You are an AWS technical support engineer.
Only answer questions related to AWS services.
Keep your responses under 200 characters.
When to use: Any application where consistent response quality needs to be maintained
Prompt Optimization Best Practices
| Practice | Description |
|---|---|
| Be specific | Avoid vague instructions; explicitly state output format, length, and tone |
| Use delimiters | Separate input sections with XML tags or dividers |
| Iterate | Don't aim for perfection on the first try; test and refine repeatedly |
| Use negative instructions | Constraints like "Do not…" are also effective |
| Tune temperature | Low temperature = deterministic; high temperature = creative |
Inference Parameters (Frequently Tested)
| Parameter | Role | Effect of Value |
|---|---|---|
| Temperature | Controls randomness of output | Low → precise and consistent; High → diverse and creative |
| Top P | Limits candidate tokens by cumulative probability | Low → conservative; High → diverse |
| Top K | Selects from the top K candidate tokens | Small → conservative; Large → diverse |
| Max Tokens | Maximum number of output tokens | Affects cost and response length |
| Stop Sequences | String(s) that halt generation | Useful for controlling output format |
Exam Tip: "Tasks where accuracy matters (code generation, fact-based answers)" → Low Temperature "Tasks where creativity matters (brainstorming, story writing)" → High Temperature This judgment call comes up constantly.
3.3 RAG (Retrieval-Augmented Generation) Architecture
★ What the Exam Tests
The exam heavily focuses on RAG's architecture, the role of each component, and vector database selection. Build patterns using Amazon Bedrock Knowledge Bases are one of the most important topics.
What Is RAG?
RAG (Retrieval-Augmented Generation) is an architecture that retrieves relevant information from external data sources and passes it as context to an LLM to generate a response.
It reduces hallucination (plausible-sounding but incorrect answers) — a known weakness of standalone LLMs — and enables accurate responses grounded in up-to-date internal data.
RAG Architecture (Processing Flow)
┌──────────────────────────────────────────────────────────────┐
│ RAG Processing Flow │
│ │
│ User Question │
│ ↓ │
│ ① Vectorize the question using an Embedding model │
│ ↓ │
│ ② Similarity search in the vector DB (semantic search) │
│ ↓ │
│ ③ Retrieve relevant documents (chunks) │
│ ↓ │
│ ④ Inject retrieved info + original question into prompt │
│ ↓ │
│ ⑤ LLM generates a response │
│ ↓ │
│ Return answer to user │
└──────────────────────────────────────────────────────────────┘
Data Ingestion Pipeline
┌──────────────────────────────────────────────────────────────┐
│ Data Ingestion Pipeline │
│ │
│ Data Sources (S3, Web Crawler, etc.) │
│ ↓ │
│ ① Load and parse documents │
│ ↓ │
│ ② Chunking (split documents into smaller units) │
│ ↓ │
│ ③ Vectorize using an Embedding model │
│ ↓ │
│ ④ Store in vector DB │
└──────────────────────────────────────────────────────────────┘
Chunking Strategies (Frequently Tested)
| Strategy | Description | Best For |
|---|---|---|
| Fixed-size | Mechanically split by a set token count | Simple, fast, general-purpose |
| Semantic | Split by meaningful units | When semantic coherence is important |
| Hierarchical | Split into parent and child chunks | When both broad context and fine detail are needed |
| Overlapping | Overlap chunk boundaries | When you want to prevent information loss at boundaries |
Exam Tip: Chunks too large → more noise, lower accuracy, higher cost Chunks too small → context is lost, meaningful answers become impossible Expect questions about "choosing the right chunk size."
Vector Database Options
| Service | Characteristics | Exam Role |
|---|---|---|
| Amazon OpenSearch Serverless | Serverless, hybrid full-text + vector search | Most frequently tested. Appears most often in RAG questions |
| Amazon Aurora PostgreSQL (pgvector) | Adds vector search to a relational DB | When leveraging an existing RDB |
| Amazon Neptune | Graph DB + vector search | Combined with knowledge graphs |
| Pinecone | Third-party, purpose-built for vector search | Can connect from Bedrock Knowledge Bases |
| Redis Enterprise Cloud | High-speed in-memory + vector search | Low-latency requirements |
Amazon Bedrock Knowledge Bases Configuration
Amazon Bedrock Knowledge Bases is a service that lets you build the RAG pipeline described above as a fully managed solution.
Supported Data Sources:
- Amazon S3 (most common)
- Web Crawler
- Confluence
- SharePoint
- Salesforce
Key Configuration Options:
- Choice of Embedding model (Titan Embeddings, etc.)
- Choice of chunking strategy
- Choice of vector DB
- Metadata filtering settings
Exam Tip: "Build a chatbot that returns accurate answers using internal documents" → RAG (Bedrock Knowledge Bases) is the go-to answer. Make sure you understand why RAG is preferred over fine-tuning (data freshness, cost, ease of implementation).
RAG vs. Fine-Tuning (A Very Frequently Tested Comparison)
| Aspect | RAG | Fine-Tuning |
|---|---|---|
| Purpose | Improve answer accuracy by referencing external knowledge | Change model behavior or style |
| Data freshness | Can reference the latest data in real time | Depends on data available at training time |
| Cost | Increased tokens at inference (added context) | Training cost (GPU time) required |
| Implementation complexity | Relatively simple (especially with Bedrock Knowledge Bases) | Requires data prep, training, and evaluation |
| Best use cases | Internal FAQs, knowledge search, referencing current info | Specific tone/format, learning domain-specific terminology |
3.4 Fine-Tuning and Model Customization
★ What the Exam Tests
You'll be tested on the types of fine-tuning, their use cases, cost trade-offs, and when to use fine-tuning vs. RAG.
Comparison of Customization Approaches
| Approach | Cost | Effectiveness | When to Apply |
|---|---|---|---|
| Prompt Engineering | Lowest | Limited | Always try this first |
| RAG | Moderate | Highly effective for knowledge expansion | When external knowledge retrieval is needed |
| Fine-Tuning | High | Highly effective for changing model behavior | Specialized tasks in a specific domain |
| Continued Pre-Training | Highest | Fundamentally adds domain knowledge | Adding a new language or specialized field |
Exam Tip: When the question says "most cost-efficient" or "what should you try first," the correct answer pattern is: Prompt Engineering → RAG → Fine-Tuning, in that order.
The Fine-Tuning Process
- Prepare training data: Create input–output pairs in JSONL format
- Upload data to S3
- Create a customization job in Bedrock
- Train the model (Provisioned Throughput required)
- Evaluate the custom model
- Deploy and use
When to Choose Fine-Tuning
- You want to teach the model a specific response style or tone
- You need the model to understand industry-specific terminology or abbreviations
- You need consistent output in a specific format (JSON, XML, etc.)
- Prompt engineering and RAG don't achieve sufficient accuracy
3.5 Building AI Agents
★ What the Exam Tests
You'll be tested on how Amazon Bedrock Agents work, integrating actions with Knowledge Bases, and connecting to Lambda functions.
What Are Amazon Bedrock Agents?
Bedrock Agents enable an LLM to interact with external APIs and data sources to autonomously execute multi-step tasks.
Agent Components
| Component | Description |
|---|---|
| Foundation Model | The LLM that serves as the agent's "brain" |
| Instructions | A prompt defining the agent's role and constraints |
| Action Groups | External operations the agent can perform (implemented via Lambda functions) |
| Knowledge Bases | Internal data the agent can reference (RAG) |
Agent Execution Flow
User question
↓
Agent analyzes the question (orchestration)
↓
Selects actions as needed
├→ Search Knowledge Base → retrieve relevant info
├→ Invoke Lambda function → external API/DB operation
└→ Additional reasoning needed → query model again
↓
Generate final answer and return to user
Defining Action Groups
Action groups are defined using an OpenAPI schema and linked to a backend Lambda function.
# Example OpenAPI schema
paths:
/getOrderStatus:
get:
summary: "Get the status of an order"
parameters:
- name: orderId
description: "Order ID"
required: true
Exam Tip: "Retrieve or update data from an external system based on a user's question" → Bedrock Agents + Action Groups (Lambda) is the correct answer pattern.
3.6 Security and Responsible AI
★ What the Exam Tests
You'll be tested on configuring Guardrails, ensuring data privacy, IAM-based access control, and harmful content filtering.
Amazon Bedrock Guardrails
| Feature | Description |
|---|---|
| Content Filters | Detect and block violent, sexual, or discriminatory content |
| Denied Topics | Refuse to respond to specific topics |
| Word Filters | Block specific words or phrases |
| PII Detection | Detect and mask personally identifiable information |
| Contextual Grounding | Hallucination detection (verifying alignment with source) |
Security Best Practices
- Least privilege with IAM: Restrict Bedrock model access to the minimum necessary
- Use VPC endpoints: Access Bedrock without going through the public internet
- Encrypt data: At rest (KMS) and in transit (TLS)
- Audit with CloudTrail: Log all API calls
- Model invocation logging: Record inputs and outputs (S3 / CloudWatch Logs)
Responsible AI Principles (Tested Points)
- Fairness: Detecting and mitigating bias
- Explainability: Being able to present the reasoning behind model decisions
- Privacy: Handling personal data appropriately
- Safety: Preventing harmful outputs
- Transparency: Disclosing when content is AI-generated
Exam Tip: "How do you prevent data leakage when processing data that may contain PII with an LLM?" → Bedrock Guardrails (PII detection and masking) is the correct answer.
3.7 AgentCore (New Service: Announced 2025)
★ What the Exam Tests
Since AgentCore is a relatively new service, the exam focuses on understanding its basic positioning and key components rather than deep technical details. In some questions, AgentCore may be the ideal answer but not appear as an option — a solid grasp of the overview is enough to handle those.
What Is AgentCore?
AgentCore is a managed production infrastructure service that supports the shift from "calling a model" to "operating autonomous agents".
Before AgentCore, the dominant pattern was "call an LLM, get a response (+ RAG)." In 2025, the paradigm fundamentally shifted toward building agents that plan, execute, learn, and act autonomously.
The "6 Challenges" from POC to Production
AgentCore was built to address the following challenges:
| Challenge | Description |
|---|---|
| ① Accuracy | Real users don't behave the way demos assume |
| ② Scalability | Supporting many users across many domains |
| ③ Memory | Safe memory management across users and agents |
| ④ Security | Access control to production systems and real data |
| ⑤ Cost | Controlling inference token and hosting costs |
| ⑥ Observability | Real-time visibility into agent behavior |