~ Proving Your AWS Skills in the Generative AI Era ~


Table of Contents

  1. Introduction: What This Certification Means and How to Approach It
  2. Exam Overview and Domains
  3. Key Knowledge by Domain
    • 3.1 Leveraging and Selecting Foundation Models
    • 3.2 Prompt Engineering
    • 3.3 RAG (Retrieval-Augmented Generation) Architecture
    • 3.4 Fine-Tuning and Model Customization
    • 3.5 Building AI Agents
    • 3.6 Security and Responsible AI
    • 3.7 AgentCore (New Service)
  4. What You'll Be Able to Do in Practice
  5. Study Methods and Resource Guide
  6. Pre-Exam Checklist

1. Introduction: What This Certification Means and How to Approach It

This document was created as a study guide for the AWS Certified Generative AI Developer – Professional exam (commonly referred to as AIP).

What This Certification Proves

This certification demonstrates your skills in developing and optimizing generative AI applications on AWS. As of 2026, demand for engineers with hands-on generative AI experience is growing rapidly. Earning this certification clearly signals to the market that you can work with AWS's generative AI stack at a professional level.

Core Study Philosophy

Advice: This exam tests understanding, not memorization. Always keep asking yourself "Why does this service exist?" and "What problem does it solve?" as you study.


2. Exam Overview and Domains

Item Details
Exam Name AWS Certified Generative AI Developer – Professional
Official Launch April 2026 onward (beta prior to that)
Question Format Multiple choice (single and multiple answer)
Exam Duration 180 minutes
Passing Score 750 / 1000 (scaled score)
Exam Fee ¥44,000 (tax included)

Main Exam Domains

Domain Key Themes
Domain 1 Selecting and leveraging foundation models
Domain 2 Prompt engineering
Domain 3 Designing and building RAG architectures
Domain 4 Fine-tuning and customization
Domain 5 Building AI agents
Domain 6 Security, governance, and responsible AI

3. Key Knowledge by Domain


3.1 Leveraging and Selecting Foundation Models

★ What the Exam Tests

You'll need to understand the characteristics of Foundation Models (FMs) accessible through Amazon Bedrock and be able to select the optimal model for a given use case.

Models You Need to Know

Model Family Provider Key Characteristics / Strengths
Claude Anthropic Long-form comprehension, logical reasoning, safety-focused
Titan Amazon Text generation, embeddings, image generation. AWS-native
Llama Meta Open-source lineage, highly customizable
Mistral Mistral AI Lightweight and fast, cost-efficient
Stable Diffusion Stability AI Specialized for image generation
Command/Embed Cohere Strong at text generation and embeddings

Selection Criteria (Frequently Tested)

Factors to consider when selecting a model:

  1. Task type: Text generation, summarization, code generation, image generation, embeddings, etc.
  2. Accuracy requirements: Is high precision required, or is "good enough" acceptable?
  3. Latency requirements: Is real-time response needed?
  4. Cost: Input/output token pricing, inference costs
  5. Context window: Maximum number of tokens the model can accept as input
  6. Multimodal support: Is combined text + image processing needed?

Exam Tip: Expect many questions about the trade-off between "cost optimization" and "accuracy." For questions asking "What is the most cost-efficient approach?", the correct answer usually involves starting with a smaller model and scaling up only if needed.

Key Amazon Bedrock Features

Feature Description
Model Access API-based access to FMs from multiple providers
Playground GUI-based test environment to try out models
Knowledge Bases Managed service for building RAG pipelines
Agents Autonomous task execution with external tool integration
Guardrails Filtering for harmful content
Model Evaluation Performance comparison across models
Customization Fine-tuning and continued pre-training

3.2 Prompt Engineering

★ What the Exam Tests

You'll be tested on the name, characteristics, and appropriate use of each prompting technique. Questions in the format "Which prompting technique is best for this situation?" appear frequently.

Core Prompting Techniques

Zero-Shot Prompting

Giving instructions only, with no examples. Relies entirely on the model's pre-trained knowledge.

Please summarize the following text in 3 lines:
[text]

When to use: Simple tasks where the model's general capabilities are sufficient

Few-Shot Prompting

Providing a few input–output examples before presenting the actual task.

Review: "This product is amazing!" → Sentiment: Positive
Review: "It broke and is unusable." → Sentiment: Negative
Review: "It's okay." → Sentiment:

When to use: When you need the model to follow a specific format or classification criteria

Chain-of-Thought (CoT) Prompting

A technique that guides the model through a step-by-step reasoning process. Add instructions like "Think through this step by step."

Problem: A store has 12 apples. 8 are sold, then 5 more arrive.
How many apples are there? Think through this step by step.

When to use: Complex tasks requiring mathematical reasoning or logical thinking

System Prompts

A prompt that defines the model's role, constraints, and behavior. This part is not visible to the end user.

You are an AWS technical support engineer.
Only answer questions related to AWS services.
Keep your responses under 200 characters.

When to use: Any application where consistent response quality needs to be maintained

Prompt Optimization Best Practices

Practice Description
Be specific Avoid vague instructions; explicitly state output format, length, and tone
Use delimiters Separate input sections with XML tags or dividers
Iterate Don't aim for perfection on the first try; test and refine repeatedly
Use negative instructions Constraints like "Do not…" are also effective
Tune temperature Low temperature = deterministic; high temperature = creative

Inference Parameters (Frequently Tested)

Parameter Role Effect of Value
Temperature Controls randomness of output Low → precise and consistent; High → diverse and creative
Top P Limits candidate tokens by cumulative probability Low → conservative; High → diverse
Top K Selects from the top K candidate tokens Small → conservative; Large → diverse
Max Tokens Maximum number of output tokens Affects cost and response length
Stop Sequences String(s) that halt generation Useful for controlling output format

Exam Tip: "Tasks where accuracy matters (code generation, fact-based answers)" → Low Temperature "Tasks where creativity matters (brainstorming, story writing)" → High Temperature This judgment call comes up constantly.


3.3 RAG (Retrieval-Augmented Generation) Architecture

★ What the Exam Tests

The exam heavily focuses on RAG's architecture, the role of each component, and vector database selection. Build patterns using Amazon Bedrock Knowledge Bases are one of the most important topics.

What Is RAG?

RAG (Retrieval-Augmented Generation) is an architecture that retrieves relevant information from external data sources and passes it as context to an LLM to generate a response.

It reduces hallucination (plausible-sounding but incorrect answers) — a known weakness of standalone LLMs — and enables accurate responses grounded in up-to-date internal data.

RAG Architecture (Processing Flow)

┌──────────────────────────────────────────────────────────────┐
│                    RAG Processing Flow                        │
│                                                              │
│  User Question                                               │
│      ↓                                                       │
│  ① Vectorize the question using an Embedding model           │
│      ↓                                                       │
│  ② Similarity search in the vector DB (semantic search)      │
│      ↓                                                       │
│  ③ Retrieve relevant documents (chunks)                      │
│      ↓                                                       │
│  ④ Inject retrieved info + original question into prompt     │
│      ↓                                                       │
│  ⑤ LLM generates a response                                  │
│      ↓                                                       │
│  Return answer to user                                       │
└──────────────────────────────────────────────────────────────┘

Data Ingestion Pipeline

┌──────────────────────────────────────────────────────────────┐
│               Data Ingestion Pipeline                         │
│                                                              │
│  Data Sources (S3, Web Crawler, etc.)                        │
│      ↓                                                       │
│  ① Load and parse documents                                  │
│      ↓                                                       │
│  ② Chunking (split documents into smaller units)             │
│      ↓                                                       │
│  ③ Vectorize using an Embedding model                        │
│      ↓                                                       │
│  ④ Store in vector DB                                        │
└──────────────────────────────────────────────────────────────┘

Chunking Strategies (Frequently Tested)

Strategy Description Best For
Fixed-size Mechanically split by a set token count Simple, fast, general-purpose
Semantic Split by meaningful units When semantic coherence is important
Hierarchical Split into parent and child chunks When both broad context and fine detail are needed
Overlapping Overlap chunk boundaries When you want to prevent information loss at boundaries

Exam Tip: Chunks too large → more noise, lower accuracy, higher cost Chunks too small → context is lost, meaningful answers become impossible Expect questions about "choosing the right chunk size."

Vector Database Options

Service Characteristics Exam Role
Amazon OpenSearch Serverless Serverless, hybrid full-text + vector search Most frequently tested. Appears most often in RAG questions
Amazon Aurora PostgreSQL (pgvector) Adds vector search to a relational DB When leveraging an existing RDB
Amazon Neptune Graph DB + vector search Combined with knowledge graphs
Pinecone Third-party, purpose-built for vector search Can connect from Bedrock Knowledge Bases
Redis Enterprise Cloud High-speed in-memory + vector search Low-latency requirements

Amazon Bedrock Knowledge Bases Configuration

Amazon Bedrock Knowledge Bases is a service that lets you build the RAG pipeline described above as a fully managed solution.

Supported Data Sources:

  • Amazon S3 (most common)
  • Web Crawler
  • Confluence
  • SharePoint
  • Salesforce

Key Configuration Options:

  • Choice of Embedding model (Titan Embeddings, etc.)
  • Choice of chunking strategy
  • Choice of vector DB
  • Metadata filtering settings

Exam Tip: "Build a chatbot that returns accurate answers using internal documents" → RAG (Bedrock Knowledge Bases) is the go-to answer. Make sure you understand why RAG is preferred over fine-tuning (data freshness, cost, ease of implementation).

RAG vs. Fine-Tuning (A Very Frequently Tested Comparison)

Aspect RAG Fine-Tuning
Purpose Improve answer accuracy by referencing external knowledge Change model behavior or style
Data freshness Can reference the latest data in real time Depends on data available at training time
Cost Increased tokens at inference (added context) Training cost (GPU time) required
Implementation complexity Relatively simple (especially with Bedrock Knowledge Bases) Requires data prep, training, and evaluation
Best use cases Internal FAQs, knowledge search, referencing current info Specific tone/format, learning domain-specific terminology

3.4 Fine-Tuning and Model Customization

★ What the Exam Tests

You'll be tested on the types of fine-tuning, their use cases, cost trade-offs, and when to use fine-tuning vs. RAG.

Comparison of Customization Approaches

Approach Cost Effectiveness When to Apply
Prompt Engineering Lowest Limited Always try this first
RAG Moderate Highly effective for knowledge expansion When external knowledge retrieval is needed
Fine-Tuning High Highly effective for changing model behavior Specialized tasks in a specific domain
Continued Pre-Training Highest Fundamentally adds domain knowledge Adding a new language or specialized field

Exam Tip: When the question says "most cost-efficient" or "what should you try first," the correct answer pattern is: Prompt Engineering → RAG → Fine-Tuning, in that order.

The Fine-Tuning Process

  1. Prepare training data: Create input–output pairs in JSONL format
  2. Upload data to S3
  3. Create a customization job in Bedrock
  4. Train the model (Provisioned Throughput required)
  5. Evaluate the custom model
  6. Deploy and use

When to Choose Fine-Tuning

  • You want to teach the model a specific response style or tone
  • You need the model to understand industry-specific terminology or abbreviations
  • You need consistent output in a specific format (JSON, XML, etc.)
  • Prompt engineering and RAG don't achieve sufficient accuracy

3.5 Building AI Agents

★ What the Exam Tests

You'll be tested on how Amazon Bedrock Agents work, integrating actions with Knowledge Bases, and connecting to Lambda functions.

What Are Amazon Bedrock Agents?

Bedrock Agents enable an LLM to interact with external APIs and data sources to autonomously execute multi-step tasks.

Agent Components

Component Description
Foundation Model The LLM that serves as the agent's "brain"
Instructions A prompt defining the agent's role and constraints
Action Groups External operations the agent can perform (implemented via Lambda functions)
Knowledge Bases Internal data the agent can reference (RAG)

Agent Execution Flow

User question
    ↓
Agent analyzes the question (orchestration)
    ↓
Selects actions as needed
    ├→ Search Knowledge Base → retrieve relevant info
    ├→ Invoke Lambda function → external API/DB operation
    └→ Additional reasoning needed → query model again
    ↓
Generate final answer and return to user

Defining Action Groups

Action groups are defined using an OpenAPI schema and linked to a backend Lambda function.

# Example OpenAPI schema
paths:
  /getOrderStatus:
    get:
      summary: "Get the status of an order"
      parameters:
        - name: orderId
          description: "Order ID"
          required: true

Exam Tip: "Retrieve or update data from an external system based on a user's question" → Bedrock Agents + Action Groups (Lambda) is the correct answer pattern.


3.6 Security and Responsible AI

★ What the Exam Tests

You'll be tested on configuring Guardrails, ensuring data privacy, IAM-based access control, and harmful content filtering.

Amazon Bedrock Guardrails

Feature Description
Content Filters Detect and block violent, sexual, or discriminatory content
Denied Topics Refuse to respond to specific topics
Word Filters Block specific words or phrases
PII Detection Detect and mask personally identifiable information
Contextual Grounding Hallucination detection (verifying alignment with source)

Security Best Practices

  1. Least privilege with IAM: Restrict Bedrock model access to the minimum necessary
  2. Use VPC endpoints: Access Bedrock without going through the public internet
  3. Encrypt data: At rest (KMS) and in transit (TLS)
  4. Audit with CloudTrail: Log all API calls
  5. Model invocation logging: Record inputs and outputs (S3 / CloudWatch Logs)

Responsible AI Principles (Tested Points)

  • Fairness: Detecting and mitigating bias
  • Explainability: Being able to present the reasoning behind model decisions
  • Privacy: Handling personal data appropriately
  • Safety: Preventing harmful outputs
  • Transparency: Disclosing when content is AI-generated

Exam Tip: "How do you prevent data leakage when processing data that may contain PII with an LLM?" → Bedrock Guardrails (PII detection and masking) is the correct answer.


3.7 AgentCore (New Service: Announced 2025)

★ What the Exam Tests

Since AgentCore is a relatively new service, the exam focuses on understanding its basic positioning and key components rather than deep technical details. In some questions, AgentCore may be the ideal answer but not appear as an option — a solid grasp of the overview is enough to handle those.

What Is AgentCore?

AgentCore is a managed production infrastructure service that supports the shift from "calling a model" to "operating autonomous agents".

Before AgentCore, the dominant pattern was "call an LLM, get a response (+ RAG)." In 2025, the paradigm fundamentally shifted toward building agents that plan, execute, learn, and act autonomously.

The "6 Challenges" from POC to Production

AgentCore was built to address the following challenges:

Challenge Description
① Accuracy Real users don't behave the way demos assume
② Scalability Supporting many users across many domains
③ Memory Safe memory management across users and agents
④ Security Access control to production systems and real data
⑤ Cost Controlling inference token and hosting costs
⑥ Observability Real-time visibility into agent behavior

AgentCore's 7 Key Components