Free AWS Certified Generative AI Developer Exam Practice
Explore real-world practice questions designed to test your knowledge of AWS generative AI applications, including model selection, RAG systems, prompt management, security controls, cost optimization, and agentic AI workflows. This lesson helps you prepare for the AWS Certified Generative AI Developer exam by simulating exam conditions through focused, scenario-based problems.
Question 1:
A financial services company is building a real-time GenAI application that analyzes short market alerts and generates risk summaries for traders during market hours. The application processes approximately 25,000 alerts per day, with peak bursts of 1,000 requests per minute. Traders require responses within 250 ms p95 latency, and the company must minimize inference cost while ensuring high factual accuracy and availability across Regions. The solution must allow future flexibility to switch models without application code changes.
Which solution best meets these requirements?
A. Implement Amazon Bedrock model invocation behind AWS Lambda with dynamic model selection configured through AWS AppConfig, enable Cross-Region Inference, and use a lower-cost model by default with automatic fallback to a higher-capability model for complex alerts.
B. Deploy a fine-tuned large language model on Amazon SageMaker AI real-time endpoints using GPU instances and provision capacity for peak traffic to ensure consistent latency.
C. Use Amazon Bedrock with a single high-capability model configured with provisioned throughput sized for peak load and store all generated summaries in Amazon DynamoDB for reuse.
D. Use Amazon Bedrock Knowledge Bases backed by Amazon OpenSearch Service to retrieve historical alerts and generate summaries dynamically for every request.
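The routing idea behind option A can be sketched as follows. This is a minimal illustration only: the model IDs are illustrative, the length-based complexity heuristic is a hypothetical stand-in for a real classifier (or an AppConfig-driven rule), and the Converse call assumes a boto3 `bedrock-runtime` client.

```python
# Illustrative model IDs; real IDs depend on the Region and the models enabled.
DEFAULT_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"
FALLBACK_MODEL = "anthropic.claude-3-sonnet-20240229-v1:0"

def choose_model(alert_text: str, complexity_threshold: int = 400) -> str:
    """Route short, routine alerts to the cheaper model and longer,
    multi-instrument alerts to the higher-capability model.
    The length heuristic is a stand-in for a real complexity check."""
    return FALLBACK_MODEL if len(alert_text) > complexity_threshold else DEFAULT_MODEL

def summarize(bedrock_runtime, alert_text: str) -> str:
    """Invoke the selected model via the Bedrock Converse API.
    `bedrock_runtime` is a boto3 client for the "bedrock-runtime" service."""
    response = bedrock_runtime.converse(
        modelId=choose_model(alert_text),
        messages=[{"role": "user",
                   "content": [{"text": f"Summarize the risk in this alert:\n{alert_text}"}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]
```

In practice the threshold and both model IDs would come from AWS AppConfig, so the governance team can retarget models without a code deployment.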
Question 2:
A global consulting firm is building a RAG-powered knowledge assistant for 3,000 consultants. The system must retrieve relevant content from over 12 million documents across policy manuals, client reports, and research papers stored in Amazon S3. The application must support semantic search, metadata filtering by region and document type, and return top-5 results within 500 ms. The firm wants to minimize operational overhead while ensuring retrieval accuracy at scale.
Which design choices best meet these requirements? (Select three.)
A. Organize the vector store into multiple indexes segmented by document domain to reduce search scope and improve query performance at scale.
B. Implement Amazon Aurora PostgreSQL with pgvector and store all embeddings in a single table with region and document type columns.
C. Use Amazon Bedrock Knowledge Bases backed by Amazon OpenSearch Serverless with hybrid search enabled.
D. Store embeddings in Amazon DynamoDB and calculate cosine similarity in AWS Lambda for each query.
E. Apply a hierarchical metadata framework using document type, region, and update timestamp as filterable attributes.
F. Use fixed-size chunking only, regardless of document structure, to simplify ingestion.
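Metadata filtering (option E) combined with Knowledge Bases retrieval looks roughly like the sketch below. The filter keys (`region`, `document_type`) are assumptions about how documents were tagged at ingestion; the call shape follows the Knowledge Bases Retrieve API with a boto3 `bedrock-agent-runtime` client.

```python
def build_retrieval_config(region: str, doc_type: str, top_k: int = 5) -> dict:
    """Build a vector search configuration with metadata filters,
    following the filter shape accepted by the Retrieve API."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": {
                "andAll": [
                    {"equals": {"key": "region", "value": region}},
                    {"equals": {"key": "document_type", "value": doc_type}},
                ]
            },
        }
    }

def retrieve_top_chunks(agent_runtime, kb_id: str, query: str, region: str, doc_type: str):
    """`agent_runtime` is a boto3 client for "bedrock-agent-runtime"."""
    response = agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration=build_retrieval_config(region, doc_type),
    )
    return response["retrievalResults"]
```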
Question 4:
A multinational pharmaceutical company is building an enterprise GenAI research assistant to support drug discovery and regulatory analysis. The system must ingest over 8 million documents from internal research reports, clinical trial summaries, and regulatory submissions stored in Amazon S3 across multiple Regions. Scientists query the system to retrieve grounded answers with citations, and responses must not expose patient PII or unapproved research content.
The company uses Amazon Bedrock for generation, Amazon OpenSearch Service for vector search, and AWS Lambda for orchestration. Regulatory requirements mandate full auditability of prompt usage, prompt versioning with approvals, and traceability of data sources used in responses. The system must support frequent prompt updates from a central AI governance team without disrupting downstream applications.
Which solution best ensures controlled prompt updates, auditability, and safe reuse across teams?
A. Store prompts in Amazon S3 with versioned objects and use Lambda environment variables to load prompts at runtime.
B. Use Amazon Bedrock Prompt Management with parameterized templates, approval workflows, and CloudTrail logging.
C. Embed prompts directly in application code repositories and manage changes through CI/CD pipelines.
D. Store prompts in Amazon DynamoDB and track changes using DynamoDB Streams.
E. Cache generated responses in DynamoDB to avoid repeated retrieval and generation.
Question 5:
A SaaS analytics company is building a GenAI feature that performs multi-step financial analysis and explains reasoning to customers. The solution must support reusable prompt components, conditional logic based on intermediate outputs, and consistent formatting across multiple applications. All prompt changes must be reviewed, versioned, and tested for regression before production rollout.
Which solution best meets these requirements?
A. Use Amazon Bedrock Prompt Flows to implement chained prompts with conditional branching, integrate Prompt Management for versioning and approvals, and validate outputs using AWS Step Functions.
B. Store all prompts in Amazon S3 and orchestrate multi-step reasoning using AWS Lambda with hardcoded logic.
C. Use a single large prompt with chain-of-thought instructions embedded directly in application code.
D. Implement prompt logic in DynamoDB and use Lambda to assemble prompts dynamically at runtime.
Question 6:
A SaaS customer support platform is integrating a conversational AI assistant to handle customer chats during peak hours. The assistant is built on Amazon Bedrock and must support real-time streaming responses to users with an end-to-end latency target of under 300 ms for the first token. Traffic is highly spiky, with bursts of up to 5,000 concurrent sessions during promotional events, followed by long idle periods. The company wants to minimize costs during idle periods while avoiding manual capacity management.
Which deployment strategy best meets these requirements?
A. Use Amazon Bedrock provisioned throughput with a fixed capacity sized for peak traffic and expose the model through Amazon API Gateway REST APIs.
B. Deploy the model to an Amazon SageMaker AI real-time endpoint with GPU instances and configure auto scaling based on invocation metrics.
C. Invoke Amazon Bedrock models from AWS Lambda using the Bedrock streaming API and rely on on-demand throughput for elastic scaling.
D. Deploy a containerized inference service on Amazon ECS with AWS Fargate and expose streaming responses through an Application Load Balancer.
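Streaming from Lambda (option C) is typically built on the Converse streaming API. A minimal sketch, assuming a boto3 `bedrock-runtime` client, that also measures time-to-first-token against the 300 ms target:

```python
import time

def stream_first_token(bedrock_runtime, model_id: str, prompt: str):
    """Stream a response and record time-to-first-token in milliseconds.
    `bedrock_runtime` is a boto3 "bedrock-runtime" client."""
    start = time.monotonic()
    response = bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    first_token_ms = None
    chunks = []
    for event in response["stream"]:
        # Text arrives incrementally in contentBlockDelta events.
        if "contentBlockDelta" in event:
            if first_token_ms is None:
                first_token_ms = (time.monotonic() - start) * 1000
            chunks.append(event["contentBlockDelta"]["delta"].get("text", ""))
    return "".join(chunks), first_token_ms
```

In a real Lambda handler the chunks would be flushed to the client as they arrive (for example through Lambda response streaming) rather than joined at the end.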
Question 7:
A logistics company is building an agentic AI system to handle shipment exception resolution. The agent must reason over shipment data, call internal pricing and inventory APIs, and escalate to a human operator if confidence is low. The solution must support multi-step reasoning using the ReAct pattern, validate tool parameters before execution, and maintain short-term conversational state across steps. The company wants to minimize custom orchestration code while ensuring reliability.
Which two approaches best meet these requirements?
A. Store the agent state in Amazon S3 objects between each reasoning step.
B. Use Amazon Bedrock Agents with built-in tool definitions and enable function calling for internal APIs.
C. Deploy a stateless MCP server using AWS Lambda to expose internal APIs as tools with schema validation.
D. Implement AWS Step Functions to orchestrate reasoning steps, tool calls, and human-in-the-loop approval stages.
E. Use Amazon ECS with AWS Fargate to host a custom orchestration service that manages reasoning logic and tool execution.
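The "validate tool parameters before execution" requirement can be sketched as below. The tool name (`reprice_shipment`), its schema, and the Lambda handler are all hypothetical; the point is that an MCP server or a Bedrock Agents action-group handler rejects malformed calls before touching internal APIs.

```python
# Schema for a hypothetical "reprice_shipment" tool.
TOOL_SCHEMA = {
    "shipment_id": str,
    "new_service_level": str,
}
ALLOWED_SERVICE_LEVELS = {"standard", "express", "overnight"}

def validate_tool_params(params: dict) -> list:
    """Return a list of validation errors; an empty list means the call may proceed."""
    errors = []
    for key, expected_type in TOOL_SCHEMA.items():
        if key not in params:
            errors.append(f"missing required parameter: {key}")
        elif not isinstance(params[key], expected_type):
            errors.append(f"{key} must be {expected_type.__name__}")
    if not errors and params["new_service_level"] not in ALLOWED_SERVICE_LEVELS:
        errors.append("new_service_level must be one of "
                      + ", ".join(sorted(ALLOWED_SERVICE_LEVELS)))
    return errors

def lambda_handler(event, context):
    """Hypothetical Lambda entry point exposing the tool."""
    errors = validate_tool_params(event.get("parameters", {}))
    if errors:
        return {"status": "rejected", "errors": errors}
    # ... call the internal pricing API here ...
    return {"status": "accepted"}
```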
Question 9:
An agentic AI system integrates with multiple internal tools and services using AWS-native components. The architecture must support secure access, observability, and graceful degradation when downstream systems fail.
Match each system requirement with the most appropriate AWS component or pattern.
Requirements:
- Enforce validated, least-privilege access to tools.
- Support both short-running and long-running tool executions.
- Provide end-to-end request tracing across agent workflows.
- Handle downstream failures with retries and fallback behavior.
Components and patterns:
- MCP servers implemented with AWS Lambda and Amazon ECS
- AWS X-Ray distributed tracing
- AWS Step Functions with retries and timeouts
- Direct API exposure of internal services
Question 10:
An insurance company is deploying a multi-agent GenAI system to process claims. One agent handles document analysis, another evaluates policy rules, and a third generates customer-facing explanations. The system must continue operating even if one agent or tool becomes unavailable, returning partial but safe responses within 2 seconds. The architecture uses Amazon Bedrock Agents, AWS Step Functions, and Amazon DynamoDB for state.
Which design best meets these resilience requirements?
A. Orchestrate agents using AWS Step Functions with parallel states, timeouts, and fallback paths that skip failed agents.
B. Chain all agents sequentially inside a single Bedrock prompt and rely on the model to reason about failures.
C. Deploy each agent as a separate ECS service and coordinate them through synchronous REST calls.
D. Use a single agent with a very large prompt to handle all responsibilities.
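The behavior option A describes, parallel branches with timeouts and fallback paths, can be mimicked in plain Python to make the idea concrete. This is a local sketch of the orchestration pattern, not Step Functions itself; the agent names and the 2-second budget come from the scenario.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agents_with_fallback(agents: dict, claim: dict, timeout_s: float = 2.0) -> dict:
    """Run independent agents in parallel; a failed or slow agent is skipped
    and its slot filled with a safe fallback, mirroring Step Functions
    parallel states with per-branch timeouts and catch/fallback paths."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, claim) for name, fn in agents.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except Exception:
                # Timeout or agent error: degrade gracefully with a safe partial result.
                results[name] = {"status": "unavailable", "detail": "fallback response"}
    return results
```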
Question 11:
A health care SaaS company is deploying a GenAI-powered customer support chatbot that assists patients with appointment scheduling and general benefit questions. The chatbot uses Amazon Bedrock and must comply with HIPAA requirements by preventing the disclosure of protected health information (PHI) and blocking medical advice beyond approved guidelines. The company wants a managed solution that minimizes custom code while providing consistent enforcement across all prompts and responses. Risk tolerance is low because violations could result in regulatory penalties.
Which solution best meets these content safety requirements?
A. Add explicit safety instructions in the system prompt to instruct the model to avoid PHI and unapproved medical advice.
B. Use Amazon Comprehend Medical to scan responses for PHI after generation and redact detected entities before returning results to users.
C. Configure Amazon Bedrock Guardrails with PII filters, denied medical topics, and output word filters, and attach the guardrail to all InvokeModel calls.
D. Log all prompts and responses to Amazon S3 and perform offline audits for compliance violations.
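Attaching a guardrail to every invocation (option C) is a single parameter on the Converse call. A minimal sketch, assuming a boto3 `bedrock-runtime` client and an already-created guardrail with PII filters and denied medical topics:

```python
def invoke_with_guardrail(bedrock_runtime, model_id: str, guardrail_id: str,
                          guardrail_version: str, user_text: str):
    """Attach a Bedrock guardrail so PII filters and denied-topic policies
    are enforced consistently on both the input and the output."""
    return bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": user_text}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
    )
```

Because enforcement lives in the guardrail configuration rather than the prompt, the compliance team can tighten policies centrally without redeploying application code.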
Question 12:
A European financial services company is building a RAG application that answers analyst questions using internal reports stored in Amazon S3. The data contains personally identifiable information (PII) subject to GDPR and must remain within EU Regions. The application uses Amazon Bedrock, AWS Lambda, and Amazon OpenSearch Service. The company requires encryption at rest, strict access controls, and automated redaction of PII in model inputs and outputs.
Which two controls best meet these privacy and security requirements?
A. Use Amazon Macie to continuously scan S3 buckets and alert on unexpected PII exposure.
B. Configure Amazon Bedrock Guardrails with sensitive information filters to redact PII from outputs.
C. Enable VPC endpoints and AWS PrivateLink for Bedrock and OpenSearch access to avoid public internet exposure.
D. Store embeddings and documents in DynamoDB with on-demand encryption disabled for performance.
E. Apply AWS KMS-managed encryption on S3 buckets and restrict access using bucket policies scoped to IAM roles.
Question 14:
A multinational bank is rolling out a GenAI research assistant for internal analysts using Amazon Bedrock and RAG over proprietary market data. The bank has a centralized AI governance team and very low risk tolerance.
Which approach best supports transparency around model behavior, intended use, and limitations as part of responsible AI governance?
A. Generate SageMaker model cards documenting intended use, limitations, and evaluation results for the FM configuration.
B. Enable AWS CloudTrail and Bedrock invocation logging for all model calls to retain an audit trail of prompts and responses.
C. Attach S3 object URIs and AWS Glue catalog metadata as citations in each response so analysts can trace retrieved source documents.
D. Require prompt templates and retrieval settings to go through change approval workflows before deployment.
Question 15:
A media company operates a public GenAI content assistant that summarizes articles and answers reader questions. Attackers attempt prompt injection to force the model to reveal internal system instructions and moderation rules. The company must mitigate these attacks while maintaining a responsive user experience. The system uses Amazon API Gateway, AWS Lambda, Amazon Bedrock, and CloudWatch.
Which solution best mitigates prompt injection threats while balancing usability?
A. Increase model temperature to make injected instructions less likely to be followed.
B. Completely disable free-form user input and restrict queries to predefined templates only.
C. Implement input sanitization and intent validation in Lambda, enforce Amazon Bedrock Guardrails with prompt attack filters, and monitor violations using CloudWatch metrics.
D. Rely solely on post-generation filtering to remove leaked instructions.
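The Lambda-side input screen from option C can be as simple as a pattern check. The patterns below are illustrative; a first-pass heuristic like this complements, but does not replace, the Guardrails prompt-attack filters.

```python
import re

# Illustrative injection signatures; production lists would be broader
# and tuned against observed attack traffic.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"reveal .*(system prompt|instructions)",
    r"you are now",
]

def screen_input(user_text: str) -> bool:
    """Return True if the input looks safe to forward to the model.
    Runs in Lambda before the Bedrock call; blocked inputs can be
    counted as a CloudWatch metric for monitoring."""
    lowered = user_text.lower()
    return not any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)
```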
Question 16:
A large enterprise uses a GenAI-powered internal knowledge assistant built on Amazon Bedrock to answer employee questions. The application processes approximately 1.2 million requests per month, with an average prompt size of 3,000 tokens. Many requests include identical prompt prefixes, such as standard system instructions, reusable examples, and common reference context. Monthly inference costs have exceeded the target budget of $18,000, while latency requirements are moderate at under 1 second per response. The company wants to reduce costs by at least 30% without materially degrading response quality.
Which optimization strategy best meets these requirements?
A. Implement Amazon Bedrock prompt caching for repeated prompt prefixes such as system instructions, few-shot examples, and common retrieved context included in requests.
B. Switch all requests to the smallest available FM regardless of query complexity to reduce token cost.
C. Move the application to Amazon SageMaker AI real-time endpoints with provisioned GPU instances.
D. Increase model temperature to reduce token usage by encouraging shorter responses.
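A back-of-the-envelope estimate shows why caching the shared prefix (option A) can clear the 30% target. The per-token price, the cached-prefix size, and the discount on cached reads below are all illustrative assumptions, not published pricing; only the request volume and prompt size come from the scenario.

```python
# Scenario numbers
requests_per_month = 1_200_000
prompt_tokens = 3_000
# Assumptions (illustrative): shared prefix size, price, and cache discount.
cached_prefix_tokens = 2_000          # assumed shared system prompt + examples
input_price_per_1k = 0.003            # illustrative on-demand input price, USD
cached_read_discount = 0.9            # assume cached tokens cost ~10% of normal

baseline = requests_per_month * prompt_tokens / 1_000 * input_price_per_1k
with_caching = requests_per_month / 1_000 * (
    (prompt_tokens - cached_prefix_tokens) * input_price_per_1k
    + cached_prefix_tokens * input_price_per_1k * (1 - cached_read_discount)
)
savings_pct = (baseline - with_caching) / baseline * 100
print(f"baseline ${baseline:,.0f}/mo, with caching ${with_caching:,.0f}/mo, "
      f"{savings_pct:.0f}% saved")
```

Under these assumptions roughly two-thirds of each prompt is cacheable, so input cost drops well past the 30% goal without touching response quality.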
Question 17:
An e-commerce platform uses a GenAI assistant to generate personalized product recommendations and explanations. The system must serve up to 3,000 concurrent users during peak sales events, with a strict latency SLA of 400 ms for the first token. The application uses Amazon Bedrock, AWS Lambda, and Amazon DynamoDB for user context. The team wants to improve performance without increasing monthly costs by more than 10%.
Which optimizations best meet these requirements? (Select three.)
A. Implement semantic caching using Amazon ElastiCache to store responses for similar recommendation queries.
B. Enable streaming responses from Amazon Bedrock to return the first token as soon as it is generated.
C. Batch multiple user requests together in AWS Lambda before invoking the FM.
D. Increase embedding dimensionality to improve retrieval accuracy at query time.
E. Pre-compute recommendations nightly and store results in Amazon DynamoDB.
F. Use Amazon CloudFront to cache all API responses regardless of personalization.
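The semantic-caching idea in option A can be illustrated with a toy in-memory cache. The similarity threshold is an assumption to tune; a production system would keep embeddings and responses in ElastiCache (Redis/Valkey) and use a Bedrock embedding model rather than raw vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Toy in-memory stand-in for an ElastiCache-backed semantic cache."""
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, query_embedding):
        """Return a cached response if a stored query is similar enough."""
        best, best_sim = None, 0.0
        for emb, response in self.entries:
            sim = cosine(query_embedding, emb)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, query_embedding, response):
        self.entries.append((query_embedding, response))
```

A cache hit skips the FM invocation entirely, which cuts both latency and token cost for near-duplicate recommendation queries.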
Question 19:
A retrieval-augmented generation (RAG) application built on Amazon Bedrock and Amazon OpenSearch Service must optimize cost, latency, and response quality under high query volume. The team implements multiple optimization and monitoring techniques to meet strict SLAs.
Match each optimization objective with the most effective technique.
Objectives:
- Detect abnormal cost or latency spikes.
- Improve perceived response latency.
- Continuously measure hallucination rates.
- Reduce inference cost for simple queries.
Techniques:
- Model cascading across foundation models
- Streaming responses from Amazon Bedrock
- LLM-as-a-judge evaluations publishing metrics to CloudWatch
- CloudWatch anomaly detection on token usage and latency
Question 20:
A global media company runs a GenAI content summarization service with highly variable traffic. The system must maintain p95 latency under 600 ms, keep monthly inference costs below $25,000, and ensure factual accuracy above 97%. Traffic spikes unpredictably during breaking news events. The company wants automated detection and response to performance or cost anomalies.
Which solution best meets these requirements?
A. Use CloudWatch anomaly detection on token usage and invocation latency, with EventBridge and AWS Lambda automation to route traffic among pre-approved model tiers whose evaluation baselines meet the factual-accuracy target.
B. Use AWS Cost Explorer and AWS Budgets to review weekly inference spending trends, and have the operations team manually shift traffic to lower-cost model variants when monthly cost projections exceed the budget target.
C. Use provisioned capacity sized for expected breaking-news peak traffic on the highest-accuracy model tier, and keep that capacity allocated year-round to avoid latency spikes and preserve consistent output quality.
D. Disable streaming responses and standardize all summarization requests through a single response path so monitoring baselines are easier to interpret, then tune concurrency settings to reduce latency variability during traffic surges.
Question 21:
A B2B SaaS company operates a GenAI assistant that summarizes customer support tickets for account managers. In production, managers report that summaries are fluent but occasionally omit critical facts, resulting in an estimated factual accuracy of 92%, below the required 97%. The system uses Amazon Bedrock with prompt templates stored in Amazon S3, and the team wants an automated, repeatable evaluation approach before rolling out prompt updates. Human review capacity is limited, but the explainability of evaluation results is important.
Which evaluation approach best meets these requirements?
A. Use traditional NLP metrics such as ROUGE and BLEU on generated summaries.
B. Enable Amazon CloudWatch metrics to track average response length and latency over time.
C. Collect ad-hoc human feedback from account managers in Amazon DynamoDB and manually review failures.
D. Use Amazon Bedrock Model Evaluations with LLM-as-a-judge to score relevance and factual accuracy against a golden dataset stored in Amazon S3.
Question 22:
A legal research platform uses a RAG architecture to answer compliance questions from internal policy documents. After a recent update, answer accuracy dropped from 96% to 89%, and hallucination frequency increased to 6%. Logs show normal latency, but developers suspect issues in retrieval quality rather than generation. The system uses Amazon Bedrock, Amazon OpenSearch Service, AWS Lambda, and Amazon CloudWatch.
Which two evaluation techniques best help isolate the root cause?
A. Analyze embedding drift by comparing vector similarity distributions before and after the update.
B. Measure retrieval relevance scores by comparing retrieved chunks to expected sources using a golden dataset in Amazon S3.
C. Use LLM-as-a-judge to separately score retrieved context quality and final answer accuracy.
D. Track average token usage and completion length in CloudWatch custom metrics.
E. Increase temperature temporarily to test model robustness.
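Option B's golden-dataset comparison reduces to a simple retrieval metric. A minimal sketch, assuming each golden item records the chunk IDs an ideal retriever would return and `retrieve_fn` wraps the live OpenSearch query:

```python
def recall_at_k(retrieved_ids, expected_ids, k=5):
    """Fraction of expected source chunks that appear in the top-k results."""
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(expected_ids)) / len(expected_ids)

def evaluate(golden_dataset, retrieve_fn, k=5):
    """golden_dataset: list of {"query": ..., "expected_chunk_ids": [...]}.
    retrieve_fn(query) returns ranked chunk IDs from the live retriever."""
    scores = [recall_at_k(retrieve_fn(item["query"]), item["expected_chunk_ids"], k)
              for item in golden_dataset]
    return sum(scores) / len(scores)
```

Running this against snapshots taken before and after the update separates a retrieval regression (recall drops) from a generation regression (recall is stable but answers degrade).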
Question 24:
A multi-agent GenAI system is implemented using Amazon Bedrock Agents, AWS Step Functions, and Amazon DynamoDB. The system shows declining task completion rates and inconsistent outputs over time. The development team must diagnose agent-specific issues and prevent future regressions in production.
Match each requirement with the most appropriate AWS solution or technique.
Requirements:
- Maintain short-term agent state across multi-step reasoning.
- Detect quality regressions before a full production rollout.
- Measure agent-level task completion and tool-calling accuracy.
- Debug granular execution steps and logic errors.
Solutions and techniques:
- Amazon Bedrock Agent evaluations
- AWS Step Functions canary deployments with automated regression tests
- Amazon DynamoDB for agent state persistence
- CloudWatch Logs with increased verbosity
Question 25:
A knowledge-base Q&A application built on Amazon Bedrock intermittently returns truncated or irrelevant answers for long documents. Users report that accuracy drops sharply when documents exceed 40 pages, and CloudWatch Logs show no explicit errors. The system uses fixed-size chunking and concatenates all retrieved chunks into a single prompt. The team must resolve the issue without increasing latency beyond the 1-second SLA.
Which solution best addresses the root cause?
A. Increase the model temperature to encourage broader reasoning.
B. Implement dynamic chunking with hierarchical retrieval and limit context size before prompt assembly.
C. Switch to a larger FM with a bigger context window without changing retrieval logic.
D. Retry failed requests automatically using exponential backoff.
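The "limit context size before prompt assembly" part of option B can be sketched as a token-budget selection step. Word counts stand in for real tokenizer counts here, and the chunk shape (`text`, `score`, `position`) is a hypothetical retrieval result format.

```python
def trim_to_budget(chunks, max_context_tokens=3_000):
    """Keep the highest-scoring chunks that fit the token budget instead of
    concatenating everything the retriever returns. Token counts here are
    rough word counts; a real system would use the model's tokenizer."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        tokens = len(chunk["text"].split())
        if used + tokens <= max_context_tokens:
            selected.append(chunk)
            used += tokens
    # Restore original document order so the prompt reads coherently.
    selected.sort(key=lambda c: c["position"])
    return selected
```

With a bounded, relevance-ranked context, long documents no longer overflow the window and silently truncate, which is the root cause the scenario points at.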