

MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Question # 4

An ML engineer is developing a classification model. The ML engineer needs to use custom libraries in processing jobs, training jobs, and pipelines in Amazon SageMaker AI.

Which solution will provide this functionality with the LEAST implementation effort?

A.

Manually install the libraries in the SageMaker AI containers.

B.

Build a custom Docker container that includes the required libraries. Host the container in Amazon Elastic Container Registry (Amazon ECR). Use the ECR image in the SageMaker AI jobs and pipelines.

C.

Use a SageMaker AI notebook instance and install libraries at startup.

D.

Run code externally on Amazon EC2 and import results into SageMaker AI.

Question # 5

A company is developing a customer support AI assistant by using an Amazon Bedrock Retrieval Augmented Generation (RAG) pipeline. The AI assistant retrieves articles from a knowledge base stored in Amazon S3. The company uses Amazon OpenSearch Service to index the knowledge base. The AI assistant uses an Amazon Bedrock Titan Embeddings model for vector search.

The company wants to improve the relevance of the retrieved articles to improve the quality of the AI assistant's answers.

Which solution will meet these requirements?

A.

Use auto-summarization on the retrieved articles by using Amazon SageMaker JumpStart.

B.

Use a reranker model before passing the articles to the foundation model (FM).

C.

Use Amazon Athena to pre-filter the articles based on metadata before retrieval.

D.

Use Amazon Bedrock Provisioned Throughput to process queries more efficiently.

Question # 6

An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.

B.

Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords.

C.

Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords.

D.

Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.

Question # 7

An ML engineer wants to run a training job on Amazon SageMaker AI. The training job will train a neural network by using multiple GPUs. The training dataset is stored in Parquet format.

The ML engineer discovered that the Parquet dataset contains files too large to fit into the memory of the SageMaker AI training instances.

Which solution will fix the memory problem?

A.

Attach an Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD volume to the instance. Store the files in the EBS volume.

B.

Repartition the Parquet files by using Apache Spark on Amazon EMR. Use the repartitioned files for the training job.

C.

Change the instance type to Memory Optimized instances with sufficient memory for the training job.

D.

Use the SageMaker AI distributed data parallelism (SMDDP) library with multiple instances to split the memory usage.

Question # 8

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.

The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.

Which solution will meet this requirement?

A.

Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.

B.

Change the target metric to CPU utilization.

C.

Modify the scaling policy target value to one.

D.

Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.

Question # 9

An ML engineer is using Amazon SageMaker JumpStart to fine-tune a Llama 3.2 model for text generation. The ML engineer is using an instruction-based fine-tuning method. The model uses 70 billion parameters.

Select the correct fine-tuning term from the following list to match each description. Select each term one time or not at all. (Select THREE.)

• Hyperparameter tuning

• Low-rank adaptation (LoRA)

• Fully Sharded Data Parallel (FSDP)

• Learning rate

• Int8 quantization

Question # 10

A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold.

Which solution will meet this requirement?

A.

Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.

B.

Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.

C.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.

D.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.

Question # 11

A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset.

Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

A.

Hyperband

B.

Grid search

C.

Bayesian optimization

D.

Random search

Question # 12

An ML engineer is building an ML model in Amazon SageMaker AI. The ML engineer needs to load historical data directly from Amazon S3, Amazon Athena, and Snowflake into SageMaker AI.

Which solution will meet this requirement?

A.

Use AWS Glue DataBrew to import the data into SageMaker AI.

B.

Build a pipeline in SageMaker Pipelines to process the data. Use AWS DataSync to load the processed data into SageMaker AI.

C.

Create a feature store in SageMaker Feature Store. Use an Apache Spark connector to Feature Store to access the data.

D.

Use SageMaker Data Wrangler to query and import the data.

Question # 13

An ML engineering team has a data processing pipeline that ingests sensor data from IoT devices into an Amazon S3 bucket. The pipeline then processes the data by using AWS Glue extract, transform, and load (ETL) jobs for ML modeling. The team noticed throttling errors in the ETL jobs. The data ingestion process has also been slower than normal.

What is the cause of the problem?

A.

The AWS Glue service quotas have been reached.

B.

The network bandwidth between the IoT devices and the AWS Region is insufficient.

C.

The AWS Glue ETL jobs are not optimized for parallel processing.

D.

The AWS Glue execution role is missing Amazon S3 permissions.

Question # 14

An ML engineer is using Amazon SageMaker Canvas to build a custom ML model from an imported dataset. The model must make continuous numeric predictions based on 10 years of data.

Which metric should the ML engineer use to evaluate the model’s performance?

A.

Accuracy

B.

InferenceLatency

C.

Area Under the ROC Curve (AUC)

D.

Root Mean Square Error (RMSE)
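
For reference, the RMSE metric named in option D can be computed as in the minimal sketch below. The function name `rmse` and the sample values are illustrative assumptions, not SageMaker Canvas output.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    # Average the squared errors, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical continuous targets and predictions.
score = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(score)
```

Because the errors are squared before averaging, large misses on a continuous target weigh more heavily than small ones.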

Question # 15

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

A.

Capture AWS CloudTrail metrics from SageMaker Clarify.

B.

Capture Amazon CloudWatch metrics from SageMaker Clarify.

C.

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

D.

Capture SageMaker Model Monitor metrics from Amazon SNS.

Question # 16

A company is developing an ML model to predict customer satisfaction. The company needs to use survey feedback and the past satisfaction level of customers to predict the future satisfaction level of customers.

The dataset includes a column named Feedback that contains long text responses. The dataset also includes a column named Satisfaction Level that contains three distinct values for past customer satisfaction: High, Medium, and Low. The company must apply encoding methods to transform the data in each column.

Which solution will meet these requirements?

A.

Apply one-hot encoding to the Feedback column and the Satisfaction Level column.

B.

Apply one-hot encoding to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

C.

Apply label encoding to the Feedback column. Apply binary encoding to the Satisfaction Level column.

D.

Apply tokenization to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.
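
The two techniques named in option D can be sketched as follows. This is a simplified illustration: the Low < Medium < High ordering, the regex-based tokenizer, and the sample sentence are assumptions, not part of the question.

```python
import re

# Assumed ordering for illustration: Low < Medium < High.
SATISFACTION_ORDER = {"Low": 0, "Medium": 1, "High": 2}

def ordinal_encode(level):
    """Map an ordered category to an integer that preserves its rank."""
    return SATISFACTION_ORDER[level]

def tokenize(text):
    """Split free text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

encoded = [ordinal_encode(v) for v in ["High", "Low", "Medium"]]
tokens = tokenize("Great support, but slow delivery.")
print(encoded, tokens)
```

Ordinal encoding keeps the rank relationship between the three levels, which one-hot encoding would discard.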

Question # 17

A company uses Amazon Athena to query a dataset in Amazon S3. The dataset has a target variable that the company wants to predict.

The company needs to use the dataset in a solution to determine if a model can predict the target variable.

Which solution will provide this information with the LEAST development effort?

A.

Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

B.

Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

C.

Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

D.

Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.

Question # 18

A company is developing an ML model to forecast future values based on time series data. The dataset includes historical measurements collected at regular intervals and categorical features. The model needs to predict future values based on past patterns and trends.

Which algorithm and hyperparameters should the company use to develop the model?

A.

Use the Amazon SageMaker AI XGBoost algorithm. Set the scale_pos_weight hyperparameter to adjust for class imbalance.

B.

Use k-means clustering with k to specify the number of clusters.

C.

Use the Amazon SageMaker AI DeepAR algorithm with matching context length and prediction length hyperparameters.

D.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm with contamination to set the expected proportion of anomalies.

Question # 19

A retail company is analyzing customer purchase data to develop personalized product recommendations. The company wants to use Amazon SageMaker Clarify to assess fairness metrics across different customer groups to avoid potential bias in the recommendation system.

The recommendation system needs to identify if certain customer segments are underrepresented in the training data. The company needs to choose a pre-training bias metric in SageMaker Clarify.

Which metric meets these requirements?

A.

Prediction distribution skew

B.

Feature attribution bias

C.

Class imbalance ratio

D.

Model performance gap

Question # 20

An ML engineer wants to re-train an XGBoost model at the end of each month. A data team prepares the training data. The training dataset is a few hundred megabytes in size. When the data is ready, the data team stores the data as a new file in an Amazon S3 bucket.

The ML engineer needs a solution to automate this pipeline. The solution must register the new model version in Amazon SageMaker Model Registry within 24 hours.

Which solution will meet these requirements?

A.

Create an AWS Lambda function that runs one time each week to poll the S3 bucket for new files. Invoke the Lambda function asynchronously. Configure the Lambda function to start the pipeline if the function detects new data.

B.

Create an Amazon CloudWatch rule that runs on a schedule to start the pipeline every 30 days.

C.

Create an S3 Lifecycle rule to start the pipeline every time a new object is uploaded to the S3 bucket.

D.

Create an Amazon EventBridge rule to start an AWS Step Functions TrainingStep every time a new object is uploaded to the S3 bucket.

Question # 21

A company's dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

A.

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

B.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

C.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

D.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.

Question # 22

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize production inference data in the same way before passing the data to the model.

Which solution will meet this requirement?

A.

Apply statistics from a well-known dataset to normalize the production samples.

B.

Keep the min-max normalization statistics from the training set and use them to normalize the production samples.

C.

Calculate new min-max statistics from a batch of production samples and use them to normalize all production samples.

D.

Calculate new min-max statistics from each production sample and use them to normalize all production samples.
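
As a rough illustration of the approach described in option B, the sketch below computes min-max statistics once on training values and reuses them on production values. The helper names and numbers are hypothetical, and this is plain Python rather than a DataBrew recipe.

```python
def fit_min_max(values):
    """Return the (min, max) statistics of the training values."""
    return min(values), max(values)

def apply_min_max(value, lo, hi):
    """Scale a value with previously stored statistics."""
    if hi == lo:
        return 0.0  # Constant feature: avoid division by zero.
    return (value - lo) / (hi - lo)

# Hypothetical training feature values.
train_values = [10.0, 20.0, 30.0]
lo, hi = fit_min_max(train_values)

# Production samples are scaled with the stored training statistics.
scaled_production = [apply_min_max(v, lo, hi) for v in [15.0, 40.0]]
print(scaled_production)
```

Note that a production value outside the training range (40.0 here) scales to a value outside [0, 1].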

Question # 23

A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML engineer needs to prepare and store the data so that the company can use the data to train ML models.

Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)

• Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.

• Store the resulting data back in Amazon S3.

• Use Amazon Athena to infer the schemas and available columns.

• Use AWS Glue crawlers to infer the schemas and available columns.

• Use AWS Glue DataBrew for data cleaning and feature engineering.

Question # 24

A company uses AWS CodePipeline to orchestrate a continuous integration and continuous delivery (CI/CD) pipeline for ML models and applications.

Select and order the steps from the following list to describe a CI/CD process for a successful deployment. Select each step one time. (Select and order FIVE.)

• CodePipeline deploys ML models and applications to production.

• CodePipeline detects code changes and starts to build automatically.

• Human approval is provided after testing is successful.

• The company builds and deploys ML models and applications to staging servers for testing.

• The company commits code changes or new training datasets to a Git repository.

Question # 25

A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes.

Which solution on AWS will detect duplicates in the dataset with the LEAST code development?

A.

Use Amazon Mechanical Turk jobs to detect duplicates.

B.

Use Amazon QuickSight ML Insights to build a custom deduplication model.

C.

Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.

D.

Use the AWS Glue FindMatches transform to detect duplicates.

Question # 26

An ML engineer is preparing a dataset that contains medical records to train an ML model to predict the likelihood of patients developing diseases.

The dataset contains columns for patient ID, age, medical conditions, test results, and a "Disease" target column.

How should the ML engineer configure the data to train the model?

A.

Remove the patient ID column.

B.

Remove the age column.

C.

Remove the medical conditions and test results columns.

D.

Remove the "Disease" target column.

Question # 27

A company uses an Amazon EMR cluster to run a data ingestion process for an ML model. An ML engineer notices that the processing time is increasing.

Which solution will reduce the processing time MOST cost-effectively?

A.

Use Spot Instances to increase the number of primary nodes.

B.

Use Spot Instances to increase the number of core nodes.

C.

Use Spot Instances to increase the number of task nodes.

D.

Use On-Demand Instances to increase the number of core nodes.

Question # 28

A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.

The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

A.

Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

B.

Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

C.

Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

D.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.
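
For context on option C, an EventBridge event pattern that matches S3 uploads might look like the sketch below. The bucket name `example-training-data` and the helper function are hypothetical, and the pattern assumes that EventBridge notifications are enabled on the bucket.

```python
import json

def s3_object_created_pattern(bucket_name):
    """Build an EventBridge event pattern that matches new objects in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }

# Hypothetical bucket name, used only for illustration.
pattern_json = json.dumps(s3_object_created_pattern("example-training-data"), indent=2)
print(pattern_json)
```

The resulting JSON is what a rule would use as its event pattern; the rule's target configuration is not shown here.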

Question # 29

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company is experimenting with consecutive training jobs.

How can the company MINIMIZE infrastructure startup times for these jobs?

A.

Use Managed Spot Training.

B.

Use SageMaker managed warm pools.

C.

Use SageMaker Training Compiler.

D.

Use the SageMaker distributed data parallelism (SMDDP) library.

Question # 30

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.

The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

B.

Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

C.

Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

D.

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Question # 31

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

A.

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

B.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

C.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

D.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Question # 32

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

A.

Deploy on EC2 Auto Scaling behind an ALB.

B.

Deploy to a SageMaker AI real-time endpoint.

C.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

D.

Deploy to Amazon ECS on EC2.

Question # 33

A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model.

Which solution will set up the required online validation with the LEAST operational overhead?

A.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

B.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

C.

Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

D.

Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.
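
The variant weights mentioned in options A and B are relative: each production variant receives a share of traffic equal to its weight divided by the sum of all weights on the endpoint. The sketch below shows that arithmetic. The variant names and the 0.9 weight for the existing model are assumptions for illustration.

```python
def traffic_shares(weights):
    """Convert relative variant weights into fractions of endpoint traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical variants: the existing model keeps 0.9, the new model gets 0.1.
shares = traffic_shares({"current-model": 0.9, "new-model": 0.1})
print(shares)

# If both variants had weight 1, traffic would split evenly instead.
even_split = traffic_shares({"current-model": 1, "new-model": 1})
print(even_split)
```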

Question # 34

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

A.

Use a multi-model serverless endpoint. Enable caching.

B.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Question # 35

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer's AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

B.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

C.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

D.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Question # 36

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

A.

Perform ordinal encoding to represent categories of the feature.

B.

Perform similarity encoding to represent categories of the feature.

C.

Perform one-hot encoding to represent categories of the feature.

D.

Perform target encoding to represent categories of the feature.

Question # 37

An ML engineer is collecting data to train a classification ML model by using Amazon SageMaker AI. The target column can have two possible values: Class A or Class B. The ML engineer wants to ensure that the number of samples for both Class A and Class B are balanced, without losing any existing training data. The ML engineer must test the balance of the training data.

Which solution will meet this requirement?

A.

Use SageMaker Clarify to check for class imbalance (CI). If the value is equal to 0, then use random undersampling in SageMaker Data Wrangler to balance the classes.

B.

Use SageMaker Clarify to check for class imbalance (CI). If the value is greater than 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Data Wrangler to balance the classes.

C.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is greater than 0, then use random undersampling in SageMaker Studio to balance the classes.

D.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is equal to 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Studio to balance the classes.
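
As background for the CI values in these options: SageMaker Clarify defines class imbalance as CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the member counts of the two groups being compared. The value ranges from -1 to 1, and 0 means the groups are balanced. The sketch below assumes the two groups are Class A and Class B and uses made-up counts.

```python
def class_imbalance(n_a, n_d):
    """Clarify-style class imbalance: (n_a - n_d) / (n_a + n_d), in [-1, 1]."""
    total = n_a + n_d
    if total == 0:
        raise ValueError("at least one sample is required")
    return (n_a - n_d) / total

# Hypothetical counts: 900 samples of Class A and 100 samples of Class B.
imbalanced = class_imbalance(900, 100)
balanced = class_imbalance(500, 500)
print(imbalanced, balanced)
```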

Question # 38

A company uses Amazon SageMaker AI to support ML workflows such as model training and deployment.

Select the correct registry from the following list to meet the requirements for each use case with the LEAST operational overhead. Each registry should be selected one or more times. (Select FOUR.)

• Amazon Elastic Container Registry (Amazon ECR)

• SageMaker Model Registry

Question # 39

A company stores training data as a .csv file in an Amazon S3 bucket. The company must encrypt the data and must control which applications have access to the encryption key.

Which solution will meet these requirements?

A.

Create a new SSH access key and use the AWS Encryption CLI to encrypt the file.

B.

Create a new API key by using Amazon API Gateway and use it to encrypt the file.

C.

Create a new IAM role with permissions for kms:GenerateDataKey and use the role to encrypt the file.

D.

Create a new AWS Key Management Service (AWS KMS) key and use the AWS Encryption CLI with the KMS key to encrypt the file.

Question # 40

An ML engineer at a credit card company built and deployed an ML model by using Amazon SageMaker AI. The model was trained on transaction data that contained very few fraudulent transactions. After deployment, the model is underperforming.

What should the ML engineer do to improve the model’s performance?

A.

Retrain the model with a different SageMaker built-in algorithm.

B.

Use random undersampling to reduce the majority class and retrain the model.

C.

Use Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority samples and retrain the model.

D.

Use random oversampling to duplicate minority samples and retrain the model.

Question # 41

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company must implement a manual approval-based workflow to ensure that only approved models can be deployed to production endpoints.

Which solution will meet this requirement?

A.

Use SageMaker Experiments to facilitate the approval process during model registration.

B.

Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

C.

Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

D.

Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."

Question # 42

A recommendation model uses ML and calls an Amazon SageMaker AI endpoint to get recommendations. An ML engineer must ensure that the model stays available during an expected increase in user traffic.

Which solution will meet these requirements?

A.

Configure auto scaling on the SageMaker AI endpoint.

B.

Create a new SageMaker AI endpoint. Deploy the model to the new endpoint.

C.

Use SageMaker Neo to optimize the model for inference.

D.

Attach an Auto Scaling group to the SageMaker AI endpoint.

Question # 43

A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance.

Which solution will meet these requirements in the LEAST amount of time?

A.

Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B.

Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C.

Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D.

Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.

Question # 44

A company wants to use large language models (LLMs) supported by Amazon Bedrock to develop a chat interface for internal technical documentation.

The documentation consists of dozens of text files totaling several megabytes and is updated frequently.

Which solution will meet these requirements MOST cost-effectively?

A.

Train a new LLM in Amazon Bedrock using the documentation.

B.

Use Amazon Bedrock guardrails to integrate documentation.

C.

Fine-tune an LLM in Amazon Bedrock with the documentation.

D.

Upload the documentation to an Amazon Bedrock knowledge base and use it as context during inference.

Question # 45

A company's ML engineer is creating a classification model. The ML engineer explores the dataset and notices a column named day_of_week. The column contains the following values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Which technique should the ML engineer use to convert this column’s data to binary values?

A.

Binary encoding

B.

Label encoding

C.

One-hot encoding

D.

Tokenization
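For reference, the three encodings named in the options above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function names are hypothetical, and the binary encoding here assumes a zero-based index (some libraries start at 1).

```python
# Hypothetical helper names; categories follow the day_of_week values in the question.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def label_encode(value, categories):
    """Map a category to a single integer index."""
    return categories.index(value)


def one_hot_encode(value, categories):
    """Map a category to a vector with exactly one 1 and 0 elsewhere."""
    return [1 if c == value else 0 for c in categories]


def binary_encode(value, categories):
    """Map a category to the bits of its (zero-based) integer index."""
    # Number of bits needed to represent the largest index.
    width = max(1, (len(categories) - 1).bit_length())
    index = categories.index(value)
    return [int(bit) for bit in format(index, f"0{width}b")]


if __name__ == "__main__":
    for day in DAYS:
        print(day, label_encode(day, DAYS),
              one_hot_encode(day, DAYS), binary_encode(day, DAYS))
```

Label encoding yields one integer column, one-hot encoding yields seven 0/1 columns (one per day), and this binary encoding yields three 0/1 columns in which more than one bit can be set.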

Question # 46

A hospital is using an ML model to validate x-ray results. The hospital runs a nightly batch inference job. The hospital needs to produce a daily report about model data quality and model performance.

Which solution will meet these requirements?

A.

Schedule a monitoring job in Amazon SageMaker Model Monitor. Generate the monitoring results for the model and data.

B.

Create an Amazon CloudWatch dashboard that includes the metrics for processing steps in the nightly batch inference job. Compare the baseline resource metrics. Share the dashboard link.

C.

Use AWS Glue DataBrew to create a custom recipe job that uses the Numerical Statistics data quality check for the model file. Generate the results.

D.

Create a SageMaker AI pipeline that includes a QualityCheck step to run monitoring jobs. Generate the monitoring results for the model and the data.

Question # 47

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question # 48

A company is using Amazon SageMaker AI to build an ML model to predict customer behavior. The company needs to explain the bias in the model to an auditor. The explanation must focus on demographic data of the customers.

Which solution will meet these requirements?

A.

Use SageMaker Clarify to generate a bias report. Send the report to the auditor.

B.

Use AWS Glue DataBrew to create a job to detect drift in the model's data quality. Send the job output to the auditor.

C.

Use Amazon QuickSight integration with SageMaker AI to generate a bias report. Send the report to the auditor.

D.

Use Amazon CloudWatch metrics from the SageMaker AI namespace to create a bias dashboard. Share the dashboard with the auditor.

Question # 49

An ML engineer needs to run intensive model training jobs each month that can take 48–72 hours. The jobs can be interrupted and resumed. The engineer has a fixed budget and needs the most cost-effective compute option.

Which solution will meet these requirements?

A.

Purchase Reserved Instances with partial upfront payment.

B.

Purchase On-Demand Instances.

C.

Purchase SageMaker AI Savings Plans.

D.

Purchase Spot Instances that use automated checkpoints.

Question # 50

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

A.

AWS::SageMaker::Model

B.

AWS::SageMaker::Endpoint

C.

AWS::SageMaker::NotebookInstance

D.

AWS::SageMaker::Pipeline

Question # 51

A company launches a feature that predicts home prices. An ML engineer trained a regression model using the SageMaker AI XGBoost algorithm. The model performs well on training data but underperforms on real-world validation data.

Which solution will improve the validation score with the LEAST implementation effort?

A.

Create a larger training dataset with more real-world data and retrain.

B.

Increase the num_round hyperparameter.

C.

Change the eval_metric from RMSE to Error.

D.

Increase the lambda hyperparameter.

Question # 52

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

A.

Use SageMaker AI file mode to load and process the images in batches.

B.

Reduce the batch size of the model and increase the number of pre-processing threads.

C.

Reduce the quality of the training images in the S3 bucket.

D.

Convert the images into RecordIO format and use the lazy loading pattern.

Question # 53

A hospital wants to predict patient outcomes for the coming year. An ML engineer must improve several existing ML models that currently perform poorly.

Select the correct regularization method from the following list to improve each model. Select each regularization method one time, more than one time, or not at all. (Select THREE.)

• L1 regularization

• L2 regularization

• Early stopping

Question # 54

A company is developing a generative AI conversational interface to assist customers with payments. The company wants to use an ML solution to detect customer intent. The company does not have training data to train a model.

Which solution will meet these requirements?

A.

Fine-tune a sequence-to-sequence (seq2seq) algorithm in Amazon SageMaker JumpStart.

B.

Use an LLM from Amazon Bedrock with zero-shot learning.

C.

Use the Amazon Comprehend DetectEntities API.

D.

Run an LLM from Amazon Bedrock on Amazon EC2 instances.

Question # 55

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

A.

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Question # 56

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

A.

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

B.

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

C.

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

D.

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

Question # 57

A company is using Amazon SageMaker to create ML models. The company's data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications.

Which solution will meet these requirements?

A.

Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflow. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

B.

Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflow. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

C.

Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflow. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

D.

Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflow. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

Question # 58

A company wants to improve the sustainability of its ML operations.

Which actions will reduce the energy usage and computational resources that are associated with the company's training jobs? (Choose two.)

A.

Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

B.

Use Amazon SageMaker Ground Truth for data labeling.

C.

Deploy models by using AWS Lambda functions.

D.

Use AWS Trainium instances for training.

E.

Use PyTorch or TensorFlow with the distributed training option.

Question # 59

A company has an ML model in Amazon SageMaker AI. An ML engineer needs to implement a monitoring solution to automatically detect changes in the input data distribution of model features.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Configure SageMaker Model Monitor. Establish a data quality baseline. Ensure that the emit_metrics option is enabled in the baseline constraints file. Configure an Amazon CloudWatch alarm to notify the company about changes in specific metrics that are related to data quality.

B.

Configure SageMaker Model Monitor. Establish a model quality baseline. Ensure that the comparison_method option is set to Robust in the baseline constraints file. Configure an Amazon CloudWatch alarm to notify the company about changes in model quality metrics.

C.

Use SageMaker Debugger with custom rules to track shifts in feature distributions. Configure Amazon CloudWatch alarms to notify the company when the rules detect significant changes.

D.

Use Amazon CloudWatch to directly observe the SageMaker AI endpoint's performance metrics. Manually analyze the CloudWatch logs for indicators of data drift or shifts in feature distribution.

Question # 60

A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.

Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

A.

Spot Instances

B.

Reserved Instances

C.

On-Demand Instances

D.

Dedicated Instances

Question # 61

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.

An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.

Which solution will meet these requirements?

A.

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

B.

Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

C.

Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

D.

Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Question # 62

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

A.

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the re-training job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Question # 63

An airline company deploys ML models to one dozen Amazon SageMaker AI inference endpoints. The inference endpoints must be able to handle different types of workloads in a cost-effective way.

Select the correct inference option from the following list to handle each type of workload. Select each inference option one time. (Select FOUR.)

• Asynchronous inference

• Batch inference

• Real-time inference

• Serverless inference

Question # 64

An ML engineer is training a simple neural network model. The model’s performance improves initially and then degrades after a certain number of epochs.

Which solutions will mitigate this problem? (Select TWO.)

A.

Enable early stopping on the model.

B.

Increase dropout in the layers.

C.

Increase the number of layers.

D.

Increase the number of neurons.

E.

Investigate and reduce the sources of model bias.

Question # 65

A company uses the Amazon SageMaker AI Object2Vec algorithm to train an ML model. The model performs well on training data but underperforms after deployment. The company wants to avoid overfitting the model and maintain the model's ability to generalize.

Which solution will meet these requirements?

A.

Decrease the early_stopping_patience hyperparameter.

B.

Increase the mini_batch_size hyperparameter.

C.

Decrease the dropout rate.

D.

Increase the number of epochs.
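As background for the early-stopping option above, the effect of a patience setting can be sketched generically. This is an illustrative sketch only: it is not the Object2Vec implementation, the function name `stopping_epoch` is hypothetical, and the validation losses are made-up numbers.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Made-up losses: improvement for three epochs, then degradation.
    losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
    print(stopping_epoch(losses, patience=1))
    print(stopping_epoch(losses, patience=3))
```

With these example losses, a smaller patience value ends training sooner after the validation loss stops improving, so fewer epochs are spent fitting the training data past that point.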

Question # 66

A company uses Amazon SageMaker AI to create ML models. The data scientists need fine-grained control of ML workflows, DAG visualization, experiment history, and model governance for auditing and compliance.

Which solution will meet these requirements?

A.

Use AWS CodePipeline with SageMaker Studio and SageMaker ML Lineage Tracking.

B.

Use AWS CodePipeline with SageMaker Experiments.

C.

Use SageMaker Pipelines with SageMaker Studio and SageMaker ML Lineage Tracking.

D.

Use SageMaker Pipelines with SageMaker Experiments.

Question # 67

A company is developing an internal cost-estimation tool that uses an ML model in Amazon SageMaker AI. Users upload high-resolution images to the tool.

The model must process each image and predict the cost of the object in the image. The model also must notify the user when processing is complete.

Which solution will meet these requirements?

A.

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

B.

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

C.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

D.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

Question # 68

A company has trained an ML model that is packaged in a container. The company will integrate the model with an existing Python web application. The company needs to host the model on AWS by using Kubernetes.

The company does not want to manage the control plane and must provision the resources in a repeatable manner. The infrastructure must be provisioned by using Python.

Which solution will meet these requirements?

A.

Use AWS CloudFormation to provision Amazon EC2 instances in multiple Availability Zones. Set up a Kubernetes cluster. Host the model container on the Kubernetes cluster.

B.

Use the AWS CLI to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

C.

Use the AWS Cloud Development Kit (AWS CDK) to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

D.

Use AWS CloudFormation to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

Question # 69

An ML engineer is building a model to predict house and apartment prices. The model uses three features: Square Meters, Price, and Age of Building. The dataset has 10,000 data rows. The data includes data points for one large mansion and one extremely small apartment.

The ML engineer must perform preprocessing on the dataset to ensure that the model produces accurate predictions for the typical house or apartment.

Which solution will meet these requirements?

A.

Remove the outliers and perform a log transformation on the Square Meters variable.

B.

Keep the outliers and perform normalization on the Square Meters variable.

C.

Remove the outliers and perform one-hot encoding on the Square Meters variable.

D.

Keep the outliers and perform one-hot encoding on the Square Meters variable.
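The numeric preprocessing steps named in the options above (outlier removal, log transformation, normalization) can be sketched with the Python standard library. This is a rough illustration under stated assumptions: the 1.5 × IQR rule is one common outlier criterion chosen here for the example, the helper names are hypothetical, and the Square Meters values are invented.

```python
import math
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed criterion)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(values):
    """Natural-log transform; compresses a right-skewed positive feature."""
    return [math.log(v) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Invented data: one tiny apartment (5) and one mansion (2500).
    square_meters = [5, 60, 65, 70, 72, 75, 78, 80, 85, 2500]

    kept = remove_outliers_iqr(square_meters)
    print("after outlier removal:", kept)
    print("log-transformed:", [round(v, 2) for v in log_transform(kept)])

    # Normalizing with the outliers kept squeezes typical values near 0.
    print("normalized with outliers:",
          [round(v, 3) for v in min_max_normalize(square_meters)])
```

In this example, min-max normalization with the two extreme rows kept maps every typical value into a narrow band near 0, because the mansion alone defines the top of the range.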

Question # 70

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Question # 71

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.

The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.

Which solution will improve the model's accuracy in the LEAST amount of time?

A.

Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

B.

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

C.

Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

D.

Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Question # 72

A company must install a custom script on any newly created Amazon SageMaker AI notebook instances.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Create a lifecycle configuration script to install the custom script when a new SageMaker AI notebook is created. Attach the lifecycle configuration to every new SageMaker AI notebook as part of the creation steps.

B.

Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker AI notebook.

C.

Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker AI instance. Install the script.

D.

Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker AI notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker AI notebook is initialized.
