Google Professional Machine Learning Engineer (Professional-Machine-Learning-Engineer) Questions and Answers

Question # 4

You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?

A.

Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.

B.

Use App Engine to create a lightweight Python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the training job.

C.

Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster.

D.

Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud Storage bucket. If there are no new files since the last run, abort the job.

Question # 5

You are developing a model to help your company create more targeted online advertising campaigns. You need to create a dataset that you will use to train the model. You want to avoid creating or reinforcing unfair bias in the model. What should you do?

Choose 2 answers

A.

Include a comprehensive set of demographic features.

B.

Include only the demographic groups that most frequently interact with advertisements.

C.

Collect a random sample of production traffic to build the training dataset.

D.

Collect a stratified sample of production traffic to build the training dataset.

E.

Conduct fairness tests across sensitive categories and demographics on the trained model.
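Options C and D differ only in how production traffic is sampled. A minimal sketch of the stratified variant, assuming each record is a dict with a hypothetical `group` field standing in for whatever category you stratify on; the 10% fraction is illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(rows: List[Dict], key: str, fraction: float, seed: int = 0) -> List[Dict]:
    """Draw `fraction` of the rows from every stratum defined by `key`.

    Proportional allocation: each group keeps its share of the sample, so a
    small group cannot vanish by chance the way it can with a plain random sample.
    """
    rng = random.Random(seed)
    strata: Dict[object, List[Dict]] = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample: List[Dict] = []
    for group_rows in strata.values():
        # Keep at least one row per group so no stratum disappears entirely.
        k = max(1, round(len(group_rows) * fraction))
        sample.extend(rng.sample(group_rows, k))
    return sample
```

With 90 rows in group "a" and 10 in group "b", a 10% stratified sample always contains 9 rows from "a" and 1 from "b".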

Question # 6

You have developed an AutoML tabular classification model that identifies high-value customers who interact with your organization's website.

You plan to deploy the model to a new Vertex AI endpoint that will integrate with your website application. You expect higher traffic to the website during nights and weekends. You need to configure the model endpoint's deployment settings to minimize latency and cost. What should you do?

A.

Configure the model deployment settings to use an n1-standard-32 machine type.

B.

Configure the model deployment settings to use an n1-standard-4 machine type. Set the minReplicaCount value to 1 and the maxReplicaCount value to 8.

C.

Configure the model deployment settings to use an n1-standard-4 machine type and a GPU accelerator. Set the minReplicaCount value to 1 and the maxReplicaCount value to 4.

D.

Configure the model deployment settings to use an n1-standard-8 machine type and a GPU accelerator.

Question # 7

You work at a gaming startup that has several terabytes of structured data in Cloud Storage. This data includes gameplay time data, user metadata, and game metadata. You want to build a model that recommends new games to users, using the approach that requires the least amount of coding. What should you do?

A.

Load the data in BigQuery. Use BigQuery ML to train an Autoencoder model.

B.

Load the data in BigQuery. Use BigQuery ML to train a matrix factorization model.

C.

Read data to a Vertex AI Workbench notebook. Use TensorFlow to train a two-tower model.

D.

Read data to a Vertex AI Workbench notebook. Use TensorFlow to train a matrix factorization model.

Question # 8

You developed a Transformer model in TensorFlow to translate text. Your training data includes millions of documents in a Cloud Storage bucket. You plan to use distributed training to reduce training time. You need to configure the training job while minimizing the effort required to modify code and to manage the cluster's configuration. What should you do?

A.

Create a Vertex AI custom training job with GPU accelerators for the second worker pool. Use tf.distribute.MultiWorkerMirroredStrategy for distribution.

B.

Create a Vertex AI custom distributed training job with Reduction Server. Use N1 high-memory machine type instances for the first and second pools, and use N1 high-CPU machine type instances for the third worker pool.

C.

Create a training job that uses Cloud TPU VMs. Use tf.distribute.TPUStrategy for distribution.

D.

Create a Vertex AI custom training job with a single worker pool of A2 GPU machine type instances. Use tf.distribute.MirroredStrategy for distribution.

Question # 9

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. What should you do?

A.

Use the AI Platform custom containers feature to receive training jobs using any framework.

B.

Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TFJob

C.

Create a library of VM images on Compute Engine, and publish these images on a centralized repository.

D.

Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Question # 10

Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach should you use?

A.

Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.

B.

Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.

C.

Build a logistic regression model for each user that predicts whether an article should be recommended to a user.

D.

Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.
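Option B describes content-based similarity. A minimal sketch of the retrieval step, assuming word vectors are already available as a plain dict (the toy two-dimensional vectors below are hypothetical stand-ins for trained word2vec output) and that an article is embedded by averaging its word vectors, which is one simple choice among several.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def embed(text: str, word_vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of known words (a simple document embedding)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for word in text.lower().split():
        vec = word_vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: str, articles: Dict[str, str], word_vectors: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the ids of the k articles whose embeddings are closest to the query's."""
    q = embed(query, word_vectors)
    ranked = sorted(articles, key=lambda aid: cosine(q, embed(articles[aid], word_vectors)), reverse=True)
    return ranked[:k]


# Hypothetical toy data.
word_vectors = {"election": [1.0, 0.0], "vote": [0.9, 0.1], "goal": [0.0, 1.0], "match": [0.1, 0.9]}
articles = {"politics": "vote count in election", "sports": "late goal wins match"}
print(most_similar("election vote", articles, word_vectors))  # ['politics']
```

In production the query would be the article the user is currently reading, and a nearest-neighbor index would replace the full sort.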

Question # 11

You work for a retail company. You have a managed tabular dataset in Vertex AI that contains sales data from three different stores. The dataset includes several features such as store name and sale timestamp. You want to use the data to train a model that makes sales predictions for a new store that will open soon. You need to split the data between the training, validation, and test sets. What approach should you use to split the data?

A.

Use Vertex AI manual split, using the store name feature to assign one store for each set.

B.

Use Vertex AI default data split.

C.

Use Vertex AI chronological split and specify the sales timestamp feature as the time variable.

D.

Use Vertex AI random split, assigning 70% of the rows to the training set, 10% to the validation set, and 20% to the test set.
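A chronological split keeps the newest rows out of training, so validation and test data come from later periods than the training data. The sketch below illustrates the idea on plain Python rows; the `sale_timestamp` key and the 80/10/10 fractions are assumptions for illustration, not Vertex AI's internal implementation.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def chronological_split(
    rows: List[Row], time_key: str = "sale_timestamp", train_frac: float = 0.8, valid_frac: float = 0.1
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Sort by time, then cut: oldest rows train, newer rows validate, newest rows test."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n = len(ordered)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    return ordered[:n_train], ordered[n_train:n_train + n_valid], ordered[n_train + n_valid:]
```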

Question # 12

You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?

A.

Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.

B.

Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption

C.

Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.

D.

Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.

Question # 13

You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model. What should you do?

A.

Access BigQuery Studio in the Google Cloud console. Run the create model statement in the SQL editor to create an ARIMA model.

B.

Create a Vertex AI Workbench notebook. Use IPython magic to run the create model statement to create an ARIMA model.

C.

Access BigQuery Studio in the Google Cloud console. Run the create model statement in the SQL editor to create an AutoML regression model.

D.

Create a Vertex AI Workbench notebook. Use IPython magic to run the create model statement to create an AutoML regression model.

Question # 14

You are developing a custom TensorFlow classification model based on tabular data. Your raw data is stored in BigQuery, contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a min-max scaler on some numerical features and apply one-hot encoding to some categorical features such as SKU names. Your model will be trained over multiple epochs. You want to minimize the effort and cost of your solution. What should you do?

A.

1. Write a SQL query to create a separate lookup table to scale the numerical features.

2. Deploy a TensorFlow-based model from Hugging Face to BigQuery to encode the text features.

3. Feed the resulting BigQuery view into Vertex AI Training.

B.

1. Use BigQuery to scale the numerical features.

2. Feed the features into Vertex AI Training.

3. Allow TensorFlow to perform the one-hot text encoding.

C.

1. Use TFX components with Dataflow to encode the text features and scale the numerical features.

2. Export results to Cloud Storage as TFRecords.

3. Feed the data into Vertex AI Training.

D.

1. Write a SQL query to create a separate lookup table to scale the numerical features.

2. Perform the one-hot text encoding in BigQuery.

3. Feed the resulting BigQuery view into Vertex AI Training.
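Both transforms named in this question are simple per-value maps once their statistics are known; at hundreds of millions of rows, the cost lies in computing those statistics (the column minimum and maximum, the category vocabulary) in a full pass over the training data. A minimal in-memory sketch of each, with hypothetical SKU values:

```python
from typing import Dict, List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread; map it to 0.0 instead of dividing by zero.
    return [(v - lo) / span if span else 0.0 for v in values]


def one_hot(values: Sequence[str]) -> List[List[int]]:
    """Encode each category as a vector with a single 1 at its vocabulary index."""
    vocab: Dict[str, int] = {v: i for i, v in enumerate(sorted(set(values)))}
    return [[1 if vocab[v] == i else 0 for i in range(len(vocab))] for v in values]


print(min_max_scale([10.0, 20.0, 30.0]))       # [0.0, 0.5, 1.0]
print(one_hot(["sku_b", "sku_a", "sku_b"]))    # [[0, 1], [1, 0], [0, 1]]
```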

Question # 15

You work for a delivery company. You need to design a system that stores and manages features such as parcels delivered and truck locations over time. The system must retrieve the features with low latency and feed those features into a model for online prediction. The data science team will retrieve historical data at a specific point in time for model training. You want to store the features with minimal effort. What should you do?

A.

Store features in Bigtable as key/value data.

B.

Store features in Vertex AI Feature Store.

C.

Store features as a Vertex AI dataset, and use those features to train the models hosted in Vertex AI endpoints.

D.

Store features in BigQuery timestamp partitioned tables, and use the BigQuery Storage Read API to serve the features.

Question # 16

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

A.

Write your data in TFRecords.

B.

Z-normalize all the numeric features.

C.

Oversample the fraudulent transactions 10 times.

D.

Use one-hot encoding on all categorical features.
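Oversampling by duplication repeats minority-class rows so the classifier sees them more often. A minimal sketch, assuming examples are hypothetical `(features, label)` tuples with label `1` for fraud. Apply it to the training split only: duplicated rows in an evaluation set would inflate the measured performance.

```python
from typing import List, Tuple

Example = Tuple[List[float], int]  # (features, label); label 1 = fraud


def oversample_minority(examples: List[Example], factor: int = 10, minority_label: int = 1) -> List[Example]:
    """Emit every minority-class example `factor` times in total (duplication-based oversampling)."""
    out: List[Example] = []
    for features, label in examples:
        copies = factor if label == minority_label else 1
        out.extend([(features, label)] * copies)
    return out
```

With 99 legitimate transactions and 1 fraudulent one, the output has 109 rows, of which 10 (about 9.2%) are fraud.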

Question # 17

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team’s spending. How should you reduce your Google Cloud compute costs without impacting the model’s performance?

A.

Use AI Platform to run distributed training jobs with checkpoints.

B.

Use AI Platform to run distributed training jobs without checkpoints.

C.

Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.

D.

Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.
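Preemptible VMs can be stopped at any time, so a job running on them only keeps its progress if it can resume from saved state. The toy loop below sketches that pattern with a JSON file; the `step_fn` update and the file format are hypothetical, and a real TensorFlow job would use TensorFlow's own checkpoint utilities such as `tf.train.Checkpoint`.

```python
import json
import os
from typing import Callable


def train_with_checkpoints(ckpt_path: str, total_steps: int, step_fn: Callable[[float], float], every: int = 100) -> float:
    """Run a toy training loop that saves state every `every` steps and resumes from the last save."""
    state = {"step": 0, "weight": 0.0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume after a preemption instead of starting over
    while state["step"] < total_steps:
        state["weight"] = step_fn(state["weight"])
        state["step"] += 1
        if state["step"] % every == 0 or state["step"] == total_steps:
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            # Write-then-rename so a kill mid-write cannot leave a half-written checkpoint.
            os.replace(tmp, ckpt_path)
    return state["weight"]
```

If the VM is preempted at step 250, a restart resumes from the step-200 checkpoint, so only 50 steps of work are repeated rather than 250.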

Question # 18

You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human. Which metric(s) should you use to monitor the model’s performance?

A.

Number of messages flagged by the model per minute

B.

Number of messages flagged by the model per minute confirmed as being inappropriate by humans.

C.

Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review

D.

Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
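Precision and recall both need human labels, but they need different slices of data: recall also counts inappropriate messages that the model did not flag. A minimal sketch of the estimate, assuming each human-reviewed message is recorded as a hypothetical `(model_flagged, human_says_inappropriate)` pair.

```python
from typing import Iterable, Tuple


def precision_recall(reviewed: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Estimate precision and recall from (model_flagged, human_says_inappropriate) pairs."""
    tp = fp = fn = 0
    for flagged, inappropriate in reviewed:
        if flagged and inappropriate:
            tp += 1
        elif flagged:
            fp += 1
        elif inappropriate:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

If the sample contains only messages the model flagged, `fn` stays 0, so the recall estimate is 1.0 whenever it is defined, regardless of how many inappropriate messages were missed.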

Question # 19

You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?

A.

Update the model monitoring job to use a lower sampling rate.

B.

Update the model monitoring job to use the more recent training data that was used to retrain the model.

C.

Temporarily disable the alert. Enable the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.

D.

Temporarily disable the alert until the model can be retrained again on newer training data. Retrain the model again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
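Skew detection compares serving traffic against the training dataset configured as the monitoring job's baseline. One common statistic for a categorical feature is the L-infinity distance between the two frequency distributions. A minimal sketch with hypothetical feature values; this is not Vertex AI's implementation.

```python
from collections import Counter
from typing import Iterable


def l_infinity_distance(train_values: Iterable[str], serving_values: Iterable[str]) -> float:
    """Largest absolute difference between the two categorical frequency distributions."""
    train_counts, serve_counts = Counter(train_values), Counter(serving_values)
    n_train, n_serve = sum(train_counts.values()), sum(serve_counts.values())
    categories = set(train_counts) | set(serve_counts)
    # Counter returns 0 for categories seen on only one side.
    return max(abs(train_counts[c] / n_train - serve_counts[c] / n_serve) for c in categories)
```

A 50/50 split of two categories at training time versus an 80/20 split at serving time gives a distance of 0.3.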

Question # 20

You work for a food product company. Your company's historical sales data is stored in BigQuery. You need to use Vertex AI’s custom training service to train multiple TensorFlow models that read the data from BigQuery and predict future sales. You plan to implement a data preprocessing algorithm that performs min-max scaling and bucketing on a large number of features before you start experimenting with the models. You want to minimize preprocessing time, cost, and development effort. How should you configure this workflow?

A.

Write the transformations in Spark using the spark-bigquery-connector, and use Dataproc to preprocess the data.

B.

Write SQL queries to transform the data in-place in BigQuery.

C.

Add the transformations as a preprocessing layer in the TensorFlow models.

D.

Create a Dataflow pipeline that uses the BigQueryIO connector to ingest the data, process it, and write it back to BigQuery.
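Bucketing maps each continuous value to the index of the interval it falls into. A minimal sketch with hypothetical, hand-picked boundaries; it uses left-closed buckets, the same convention as `tf.keras.layers.Discretization`.

```python
import bisect
from typing import List, Sequence


def bucketize(values: Sequence[float], boundaries: Sequence[float]) -> List[int]:
    """Return the bucket index for each value; `boundaries` must be sorted ascending.

    Bucket 0 is (-inf, boundaries[0]), bucket i is [boundaries[i-1], boundaries[i]),
    and the last bucket is [boundaries[-1], +inf).
    """
    return [bisect.bisect_right(boundaries, v) for v in values]


print(bucketize([-0.5, 0.0, 0.5, 1.0, 2.5], [0.0, 1.0, 2.0]))  # [0, 1, 1, 2, 3]
```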

Question # 21

You are designing an ML recommendation model for shoppers on your company's ecommerce website. You will use Recommendations AI to build, test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?

A.

Use the "Other Products You May Like" recommendation type to increase the click-through rate.

B.

Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order.

C.

Import your user events and then your product catalog to make sure you have the highest quality event stream

D.

Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.

Question # 22

You have trained a text classification model in TensorFlow using AI Platform. You want to use the trained model for batch predictions on text data stored in BigQuery while minimizing computational overhead. What should you do?

A.

Export the model to BigQuery ML.

B.

Deploy and version the model on AI Platform.

C.

Use Dataflow with the SavedModel to read the data from BigQuery

D.

Submit a batch prediction job on AI Platform that points to the model location in Cloud Storage.

Question # 23

While monitoring your model training’s GPU utilization, you discover that you have a native synchronous implementation. The training data is split into multiple files. You want to reduce the execution time of your input pipeline. What should you do?

A.

Increase the CPU load

B.

Add caching to the pipeline

C.

Increase the network bandwidth

D.

Add parallel interleave to the pipeline
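In TensorFlow, reading many files in parallel is done with `tf.data.Dataset.interleave` and its `num_parallel_calls` argument. The standard-library sketch below is only an analogy for the idea (overlap the file reads, then merge records across files) and is not tf.data: it reads each hypothetical line-per-record shard fully before yielding, whereas tf.data streams records.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def read_shard(path: str) -> List[str]:
    """Read one shard file; here each line is one record."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def parallel_interleave(paths: List[str], max_workers: int = 4) -> Iterator[str]:
    """Read shards concurrently, then yield records round-robin across shards."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # File I/O overlaps across threads instead of running file by file.
        shards = list(pool.map(read_shard, paths))
    for group in itertools.zip_longest(*shards):
        for record in group:
            if record is not None:  # zip_longest pads shorter shards with None
                yield record
```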

Question # 24

You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the app that verifies a customer's identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored in the bank's databases. Which learning strategy should you recommend to train and deploy this ML model?

A.

Differential privacy

B.

Federated learning

C.

MD5 to encrypt data

D.

Data Loss Prevention API
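In federated learning, each device trains on its own data and only model updates reach the server, which combines them. A minimal FedAvg-style sketch, assuming a hypothetical one-feature linear model and two simulated clients; real deployments add client sampling, secure aggregation, and much more.

```python
from typing import List, Sequence, Tuple

Weights = List[float]


def local_update(weights: Weights, data: Sequence[Tuple[float, float]], lr: float = 0.1, epochs: int = 5) -> Weights:
    """One client's SGD on its own (x, y) pairs for y ~ w*x + b; the raw data never leaves this function."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Server step: average client weights, weighted by each client's dataset size."""
    total = sum(sizes)
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(len(updates[0]))]
```

Each round, the server sends the current weights to every client, collects the `local_update` results, and replaces its weights with their `federated_average`.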

Question # 25

You are building a linear model with over 100 input features, all with values between -1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?

A.

Use Principal Component Analysis to eliminate the least informative features.

B.

Use L1 regularization to reduce the coefficients of uninformative features to 0.

C.

After building your model, use Shapley values to determine which features are the most informative.

D.

Use an iterative dropout technique to identify which features do not degrade the model when removed.
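L1 regularization removes features by driving their coefficients to exactly zero, while the surviving features keep their original form (PCA, by contrast, replaces them with linear combinations). The sketch below fits an L1-penalized linear model with ISTA (proximal gradient descent), a solver chosen here for illustration; libraries such as scikit-learn's `Lasso` do the same job. The synthetic data, in which only the first two of ten features matter, is hypothetical.

```python
import numpy as np


def soft_threshold(w: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t*||w||_1: shrinks toward 0 and sets small entries to exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def lasso_ista(X: np.ndarray, y: np.ndarray, lam: float, iters: int = 2000) -> np.ndarray:
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 with ISTA."""
    n = X.shape[0]
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n (the gradient's Lipschitz constant).
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 10))  # all features in [-1, 1], as in the question
y = 2.0 * X[:, 0] - 3.0 * X[:, 1]           # only features 0 and 1 are informative
print(lasso_ista(X, y, lam=0.1).round(2))
```

The informative coefficients come out shrunk toward zero but clearly non-zero, and the other eight are exactly 0.0.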

Question # 26

You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model and you want to have access to visualization tools. What should you do?

A.

Create a Vertex AI Workbench notebook to perform exploratory data analysis. Use IPython magics to create a new BigQuery table with input features. Use the BigQuery console to run the create model statement. Validate the results by using the ml.evaluate and ml.predict statements.

B.

Run the create model statement from the BigQuery console to create an AutoML model. Validate the results by using the ml.evaluate and ml.predict statements.

C.

Create a Vertex AI Workbench notebook to perform exploratory data analysis and create input features. Save the features as a CSV file in Cloud Storage. Import the CSV file as a new BigQuery table. Use the BigQuery console to run the create model statement. Validate the results by using the ml.evaluate and ml.predict statements.

D.

Create a Vertex AI Workbench notebook to perform exploratory data analysis. Use IPython magics to create a new BigQuery table with input features, create the model, and validate the results by using the create model, ml.evaluate, and ml.predict statements.

Question # 27

You are an ML engineer at a travel company. You have been researching customers’ travel behavior for many years, and you have deployed models that predict customers’ vacation patterns. You have observed that customers’ vacation destinations vary based on seasonality and holidays; however, these seasonal variations are similar across years. You want to quickly and easily store and compare the model versions and performance statistics across years. What should you do?

A.

Store the performance statistics in Cloud SQL. Query that database to compare the performance statistics across the model versions.

B.

Create versions of your models for each season per year in Vertex AI. Compare the performance statistics across the models in the Evaluate tab of the Vertex AI UI.

C.

Store the performance statistics of each pipeline run in Kubeflow under an experiment for each season per year. Compare the results across the experiments in the Kubeflow UI.

D.

Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the results across the slices.

Question # 28

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however, the accuracy of the model has steadily deteriorated. What issue is most likely causing the steady decline in model accuracy?

A.

Poor data quality

B.

Lack of model retraining

C.

Too few layers in the model for capturing information

D.

Incorrect data split ratio during model training, evaluation, validation, and test

Question # 29

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, theano, scikit-learn, and custom libraries. What should you do?

A.

Use Vertex AI Training to submit training jobs using any framework.

B.

Configure Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob.

C.

Create a library of VM images on Compute Engine, and publish these images on a centralized repository.

D.

Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Question # 30

You are building a TensorFlow model for a financial institution that predicts the impact of consumer spending on inflation globally. Due to the size and nature of the data, your model is long-running across all types of hardware, and you have built frequent checkpointing into the training process. Your organization has asked you to minimize cost. What hardware should you choose?

A.

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4 NVIDIA P100 GPUs

B.

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an NVIDIA P100 GPU

C.

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a non-preemptible v3-8 TPU

D.

A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU

Question # 31

You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?

A.

Use latitude, longitude, and product type as features. Use profit as model output.

B.

Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.

C.

Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.

D.

Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.
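A feature cross of binned latitude and binned longitude turns a location into a single grid-cell category. A minimal sketch; the bucket boundaries are hypothetical and far coarser than you would use in practice.

```python
import bisect
from typing import Sequence


def bin_index(value: float, boundaries: Sequence[float]) -> int:
    """Index of the bucket `value` falls in (boundaries sorted ascending)."""
    return bisect.bisect_right(boundaries, value)


def lat_lon_cross(lat: float, lon: float, lat_bounds: Sequence[float], lon_bounds: Sequence[float]) -> int:
    """Cross the latitude bin with the longitude bin into one grid-cell id.

    Each id names one cell, so a model can learn a separate weight per region
    instead of a single linear trend in latitude or longitude.
    """
    n_lon_bins = len(lon_bounds) + 1
    return bin_index(lat, lat_bounds) * n_lon_bins + bin_index(lon, lon_bounds)


LAT_BOUNDS = [-30.0, 0.0, 30.0, 60.0]  # hypothetical: 5 latitude bands
LON_BOUNDS = [-90.0, 0.0, 90.0]        # hypothetical: 4 longitude bands
print(lat_lon_cross(48.9, 2.35, LAT_BOUNDS, LON_BOUNDS))  # 14
```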

Question # 32

You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud Translation API. You have been asked to build an ML system to moderate the chat in real time while assuring that the performance is uniform across the various languages and without changing the serving infrastructure.

You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API. However, the model has significant differences in performance across the different languages. How should you improve it?

A.

Add a regularization term such as the Min-Diff algorithm to the loss function.

B.

Train a classifier using the chat messages in their original language.

C.

Replace the in-house word2vec with GPT-3 or T5.

D.

Remove moderation for languages for which the false positive rate is too high.

Question # 33

You work for a bank. You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex AI services to include in the workflow. You want to track the model's training parameters and the metrics per training epoch. You plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex AI services should you use?

A.

Vertex ML Metadata, Vertex AI Feature Store, and Vertex AI Vizier

B.

Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Vizier

C.

Vertex ML Metadata, Vertex AI Experiments, and Vertex AI TensorBoard

D.

Vertex AI Pipelines, Vertex AI Feature Store, and Vertex AI TensorBoard

Question # 34

You are going to train a DNN regression model with Keras APIs using this code:

How many trainable weights does your model have? (The arithmetic below is correct.)

A.

501*256+257*128+2 = 161154

B.

500*256+256*128+128*2 = 161024

C.

501*256+257*128+128*2=161408

D.

500*256*0.25+256*128*0.25+128*2 = 40448
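The code listing this question refers to is not included here. The sketch below assumes, hypothetically, a 500-feature input followed by Dense layers of 256, 128, and 2 units (the shapes the answer options imply) and lets you set per layer whether a bias is used. Dropout layers have no trainable weights, so they are left out and a dropout rate never scales the count. With the real model, Keras reports the count directly via `model.summary()`.

```python
from typing import List, Sequence


def dense_param_count(input_dim: int, units: Sequence[int], use_bias: Sequence[bool]) -> List[int]:
    """Trainable parameters per Dense layer: fan_in * units weights, plus `units` biases if enabled."""
    counts = []
    fan_in = input_dim
    for n, bias in zip(units, use_bias):
        counts.append(fan_in * n + (n if bias else 0))
        fan_in = n  # this layer's output width is the next layer's input width
    return counts


all_biases = sum(dense_param_count(500, [256, 128, 2], [True, True, True]))      # 161410
no_output_bias = sum(dense_param_count(500, [256, 128, 2], [True, True, False]))  # 161408
```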

Question # 35

Your task is to classify whether a company logo is present in an image. You found that 96% of the data does not include a logo, so you are dealing with a data imbalance problem. Which metric should you use to evaluate the model?

A.

F1 Score

B.

RMSE

C.

F-score with precision weighted more heavily than recall

D.

F-score with recall weighted more heavily than precision
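With 96% of images lacking a logo, a model that always predicts "no logo" already scores 96% accuracy, so accuracy says little here. The F-beta score, F_β = (1 + β²)·P·R / (β²·P + R), controls the precision/recall trade-off through β: β = 1 gives the F1 score, β > 1 weights recall more heavily, and β < 1 weights precision more heavily. A minimal sketch:

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 weights recall more heavily, beta < 1 weights precision more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# A precise model with weak recall (P = 1.0, R = 0.5):
print(round(f_beta(1.0, 0.5, 2.0), 3))  # 0.556, penalized when recall is weighted more
print(round(f_beta(1.0, 0.5, 0.5), 3))  # 0.833, favored when precision is weighted more
```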

Question # 36

You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new push to your development branch in Cloud Source Repositories. What should you do?

A.

Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run

B.

Using Cloud Build, set an automated trigger to execute the unit tests when changes are pushed to your development branch.

C.

Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for Cloud Run, and execute the unit tests on Cloud Run.

D.

Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a Cloud Function that is triggered when messages are sent to the Pub/Sub topic

Question # 37

You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?

A.

Create a Vertex AI Workbench notebook. Use the notebook to submit the Dataproc Serverless feature engineering job. Use the same notebook to submit the custom model training job. Run the notebook cells sequentially to tie the steps together end-to-end.

B.

Create a Vertex AI Workbench notebook. Initiate an Apache Spark context in the notebook, and run the PySpark feature engineering code. Use the same notebook to run the custom model training job in TensorFlow. Run the notebook cells sequentially to tie the steps together end-to-end.

C.

Use the Kubeflow Pipelines SDK to write code that specifies two components:

- The first is a Dataproc Serverless component that launches the feature engineering job.

- The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job.

D.

Create a Vertex AI Pipelines job to link and run both components. Use the Kubeflow Pipelines SDK to write code that specifies two components:

- The first component initiates an Apache Spark context that runs the PySpark feature engineering code.

- The second component runs the TensorFlow custom model training code. Create a Vertex AI Pipelines job to link and run both components.

Question # 38

You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis. Which algorithms should you use to build the model?

A.

Classification

B.

Reinforcement Learning

C.

Recurrent Neural Networks (RNN)

D.

Convolutional Neural Networks (CNN)

Question # 39

You are an ML engineer on an agricultural research team working on a crop disease detection tool to detect leaf rust spots in images of crops to determine the presence of a disease. These spots, which can vary in shape and size, are correlated to the severity of the disease. You want to develop a solution that predicts the presence and severity of the disease with high accuracy. What should you do?

A.

Create an object detection model that can localize the rust spots.

B.

Develop an image segmentation ML model to locate the boundaries of the rust spots.

C.

Develop a template matching algorithm using traditional computer vision libraries.

D.

Develop an image classification ML model to predict the presence of the disease.

Question # 40

You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members of your team prefer a codeless interface for building Extract, Transform, Load (ETL) processes. Which service should you use?

A.

Dataflow

B.

Dataprep

C.

Apache Flink

D.

Cloud Data Fusion

Question # 41

You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?

A.

Configure a v3-8 TPU VM. SSH into the VM to train and debug the model.

B.

Configure a v3-8 TPU node. Use Cloud Shell to SSH into the Host VM to train and debug the model.

C.

Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use Parameter Server Strategy to train the model.

D.

Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model.

Question # 42

You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud. What should you do?

A.

Use Vertex AI Platform for distributed training

B.

Create a cluster on Dataproc for training

C.

Create a Managed Instance Group with autoscaling

D.

Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster.

Question # 43

Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?

A.

A VM on Compute Engine and 1 TPU with all dependencies installed manually.

B.

A VM on Compute Engine and 8 GPUs with all dependencies installed manually.

C.

A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.

D.

A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.

Question # 44

You are developing an ML model to identify your company’s products in images. You have access to over one million images in a Cloud Storage bucket. You plan to experiment with different TensorFlow models by using Vertex AI Training. You need to read images at scale during training while minimizing data I/O bottlenecks. What should you do?

A.

Load the images directly into the Vertex AI compute nodes by using Cloud Storage FUSE. Read the images by using the tf.data.Dataset.from_tensor_slices function.

B.

Create a Vertex AI managed dataset from your image data. Access the AIP_TRAINING_DATA_URI environment variable to read the images by using the tf.data.Dataset.list_files function.

C.

Convert the images to TFRecords and store them in a Cloud Storage bucket. Read the TFRecords by using the tf.data.TFRecordDataset function.

D.

Store the URLs of the images in a CSV file. Read the file by using the tf.data.experimental.CsvDataset function.

Question # 45

You have been asked to productionize a proof-of-concept ML model built using Keras. The model was trained in a Jupyter notebook on a data scientist’s local machine. The notebook contains a cell that performs data validation and a cell that performs model analysis. You need to orchestrate the steps contained in the notebook and automate the execution of these steps for weekly retraining. You expect much more training data in the future. You want your solution to take advantage of managed services while minimizing cost. What should you do?

A.

Move the Jupyter notebook to a Notebooks instance on the largest N2 machine type, and schedule the execution of the steps in the Notebooks instance using Cloud Scheduler.

B.

Write the code as a TensorFlow Extended (TFX) pipeline orchestrated with Vertex AI Pipelines. Use standard TFX components for data validation and model analysis, and use Vertex AI Pipelines for model retraining.

C.

Rewrite the steps in the Jupyter notebook as an Apache Spark job, and schedule the execution of the job on ephemeral Dataproc clusters using Cloud Scheduler.

D.

Extract the steps contained in the Jupyter notebook as Python scripts, wrap each script in an Apache Airflow BashOperator, and run the resulting directed acyclic graph (DAG) in Cloud Composer.

Question # 46

You work for a retail company. You have been tasked with building a model to determine the probability of churn for each customer. You need the predictions to be interpretable so the results can be used to develop marketing campaigns that target at-risk customers. What should you do?

A.

Build a random forest regression model in a Vertex AI Workbench notebook instance. Configure the model to generate feature importances after the model is trained.

B.

Build an AutoML tabular regression model. Configure the model to generate explanations when it makes predictions.

C.

Build a custom TensorFlow neural network by using Vertex AI custom training. Configure the model to generate explanations when it makes predictions.

D.

Build a random forest classification model in a Vertex AI Workbench notebook instance. Configure the model to generate feature importances after the model is trained.

Question # 47

You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine which customer attribute has the most predictive power for each prediction served by the model. What should you do?

A.

Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal.

B.

Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable.

C.

Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions using the sampled Shapley method.

D.

Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.
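Option B relies on the Pearson correlation coefficient, which BigQuery’s CORR function computes. The minimal Python sketch below shows the same formula for illustration only (the function name pearson is hypothetical). It yields one number per feature for the whole dataset, so by itself it cannot say which attribute drove an individual prediction.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two spread terms; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

A feature that rises in lockstep with the target scores close to +1, and one that falls in lockstep scores close to -1.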

Question # 48

You are working on a prototype of a text classification model in a managed Vertex AI Workbench notebook. You want to quickly experiment with tokenizing text by using a Natural Language Toolkit (NLTK) library. How should you add the library to your Jupyter kernel?

A.

Install the NLTK library from a terminal by using the pip install nltk command.

B.

Write a custom Dataflow job that uses NLTK to tokenize your text and saves the output to Cloud Storage.

C.

Create a new Vertex AI Workbench notebook with a custom image that includes the NLTK library.

D.

Install the NLTK library from a Jupyter cell by using the !pip install nltk --user command.

Question # 49

You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision. What should you do?

A.

Use local feature importance from the predictions.

B.

Use the correlation with target values in the data summary page.

C.

Use the feature importance percentages in the model evaluation page.

D.

Vary features independently to identify the threshold per feature that changes the classification.

Question # 50

You want to migrate a scikit-learn classifier model to TensorFlow. You plan to train the TensorFlow classifier model using the same training set that was used to train the scikit-learn model and then compare the performances using a common test set. You want to use the Vertex AI Python SDK to manually log the evaluation metrics of each model and compare them based on their F1 scores and confusion matrices. How should you log the metrics?

A.

B.

C.

D.

Question # 51

You are training a TensorFlow model on a structured data set with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

A.

Load the data into BigQuery and read the data from BigQuery.

B.

Load the data into Cloud Bigtable, and read the data from Bigtable

C.

Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage

D.

Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS)

Question # 52

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?

A.

Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.

B.

Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.

C.

Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.

D.

Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using.

Question # 53

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

A.

Use the class distribution to generate 10% positive examples

B.

Use a convolutional neural network with max pooling and softmax activation

C.

Downsample the data with upweighting to create a sample with 10% positive examples

D.

Remove negative examples until the numbers of positive and negative examples are equal
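Option C describes downsampling with upweighting: keep only a fraction of the majority class, then weight each kept example by the downsampling factor so the model still sees the true class balance in its loss. A minimal pure-Python sketch, assuming examples arrive as (features, label) pairs; the function name and the 10% default are illustrative, not a library API:

```python
import random


def downsample_with_upweighting(examples, target_pos_fraction=0.10, seed=0):
    """Downsample the majority (negative) class and upweight what is kept.

    examples: list of (features, label) pairs, label 1 = failure, 0 = normal.
    Returns (features, label, weight) triples in which positives make up roughly
    target_pos_fraction of the rows, and each kept negative carries a weight equal
    to the downsampling factor so the total negative weight is preserved.
    """
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # How many negatives to keep so positives form target_pos_fraction of the sample.
    n_keep = round(len(positives) * (1 - target_pos_fraction) / target_pos_fraction)
    n_keep = max(1, min(n_keep, len(negatives)))
    factor = len(negatives) / n_keep  # how many original negatives each kept one represents
    kept = random.Random(seed).sample(negatives, n_keep)
    rows = [(x, y, 1.0) for x, y in positives]
    rows += [(x, y, factor) for x, y in kept]
    return rows
```

With 10 failures among 1,000 readings (1%), this keeps 90 negatives and weights each by 11.0, giving a 100-row sample that is 10% positive. The third element of each row can be passed as a per-example weight, for instance through the sample_weight argument of Keras’s Model.fit.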

Question # 54

You work for a company that is developing an application to help users with meal planning. You want to use machine learning to scan a corpus of recipes and extract each ingredient (e.g., carrot, rice, pasta) and each kitchen cookware item (e.g., bowl, pot, spoon) mentioned. Each recipe is saved in an unstructured text file. What should you do?

A.

Create a text dataset on Vertex AI for entity extraction. Create two entities called “ingredient” and “cookware”, and label at least 200 examples of each entity. Train an AutoML entity extraction model to extract occurrences of these entity types. Evaluate performance on a holdout dataset.

B.

Create a multi-label text classification dataset on Vertex AI. Create a test dataset, and label each recipe that corresponds to its ingredients and cookware. Train a multi-class classification model. Evaluate the model’s performance on a holdout dataset.

C.

Use the Entity Analysis method of the Natural Language API to extract the ingredients and cookware from each recipe. Evaluate the model’s performance on a prelabeled dataset.

D.

Create a text dataset on Vertex AI for entity extraction. Create as many entities as there are different ingredients and cookware. Train an AutoML entity extraction model to extract those entities. Evaluate the model’s performance on a holdout dataset.

Question # 55

You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the downstream processes. You want to speed up the tuning job without significantly compromising its effectiveness. Which actions should you take?

Choose 2 answers

A.

Decrease the number of parallel trials

B.

Decrease the range of floating-point values

C.

Set the early stopping parameter to TRUE

D.

Change the search algorithm from Bayesian search to random search.

E.

Decrease the maximum number of trials during subsequent training phases.

Question # 56

You work for an online publisher that delivers news articles to over 50 million readers. You have built an AI model that recommends content for the company’s weekly newsletter. A recommendation is considered successful if the article is opened within two days of the newsletter’s published date and the user remains on the page for at least one minute.

All the information needed to compute the success metric is available in BigQuery and is updated hourly. The model is trained on eight weeks of data; on average, its performance degrades below the acceptable baseline after five weeks, and training time is 12 hours. You want to ensure that the model’s performance is above the acceptable baseline while minimizing cost. How should you monitor the model to determine when retraining is necessary?

A.

Use Vertex AI Model Monitoring to detect skew of the input features with a sample rate of 100% and a monitoring frequency of two days.

B.

Schedule a cron job in Cloud Tasks to retrain the model every week before the newsletter is created.

C.

Schedule a weekly query in BigQuery to compute the success metric.

D.

Schedule a daily Dataflow job in Cloud Composer to compute the success metric.
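The success metric in this question has two conditions: the article is opened within two days of publication, and the reader stays at least one minute. In practice this would be a scheduled SQL query over the BigQuery tables; the Python sketch below only spells out the logic, and the (published_at, opened_at, dwell_seconds) tuple layout is a hypothetical stand-in for the real schema:

```python
from datetime import datetime, timedelta

OPEN_WINDOW = timedelta(days=2)
MIN_DWELL_SECONDS = 60


def is_success(published_at, opened_at, dwell_seconds):
    """True if the article was opened within two days of the newsletter's
    publication and the reader stayed on the page for at least one minute."""
    if opened_at is None:  # never opened
        return False
    opened_in_window = published_at <= opened_at <= published_at + OPEN_WINDOW
    return opened_in_window and dwell_seconds >= MIN_DWELL_SECONDS


def success_rate(events):
    """events: iterable of (published_at, opened_at or None, dwell_seconds)."""
    events = list(events)
    if not events:
        return 0.0
    return sum(is_success(*event) for event in events) / len(events)


published = datetime(2024, 3, 4, 8, 0)
events = [
    (published, published + timedelta(hours=5), 95),  # success
    (published, published + timedelta(days=3), 300),  # opened too late
    (published, published + timedelta(hours=1), 20),  # left too quickly
    (published, None, 0),                             # never opened
]
print(success_rate(events))  # 0.25
```

Because the metric can only be finalized two days after each weekly newsletter, computing it more often than the newsletter cadence adds cost without adding new information.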

Question # 57

You work for a hospital that wants to optimize how it schedules operations. You need to create a model that uses the relationship between the number of surgeries scheduled and beds used. You want to predict how many beds will be needed for patients each day in advance based on the scheduled surgeries. You have one year of data for the hospital organized in 365 rows.

The data includes the following variables for each day:

• Number of scheduled surgeries

• Number of beds occupied

• Date

You want to maximize the speed of model development and testing. What should you do?

A.

Create a BigQuery table. Use BigQuery ML to build a regression model, with number of beds as the target variable and number of scheduled surgeries and date features (such as day of week) as the predictors.

B.

Create a BigQuery table. Use BigQuery ML to build an ARIMA model, with number of beds as the target variable and date as the time variable.

C.

Create a Vertex AI tabular dataset. Train an AutoML regression model, with number of beds as the target variable and number of scheduled minor surgeries and date features (such as day of the week) as the predictors.

D.

Create a Vertex AI tabular dataset. Train a Vertex AI AutoML Forecasting model with number of beds as the target variable, number of scheduled surgeries as a covariate, and date as the time variable.
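Options A and C turn the Date column into "date features (such as day of week)". A small sketch of that derivation, assuming each row holds a Python date and a surgery count; the feature names are hypothetical. In BigQuery SQL an equivalent such as EXTRACT(DAYOFWEEK FROM date) would be used instead, and it numbers days from Sunday = 1 rather than Monday = 0.

```python
from datetime import date


def make_features(day, scheduled_surgeries):
    """Build one row of predictors for a given day (hypothetical feature names)."""
    weekday = day.weekday()  # 0 = Monday ... 6 = Sunday
    return {
        "scheduled_surgeries": scheduled_surgeries,
        "day_of_week": weekday,
        "is_weekend": weekday >= 5,
        "month": day.month,
    }


print(make_features(date(2024, 1, 6), 12))
# {'scheduled_surgeries': 12, 'day_of_week': 5, 'is_weekend': True, 'month': 1}
```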

Question # 58

You have built a model that is trained on data stored in Parquet files. You access the data through a Hive table hosted on Google Cloud. You preprocessed these data with PySpark and exported it as a CSV file into Cloud Storage. After preprocessing, you execute additional steps to train and evaluate your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?

A.

Remove the data transformation step from your pipeline.

B.

Containerize the PySpark transformation step, and add it to your pipeline.

C.

Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage.

D.

Deploy Apache Spark at a separate node pool in a Google Kubernetes Engine cluster. Add a ContainerOp to your pipeline that invokes a corresponding transformation job for this Spark instance.

Question # 59

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?

A.

Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.

B.

Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.

C.

Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.

D.

Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using

Question # 60

You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000 unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?

A.

Create a one-hot encoding of words, and feed the encodings into your model.

B.

Identify word embeddings from a pre-trained model, and use the embeddings in your model.

C.

Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.

D.

Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.
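Option B’s word embeddings replace a 100,000-wide one-hot input per word with a short dense vector looked up by word id. A toy, pure-Python sketch of that lookup, assuming whitespace tokenization and a randomly initialized table standing in for pre-trained vectors; all names and the dimension of 8 are illustrative:

```python
import random


def build_vocab(texts):
    """Map each distinct lowercase token to an integer id; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def make_embedding_table(vocab_size, dim, seed=0):
    """Randomly initialized table: one dense row of `dim` floats per vocabulary id.
    With a pre-trained model, its vectors would be copied into these rows instead."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(text, vocab, table):
    """Turn a text into the sequence of dense vectors a recurrent network would consume."""
    return [table[vocab.get(token, 0)] for token in text.lower().split()]


vocab = build_vocab(["red cotton shirt", "blue cotton socks"])
table = make_embedding_table(len(vocab), dim=8)
sequence = embed("red wool shirt", vocab, table)  # "wool" is unknown, so it maps to row 0
```

Unlike the raw integer ids in option D, embedding rows do not impose an arbitrary ordering on the words, and unlike word frequencies in option C, distinct words get distinct representations.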

Question # 61

You have recently developed a custom model for image classification by using a neural network. You need to automatically identify the values for learning rate, number of layers, and kernel size. To do this, you plan to run multiple jobs in parallel to identify the parameters that optimize performance. You want to minimize custom code development and infrastructure management. What should you do?

A.

Create a Vertex AI pipeline that runs different model training jobs in parallel.

B.

Train an AutoML image classification model.

C.

Create a custom training job that uses the Vertex AI Vizier SDK for parameter optimization.

D.

Create a Vertex AI hyperparameter tuning job.

Question # 62

You work for a textile manufacturing company. Your company has hundreds of machines, and each machine has many sensors. Your team used the sensor data to build hundreds of ML models that detect machine anomalies. Models are retrained daily, and you need to deploy these models in a cost-effective way. The models must operate 24/7 without downtime and make sub-millisecond predictions. What should you do?

A.

Deploy a Dataflow batch pipeline and a Vertex AI Prediction endpoint.

B.

Deploy a Dataflow batch pipeline with the RunInference API, and use model refresh.

C.

Deploy a Dataflow streaming pipeline and a Vertex AI Prediction endpoint with autoscaling.

D.

Deploy a Dataflow streaming pipeline with the RunInference API, and use automatic model refresh.

Question # 63

You are developing an ML model that uses sliced frames from video feed and creates bounding boxes around specific objects. You want to automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire pipeline with minimal cluster management. What approach should you use?

A.

Use Kubeflow Pipelines on Google Kubernetes Engine.

B.

Use Vertex AI Pipelines with TensorFlow Extended (TFX) SDK.

C.

Use Vertex AI Pipelines with Kubeflow Pipelines SDK.

D.

Use Cloud Composer for the orchestration.

Question # 64

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

A.

Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job

B.

Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code

C.

Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository

D.

Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

Question # 65

You have successfully deployed to production a large and complex TensorFlow model trained on tabular data. You want to predict the lifetime value (LTV) field for each subscription stored in the BigQuery table named subscription.subscriptionPurchase in the project named my-fortune500-company-project.

You have organized all your training code, from preprocessing data from the BigQuery table up to deploying the validated model to the Vertex AI endpoint, into a TensorFlow Extended (TFX) pipeline. You want to prevent prediction drift, i.e., a situation when a feature data distribution in production changes significantly over time. What should you do?

A.

Implement continuous retraining of the model daily using Vertex AI Pipelines.

B.

Add a model monitoring job where 10% of incoming predictions are sampled every 24 hours.

C.

Add a model monitoring job where 90% of incoming predictions are sampled every 24 hours.

D.

Add a model monitoring job where 10% of incoming predictions are sampled every hour.

Question # 66

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

A.

Significantly increase the max_batch_size TensorFlow Serving parameter

B.

Switch to the tensorflow-model-server-universal version of TensorFlow Serving

C.

Significantly increase the max_enqueued_batches TensorFlow Serving parameter

D.

Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.

Question # 67

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

A.

Use the TFX ModelValidator tools to specify performance metrics for production readiness

B.

Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.

C.

Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data

D.

Use the entire dataset and treat the area under the receiver operating characteristic curve (AUC ROC) as the main metric.
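Tracking performance "on specific subsets of data" means computing metrics per slice rather than only in aggregate; TensorFlow Model Analysis, which TFX uses for evaluation, supports this kind of sliced evaluation. The stdlib sketch below only illustrates the idea; the slice keys, threshold, and function names are hypothetical:

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """records: iterable of (slice_key, label, prediction); returns {slice_key: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_key, label, prediction in records:
        total[slice_key] += 1
        correct[slice_key] += int(label == prediction)
    return {key: correct[key] / total[key] for key in total}


def ready_for_production(records, min_accuracy):
    """Hypothetical gate: every slice must meet the threshold, not just the overall average."""
    return all(acc >= min_accuracy for acc in accuracy_by_slice(records).values())


records = [
    ("sneakers", 1, 1),
    ("sneakers", 0, 0),
    ("boots", 1, 0),
    ("boots", 1, 1),
]
```

Here the model is perfect on sneakers but only 50% accurate on boots, which an overall average of 75% would hide.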

Question # 68

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?

A.

Vertex AI Pipelines and App Engine

B.

Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring

C.

Cloud Composer, BigQuery ML, and Vertex AI Prediction

D.

Cloud Composer, Vertex AI Training with custom containers, and App Engine

Question # 69

You have a large corpus of written support cases that can be classified into 3 separate categories: Technical Support, Billing Support, or Other Issues. You need to quickly build, test, and deploy a service that will automatically classify future written requests into one of the categories. How should you configure the pipeline?

A.

Use the Cloud Natural Language API to obtain metadata to classify the incoming cases.

B.

Use AutoML Natural Language to build and test a classifier. Deploy the model as a REST API.

C.

Use BigQuery ML to build and test a logistic regression model to classify incoming requests. Use BigQuery ML to perform inference.

D.

Create a TensorFlow model using Google’s BERT pre-trained model. Build and test a classifier, and deploy the model using Vertex AI.

Question # 70

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company’s website. Which result should you use to determine whether the model is successful?

A.

The model predicts videos as popular if the user who uploads them has over 10,000 likes.

B.

The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.

C.

The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.

D.

The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.

Question # 71

You work for a semiconductor manufacturing company. You need to create a real-time application that automates the quality control process. High-definition images of each semiconductor are taken at the end of the assembly line in real time. The photos are uploaded to a Cloud Storage bucket along with tabular data that includes each semiconductor’s batch number, serial number, dimensions, and weight. You need to configure model training and serving while maximizing model accuracy. What should you do?

A.

Use Vertex AI Data Labeling Service to label the images, and train an AutoML image classification model. Deploy the model, and configure Pub/Sub to publish a message when an image is categorized into the failing class.

B.

Use Vertex AI Data Labeling Service to label the images, and train an AutoML image classification model. Schedule a daily batch prediction job that publishes a Pub/Sub message when the job completes.

C.

Convert the images into an embedding representation. Import this data into BigQuery, and train a BigQuery ML k-means clustering model with two clusters. Deploy the model, and configure Pub/Sub to publish a message when a semiconductor’s data is categorized into the failing cluster.

D.

Import the tabular data into BigQuery, use Vertex AI Data Labeling Service to label the data, and train an AutoML tabular classification model. Deploy the model, and configure Pub/Sub to publish a message when a semiconductor’s data is categorized into the failing class.

Question # 72

You have trained a model by using data that was preprocessed in a batch Dataflow pipeline. Your use case requires real-time inference. You want to ensure that the data preprocessing logic is applied consistently between training and serving. What should you do?

A.

Perform data validation to ensure that the input data to the pipeline is the same format as the input data to the endpoint.

B.

Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Use the same code in the endpoint.

C.

Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Share this code with the end users of the endpoint.

D.

Batch the real-time requests by using a time window and then use the Dataflow pipeline to preprocess the batched requests. Send the preprocessed requests to the endpoint.

Question # 73

You developed a Python module by using Keras to train a regression model. You developed two model architectures, linear regression and deep neural network (DNN), within the same module. You are using the training_method argument to select one of the two methods, and you are using the learning_rate and num_hidden_layers arguments in the DNN. You plan to use Vertex AI’s hypertuning service with a budget to perform 100 trials. You want to identify the model architecture and hyperparameter values that minimize training loss and maximize model performance. What should you do?

A.

Run one hypertuning job for 100 trials. Set num_hidden_layers as a conditional hyperparameter based on its parent hyperparameter training_method, and set learning_rate as a non-conditional hyperparameter.

B.

Run two separate hypertuning jobs: a linear regression job for 50 trials, and a DNN job for 50 trials. Compare their final performance on a common validation set, and select the set of hyperparameters with the least training loss.

C.

Run one hypertuning job for 100 trials. Set num_hidden_layers and learning_rate as conditional hyperparameters based on their parent hyperparameter training_method.

D.

Run one hypertuning job with training_method as the hyperparameter for 50 trials. Select the architecture with the lowest training loss, and further hypertune it and its corresponding hyperparameters for 50 trials.
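A conditional hyperparameter is only meaningful for some values of its parent. The sketch below is a plain random-search illustration, not the Vertex AI hyperparameter tuning API; it assumes, as the question states, that learning_rate and num_hidden_layers are used only by the DNN branch, and the value ranges are made up for the example:

```python
import random


def sample_trial(rng):
    """Draw one trial from a conditional search space.

    training_method is the parent parameter; the DNN-only parameters are
    sampled only when it selects "dnn", so linear trials carry no unused values.
    """
    params = {"training_method": rng.choice(["linear_regression", "dnn"])}
    if params["training_method"] == "dnn":
        params["num_hidden_layers"] = rng.randint(1, 5)
        params["learning_rate"] = 10 ** rng.uniform(-4, -1)  # log-uniform between 1e-4 and 1e-1
    return params


rng = random.Random(0)
trials = [sample_trial(rng) for _ in range(100)]
```

Keeping the DNN-only parameters under their parent lets a single 100-trial budget cover both architectures without spending trials on values that the linear branch ignores.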

Question # 74

You are developing a training pipeline for a new XGBoost classification model based on tabular data. The data is stored in a BigQuery table. You need to complete the following steps:

1. Randomly split the data into training and evaluation datasets in a 65/35 ratio.

2. Conduct feature engineering.

3. Obtain metrics for the evaluation dataset.

4. Compare models trained in different pipeline executions.

How should you execute these steps?

A.

1. Using Vertex AI Pipelines, add a component to divide the data into training and evaluation sets, and add another component for feature engineering.

2. Enable autologging of metrics in the training component.

3. Compare pipeline runs in Vertex AI Experiments.

B.

1. Using Vertex AI Pipelines, add a component to divide the data into training and evaluation sets, and add another component for feature engineering.

2. Enable autologging of metrics in the training component.

3. Compare models using the artifacts lineage in Vertex ML Metadata.

C.

1. In BigQuery ML, use the CREATE MODEL statement with boosted_tree_classifier as the model type, and use BigQuery to handle the data splits.

2. Use a SQL view to apply feature engineering, and train the model using the data in that view.

3. Compare the evaluation metrics of the models by using a SQL query with the ml.training_info statement.

D.

1. In BigQuery ML, use the CREATE MODEL statement with boosted_tree_classifier as the model type, and use BigQuery to handle the data splits.

2. Use ml.transform to specify the feature engineering transformations, and train the model using the data in the table.

3. Compare the evaluation metrics of the models by using a SQL query with the ml.training_info statement.
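Step 1 asks for a random 65/35 split. One common way to make such a split reproducible across pipeline executions, so that step 4 compares models trained on the same data, is to hash a stable row key (in BigQuery SQL the same idea is often written with FARM_FINGERPRINT). This is a swapped-in illustration, not how BigQuery ML or Vertex AI split data internally, and using a customer id as the key is an assumption:

```python
import hashlib


def assign_split(row_key, train_fraction=0.65):
    """Deterministically assign a row to "train" or "eval" from a hash of a stable key.

    The same key always lands in the same split, so reruns of a pipeline see
    identical training and evaluation sets.
    """
    digest = hashlib.sha256(str(row_key).encode("utf-8")).digest()
    # Interpret the first 4 bytes as a fraction in [0, 1).
    position = int.from_bytes(digest[:4], "big") / 2**32
    return "train" if position < train_fraction else "eval"


splits = [assign_split(customer_id) for customer_id in range(10_000)]
train_share = splits.count("train") / len(splits)
```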

Question # 75

You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality greater than 10,000 unique values. How should you encode these categorical values as input into the model?

A.

Convert each categorical value into an integer value.

B.

Convert the categorical string data to one-hot hash buckets.

C.

Map the categorical variables into a vector of boolean values.

D.

Convert each categorical value into a run-length encoded string.
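Hashing bounds a high-cardinality column to a fixed number of buckets before one-hot encoding. A pure-Python sketch, assuming CRC32 as the hash and 1,000 buckets purely for illustration; frameworks such as TensorFlow use their own hash functions, so these bucket ids will not match theirs:

```python
import zlib


def hash_bucket(value, num_buckets):
    """Map a categorical string to one of num_buckets ids with a stable hash (CRC32 here)."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def one_hot_hash(value, num_buckets=1000):
    """One-hot vector of length num_buckets with a 1 at the value's bucket.

    Several distinct values can share a bucket (collisions); in exchange, the
    input size stays fixed no matter how many categories the column has.
    """
    vector = [0] * num_buckets
    vector[hash_bucket(value, num_buckets)] = 1
    return vector
```

A column with, say, 20,000 distinct store ids then becomes a 1,000-wide input instead of a 20,000-wide one.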

Question # 76

You have developed a BigQuery ML model that predicts customer churn and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?

A.

1. Enable request-response logging on Vertex AI Endpoints.

2. Schedule a TensorFlow Data Validation job to monitor prediction drift.

3. Execute model retraining if there is significant distance between the distributions.

B.

1. Enable request-response logging on Vertex AI Endpoints.

2. Schedule a TensorFlow Data Validation job to monitor training/serving skew.

3. Execute model retraining if there is significant distance between the distributions.

C.

1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift.

2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected.

3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.

D.

1. Create a Vertex AI Model Monitoring job configured to monitor training/serving skew.

2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected.

3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
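Every option here hinges on measuring the distance between two feature distributions: skew compares serving data with the training data, while drift compares serving data across time windows. A minimal sketch for one categorical feature, assuming the L-infinity distance (the largest absolute difference in category frequency), which is the measure Vertex AI Model Monitoring's documentation describes for categorical features; the 0.3 threshold and the sample data are hypothetical.

```python
from collections import Counter


def frequencies(values: list[str]) -> dict[str, float]:
    """Relative frequency of each category in a sample."""
    counts = Counter(values)
    total = len(values)
    return {k: c / total for k, c in counts.items()}


def l_infinity_distance(baseline: list[str], current: list[str]) -> float:
    """Largest absolute difference in category frequency between two samples."""
    p, q = frequencies(baseline), frequencies(current)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


THRESHOLD = 0.3  # assumed alert threshold


def should_retrain(baseline: list[str], current: list[str],
                   threshold: float = THRESHOLD) -> bool:
    # Retrain only when the shift exceeds the threshold, keeping retraining (and cost) rare.
    return l_infinity_distance(baseline, current) > threshold


if __name__ == "__main__":
    last_week = ["card"] * 70 + ["paypal"] * 30  # hypothetical payment-method samples
    this_week = ["card"] * 30 + ["paypal"] * 70
    print(l_infinity_distance(last_week, this_week), should_retrain(last_week, this_week))
```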

Question # 77

You work for a manufacturing company. You need to train a custom image classification model to detect product defects at the end of an assembly line. Although your model is performing well, some images in your holdout set are consistently mislabeled with high confidence. You want to use Vertex AI to understand your model's results. What should you do?

A.

B.

C.

D.

Question # 78

You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model. What should you do?

A.

Access BigQuery Studio in the Google Cloud console. Run the CREATE MODEL statement in the SQL editor to create a deep neural network (DNN) regressor model.

B.

Create a Vertex AI Workbench notebook. Use IPython magic to run the CREATE MODEL statement to create a deep neural network (DNN) regressor model.

C.

Access BigQuery Studio in the Google Cloud console. Run the CREATE MODEL statement in the SQL editor to create an AutoML regression model.

D.

Create a Vertex AI Workbench notebook. Use IPython magic to run the CREATE MODEL statement to create an AutoML regression model.

Question # 79

You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not have enough data to train a model from scratch. The resulting model should have high predictive performance. Which service should you use?

A.

AutoML Natural Language

B.

Cloud Natural Language API

C.

AI Hub pre-made Jupyter Notebooks

D.

AI Platform Training built-in algorithms

Question # 80

You built a custom Vertex AI pipeline job that preprocesses images and trains an object detection model. The pipeline currently uses one n1-standard-8 machine with one NVIDIA Tesla V100 GPU. You want to reduce the model training time without compromising model accuracy. What should you do?

A.

Reduce the number of layers in your object detection model.

B.

Train the same model on a stratified subset of your dataset.

C.

Update the WorkerPoolSpec to use a machine with 24 vCPUs and 1 NVIDIA Tesla V100 GPU.

D.

Update the WorkerPoolSpec to use a machine with 24 vCPUs and 3 NVIDIA Tesla V100 GPUs.

Question # 81

You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:

import tensorflow as tf

# YOUR_LIST_OF_FEATURES is a placeholder for the model's feature columns.
estimator = tf.estimator.DNNRegressor(
    feature_columns=[YOUR_LIST_OF_FEATURES],
    hidden_units=[1024, 512, 256],
    dropout=None)

Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10 ms at the 90th percentile, and you currently serve on CPUs. Your production requirements call for a model latency of 8 ms at the 90th percentile. You are willing to accept a small decrease in performance in order to meet the latency requirement. Therefore, your plan is to improve latency while evaluating how much the model's prediction quality decreases. What should you try first to quickly lower the serving latency?

A.

Increase the dropout rate to 0.8 in PREDICT mode by adjusting the TensorFlow Serving parameters.

B.

Increase the dropout rate to 0.8 and retrain your model.

C.

Switch from CPU to GPU serving.

D.

Apply quantization to your SavedModel by reducing the floating point precision to tf.float16.
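Option D trades numeric precision for smaller weights. As a rough illustration of what reducing precision does to individual values, the sketch below round-trips a few float64 weights through IEEE 754 half precision using the standard library's `struct` format code `'e'`, and reports the per-weight storage saving and the worst rounding error. The weights are made-up values; on a real SavedModel the conversion would be done with TensorFlow's own tooling rather than by hand.

```python
import struct

FLOAT64_BYTES = struct.calcsize("<d")  # 8 bytes per weight
FLOAT16_BYTES = struct.calcsize("<e")  # 2 bytes per weight


def to_float16(x: float) -> float:
    """Round a Python float (float64) to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize(weights: list[float]) -> tuple[list[float], float]:
    """Return float16-rounded weights and the largest absolute rounding error."""
    rounded = [to_float16(w) for w in weights]
    max_err = max(abs(w - r) for w, r in zip(weights, rounded))
    return rounded, max_err


if __name__ == "__main__":
    weights = [0.123456789, -1.5, 3.14159265, 0.0001]  # hypothetical weights
    _, err = quantize(weights)
    print(f"storage per weight: {FLOAT64_BYTES} -> {FLOAT16_BYTES} bytes")
    print(f"max rounding error: {err:.2e}")
```

The rounding error is what the plan in the question proposes to measure against held-out data, to check how much prediction quality is lost for the latency gained.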

Question # 82

You work for a manufacturing company. You need to train a custom image classification model to detect product defects at the end of an assembly line. Although your model is performing well, some images in your holdout set are consistently mislabeled with high confidence. You want to use Vertex AI to understand your model's results. What should you do?

A.

Configure feature-based explanations by using sampled Shapley. Set number of feature permutations to the maximum value of 50.

B.

Create an index by using Vertex AI Matching Engine. Query the index with your mislabeled images.

C.

Configure example-based explanations by using integrated gradients. Set visualization type to pixels, and set clip_percent_upperbound to 95.

D.

Configure example-based explanations. Specify the embedding output layer to be used for the latent space representation.
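Option D describes example-based explanations, which return the training examples whose latent-space representations lie closest to the input. The idea can be sketched as a brute-force nearest-neighbour search over embedding vectors; the embeddings, labels, and the choice of cosine similarity are assumptions for illustration, whereas the managed service performs the search over a large index itself.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest_examples(query, train_embeddings, train_labels, k=3):
    """Return (similarity, label, index) for the k training examples closest to query."""
    scored = [
        (cosine(query, emb), train_labels[i], i)
        for i, emb in enumerate(train_embeddings)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-d embeddings taken from the model's chosen output layer.
    train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
    labels = ["defect", "defect", "ok"]
    mislabeled_query = [0.85, 0.15, 0.05]
    for sim, label, idx in nearest_examples(mislabeled_query, train, labels, k=2):
        print(idx, label, round(sim, 3))
```

If the neighbours of a confidently misclassified image carry a questionable label, or look nothing like it, that can point to label noise or gaps in the training data.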

Question # 83

Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires users to confirm their presence and shuttle station one day in advance. What approach should you take?

A.

1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station.

2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the prediction.

B.

1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station.

2. Dispatch an available shuttle and provide the map with the required stops based on the prediction.

C.

1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under capacity constraints.

2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.

D.

1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as agents, and a reward function around a distance-based metric.

2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.
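Option C frames the task as routing over demand that is already known, since riders confirm a day in advance. A minimal sketch, assuming stations with 2-D coordinates, a depot start point, a hypothetical list of shuttle sizes, and a single shuttle: keep only stations with confirmed riders, pick the smallest shuttle that fits everyone, and brute-force the shortest visiting order. Exhaustive search only works for a handful of stops; a real deployment would use a vehicle-routing solver. All coordinates, rider counts, and sizes are hypothetical.

```python
import itertools
import math

Point = tuple[float, float]


def route_length(depot: Point, stops: list[Point]) -> float:
    """Length of the open path depot -> stops, in the given order."""
    path = [depot, *stops]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def plan(depot, stations, confirmed, shuttle_sizes):
    """Return (shuttle_size, ordered stops) for the confirmed riders."""
    stops = [s for s, riders in confirmed.items() if riders > 0]
    total = sum(confirmed[s] for s in stops)
    fitting = [size for size in sorted(shuttle_sizes) if size >= total]
    if not fitting:
        raise ValueError("no single shuttle can carry all confirmed riders")
    # Shortest visiting order among all permutations (feasible only for few stops).
    order = min(
        itertools.permutations(stops),
        key=lambda p: route_length(depot, [stations[s] for s in p]),
    )
    return fitting[0], list(order)


if __name__ == "__main__":
    stations = {"A": (0.0, 5.0), "B": (1.0, 1.0), "C": (4.0, 0.0)}
    confirmed = {"A": 0, "B": 6, "C": 3}  # riders confirmed one day in advance
    print(plan((0.0, 0.0), stations, confirmed, shuttle_sizes=[8, 15, 30]))
```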

Question # 84

You trained a model on data that is stored in a Cloud Storage bucket. The model needs to be retrained frequently in Vertex AI Training by using the latest data in the bucket. Data preprocessing is required prior to the retraining. You want to build a simple and efficient near real-time ML pipeline in Vertex AI that will perform the data preprocessing when new data arrives in the bucket. What should you do?

A.

Use the Vertex AI SDK to preprocess the new data in the bucket prior to each model retraining. Store the processed features in BigQuery.

B.

Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.

C.

Create a pipeline by using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.

D.

Build a Dataflow pipeline to preprocess the new data in the bucket and store the processed features in BigQuery. Configure a cron job to trigger the pipeline execution.
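Option B is event-driven: the arrival of a file starts preprocessing, with no polling or fixed schedule. The sketch below covers only the handler's decision logic, assuming a Cloud Storage object-finalized event whose data arrives as a dictionary with `bucket` and `name` fields. `submit_pipeline` is a hypothetical stand-in for the call that would start the Vertex AI pipeline run, and the `incoming/` prefix and `input_uri` parameter name are likewise assumptions.

```python
from typing import Callable

PipelineParams = dict[str, str]


def handle_new_object(event_data: dict,
                      submit_pipeline: Callable[[PipelineParams], None],
                      prefix: str = "incoming/") -> bool:
    """React to a new object in the bucket; return True if a pipeline run was requested."""
    name = event_data.get("name", "")
    if not name.startswith(prefix) or name.endswith("/"):
        return False  # ignore unrelated paths and folder placeholder objects
    params = {"input_uri": f"gs://{event_data['bucket']}/{name}"}
    submit_pipeline(params)  # hypothetical: would start the preprocessing pipeline run
    return True


if __name__ == "__main__":
    submitted = []
    event = {"bucket": "training-data", "name": "incoming/new_batch.csv"}  # hypothetical event
    handle_new_object(event, submitted.append)
    print(submitted)
```

Passing the submit call in as a parameter keeps the trigger logic testable without cloud credentials.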

Question # 85

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

A.

Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.

B.

Convert your PySpark into SparkSQL queries to transform the data and then run your pipeline on Dataproc to write the data into BigQuery.

C.

Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.

D.

Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

Question # 86

You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?

A.

Use the TRANSFORM clause with the ML.ONE_HOT_ENCODER function on the categorical features at model creation, and select the categorical and non-categorical features.

B.

Use the ML.ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.

C.

Use the CREATE MODEL statement and select the categorical and non-categorical features.

D.

Use the ML.ONE_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.

Question # 87

You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

A.

Increase the instance memory to 512 GB and increase the batch size.

B.

Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.

C.

Enable early stopping in your Vertex AI Training job.

D.

Use the tf.distribute.Strategy API and run a distributed training job.
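Option D relies on data parallelism: `tf.distribute` strategies such as `MirroredStrategy` replicate the model, give each replica a slice of the global batch, and combine the replicas' gradients before every update. The pure-Python simulation below (no TensorFlow; a one-weight linear model with squared error and made-up data) illustrates why this need not change what is learned: averaging per-replica gradients over equal-sized shards gives exactly the full-batch gradient.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """d/dw of mean((w*x - y)^2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w: float, xs: list[float], ys: list[float],
                       num_replicas: int) -> float:
    """Split the global batch across replicas and average their gradients."""
    shard = len(xs) // num_replicas
    if shard * num_replicas != len(xs):
        raise ValueError("use a global batch size divisible by the number of replicas")
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_replicas)
    ]
    return sum(grads) / num_replicas  # stands in for the all-reduce step


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # hypothetical data, roughly y = 2x
    w = 0.5
    print(grad_mse(w, xs, ys), data_parallel_grad(w, xs, ys, num_replicas=3))
```

The time saving comes from the replicas computing their shard gradients concurrently on separate accelerators; this simulation runs them one after another and only demonstrates the equivalence.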

Question # 88

You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?

A.

B.

C.

D.
