Databricks-Certified-Data-Engineer-Associate: Databricks Certified Data Engineer Associate Exam Sample Questions (2024 Practice Exam Dumps)

Question # 4

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Worker node

JDBC data source

Databricks web application

Databricks Filesystem

Driver node


Answer: Databricks web application

Explanation:

The Databricks web application is the user interface that allows you to create and manage workspaces, clusters, notebooks, jobs, and other resources. It is hosted completely in the control plane of the classic Databricks architecture, which includes the backend services that Databricks manages in your Databricks account. None of the other options is hosted completely in the control plane: the driver and worker nodes run in the compute plane, which is where your data is processed by compute resources such as clusters and which sits in your own cloud account and network; the DBFS root storage also lives in your cloud account; and a JDBC data source is an external system. References: Databricks architecture overview, Security and Trust Center

Question # 4

Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?

A. The ability to manipulate the same data using a variety of languages

B. The ability to collaborate in real time on a single notebook

C. The ability to set up alerts for query failures

D. The ability to support batch and streaming workloads

E. The ability to distribute complex data operations

Answer: D

Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. Delta Lake is fully compatible with Apache Spark APIs and was developed for tight integration with Structured Streaming, allowing you to use a single copy of data for both batch and streaming operations and providing incremental processing at scale [1]. Delta Lake supports upserts using the merge operation, which enables you to efficiently update existing data or insert new data into your Delta tables [2]. Delta Lake also provides time travel capabilities, which allow you to query previous versions of your data or roll back to a specific point in time [3]. References: [1] What is Delta Lake? | Databricks on AWS; [2] Upsert into a table using merge | Databricks on AWS; [3] Query an older snapshot of a table (time travel) | Databricks on AWS
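The merge and time-travel behavior described above can be illustrated with a toy model in plain Python. This is not the Delta Lake API: `VersionedTable`, its `merge` and `read` methods, and the sample rows are hypothetical. The sketch only mimics the semantics, where matched keys are updated, new keys are inserted, and every commit produces a version that can still be read later.

```python
from copy import deepcopy


class VersionedTable:
    """Hypothetical keyed table that keeps every committed version (a stand-in for a Delta table's history)."""

    def __init__(self, key):
        self.key = key
        self.versions = [{}]  # version 0 is the empty table

    def merge(self, updates):
        """Upsert: update rows whose key matches, insert rows whose key is new."""
        current = deepcopy(self.versions[-1])
        for row in updates:
            # WHEN MATCHED -> update, WHEN NOT MATCHED -> insert
            current[row[self.key]] = row
        self.versions.append(current)  # each merge commits a new version
        return len(self.versions) - 1

    def read(self, version=None):
        """Read the latest version, or an older snapshot (time travel)."""
        snapshot = self.versions[-1 if version is None else version]
        return sorted(snapshot.values(), key=lambda r: r[self.key])


customers = VersionedTable(key="id")
customers.merge([{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}])
customers.merge([{"id": 2, "city": "Chicago"}, {"id": 3, "city": "Denver"}])

print(customers.read())           # latest version after both merges
print(customers.read(version=1))  # snapshot after the first merge only
```

The second merge updates `id` 2 and inserts `id` 3, while version 1 still shows the original Boston row.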


Question # 5

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

When another task needs to be replaced by the new task

When another task needs to fail before the new task begins

When another task has the same dependency libraries as the new task

When another task needs to use as little compute resources as possible

When another task needs to successfully complete before the new task begins


Answer: When another task needs to successfully complete before the new task begins

Explanation:

A data engineer can create a multi-task job in Databricks that consists of multiple tasks that run in a specific order. Each task can have one or more dependencies, which are other tasks that must complete before the current task runs. The Depends On field of a new Databricks Job Task allows the data engineer to specify the dependencies of the task. The data engineer should select a task in the Depends On field when they want the new task to run only after the selected task has successfully completed. This helps the data engineer create a logical sequence of tasks that depend on each other's outputs or results. For example, a data engineer can create a multi-task job that consists of the following tasks:

Task A: Ingest data from a source using Auto Loader
Task B: Transform the data using Spark SQL
Task C: Write the data to a Delta Lake table
Task D: Analyze the data using Spark ML
Task E: Visualize the data using Databricks SQL

In this case, the data engineer can set the dependencies of each task as follows:

Task A: No dependencies
Task B: Depends on Task A
Task C: Depends on Task B
Task D: Depends on Task C
Task E: Depends on Task D

This way, the data engineer can ensure that each task runs only after the previous task has successfully completed, and the data flows smoothly from ingestion to visualization.
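The dependency chain above can be sketched in Python with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is a simplified, hypothetical model rather than the Databricks Jobs API: the `depends_on` mapping and the `run_job` helper are assumptions, and real jobs use their own run-status names instead of the "skipped" label used here.

```python
from graphlib import TopologicalSorter

# Hypothetical job mirroring the Task A-E example: each key maps to the
# set of tasks selected in its Depends On field.
depends_on = {
    "A": set(),
    "B": {"A"},
    "C": {"B"},
    "D": {"C"},
    "E": {"D"},
}


def run_job(graph, failing=frozenset()):
    """Run tasks in dependency order; a task runs only if every upstream task succeeded."""
    status = {}
    for task in TopologicalSorter(graph).static_order():
        if all(status[dep] == "succeeded" for dep in graph[task]):
            # Simulated outcome: tasks listed in `failing` fail, all others succeed.
            status[task] = "failed" if task in failing else "succeeded"
        else:
            status[task] = "skipped"  # an upstream dependency did not succeed
    return status


print(list(TopologicalSorter(depends_on).static_order()))
print(run_job(depends_on, failing={"C"}))
```

If Task C fails, Tasks D and E never start, which is exactly the guarantee that selecting an upstream task in Depends On provides.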

The other options are incorrect because they do not describe valid scenarios for selecting a task in the Depends On field. The Depends On field does not affect the following aspects of a task:

Whether the task needs to be replaced by another task
Whether the task needs to fail before another task begins
Whether the task has the same dependency libraries as another task
Whether the task needs to use as few compute resources as possible

References: Create a multi-task job, Run tasks conditionally in a Databricks job, Databricks Jobs.

Question # 6

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.


Question # 7

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

They can set up the dashboard's SQL endpoint to be serverless.

They can turn on the Auto Stop feature for the SQL endpoint.

They can reduce the cluster size of the SQL endpoint.

They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.


Question # 8

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

SELECT * FROM sales

spark.delta.table

spark.sql

There is no way to share data between PySpark and SQL.

spark.table


Question # 9

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

Databricks Repos automatically saves development progress

Databricks Repos supports the use of multiple branches

Databricks Repos allows users to revert to previous versions of a notebook

Databricks Repos provides the ability to comment on specific changes

Databricks Repos is wholly housed within the Databricks Lakehouse Platform


Question # 10

Which of the following data workloads will utilize a Gold table as its source?

A job that enriches data by parsing its timestamps into a human-readable format

A job that aggregates uncleaned data to create standard summary statistics

A job that cleans data by removing malformatted records

A job that queries aggregated data designed to feed into a dashboard

A job that ingests raw data from a streaming source into the Lakehouse


Question # 11

Which of the following describes the type of workloads that are always compatible with Auto Loader?

Dashboard workloads

Streaming workloads

Machine learning workloads

Serverless workloads

Batch workloads


Question # 12

Which of the following commands will return the location of database customer360?

DESCRIBE LOCATION customer360;

DROP DATABASE customer360;

DESCRIBE DATABASE customer360;

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');

USE DATABASE customer360;


Question # 13

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Records that violate the expectation cause the job to fail.
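The DROP ROW behavior can be simulated in plain Python to show what happens to a batch. This is a hypothetical sketch, not the Delta Live Tables API: the sample records, the `apply_drop_row_expectation` helper, and the metric names are illustrative assumptions. Rows that fail the predicate are left out of the target dataset and counted, which is analogous to how DLT reports expectation metrics in the pipeline event log.

```python
from datetime import date

# Hypothetical batch; only the "timestamp" field comes from the question.
batch = [
    {"id": 1, "timestamp": date(2021, 3, 14)},
    {"id": 2, "timestamp": date(2019, 12, 31)},  # before the cutoff: violates the expectation
    {"id": 3, "timestamp": date(2020, 1, 1)},    # equal to the cutoff: fails the strict >
    {"id": 4, "timestamp": date(2023, 7, 4)},
]

CUTOFF = date(2020, 1, 1)


def valid_timestamp(row):
    # Mirrors EXPECT (timestamp > '2020-01-01')
    return row["timestamp"] > CUTOFF


def apply_drop_row_expectation(rows, predicate):
    """Simulate ON VIOLATION DROP ROW: keep passing rows and count the failures."""
    kept = [r for r in rows if predicate(r)]
    metrics = {"passed_records": len(kept), "failed_records": len(rows) - len(kept)}
    return kept, metrics


target, metrics = apply_drop_row_expectation(batch, valid_timestamp)
print([r["id"] for r in target], metrics)
```

The pipeline keeps running: invalid rows simply do not reach the target, and only their count is reported.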


Question # 14

Which of the following benefits is provided by the array functions from Spark SQL?

An ability to work with data in a variety of types at once

An ability to work with data within certain partitions and windows

An ability to work with time-related data in specified intervals

An ability to work with complex, nested data ingested from JSON files

An ability to work with an array of tables for procedural automation


Question # 15

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

Which of the following will be returned by the above query?

Option A

Option B

Option C

Option D

Option E


Question # 16

A new data engineering team named team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the table sales to the new data engineering team?

GRANT ALL PRIVILEGES ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT USAGE ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;


Question # 17

A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).

Which of the following code blocks creates this SQL UDF?


Question # 18

A new data engineering team named team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the table sales to the new data engineering team?

grant all privileges on table sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;


Question # 19

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

It is not possible to use SQL in a Python notebook

They can attach the cell to a SQL endpoint rather than a Databricks cluster

They can simply write SQL syntax in the cell

They can add %sql to the first line of the cell

They can change the default language of the notebook to SQL


Question # 20

In which of the following file formats is data from Delta Lake tables primarily stored?

Delta

CSV

Parquet

JSON

A proprietary, optimized format specific to Databricks


Question # 21

Which tool is used by Auto Loader to process data incrementally?

Spark Structured Streaming

Unity Catalog

Checkpointing

Databricks SQL


Question # 22

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Unity Catalog

Data Explorer

Delta Lake

Delta Live Tables

Auto Loader


Question # 23

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.


Question # 24

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

if day_of_week = 1 and review_period:

if day_of_week = 1 and review_period = "True":

if day_of_week == 1 and review_period == "True":

if day_of_week == 1 and review_period:

if day_of_week = 1 & review_period: = "True":
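The patterns in these options can be checked directly in Python. In this minimal sketch the variable values are hypothetical, and `is_valid_python` is an illustrative helper that only tests whether a snippet compiles.

```python
# Hypothetical values for the two variables named in the question.
day_of_week = 1
review_period = True

# Correct form: == compares values, and a boolean can be tested directly.
if day_of_week == 1 and review_period:
    ran_final_block = True
else:
    ran_final_block = False

# Comparing a boolean to the string "True" is always False in Python,
# so this condition would never run the block.
string_comparison = day_of_week == 1 and review_period == "True"


def is_valid_python(source: str) -> bool:
    """Return True if the snippet compiles, False on a SyntaxError."""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


# A single = inside an if condition is an assignment, which is a SyntaxError.
single_equals_valid = is_valid_python("if day_of_week = 1 and review_period:\n    pass\n")

print(ran_final_block, string_comparison, single_equals_valid)
```

Only `==` for the integer comparison combined with a direct truth test of the boolean both compiles and evaluates as intended.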


Question # 25

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

dbfs:/user/hive/database/customer360

dbfs:/user/hive/warehouse

dbfs:/user/hive/customer360

More information is needed to determine the correct response


Question # 26

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

They can turn on the Auto Stop feature for the SQL endpoint.

They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.

They can reduce the cluster size of the SQL endpoint.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

They can set up the dashboard's SQL endpoint to be serverless.


Question # 27

Which of the following describes the relationship between Bronze tables and raw data?

Bronze tables contain less data than raw data files.

Bronze tables contain more truthful data than raw data.

Bronze tables contain aggregates while raw data is unaggregated.

Bronze tables contain a less refined view of data than raw data.

Bronze tables contain raw data with a schema applied.


Question # 28

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

They can turn on the Auto Stop feature for the SQL endpoint.

They can increase the cluster size of the SQL endpoint.

They can turn on the Serverless feature for the SQL endpoint.

They can increase the maximum bound of the SQL endpoint's scaling range.


Question # 29

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING ____
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

autoloader

org.apache.spark.sql.jdbc

sqlite

org.apache.spark.sql.sqlite
