DY0-001 CompTIA DataX Exam Questions and Answers

Question # 4

Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?

A. An input layer, a pooling layer, and an output layer

B. An input layer, a convolutional layer, and a hidden layer

C. An input layer, a hidden layer, and an output layer

D. An input layer, a dropout layer, and a hidden layer
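
For readers who want to see the layer terminology in action, the following is a minimal sketch of a forward pass through a small feed-forward network. The weights, biases, layer sizes, and the helper names `dense` and `forward` are hypothetical and chosen only for illustration; nothing here is trained.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked parameters (not trained).
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]  # 2 inputs -> 2 hidden units
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]              # 2 hidden units -> 1 output unit
output_biases = [0.0]


def forward(features):
    """Pass the input features through the hidden layer, then the output layer."""
    hidden = dense(features, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)


prediction = forward([1.0, 2.0])  # the feature list plays the role of the input layer
```

Layer types such as pooling, convolution, and dropout would be additional, optional stages inserted into a pipeline like this one.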

Question # 5

A data scientist observes findings that indicate that as electrical grids in a country become more and more connected over time, the total frequency of brownouts and blackouts decreases while the frequency of major brownouts and blackouts increases. Which of the following distribution metrics could best be identified?

A. Scale axis magnitudes

B. Kurtosis

C. Skewness

D. Normality
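
As a rough illustration of how shape metrics respond to a change like the one described, the sketch below computes moment-based skewness and excess kurtosis for two samples. The outage sizes are invented for the example, and the population (biased) moment formulas are one common convention among several.

```python
from statistics import fmean


def central_moment(values, order):
    """Average of (value - mean) ** order over the sample."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Third standardized moment: asymmetry of the distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3: tail weight relative to a normal."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical outage sizes: many moderate events early on,
# fewer events later but with one very large one.
early_outages = [1, 2, 2, 3, 3, 3, 4, 4, 5]
later_outages = [1, 2, 2, 3, 3, 30]

early_shape = (skewness(early_outages), excess_kurtosis(early_outages))
later_shape = (skewness(later_outages), excess_kurtosis(later_outages))
```

In this toy data the later sample has fewer observations overall but a heavier tail, which shows up as a larger excess kurtosis.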

Question # 6

A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses. Which of the following is the most efficient way to identify the chemical businesses' observations?

A. Ingest the data from all of the hard drives and perform exploratory data analysis to identify which business is responsible for chemical operations.

B. Perform analysis on all of the data and create a summary report on the results relevant to chemical operations.

C. Consult with the business team to identify which sites are responsible for chemical operations and ingest only the relevant data for analysis.

D. Ingest data from the hard drive containing the most data and present sample results on the chemical operations.

Question # 7

A data analyst wants to use compression on an analyzed data set and send it to a new destination for further processing. Which of the following issues will most likely occur?

A. Library dependency will be missing.

B. Server CPU usage will be too high.

C. Operating system support will be missing.

D. Server memory usage will be too high.
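
The workflow in this question can be sketched as a compress-then-restore round trip. The example below uses `gzip` and `json` from the Python standard library; the record layout is hypothetical. The receiving side can only restore the data if it has a compatible decompression routine available.

```python
import gzip
import json

# Hypothetical analyzed data set.
records = [{"id": i, "value": i * 0.5} for i in range(1000)]

# Sender: serialize, then compress before transfer.
payload = json.dumps(records).encode("utf-8")
compressed = gzip.compress(payload)

# Receiver: needs the matching codec (here gzip) to read the data back.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
```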

Question # 8

An analyst is examining data from an array of temperature sensors and sees that one sensor consistently returns values that are much higher than the values from the other sensors. Which of the following terms best describes this type of error?

A. Synthetic

B. Systematic

C. Heteroskedastic

D. Idiosyncratic
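
One way to spot a sensor like the one described is to compare each sensor's average against the typical average across the array. The sketch below does this with invented readings; the sensor names and the 2-degree threshold are assumptions made for the example.

```python
from statistics import fmean, median

# Hypothetical temperature readings per sensor (same times, same room).
readings = {
    "sensor_a": [20.1, 20.4, 19.8, 20.0],
    "sensor_b": [20.3, 20.2, 19.9, 20.1],
    "sensor_c": [25.2, 25.5, 24.9, 25.1],
    "sensor_d": [19.9, 20.3, 20.0, 20.2],
}

# Average reading per sensor, and the median of those averages as a reference.
sensor_means = {name: fmean(values) for name, values in readings.items()}
reference = median(sensor_means.values())

# A consistent offset from the reference suggests a bias in that sensor
# rather than random noise. The threshold is an arbitrary illustration.
offsets = {name: mean - reference for name, mean in sensor_means.items()}
suspects = [name for name, offset in offsets.items() if abs(offset) > 2.0]
```

Here `sensor_c` reads about 5 degrees above the others on every measurement, so its offset is stable rather than scattered.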

Question # 9

Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?

A. Converting an on-premises deployment to a containerized deployment

B. Migrating to a cloud deployment

C. Moving model processing to an edge deployment

D. Adding nodes to a cluster deployment

Question # 10

A data analyst wants to find the latitude and longitude of a mailing address. Which of the following is the best method to use?

A. One-hot encoding

B. Binning

C. Geocoding

D. Imputing

Question # 11

A data scientist is building a proof of concept for a commercialized machine-learning model. Which of the following is the best starting point?

A. Literature review

B. Model performance evaluation

C. Hyperparameter tuning

D. Model selection

Question # 12

Which of the following is the layer that is responsible for the depth in deep learning?

A. Convolution

B. Dropout

C. Pooling

D. Hidden

Question # 13

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

A. Embeddings

B. Extrapolation

C. Sampling

D. One-hot encoding
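
Conceptual similarity between sentences is commonly measured by representing each sentence as a numeric vector and comparing the vectors. The sketch below assumes such vectors already exist; the three-dimensional values are invented by hand and do not come from any real language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical sentence vectors (real ones would have hundreds of dimensions).
sentence_vectors = {
    "The cat sat on the mat.": [0.9, 0.1, 0.2],
    "A kitten rested on the rug.": [0.8, 0.2, 0.3],
    "Quarterly revenue rose sharply.": [0.1, 0.9, 0.4],
}

cat = sentence_vectors["The cat sat on the mat."]
kitten = sentence_vectors["A kitten rested on the rug."]
revenue = sentence_vectors["Quarterly revenue rose sharply."]

similar_pair = cosine_similarity(cat, kitten)
unrelated_pair = cosine_similarity(cat, revenue)
```

The two pet sentences share almost no words, yet their vectors point in nearly the same direction, which is the property that a word-level encoding would miss.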

Question # 14

A data scientist has built an image recognition model that distinguishes cars from trucks. The data scientist now wants to measure the rate at which the model correctly identifies a car as a car versus when it misidentifies a truck as a car. Which of the following would best convey this information?

A. Confusion matrix

B. AUC/ROC curve

C. Box plot

D. Correlation plot
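
The two rates mentioned in the question can be computed by cross-tabulating actual and predicted labels. The sketch below uses a handful of invented labels; the variable names are chosen for the example.

```python
from collections import Counter

# Hypothetical ground truth and model output for eight images.
actual = ["car", "car", "truck", "car", "truck", "truck", "car", "truck"]
predicted = ["car", "truck", "car", "car", "truck", "truck", "car", "car"]

# Count each (actual, predicted) combination.
counts = Counter(zip(actual, predicted))

cars_as_cars = counts[("car", "car")]        # correctly identified cars
cars_as_trucks = counts[("car", "truck")]    # cars the model missed
trucks_as_cars = counts[("truck", "car")]    # trucks misidentified as cars
trucks_as_trucks = counts[("truck", "truck")]

# Share of real cars labeled "car", and share of real trucks labeled "car".
car_hit_rate = cars_as_cars / (cars_as_cars + cars_as_trucks)
truck_as_car_rate = trucks_as_cars / (trucks_as_cars + trucks_as_trucks)
```

Arranging the four counts in a two-by-two grid of actual versus predicted class gives a compact view of both rates at once.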

Question # 15

A data scientist is presenting the recommendations from a months-long modeling and experiment process to the company’s Chief Executive Officer. Which of the following is the best set of artifacts to include in the presentation?

A. Methods, data overview, results, recommendations, and charts

B. Results, recommendations, justifications, and clear charts

C. Recommendation, charts, justifications, code reviews, and results

D. Methodology, code snippets, findings, data tables, and p-values

Question # 16

Under perfect conditions, E. coli bacteria would cover the entire earth in a matter of days. Which of the following types of models is the best for explaining this type of growth?

A. Linear

B. Logarithmic

C. Polynomial

D. Exponential
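
Growth by repeated doubling can be sketched in a few lines. The example below assumes a 20-minute doubling time, a commonly cited figure for E. coli under ideal laboratory conditions that is used here only as an assumption; the function name is hypothetical.

```python
def population(initial, doubling_minutes, elapsed_minutes):
    """Population after unconstrained growth with a fixed doubling time."""
    return initial * 2 ** (elapsed_minutes / doubling_minutes)


DOUBLING_MINUTES = 20  # assumed doubling time under ideal conditions

after_one_hour = population(1, DOUBLING_MINUTES, 60)       # 3 doublings
after_one_day = population(1, DOUBLING_MINUTES, 24 * 60)   # 72 doublings
```

A single cell becomes 8 cells after one hour but roughly 4.7 x 10^21 cells after one day, because each doubling multiplies the whole population rather than adding a fixed amount.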

Question # 17

The most likely concern with a one-feature machine-learning model is high error due to:

A. bias

B. dimensionality

C. variance

D. probability

Question # 18

A company created a very popular collectible card set. Collectors attempt to collect the entire set, but the availability of each card varies, because some cards have higher production volumes than others. The set contains a total of 12 cards. The attributes of the cards are shown.

The data scientist is tasked with designing an initial model iteration to predict whether the animal on the card lives in the sea or on land, given the card's features: Wrapper color, Wrapper shape, and Animal.

Which of the following is the best way to accomplish this task?

A. ARIMA

B. Linear regression

C. Association rules

D. Decision trees

Question # 19

A data scientist would like to model a complex phenomenon using a large data set composed of categorical, discrete, and continuous variables. After completing exploratory data analysis, the data scientist is reasonably certain that no linear relationship exists between the predictors and the target. Although the phenomenon is complex, the data scientist still wants to maintain the highest possible degree of interpretability in the final model. Which of the following algorithms best meets this objective?

A. Artificial neural network

B. Decision tree

C. Multiple linear regression

D. Random forest

Question # 20

Which of the following describes the appropriate use case for PCA?

A. Dimensionality reduction

B. Classification

C. Regression

D. Recommendation
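
To make PCA concrete, the sketch below finds the first principal component of a small two-feature data set using the closed-form eigenvalue of a 2x2 covariance matrix, then projects each observation onto it. The data points and the function name are invented for the example; real work would normally rely on a numerical library.

```python
import math
from statistics import fmean


def first_principal_component(points):
    """Return the unit direction of greatest variance and the 1-D projections."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x, mean_y = fmean(xs), fmean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    n = len(points)

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(d * d for d in dx) / (n - 1)
    syy = sum(d * d for d in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    largest = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)

    # Matching eigenvector; fall back to an axis when features are uncorrelated.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Project the centered points onto the principal direction.
    projections = [a * direction[0] + b * direction[1] for a, b in zip(dx, dy)]
    return direction, projections


# Hypothetical 2-D observations with strongly correlated features.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, reduced = first_principal_component(samples)  # 2 features -> 1 feature
```

Each observation in `reduced` is a single number that retains most of the variation originally spread across two correlated features.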

Question # 21

A data scientist is designing a real-time machine-learning model that classifies a user based on initial behavior. The run times of these models are provided in the following table:

Which of the following models should the data scientist recommend for deployment?

A. XGBoost

B. Random forest

C. Decision trees

D. Artificial neural network

Question # 22

A data scientist receives an update on a business case about a machine that has thousands of error codes. The data scientist creates the following summary statistics profile while reviewing the logs for each machine:

| Statistic | Value |
| --- | --- |
| Number of machines observed | 3,000,000 |
| Number of unique error codes observed | 19,000 |
| Median number of unique codes per machine | 7 |
| Median number of error transactions | 45 |

Which of the following is the most likely concern with respect to data design for model ingestion?

A. Sparse matrix

B. Granularity misalignment

C. Insufficient features

D. Multivariate outliers
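
A quick back-of-the-envelope calculation with the figures from the summary profile shows what a machine-by-error-code table would look like. The sketch assumes one column per unique error code and treats the median of 7 codes per machine as a typical value, which is a rough approximation rather than an exact count.

```python
# Figures taken from the summary statistics profile.
machines = 3_000_000
unique_codes = 19_000
median_codes_per_machine = 7

# Assumption: one row per machine and one column per error code.
total_cells = machines * unique_codes

# Rough estimate: treat the median as the typical number of codes per machine.
nonzero_cells = machines * median_codes_per_machine

density = nonzero_cells / total_cells  # share of cells holding a value
```

Under these assumptions the table has 57 billion cells, of which only about 0.04% would be populated.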

Question # 23

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

(Screenshot shows survey rankings for just two professions and a few locations, all voting for "Tigers")

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

A. Interpolated data

B. Extrapolated data

C. In-sample data

D. Out-of-sample data

Question # 24

A model's results show increasing explanatory value as additional independent variables are added to the model. Which of the following is the most appropriate statistic?

A. Adjusted R²

B. p value

C. χ²

D. R²

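
The relationship between R² and adjusted R² can be illustrated with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The sample size and R² values below are invented for the example.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical models fitted on the same 50 observations.
small_model = adjusted_r2(0.80, n_samples=50, n_predictors=3)
large_model = adjusted_r2(0.81, n_samples=50, n_predictors=10)
```

In this made-up case the larger model has the higher plain R² (0.81 versus 0.80) but the lower adjusted value, because the small gain in fit does not offset the seven extra predictors.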

Question # 25

A data scientist is working with a data set that covers a two-year period for a large number of machines. The data set contains:

    Machine system ID numbers

    Sensor measurement values

    Daily timestamps for each machine

The data scientist needs to plot the total measurements from all the machines over the entire time period. Which of the following is the best way to present this data?

A. Scatter plot

B. Line plot

C. Histogram

D. Box-and-whisker plot
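
Before plotting, the per-machine readings need to be combined into one total per day. The sketch below shows that aggregation step with a few invented rows; the tuple layout (machine ID, date, value) is an assumption about how the data set might be stored.

```python
from collections import defaultdict

# Hypothetical rows: (machine system ID, daily timestamp, sensor measurement).
rows = [
    ("m1", "2023-01-01", 10.0),
    ("m2", "2023-01-01", 12.5),
    ("m1", "2023-01-02", 11.0),
    ("m2", "2023-01-02", 13.0),
]

# Sum the measurements from all machines for each day.
totals = defaultdict(float)
for machine_id, day, value in rows:
    totals[day] += value

# ISO-formatted dates sort chronologically as plain strings.
series = sorted(totals.items())
```

The resulting `series` is an ordered list of (date, total) pairs, ready to be drawn with time on the horizontal axis.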
