DSA-C02 SnowPro Advanced: Data Scientist Certification Exam sample Question + Exam 2025 Practice Exam Dumps

Question # 4

What can a Snowflake data scientist do in the Snowflake Marketplace as a provider?

Publish listings for free-to-use datasets to generate interest and new opportunities among the Snowflake customer base.

Publish listings for datasets that can be customized for the consumer.

Share live datasets securely and in real time without creating copies of the data or imposing data integration tasks on the consumer.

Eliminate the costs of building and maintaining APIs and data pipelines to deliver data to customers.

Question # 5

Which command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task?

RUN TASK

CALL TASK

EXECUTE TASK

RUN ROOT TASK

Question # 6

What is the formula for measuring skewness in a dataset?

MEAN - MEDIAN

MODE - MEDIAN

(3(MEAN - MEDIAN))/ STANDARD DEVIATION

(MEAN - MODE)/ STANDARD DEVIATION
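
The two options above that divide by the standard deviation correspond to Pearson's first (mode) and second (median) skewness coefficients. The following is a minimal sketch using Python's standard statistics module. The sample values are invented for illustration, and the use of the sample standard deviation (stdev) is an assumption, since the question does not say which variant is meant.

```python
import statistics

def pearson_mode_skewness(values):
    """Pearson's first coefficient: (mean - mode) / standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)

def pearson_median_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

# Hypothetical right-skewed sample: one large value pulls the mean above the median.
sample = [1, 2, 2, 3, 4, 5, 25]

first = pearson_mode_skewness(sample)
second = pearson_median_skewness(sample)
print(f"mode-based skewness:   {first:.3f}")
print(f"median-based skewness: {second:.3f}")
```

Both coefficients are positive for this sample, which matches the long right tail.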

Question # 7

Which of the following are key actions in the data collection phase of machine learning?

Label

Ingest and Aggregate

Probability

Measure

Answer:

A, B

Explanation:

The key actions in the data collection phase include:

Label: Labeled data is raw data that has been processed by adding one or more meaningful tags so that a model can learn from it. If such labels are missing, adding them (manually or automatically) takes some work.

Ingest and Aggregate: Incorporating and combining data from many data sources is part of data collection in AI.

Data collection

Collecting data for training the ML model is the basic step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. The following are some of the problems that can arise in data collection:

Inaccurate data. The collected data could be unrelated to the problem statement.

Missing data. Parts of the data could be missing. That could take the form of empty values in columns or missing images for some class of prediction.

Data imbalance. Some classes or categories in the data may have a disproportionately high or low number of corresponding samples. As a result, they risk being under-represented in the model.

Data bias. Depending on how the data, subjects and labels themselves are chosen, the model could propagate inherent biases on gender, politics, age or region, for example. Data bias is difficult to detect and remove.
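
Two of the problems above, missing data and data imbalance, can be spotted with a simple count. The following is a minimal sketch using only the Python standard library. The function name, the row layout, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def summarize_collection_issues(rows, label_key):
    """Count missing values per column and samples per class (illustrative helper)."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
    class_counts = Counter(
        row[label_key] for row in rows if row[label_key] is not None
    )
    return dict(missing), dict(class_counts)

# Hypothetical collected samples: one empty text value and a 3-to-1 class imbalance.
rows = [
    {"text": "great product", "label": "positive"},
    {"text": None, "label": "positive"},
    {"text": "works fine", "label": "positive"},
    {"text": "broke quickly", "label": "negative"},
]

missing, class_counts = summarize_collection_issues(rows, "label")
print("missing values per column:", missing)
print("samples per class:", class_counts)
```

Inaccurate data and data bias usually cannot be detected by counting alone and need review against the problem statement.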

Several techniques can be applied to address those problems:

Pre-cleaned, freely available datasets. If the problem statement (for example, image classification, object recognition) aligns with a clean, pre-existing, properly formulated dataset, then take advantage of existing, open-source expertise.

Web crawling and scraping. Automated tools, bots and headless browsers can crawl and scrape websites for data.

Private data. ML engineers can create their own data. This is helpful when the amount of data required to train the model is small and the problem statement is too specific to generalize over an open-source dataset.

Custom data. Agencies can create or crowdsource the data for a fee.

Question # 8

Which statement about providers of a Direct Share is incorrect?

A data provider is any Snowflake account that creates shares and makes them available to other Snowflake accounts to consume.

As a data provider, you share a database with one or more Snowflake accounts.

You can create as many shares as you want, and add as many accounts to a share as you want.

If you want to provide a share to many accounts, you can do the same via Direct Share.

Question # 9

Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the expression g = df.groupby(df.index.str.len()) do?

Groups df based on index values

Groups df based on length of each index value

Groups df based on index strings

Data frames cannot be grouped by index values. Hence it results in Error.
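
The expression in the question can be tried directly in pandas. The following is a minimal sketch. The column A and its values are assumptions, since the question only specifies the index.

```python
import pandas as pd

index = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
# Column A is hypothetical; the question does not specify any columns.
df = pd.DataFrame({'A': range(10)}, index=index)

# df.index.str.len() yields the character length of each index label,
# so rows whose labels have the same length fall into the same group.
g = df.groupby(df.index.str.len())

group_sizes = g.size().to_dict()
print(group_sizes)
```

The group keys are the label lengths (2, 4 and 5 for this index), not the labels themselves.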

Question # 10

Which Python method can a data scientist use to remove duplicates?

remove_duplicates()

duplicates()

drop_duplicates()

clean_duplicates()
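
Of the names listed above, drop_duplicates() is the one that exists on a pandas DataFrame. The following is a minimal sketch of its use. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: the rows at positions 1 and 2 are identical.
df = pd.DataFrame({'id': [1, 2, 2, 3], 'score': [0.5, 0.7, 0.7, 0.9]})

# Drop rows that are identical in every column, keeping the first occurrence.
deduped = df.drop_duplicates()

# Compare only the 'id' column, keeping the last occurrence of each id.
by_id = df.drop_duplicates(subset=['id'], keep='last')

print(deduped)
print(by_id)
```

drop_duplicates() returns a new DataFrame and leaves df unchanged unless inplace=True is passed.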

Question # 11

Select the correct statements regarding normalization.

Normalization uses the minimum and maximum values to scale features.

Normalization uses the mean and standard deviation to scale features.

Scikit-Learn provides a transformer RecommendedScaler for Normalization.

Normalization is affected by outliers.
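
The difference between min-max normalization and standardization can be seen on a small array. The following is a minimal sketch using NumPy only. The helper names and the sample values are assumptions made for illustration. In scikit-learn the corresponding transformers are MinMaxScaler and StandardScaler.

```python
import numpy as np

def min_max_scale(x):
    """Min-max normalization: rescale values to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical feature with one outlier (100.0).
values = [1.0, 2.0, 3.0, 4.0, 100.0]

scaled = min_max_scale(values)
z = standardize(values)
print(scaled)
print(z)
```

Because the outlier defines the maximum, the other four min-max scaled values are squeezed into a narrow band near zero.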

Question # 12

Which metric is not used for evaluating classification models?

Recall

Accuracy

Mean absolute error

Precision
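
The metrics listed above can be computed by hand to see what each one measures. The following is a minimal sketch in plain Python. The function names and all sample values are invented for illustration. Accuracy, precision and recall compare predicted class labels, while mean absolute error averages the distance between numeric predictions and targets.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between numeric targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary classification labels and predictions.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(labels, predicted)

# Hypothetical regression targets and predictions.
mae = mean_absolute_error([200.0, 150.0, 320.0], [210.0, 140.0, 300.0])

print(metrics)
print(mae)
```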

Question # 13

Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method shown in the code below do?

g = df.groupby(df.index.str.len())

g.aggregate({'A':len, 'B':np.sum})

Computes Sum of column A values

Computes length of column A

Computes length of column A and Sum of Column B values of each group

Computes length of column A and Sum of Column B values
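
The aggregation in the question can be run on a small frame to see its shape. The following is a minimal sketch. The values of columns A and B are assumptions, and the string 'sum' is used in place of np.sum, which requests the same aggregation in pandas.

```python
import pandas as pd

index = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
# Hypothetical column values; the question only fixes the index.
df = pd.DataFrame({'A': range(10), 'B': [10] * 10}, index=index)

g = df.groupby(df.index.str.len())

# One output row per group: the number of A values and the sum of B values in that group.
result = g.aggregate({'A': len, 'B': 'sum'})
print(result)
```

The result has one row per index-label length (2, 4 and 5), so both aggregations are computed within each group rather than over the whole frame.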

Question # 14

Which tools help data scientists manage the ML lifecycle and model versioning?

MLFlow

Pachyderm

Albert

CRUX

Answer:

A, B

Explanation:

Model versioning involves tracking the changes made to an ML model that has previously been built. Put differently, it is the process of recording changes to the configuration of an ML model. From another perspective, we can see model versioning as a feature that helps machine learning engineers, data scientists, and related personnel create and keep multiple versions of the same model.

Think of it as a way of taking notes of the changes you make to the model through tweaking hyperparameters, retraining the model with more data, and so on.

In model versioning, a number of things need to be versioned, to help us keep track of important changes. I'll list and explain them below:

Implementation code: From the early days of model building to optimization stages, code or in this case source code of the model plays an important role. This code experiences significant changes during optimization stages which can easily be lost if not tracked properly. Because of this, code is one of the things that are taken into consideration during the model versioning process.

Data: In some cases, training data improves significantly from its initial state during model optimization phases. This can be the result of engineering new features from existing ones to train the model on. There is also metadata (data about your training data and model) to consider versioning. Metadata can change many times without the training data actually changing. We need to be able to track these changes through versioning.

Model: The model is a product of the two previous entities and, as stated in their explanations, an ML model changes at different points of the optimization phases through hyperparameter settings, model artifacts and learning coefficients. Versioning helps keep a record of the different versions of a machine learning model.

MLFlow & Pachyderm are the tools used to manage ML lifecycle & Model versioning.
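
The idea of versioning code, data and model separately can be illustrated with content fingerprints. The following is a toy sketch using only the Python standard library. It is not how MLFlow or Pachyderm work internally, and every name and value in it is hypothetical.

```python
import hashlib
import json

def fingerprint(obj):
    """Short, stable content hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def make_version_record(code_text, training_rows, model_params):
    """Record one fingerprint each for the code, the data and the model."""
    return {
        "code": fingerprint(code_text),
        "data": fingerprint(training_rows),
        "model": fingerprint(model_params),
    }

# Hypothetical scenario: same code and data, but the model was retrained.
v1 = make_version_record("learning_rate = 0.1", [[1, 2], [3, 4]], {"w": 0.5})
v2 = make_version_record("learning_rate = 0.1", [[1, 2], [3, 4]], {"w": 0.7})

changed = [name for name in v1 if v1[name] != v2[name]]
print(changed)
```

Comparing the two records shows which of the three versioned entities changed between runs.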

Question # 15

Select the correct mappings:

I. Weights (W) or coefficients of independent variables in the linear regression model --> Model Parameter

II. K in the K-Nearest Neighbour algorithm --> Model Hyperparameter

III. Learning rate for training a neural network --> Model Hyperparameter

IV. Batch Size --> Model Parameter

I,II

I,II,III

III,IV

II,III,IV

Answer:

Explanation:

Hyperparameters in Machine learning are those parameters that are explicitly defined by the user to control the learning process. These hyperparameters are used to improve the learning of the model, and their values are set before starting the learning process of the model.

What are hyperparameters?

In Machine Learning/Deep Learning, a model is represented by its parameters. In contrast, a training process involves selecting the best/optimal hyperparameters that are used by learning algorithms to provide the best result. So, what are these hyperparameters? The answer is, "Hyperparameters are defined as the parameters that are explicitly defined by the user to control the learning process."

Here the prefix "hyper" suggests that the parameters are top-level parameters that are used in controlling the learning process. The value of a hyperparameter is selected and set by the machine learning engineer before the learning algorithm begins training the model. Hence, these are external to the model, and their values are not learned from the data during the training process.

Some examples of Hyperparameters in Machine Learning

· The k in kNN or K-Nearest Neighbour algorithm

· Learning rate for training a neural network

· Train-test split ratio

· Batch Size

· Number of Epochs

· Branches in Decision Tree

· Number of clusters in Clustering Algorithm

Model Parameters:

Model parameters are configuration variables that are internal to the model and that the model learns on its own. Examples include the weights (W) or coefficients of the independent variables in a linear regression model or an SVM, the weights and biases of a neural network, and the cluster centroids in clustering. Some key points for model parameters are as follows:

· They are used by the model for making predictions.

· They are learned by the model from the data itself.

· They are usually not set manually.

· They are part of the model and key to a machine learning algorithm.

Model Hyperparameters:

Hyperparameters are those parameters that are explicitly defined by the user to control the learning process. Some key points for model hyperparameters are as follows:

These are usually defined manually by the machine learning engineer.

One cannot know the exact best value for hyperparameters for the given problem. The best value can be determined either by the rule of thumb or by trial and error.

Some examples of Hyperparameters are the learning rate for training a neural network, K in the KNN algorithm.
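
The distinction can be seen in a few lines of training code. The following is a minimal sketch of linear regression fitted by full-batch gradient descent in plain Python. The function name, the data, and the chosen learning rate and epoch count are assumptions made for illustration. The learning rate and number of epochs are hyperparameters passed in by the user, while the weight w and bias b are model parameters learned from the data.

```python
def train_linear_regression(xs, ys, learning_rate, epochs):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # model parameters: start at zero, learned from the data
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Hyperparameters: chosen by the user before training starts.
w, b = train_linear_regression(xs, ys, learning_rate=0.05, epochs=2000)
print(round(w, 4), round(b, 4))
```

Changing learning_rate or epochs changes how training proceeds, but the caller never sets w or b directly.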

Question # 16

Which object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data in data science pipelines?

Task

Dynamic tables

Stream
