You previously trained a model using a training dataset. You want to detect any data drift in the new data collected since the model was trained.
What should you do?
A. Create a new dataset using the new data and a timestamp column and create a data drift monitor that uses the training dataset as a baseline and the new dataset as a target.
B. Create a new version of the dataset using only the new data and retrain the model.
C. Add the new data to the existing dataset and enable Application Insights for the service where the model is deployed.
D. Retrain the model on the existing training dataset after correcting data outliers; no new data needs to be introduced.
Data providers add Snowflake objects (databases, schemas, tables, secure views, etc.) to a share using which of the following options? Choose 2.
A. Grant privileges on objects to a share via Account role.
B. Grant privileges on objects directly to a share.
C. Grant privileges on objects to a share via a database role.
D. Grant privileges on objects to a share via a third-party role.
Which one of the following is not a key component when designing external functions within Snowflake?
A. Remote Service
B. API Integration
C. UDF Service
D. Proxy Service
Which command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task?
A. RUN TASK
B. CALL TASK
C. EXECUTE TASK
D. RUN ROOT TASK
Performance metrics are a part of every machine learning pipeline. Which of the following is not a performance metric used in machine learning?
A. R² (R-Squared)
B. Root Mean Squared Error (RMSE)
C. AU-ROC
D. AUM
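As a hedged illustration of the genuine metrics among these options (the sample values below are made up), each of R², RMSE, and AUC-ROC can be computed with scikit-learn; "AUM" has no counterpart there:

```python
# Sketch: computing R-Squared, RMSE, and AUC-ROC with scikit-learn.
# The y_true/y_pred/labels/scores arrays are illustrative, not from the question.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, roc_auc_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.7, 6.8])

r2 = r2_score(y_true, y_pred)                       # R-Squared
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # Root Mean Squared Error

# AUC-ROC is a classification metric: binary labels plus predicted scores.
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc_score(labels, scores)
```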
Which is the visual depiction of data through the use of graphs, plots, and informational graphics?
A. Data Interpretation
B. Data Virtualization
C. Data visualization
D. Data Mining
Mark the incorrect statement regarding the MIN / MAX functions.
A. NULL values are skipped unless all the records are NULL
B. NULL values are ignored unless all the records are NULL, in which case a NULL value is returned
C. The data type of the returned value is the same as the data type of the input values
D. For compatibility with other systems, the DISTINCT keyword can be specified as an argument for MIN or MAX, but it does not have any effect
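The NULL-skipping behavior described in these options can be illustrated in Python: pandas min/max treat NaN the same way Snowflake's MIN/MAX treat NULL (this is an analogy for intuition, not Snowflake itself):

```python
# Sketch: NaN values are skipped by min/max unless every value is NaN,
# in which case NaN is returned -- mirroring Snowflake's MIN/MAX NULL handling.
import pandas as pd

s = pd.Series([4.0, None, 1.0, 9.0])
all_null = pd.Series([None, None], dtype="float64")

print(s.min())         # NaN skipped, smallest real value returned
print(s.max())
print(all_null.min())  # every value is NaN, so NaN is returned
```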
Which command is used to install Jupyter Notebook?
A. pip install jupyter
B. pip install notebook
C. pip install jupyter-notebook
D. pip install nbconvert
Select the correct statements regarding normalization. Choose 2.
A. The normalization technique uses the minimum and maximum values to scale the data.
B. The normalization technique uses the mean and standard deviation to scale the data.
C. Scikit-Learn provides a transformer called RecommendedScaler for normalization.
D. Normalization is affected by outliers.
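A hedged sketch contrasting the two scaling techniques in these options (the array values are made up): min-max normalization in scikit-learn is done with MinMaxScaler, standardization with StandardScaler.

```python
# Sketch: min-max normalization vs. standardization in scikit-learn.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

normalized = MinMaxScaler().fit_transform(X)     # scaled to [0, 1] via min/max
standardized = StandardScaler().fit_transform(X) # zero mean, unit variance

# The outlier squeezes the normalized values of 1..4 toward 0,
# illustrating that min-max normalization is sensitive to outliers.
print(normalized.ravel())
```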
Which of the following additional metadata columns does a Stream contain that can be used to create efficient data science pipelines by transforming only the new/modified data? Choose 3.
A. METADATA$ACTION
B. METADATA$FILE_ID
C. METADATA$ISUPDATE
D. METADATA$DELETE
E. METADATA$ROW_ID
All Snowpark ML modeling and preprocessing classes are in the ________ namespace.
A. snowpark.ml.modeling
B. snowflake.sklearn.modeling
C. snowflake.scikit.modeling
D. snowflake.ml.modeling
Consider a data frame df with 10 rows and index ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method shown in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A': len, 'B': np.sum})
A. Computes Sum of column A values
B. Computes length of column A
C. Computes length of column A and Sum of Column B values of each group
D. Computes length of column A and Sum of Column B values
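To verify the behavior empirically, here is a hedged, self-contained reconstruction of the question's code (the column values for A and B are made up, since the question does not specify them):

```python
# Sketch: group rows by the length of each index string, then compute
# len of column A and sum of column B within each group.
import numpy as np
import pandas as pd

idx = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
df = pd.DataFrame({'A': range(10), 'B': range(10, 20)}, index=idx)

g = df.groupby(df.index.str.len())
result = g.aggregate({'A': len, 'B': np.sum})
print(result)  # one row per index-string length, with per-group len(A) and sum(B)
```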
Which of the following cross validation versions may not be suitable for very large datasets with hundreds of thousands of samples?
A. k-fold cross-validation
B. Leave-one-out cross-validation
C. Holdout method
D. All of the above
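The scaling difference can be seen directly in scikit-learn: leave-one-out produces one fold per sample (so n model fits on a dataset of n rows), while k-fold produces only k folds regardless of n. A small sketch with made-up data:

```python
# Sketch: why leave-one-out cross-validation scales poorly on large datasets.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(20).reshape(-1, 1)  # 20 samples

n_loo = LeaveOneOut().get_n_splits(X)        # one split (and model fit) per sample
n_kfold = KFold(n_splits=5).get_n_splits(X)  # 5 splits regardless of sample count
print(n_loo, n_kfold)
```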
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the expression g = df.groupby(df.index.str.len()) do?
A. Groups df based on index values
B. Groups df based on length of each index value
C. Groups df based on index strings
D. Data frames cannot be grouped by index values. Hence it results in Error.
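A hedged illustration of the expression in this question (column A's values are made up): grouping is by the *length* of each index string, not by the index values themselves.

```python
# Sketch: df.groupby(df.index.str.len()) groups rows by index-string length.
import pandas as pd

idx = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
df = pd.DataFrame({'A': range(10)}, index=idx)

g = df.groupby(df.index.str.len())
print(sorted(g.groups.keys()))  # group keys are the index lengths
print(g.size())                 # how many index labels have each length
```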
How do you handle missing or corrupted data in a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
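The three strategies in these options can each be sketched in pandas (the toy DataFrame below is illustrative only):

```python
# Sketch: three common ways to handle missing data in pandas.
import pandas as pd

df = pd.DataFrame({
    'age':  [25.0, None, 40.0, 33.0],
    'city': ['NY', 'LA', None, 'SF'],
})

dropped = df.dropna()                         # A: drop rows with missing values
imputed = df['age'].fillna(df['age'].mean())  # B: replace with the mean
labeled = df['city'].fillna('Unknown')        # C: assign a unique category
```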