What is a core deliverable at the end of the analytic project?
A. An implemented database design
B. A whitepaper describing the project and the implementation
C. A presentation for project sponsors
D. The training materials
Refer to the exhibit.
In the exhibit, what is the chart's primary flaw for effective visualization?
A. The use of 3 dimensions.
B. The slanting of axis labels.
C. The location of the legend.
D. The order of the columns.
A disk drive manufacturer has a defect rate of less than 1.0% with 98% confidence. A quality assurance team samples 1000 disk drives and finds 14 defective units. Which action should the team recommend?
A. The manufacturing process should be inspected for problems.
B. A larger sample size should be taken to determine if the plant is functioning properly.
C. A smaller sample size should be taken to determine if the plant is functioning properly.
D. The manufacturing process is functioning properly and no further action is required.
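One way to reason about a question like this is to ask how surprising 14 defects in 1000 drives would be if the true defect rate were exactly 1.0%. The sketch below computes that one-sided binomial tail probability with the standard library only. The function name `binomial_upper_tail` and the choice of an exact binomial test are assumptions made for illustration; the question itself does not prescribe a test.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    # Sum the lower tail P(X <= k - 1) and subtract it from 1.
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower

# Values from the question: 1000 drives sampled, 14 defective, claimed rate 1.0%.
p_value = binomial_upper_tail(n=1000, k=14, p=0.01)
alpha = 1 - 0.98  # significance level implied by 98% confidence

print(f"P(X >= 14 | p = 0.01) = {p_value:.4f}")
if p_value < alpha:
    print("The sample is evidence against the claimed rate.")
else:
    print("The sample is consistent with the claimed rate.")
```

Comparing the tail probability with the 2% significance level is only one possible framing; a confidence interval on the observed rate would be another.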
Refer to the exhibit.
In the exhibit, the x-axis represents the derived probability of a borrower defaulting on a loan. Also in the exhibit, the pink represents borrowers that are known to have not defaulted on their loan, and the blue represents borrowers that are known to have defaulted on their loan.
Which analytical method could produce the probabilities needed to build this exhibit?
A. Logistic Regression
B. Linear Regression
C. Discriminant Analysis
D. Association Rules
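For context on what a "derived probability" means here, the sketch below shows how a logistic model turns a linear score into a value between 0 and 1. The coefficient values and the `debt_ratio` feature are hypothetical, chosen only for illustration; they do not come from the exhibit.

```python
from math import exp

def default_probability(score: float) -> float:
    """Map a linear score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-score))

# Hypothetical coefficients, not fitted to any real data.
intercept = -2.0
weight_debt_ratio = 3.0

for debt_ratio in (0.1, 0.5, 0.9):
    score = intercept + weight_debt_ratio * debt_ratio
    print(f"debt_ratio={debt_ratio:.1f} -> P(default)={default_probability(score):.3f}")
```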
Before building an ARMA model, how can you determine if the time series is weakly stationary?
A. Constant variance around a constant mean is apparent
B. Mean of the series is close to 0
C. Series is normally distributed
D. No trend component is apparent
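A rough, informal way to look at this is to compare the mean and variance of the series across consecutive windows. The sketch below does that for two hypothetical series; the helper name `window_stats` and the window size are assumptions, and a visual or windowed check like this is not a formal stationarity test.

```python
from statistics import mean, pvariance

def window_stats(series, window):
    """Return (mean, variance) for each consecutive non-overlapping window."""
    return [
        (mean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]

# Hypothetical series: one oscillates around a fixed level, one drifts upward.
stable = [10 + (1 if i % 2 else -1) for i in range(40)]
trending = [10 + 0.5 * i + (1 if i % 2 else -1) for i in range(40)]

for name, series in (("stable", stable), ("trending", trending)):
    stats = window_stats(series, 10)
    print(name, [(round(m, 2), round(v, 2)) for m, v in stats])
```

Window statistics that stay roughly flat suggest a stable level and spread, while a mean that shifts from window to window points to a trend.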
Which statement describes a true property of the Logistic Regression method?
A. It is robust with redundant variables and correlated variables.
B. It handles missing values well.
C. It works well with discrete variables that have many distinct values.
D. It works well with variables that affect the outcome in a discontinuous way.
On analyzing your time series data, you suspect that the data, represented as y_1, y_2, y_3, ..., y_{n-1}, y_n, may have a trend component that is quadratic in nature. Which pattern of data will indicate that the trend in the time series data is quadratic in nature?
A. (y_3 - y_2) - (y_2 - y_1) = ... = (y_n - y_{n-1}) - (y_{n-1} - y_{n-2})
B. (y_2 - y_1) = (y_3 - y_2) = ... = (y_n - y_{n-1})
C. ((y_2 - y_1) / y_1) * 100% = ... = ((y_n - y_{n-1}) / y_{n-1}) * 100%
D. (y_4 - y_2) - (y_3 - y_1) = ... = (y_n - y_{n-2}) - (y_{n-1} - y_{n-3})
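The differencing patterns in the options can be explored numerically. The sketch below builds a hypothetical quadratic series and prints its first and second differences; the coefficients and the helper name `differences` are assumptions for illustration.

```python
def differences(values):
    """Return the successive differences y[i+1] - y[i]."""
    return [b - a for a, b in zip(values, values[1:])]

# Hypothetical quadratic trend y_t = 2t^2 + 3t + 5 for t = 1..8.
series = [2 * t * t + 3 * t + 5 for t in range(1, 9)]

first = differences(series)   # differences of consecutive observations
second = differences(first)   # differences of those differences

print("series:            ", series)
print("first differences: ", first)
print("second differences:", second)
```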
In addition to less data movement and the ability to use larger datasets in calculations, what is a benefit of analytical calculations in a database?
A. quicker time to insight
B. more efficient handling of categorical values
C. improved connections between disparate data sources
D. full use of data aggregation functionality
You are using MADlib for Linear Regression analysis. Which value does the statement return? SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;
A. Goodness of fit
B. Coefficients
C. Standard error
D. P-value
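As background on the `r2` field, the sketch below computes the coefficient of determination for a simple one-variable least-squares fit in plain Python. It is an illustration of the statistic only and makes no claim about how MADlib computes it internally; the data values are hypothetical and are not taken from the `zeta1` table.

```python
def r_squared(x, y):
    """Fit y = a + b*x by least squares and return the coefficient of determination."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data for illustration only.
indepvar = [1.0, 2.0, 3.0, 4.0, 5.0]
depvar = [2.1, 3.9, 6.2, 7.8, 10.1]

print(f"R-squared = {r_squared(indepvar, depvar):.4f}")
```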
Based on the exhibit, the table shows the values for the input Boolean attributes A, B, and C. In addition, the exhibit shows the values for the output attribute "class".
Which decision tree is valid for the data?
A. Tree A
B. Tree B
C. Tree C
D. Tree D
Which word or phrase completes the statement?
Theater actor is to "Artistic and Expressive" as Data Scientist is to ________________
A. "Communicative and Collaborative"
B. "Introverted and Technical"
C. "Logical and Steadfast"
D. "Independent and Intelligent"
Which word or phrase completes the statement?
Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to ______________ .
A. Optimization and Predictive Modeling
B. Alerts and Queries
C. Structured Data and Data Sources
D. Sales and profit reporting
In MADlib, what does MAD stand for?
A. Magnetic, Agile, Deep
B. Machine Learning, Algorithms for Databases
C. Mathematical Algorithms for Databases
D. Modular, Accurate, Dependable
The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?
A. Sqoop
B. Pig
C. Chukwa
D. Scribe
Refer to the exhibit.
You are asked to write a report on how specific variables impact your client's sales using a data set provided to you by the client. The data includes 15 variables that the client views as directly related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent variables of A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?
A. Variables A, B, and C are significantly impacting sales, but are not effectively estimating sales
B. Variables A, B, and C are significantly impacting sales and are effectively estimating sales
C. Due to the R2 of 0.10, the model is not valid; the linear regression should be re-run with all 15 variables forced into the model to increase the R2
D. Due to the R2 of 0.10, the model is not valid; a different analytical model should be attempted