What is a characteristic of stop words?
A. Used in term frequency analysis
B. Include words such as "a", "an", and "the"
C. Meaningful words requiring a parser to stop and examine them
D. Don't occur often in text
What is a property of a good color model for ordinal data?
A. Uses a rainbow-like color map for distinction of categories
B. Uses a rainbow-like color map for ease of display and printing
C. Uses perceptually ordinal colors with just-noticeable increments
D. Uses perceptually ordinal colors with linear, perceptual increments
Why would a company decide to use HBase to replace an existing relational database?
A. It is required for performing ad-hoc queries.
B. Varying formats of input data require columns to be added in real time.
C. The company's employees are already fluent in SQL.
D. Existing SQL code will run unchanged on HBase.
Which graph structure would best model the relationship between job seekers and employers?
A. Bipartite
B. Weighted
C. Directed acyclic
D. Ranked
What is a key beneficial characteristic of the Random Forest algorithm?
A. Provides an explanatory model
B. Distinguishes categorical from continuous variables
C. Support for unstructured data
D. Resiliency to complex, non-linear variable interactions
In which step in the visualization lifecycle would you determine how the raw data is stored?
A. Visualization Planning
B. Data Preparation
C. Visualization Building
D. Discovery
What runs more efficiently because of Apache Tez?
A. Pig and Hive
B. Hive and HBase
C. YARN and Spark
D. All MapReduce jobs
What is an intended application of the MapReduce framework?
A. Processing can be broken into smaller pieces
B. Processing a large number of small files
C. Processing in real time is required
D. Processing a small subset of data
Which problem type is best suited for simulation?
A. One with a few non-random input variables
B. One that has a closed-form solution
C. One with numerous non-random input variables
D. One that compares "what-if" scenarios
What is a characteristic of the trigram language model?
A. Based on the second-order Markov process
B. Equivalent to trigram hidden Markov models
C. Uses smoothing to reduce the high dimensionality in text
D. Can be used for part-of-speech tagging
Assuming the node index starts at 1, what is the out-degree of node 3 in the adjacency matrix shown? Refer to the exhibit.
A. 0
B. 1
C. 2
D. 3
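The exhibit is not reproduced here, so the following is only a minimal sketch of how out-degree is read from an adjacency matrix. The matrix `adj` is hypothetical and is not the one from the exhibit. The sketch also assumes the common convention that rows are source nodes and columns are target nodes; under the opposite convention the row and column sums swap roles.

```python
# Hypothetical 4-node directed graph; NOT the matrix from the exhibit.
# Assumed convention: adj[i][j] == 1 means an edge from node i+1 to node j+1.
adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]


def out_degree(matrix, node):
    """Out-degree of a 1-indexed node: the sum of that node's row."""
    return sum(matrix[node - 1])


def in_degree(matrix, node):
    """In-degree of a 1-indexed node: the sum of that node's column."""
    return sum(row[node - 1] for row in matrix)


print(out_degree(adj, 3))  # row 3 is [1, 0, 0, 1]
print(in_degree(adj, 3))   # column 3 is [0, 1, 0, 0]
```

With node indexing starting at 1, node 3 corresponds to list index 2, which is why both helpers subtract 1 before indexing.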
What best describes tokenization?
A. Adding lexical relations to the raw text
B. Converting text into a list of terms
C. Converting text into a list of unique terms
D. Reducing variant forms of tokens to their base forms
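As a rough illustration of the distinction the options draw, the sketch below separates tokenizing text from building a list of unique terms. The regex-based `tokenize` helper and the sample sentence are assumptions made for this example; production tokenizers handle punctuation, contractions, and non-English text far more carefully.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, keeping order and duplicates."""
    # Hypothetical, simplified rule: a token is a run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


tokens = tokenize("The cat sat on the mat.")

# De-duplicating is a separate, later step that yields the vocabulary.
vocabulary = sorted(set(tokens))

print(tokens)
print(vocabulary)
```

Here `tokens` still contains "the" twice, while `vocabulary` contains it once; reducing variant forms such as "sat" to "sit" would be yet another step (stemming or lemmatization) that this sketch does not perform.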
What is an effective use of color in visualization?
A. Use self-explanatory colors so a legend is unnecessary
B. Maximize use of color to make a more lasting impression
C. Use high-contrast colors such as red and blue
D. Minimize use of color except for emphasis
If two of the communities are re-designated to be one community, how does that change the network characteristics? Refer to the exhibit.
A. Neighborhood overlap would increase
B. Network diameter would decrease
C. Modularity would increase
D. Modularity would decrease
After a client submits a job request to the YARN ResourceManager, what happens next?
A. The scheduler allocates a container to run an ApplicationMaster
B. The ResourceManager allocates containers to run map and reduce tasks
C. The ResourceManager requests load data from the NodeManagers
D. The ApplicationManager starts an ApplicationMaster