Find it source: Data Analysis

Data Analysis

Module Description

This course reviews and expands upon core topics in probability and statistics through the study and practice of data analysis.

It is a free software programming language.	R-programming
Which of the following is NOT a goal in data mining?	collecting data
It transforms data into actionable intelligence for business purposes.	Business Intelligence
It has the goal of discovering useful information to support decision making.	data analysis
Which of the following type of text is processed in text analytics?	unstructured
The following are artifacts used in data analysis EXCEPT:	ANOVA
___________ uses artifacts to present data visually.	data visualization
It extracts meaningful numerical indices from information and makes them available to statistical and machine learning algorithms.	Text analytics
It includes identifying groups of data records.	cluster analysis
Which of the following data mining techniques is predictive?	classification
The goal is to transform raw data into understandable business information.	Data mining
What is the process of deriving useful information from text?	Text Analytics
_____________ includes identifying groups of data records.	Cluster analysis
It is used in organization’s strategic and tactical business decision making.	business intelligence
It is a powerful tool that shows the network of data.	Knime
What programming language does Orange use?	Python
The following processes are used in data analysis EXCEPT:	collecting
Which of the following is NOT a method used in data analysis?	Statistics Analytics
It is a method for discovering patterns in large data sets.	Data Mining
It makes complex data more understandable and usable.	data visualization
The two sets A = {2,3} and B = {4,5} are said to be	disjoint
Which of the matrices is singular?	A
What is the focus of data science?	manipulate data efficiently and effectively
What is an organized collection of information and set of information used to manage that operation?	ADT
What is the correct meaning of ADT?	Abstract Data Type
ML means:	Machine Learning
Addition and subtraction of matrices is only possible if the two or more matrices:	Have same sizes
3A + B =	d.
Which is NOT a characteristic feature of data structure?	It contains a fixed structure.
It refers to a data structure that grows and shrinks at execution time.	dynamic
If A = {2,3} and B = {4,5}, which of the following is the Cartesian product of the two sets?	{ (2,4), (2,5), (3,4), (3,5) }
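The set questions on A = {2,3} and B = {4,5} can be checked with a short sketch. It uses only Python's built-in `set` type and `itertools.product`; the variable names are illustrative.

```python
from itertools import product

A = {2, 3}
B = {4, 5}

# Two sets are disjoint when they share no elements.
are_disjoint = A.isdisjoint(B)

# The intersection of two disjoint sets is the empty (null) set.
intersection = A & B

# The Cartesian product A x B pairs every element of A with every element of B.
cartesian = set(product(A, B))
```

Here `cartesian` holds the four ordered pairs with the first element taken from A and the second from B.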
What is the earlier name for data science?	datalogy
Which of the following is the transpose of B?	a
Which of the following is TRUE?	A + B = B+ A
What is a data structure that has a fixed size?	static
Matrix B is	invertible
The intersection of the two sets A = {2,3} and B = {4,5} is a	null set
What is the size of the product of a 5x6 and a 6x8 matrix?	5x8
_______________ is a data structure in which every component has a unique predecessor and successor.	linear
An array is a good example of _________data structure.	Static
Addition and subtraction of matrices is only possible if the two or more matrices:	Have same sizes
Which of the following is TRUE?	A + B = B+ A
The goal is to transform raw data into understandable business information.	Data mining
If A= { x/x is a distinct letter in the word "MATHEMATICS"} AND B={x/x is a distinct letter in the word "STATISTICS"} then their intersection is	{A,C,I,S,T}
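A quick way to check the letter sets, as a sketch: building a Python `set` from a string keeps only the distinct letters. The variable names are illustrative.

```python
# Distinct letters of each word.
A = set("MATHEMATICS")
B = set("STATISTICS")

# Letters that appear in both words.
common = A & B

# Every letter of STATISTICS also appears in MATHEMATICS, but not the reverse,
# so B is a proper subset of A.
b_is_proper_subset = B < A
```

The intersection comes out as {A, C, I, S, T}, and the two sets are not equal because M, H and E appear only in MATHEMATICS.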
It is a process of finding the computational complexity of algorithms.	analysis of algorithms
The function describing the performance of an algorithm is usually an upper bound determined from ______inputs.	worst case
Which of the following data mining techniques is predictive?	classification
In α =babaa β =a^6b^5bb, what is the length of the concatenation of the two strings?	18
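The length of the concatenation can be worked out by expanding the exponents, reading a^6 as six a's and b^5 as five b's. A minimal sketch:

```python
# alpha = babaa has 5 symbols.
alpha = "babaa"

# beta = a^6 b^5 bb expands to 6 + 5 + 2 = 13 symbols.
beta = "a" * 6 + "b" * 5 + "bb"

# The length of a concatenation is the sum of the two lengths.
concatenated = alpha + beta
length = len(concatenated)
```

So the length is 5 + 13 = 18.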
He coined the term “analysis of algorithms”.	Donald Knuth
Another term for an empty set.	null
What is the size of the product of a 5x 6 and a 6x 8 matrices?	5x 8
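The rule behind the matrix product questions: an m x n matrix can multiply an n x p matrix, and the result is m x p. The helper below is a hypothetical sketch that works on shapes only, not on actual matrix entries.

```python
def product_shape(shape_a, shape_b):
    """Return the shape of A @ B given the (rows, cols) shapes of A and B."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    # The inner dimensions must agree for the product to be defined.
    if cols_a != rows_b:
        raise ValueError("columns of A must equal rows of B")
    return (rows_a, cols_b)

shape_1 = product_shape((5, 6), (6, 8))
shape_2 = product_shape((2, 5), (5, 3))
```

The first call gives a 5x8 result and the second a 2x3 result.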
It is a free software programming language.	R-programming
It is popular among financial data analysts.	Knime
It is used to discover patterns in large data sets	Data mining
Earlier name for data science.	datalogy
Which of the following is a predictive data mining technique?	regression
Another term for text analytics.	text mining
Algorithm analysis is an important part of a broader_____________.	computational complexity theory
It is a perfect software for machine learning.	Orange
It is a process of finding the computational complexity of algorithms.	analysis of algorithms
The constant multiplicative factors by which the efficiencies of algorithm implementations are related are called _______ constants.	hidden
The range of the binary relation R = { (3,3), (3,6), (5,5), (5,10), (6,12) } is	{3,5,6,10,12}
It is used in organization’s strategic and tactical business decision making.	business intelligence
Refers to using tools of statistics to present data visually.	data visualization
Null strings are indicated by	λ
Which of the following is the transpose of B?	a.
It offers a way to examine trends from collected data and derive insights from it.	Business Intelligence
The following are software used in data mining EXCEPT	SPSS
If R = { (3,3), (3,6), (5,5), (5,10), (6,12) } is a binary relation, its domain is	{3,5,6}
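The domain of a binary relation is the set of first components of its pairs, and the range is the set of second components. A minimal sketch, reading the last pair as (6,12):

```python
# The relation as a set of ordered pairs.
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

# Domain: every first component.
domain = {x for x, _ in R}

# Range: every second component.
range_ = {y for _, y in R}
```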
It includes identifying groups of data records	cluster analysis
It is a powerful tool that shows the network of data.	Knime
It relates the length of an algorithm's input to the number of storage locations it uses.	space complexity
There are how many data mining techniques?	7
Which of the matrices is singular?	A
A special type of function where the domain is a set of consecutive integers.	sequence
It is used for prototyping in Rapid miner.	studio
An example of an abstract computer.	Turing machine
The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.	data analysis
It makes complex data more understandable and usable.	data visualization
The symbol used to indicate strings with no elements.	λ
What type of text are processed in Text analytics?	unstructured
It is a theoretical classification that estimates and anticipates the increase in running time for algorithms.	run time analysis
The following are large inputs EXCEPT	Big beta notation
It relates the length of an algorithm’s input to the number of steps it takes.	time complexity
Matrix B is	invertible
Given the sets A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"}, the two sets are	not equal; B is a proper subset of A
What programming language is used in Rapid miner?	Java
The product of a 2x5 and 5x3 matrices is a ______matrix	2x3
A matrix that has the same number of rows and columns is called	Square
“All models are wrong but some are useful”	George E. P. Box
What is a great example of data product?	google maps
He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.	Eric Schmidt
A new phenomenon for the explosion of _________data	interaction
He said that “In mathematics the art of proposing a question must be held of higher value than solving it”.	Georg Cantor
The creation of data from varied sources and its quantification into information.	Datafication
How many bytes of data are generated every two days in today's world?	5 exabytes
It refers to well based theories and sound business judgement.	Data Science
PAW means____________.	Predictive Analytics World
Exabyte means ________bytes	billion billion
The developer of FarmVille, a famous game on the internet.	Zynga Incorporated
The person who said that “The future is not google-able”.	William Gibson
It shows a high correlation between the incidence of flu and searches about flu on google.	Google Flu trends
It expands available data enormously since there is so much more text being generated than numbers.	Text mining
The creation of data from varied sources and its quantification into information.	datafication
These are the data skills that a good data scientist needs to cultivate EXCEPT	speaking
Which is Not an interaction data?	data base
The following are the 3V's of big data EXCEPT	veracity
IOT means	Internet of things
The explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated.	interaction
Which of the following is TRUE when a distribution is normal?	Mean=Median=Mode
What is the mean for a standard normal distribution?	0
Empirical rule for a normal distribution that is 3 standard deviations above and below the mean covers ______% of the data.	99.7
Empirical rule for a normal distribution that is 2 standard deviations above and below the mean is ________% of data.	95
A bell shaped curve that is symmetric about a vertical line.	normal distribution
A distribution in which large data sets are displayed.	Grouped frequency distribution
A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?	95
What percent of data will lie within 2 standard deviation of the mean?	95
What range of values 3 SD below and above the mean in a normal distribution if the mean is 10 and standard deviation is 2?	4-16
Lists the percent of data in each distribution.	relative frequency distribution
The normal distribution with a mean of 0 and standard deviation of 1.	Standard
The area of the standard normal curve to the right of z=0.82 is _______.	0.206
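The area to the right of a z-score is one minus the cumulative probability up to that z-score. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed z-table.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

# Area to the right of z = 0.82 is 1 minus the area to its left.
area_right = 1 - standard.cdf(0.82)
```

Rounded to three decimal places this gives 0.206.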
What is the value of the mean if a score of 110 is 3 standard deviation above the mean?	95
A score of 50 lies 2 standard deviations above a mean of 30.What is the value of the standard deviation?	10
A graph used to indicate intervals in a frequency distribution is referred to as a______________.	Histogram
What range of values lie between 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?	71-89
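The range questions all use the same arithmetic: the interval k standard deviations around the mean runs from mean - k·SD to mean + k·SD. The function name below is illustrative.

```python
def k_sigma_interval(mean, sd, k):
    """Return the (low, high) values lying k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)

# Mean 10, standard deviation 2, three standard deviations either side.
low_a, high_a = k_sigma_interval(mean=10, sd=2, k=3)

# Mean 80, standard deviation 3, three standard deviations either side.
low_b, high_b = k_sigma_interval(mean=80, sd=3, k=3)
```

These give 4 to 16 and 71 to 89 respectively.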
A bell-shaped distribution that is symmetric about a vertical line?	normal
A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?	84
By the empirical rule for a normal distribution, ______% of the data lie within 1 standard deviation below and above the mean.	68
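The two rice-price answers can be cross-checked against the exact normal curve. The quiz answers come from the empirical rule, so this sketch only confirms that they agree after rounding. It assumes `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

# Price per kilo: mean 35, standard deviation 4, survey of 100 consumers.
price = NormalDist(mu=35, sigma=4)
consumers = 100

# 27 and 43 are two standard deviations below and above the mean.
between_27_and_43 = consumers * (price.cdf(43) - price.cdf(27))

# 39 is one standard deviation above the mean.
below_39 = consumers * price.cdf(39)
```

Rounded to whole consumers, the counts match the empirical-rule answers of 95 and 84.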
What is the value of the standard deviation in a standard normal distribution?	1
What is the value of the mean if a score of 110 is 3 standard deviation above the mean?	95
Which is NOT a value of r ?	1.02
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.76 lb?	84
The equation of the _______line predicts the value of Y given X.	Regression
What range of values lie between 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?	71-89
Data is NOT information unless we add_________.	analytics
The creation of a data product contains 3 components EXCEPT	time
What increases data volume?	velocity
A negative correlation exists when___________.	x increases y decreases
The following are elements in an analytic plan EXCEPT	graphs
As of 2014, there are _______ million tweets a day.	500
The value of X in the regression equation Y= 1.24 X + 6.9 if Y=13.1 is	5
According to Hilary Mason which is NOT a skill that a good data scientist must cultivate.	critical thinking
If there are 103 scores the median is equal to the _____ranked score.	52nd
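For an odd number of ranked scores n, the median sits at rank (n + 1) / 2. A small sketch with an illustrative function name:

```python
def median_rank(n):
    """Rank of the median among n ranked scores, for odd n only."""
    if n % 2 == 0:
        # With an even count the median is the mean of the two middle scores.
        raise ValueError("n must be odd for a single middle rank")
    return (n + 1) // 2

rank = median_rank(103)
```

With 103 scores there are 51 scores below and 51 above the 52nd.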
A vegetable distributor knows that during the month of August ,the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 to 0.91 in a shipment of 4500 tomatoes.	4275
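The shipment question follows from the empirical rule: find how many standard deviations each bound lies from the mean, then apply the matching percentage. The sketch below assumes the bounds are symmetric and sit a whole number of standard deviations from the mean; the names are illustrative.

```python
import math

# Approximate share of data within k standard deviations (empirical rule).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def expected_count_within(mean, sd, low, high, shipment_size):
    """Expected number of items between low and high under the empirical rule."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    k = round(k_high)
    # Only symmetric bounds at 1, 2 or 3 standard deviations are supported.
    if not (math.isclose(k_low, k, abs_tol=1e-9) and math.isclose(k_high, k, abs_tol=1e-9)):
        raise ValueError("bounds must be symmetric and a whole number of SDs away")
    if k not in EMPIRICAL_RULE:
        raise ValueError("k must be 1, 2 or 3")
    return round(EMPIRICAL_RULE[k] * shipment_size)

count = expected_count_within(mean=0.61, sd=0.15, low=0.31, high=0.91, shipment_size=4500)
```

Both bounds are two standard deviations from the mean, so 95% of 4500 tomatoes, or 4275, are expected in that range.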
It partitions a ranked data into four equal groups.	quartile
A perfect positive correlation coefficient is equal to	1
In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?	9.38
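Both regression questions use the line Y = 1.24 X + 6.9, once to predict Y from X and once solved for X. A minimal sketch with illustrative function names:

```python
SLOPE = 1.24
INTERCEPT = 6.9

def predict_y(x):
    """Predicted Y for a given X on the regression line."""
    return SLOPE * x + INTERCEPT

def solve_x(y):
    """The X whose predicted value is y, found by rearranging the line."""
    return (y - INTERCEPT) / SLOPE

y_at_2 = predict_y(2)
x_at_13_1 = solve_x(13.1)
```

Up to floating-point rounding, `y_at_2` is 9.38 and `x_at_13_1` is 5.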
It refers to the degree of relationship between two variables?	Correlation
Who said that "The future is not google-able"?	William Gibson
He is someone who asks interesting questions on formal and informal theory.	data scientist
The number that occurs most frequently is called________.	Mode
If the standard deviation of a distribution is 3.5, the variance is	12.25
The difference between the highest and lowest value.	range
It lists the percent of data in a distribution.	relative frequency distribution
The normal distribution with a mean of 0 and standard deviation of 1.	Standard
Which is NOT a correct correlation Coefficient?	1.2
A bell-shaped distribution that is symmetric about a vertical line?	normal
The middle-most value in a ranked list of numbers.	median
Which of the following is TRUE when a distribution is normal?	Mean=Median=Mode
It expands available data enormously.	text mining
A bell-shaped distribution that is symmetric about a vertical line.	normal
What percent of data will lie within 2 standard deviation of the mean?	95
Data involving two variables.	bivariate
The quantification of data into information.	Datafication
He coined the term "data scientist"	DJ Patil
The major outcome of correlation.	prediction
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes?	5850
The method of correlation used for ranked score is ________.	Spearman rho
A data set in which every score occurs the same number of times is said to have	no mode
Example of a data product.	google map
A positive z-score means that the score is	Higher than the mean
A graph that is used to indicate frequency distribution.	histogram
The area of the standard normal curve to the right of z=0.82 is _______.	0.206
Which of the following is used as a method for Correlation?	Pearson r
A score of 50 lies 2 standard deviations above a mean of 30.What is the value of the standard deviation?	10
The score NOT easily affected by extreme values.	Median
On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score?	48th
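The percentile used here is the percent of scores that fall below the given score. A one-line sketch under that definition, with an illustrative function name:

```python
def percentile_rank(scores_below, total_scores):
    """Percent of all scores that fall below the given score."""
    return 100 * scores_below / total_scores

jef_percentile = percentile_rank(scores_below=480, total_scores=1000)
```

Jef scored higher than 480 of 1000 students, which places him at the 48th percentile.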
The classification table that XLSTAT can display.	confusion matrix
Which pair belongs to the same family of models called GLM ?	Logistic and linear regression
Classification table is also called ________	confusion matrix
It corresponds to the case where the dependent variable has more than 2 categories.	multinomial logit model
It displays the performance of a model and enables a comparison to be made with other models.	ROC
He proposed the use of a penalized likelihood function.	Firth
Which belong to the GLM family?	logistic and linear
The proportion of a well-defined classified positive events.	sensitivity
What does GLM means?	Generalized Linear model
The method that does NOT require the assumption that the parameters are normally distributed.	profile likelihood
To estimate the parameters of the model, the ________ function is maximized.	likelihood
The most common functions used to link probability to the explanatory variables are the LOGIT model and ________model.	PROBIT
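The two link functions differ only in which cumulative distribution maps the linear predictor to a probability: the logistic curve for logit, the standard normal curve for probit. In the sketch below, `eta` is a hypothetical name for the linear combination of the explanatory variables.

```python
import math
from statistics import NormalDist

def logit_probability(eta):
    """Logit model: probability from the logistic function of eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_probability(eta):
    """Probit model: probability from the standard normal CDF of eta."""
    return NormalDist(mu=0, sigma=1).cdf(eta)

p_logit = logit_probability(0.0)
p_probit = probit_probability(0.0)
```

Both functions return 0.5 at `eta = 0` and rise toward 1 as `eta` increases; they differ mainly in the tails.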
The method used to iteratively find a solution to a multinomial logit model.	Newton-Raphson algorithm
SBC means_________	Schwarz’s Bayesian Criterion
Displays the performance of a model and enables a comparison to be made with other models.	ROC curve
What does ROC mean?	Receiver Operating Characteristics
A frequently used method as it enables binary variables, sums of binary variables, or polytomous variables to be modelled.	logistic regression
It does NOT require the assumption that the parameters are normally distributed.	profile likelihood
The proportion of a well-classified negative event.	specificity
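Sensitivity and specificity are both read off the confusion matrix. The counts in this sketch are made up purely for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of actual positive events that were classified as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of actual negative events that were classified as negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical confusion-matrix counts (not from any real model).
sens = sensitivity(true_pos=40, false_neg=10)
spec = specificity(true_neg=45, false_pos=5)
```

With these counts the sensitivity is 0.8 and the specificity is 0.9. An ROC curve plots sensitivity against 1 - specificity as the classification threshold varies.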
ROC comes from ______theory.	signal detection
Which is NOT a KR technology?	roles
It is used to enable an entity to determine consequences by thinking rather than acting.	Knowledge Representation
The following are distinct roles that KR plays EXCEPT	Medium for pragmatically diligent interpretation
Any way to get new expressions from old ones.	inference
A network purporting to describe family memberships.	network topology
KR means __________________________.	Knowledge Representation
It is a variety of formal calculation typically deduction	Intelligent Reasoning
The following are abstract notions EXCEPT	casualty
It views the world in terms of attribute-object-value triples.	rule based
The following provided inspirations of what constitutes intelligent reasoning EXCEPT	Sociology
It involves a commitment in viewing the world in terms of individual entities and relations.	logic
KR is a set of __________commitments.	ontological
It sees a set of prototypes in particular prototypical diseases to be matched against the case at hand.	INTERNIST
Which is NOT a basic representation technology?	graph
All representations are ________.	imperfect
Which is NOT a component of KR?	it adheres to the function
KR as a _________is a substitute for the thing itself	surrogate
It views the world in thinking of prototypical objects.	frame
It is a language that we say things about the world.	Medium of human expression
It is a process that goes on internally while most things it wishes to reason about exist only externally.	reasoning
Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?	Probability density
It involves a commitment in viewing the world in terms of individual entities and relations between them.	logic
ROC means	Receiver Operating Characteristics
Which of the following is a discrete distribution?	Hypergeometric
Any way to get new expressions from old ones	inference
The _______value is the weighted average of the value the random variable may assume	Expected
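The expected value weights each possible value by its probability and sums the results. The fair six-sided die below is a hypothetical example; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical discrete distribution: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(distribution):
    """Weighted average of the values, using their probabilities as weights."""
    return sum(value * prob for value, prob in distribution.items())

expected = expected_value(die)
```

For the fair die the expected value is 7/2, that is 3.5.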
Which is an example of a discrete random variable?	number of books
The following are discrete distributions EXCEPT	chi-square
The classification table that XLSTAT can display	confusion matrix
The most common functions used to link probability to the explanatory variables are the LOGIT model and ________model.	PROBIT
Which of the following is a continuous distribution?	Chi-square
It is a numerical function of the outcome of a statistical experiment.	random variable
The most commonly used continuous probability distribution	normal
It is a language that we say things about the world	Medium of human expression
Which is NOT a basic representation technology?	graph
Two of the most widely used discrete probability distributions	Poisson and binomial
It does NOT require the assumption that the parameters are normally distributed	profile likelihood
The integral of all the values of a random variable in a probability density function is equal to______.	One
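That a probability density integrates to one can be checked numerically. The sketch below applies the trapezoid rule to the standard normal density over a wide interval. The interval and step count are arbitrary choices, and `statistics.NormalDist` needs Python 3.8 or later.

```python
from statistics import NormalDist

def integrate_pdf(dist, lower, upper, steps=10_000):
    """Approximate the area under dist.pdf between lower and upper (trapezoid rule)."""
    width = (upper - lower) / steps
    # End points count half, interior points count fully.
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width

# Eight standard deviations either side covers essentially all of the area.
total_area = integrate_pdf(NormalDist(mu=0, sigma=1), -8.0, 8.0)
```

The result is 1 to within numerical error. The density's height at a single point is not itself a probability; only areas under the curve are.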
The following provided inspirations of what constitute intelligent reasoning EXCEPT	philosophy
Which is NOT a KR technology?	roles
The following are distinct roles that KR plays EXCEPT	Medium for pragmatically diligent interpretation
What is KR?	Knowledge Representation
The most common function used to link probability to explanatory variables	logit model
It does NOT require the assumption that the parameters are normally distributed	profile likelihood
A model that corresponds to the case where the dependent variable has more than two categories.	multinomial logit model
The following are distinct roles that KR plays EXCEPT	Medium for pragmatically diligent interpretation
Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?	Probability density
The following are continuous distributions EXCEPT	geometric
To estimate the parameters of the model, the ________ function is maximized	Likelihood
It provides the height or the value of the function at any particular value of x	probability density function
It sees the medical world as made of empirical associations connecting symptoms to diseases.	MYCIN
Any way to get new expressions from old ones	inference
It refers to a frequently used method as it enables binary or polytomous variables to be modelled.	logistic regression
KR as a _________is a substitute for the thing itself	surrogate
It is used to enable an entity to determine consequences by thinking rather than acting.	KR
A network purporting to describe family memberships	network topology
The most widely used continuous probability distribution	Normal
What is the value of the mean and standard deviation in a normal probability density function	mean=50, s=5
SBC means_________	Schwarz’s Bayesian Criterion
The most widely used continuous probability distribution	Normal
It views the world in terms of attribute-object-value triples	rule-based
Which of the following is a discrete distribution?	Hypergeometric
It is a variety of formal calculation typically deduction	Intelligent Reasoning
Which of the following is a continuous distribution?	Chi-square
It corresponds to the case where the dependent variable has more than 2 categories.	multinomial logit model
Which is NOT a component of KR?	it adheres to the function
It is often used as a model of the number of arrivals at a facility in a given period of time.	Poisson probability distribution
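The Poisson probability of seeing k arrivals in a period with an average of λ arrivals is e^(-λ) λ^k / k!. In the sketch below, the rate of 3 arrivals per period is a made-up value for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k arrivals when the mean number of arrivals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical facility averaging 3 arrivals per period.
rate = 3.0
prob_no_arrivals = poisson_pmf(0, rate)
prob_three_arrivals = poisson_pmf(3, rate)

# The probabilities over all possible counts sum to one (truncated here at 60).
total_probability = sum(poisson_pmf(k, rate) for k in range(60))
```

Because the distribution is discrete, each probability is attached to a whole-number count of arrivals.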

Wednesday, February 24, 2021