It is a free software programming language. |
R-programming |
Which of the following is NOT a goal in data mining? |
collecting data |
It transforms data into actionable intelligence for business purposes. |
Business Intelligence |
It has the goal of discovering useful information to support decision making. |
data analysis |
Which of the following types of text is processed in text analytics? |
unstructured |
The following are artifacts used in data analysis EXCEPT: |
ANOVA |
___________ uses artifacts to present data visually. |
data visualization |
It extracts meaningful numerical indices from information and makes them available to statistical and machine learning algorithms. |
Text analytics |
It includes identifying groups of data records. |
cluster analysis |
Which of the following data mining techniques is predictive? |
classification |
The goal is to transform raw data into understandable business information. |
Data mining |
What is the process of deriving useful information from text? |
Text Analytics |
_____________ includes identifying groups of data records. |
Cluster analysis |
It is used in an organization’s strategic and tactical business decision making. |
business intelligence |
It is a powerful tool that shows the network of data. |
Knime |
What programming language does Orange use? |
Python |
The following processes are used in data analysis EXCEPT: |
collecting |
Which of the following is NOT a method used in data analysis? |
Statistics Analytics |
It is a method for discovering patterns in large data sets. |
Data Mining |
It makes complex data more understandable and usable. |
data visualization |
If A = {2, 3} and B = {4, 5}, the two sets are said to be |
disjoint |
Which of the matrices is singular? |
A |
What is the focus of data science? |
manipulate data efficiently and effectively |
What is an organized collection of information and a set of operations used to manage that information? |
ADT |
What is the correct meaning of ADT? |
Abstract Data Type |
ML means: |
Machine Learning |
Addition and subtraction of two or more matrices are possible only if the matrices |
have the same size |
3A + B = |
d. |
Which is NOT a characteristic feature of a data structure? |
It contains a fixed structure. |
It refers to a data structure that grows and shrinks at execution time. |
dynamic |
If A = {2, 3} and B = {4, 5}, which of the following is the Cartesian product of the two sets? |
{(2,4), (2,5), (3,4), (3,5)} |
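A quick way to check the set answers above is with Python's built-in `set` type. This is a minimal sketch using the sets A = {2, 3} and B = {4, 5} from the questions; `itertools.product` generates the ordered pairs of the Cartesian product.

```python
from itertools import product

A = {2, 3}
B = {4, 5}

# Disjoint: the sets share no elements, so their intersection is the null set.
print(A.isdisjoint(B))   # True
print(A & B)             # set()

# Cartesian product A x B: every ordered pair (a, b) with a in A and b in B.
cartesian = set(product(A, B))
print(sorted(cartesian))  # [(2, 4), (2, 5), (3, 4), (3, 5)]
```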
What is the earlier name for data science? |
datalogy |
Which of the following is the transpose of B? |
a |
Which of the following is TRUE? |
A + B = B + A |
What is a data structure that has a fixed size? |
static |
Matrix B is |
invertible |
The intersection of the two sets A = {2, 3} and B = {4, 5} is a |
null set |
What is the size of the product of a 5x6 and a 6x8 matrix? |
5x8 |
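The size rules behind the matrix questions can be sketched in plain Python. The helper names `product_shape` and `add` are hypothetical, and the 2x2 matrices are made-up examples, since the matrices A and B referred to in the quiz are not reproduced here.

```python
def product_shape(a_shape, b_shape):
    """Return the shape of the product A x B, or raise if inner dimensions differ."""
    (m, n), (p, q) = a_shape, b_shape
    if n != p:
        raise ValueError(f"cannot multiply {m}x{n} by {p}x{q}")
    return (m, q)

def add(A, B):
    """Entry-wise sum; both matrices must have the same size."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]

print(product_shape((5, 6), (6, 8)))  # (5, 8)
print(product_shape((2, 5), (5, 3)))  # (2, 3)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(add(A, B) == add(B, A))  # True: matrix addition is commutative
```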
_______________ is a data structure in which every component has a unique predecessor and successor. |
linear |
An array is a good example of a _________ data structure. |
Static |
If A = {x : x is a distinct letter in the word "MATHEMATICS"} and B = {x : x is a distinct letter in the word "STATISTICS"}, then their intersection is |
{A, C, I, S, T} |
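The intersection can be verified by turning each word into the set of its distinct letters; this is a minimal sketch, not part of the original quiz.

```python
A = set("MATHEMATICS")   # distinct letters: M, A, T, H, E, I, C, S
B = set("STATISTICS")    # distinct letters: S, T, A, I, C

print(sorted(A & B))     # ['A', 'C', 'I', 'S', 'T']
```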
It is a process of finding the computational complexity of algorithms. |
analysis of algorithms |
The function describing the performance of an algorithm is usually an upper bound determined from ______ inputs. |
worst case |
If α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings? |
18 |
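The length of a concatenation is the sum of the two lengths. This sketch assumes a^6 and b^5 mean the letter repeated 6 and 5 times.

```python
alpha = "babaa"
beta = "a" * 6 + "b" * 5 + "bb"   # a^6 b^5 bb, assuming exponent = repetition

print(len(alpha), len(beta))      # 5 13
print(len(alpha + beta))          # 18
```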
He coined the term “analysis of algorithms”. |
Donald Knuth |
Another term for an empty set. |
null |
What is the size of the product of a 5x6 and a 6x8 matrix? |
5x8 |
It is popular among financial data analysts. |
Knime |
It is used to discover patterns in large data sets. |
Data mining |
Earlier name for data science. |
datalogy |
Which of the following is a predictive data mining technique? |
regression |
Another term for text analytics. |
text mining |
Algorithm analysis is an important part of a broader _____________. |
computational complexity theory |
It is perfect software for machine learning. |
Orange |
The constant multiplicative factors by which algorithms are related are called _______ constants. |
hidden |
If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, the range of R is |
{3, 5, 6, 10, 12} |
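The domain and range of a relation are the sets of first and second components of its ordered pairs. A minimal sketch, treating the last pair as (6, 12):

```python
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

domain = {x for x, _ in R}   # first components
range_ = {y for _, y in R}   # second components ("range" is a Python built-in)

print(sorted(domain))  # [3, 5, 6]
print(sorted(range_))  # [3, 5, 6, 10, 12]
```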
It refers to using tools of statistics to present data visually. |
data visualization |
Null strings are indicated by |
λ |
Which of the following is the transpose of B? |
a. |
It offers a way to examine trends from collected data and derive insights from it. |
Business Intelligence |
The following are software used in data mining EXCEPT |
SPSS |
If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, the domain of R is |
{3, 5, 6} |
It relates the length of an algorithm’s input to the number of storage locations it uses. |
space complexity |
How many data mining techniques are there? |
7 |
A special type of function where the domain is a set of consecutive integers. |
sequence |
It is used for prototyping in RapidMiner. |
Studio |
An example of an abstract computer. |
Turing machine |
The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information. |
data analysis |
The symbol used to indicate strings with no elements. |
λ |
What type of text is processed in text analytics? |
unstructured |
It is a theoretical classification that estimates and anticipates the increase in running time of algorithms. |
run-time analysis |
The following are large inputs EXCEPT |
Big beta notation |
It relates the length of an algorithm’s input to the number of steps it takes. |
time complexity |
The sets A = {x : x is a distinct letter in the word "MATHEMATICS"} and B = {x : x is a distinct letter in the word "STATISTICS"} are |
not equal; B is a proper subset of A |
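Comparing the two letter sets directly shows how they relate. A minimal sketch using Python set comparison, where `<` tests for a proper subset:

```python
A = set("MATHEMATICS")
B = set("STATISTICS")

print(A == B)          # False: A has 8 distinct letters, B has 5
print(B < A)           # True: every letter of B is in A, and A has extra letters
print(sorted(A - B))   # ['E', 'H', 'M']
```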
What programming language is used in RapidMiner? |
Java |
The product of a 2x5 and a 5x3 matrix is a ______ matrix. |
2x3 |
A matrix that has the same number of rows and columns is called |
Square |
“All models are wrong, but some are useful.” |
George E. P. Box |
What is a great example of a data product? |
Google Maps |
He pointed out that until 2003, all of mankind had generated just 5 exabytes of data. |
Eric Schmidt |
A new phenomenon for the explosion of _________ data |
interaction |
He said that “In mathematics the art of proposing a question must be held of higher value than solving it.” |
Georg Cantor |
The creation of data from varied sources and its quantification into information. |
Datafication |
How many bytes of data are generated every two days in today's world? |
5 exabytes |
It refers to well-based theories and sound business judgement. |
Data Science |
PAW means ____________. |
Predictive Analytics World |
Exabyte means ________ bytes. |
billion billion |
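An exabyte in the decimal (SI) sense is 10^18 bytes, which is where "billion billion" comes from. A minimal check; note that the binary unit exbibyte (2^60 bytes) is a different quantity.

```python
billion = 10**9
exabyte = 10**18  # SI (decimal) exabyte, in bytes

print(exabyte == billion * billion)  # True
# The Schmidt figure, 5 exabytes, expressed in bytes
print(5 * exabyte)                   # 5000000000000000000
```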
The developer of FarmVille, a famous game on the internet. |
Zynga Incorporated |
The person who said that “The future is not Google-able.” |
William Gibson |
It shows a high correlation between the incidence of flu and searches about flu on Google. |
Google Flu Trends |
It expands available data enormously since there is so much more text being generated than numbers. |
Text mining |
The creation of data from varied sources and its quantification into information. |
datafication |
These are the data skills that a good data scientist needs to cultivate EXCEPT |
speaking |
Which is NOT interaction data? |
database |
The following are the 3 V's of big data EXCEPT |
veracity |
IoT means |
Internet of Things |
The explosion of _______ data is the main reason why 5 exabytes of data are generated every 2 days. |
interaction |
Which of the following is TRUE when a distribution is normal? |
Mean = Median = Mode |
What is the mean for a standard normal distribution? |
0 |
By the empirical rule, the interval 3 standard deviations above and below the mean of a normal distribution covers ______% of the data. |
99.7 |
By the empirical rule, the interval 2 standard deviations above and below the mean of a normal distribution covers ________% of the data. |
95 |
A bell-shaped curve that is symmetric about a vertical line. |
normal distribution |
A distribution used when large data sets are displayed. |
Grouped frequency distribution |
A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43? |
95 |
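The empirical rule percentages are rounded areas under the normal curve. The sketch below computes them with `math.erf` and applies them to the rice survey; `normal_cdf` is a helper written for this example.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Empirical (68-95-99.7) rule: probability within k standard deviations
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Rice survey: mean 35, sd 4, so 27 and 43 are exactly 2 SD from the mean
share = normal_cdf(43, 35, 4) - normal_cdf(27, 35, 4)
print(round(100 * share))  # 95 of the 100 consumers
```

The exact 2 SD area is about 95.45%, which the empirical rule rounds to 95%.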
What percent of data will lie within 2 standard deviations of the mean? |
95 |
What range of values lies within 3 SD below and above the mean in a normal distribution if the mean is 10 and the standard deviation is 2? |
4-16 |
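The 3 SD ranges in these questions are just the mean plus or minus 3 times the standard deviation. A minimal sketch with a hypothetical helper `k_sd_interval`, checked against the mean 10 / SD 2 and mean 80 / SD 3 examples:

```python
def k_sd_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)

print(k_sd_interval(10, 2, 3))  # (4, 16)
print(k_sd_interval(80, 3, 3))  # (71, 89)
```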
It lists the percent of data in each class of a distribution. |
relative frequency distribution |
The normal distribution with a mean of 0 and a standard deviation of 1. |
Standard |
The area under the standard normal curve to the right of z = 0.82 is _______. |
0.206 |
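The right-tail area can be computed directly rather than read from a z-table. A minimal sketch using the complementary error function `math.erfc`:

```python
from math import erfc, sqrt

z = 0.82
# Right-tail area of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
right_tail = 0.5 * erfc(z / sqrt(2))
print(round(right_tail, 3))  # 0.206
```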
What is the value of the mean if a score of 110 is 3 standard deviations above the mean? |
95 |
A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation? |
10 |
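Questions like these rearrange the z-score formula z = (x - mean) / SD. A minimal sketch with hypothetical helpers `z_score` and `sd_from`, using the score of 50 and mean of 30:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

def sd_from(x, mean, z):
    """Solve x = mean + z*sd for sd."""
    return (x - mean) / z

sd = sd_from(50, 30, 2)
print(sd)                   # 10.0
print(z_score(50, 30, sd))  # 2.0, positive, so the score is above the mean
```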
A graph used to indicate intervals in a frequency distribution is referred to as a ______________. |
Histogram |
What range of values lies within 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3? |
71-89 |
A bell-shaped distribution that is symmetric about a vertical line. |
normal |
A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39? |
84 |
By the empirical rule, ______% of the data lie within 1 standard deviation below and above the mean of a normal distribution. |
68 |
What is the value of the standard deviation in a standard normal distribution? |
1 |
Which is NOT a value of r? |
1.02 |
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.71 lb? |
about 75 (z = (0.71 - 0.61) / 0.15 ≈ 0.67) |
The equation of the _______ line predicts the value of Y given X. |
Regression |
Data is NOT information unless we add _________. |
analytics |
The following are the 3 components in the creation of a data product EXCEPT |
time |
What increases data volume? |
velocity |
A negative correlation exists when ___________. |
as x increases, y decreases |
The following are elements in an analytic plan EXCEPT |
graphs |
As of 2014, there are _______ million tweets a day. |
500 |
The value of X in the regression equation Y = 1.24X + 6.9 if Y = 13.1 is |
5 |
According to Hilary Mason, which is NOT a skill that a good data scientist must cultivate? |
critical thinking |
If there are 103 scores, the median is equal to the _____ ranked score. |
52nd |
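For an odd number n of ranked scores, the median is the score at rank (n + 1) / 2. A minimal sketch with made-up scores 1 to 103 and a hypothetical helper `median_rank`, cross-checked against `statistics.median`:

```python
import statistics

def median_rank(n):
    """1-based rank of the median in a sorted list of n scores (n odd)."""
    if n % 2 == 0:
        raise ValueError("for even n, the median averages ranks n/2 and n/2 + 1")
    return (n + 1) // 2

print(median_rank(103))  # 52

scores = list(range(1, 104))  # hypothetical ranked scores
print(scores[median_rank(103) - 1] == statistics.median(scores))  # True
```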
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 and 0.91 lb in a shipment of 4500 tomatoes? |
4275 |
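Both weight limits are exactly 2 standard deviations from the mean, so the empirical rule applies. A minimal sketch of the arithmetic:

```python
mean, sd, n = 0.61, 0.15, 4500
low, high = 0.31, 0.91

# How many standard deviations each limit sits from the mean
k_low = (mean - low) / sd
k_high = (high - mean) / sd
print(round(k_low, 6), round(k_high, 6))  # 2.0 2.0

# Empirical rule: about 95% of the shipment lies within 2 SD
print(round(0.95 * n))  # 4275
```

The exact normal area within 2 SD is about 95.45%, which would give roughly 4295; the quiz uses the rounded empirical-rule value.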
It partitions ranked data into four equal groups. |
quartile |
A perfect positive correlation coefficient is equal to |
1 |
In the equation of the regression line represented by Y = 1.24X + 6.9, if X = 2, then Y = ? |
9.38 |
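Predicting Y from X and solving for X given Y both use the regression line Y = 1.24X + 6.9 from the questions. A minimal sketch; the helper names are hypothetical.

```python
slope, intercept = 1.24, 6.9

def predict_y(x):
    """Y on the regression line for a given X."""
    return slope * x + intercept

def solve_x(y):
    """X on the regression line for a given Y."""
    return (y - intercept) / slope

print(round(predict_y(2), 2))   # 9.38
print(round(solve_x(13.1), 2))  # 5.0
```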
It refers to the degree of relationship between two variables. |
Correlation |
Who said that "The future is not Google-able"? |
William Gibson |
He is someone who asks interesting questions on formal and informal theory. |
data scientist |
The number that occurs most frequently is called ________. |
Mode |
If the standard deviation of a distribution is 3.5, the variance is |
12.25 |
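Variance is the square of the standard deviation. A minimal sketch, with a small made-up data set to show the same relationship using the `statistics` module:

```python
import math
import statistics

sd = 3.5
variance = sd ** 2
print(variance)  # 12.25

# The same relationship on a made-up data set
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(math.isclose(statistics.pstdev(data) ** 2, statistics.pvariance(data)))  # True
```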
The difference between the highest and lowest values. |
range |
It lists the percent of data in each class of a distribution. |
relative frequency distribution |
Which is NOT a correct correlation coefficient? |
1.2 |
The middle-most value in a ranked list of numbers. |
median |
Which of the following is TRUE when a distribution is normal? |
Mean = Median = Mode |
It expands available data enormously. |
text mining |
Data involving two variables. |
bivariate |
The quantification of data into information. |
Datafication |
He coined the term "data scientist". |
DJ Patil |
The major outcome of correlation. |
prediction |
A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes? |
5850 (the other 150 weigh less than 0.31 lb) |
The method of correlation used for ranked scores is ________. |
Spearman rho |
Data in which every score has the same number of occurrences is said to have |
no mode |
Example of a data product. |
Google Maps |
A positive z-score means that the score is |
Higher than the mean |
A graph that is used to indicate a frequency distribution. |
histogram |
Which of the following is used as a method for correlation? |
Pearson r |
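Pearson r can be computed from the deviations about each mean; its value always lies between -1 and 1, which is why answers such as 1.2 or 1.02 cannot be correlation coefficients. A minimal sketch with made-up data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, not from the quiz
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0, perfect positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0, as x increases, y decreases
```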
The measure NOT easily affected by extreme values. |
Median |
On an examination given to 1000 students, Jef’s score of 80 was higher than the scores of 480 students who took the exam. What is the percentile for Jef’s score? |
48th |
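This sketch takes percentile rank as the percentage of scores below the given score, which is one common definition (others also count half of any tied scores). The helper name is hypothetical.

```python
def percentile_rank(n_below, n_total):
    """Percent of scores that fall below the given score."""
    return 100 * n_below / n_total

print(percentile_rank(480, 1000))  # 48.0, the 48th percentile
```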
The classification table that XLSTAT can display. |
confusion matrix |
Which pair belongs to the same family of models called GLM? |
Logistic and linear regression |
A classification table is also called a ________. |
confusion matrix |
It corresponds to the case where the dependent variable has more than 2 categories. |
multinomial logit model |
It displays the performance of a model and enables a comparison to be made with other models. |
ROC |
He proposed the use of a penalized likelihood function. |
Firth |
Which belong to the GLM family? |
logistic and linear |
The proportion of well-classified positive events. |
sensitivity |
What does GLM mean? |
Generalized Linear Model |
The method that does NOT require the assumption that the parameters are normally distributed. |
profile likelihood |
To estimate the parameters of the model, the ________ function is maximized. |
likelihood |
The most common functions used to link probability to the explanatory variables are the LOGIT model and the ________ model. |
PROBIT |
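The logit and probit models differ only in the function that turns the linear predictor into a probability. A minimal sketch of both inverse links; the function names and sample values are illustrative.

```python
from math import erf, exp, sqrt

def logit_inverse(eta):
    """Logistic link: p = 1 / (1 + e^(-eta))."""
    return 1 / (1 + exp(-eta))

def probit_inverse(eta):
    """Probit link: p = Phi(eta), the standard normal CDF."""
    return 0.5 * (1 + erf(eta / sqrt(2)))

# Both map a linear predictor eta = b0 + b1*x onto a probability in (0, 1)
for eta in (-2, 0, 2):
    print(eta, round(logit_inverse(eta), 3), round(probit_inverse(eta), 3))
# -2 0.119 0.023
# 0 0.5 0.5
# 2 0.881 0.977
```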
The method used to iteratively find a solution to a multinomial logit model. |
Newton-Raphson algorithm |
SBC means _________ |
Schwarz’s Bayesian Criterion |
It displays the performance of a model and enables a comparison to be made with other models. |
ROC curve |
What does ROC mean? |
Receiver Operating Characteristic |
A frequently used method because it enables binary or polytomous variables to be modelled. |
logistic regression |
It does NOT require the assumption that the parameters are normally distributed. |
profile likelihood |
The proportion of well-classified negative events. |
specificity |
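Sensitivity and specificity come straight from the cells of a confusion matrix, and each (1 - specificity, sensitivity) pair is one point on an ROC curve. A minimal sketch with made-up counts:

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives classified as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp)

# Hypothetical confusion matrix counts
tp, fn, fp, tn = 40, 10, 5, 45
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
# One point of an ROC curve: (1 - specificity, sensitivity)
print(round(1 - specificity(tn, fp), 2), sensitivity(tp, fn))  # 0.1 0.8
```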
ROC comes from ______ theory. |
signal detection |
Which is NOT a KR technology? |
roles |
It is used to enable an entity to determine consequences by thinking rather than acting. |
Knowledge Representation |
The following are distinct roles that KR plays EXCEPT |
Medium for pragmatically diligent interpretation |
Any way to get new expressions from old ones. |
inference |
A network purporting to describe family memberships. |
network topology |
KR means __________________________. |
Knowledge Representation |
It is a variety of formal calculation, typically deduction. |
Intelligent Reasoning |
The following are abstract notions EXCEPT |
casualty |
It views the world in terms of attribute-object-value triples. |
rule based |
The following provided inspiration for what constitutes intelligent reasoning EXCEPT |
Sociology |
It involves a commitment to viewing the world in terms of individual entities and relations. |
logic |
KR is a set of __________ commitments. |
ontological |
It sees a set of prototypes, in particular prototypical diseases, to be matched against the case at hand. |
INTERNIST |
Which is NOT a basic representation technology? |
graph |
All representations are ________. |
imperfect |
Which is NOT a component of KR? |
it adheres to the function |
KR as a _________ is a substitute for the thing itself. |
surrogate |
It views the world in terms of prototypical objects. |
frame |
It is a language in which we say things about the world. |
Medium of human expression |
It is a process that goes on internally, while most things it wishes to reason about exist only externally. |
reasoning |
Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable? |
Probability density |
It involves a commitment to viewing the world in terms of individual entities and relations between them. |
logic |
ROC means |
Receiver Operating Characteristic |
Which of the following is a discrete distribution? |
Hypergeometric |
Any way to get new expressions from old ones |
inference |
The _______ value is the weighted average of the values the random variable may assume. |
Expected |
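The expected value is the probability-weighted average of the outcomes. A minimal sketch using a fair die as a made-up example and `fractions.Fraction` for exact arithmetic:

```python
from fractions import Fraction

def expected_value(dist):
    """Weighted average of outcomes: E[X] = sum of x * P(X = x)."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# A fair six-sided die
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2
```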
Which is an example of a discrete random variable? |
number of books |
The following are discrete distributions EXCEPT |
chi-square |
The classification table that XLSTAT can display |
confusion matrix |
Which of the following is a continuous distribution? |
Chi-square |
It is a numerical function of the outcome of a statistical experiment. |
random variable |
The most commonly used continuous probability distribution |
normal |
Two of the most widely used discrete probability distributions |
Poisson and binomial |
It does NOT require the assumption that the parameters are normally distributed |
profile likelihood |
The integral of a probability density function over all values of the random variable is equal to ______. |
One |
The following provided inspiration for what constitutes intelligent reasoning EXCEPT |
philosophy |
What is KR? |
Knowledge Representation |
The most common function used to link probability to explanatory variables |
logit model |
A model that corresponds to the case where the dependent variable has more than two categories. |
multinomial logit model |
The following are continuous distributions EXCEPT |
geometric |
It provides the height, or the value of the function, at any particular value of x |
probability density function |
It sees the medical world as made of empirical associations connecting symptoms to diseases. |
MYCIN |
It refers to a frequently used method because it enables binary or polytomous variables to be modelled. |
logistic regression |
It is used to enable an entity to determine consequences by thinking rather than acting. |
KR |
The most widely used continuous probability distribution |
Normal |
What are the values of the mean and standard deviation in the normal probability density function? |
mean = 50, s = 5 |
It is often used as a model of the number of arrivals at a facility in a given period of time. |
Poisson probability distribution |
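The Poisson probability of k arrivals in a period with average rate λ is λ^k e^(-λ) / k!. A minimal sketch; the rate of 3 arrivals per hour is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) arrivals when arrivals average lam per period."""
    return lam ** k * exp(-lam) / factorial(k)

lam = 3  # hypothetical: 3 customers per hour on average
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```

Summing `poisson_pmf(k, lam)` over enough values of k approaches 1, as it must for any probability distribution.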
Wednesday, February 24, 2021
Data Analysis
This course reviews and expands upon core topics in probability and statistics through the study and practice of data analysis.