Wednesday, February 24, 2021

Data Analysis

Module Description
This course reviews and expands upon core topics in probability and statistics through the study and practice of data analysis.

It is a free software programming language.

R-programming

Which of the following is NOT a goal in data mining?

collecting data

It transforms data into actionable intelligence for business purposes.

Business Intelligence

It has the goal of discovering useful information to support decision making.

data analysis

Which of the following types of text is processed in text analytics?

unstructured

The following are artifacts used in data analysis EXCEPT:

ANOVA

___________ uses artifacts to present data visually.

data visualization

It extracts meaningful numerical indices from information and makes them available to statistical and machine learning methods.

Text analytics

It includes identifying groups of data records.

cluster analysis

Which of the following data mining techniques is predictive?

classification

The goal is to transform raw data into understandable business information.

Data mining

What is the process of deriving useful information from text?

Text Analytics

_____________ includes identifying groups of data records.

Cluster analysis

It is used in an organization’s strategic and tactical business decision making.

business intelligence

It is a powerful tool that shows the network of data.

Knime

What programming language does Orange use?

python

The following processes are used in data analysis EXCEPT:

collecting

Which of the following is NOT a method used in data analysis?

Statistics Analytics

It is a method for discovering patterns in large data sets.

Data Mining

It makes complex data more understandable and usable.

data visualization

The two sets A = {2, 3} and B = {4, 5} are said to be

disjoint

Which of the matrices is singular?

A

What is the focus of data science?

manipulate data efficiently and effectively

What is an organized collection of information and set of information used to manage that operation?

ADT

What is the correct meaning of ADT?

Abstract Data Type

ML means:

Machine Learning

Addition and subtraction of matrices is possible only if the two or more matrices

have the same size

3A + B =

d.

Which is NOT a characteristic feature of data structure?

It contains a fixed structure.

It refers to a data structure that grows and shrinks at execution time.

dynamic

If A = {2, 3} and B = {4, 5}, which of the following is the Cartesian product of the two sets?

{ (2,4), (2,5), (3,4), (3,5) }

What is the earlier name for data science?

datalogy

Which of the following is the transpose of B?

a

Which of the following is TRUE?

A + B = B+ A

What is a data structure that has a fixed size?

static

Matrix B is

invertible

The intersection of the two sets A = {2, 3} and B = {4, 5} is a

null set

What is the size of the product of a 5 x 6 and a 6 x 8 matrix?

5 x 8

_______________ is a data structure in which every component has a unique predecessor and successor.

linear

An array is a good example of _________data structure.

Static

Addition and subtraction of matrices is possible only if the two or more matrices

have the same size

Which of the following is TRUE?

A + B = B+ A

The goal is to transform raw data into understandable business information.

Data mining

If A= { x/x is a distinct letter in the word "MATHEMATICS"}  AND B={x/x is a distinct letter in the word "STATISTICS"} then their intersection is

{A,C,I,S,T}
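
The set questions above can be checked with Python's built-in set type. This is only a sketch; the variable names are illustrative and are not part of the quiz.

```python
# Distinct letters of each word, modelled as Python sets.
A = set("MATHEMATICS")
B = set("STATISTICS")

common = A & B                    # intersection of the two sets
print(sorted(common))             # ['A', 'C', 'I', 'S', 'T']

# Two sets with no common element are disjoint: their intersection is the empty (null) set.
P, Q = {2, 3}, {4, 5}
print(P.isdisjoint(Q))            # True

# Cartesian product P x Q as a set of ordered pairs.
product = {(p, q) for p in P for q in Q}
print(sorted(product))            # [(2, 4), (2, 5), (3, 4), (3, 5)]
```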

It is a process  of finding the computational complexity of algorithms.

analysis of algorithms

The function describing the performance of an algorithm is usually an upper bound determined from ______inputs.

worst case

Which of the following data mining techniques is predictive?

classification

Given α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?

18
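
The count can be verified with a short sketch, assuming a^6 stands for six a's and b^5 for five b's. The variable names are illustrative.

```python
alpha = "babaa"                     # length 5
beta = "a" * 6 + "b" * 5 + "bb"     # a^6 b^5 bb, length 6 + 5 + 2 = 13

concatenation = alpha + beta
print(len(concatenation))           # 18
```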

He coined the term “analysis of algorithms”.

Donald Knuth

Another term for an empty set.

null

What is the size of the product of a 5x 6 and a 6x 8 matrices?

5x 8
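
The size rule (an m x n matrix times an n x p matrix gives an m x p matrix) can be illustrated with a small pure-Python sketch. The helper `matmul` is a hypothetical name used only for this example.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    if len(X[0]) != len(Y):
        raise ValueError("columns of X must equal rows of Y")
    # Each entry is the dot product of a row of X with a column of Y.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

X = [[1] * 6 for _ in range(5)]     # 5 x 6 matrix of ones
Y = [[1] * 8 for _ in range(6)]     # 6 x 8 matrix of ones
Z = matmul(X, Y)

print(len(Z), len(Z[0]))            # 5 8
```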

It is a free software programming language.

R-programming

It is  popular among financial data analysts.

Knime

It is used to discover patterns in large data sets

Data mining

Earlier name for data science.

datalogy

Which of the following is a predictive data mining technique?

regression

Another term for text analytics.

text mining

Algorithm analysis is an important part of a broader_____________.

computational complexity theory

It is a perfect software for machine learning.

Orange

It is a process  of finding the computational complexity of algorithms.

analysis of algorithms

The constant multiplicative factors by which implementations of an algorithm are related are called _______ constants.

hidden

If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its range is

{3,5,6,10,12}

It is used in an organization’s strategic and tactical business decision making.

business intelligence

Refers to using tools of statistics to present data visually.

data visualization

Null strings are indicated by

λ

Which of the following is the transpose of B?

a.

It offers a  way to examine trends from collected data and derive insights from it.

Business Intelligence

The following are software tools used in data mining EXCEPT

SPSS

If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its domain is

{3,5,6}
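
The domain and range of the relation can be read off the ordered pairs. A minimal sketch, assuming the last pair is (6, 12):

```python
# The relation as a set of ordered pairs (x, y).
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

domain = {x for x, _ in R}          # all first components
value_range = {y for _, y in R}     # all second components

print(sorted(domain))               # [3, 5, 6]
print(sorted(value_range))          # [3, 5, 6, 10, 12]
```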

It includes identifying groups of data records

cluster analysis

It is a powerful tool that shows the network of data.

Knime

It relates the length of an algorithm's input to the number of storage locations it uses.

space complexity

There are how many data mining techniques?

7

Which of the matrices is singular?

A

A special type of function where the domain is a  set of consecutive integers.

sequence

It is used for prototyping in RapidMiner.

studio

An example of an abstract computer.

Turing machine

The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.

data analysis

It makes complex data more understandable and usable.

data visualization

The symbol used to indicate strings with no elements.

λ

 

What type of text are processed in Text analytics?

unstructured

It is a theoretical classification that estimates and anticipates the increase in running time of an algorithm as its input size increases.

run time analysis

 

The following are notations used to describe behavior for large inputs EXCEPT

Big beta notation

It relates the length of an algorithm’s input to the number of steps it takes.

time complexity

Matrix B is

invertible

The sets A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"} are

Not equal (B is a proper subset of A, since A also contains E, H and M)

What programming language is used in RapidMiner?

Java

The product of a 2x5 and 5x3 matrices is a ______matrix

2x3

A matrix that has the same number of rows and columns is called

Square

“All models are wrong but some are useful”

George E. P. Box

What is a great example of data product?

google maps

He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.

Eric Schmidt

A new phenomenon for the explosion of _________data

interaction

He said that “In mathematics the art of proposing a question must be held of higher value than solving it”.

Georg Cantor

The creation of data from varied sources and its quantification into information.

Datafication

How many bytes of data are generated every two days in today's world?

5 exabytes

It refers to well based theories  and sound business judgement.

Data Science

PAW means____________.

Predictive Analytics World

Exabyte means ________bytes

billion billion

The developer of FarmVille, a famous game on the internet.

Zynga Incorporated

The person who said that “The future is not google-able”.

William Gibson

It shows a high correlation between the incidence of flu and searches about flu on google.

Google Flu trends

It expands available data enormously since there is so much more text being generated than numbers.

Text mining

The creation of data from varied sources and its quantification into information.

datafication

These are the data skills that a good data scientist needs to cultivate EXCEPT

speaking

Which is NOT interaction data?

database

The following are the 3V's of big data EXCEPT

veracity

IOT means

Internet of things

The explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated.

interaction

Which of the following is TRUE when a distribution is normal?

Mean=Median=Mode

What is the mean for a standard normal distribution?

0

By the empirical rule for a normal distribution, the interval from 3 standard deviations below to 3 standard deviations above the mean covers ______% of the data.

99.7

By the empirical rule for a normal distribution, the interval from 2 standard deviations below to 2 standard deviations above the mean covers ________% of the data.

95
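
The empirical-rule percentages can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of the data within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {100 * share:.1f}%")
```

The computed shares are roughly 68.3%, 95.4% and 99.7%, which the empirical rule rounds to 68, 95 and 99.7.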

A bell-shaped curve that is symmetric about a vertical line.

normal distribution

A distribution used to display large data sets.

Grouped frequency distribution

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?

95

What percent of data will lie within 2 standard deviation of the mean?

95

What is the range of values within 3 SD below and above the mean in a normal distribution if the mean is 10 and the standard deviation is 2?

4-16

Lists the percent of data  in each distribution.

relative frequency distribution

The normal distribution with  a mean of 0 and standard deviation of 1.

Standard

The area of the standard normal curve to the right of z=0.82 is _______.

0.206
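
The right-tail area is one minus the cumulative probability up to z. A minimal sketch using the standard library:

```python
from statistics import NormalDist

z = 0.82
# Area to the right of z under the standard normal curve.
right_tail = 1 - NormalDist().cdf(z)

print(round(right_tail, 3))     # 0.206
```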

What is the value of the mean if a score of 110 is 3 standard deviations above the mean?

95

A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?

10

A graph used to indicate intervals in a frequency distribution is referred to as a ______________.

Histogram

What range of values lie between 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?

71-89

A bell-shaped distribution that is symmetric about a vertical line?

normal

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?

84

By the empirical rule for a normal distribution, ______% of the data lie within 1 standard deviation below and above the mean.

68

What is the value of the standard deviation in a standard normal distribution?

1

What is the value of the mean if a score of 110 is 3 standard deviations above the mean?

95

Which is NOT a value of r ?

1.02

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.71 lb?

about 75 (z ≈ 0.67)

The equation of the _______line predicts the value of Y given X.

Regression

What range of values lie between 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?

71-89

Data is NOT information unless we add_________.

analytics

The creation of a data product contains 3 components  EXCEPT

time

What increases data volume?

velocity

A negative correlation exists when___________.

x increases y decreases

The following are elements in an analytic plan EXCEPT

graphs

As of 2014, there are _______ million tweets a day.

500

The value of X in the regression equation Y= 1.24 X + 6.9 if Y=13.1 is

5

According to Hilary Mason, which is NOT a skill that a good data scientist must cultivate?

critical thinking

If there are 103 scores the median is equal to the _____ranked score.

52nd

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 and 0.91 lb in a shipment of 4500 tomatoes?

4275
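
The interval from 0.31 to 0.91 lb is the mean plus or minus 2 standard deviations, so the quiz answer follows from the rounded 95% rule. The sketch below compares that with the exact normal probability; the variable names are illustrative.

```python
from statistics import NormalDist

weights = NormalDist(mu=0.61, sigma=0.15)

# Exact share of tomatoes between 0.31 lb and 0.91 lb (mean +/- 2 SD).
share = weights.cdf(0.91) - weights.cdf(0.31)
print(round(share, 4))              # 0.9545

# Expected count using the rounded empirical-rule figure of 95%.
expected = round(0.95 * 4500)
print(expected)                     # 4275
```

With the exact share of about 95.45%, the expected count would be slightly higher, about 4295.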

It partitions a ranked data into four equal groups.

quartile

A perfect positive correlation coefficient is equal to

1

In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?

9.38
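
The regression-line questions can be checked by evaluating the equation directly. In this sketch the slope 1.24 and intercept 6.9 come from the quiz, while the function names are hypothetical.

```python
def predict_y(x, slope=1.24, intercept=6.9):
    """Value of Y on the regression line for a given X."""
    return slope * x + intercept

def solve_x(y, slope=1.24, intercept=6.9):
    """Value of X that gives Y on the regression line."""
    return (y - intercept) / slope

print(round(predict_y(2), 2))       # 9.38
print(round(solve_x(13.1), 2))      # 5.0
```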

It refers to the degree of relationship between two variables?

Correlation

Who said that “The future is not google-able”?

William Gibson

He is someone who asks interesting questions on formal and informal theory.

data scientist

The number that occurs most frequently is called________.

Mode

If the standard deviation of a distribution is 3.5, the variance is

12.25

The difference between the highest and lowest value.

range

It lists the percent of data in a distribution.

relative frequency distribution

The normal distribution with a mean of 0 and standard deviation of 1.

Standard

Which is NOT a correct correlation Coefficient?

1.2

A bell-shaped distribution that is symmetric about a vertical line?

normal

The middle-most value in a ranked list of numbers.

median

Which of the following is TRUE when a distribution is normal?

Mean=Median=Mode

It expands available data enormously.

text mining

A bell-shaped distribution that is symmetric about a vertical line.

normal

What percent of data will lie within 2 standard deviation of the mean?

95

Data involving two variables.

bivariate

The quantification of data into information.

Datafication

He coined the term  "data scientist"

DJ Patil

The major outcome of correlation.

prediction

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes?

5850 (97.5% by the empirical rule)

The method of correlation used for ranked score is ________.

Spearman rho

A data set in which every score occurs the same number of times is said to have

no mode

Example of a data product.

google map

A positive z-score means that the score is

Higher than the mean

A graph that is used to indicate frequency distribution.

histogram

The area of the standard normal curve to the right of z=0.82 is _______.

0.206

Which of the following is used as a method for Correlation?

Pearson r

A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?

10

The score NOT easily affected by extreme values.

Median

On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score?

48th
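
The percentile here is the percent of scores that fall below the given score. A minimal sketch of that calculation, with a hypothetical helper name:

```python
def percentile_rank(scores_below, total_scores):
    """Percent of all scores that fall below the score of interest."""
    return 100 * scores_below / total_scores

rank = percentile_rank(480, 1000)
print(rank)                         # 48.0
```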

The classification table that XLSTAT can display.

confusion matrix

Which pair belongs to the same family of models called GLM ?

Logistic and linear regression

Classification table is also called ________

confusion matrix

It corresponds to the case where the dependent variable has more than 2 categories.

multinomial logit model

It displays the performance of a model and enables a comparison to be made with other models.

ROC

He proposed the use of a penalized likelihood function.

Firth

Which belong to the GLM  family?

logistic and linear

The proportion of well-classified positive events.

sensitivity

What does GLM mean?

Generalized Linear Model

The method that does NOT require the assumption that the parameters are normally distributed.

profile likelihood

To estimate the parameters of the model, the ________ function is maximized.

likelihood

The most common functions used to link probability to the explanatory variables are the LOGIT model and ________model.

PROBIT

The method used to iteratively find a solution to a multinomial logit model.

Newton-Raphson algorithm

SBC means_________

Schwarz’s Bayesian Criterion

Displays the performance of a model and enables a comparison to be made with other models.

ROC curve

What does ROC mean?

Receiver Operating Characteristics

A frequently used method as it enables binary or polytomous variables to be modelled.

logistic regression

It does NOT require the assumption that the parameters are normally distributed.

profile likelihood

The proportion of a well-classified negative event.

specificity
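
Sensitivity and specificity are both read off the confusion matrix. The sketch below uses made-up counts purely for illustration; the function names are hypothetical and do not come from XLSTAT.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of actual positive events that were classified as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of actual negative events that were classified as negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical confusion-matrix counts (not taken from the quiz).
tp, fn, tn, fp = 40, 10, 45, 5

print(sensitivity(tp, fn))          # 0.8
print(specificity(tn, fp))          # 0.9
```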

ROC comes from ______theory.

signal detection

Which is NOT a KR technology?

roles

It is used to enable an entity to determine consequences by thinking rather than acting.

Knowledge Representation

The following are distinct roles that KR plays EXCEPT

Medium for pragmatically diligent interpretation

Any way to get new expressions from old ones.

inference

A network purporting to describe family memberships.

network topology

KR means __________________________.

Knowledge Representation

It is a variety of formal calculation, typically deduction.

Intelligent Reasoning

The following are abstract notions EXCEPT

casualty

It views the world in terms of attribute-object-value triples.

rule based

The following provided inspirations of what constitutes intelligent reasoning EXCEPT

Sociology

It involves a commitment in viewing the world in terms of individual entities and relations.

logic

KR is a set of __________commitments.

ontological

It sees a set of prototypes, in particular prototypical diseases, to be matched against the case at hand.

INTERNIST

Which is NOT a basic representation technology?

graph

All representations are ________.

imperfect

Which is NOT a component of KR?

it adheres to the function

KR as a _________is a substitute for the thing itself

surrogate

It views the world in terms of prototypical objects.

frame

It is a language in which we say things about the world.

Medium of human expression

It is a process that goes on internally, while most things it wishes to reason about exist only externally.

reasoning

Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?

Probability density

It involves a commitment in viewing the world in terms of individual entities and relations between them.

logic

ROC means

Receiver Operating Characteristics

Which of the following is a discrete distribution?

Hypergeometric

Any way to get new expressions from old ones

inference

The _______value is the weighted average of the value the random variable may assume

Expected

Which is an example of a discrete random variable?

number of books

The following are discrete distributions EXCEPT

chi-square

The classification table that XLSTAT can display

confusion matrix

The most common functions used to link probability to the explanatory variables are the LOGIT model and ________model.

PROBIT

Which of the following is a continuous distribution?

Chi-square

It is a numerical function of the outcome of a statistical experiment.

random variable

The most commonly used continuous probability distribution

normal

It is a language in which we say things about the world.

Medium of human expression

Which is NOT a basic representation technology?

graph

Two of the most widely used discrete probability distribution

Poisson and binomial

It does NOT require the assumption that the parameters are normally distributed

profile likelihood

The integral of a probability density function over all values of the random variable is equal to ______.

One

The following provided inspirations of what constitute intelligent reasoning EXCEPT

philosophy

Which is NOT a KR technology?

roles

The following are distinct roles that KR plays EXCEPT

Medium for pragmatically diligent interpretation

What is KR?

Knowledge Representation

The most common function used to link probability to explanatory variables

logit model

It does NOT require the assumption that the parameters are normally distributed

profile likelihood

A model that corresponds to the case where the dependent variable has more than two categories.

multinomial logit model

The following are distinct roles that KR plays EXCEPT

Medium for pragmatically diligent interpretation

Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?

Probability density

The following are continuous distributions EXCEPT

geometric

To estimate the parameters of the model, the ________ function is maximized.

Likelihood

It provides the height or the value of the function at any particular value of x

probability density function

It sees the medical world as made of empirical associations connecting symptoms to diseases.

MYCIN

Any way to get new expressions from old ones

inference

It refers to a frequently used method as it enables binary or polytomous variables to be modelled.

logistic regression

KR as a _________is a substitute for the thing itself

surrogate

It is used to enable an entity to determine consequences by thinking rather than acting.

KR

A network purporting to describe family memberships

network topology

The most widely used continuous probability distribution

Normal

What is the value of the mean and standard deviation in a normal probability density function

mean = 50, s = 5

SBC means_________

Schwarz’s Bayesian Criterion

The most widely used continuous probability distribution

Normal

It views the world in terms of attribute-object-value triples

rule-based

Which of the following is a discrete distribution?

Hypergeometric

It is a variety of formal calculation, typically deduction

Intelligent Reasoning

Which of the following is a continuous distribution?

Chi-square

It corresponds to the case where the dependent variable has more than 2 categories.

multinomial logit model

Which is NOT a component of KR?

it adheres to the function

It is often used as a model of the number of arrivals at a facility in a given period of time.

Poisson probability distribution
