GOPAL KRISHAN


Data Warehousing and data mining

1-Mark Questions

Qs 1. Data ------------- is concerned with finding hidden relationships in business data so that businesses can make predictions for future use.

warehousing
mining
extraction
hiding

Qs 2. The whole logic of data mining is based on modeling.

true
false

Qs 3. Data in data processing is in different formats -----------

Operational / Transactional data
Non-Operational data
Information and Knowledge
all of the above

Qs 4. Data warehousing is defined as a process of centralized data management and retrieval.

True
false

Qs 5. KDD stands for -----------

Knowledge discovery in databases
known discovered databases
both of the above
none of the above

Qs 6. ------------------ is a technology that is used to create decision support software. OLAP and data mining are used to solve different kinds of analytic problems.

OLAP (Online Analytical Processing)
OLTP (Online Transaction Processing)
KDD
Data mining

Qs 7. Banking, insurance, credit marketing, telecommunications, pharmaceuticals and bioinformatics are among the industries in which data mining is applied.

Correct
Incorrect

Qs 8. ----------- provides software called Darwin, which is a data mining tool. It incorporates cluster analysis, classification, prediction and association rules.

Intelligent Miner (IBM Corp.)
Weka 3
Oracle 10g
Enterprise Miner (SAS Institute Inc.)

Qs 9. The construction of a data warehouse, which involves -------------, can be viewed as an important preprocessing step for data mining.

Data cleaning
Data integration
Data hiding
Both 1 and 2

Qs 10. Data warehousing provides an interesting alternative to the traditional approach of ------------- database integration.

Homogeneous
Heterogeneous
all of the above
none of the above

Qs 11. -------------------- approach requires complex information filtering and integration processes, and competes for resources with processing at local sources.

Wrappers
Integrators
Update driven
Query driven

Qs 12. Data warehouse and OLAP tools are not based on a multidimensional data model.

true
false

Qs 13. ---------- contains language primitives for defining data warehouses and data marts, as well as primitives for specifying other data mining tasks such as the mining of concept/class descriptions, associations, classifications and so on.

SQL
DMQL
database language
all of the above

Qs 14. The top-down view, data source view, data warehouse view and business query view are the views considered during the -------- phase of a data warehouse.

Analysis
Testing
Design
construction

Qs 15. A --------- contains a subset of corporate-wide data that is of value to a specific group of users.

Data Mart
Data Warehouse
Data mining
all of the above

Qs 16. A virtual warehouse is a set of views over operational databases

true
false

Qs 17. Aggregated data can be stored in fact tables referred to as -----------

fact table
Dimension table
Summary fact table
Summary table

Qs 18. Multidimensional analysis software is also known as --------

OLAP
OLTP
KDD
MOLAP

Qs 19. ---------- is information about a company’s past performance that is used to help predict the company’s future performance.

Artificial intelligence (AI)
Business intelligence (BI)
Logical intelligence
none of the above

Qs 20. Today’s real-world databases are highly susceptible to -------- due to their typically huge size, often several gigabytes or more.

noisy data
inconsistent data
missing data
all of the above

Qs 21. Buckets and bins are not interchangeable terms.

correct
Incorrect

Qs 22. Binning, clustering and regression techniques work to remove noise from the data during the data transformation step of -----------

Normalization
Smoothing
Aggregation
Generalization
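
Worked illustration for Qs 22 (not part of the original question set): a minimal sketch of how binning can remove noise, assuming equal-frequency bins and replacement of each value by its bin mean. The function name `smooth_by_bin_means` and the sample values are made up for the example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        # Every member of the bin is replaced by the same bin mean.
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical sample data
print(smooth_by_bin_means(prices, 3))
```

With three values per bin, the nine prices collapse to three distinct values (one per bin), which is the noise-reducing effect the question refers to.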

Qs 23. ---------- techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume yet closely maintains the integrity of the original data.

Normalization
Data Reduction
Smoothing
Aggregation

Qs 24. Sampling can be used as a data ---------- technique.

Normalization
Creation
Reduction
Mining

Qs 25. A majority of Data Mining systems do not use any DBMS and have their own memory and storage management.

true
false

Qs 26. Association, classification, regression, clustering and neural networks are all data ---------- techniques.

Normalization
Creation
Reduction
Mining

Qs 27. MFCS stands for

a. Maximum frequent candidate set

b. Minimal frequent candidate set

c. None of the above

d. All of the above

Qs 28. MDL stands for

maximum description length
minimum description length
mean described length
minimum described length

Qs 29. The post-pruning approach removes branches from a ‘fully grown’ tree.

1. true

2. false

Qs 30. Classification and prediction are two forms of

1. data analysis

2. decision tree

3. both 1 and 2

4. none of these

Qs 31. A decision tree is based on

1. bottom-down technique

2. top-down technique

3. divide-and-conquer manner

4. top-down recursive divide-and-conquer manner

Qs 32. PAM stands for

a. prototype above medoids

b. prototype around means

c. partitioning around medoids

d. partitioning above means

Qs 33. A user session is a --------- record spanning the entire web.

Qs 34. Web data is ----------

1. structured data

2. unstructured data

3. text data

4. binary data

Qs 35. The technique that examines how users navigate and access a site is -------

1. web structure mining

2. web usage mining

3. web content mining

4. web data definition mining

Qs 36. E-banking, search engines, online auctions and web advertisement are a few applications of -----------------

1. web structure mining

2. web usage mining

3. web content mining

4. web data definition mining

Qs 37. BO (Bookmark Organizer) combines hierarchical clustering techniques and user interaction to organize a collection of web documents based on conceptual information.

1. true

2. false

Qs 38. An e-commerce site is defined as any website offering ------

1. pre-sale support

2. products for sale

3. after sales service and backup

4. all of the above

Qs 39. --------- is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents such as the World Wide Web.

1. web agent

2. log file

3. page rank

4. user profile
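
Worked illustration for Qs 39 (not part of the original question set): a minimal sketch of a link analysis computation that assigns a numerical weight to every page in a small hyperlinked set. It assumes the usual power-iteration formulation with a damping factor of 0.85; the three-page graph, the function name and the parameter names are hypothetical, and every link target is assumed to appear as a key in `links`.

```python
def link_weights(links, damping=0.85, iterations=50):
    """Iteratively redistribute each page's weight over its outgoing links.

    `links` maps a page name to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its weight evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical link structure
for page, weight in sorted(link_weights(graph).items()):
    print(page, round(weight, 3))
```

In this toy graph page C collects links from both A and B, so it ends up with the largest weight, and the weights of all pages sum to 1.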

Qs 40. A ---------- is a simple text file that is automatically generated every time someone accesses a website.

1. web agent

2. log file

3. page rank

4. user profile

2-Mark Questions

Qs 1. ----------- may be detected by clustering, where similar values are organized into groups or “clusters”. Intuitively, values that fall outside of the set of clusters may be considered -----------.

1. Clusters, Bins

2. Groups, Buckets

3. Outliers, Outliers

4. all of the above

Qs 2. StarProbe is a web-based multi-user -------- available for academic institutions. ---------- provides a set of partitional clustering algorithms that treat the clustering problem as an optimization process.

1. Client, SOM

2. Server , CLUTO

3. Client, CLUTO

4. Server, ESOM

Qs 3. ESOM stands for --------- and MML stands for ----------

1. Emergent self-organizing Maps, Minimum Message Length

2. Emerging self operating measure, maximum message Last

3. Emitted self organizing measure, Maximum Minimum length

4. none of the above

Qs 4. K-means, hierarchical, agglomerative and divisive are four methods of -------, and -------- is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.

1. classification , K-means

2. Prediction , K-means

3. clustering, K-means

4. all are correct
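
Worked illustration for Qs 4 (not part of the original question set): a minimal sketch of the K-means procedure, assuming squared Euclidean distance, caller-supplied starting centroids and a fixed number of iterations. The sample points and the function name are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]  # hypothetical sample data
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (5.0, 7.0)])
print(centroids)
print(clusters)
```

No class labels are supplied at any point, which is why the method counts as unsupervised: the groups emerge only from the distances between the points.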

Qs 5. Clustering may also be considered as ------------ and clustering is also called --------

1. segmentation, partitions with similar objects

2. classification, segmentation

3. prediction , compression

4. segmentation, all of the above

Qs 6. (i) Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules.

(ii) Multidimensional association rules with no repeated predicates are called interdimension association rules.

1. (i)true, (ii)false

2. (i)true, (ii)True

3. (i)false, (ii)false

4. (i)false, (ii)true

Qs 7. (i) Classification and prediction are two forms of

1. Data analysis

2. Decision tree

3. Both 1 and 2

4. None of these

(ii) Classification predicts

a. Categorical labels

b. Continuous-valued functions

c. Both a and b

d. None of these

Qs 8. Decision tree is based on

(I) 1. Bottom-down technique

2. Top-down technique

3. Divide-and-conquer manner

4. Top-down recursive divide-and-conquer manner

(II). Recursive Partitioning stops in Decision Tree when

1. All samples for a given node belong to same class.

2. There are no remaining attributes on which samples may be further partitioned.

3. There are no samples for the branch test.

4. All the above.

Qs 9. --------- works to remove noise from the data and includes techniques like binning, clustering and regression. The ------- technique uses encoding mechanisms to reduce the data set size.

1. clustering , data reduction

2. smoothing, data compression

3. classification, data processing

4. binning, data reduction

Qs 10. OLTP and OLAP expand as

1. On-line transaction processing, on-line analytical processing

2. On-line temporary processing, on-line analytical processing

3. On-line transaction processing, on-line accurate processing

4. On-line time processing, on-line analytical processing

Qs 11. The data warehouse view includes fact tables and -------- tables. The business query view is the perspective of data in the data warehouse from the viewpoint of the -------

1. Fact , programmer

2. Dimension , developer

3. Fact, end-user

4. all are correct

Qs 12. The ----- performs a structured and systematic analysis at each step before proceeding to the next, which is like a waterfall, falling from one step to the next. The --------- involves the rapid generation of increasingly functional systems, with short intervals between successive releases.

1. Waterfall method, spiral method

2. Spiral method, waterfall method

3. prototype model, spiral method

4. Linear method, spiral method.

Qs 13. The bottom tier is a _________ database server that is almost always a relational database system. Data warehouse and _______ tools are based on a multidimensional data model.

1. Warehouse, OLAP

2. OLAP, ROLAP

3. ROLAP,OLTP

4. MOLAP, None of the above

Qs 14. Noise is random error or variance in a measured variable (true or false). SRSWR stands for ----------

1. True ,Simple random sample with replacement

2. False, Simple random sample without replacement
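
Worked illustration for Qs 14 (not part of the original question set): a minimal sketch contrasting simple random sampling with and without replacement, using only Python's standard `random` module. The helper names `srswr` and `srswor` and the `seed` parameter are assumptions made for the example.

```python
import random


def srswr(records, s, seed=None):
    """Draw s records; each drawn record is put back, so repeats are possible."""
    rng = random.Random(seed)
    return rng.choices(records, k=s)


def srswor(records, s, seed=None):
    """Draw s records; a record can be drawn at most once (requires s <= len(records))."""
    rng = random.Random(seed)
    return rng.sample(records, s)


records = list(range(1, 11))  # hypothetical record identifiers
print(srswr(records, 5, seed=42))
print(srswor(records, 5, seed=42))
```

Because records are replaced after each draw, the with-replacement variant can even return more samples than there are records, while the without-replacement variant cannot.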

Qs 17. The data compression technique uses encoding mechanisms to ______ the data set size. To deal with larger data sets, a sampling-based method called _____________ can be used.

1. Reduce, CLARA

2. Increase, Dara

3. Equal, PAM

4. None, None of the above

Qs 18. (i) A majority of data mining systems do not use any DBMS and have their own memory and storage management.

(ii)Data mining supports automatic data exploration.

1. (i)True (ii) False

2. (i)True (ii) true

3. (i)false (ii) False

4. (i)False (ii) true

Qs 19. Neural networks, classification, regression, clustering and association are data -------- techniques. -------- makes use of existing variables in the database in order to predict unknown or future values of interest.

1. Mining, Prediction

2. Warehousing , prediction

3. Mining, description

4. Warehousing, deduction

Qs 20. (i) Data constraints specify the set of task relevant data

(ii)Rule constraints specify the form of rules to be mined.

(i)True (ii) False
(i)True (ii) true
(i)false (ii) False
(i)False (ii) true

4-Mark Questions

Qs 1. The entity-relationship (ER) data model is commonly used in the design of -----------, where a database -------- consists of a set of entities and the relationships between them. The ER data model is appropriate for ------- processing. A -------- requires a concise, subject-oriented schema that facilitates on-line data analysis.

Relational databases, schema, on-line transaction, data warehouse
Hierarchical databases, schema, on-line transaction, data mining
Hierarchical databases, schema, real-time transaction, data mining
Relational databases, schema, on-line transaction, data classification

Qs 2. (i) Data warehouse and OLAP tools are not based on a multidimensional data model.

(ii) The data source view exposes the information being captured, stored and managed by operational systems.

(iii) Relational OLAP servers are the intermediate servers that stand in between a relational back-end server and client front-end tools.

(iv) A virtual warehouse is a set of views over operational databases.

(i)True (ii)True(iii)True(iv)True
(i)false (ii)True(iii)True(iv)True
(i)True (ii)True(iii)True(iv)false
(i)True (ii)True(iii)False(iv)false

Qs 3. ANN, FP-tree, OLTP and OLAP expand as

Articraft neural network, Frequent pattern tree, On-line temporary processing, on-line analytical processing
Artificial neural network, Frequent pattern tree, On-line transaction processing, on-line analytical processing
Artistic neural network, Frequent pattern tree, On-line temporary processing, on-line analytical processing
Articraft neural network, Frequent pattern tree, On-line temporary processing, on-line analytical processing

Qs 4. ----------- specify the type of knowledge to be mined. Data constraints specify the set of ---------. Dimension constraints specify the dimensions of the ---------- and rule constraints specify the form of ------ to be mined.

knowledge type constraints, time-related data, information, rule
knowledge type constraints, time-related data, information, rule
knowledge type constraints, time-related data, data, interestingness
knowledge type constraints, task-related data, data, rule

Qs 5. K-means, agglomerative and hierarchical are methods of -------. Single-link clustering is also called the -------- method, and complete-link clustering is also called the --------- method. ---------- is used for data mining.

classification, connectedness, diameter, data warehouse
clustering, connectedness, area, clustering
clustering, connectedness, diameter, clustering
clustering, isolated, diameter, data mart

Qs 6. (i) Clustering may also be considered as segmentation.

(ii) Segmentation, compression, and partitions with similar objects are all not clustering methods.

(iii) Clustering is not used only in data mining.

(iv) Supervised learning is represented in the form of clustering.

(i)True (ii)True(iii)True(iv)True
(i)false (ii)True(iii)True(iv)True
(i)True (ii)True(iii)True(iv)false
(i)True (ii)false(iii)False(iv)true

Qs 7. Web content mining, web structure mining and web usage mining all come under ------. A --------- is a simple text file that is automatically generated every time someone accesses a website. --------- is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents such as the World Wide Web. A ---------, a software agent, is a computer program which runs on an agent interaction machine.

web mining, log file, page rank , web agent
web warehousing, data file, page rank , web agent
web mining, log file, user profile , web agent
web mining, log file, page rank , web mining

Qs 8. The ---------- data quality solution provides an enterprise solution for profiling, cleansing, augmenting and integrating data to create consistent, reliable -------. With the SAS data quality solution you can automatically incorporate data quality into data integration and ----------- projects to dramatically improve returns on your organization’s ----------- initiatives.

SAS, information, business intelligence, strategies
GNU, information, business intelligence, strategies
SAS, decisions, business intelligence, rules
GNU, data, business intelligence, policies

Qs 9. Weka is a collection of machine learning algorithms for ------- tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules and -----------. It is well suited for developing new machine learning ------.

data warehousing, imagination, rules
data mining, visualization, schemes
data mining, calculations, strategies
data mart, visualization, schemes

Qs 10. Web log analysis has been the foundation of ---------- on the web. --------- involves uniquely identifying users. A lot of work has been done in information retrieval, databases, intelligent agents and topology, which provides a sound foundation for ------------. Web mining is the application of -----------.

data visualization, data mining, data mart creation, data mining
data mining, visualization, schemes, data warehousing
data warehousing, web mining, web content mining, data mining
E-commerce, web mining, content search, data warehouse

Qs 11. (i) A user session is a click stream record spanning the entire web.

(ii) Web structure describes how a page is used: the date and time it was accessed, the IP address of the browser and the page references.

(iii) Web log files are frequently used in sequential mining.

(iv) Structural mining is used to examine the structure of a particular website and to collate and analyze related data.

(i)True (ii)True(iii)True(iv)True
(i)false (ii)True(iii)True(iv)True
(i)True (ii)True(iii)True(iv)false
(i)True (ii)True(iii)False(iv)false

Qs 12. EOS, KDD, GDP and PRIM expand as -------

Early observation system, Knowledge discovery in databases, Grand domestic product, patient rule induction method
Earth observation system, Knowledge discovery in databases, Gross domestic product, periodic rule induction method
Earth observation system, Knowledge discovery in databases, Gross domestic product, patient rule induction method
Easy observation system, Knowledge discovery in databases, Grand domestic product, patient rule induction method

Qs 13. Insurance and direct mail are two industries that rely on -------- to make profitable business decisions. To aid decision making, analysts construct ----- models using warehouse data to predict the outcomes of a variety of decision alternatives. A -------- profile is a model that predicts the future purchasing behaviour of an individual customer, given historical transaction data for both the individual and the larger population of all of a particular company’s customers. It is often beneficial to ------- data into a smaller number of points, easing computational requirements and reducing the amount of noise.

data analysis, predictive, predictive, aggregate
alternative analysis, predictive, predictive, aggregate
data analysis, classification, predictive, noisy
cluster analysis, predictive, predictive, aggregate

Qs 14. WIS, DRG, MBA and HOLAP stand for

weight item sets, Diagnosis related group, Mean basket Analysis, Hybrid OLAP
weighted item sets, Dialogue related group, Mark basket Analysis, Hybrid OLAP
weighted item sets, Diagnosis related group, Market basket Analysis, Hierarchical OLAP
weighted item sets, Diagnosis related group, Market basket Analysis, Hybrid OLAP

Qs 15. Data stored in most text databases are ----------. Text databases are also called ----------. ---------- is the first step in a text retrieval system. Precision, recall and F-score are all measures of text ----------.

sequence structured, document databases, Tokenization, retrieval
semi structured, relational databases, Tokenization, processing
semi structured, document databases, Tokenization, retrieval
structured, document databases, Tokenization, formatting
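
Worked illustration for Qs 15 (not part of the original question set): a minimal sketch of how precision, recall and F-score can be computed for a single query, assuming the balanced F-score (the harmonic mean of precision and recall). The document identifiers and the function name are hypothetical.

```python
def precision_recall_fscore(relevant, retrieved):
    """Compare the documents a system returned with the truly relevant ones."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


relevant_docs = {"d1", "d2", "d3", "d4"}   # hypothetical ground truth
retrieved_docs = {"d2", "d3", "d5"}        # hypothetical system output
print(precision_recall_fscore(relevant_docs, retrieved_docs))
```

Here two of the three returned documents are relevant (precision 2/3) and two of the four relevant documents were found (recall 1/2), giving an F-score of 4/7.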