Data Warehousing and Data Mining

 

  

1-Marks Questions

 

Qs 1. Data ------------- is concerned with finding hidden relationships in business data so that businesses can make predictions for future use.

 

  1. warehousing
  2. mining
  3. extraction
  4. hiding

 

Qs 2. The whole logic of data mining is based on modeling.

 

  1. true
  2. false

 

Qs 3. In data processing, data comes in different forms: -----------

  1. Operational / Transactional data
  2. Non-Operational data
  3. Information and Knowledge
  4. all of the above

 

Qs 4. Data warehousing is defined as a process of centralized data management and retrieval.

 

  1. True
  2. False

 

Qs 5. KDD stands for -----------

 

  1. Knowledge discovery in databases
  2. known discovered databases
  3. both of the above
  4. none of the above

 

Qs 6. ------------------ is a technology that is used to create decision support software. OLAP and data mining are used to solve different kinds of analytic problems.

 

  1. OLAP (Online Analytical Processing)
  2. OLTP (Online Transaction Processing)
  3. KDD
  4. Data mining

 

Qs 7. Banking, insurance, credit marketing, telecommunications, pharmaceuticals and bioinformatics are among the industries in which data mining is used.

 

  1. Correct
  2. Incorrect

 

Qs 8. ----------- provides software called Darwin, which is a data mining tool. It incorporates cluster analysis, classification, prediction and association rules.

 

  1. Intelligent miner (IBM Corp)
  2. Weak 3-A
  3. Oracle 10g
  4. Enterprise Miner (SAS Institute inc.)

 

Qs 9. The construction of a data warehouse, which involves -------------, can be viewed as an important preprocessing step for data mining.

 

  1. Data cleaning
  2. Data integration
  3. Data hiding
  4. Both 1 and 2

 

Qs 10. Data warehousing provides an interesting alternative to the traditional approach of ------------- database integration.

  1. Homogeneous
  2. Heterogeneous
  3. all of the above
  4. none of the above

 

Qs 11. -------------------- approach requires complex information filtering and integration processes, and competes for resources with processing at local sources.

 

  1. Wrappers
  2. Integrators
  3. Update driven
  4. Query driven

 

Qs 12. Data warehouse and OLAP tools are not based on a multidimensional data model.

 

  1. true
  2. false

 

Qs 13. ---------- contains language primitives for defining data warehouses and data marts, as well as language primitives for specifying other data mining tasks such as mining of concept/class descriptions, associations, classifications and so on.

 

  1. SQL
  2. DMQL
  3. database language
  4. all of the above

 

Qs 14. The top-down view, data source view, data warehouse view and business query view are the views considered during the -------- phase of a data warehouse.

  1. Analysis
  2. Testing
  3. Design
  4. Construction

Qs 15. A --------- contains a subset of corporate-wide data that is of value to a specific group of users.

 

  1. Data Mart
  2. Data Warehouse
  3. Data mining
  4. all of the above

 

Qs 16. A virtual warehouse is a set of views over operational databases

 

  1. true
  2. false

 

Qs 17. Aggregated data can be stored in fact tables referred to as -----------

 

  1. fact table
  2. Dimension table
  3. Summary fact table
  4. Summary table

 

Qs 18. Multidimensional analysis software is also known as --------

  1. OLAP
  2. OLTP
  3. KDD
  4. MOLAP

 

Qs 19. ---------- is information about a company’s past performance that is used to help predict the company’s future performance.

 

  1. Artificial intelligence (AI)
  2. Business intelligence (BI)
  3. Logical intelligence
  4. none of the above

Qs 20. Today’s real-world databases are highly susceptible to -------- due to their typically huge size, often several gigabytes or more.

 

  1. Noisy data
  2. Inconsistent data
  3. Missing data
  4. All of the above

 

Qs 21. Buckets and bins are not interchangeable terms.

 

  1. correct
  2. Incorrect

 

 

Qs 22. Binning, clustering and regression techniques work to remove noise from the data during the data transformation step called -----------

 

  1. Normalization
  2. Smoothing
  3. Aggregation
  4. Generalization
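
Note: binning, as named in Qs 22, can be illustrated with a short example. This is only a minimal sketch, assuming equal-depth (equal-frequency) bins in which every value is replaced by the mean of its bin. The function name `replace_with_bin_means` and the sample prices are hypothetical and are not part of the question bank.

```python
def replace_with_bin_means(values, bin_size):
    """Sort the values, split them into equal-depth bins, and
    replace every value with the mean of the bin it falls in."""
    ordered = sorted(values)
    result = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        result.extend([mean] * len(bin_values))
    return result


# Hypothetical sample data, chosen only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(replace_with_bin_means(prices, 3))
```

Other binning variants (for example, replacing values by bin medians or bin boundaries) follow the same pattern with a different replacement rule.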

 

Qs 23. ---------- techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume yet closely maintains the integrity of the original data.

 

  1. Normalization
  2. Data Reduction
  3. Smoothing
  4. Aggregation

 

Qs 24. Sampling can be used as a data ---------- technique.

 

  1. Normalization
  2. Creation
  3. Reduction
  4. Mining
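
Note: the sampling mentioned in Qs 24 can be done with or without replacement. The sketch below is an illustration only, assuming Python's standard `random` module; the helper names, the `seed` parameter and the sample records are hypothetical.

```python
import random


def sample_without_replacement(data, size, seed=None):
    """Draw `size` items; an item that has been drawn cannot be drawn again."""
    return random.Random(seed).sample(data, size)


def sample_with_replacement(data, size, seed=None):
    """Draw `size` items; every draw is made from the full data set,
    so the same item may appear more than once."""
    return random.Random(seed).choices(data, k=size)


# Hypothetical records, chosen only for illustration.
records = list(range(10))
print(sample_without_replacement(records, 5, seed=0))
print(sample_with_replacement(records, 15, seed=0))
```

Sampling with replacement can return more items than the data set holds, which is why the second call may ask for 15 items out of 10.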

 

Qs 25. A majority of Data Mining systems do not use any DBMS and have their own memory and storage management.

 

  1. true
  2. false

 

Qs 26. Association, classification, regression, clustering and neural networks are all data ---------- techniques.

 

  1. Normalization
  2. Creation
  3. Reduction
  4. Mining

 

Qs 27. MFCS stands for

a.      Maximum frequent candidate set

b.      Minimal frequent candidate set

c.      None of the above

d.      All of the above

 

Qs 28. MDL stands for

 

  1. maximum description length
  2. minimum description length
  3. mean described length
  4. minimum described length

 

Qs 29. The post-pruning approach removes branches from a ‘fully grown’ tree.

1.      true

2.      false

Qs 30. Classification and prediction are two forms of

1.      data analysis

2.      decision tree

3.      both 1 and 2

4.      none of these

Qs 31. Decision tree induction is based on a

    a. bottom-down technique

    b. top-down technique

    c. divide-and-conquer manner

    d. top-down recursive divide-and-conquer manner

Qs 32. PAM stands for

a. prototype above medoids

b. prototype around means

c. partitioning around medoids

d. partitioning above means

Qs 33. A user session is a --------- record spanning the entire web.

Qs 34. Web data is ----------

1.      structured data

2.      unstructured data

3.      text data

4.      binary data

Qs 35. The technique that studies how users navigate and access a site is -------

1.      web structure mining

2.      web usage mining

3.      web content mining

4.      web data definition mining

Qs 36. E-banking, search engines, online auctions and web advertisement are a few applications of -----------------

1. web structure mining

2. web usage mining

3. web content mining

4. web data definition mining

Qs 37. BO (Bookmark Organizer) combines hierarchical clustering techniques and user interaction to organize a collection of web documents based on conceptual information.

1. true

2. false

Qs 38. An e-commerce site is defined as any web site offering ------

1.      pre-sale support

2.      products for sale

3.      after sales service and backup

4.      all of the above

Qs 39. --------- is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents such as the World Wide Web.

1.      web agent

2.      log file

3.      page rank

4.      user profile
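
Note: the kind of link-based weighting described in Qs 39 can be illustrated with a small power iteration. This is a hedged sketch under stated assumptions, not a reference implementation: the function name `link_scores`, the damping value 0.85, the fixed iteration count and the three-page graph are all assumptions made for the example, and every linked page is assumed to appear as a key of the dictionary.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Power iteration: in every round each page shares its current
    score equally among the pages it links to."""
    pages = sorted(links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts the round with a small base score.
        updated = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Assumption: a page without outgoing links spreads its score evenly.
            targets = links[page] or pages
            share = damping * scores[page] / len(targets)
            for target in targets:
                updated[target] += share
        scores = updated
    return scores


# Hypothetical three-page web: A and C link to B, and B links back to A.
graph = {"A": ["B"], "B": ["A"], "C": ["B"]}
weights = link_scores(graph)
print(max(weights, key=weights.get))
```

Pages that receive links from highly weighted pages end up with higher weights themselves, while a page that nobody links to keeps only the base score.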

Qs 40. ---------- is a simple text file that is automatically generated every time someone accesses a website.

1.      web agent

2.      log file

3.      page rank

4.      user profile

 

 

 

 

 

2-Marks Questions

 

 

Qs 1. ----------- may be detected by clustering, where similar values are organized into groups or “clusters”. Intuitively, values that fall outside of the set of clusters may be considered -----------.

  1. clusters, Bins
  2. Groups, Buckets
  3. Outliers, Outliers
  4. all of the above

Qs 2. StarProbe is a web-based multi-user -------- available for academic institutions. ---------- provides a set of partitional clustering algorithms that treat the clustering problem as an optimization process.

1. Client, SOM

2. Server , CLUTO

3. Client, CLUTO

4. Server, ESOM

 

Qs 3. ESOM stands for --------- and MML stands for ----------

1. Emergent self-organizing Maps, Minimum Message Length

2. Emerging self operating measure, maximum message Last

3. Emitted self organizing measure, Maximum Minimum length

4. none of the above

 

Qs 4. K-means, hierarchical, agglomerative and divisive are four methods of -------. -------- is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.

1. classification , K-means

2. Prediction , K-means

3. clustering, K-means

4. all are correct
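
Note: K-means, named in Qs 4, can be sketched in a few lines. This is only an illustration under stated assumptions: points are 2-D tuples, the first k points are used as the starting centroids, and the loop runs for a fixed number of rounds. The function name `k_means` and the sample points are hypothetical.

```python
def k_means(points, k, iterations=10):
    """Repeat two steps: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            groups[distances.index(min(distances))].append((x, y))
        for i, members in enumerate(groups):
            if members:  # an empty group keeps its previous centroid
                centroids[i] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, groups


# Hypothetical data: two visibly separate sets of points.
data = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centers, members = k_means(data, k=2)
print(centers)
```

The result depends on the starting centroids, which is why practical implementations usually try several random starts.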

Qs 5. Clustering may also be considered as ------------ and clustering is also called --------

1.       segmentation, partitions with similar objects

2.       classification, segmentation

3.       prediction , compression

4.       segmentation, all of the above

Qs 6. (i) Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules.

(ii) Multidimensional association rules with no repeated predicates are called interdimension association rules.

1. (i)true, (ii)false

2. (i)true, (ii)True

3. (i)false, (ii)false

4. (i)false, (ii)true

Qs 7. (i) Classification and prediction are two forms of

1.      Data analysis

2.      Decision tree

3.      Both 1 and 2

4.      None of these

(ii) Classification predicts

a.      Categorical labels

b.      Continuous-valued functions

c.      Both a and b

d.      None of these

Qs 8. (i) Decision tree induction is based on a

    1. Bottom-down technique

    2. Top-down technique

    3. Divide-and-conquer manner

    4. Top-down recursive divide-and-conquer manner

 

(ii) Recursive partitioning in a decision tree stops when

    1. All samples for a given node belong to the same class.

    2. There are no remaining attributes on which samples may be further partitioned.

    3. There are no samples for the branch test.

    4. All of the above.

Qs 9. --------- works to remove noise from the data and includes techniques such as binning, clustering and regression. The ------- technique uses encoding mechanisms to reduce the data set size.

1.      clustering , data reduction

2.      smoothing, data compression

3.      classification, data processing

4.      binning, data reduction

 

 

Qs 10. OLTP and OLAP expand as

1.               On-line transaction processing, on-line analytical processing

2.               On-line temporary processing, on-line analytical processing

3.               On-line transaction processing, on-line accurate processing

4.               On-line time processing, on-line analytical processing

Qs 11. The data warehouse view includes fact tables and -------- tables. The business query view is the perspective of data in the data warehouse from the viewpoint of the -------

1.       Fact , programmer

2.       Dimension , developer

3.       Fact, end-user

4.       all are correct

 

Qs 12. The ----- performs a structured and systematic analysis at each step before proceeding to the next, which is like a waterfall, falling from one step to the next. The --------- involves the rapid generation of increasingly functional systems, with short intervals between successive releases.

 

1. Waterfall method, spiral method

2. Spiral method, waterfall method

3. prototype model, spiral method

4. Linear method, spiral method.

 

Qs 13. The bottom tier is a _________ database server that is almost always a relational database system. Data warehouse and _______ tools are based on a multidimensional data model.

 

1.       Warehouse, OLAP

2.       OLAP, ROLAP

3.       ROLAP,OLTP

4.       MOLAP, None of the above

 

Qs 14. Noise is a random error or variance in a measured variable. SRSWR stands for

 

 

1.       True, Simple random sample with replacement

2.       False, Simple random sample without replacement

 

Qs 17. The data compression technique uses encoding mechanisms to ______ the data set size. To deal with larger data sets, a sampling-based method called _____________ can be used.

 

 

1. Reduce, Clara

2.       rease, Dara

3.       Equal, Pam

4.       None, None of the above

Qs 18. (i) A majority of data mining systems do not use any DBMS and have their own memory and storage management.

(ii) Data mining supports automatic data exploration.

1.       (i) True (ii) False

2.       (i) True (ii) True

3.       (i) False (ii) False

4.       (i) False (ii) True

Qs 19. Neural networks, classification, regression, clustering and association are data -------- techniques. -------- makes use of existing variables in the database in order to predict unknown or future values of interest.

1.          Mining, Prediction

2.          Warehousing , prediction

3.          Mining, description

4.          Warehousing, deduction

Qs 20. (i) Data constraints specify the set of task-relevant data.

(ii) Rule constraints specify the form of rules to be mined.

  1. (i) True (ii) False
  2. (i) True (ii) True
  3. (i) False (ii) False
  4. (i) False (ii) True

 

 

 

 

4-Marks Questions

 

Qs 1. The entity-relationship (ER) data model is commonly used in the design of -----------, where a database -------- consists of a set of entities and the relationships between them. The ER data model is appropriate for ------- processing. A -------- requires a concise, subject-oriented schema that facilitates on-line data analysis.

  1. Relational databases, schema, on-line transaction, data warehouse
  2. Hierarchical databases, schema, on-line transaction, data mining
  3. Hierarchical databases, schema, real-time transaction, data mining
  4. Relational databases, schema, on-line transaction, data classification

Qs 2. (i) Data warehouse and OLAP tools are not based on a multidimensional data model.

(ii) The data source view exposes the information being captured, stored and managed by operational systems.

(iii) Relational OLAP servers are the intermediate servers that stand in between a relational back-end server and client front-end tools.

(iv) A virtual warehouse is a set of views over operational databases.

 

  1. (i) True (ii) True (iii) True (iv) True
  2. (i) False (ii) True (iii) True (iv) True
  3. (i) True (ii) True (iii) True (iv) False
  4. (i) True (ii) True (iii) False (iv) False

Qs 3. ANN, FP tree, OLTP and OLAP expand as

  1. Articraft neural network, Frequent pattern tree, On-line temporary processing, on-line analytical processing
  2. Artificial neural network, Frequent pattern tree, On-line transaction processing, on-line analytical processing
  3. Artistic neural network, Frequent pattern tree, On-line temporary processing, on-line analytical processing
  4. Articraft neural network, Frequent pattern tree, On-line temporary processing, on-line analytical processing

Qs 4. ----------- specify the type of knowledge to be mined. Data constraints specify the set of ---------. Dimensional constraints specify the dimension of the ---------- and rule constraints specify the form of ------ to be mined

  1. knowledge type constraints, time-related data, information, rule
  2. knowledge type constraints, time-related data, information, rule
  3. knowledge type constraints, time-related data, data, interestingness
  4. knowledge type constraints, task-related data, data, rule

Qs 5. K-means, agglomerative and hierarchical are methods of -------. Single-link clustering is also called the -------- method, and complete-link clustering is also called the --------- method. ---------- is used for data mining.

  1. classification, connectedness, diameter, data warehouse
  2. clustering, connectedness, area, clustering
  3. clustering, connectedness, diameter, clustering
  4. clustering, isolated, diameter, data mart
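
Note: the single-link and complete-link criteria named in Qs 5 differ only in how the distance between two groups of points is measured. The sketch below is an illustration only, assuming Euclidean distance via `math.dist` (Python 3.8 or later); the function names and the sample points are hypothetical.

```python
from itertools import product
from math import dist


def single_link(group_a, group_b):
    """Distance between the two CLOSEST points, one from each group."""
    return min(dist(p, q) for p, q in product(group_a, group_b))


def complete_link(group_a, group_b):
    """Distance between the two FARTHEST points, one from each group."""
    return max(dist(p, q) for p, q in product(group_a, group_b))


# Hypothetical groups lying on a line, chosen only for illustration.
left = [(0, 0), (1, 0)]
right = [(4, 0), (6, 0)]
print(single_link(left, right), complete_link(left, right))
```

An agglomerative procedure would repeatedly merge the pair of groups with the smallest distance under whichever criterion is chosen.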

 

Qs 6. (i) Clustering may also be considered as segmentation.

(ii) Segmentation, compression and partitions with similar objects are all not clustering methods.

(iii) Clustering is not used only in data mining.

(iv) Supervised learning is represented in the form of clustering.

  1. (i) True (ii) True (iii) True (iv) True
  2. (i) False (ii) True (iii) True (iv) True
  3. (i) True (ii) True (iii) True (iv) False
  4. (i) True (ii) False (iii) False (iv) True

Qs 7. Web content mining, web structure mining and web usage mining all come under ------. --------- is a simple text file that is automatically generated every time someone accesses a website. --------- is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents such as the World Wide Web. A ---------, or software agent, is a computer program which runs on an agent interaction machine.

  1. web mining, log file, page rank, web agent
  2. web warehousing, data file, page rank, web agent
  3. web mining, log file, user profile, web agent
  4. web mining, log file, page rank, web mining

Qs 8. The ---------- Data Quality Solution provides an enterprise solution for profiling, cleansing, augmenting and integrating data to create consistent, reliable -------. With the SAS Data Quality Solution you can automatically incorporate data quality into data integration and ----------- projects to dramatically improve returns on your organization’s ----------- initiatives.

  1. SAS, information, business intelligence, strategies
  2. GNU, information, business intelligence, strategies
  3. SAS, decisions, business intelligence, rules
  4. GNU, data, business intelligence, policies

Qs 9. Weka is a collection of machine learning algorithms for ------- tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules and -----------. It is well suited for developing new machine learning ------.

  1. data warehousing, imagination, rules
  2. data mining, visualization, schemes
  3. data mining, calculations, strategies
  4. data mart, visualization, schemes

Qs 10. Web log analysis has been the foundation of ---------- on the web. In ---------, users are uniquely identified. A lot of work has been done in information retrieval, databases, intelligent agents and topology, which provides a sound foundation for ------------. Web mining is the application of -----------.

  1. data visualization, data mining, data mart creation, data mining
  2. data mining, visualization, schemes, data warehousing
  3. data warehousing, web mining, web content mining, data mining
  4. E-commerce, web mining, content search, data warehouse

 

Qs 11. (i) A user session is a click stream record spanning the entire web.

(ii) Web structure describes how a page is used: the date and time it was accessed, the IP address of the browser and page references.

(iii) Web log files are frequently used in sequential mining.

(iv) Structural mining is used to examine the structure of a particular website and to collate and analyze related data.

 

  1. (i) True (ii) True (iii) True (iv) True
  2. (i) False (ii) True (iii) True (iv) True
  3. (i) True (ii) True (iii) True (iv) False
  4. (i) True (ii) True (iii) False (iv) False

Qs 12. EOS, KDD, GDP and PRIM expand as -------

  1. Early observation system, Knowledge discovery in databases, Grand domestic product, patient rule induction method
  2. Earth observation system, Knowledge discovery in databases, Gross domestic product, periodic rule induction method
  3. Earth observation system, Knowledge discovery in databases, Gross domestic product, patient rule induction method
  4. Easy observation system, Knowledge discovery in databases, Grand domestic product, patient rule induction method

Qs 13. Insurance and direct mail are two industries that rely on -------- to make profitable business decisions. To aid decision making, analysts construct ----- models using warehouse data to predict the outcomes of a variety of decision alternatives. A -------- profile is a model that predicts the future purchasing behaviour of an individual customer, given historical transaction data for both the individual and for the larger population of all of a particular company’s customers. It is often beneficial to ------- data into a smaller number of points, easing computational requirements and reducing the amount of noise.

 

  1. data analysis, predictive, predictive, aggregate
  2. alternative analysis, predictive, predictive, aggregate
  3. data analysis, classification, predictive, noisy
  4. cluster analysis, predictive, predictive, aggregate

Qs 14. WIS, DRG, MBA and HOLAP stand for

  1. weight item sets, Diagnosis related group, Mean basket Analysis, Hybrid OLAP
  2. weighted item sets, Dialogue related group, Mark basket Analysis, Hybrid OLAP
  3. weighted item sets, Diagnosis related group, Market basket Analysis, Hierarchical OLAP
  4. weighted item sets, Diagnosis related group, Market basket Analysis, Hybrid OLAP

Qs 15. Data stored in most text databases are ----------. Text databases are also called ----------. ---------- is the first step in a text retrieval system. Precision, recall and F-score are all measures of the text ---------- of documents.

 

  1. sequence structured, document databases, Tokenization, retrieval
  2. semi structured, relational databases, Tokenization, processing
  3. semi structured, document databases, Tokenization, retrieval
  4. structured, document databases, Tokenization, formatting
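
Note: the three measures named in Qs 15 can be computed from two sets of document identifiers. This is a minimal sketch under stated assumptions: the F-score is taken as the balanced harmonic mean of precision and recall, and the function name `precision_recall_f` and the document ids are hypothetical.

```python
def precision_recall_f(returned, relevant):
    """Precision: share of returned documents that are relevant.
    Recall: share of relevant documents that were returned.
    F-score: harmonic mean of precision and recall."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical document ids, chosen only for illustration.
p, r, f = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r, f)
```

In this example two of the four returned documents are relevant, and two of the three relevant documents were returned.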