The major steps of the data mining process are as follows.
- Extract, transform and load data into a data warehouse.
- Store and manage data in a multidimensional database.
- Provide data access to business analysts using application software.
- Present analyzed data in easily understandable forms, such as graphs.
According to a journal article by Sumit Garg and
Arvind K. Sharma, data mining is a relatively young, interdisciplinary field
of computer science concerned with discovering new patterns in large data sets.
It is a wide research area with important applications in engineering,
science, medicine, business and education.
Data Mining Techniques
According to the article, data mining uses several techniques
to find new patterns in large data sets. They fall into two categories:
supervised learning and unsupervised learning.
Classification :
Used to develop a model that can classify a large population of records.
Decision Tree :
A flowchart-like structure in which each
internal node denotes a test on an attribute value, each branch represents an
outcome of the test, and each leaf holds a class label.
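The node, branch and leaf structure described above can be sketched in a few lines of Python. This is only an illustration: the attribute names (attendance, marks), the thresholds and the class labels are made up here and do not come from the article.

```python
# Hypothetical hand-built tree: each dict is a node that tests one attribute,
# each key under "branches" is a test outcome, and each string is a leaf (class).
TREE = {
    "attribute": "attendance",
    "test": lambda value: value >= 75,
    "branches": {
        True: {
            "attribute": "marks",
            "test": lambda value: value >= 50,
            "branches": {True: "pass", False: "fail"},
        },
        False: "fail",
    },
}


def classify(node, record):
    """Walk from the root to a leaf, following the branch chosen by each test."""
    while isinstance(node, dict):
        outcome = node["test"](record[node["attribute"]])
        node = node["branches"][outcome]
    return node


print(classify(TREE, {"attendance": 80, "marks": 62}))  # prints "pass"
```

In practice the tree is learned from labelled records rather than written by hand; the sketch only shows how a finished tree assigns a class.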
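Bayesian Classification :
A statistical classifier that can be used to
predict probabilities of class membership. It is based on Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X), where H is the hypothesis that a record belongs
to a class and X is the observed record.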
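As a rough worked example of Bayes' theorem, the sketch below computes the probability of a class given one observation. The function name and all of the probabilities are assumptions chosen for illustration, not values from the article.

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    if evidence_x <= 0:
        raise ValueError("P(X) must be positive")
    return likelihood_x_given_h * prior_h / evidence_x


# Made-up numbers: H = "record belongs to the class", X = "feature is present".
p_h = 0.3               # prior P(H)
p_x_given_h = 0.8       # likelihood P(X|H)
p_x_given_not_h = 0.2   # P(X|not H)

# Law of total probability gives the evidence P(X).
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

print(round(posterior(p_h, p_x_given_h, p_x), 3))
```

A Bayesian classifier would evaluate this posterior for every candidate class and pick the class with the highest value.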
Neural Networks :
Sits on the borderline between
artificial intelligence and approximation algorithms. It is a collection of
neuron-like processing units with weighted connections between the units.
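The idea of processing units joined by weighted connections can be sketched as a single forward pass. The weights, biases and the sigmoid activation below are assumptions picked for illustration; the article does not specify a network.

```python
import math


def neuron(inputs, weights, bias):
    """One processing unit: weighted sum of its inputs passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is several units that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two inputs -> two hidden units -> one output unit (all values are made up).
hidden = layer([1.0, 0.5], [[0.4, -0.2], [0.3, 0.8]], [0.0, -0.1])
output = neuron(hidden, [1.2, -0.7], 0.05)
print(output)
```

Training, which adjusts the weights from labelled examples, is deliberately left out of this sketch.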
Association Rule Mining :
An important data mining model studied
extensively by the database and data mining communities. It finds frequent
patterns, associations and correlations among objects in transactional and
relational databases.
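Two measures commonly used to judge an association rule are support and confidence. The sketch below computes both over a tiny, made-up list of transactions; the item names and helper functions are hypothetical.

```python
# Made-up market-basket data: each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


print(support({"bread", "butter"}, TRANSACTIONS))       # prints 0.5
print(confidence({"bread"}, {"butter"}, TRANSACTIONS))
```

Algorithms such as Apriori use the same two measures but avoid checking every possible itemset.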
Clustering :
The process
of partitioning a set of data into meaningful subclasses, called
clusters. It helps reveal the natural grouping or structure in a data
set.
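One common way to partition data into clusters is k-means. The article does not say which clustering algorithm it has in mind, so the one-dimensional sketch below is only an assumed example with made-up values and starting centroids.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on numbers: assign each value to its nearest centroid,
    then move each centroid to the mean of its assigned values."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids)  # prints [2.0, 11.0]
print(clusters)   # prints [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```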
Prediction :
Used to identify the relationships among
independent variables and between dependent and independent
variables.
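A simple way to model the relationship between one independent variable and one dependent variable is least-squares linear regression. The sketch below assumes that setting; the data (hours studied against score) is invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 50.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)

# Predict the dependent variable for an unseen value of the independent one.
print(slope * 5.0 + intercept)
```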
Time Series Analysis :
A time series is a sequence of data points, typically
measured at successive times spaced at uniform intervals.
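A basic operation on uniformly spaced data points is smoothing with a moving average. The sketch below is one assumed example of such an analysis; the monthly figures are made up.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive points in the series."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Made-up values recorded at equal intervals (for example, once a month).
monthly = [10.0, 12.0, 14.0, 13.0, 15.0]
print(moving_average(monthly, 3))  # prints [12.0, 13.0, 14.0]
```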
Sequential Patterns :
Seeks
to discover similar patterns in transaction data over a business period.
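As a very reduced illustration of sequential pattern discovery, the sketch below counts ordered pairs of events that occur in at least a minimum number of sequences. Real sequential pattern miners handle longer patterns and time constraints; the event names and the function are hypothetical.

```python
from collections import Counter


def frequent_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b, that appear
    in at least min_support sequences, together with their counts."""
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each pair at most once per sequence
        for i, first in enumerate(seq):
            for second in seq[i + 1:]:
                seen.add((first, second))
        counts.update(seen)
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Made-up customer sessions, each an ordered list of events.
sessions = [
    ["login", "search", "buy"],
    ["login", "buy"],
    ["search", "login", "buy"],
]
patterns = frequent_pairs(sessions, min_support=2)
print(patterns)
```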
The authors propose a
methodology, which they used to generate a database for their study.
Their data set was collected from an engineering college.
Their experiments and observations were carried out with the
WEKA data mining tool. WEKA (Waikato Environment for Knowledge Analysis) was
developed at the University of Waikato, New Zealand. It is implemented in the Java
programming language and has a GUI for loading data, running analyses and
visualizing results. WEKA supports classification, clustering,
feature selection, data preprocessing, regression and visualization.
The WEKA GUI Chooser provides access to four interfaces:
Explorer, Experimenter, KnowledgeFlow and Simple CLI.
Comparisons
In their article, the authors describe the
importance of data mining techniques and compare the techniques with one another.
Reference :
Sumit Garg (M.Tech Scholar, Dept. of Computer Science, Shekhawati Engineering College, Dundlod, Rajasthan, India) and Arvind K. Sharma (Guest Faculty, Dept. of Computer Science, University of Kota, Kota, Rajasthan, India), International Journal of Computer Applications (0975 – 8887), Volume 74, No. 5, July 2013.