Comparative Analysis of Data Mining Techniques on Educational Data set

Data mining is the process of analyzing data from different perspectives to uncover hidden patterns and turn them into useful information. The data is typically collected and assembled in a common area, such as a data warehouse, so that data mining algorithms can analyze it efficiently. The results support business decision making and other information requirements, ultimately helping to cut costs and increase revenue.

The major steps of the data mining process are as follows.

Extract, transform and load data into a data warehouse.
Store and manage the data in a multidimensional database.
Provide data access to business analysts using application software.
Present analyzed data in easily understandable forms, such as graphs.

According to a journal article by Sumit Garg and Arvind K. Sharma, data mining is a relatively young, interdisciplinary field of computer science that discovers new patterns in large data sets. It is a wide research area with important applications in engineering, science, medicine, business and education.

Data Mining Techniques

According to the article, since the goal of data mining is to find new patterns in large data sets, several techniques are available. They fall into two categories: supervised learning and unsupervised learning.

Classification :
Used to develop a model that can classify records across a large population.

Decision Tree :

A decision tree is a flowchart-like structure in which each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and each leaf holds a class label.
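The node, branch and leaf structure can be sketched in a few lines of Python. This is only an illustration: the attributes (attendance, assignments_done), the class labels and the tree itself are made up and are not taken from the article.

```python
# Hypothetical tree: each dict is an internal node that tests one attribute,
# each key under "branches" is an outcome of that test,
# and each plain string is a leaf holding a class label.
tree = {
    "attribute": "attendance",
    "branches": {
        "high": "pass",
        "low": {
            "attribute": "assignments_done",
            "branches": {"yes": "pass", "no": "fail"},
        },
    },
}


def classify(node, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify(tree, {"attendance": "low", "assignments_done": "no"}))  # fail
```

A real tree would be learned from data rather than written by hand; the sketch only shows how a finished tree is read.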

Bayesian Classification :

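It’s a statistical classifier that can be used to predict the probabilities of class membership. It is based on Bayes' theorem, which states that P(H | X) = P(X | H) P(H) / P(X), where H is the hypothesis that a record belongs to a given class and X is the observed record.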
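A small worked calculation shows how a class membership probability comes out of Bayes' theorem. All the numbers below are invented for illustration, and the names (p_pass, p_high_given_pass and so on) are hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: posterior = likelihood * prior / evidence."""
    return likelihood * prior / evidence


# Made-up figures: 70% of students pass, 80% of passing students have
# high attendance, and 30% of failing students have high attendance.
p_pass = 0.7
p_high_given_pass = 0.8
p_high_given_fail = 0.3

# Total probability of observing high attendance.
p_high = p_high_given_pass * p_pass + p_high_given_fail * (1 - p_pass)

# Probability that a student with high attendance belongs to the "pass" class.
p_pass_given_high = posterior(p_pass, p_high_given_pass, p_high)
print(round(p_pass_given_high, 3))  # 0.862
```

A Bayesian classifier repeats this calculation for every class and picks the class with the highest posterior.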

Neural Networks :

Neural networks sit on the border between artificial intelligence and approximation algorithms. A neural network is a collection of neuron-like processing units with weighted connections between them.
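The idea of processing units joined by weighted connections can be sketched as follows. The weights, biases and the sigmoid activation are assumptions chosen for the example; the article does not specify a network.

```python
import math


def neuron(inputs, weights, bias):
    """One processing unit: weighted sum of inputs passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def tiny_network(inputs):
    """Two hidden units feeding one output unit (all weights are made up)."""
    h1 = neuron(inputs, [0.5, -0.4], 0.1)
    h2 = neuron(inputs, [-0.3, 0.8], 0.0)
    return neuron([h1, h2], [1.2, -0.7], 0.05)


output = tiny_network([1.0, 0.5])
print(output)  # a value between 0 and 1
```

Training a network means adjusting these weights from data, which the sketch leaves out.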

Association Rule Mining :

It is an important data mining model that has been studied extensively by the database and data mining communities. It finds frequent patterns, associations and correlations among objects in transactional and relational databases.
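Frequent patterns and associations are usually measured with support and confidence. The article summary does not name these measures, so treat them as an assumption; the transactions below (subjects chosen by students) are invented.

```python
# Each transaction is the set of items that occur together (made-up data).
transactions = [
    {"maths", "physics", "programming"},
    {"maths", "programming"},
    {"physics", "chemistry"},
    {"maths", "physics", "programming"},
    {"maths", "chemistry"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"maths", "programming"}))
print(confidence({"maths"}, {"programming"}))
```

Here the rule "maths implies programming" holds in 3 of the 4 transactions that contain maths.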

Clustering :

Clustering is the process of partitioning a set of data into meaningful subclasses, called clusters. It helps in understanding the natural grouping or structure of a data set.
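One common way to partition data into clusters is k-means. The article summary does not say which clustering algorithm was used, so the choice of k-means, the marks and the starting centroids below are all assumptions for the sake of a short sketch.

```python
def k_means_1d(points, centroids, iterations=10):
    """Simple one-dimensional k-means: assign points, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


marks = [35, 38, 41, 72, 75, 80]  # made-up exam marks
centroids, clusters = k_means_1d(marks, centroids=[30.0, 90.0])
print(clusters)  # [[35, 38, 41], [72, 75, 80]]
```

The two groups that emerge, low marks and high marks, are the "natural grouping" in this toy data.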

Prediction :

Used to identify the relationships among independent variables, and between dependent and independent variables.
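A simple example of relating a dependent variable to an independent one is a least-squares line. Linear regression is an assumption here, picked as the simplest case, and the study hours and marks are invented.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # independent variable (made up)
marks = [52, 54, 56, 58, 60]   # dependent variable, exactly 50 + 2 * hours

slope, intercept = fit_line(hours, marks)
predicted = slope * 6 + intercept  # predicted marks for 6 hours of study
print(slope, intercept, predicted)  # 2.0 50.0 62.0
```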

Time Series Analysis :

Time series analysis works on a time series: a sequence of data points, typically measured at successive points in time spaced at uniform intervals.
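A moving average is one basic way to analyze such a sequence. It is used here only as an illustration and is not mentioned in the article; the monthly figures are invented.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive points."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Made-up values recorded at uniform (monthly) intervals.
monthly_enrolment = [120, 130, 125, 140, 150, 145]
smoothed = moving_average(monthly_enrolment, 3)
print(smoothed)
```

Smoothing makes the upward trend easier to see than in the raw points.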

Sequential Patterns :

Seeks to discover similar patterns in transaction data over a business period.

The authors of the article propose a methodology, which they used to generate a database for their study.

The data set used in their study was collected from an engineering college.

Their experiments and observations were carried out with the WEKA data mining tool. WEKA (Waikato Environment for Knowledge Analysis) was developed at the University of Waikato, New Zealand. It is implemented in the Java programming language and has a GUI for loading data, running analyses and visualizing results. WEKA supports classification, clustering, feature selection, data preprocessing, regression and visualization.

The WEKA GUI Chooser provides access to the Explorer, Experimenter, Knowledge Flow and Simple CLI interfaces.

Comparisons

In their article, the authors describe the importance of data mining techniques and compare those techniques with one another.

Reference :

Sumit Garg (M.Tech Scholar, Dept. of Computer Science, Shekhawati Engineering College, Dundlod, Rajasthan, India) and Arvind K. Sharma (Guest Faculty, Dept. of Computer Science, University of Kota, Kota, Rajasthan, India), International Journal of Computer Applications (0975 – 8887), Volume 74, No. 5, July 2013.

HashSet - Literature Survey

Sunday, July 30, 2017