RESEARCH
As an engineering school, the school focuses its research on
areas where a potential application is in sight and the benefits
are tangible. The emphasis is on work that can result in patents
or in products for industry. This necessarily translates into
research areas where experimental work is highly relevant and
supports simulation work.
Currently the school is focusing on the following areas of research:
Data Mining
Data mining (or data discovery) is the process of
autonomously extracting useful information or knowledge
(“actionable assets”) from large data stores or sets. Data
mining can be performed on a variety of data stores, including
the World Wide Web, relational databases, transactional
databases, internal legacy systems, PDF documents, and data
warehouses. Many organizations have compiled a diverse
collection of massively large and dynamic datasets over the
years. Data mining is a tool that has been actively used to
discover interesting and surprising patterns in these datasets.
The technology has been successfully utilized by organizations
that collect web click streams, financial transactions,
observational science data, etc. Our research work would cover
major algorithmic advances in data mining, with a thrust towards
both the theoretical underpinnings of problems and successful
practical deployments. Topics covered in our research would
include clustering, association rules, machine learning, web link
analysis, data streams, and privacy-preserving algorithms.
Through data mining techniques, a knowledge model is obtained
that represents behavior patterns in the relevant problem
variables or the relations between them. Several algorithms are
frequently tested, each generating a different model.
The most common algorithms or techniques are:
- IDT (Induction of Decision Trees)
- Neural Nets
- Genetic Algorithms
- Fuzzy techniques (fuzzy logic, fuzzy sets, etc.)
- Rule induction
- SVM (Support Vector Machines)
- Bayesian Networks, etc.
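As an illustration of the first technique in the list, the sketch below shows the core step of decision tree induction: choosing the attribute whose split gives the largest information gain. It is a minimal example under stated assumptions, not the school's implementation; the function names and the toy weather records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain (the root of the tree)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: "outlook" separates the classes perfectly.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

print(best_split(rows, ["outlook", "windy"], "play"))  # outlook
```

A full tree would apply `best_split` recursively to each subset until the labels in a subset are pure or no attributes remain.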
Data mining attempts to identify valid, novel, potentially
useful, and ultimately understandable patterns from huge volumes
of data. The mined patterns must be ultimately understandable
because the purpose of data mining is to aid decision-making. A
data mining algorithm is usually inherently associated with some
representations for the patterns it mines. Therefore, an
important aspect of a data mining algorithm is the
comprehensibility of the representations it forms, that is,
whether or not the algorithm encodes the patterns it mines in
such a way that they can be inspected and understood by human
beings.
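To make the idea of a comprehensible representation concrete, the following sketch uses a very simple rule-induction scheme (often called OneR) and prints the learned model as IF/THEN rules that a person can inspect directly. This is an illustrative assumption rather than a method described above; the function names and the loan records are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, {value: class}) with the fewest training errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # For each attribute value, predict its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(row[target] != rule[row[attr]] for row in rows)
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best[1], best[2]


def describe(attr, rule, target):
    """Render the model as human-readable IF/THEN statements."""
    return [
        f"IF {attr} = {value} THEN {target} = {label}"
        for value, label in sorted(rule.items())
    ]


# Hypothetical loan records, for illustration only.
rows = [
    {"income": "high", "student": "no", "approved": "yes"},
    {"income": "high", "student": "yes", "approved": "yes"},
    {"income": "low", "student": "no", "approved": "no"},
    {"income": "low", "student": "yes", "approved": "no"},
    {"income": "high", "student": "no", "approved": "yes"},
]

attr, rule = one_r(rows, ["income", "student"], "approved")
for line in describe(attr, rule, "approved"):
    print(line)
```

A model in this form can be checked by a domain expert at a glance, which is much harder with, for example, the weight matrix of a neural network.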
It is evident that data mining algorithms with good
comprehensibility are very desirable. Unfortunately, most data
mining algorithms are not very comprehensible and therefore
their comprehensibility has to be enhanced by extra mechanisms.
Since there are many different data mining tasks and
corresponding data mining algorithms, it is difficult for such a
short article to cover all of them. The following discussion is
therefore restricted to the comprehensibility of classification
algorithms, although some of the ideas also apply to other kinds
of data mining algorithms.
Data Mining
With the unprecedented rate at which data is being collected
today in almost all fields of human endeavor, there is an
emerging economic and scientific need to extract useful
information from it. Data mining is the process of automatic
discovery of patterns, changes, associations and anomalies in
massive databases, and is a highly inter-disciplinary field
representing the confluence of several disciplines, including
database systems, data warehousing, machine learning,
statistics, algorithms, data visualization, and high-performance
computing.
Data mining refers to the automated or semi-automated search for
relationships and global patterning within data. Data mining
techniques include data visualization, neural network analysis,
and genetic algorithms. Data mining uses complex algorithms to
search large amounts of data and find patterns, correlations,
and trends in that data. A data mining application can create a
model that identifies buying habits, shopping trends, and credit
card purchase patterns, as well as perform many non-commercial
functions.
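Buying habits of this kind are commonly expressed as association rules, one of the research topics listed earlier. The sketch below computes the two standard measures of a rule, support and confidence, over a set of market baskets. The basket contents and function names are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical market baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(transactions, {"bread", "butter"}))       # 0.5
print(confidence(transactions, {"bread"}, {"butter"}))  # about 0.67
```

Here the rule "bread -> butter" holds in two of the three baskets that contain bread, so a retailer would read it as "about two thirds of bread buyers also buy butter".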
Data mining, also known as knowledge discovery in databases
(KDD), is the practice of automatically searching large stores
of data for patterns. To do this, data mining uses computational
techniques from statistics and pattern recognition.
As data mining has become recognized as a powerful tool, several
different communities have laid claim to the subject:
- Statistics.
- AI, where it is called machine learning.
- Researchers in clustering algorithms.
- Visualization researchers.
- Databases. We'll be taking this approach, of course,
  concentrating on the challenges that appear when the data is
  large and computations are complex. In a sense, data mining
  can be thought of as algorithms for executing very complex
  queries on non-main-memory data.
In recent years, database and data mining communities have
focused on a new model of data processing, where data arrives in
the form of continuous streams. Because it is not feasible to
store all data, it is quite challenging to perform the
traditional data mining operations in a streaming environment.
Our current and proposed research focuses on many of the
challenges associated with mining streaming data. Our main thrust
would be on designing algorithms that are effective and efficient
in frequent itemset mining and that provide deterministic bounds
on accuracy. The recent trend in algorithm development for this
purpose is towards algorithms that are memory efficient and allow
mining of datasets with a large number of distinct items and/or
very low support levels.
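As a rough illustration of what a deterministic accuracy bound looks like in a streaming setting, the sketch below uses the well-known Misra-Gries summary. This is a swapped-in, simplified example: it counts single items rather than itemsets, and it is not the algorithm proposed in this research. With at most k - 1 counters, each reported count is never above the true count and falls short of it by at most n / k, where n is the stream length.

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary using at most k - 1 counters.

    Each reported count c satisfies: true_count - n / k <= c <= true_count,
    where n is the number of items seen.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of 10 items in which "a" occurs 6 times.
stream = ["a", "b", "a", "c", "a", "d", "a", "a", "e", "a"]
print(misra_gries(stream, k=3))  # {'a': 4}
```

The memory used depends only on k, not on the number of distinct items in the stream, which is the property that matters when the data cannot be stored.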