Data Mining: Another Tool To Increase Productivity In Manufacturing?
What Is Data Mining?
The definition of Data Mining is often confused. Many feel it is merely a part of the Knowledge Management concept. In fact, Data Mining is referred to in a broader context as Knowledge Discovery in Databases, or KDD. It may appear as if Knowledge Management and KDD are themselves recycled concepts. Indeed, many definitions of Knowledge Management are very similar to those of its predecessors: Information Systems, Decision Support Systems, Expert Systems and their earlier forms. Not only do they all exhibit very similar goals, but the methods by which they extract information from data are not too dissimilar either.
According to Kanter [1], the Knowledge Management concept "emanates from its earlier definition of capturing, storing and analytically processing that resides in the various companies databases for decision making." This does not appear to be any different from a Management Information System. Kanter does note, however, that knowledge includes the tacit or implicit knowledge of the user, which does not exist within any database, and this does set KM apart from the pack.
Fayyad et al [6], however, assert that Knowledge Discovery in Databases and Data Mining are different. "The term knowledge discovery in databases, or KDD for short, was coined in 1989 to refer to the broad process of finding knowledge in data, and to emphasize the 'high-level' application of particular data mining methods. The term Data Mining has been commonly used by statisticians, data analysts and the MIS (Management Information Systems) community, while Knowledge Discovery in Databases has been mostly used by artificial intelligence and machine learning researchers." They add that "Knowledge Discovery in Databases refers to the overall process of discovering useful knowledge from data while data mining refers to the application of algorithms for extracting patterns from data without the additional steps of the Knowledge Discovery in Databases process."
Data Mining differs from common statistical methods in the quantity of data processed. Berry and Linoff [2] define Data Mining as "the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules." The choice of words 'automatic or semi-automatic' is interesting but not wholly necessary, as many of the mining techniques may be employed manually.
Hand et al [4] state that "Data Mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner."
Data Mining works most proficiently when the data set is arranged in a tabular format, as a single non-normalised table. Unlike a Data Warehouse, Data Mining works with a fixed viewpoint, and the analyst, assisted by the computer, discovers relationships and structures within the data set. To build a data 'cube', by contrast, you first need to know the relationships in order to design the model before populating it with data.
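The idea of a single non-normalised table can be illustrated with a small sketch. The table names, fields and the flatten() helper below are hypothetical and chosen only for illustration; the point is simply that separate, normalised tables are joined into one wide row per record before any mining takes place.

```python
# Hypothetical normalised tables: customers, and the purchases they made.
customers = {
    1: {"name": "Ann", "region": "North"},
    2: {"name": "Bob", "region": "South"},
}
purchases = [
    {"customer_id": 1, "product": "phone", "amount": 200},
    {"customer_id": 1, "product": "case", "amount": 15},
    {"customer_id": 2, "product": "phone", "amount": 180},
]


def flatten(customers, purchases):
    """Join each purchase with its customer, giving one wide row per purchase."""
    rows = []
    for purchase in purchases:
        customer = customers[purchase["customer_id"]]
        rows.append({
            "customer_id": purchase["customer_id"],
            "name": customer["name"],
            "region": customer["region"],
            "product": purchase["product"],
            "amount": purchase["amount"],
        })
    return rows


# The single non-normalised table: customer details repeat on every row.
flat_table = flatten(customers, purchases)
for row in flat_table:
    print(row)
```

The customer details are deliberately repeated on every row. That redundancy is what a normalised design avoids, but it gives a mining algorithm everything it needs about a record in one place.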
With the similarities between all the various methods of converting data into information and information into knowledge, why would we need to create a new term? Kanter notes that buzzwords are a positive contribution as they draw attention to the subject at hand. Spiegler [3] goes further, "buzzwords in a fast moving society is a double-edged sword. ...buzzwords tend to create a shallow image of ideas and a notion that their introduction is more for marketing and sales consumption to denote innovation." Data Mining is no exception to this. Even though it is indeed a separate branch of information discovery, it has not had time to mature. Data Mining will suffer because the increased speed to market of new concepts within the computing industry does not allow it a great deal of time to develop its own language. Data Mining is an interesting derivation of Knowledge Management that requires time before its true potential is realised.
So how does Data Mining differ from other information systems and statistical tools? 'Query and reporting' methods are common among relational database systems and applications. Using the standard 'query and reporting' methods of extracting information from data sets, the analyst will only be able to answer simple questions, e.g. 'Who bought what?'
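A 'Who bought what?' question of this kind can be sketched with an in-memory SQLite database from Python's standard library. The sales table and its contents are invented for illustration; the query simply returns the facts that were stored, nothing more.

```python
import sqlite3

# Hypothetical sales table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ann", "phone"), ("Bob", "laptop"), ("Ann", "case")],
)

# A plain 'query and reporting' question: who bought what?
who_bought_what = conn.execute(
    "SELECT customer, product FROM sales ORDER BY customer, product"
).fetchall()

for customer, product in who_bought_what:
    print(customer, "bought", product)
```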
Data Warehouse and 'Online analytical processing' (OLAP) tools go beyond queries, allowing the user to 'drill down' into the data. OLAP is good for drilling into summary and consolidated data to answer more complex historical questions, e.g. "What is the average annual income of households of pet owners by year by region?"
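The drill-down idea can be approximated with ordinary grouped aggregates. The sketch below is not an OLAP engine; it uses SQLite with an invented households table to show a summary by region and then the same measure broken down one level further, by year and region.

```python
import sqlite3

# Hypothetical household records: (year, region, owns_pet, income).
rows = [
    (2001, "North", 1, 40000),
    (2001, "North", 1, 50000),
    (2001, "South", 1, 30000),
    (2002, "North", 1, 60000),
    (2002, "South", 1, 34000),
    (2002, "South", 0, 90000),  # not a pet owner, so excluded below
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE households (year INTEGER, region TEXT, owns_pet INTEGER, income REAL)"
)
conn.executemany("INSERT INTO households VALUES (?, ?, ?, ?)", rows)

# Summary level: average income of pet-owning households by region.
by_region = conn.execute(
    "SELECT region, AVG(income) FROM households "
    "WHERE owns_pet = 1 GROUP BY region ORDER BY region"
).fetchall()

# Drill-down: the same measure by year and by region.
by_year_region = conn.execute(
    "SELECT year, region, AVG(income) FROM households "
    "WHERE owns_pet = 1 GROUP BY year, region ORDER BY year, region"
).fetchall()

print(by_region)
print(by_year_region)
```

Both results are still answers to questions the analyst had to think of in advance: the dimensions (year, region) and the measure (income) are fixed before the query is run.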
Unlike the other disciplines, Data Mining techniques aim to predict future behaviour by analysing the past. Data Mining is able to sift through massive amounts of data and find hidden information and relationships. The other methods cannot predict the future because the user is only given answers to the questions posed. If you used only queries and OLAP tools, you would need to know what you were looking for before starting the search, employ good analytical techniques and have plenty of time before you eventually found your answer. Data Mining does not suffer from these limitations. Because it uses a host of different algorithms to sort through each record in the data set, it is able to unearth patterns and relationships that were previously unknown. Data Mining goes further than OLAP by allowing the analyst to ask the system for predictions, e.g. "Who is likely to purchase a cell phone and why?"
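As a rough illustration of the difference, the sketch below learns a rule from past records and applies it to a new customer. It uses a very simple one-rule ("1R"-style) classifier, chosen here purely for brevity and not taken from any of the cited authors. The records, attribute names and the one_rule() helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical records; "bought_phone" is the outcome to predict.
history = [
    {"age_group": "young", "income": "high", "bought_phone": "yes"},
    {"age_group": "young", "income": "low", "bought_phone": "yes"},
    {"age_group": "young", "income": "low", "bought_phone": "yes"},
    {"age_group": "old", "income": "high", "bought_phone": "no"},
    {"age_group": "old", "income": "low", "bought_phone": "no"},
    {"age_group": "old", "income": "high", "bought_phone": "yes"},
]


def one_rule(records, target):
    """Find the single attribute whose values best predict the target.

    Returns (attribute, {value: predicted outcome}, errors on the records).
    """
    best = None
    attributes = [key for key in records[0] if key != target]
    for attr in attributes:
        # Count the outcomes seen for each value of this attribute.
        counts = defaultdict(Counter)
        for record in records:
            counts[record[attr]][record[target]] += 1
        # Predict the most frequent outcome for each value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for r in records if rule[r[attr]] != r[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


attribute, rule, errors = one_rule(history, "bought_phone")

# "Who is likely to purchase a cell phone and why?"
new_customer = {"age_group": "young", "income": "low"}
prediction = rule[new_customer[attribute]]
print("Predicted:", prediction, "because", attribute, "=", new_customer[attribute])
```

The analyst never stated that age group mattered; the procedure found that attribute by examining every record. The learned rule also serves as the "why" behind the prediction. Real data mining tools use far more capable algorithms, but the shape of the task is the same.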