
Tuesday, October 13, 2009

Professional Approach to Data Mining

I worked as a Data Stream Analyst at one of the nation’s top five market research firms. My duties involved data mining tasks that I had been performing without understanding the concepts behind them. I then took an introductory data mining course while pursuing a Master's degree in Software Engineering at George Mason University, which helped me gain knowledge and maturity in this field. The course also made it easy to classify the daily tasks of my job. In this article I will relate some of my professional experience to data mining tasks, which may serve as a good overview for beginners in this field. The data mining steps are:

Steps Involved in Data Mining

Originally, data mining consists of seven steps, as you can see above, but we combine a couple of them and arrive at the diagram below:

Professional Steps Involved in Knowledge Discovery

So let's begin with Problem Definition.

Step 1 Problem Definition:

In this step the problem definition is established. The problem definition is the actual task for which data mining is needed. For example, one of our clients wants to know how its business is doing geographically, with a question like:

Where is the Visa credit card used the most?

There are four types of credit card in wide use: American Express, MasterCard, Discover and Visa. Visa is one of our clients, and they want to know in which parts of the country their cards are used the most, in which parts their cards are widely neglected, and what the reasons behind this are.

Step 2 Data Gathering & Preparation:

We have two kinds of data streams: e-commerce, which involves data coming from online shopping sites like buy.com, amazon.com, eBay.com etc., and financial, which involves data coming from online banking sites like bankofamerica.com, chevychase.com etc. To scrape the data required for analysis, we code software agents that gather it for us. These agents also clean the data, using patterns that we build into them for this purpose. Once the data is gathered and cleaned, we are ready for data sampling. We sample the data in one of four ways, depending on how we report to the client: monthly, quarterly, semiannually or yearly.
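As a rough illustration of the cleaning and sampling described above (this is not the code of our actual agents), here is a minimal Python sketch. The cleaning pattern, the function names and the sample records are all made up for this example: a regular expression pulls a dollar amount out of scraped text, and each record is then grouped into the client's reporting period.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical cleaning pattern: pull a dollar amount out of scraped text.
AMOUNT_PATTERN = re.compile(r"\$?\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)")


def clean_amount(raw):
    """Return the first amount found in the text as a float, or None."""
    match = AMOUNT_PATTERN.search(raw)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def period_key(day, period):
    """Label a date with the reporting period it belongs to."""
    if period == "monthly":
        return f"{day.year}-{day.month:02d}"
    if period == "quarterly":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if period == "semiannual":
        return f"{day.year}-H{1 if day.month <= 6 else 2}"
    if period == "yearly":
        return str(day.year)
    raise ValueError(f"unknown period: {period}")


def total_by_period(records, period):
    """Sum cleaned amounts per reporting period, skipping rows with no amount."""
    totals = defaultdict(float)
    for day, raw in records:
        amount = clean_amount(raw)
        if amount is not None:
            totals[period_key(day, period)] += amount
    return dict(totals)


# Made-up scraped records: (date, raw text as it came off the page).
records = [
    (date(2009, 1, 15), "Total: $1,250.00"),
    (date(2009, 2, 3), " $ 49.99 "),
    (date(2009, 7, 21), "USD 300"),
    (date(2009, 8, 2), "n/a"),
]

quarterly_totals = total_by_period(records, "quarterly")
```

Switching the second argument to "monthly", "semiannual" or "yearly" regroups the same cleaned records for a client who is reported to on a different schedule.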

Step 3 Model Building & Evaluation:

We build and test the model on the past data that we already have, and then run it on the new data that we just gathered for the most recent quarter. Models are built separately for each client and for each of the client’s requirements. For example, the model behind Visa's geographical customer report is different from the one behind its quarterly expense report. So for each site and each specific purpose, we create a different model from the trends in its past behavior data.
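To make the build, test and run cycle concrete, here is a small Python sketch. It is only an illustration under my own assumptions: the model is a plain least-squares trend line, which is a stand-in and not one of the models we built for clients, and the quarterly figures are invented.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at position x."""
    slope, intercept = model
    return slope * x + intercept


# Invented quarterly spending figures for one client report (illustrative only).
past_quarters = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

# Build the model on the older quarters and test it on the most recent ones.
train, holdout = past_quarters[:4], past_quarters[4:]
model = fit_linear_trend(train)
errors = [
    abs(predict(model, len(train) + i) - actual)
    for i, actual in enumerate(holdout)
]
mean_abs_error = sum(errors) / len(errors)

# If the test error is acceptable, refit on all past data and
# run the model for the newly gathered quarter.
final_model = fit_linear_trend(past_quarters)
forecast = predict(final_model, len(past_quarters))
```

Holding back the latest quarters for testing keeps the evaluation honest, because the model is scored on data it did not see while it was being built. A separate model of this kind would be fitted for each client and each report.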

Step 4 Knowledge Deployment:


As mentioned earlier, our knowledge deployment consists of creating custom reports according to the customer's requirements. These can range from a report on the client's own profits to a picture of competitors' business in a region where the client plans to launch a new product. Our custom reports provide the information that can help increase revenue. They are backed by facts from past data, which helps them make a strong impact on clients.

Example Reports:

  • E-Commerce Spending on Black Friday Jumps 42 Percent Versus Last Year.
  • Consumer Online Retail Spending Tops $610 Million Per Day for Five Consecutive Days as Online Holiday Season Spending Grows 25 Percent vs. 2005.
  • Fox Interactive Media Ranks #1 in Page Views; Yahoo! Sites Attract the Most Unique Visitors.
  • 23 Million People Watched More Than 2 Billion Videos Online in France in January 2008.


I hope this helps some of us understand how data mining is done at a professional level. It should also clarify for students how data mining is used nowadays, since some of my classmates had trouble grasping this while we were taking the Introduction to Data Mining course.
