Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting meaning from that data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Although data mining is still in its infancy, companies in a wide range of industries, including retail, finance, health care, manufacturing, transportation, and aerospace, are already using data mining tools and techniques to take advantage of historical data.
By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions, and anomalies that might otherwise go unnoticed. A data mining algorithm is a set of heuristics and calculations that creates a data mining model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis to define the optimal parameters for the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.

This paper will determine the benefits of data mining to businesses when employing predictive analytics to understand the behavior of customers, association discovery in products sold to customers, Web mining to discover business intelligence from Web customers, and clustering to find related customer information. It will also assess the reliability of data mining algorithms, decide whether they can be trusted, predict the errors they are likely to produce, and analyze the privacy concerns raised by the collection of personal data for mining purposes.
This will be done by choosing and describing three (3) concerns raised by consumers, deciding whether each of these concerns is valid and explaining that decision, and describing how each concern is being allayed. This paper will also provide at least three (3) examples where businesses have used predictive analysis to gain a competitive advantage and evaluate the effectiveness of each business’s strategy.

1. Determine the benefits of data mining to businesses when employing:
   1. Predictive analytics to understand the behavior of customers
   2. Association discovery in products sold to customers
   3. Web mining to discover business intelligence from Web customers
   4. Clustering to find related customer information

Data mining is defined as “a process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases, including data warehouses”. The information identified using data mining includes patterns indicating trends, correlations, rules, and similarities, which can then be used for predictive analytics. “Predictive analysis is the decision science that removes guesswork out of the decision-making process and applies proven scientific guidelines to find right solution in the shortest time possible.” There are seven steps to predictive analytics: (1) spot the business problem; (2) explore various data sources; (3) extract patterns from the data; (4) build a sample model using the data and the problem; (5) clarify the data, finding valuable factors and generating new variables; (6) construct a predictive model using sampling; and (7) validate and deploy the model.
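The last two steps above, constructing a predictive model from a sample and then validating it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular business or vendor: it assumes logistic regression as the model, a hypothetical set of customer records (visits in the last month, coupons redeemed) labeled by whether the customer responded to a promotion, and a simple hold-out split in which every third record is kept back for validation. All names and values are made up for illustration.

```python
import math

# Hypothetical customer records: (visits last month, coupons redeemed) -> responded (1) or not (0).
customers = [
    ([1, 0], 0), ([6, 3], 1), ([2, 0], 0), ([7, 2], 1), ([1, 1], 0),
    ([8, 4], 1), ([3, 0], 0), ([6, 4], 1), ([2, 1], 0), ([9, 3], 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Construct the model: fit logistic-regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Predict 1 (will respond) when the estimated probability is at least 0.5."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Sampling: hold out every third record for validation and train on the rest.
validation = customers[::3]
training = [c for i, c in enumerate(customers) if i % 3 != 0]

weights, bias = train_logistic([x for x, _ in training], [y for _, y in training])

# Validate on records the model never saw before deploying it.
correct = sum(predict(weights, bias, x) == y for x, y in validation)
validation_accuracy = correct / len(validation)
print(f"validation accuracy: {validation_accuracy:.2f}")
```

Holding records back matters because accuracy measured only on the training records would overstate how well the model predicts new customers.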
By using predictive analytics, businesses can make fast decisions based on vast amounts of data. There are three main benefits of predictive analytics: minimizing risk, identifying fraud, and pursuing new sources of revenue. Predicting the risks involved in loan and credit origination, flagging fraudulent insurance claims, and forecasting customer responses to promotional offers and coupons are all examples of these benefits. In short, predictive analytics reduces the cost of making mistakes.
This type of analysis allows businesses to test situations and scenarios that could take years to test in the real world. Studying customer behavior gives businesses a competitive advantage and helps them stay ahead of the competition in their marketplace.

Association analysis is useful for discovering interesting relationships hidden in large amounts of data. Two issues must be kept in mind when applying association analysis to market data: discovering patterns in a large transaction data set can be computationally expensive, and some of the discovered patterns may be spurious because they occur simply by chance. Association discovery finds rules about items that appear together in an event such as a purchase transaction.
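The rules that association discovery produces are usually scored by support (how often the items occur together), confidence (how often the second item appears when the first one does), and lift (confidence divided by how often the second item appears overall). The sketch below uses hypothetical baskets and thresholds, and it counts only pairs of items rather than running a full Apriori-style search over larger itemsets. Lift also speaks to the spurious-pattern problem mentioned above: a lift near 1.0 means the items co-occur about as often as chance alone would predict.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions (market baskets).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules lhs -> rhs between single items, with support, confidence and lift."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # 1.0 = no better than chance
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data, beer -> diapers has confidence 1.00 and lift 1.25, while beer -> bread passes both thresholds yet has a lift of about 0.83: the two items are bought together slightly less often than they would be if they were bought independently, so a high-confidence rule is not necessarily an interesting one.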
Market-basket analysis is a well-known example of association discovery. The technique underlies many recommendation engines, which recommend products to customers based on items they have already bought or shown interest in. This benefits the business by allowing it to stage its products effectively and to know which customers to target for specific promotions or new products.

Web data mining is the process of extracting structured information from unstructured or semi-structured Web data sources. Companies use Web data mining to gather data from different websites, collate it for analysis, and build websites that combine information from many sources.
This helps visitors find a great deal of information in one location instead of reading it on several different websites. For business intelligence, the competitiveness of e-commerce markets and the vast number of options customers have today have forced businesses to employ marketing strategies built largely on data obtained through Web mining. Web usage mining is critical for effective website management, creating adaptive websites, business and support services, personalization, network traffic flow analysis, and more. Business intelligence keeps a business informed of market trends, alerts it to new avenues for generating revenue, and helps it determine the status of the competition.
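Web usage mining usually starts from server access logs. The sketch below assumes the server writes the NCSA Common Log Format and records the logged-in customer's username in the authuser field; the log lines and function names are hypothetical. It groups successfully served pages by customer and counts page-to-page transitions, a simple starting point for the site-management and personalization uses listed above.

```python
import re
from collections import Counter, defaultdict

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def pages_by_user(lines):
    """Group successfully served pages by logged-in user, in log order."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match["status"] != "200":
            continue  # skip malformed lines and failed requests
        visits[match["user"]].append(match["path"])
    return visits


def next_page_counts(visits):
    """Count page-to-page transitions, a simple basis for predicting the next page."""
    transitions = Counter()
    for pages in visits.values():
        transitions.update(zip(pages, pages[1:]))
    return transitions


# Hypothetical access-log excerpt.
sample_log = [
    '10.0.0.1 - alice [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - alice [10/Oct/2023:13:56:02 +0000] "GET /shoes HTTP/1.1" 200 2048',
    '10.0.0.2 - bob [10/Oct/2023:13:57:10 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - bob [10/Oct/2023:13:57:40 +0000] "GET /shoes HTTP/1.1" 200 2048',
    '10.0.0.2 - bob [10/Oct/2023:13:58:05 +0000] "GET /checkout HTTP/1.1" 200 1024',
    '10.0.0.3 - carol [10/Oct/2023:14:01:00 +0000] "GET /missing HTTP/1.1" 404 0',
]

visits = pages_by_user(sample_log)
print(dict(visits))
print(next_page_counts(visits).most_common())
```

Here the most common transition is /home to /shoes, which a site could use to decide what to feature on its home page; the failed request from carol is filtered out before analysis.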
Clustering analysis subdivides a market into distinct subsets of customers, any of which may be selected as a market target to be reached with a distinct marketing mix. This type of analysis finds clusters of data objects that are similar to one another in some sense and segments the data accordingly. Businesses today collect information about which pages site users visit and the order in which the pages are visited. When a business provides online ordering, customers must log in to the site, which gives the company a record of clicks for each customer profile.
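Grouping customers by their click data can be sketched with k-means, one common clustering algorithm. The text does not name a specific algorithm, so k-means, k = 2, and the click profiles below are all illustrative assumptions. Each hypothetical customer is summarized as page views per category (shoes, electronics, checkout). That summary discards the order of the clicks, so clustering by click sequences would need a sequence-aware representation.

```python
import math

# Hypothetical click profiles: page views per category (shoes, electronics, checkout).
customers = ["ana", "ben", "cho", "dev", "eli", "fay"]
profiles = [(8, 1, 2), (1, 9, 1), (7, 0, 3), (0, 8, 2), (9, 2, 2), (2, 7, 0)]


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to the mean."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(profiles, k=2)
for cluster in range(2):
    members = [name for name, label in zip(customers, labels) if label == cluster]
    print(f"cluster {cluster}: {members}")
```

With this data the algorithm separates shoe shoppers (ana, cho, eli) from electronics shoppers (ben, dev, fay). The naive first-k initialization keeps the example deterministic; practical implementations usually use smarter seeding such as k-means++.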
By using a clustering algorithm on this data, the business can find groups, or clusters, of customers who have similar patterns or sequences of clicks. The business can then use these clusters to analyze how users move through the website, to identify which pages are most closely related to the sale of a particular product, and to predict which pages are most likely to be visited next.

2. Assess the reliability of the data mining algorithms. Decide if they can be trusted and predict the errors they are likely to produce.

Data mining algorithms leave room for error and misuse. An algorithm is reliable only if it has gone through sufficient validation testing, and its results must be validated. Not all patterns discovered with data mining algorithms are going to be valid.
A pattern can appear in the data used to build the model without holding in the general population. There are three criteria for evaluating a data mining model: accuracy, reliability, and usefulness. Accuracy measures how well the model correlates an outcome with the attributes in the data that has been provided. Reliability measures how consistently the mining model performs on different sets of data. Usefulness examines various metrics that tell you whether the model provides useful information.
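One common way to check reliability, in the sense of how consistently a model performs on different sets of data, is k-fold cross-validation: the data is split into k parts, and the model is repeatedly trained on all but one part and scored on the part it did not see. The sketch assumes a nearest-centroid classifier (chosen only because it is short) and hypothetical loan-applicant records with two numeric attributes and a default label; none of these names or values come from the paper.

```python
import math
import statistics

# Hypothetical applicants: (late payments, open credit lines) -> defaulted (1) or not (0).
rows = [(0, 1), (5, 8), (1, 2), (6, 7), (0, 2), (4, 9), (1, 1), (5, 6), (3, 5)]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0]


def fit_centroids(train_rows, train_labels):
    """Nearest-centroid classifier: average the attribute values of each class."""
    centroids = {}
    for label in set(train_labels):
        members = [r for r, l in zip(train_rows, train_labels) if l == label]
        centroids[label] = [sum(values) / len(members) for values in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def k_fold_accuracy(rows, labels, k=3):
    """Train on k-1 folds, score on the held-out fold, and repeat for every fold."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(rows), k))  # every k-th record, offset by fold
        train = [i for i in range(len(rows)) if i not in held_out]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


scores = k_fold_accuracy(rows, labels)
print("accuracy per fold:", [round(s, 2) for s in scores])
print(f"mean={statistics.mean(scores):.2f} spread={statistics.pstdev(scores):.2f}")
```

Here the per-fold accuracy is 1.0, 1.0, and about 0.67 because the third fold contains a borderline applicant that the model misclassifies. A large spread between folds is a warning that a pattern may hold in one sample but not in the general population.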
Users of an algorithm can also introduce errors by asking the wrong question, failing to test the reasonableness of the results, ignoring discrepancies in the data, ignoring simple explanations in favor of overly complex models, overgeneralizing from the results, using insufficient or inadequate data, or relying on a single data analysis tool.

3. Analyze privacy concerns raised by the collection of personal data for mining purposes.
   1. Choose and describe three (3) concerns raised by consumers.
   2. Decide if each of these concerns is valid and explain your decision for each.
   3. Describe how each concern is being allayed.

In order to perform data mining, information must be gathered and entered into the system. This information can include private or confidential details that an individual did not release to a third party. The data can also contain identifying information, so individuals who were anonymous before the data was mined may no longer be anonymous afterward.
This can be a problem with regard to privacy. Privacy is the right of individuals to control information about themselves. Data mining raises several valid concerns, which revolve around secondary use of personal information, the handling of misinformation, and granular access to personal information. The data collected could pose potential risks to the privacy of persons or organizations. These risks include, but are not limited to, theft through fraud, actual identification, or incorrect identification that could threaten a person’s life, livelihood, or reputation.
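The re-identification risk described above can be made concrete with k-anonymity, a privacy model that the paper does not mention but that is widely used to reason about this problem. A data set is k-anonymous if every combination of quasi-identifiers (attributes such as ZIP code, birth year, and gender that are not names but can still single a person out) is shared by at least k records. The records, field names, and generalization rules below are hypothetical.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

# Hypothetical "anonymized" purchase records: no names, but quasi-identifiers remain.
records = [
    {"zip": "30301", "birth_year": 1985, "gender": "F", "purchase": "prenatal vitamins"},
    {"zip": "30302", "birth_year": 1987, "gender": "F", "purchase": "unscented lotion"},
    {"zip": "30305", "birth_year": 1972, "gender": "M", "purchase": "motor oil"},
    {"zip": "30309", "birth_year": 1978, "gender": "M", "purchase": "golf balls"},
]


def smallest_group(rows, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group of rows sharing all quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize(row):
    """Coarsen quasi-identifiers: keep a 3-digit ZIP prefix and the birth decade."""
    return {**row, "zip": row["zip"][:3] + "**", "birth_year": row["birth_year"] // 10 * 10}


print("k before generalization:", smallest_group(records))
print("k after generalization:", smallest_group([generalize(r) for r in records]))
```

In the raw data every record is unique (k = 1), so anyone who knows a customer's ZIP code, birth year, and gender can link that customer to a purchase. Coarsening the ZIP code and birth year raises k to 2. Generalization of this kind is one technical control a business can apply voluntarily, although k-anonymity alone does not guarantee privacy, for example when everyone in a group shares the same sensitive value.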
There are documented cases in which people have obtained a person’s address and then physically harmed that person. Pedophiles exploit this type of data, which, as technology advances, leaves individuals vulnerable to all sorts of attacks. Hackers breaking into large company databases have left many individuals victims of identity fraud. Both mandatory and voluntary controls address some of these concerns. There are legal restrictions on the use of information, and action can be taken in the event of misuse, but this is a cumbersome and arduous process, and recovery can take a long time. Sadly, the laws lag behind technology and are insufficient on their own to protect individuals.
The voluntary controls consist of technical, methodological, and policy approaches that limit opportunities for inappropriate access while ensuring sound data and the desired outcome. In some cases the consumer has no choice; we all have to give up certain information to buy homes, vehicles, and other necessities of life. Still, one should be cautious about giving up essential personal information unnecessarily.

4. Provide at least three (3) examples where businesses have used predictive analysis to gain a competitive advantage and evaluate the effectiveness of each business’s strategy.

Blue Cross and Blue Shield System (BCBS) is one organization that is already deriving considerable benefits from predictive analytics.
As an organization that provides healthcare insurance to nearly one in three Americans, BCBS has amassed a huge amount of claims-related data over the years. By applying predictive analytics technologies to its vast trove of claims data, BCBS has been getting better not only at identifying the risk factors that lead to several chronic diseases, but also at identifying individuals who are at heightened risk of getting such diseases.

The Memphis Police Department (MPD) has enhanced its crime-fighting techniques with IBM predictive analytics software and reduced serious crime by more than 30 percent, including a 15 percent reduction in violent crimes since 2006. MPD is now able to evaluate incident patterns throughout the city and forecast criminal "hot spots" to proactively allocate resources and deploy personnel, resulting in improved force effectiveness and increased public safety.

Target used predictive analytics to determine, based on past purchases, whether a woman was likely to be pregnant. Target assigns every customer a Guest ID number, tied to their credit card, name, or email address, that becomes a bucket storing a history of everything they have bought and any demographic information Target has collected from them or bought from other sources. Using that, Target looked at historical buying data for all the women who had signed up for Target baby registries in the past.
Target successfully used this information to send out targeted coupons and improve sales of its maternity and baby products.