Sunday, 30 June 2013

Online Data Entry and Data Mining Services

A data entry job involves transcribing data of a particular type into some other form. It can be done either online or offline. The input may include printed documents such as application forms, survey forms, registration forms, and handwritten documents.

The data entry process is an unavoidable part of the work of any organization; one way or another, every organization needs data entry. The skills required vary with the nature of the job: in some cases data must be entered from hard-copy formats, and in others it is entered directly into a web portal. An online data entry job generally requires the data to be entered into an online database.

In a supermarket, for example, a data associate might be required to enter the goods sold and the new goods received on a particular day in order to keep the stock in order. This also gives the concerned authorities an up-to-date view of the sale particulars of each commodity whenever they need it. In another example, an office account executive might be required to input day-to-day expenses into an online accounting database to keep the accounts in order.
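As a minimal sketch of the supermarket example (the table and column names below are hypothetical, invented purely for illustration), the day's entries might land in a small database like this:

    import sqlite3

    # Hypothetical stock-movement table for the supermarket example.
    conn = sqlite3.connect("store.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS stock_movements (
        entry_date TEXT, item TEXT, qty_sold INTEGER, qty_received INTEGER)""")

    # The data associate keys in one row per item per day.
    conn.execute("INSERT INTO stock_movements VALUES (?, ?, ?, ?)",
                 ("2013-06-30", "rice 5kg", 42, 60))
    conn.commit()

    # The concerned authorities can then pull sale particulars per commodity.
    for row in conn.execute(
            "SELECT item, SUM(qty_sold) FROM stock_movements GROUP BY item"):
        print(row)
    conn.close()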

The aim of the data mining process is to collect information from reliable online sources as per the customer's requirement and convert it into a structured format for further use. The major sources for data mining are internet search engines such as Google, Yahoo, Bing, AOL, and MSN. Many search engines, such as Google and Bing, provide customized results based on the user's activity history. Based on our keyword search, the search engine lists the websites from which we can gather the details we need.

Collecting data from online sources, such as company name, contact person, company profile, contact phone number, and email ID, is typically done for marketing activities. Once the data is gathered from online sources into a structured format, the marketing team can start its promotions by calling or emailing the concerned persons, which may result in new customers. Data mining thus plays a vital role in today's business expansion. By outsourcing data entry and related work, you can also save the cost that would otherwise be incurred on infrastructure and employees.


Source: http://ezinearticles.com/?Online-Data-Entry-and-Data-Mining-Services&id=7713395

Friday, 28 June 2013

Know What the Truth Behind Data Mining Outsourcing Service

We have arrived at what we call the information age, where industries rely on useful data for decision-making, product creation, and other essential business uses. Mining data and converting it into useful information is part of this trend, allowing companies to reach their optimum potential. However, many companies cannot take on even one data mining project because they are simply overwhelmed with other important tasks. This is where data mining outsourcing comes in.

Many definitions have been introduced, but data mining can be simply explained as a process of sorting through large amounts of raw data to extract valuable information needed by industries and enterprises in various fields. In most cases this is done by professionals, professional organizations, and financial analysts, and the field itself has seen considerable growth in the number of sectors and groups entering it.
There are a number of reasons why there is a rapid growth in data mining outsourcing service subscriptions. Some of them are presented below:

A wide range of services

Many companies turn to information mining outsourcing because it covers a wide range of services. These services include, but are not limited to, gathering data from web applications into databases, collecting contact information from different sites, extracting data from websites using software, sorting stories from news sources, and accumulating information on commercial competitors.

Benefits for many industries

Many industries benefit because the process is fast and realistic. The information extracted by data mining outsourcing providers is used for crucial decisions in direct marketing, e-commerce, customer relationship management, health care, scientific tests and other experimental work, telecommunications, financial services, and a whole lot more.

A lot of advantages

Subscribing to data mining outsourcing services offers many benefits, as providers assure customers of services that meet world standards. They strive to offer improved technologies, scalability, sophisticated infrastructure, ample resources, timeliness, competitive cost, safer systems for information security, and increased market coverage.

Outsourcing allows companies to focus on their core business and can improve overall productivity. Not surprisingly, information mining outsourcing has been a first choice of many companies looking to propel their business to higher profits.


Source: http://ezinearticles.com/?Know-What-the-Truth-Behind-Data-Mining-Outsourcing-Service&id=5303589

Tuesday, 25 June 2013

Data Harvesting

E-Publishing


E-publishing is the process of publishing information to be viewed in electronic format or online. Such information is delivered via electronic books or "eBooks", CD-ROM, or over the Internet.

E-publishing services address the needs of the e-publishing industry and help publishers get their products to market quickly, whether in print media, electronic media, or web media. E-publishing is also known as electronic publishing or Internet publishing.

Publishing Services and Projects that can be outsourced:

-PDF creation or conversion

-ebooks development or formatting

-Book to e-book conversion

-eBooks writing, editing, proofing

-eBook download and tracking software development

-ePublishing website development

-Web content management software development

-Turnkey web publishing products

-Multimedia book development

-Book conversion from one medium to another

-Graphic design for online publishing

-Custom development of ePublishing solutions

-e-Book security solutions

-Content syndication solutions

-XML conversion

-Ezine development and management

-Subscription management

-Distribution of digital content

-Print on demand solutions

Advantages of E-Publishing

-Reaches a wider audience

-Integrates multiple sources

-Universally recognized format

-Easy archival and retrieval

-Accurate information extraction

Data Harvesting

Data harvesting is the gathering of data from the Web for directories and databases. It is research carried out on the web to obtain the required data in the required format.

Who uses Data Harvesting?

-Marketing Managers, who need to get a grasp of e-marketing.

-Business leaders, who have to meet new e-targets.

-Financial Directors, who need to cut costs and improve efficiency.

-IT Departments, who want to outsource specialist e-marketing support projects.

Data Harvesting Services that can be outsourced:

-Data Capturing Service From The Web

-Catalog / database management

-Internet research, email mining and customized list making

-Portal management support

-e-Newsletters / e-Clippings

-Secondary Research / Market Intelligence

Digitization

Digitization is the conversion of images, characters, or sounds to digital codes so that the information may be processed or stored by a computer system. Digitizing can be effectively outsourced, allowing you to utilize your resources in the more critical design phases of your projects. The digital technology revolution has given libraries, archives, and cultural institutions the ability to reproduce their assets - including rare, fragile, and uniquely visual items - for virtually universal access, copying, and distribution.



Source: http://ezinearticles.com/?Data-Harvesting&id=1625193

Monday, 24 June 2013

Why Web Scraping Software Won't Help

How do you get a continuous stream of data from target websites without getting stopped? Scraping logic depends upon the HTML sent out by the web server on page requests; if anything changes in that output, it is most likely going to break your scraper setup.

If you are running a website that depends on continuously updated data from other websites, it can be dangerous to rely on software alone.

Some of the challenges you should think about:

1. Webmasters keep changing their websites to be more user friendly and better looking, which in turn breaks the delicate data extraction logic of your scraper (see the sketch after this list).

2. IP address blocking: if you continuously keep scraping a website from your office, your IP is going to get blocked by the "security guards" one day.

3. Websites are increasingly using better ways to send data, such as Ajax and client-side web service calls, making it increasingly harder to scrape data off these websites. Unless you are an expert in programming, you will not be able to get the data out.

4. Think of a situation where your newly set up website has started flourishing and suddenly the dream data feed that you used to get stops. In today's world of abundant alternatives, your users will switch to a service that is still serving them fresh data.
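To make challenge 1 concrete, here is a minimal sketch of the kind of brittle extraction logic a site redesign silently breaks (the URL and CSS class names are hypothetical placeholders, not any real site's markup):

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("http://example.com/products", timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # This works only for the current markup. If the webmaster renames
    # the class in a redesign, find_all() silently returns an empty list
    # and the data feed dries up.
    for item in soup.find_all("div", class_="product-row"):
        name = item.find("span", class_="name")
        price = item.find("span", class_="price")
        if name and price:
            print(name.get_text(strip=True), price.get_text(strip=True))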

Getting over these challenges

Let experts help you - people who have been in this business for a long time, serving clients day in and day out. They run their own servers which exist just to do one job: extract data. IP blocking is no issue for them, as they can switch servers in minutes and get the scraping exercise back on track. Try such a service and you will see what I mean.


Source: http://ezinearticles.com/?Why-Web-Scraping-Software-Wont-Help&id=4550594

Friday, 21 June 2013

New Method of Market Segmentation - Combining Segmentation With Data Mining

Marketers have the ability to get high-fidelity information on their target markets through market segmentation. Market segmentation is the process of categorizing potential customers based on certain variables, such as age, gender, and income. A market segment is a group of customers that will react in the same way to a particular marketing campaign. By gathering this information, marketers can tailor their campaigns to groups of prospects to build stronger relationships with them.

Marketers gather this demographic information through surveys, usually when the customer submits a product rebate or willingly participates in a customer satisfaction survey. For most of the past few decades, market segmentation consisted of differentiating prospects based on very simple variables: income, race, location, etc. While this is definitely important information to have on your target market, modern market segmentation takes into account more integrated information.

Modern segmentation breaks the market into target clusters that take into account not only standard demographics, but also other factors such as population density, psychographics, and buying and spending habits of customers. By focusing on these variables in addition to standard demographics, you can gain deeper insight into customer behavior.

Using standard demographics, you can tailor your marketing pieces to specific groups of people. But by including these more sophisticated variables in your segmentation process, you can achieve a higher degree of "lift", or return on your segmentation efforts.

Segmenting your market on these factors helps you realize your total opportunity and revenue potential. It can enable you to better compete with similar product or service providers and let you know where you stand in the game. It can help you target untapped market opportunities and allow you to better reach and retain customers.

Market segmentation depends on the gathering of high-quality, usable data. Many companies exist to gather and sell massive databases of targeted customer information, as well as providing consultation services to help you make sense of data bought or already owned. The key to the process is determining the best way to split up data.

There are essentially two methods for categorizing customers. Segments can either be determined in advance and then customers are assigned to each segment, or the actual customer data can be analyzed to identify naturally occurring behavioral clusters. Each cluster forms a particular market segment.

The benefit of cluster-based segmentation is that as a market's behavior changes, you can adapt your campaigns to better suit the cluster. The latest techniques blend cluster-based segmentation with deeper customer information acquired via data mining. Data mining uses algorithms to interrogate data within a database, and can produce information such as buying frequency and product types.
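As a minimal sketch of that cluster-based approach (the customer attributes and numbers below are made up purely for illustration), scikit-learn's k-means can surface such naturally occurring groupings:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical customer records: [age, annual income, purchases/year].
    customers = np.array([
        [23, 28000, 12], [25, 31000, 15], [41, 72000, 4],
        [44, 69000, 5], [63, 40000, 2], [60, 38000, 3],
    ])

    # Scale the features so income doesn't dominate, then let the
    # segments emerge from the data rather than being fixed in advance.
    X = StandardScaler().fit_transform(customers)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Each label identifies a naturally occurring market segment.
    for label, row in sorted(zip(labels, customers.tolist())):
        print(label, row)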

This new method of market segmentation, combining segmentation with data mining, provides marketers with high quality information on how their customers shop for and purchase their products or services. By combining standard market segmentation with data mining techniques you can better predict and model the behavior of your segments.


Source: http://ezinearticles.com/?New-Method-of-Market-Segmentation---Combining-Segmentation-With-Data-Mining&id=6890243

Wednesday, 19 June 2013

Usefulness of Web Scraping Services

For any business or organization, surveys and market research play important roles in the strategic decision-making process. Data extraction and web scraping techniques are important tools for finding relevant data and information for your personal or business use. Many companies employ people to copy and paste data manually from web pages. This process is reliable but very costly, as it results in wasted time and effort: the data collected is small compared to the resources and time spent gathering it.

Nowadays, various data mining companies have developed effective web scraping techniques that can crawl over thousands of websites and their pages to harvest particular information. The information extracted is then stored in a CSV file, database, XML file, or any other source in the required format. After the data has been collected and stored, the data mining process can be used to extract the hidden patterns and trends contained in the data. By understanding the correlations and patterns in the data, policies can be formulated, thereby aiding the decision-making process. The information can also be stored for future reference.
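A minimal sketch of that harvest-and-store step (the URL and tag names are hypothetical placeholders for a real target site):

    import csv
    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        requests.get("http://example.com/listings", timeout=10).text,
        "html.parser")

    # Harvest the particular information of interest...
    rows = []
    for card in soup.find_all("div", class_="listing"):
        title = card.find("h2")
        price = card.find("span", class_="price")
        if title and price:
            rows.append({"title": title.get_text(strip=True),
                         "price": price.get_text(strip=True)})

    # ...and store it in the required format (CSV here).
    with open("listings.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(rows)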

The following are some of the common examples of data extraction process:

• Scraping a government portal to extract the names of citizens who are eligible for a given survey
• Scraping competitor websites for feature data and product pricing
• Using web scraping to download videos and images for a stock photography site or for website design

Automated Data Collection
It is important to note that the web scraping process allows a company to monitor website data changes over a given time frame and to collect the data routinely on a regular schedule. Automated data collection techniques are quite important, as they help companies discover customer and market trends. By determining market trends, it is possible to understand customer behavior and predict how the data is likely to change.

The following are some examples of automated data collection (a minimal scheduling sketch follows the list):

• Monitoring price information for particular stocks on an hourly basis
• Collecting mortgage rates from various financial institutions daily
• Checking weather reports on a regular basis as required
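A minimal scheduling sketch (the quote endpoint and its JSON shape are hypothetical placeholders; a production job would use cron or a task scheduler rather than a sleep loop):

    import time
    import requests

    def collect_once():
        # Fetch one observation and append it to a local log.
        data = requests.get("http://example.com/api/quote?symbol=XYZ",
                            timeout=10).json()
        with open("prices.csv", "a") as f:
            f.write(f"{time.time()},{data['price']}\n")

    while True:          # hourly collection, as in the first example
        collect_once()
        time.sleep(3600)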

By using web scraping services it is possible to extract any data that is related to your business. The data can then be downloaded into a spreadsheet or a database to be analyzed and compared. Storing the data in a database or in a required format makes it easier to interpret, to understand the correlations, and to identify the hidden patterns.

Through web scraping it is possible to get quicker and more accurate results, saving many resources in terms of money and time. With data extraction services, it is possible to fetch information about pricing, mailing lists, databases, profile data, and competitor data on a consistent basis. With the emergence of professional data mining companies, outsourcing these services will greatly reduce your costs while assuring you of high-quality service.



Source: http://ezinearticles.com/?Usefulness-of-Web-Scraping-Services&id=7181014

Monday, 17 June 2013

Web Data Extraction Services

Web data extraction from dynamic pages is one of the services that may be acquired through outsourcing. It is possible to siphon information from established websites through the use of data scraping software. The information is applicable in many areas of business. Solutions such as data collection, screen scraping, email extraction, and web data mining services, among others, are available from companies such as Scrappingexpert.com.

Data mining is common as far as the outsourcing business is concerned. Many companies outsource data mining services, and companies dealing in these services can earn a lot of money, especially in the growing business of outsourcing and general internet business. With web data extraction, you can pull data in a structured, organized format, even when the source of the information is unstructured or semi-structured.

In addition, it is possible to pull data that was originally presented in a variety of formats, including PDF, HTML, and text, among others. The web data extraction service therefore provides diversity in the sources of information. Large-scale organizations that receive large amounts of data on a daily basis have used data extraction services to get highly accurate information in an efficient and affordable manner.

Web data extraction services are important when it comes to the collection of data and web-based information on the internet. Data collection services are very important as far as consumer research is concerned. Research is turning out to be a very vital activity among companies today. Companies need to adopt strategies that provide fast and efficient data extraction, organized output formats, and flexibility.

People will prefer software that provides flexibility in application. There is also software that can be customized according to the needs of customers, which plays an important role in fulfilling diverse customer needs. Companies selling such software therefore need to provide features that deliver an excellent customer experience.

It is possible for companies to extract emails and other communications from various sources, as long as they are valid email addresses, and to do so without incurring any duplicates. You can extract emails and messages from a variety of web page formats, including HTML files, text files, and others. Software that can carry out these services quickly, reliably, and with optimal output is in high demand, since it helps businesses quickly gather contacts for the people to be sent email messages.
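A minimal sketch of that kind of extraction (the pattern below is a simplified address matcher for illustration, not a full RFC-compliant one):

    import re

    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def extract_emails(pages):
        # Works uniformly on HTML, plain text, or other formats,
        # and de-duplicates as it goes.
        seen = set()
        for text in pages:
            for match in EMAIL_RE.findall(text):
                seen.add(match.lower())
        return sorted(seen)

    pages = ["<a href='mailto:sales@example.com'>Sales</a>",
             "Contact SALES@example.com or support@example.org"]
    print(extract_emails(pages))
    # ['sales@example.com', 'support@example.org']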

It is also possible to use software to sort through large amounts of data and extract information, an activity termed data mining. This way, the company realizes reduced costs, saves time, and increases return on investment. In this practice, the company may carry out metadata extraction, data scanning, and other tasks as well.


Source: http://ezinearticles.com/?Web-Data-Extraction-Services&id=4733722

Friday, 14 June 2013

Backtesting & Data Mining

Introduction

In this article we'll take a look at two related practices that are widely used by traders: Backtesting and Data Mining. These techniques are powerful and valuable if we use them correctly, but traders often misuse them. Therefore, we'll also explore two common pitfalls of these techniques, known as the multiple hypothesis problem and overfitting, and how to overcome them.

Backtesting

Backtesting is just the process of using historical data to test the performance of some trading strategy. Backtesting generally starts with a strategy that we would like to test, for instance buying GBP/USD when it crosses above the 20-day moving average and selling when it crosses below that average. Now we could test that strategy by watching what the market does going forward, but that would take a long time. This is why we use historical data that is already available.
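As a minimal sketch of such a backtest in Python with pandas (a random walk stands in for real GBP/USD history, and the rule is simplified to long-above / flat-below the 20-day average):

    import numpy as np
    import pandas as pd

    # Synthetic stand-in for historical GBP/USD closes.
    rng = np.random.default_rng(0)
    closes = pd.Series(1.50 + rng.normal(0, 0.004, 500).cumsum())

    def backtest_ma(closes, window=20):
        ma = closes.rolling(window).mean()
        in_market = (closes > ma).shift(1, fill_value=False)  # act next day
        daily = closes.pct_change().fillna(0.0)
        return (1.0 + daily.where(in_market, 0.0)).prod() - 1.0

    print(f"20-day MA strategy return: {backtest_ma(closes):.2%}")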

"But wait, wait!" I hear you say. "Couldn't you cheat or at least be biased because you already know what happened in the past?" That's definitely a concern, so a valid backtest will be one in which we aren't familiar with the historical data. We can accomplish this by choosing random time periods or by choosing many different time periods in which to conduct the test.

Now I can hear another group of you saying, "But all that historical data just sitting there waiting to be analyzed is tempting, isn't it? Maybe there are profound secrets in that data just waiting for geeks like us to discover. Would it be so wrong for us to examine that historical data first, to analyze it and see if we can find patterns hidden within it?" This argument is also valid, but it leads us into an area fraught with danger... the world of Data Mining.

Data Mining

Data Mining involves searching through data in order to locate patterns and find possible correlations between variables. In the example above involving the 20-day moving average strategy, we just came up with that particular indicator out of the blue, but suppose we had no idea what type of strategy we wanted to test? That's when data mining comes in handy. We could search through our historical data on GBP/USD to see how the price behaved after it crossed many different moving averages. We could check price movements against many other types of indicators as well and see which ones correspond to large price movements.

The subject of data mining can be controversial because as I discussed above it seems a bit like cheating or "looking ahead" in the data. Is data mining a valid scientific technique? On the one hand the scientific method says that we're supposed to make a hypothesis first and then test it against our data, but on the other hand it seems appropriate to do some "exploration" of the data first in order to suggest a hypothesis. So which is right? We can look at the steps in the Scientific Method for a clue to the source of the confusion. The process in general looks like this:

Observation (data) >>> Hypothesis >>> Prediction >>> Experiment (data)

Notice that we can deal with data during both the Observation and Experiment stages. So both views are right. We must use data in order to create a sensible hypothesis, but we also test that hypothesis using data. The trick is simply to make sure that the two sets of data are not the same! We must never test our hypothesis using the same set of data that we used to suggest our hypothesis. In other words, if you use data mining in order to come up with strategy ideas, make sure you use a different set of data to backtest those ideas.

Now we'll turn our attention to the main pitfalls of using data mining and backtesting incorrectly. The general problem is known as "over-optimization" and I prefer to break that problem down into two distinct types. These are the multiple hypothesis problem and overfitting. In a sense they are opposite ways of making the same error. The multiple hypothesis problem involves choosing many simple hypotheses while overfitting involves the creation of one very complex hypothesis.

The Multiple Hypothesis Problem

To see how this problem arises, let's go back to our example where we backtested the 20-day moving average strategy. Let's suppose that we backtest the strategy against ten years of historical market data and lo and behold guess what? The results are not very encouraging. However, being rough and tumble traders as we are, we decide not to give up so easily. What about a ten day moving average? That might work out a little better, so let's backtest it! We run another backtest and we find that the results still aren't stellar, but they're a bit better than the 20-day results. We decide to explore a little and run similar tests with 5-day and 30-day moving averages. Finally it occurs to us that we could actually just test every single moving average up to some point and see how they all perform. So we test the 2-day, 3-day, 4-day, and so on, all the way up to the 50-day moving average.

Now certainly some of these averages will perform poorly and others will perform fairly well, but there will have to be one of them which is the absolute best. For instance we may find that the 32-day moving average turned out to be the best performer during this particular ten year period. Does this mean that there is something special about the 32-day average and that we should be confident that it will perform well in the future? Unfortunately many traders assume this to be the case, and they just stop their analysis at this point, thinking that they've discovered something profound. They have fallen into the "Multiple Hypothesis Problem" pitfall.

The problem is that there is nothing at all unusual or significant about the fact that some average turned out to be the best. After all, we tested almost fifty of them against the same data, so we'd expect to find a few good performers, just by chance. It doesn't mean there's anything special about the particular moving average that "won" in this case. The problem arises because we tested multiple hypotheses until we found one that worked, instead of choosing a single hypothesis and testing it.

Here's a good classic analogy. We could come up with a single hypothesis such as "Scott is great at flipping heads on a coin." From that, we could create a prediction that says, "If the hypothesis is true, Scott will be able to flip 10 heads in a row." Then we can perform a simple experiment to test that hypothesis. If I can flip 10 heads in a row it actually doesn't prove the hypothesis. However if I can't accomplish this feat it definitely disproves the hypothesis. As we do repeated experiments which fail to disprove the hypothesis, then our confidence in its truth grows.

That's the right way to do it. However, what if we had come up with 1,000 hypotheses instead of just the one about me being a good coin flipper? We could make the same hypothesis about 1,000 different people...me, Ed, Cindy, Bill, Sam, etc. Ok, now let's test our multiple hypotheses. We ask all 1000 people to flip a coin. There will probably be about 500 who flip heads. Everyone else can go home. Now we ask those 500 people to flip again, and this time about 250 will flip heads. On the third flip about 125 people flip heads, on the fourth about 63 people are left, and on the fifth flip there are about 32. These 32 people are all pretty amazing aren't they? They've all flipped five heads in a row! If we flip five more times and eliminate half the people each time on average, we will end up with 16, then 8, then 4, then 2 and finally one person left who has flipped ten heads in a row. It's Bill! Bill is a "fantabulous" flipper of coins! Or is he?

Well we really don't know, and that's the point. Bill may have won our contest out of pure chance, or he may very well be the best flipper of heads this side of the Andromeda galaxy. By the same token, we don't know if the 32-day moving average from our example above just performed well in our test by pure chance, or if there is really something special about it. But all we've done so far is to find a hypothesis, namely that the 32-day moving average strategy is profitable (or that Bill is a great coin flipper). We haven't actually tested that hypothesis yet.
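A quick simulation quantifies that doubt: each person has a 1/1024 chance of ten straight heads, so among 1,000 people the expected number of winners is 1000/1024 ≈ 0.98, and a "Bill" emerges in roughly 62% of contests by pure chance.

    import random

    def contest(n_people=1000, flips=10):
        # Does anyone in this contest flip ten heads in a row?
        return any(all(random.random() < 0.5 for _ in range(flips))
                   for _ in range(n_people))

    trials = 500
    hits = sum(contest() for _ in range(trials))
    print(f"contests with a ten-heads winner: {hits / trials:.0%}")  # ~62%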

So now that we understand that we haven't really discovered anything significant yet about the 32-day moving average or about Bill's ability to flip coins, the natural question to ask is what should we do next? As I mentioned above, many traders never realize that there is a next step required at all. Well, in the case of Bill you'd probably ask, "Aha, but can he flip ten heads in a row again?" In the case of the 32-day moving average, we'd want to test it again, but certainly not against the same data sample that we used to choose that hypothesis. We would choose another ten-year period and see if the strategy worked just as well. We could continue to do this experiment as many times as we wanted until our supply of new ten-year periods ran out. We refer to this as "out of sample testing", and it's the way to avoid this pitfall. There are various methods of such testing, one of which is "cross validation", but we won't get into that much detail here.
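Here is a sketch of that discipline, continuing the toy moving-average setup from the backtest section (synthetic data again stands in for real prices): mine one period for the best window, then judge it only on fresh data.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    closes = pd.Series(1.50 + rng.normal(0, 0.004, 500).cumsum())

    def backtest_ma(closes, window):
        ma = closes.rolling(window).mean()
        in_market = (closes > ma).shift(1, fill_value=False)
        daily = closes.pct_change().fillna(0.0)
        return (1.0 + daily.where(in_market, 0.0)).prod() - 1.0

    # Mining step: try every window from 2 to 50 days on the first half.
    in_sample, out_sample = closes.iloc[:250], closes.iloc[250:]
    best = max(range(2, 51), key=lambda w: backtest_ma(in_sample, w))

    # Out-of-sample step: only this second number tests the hypothesis.
    print(f"best window in-sample: {best}-day,",
          f"out-of-sample return: {backtest_ma(out_sample, best):.2%}")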

Overfitting

Overfitting is really a kind of reversal of the above problem. In the multiple hypothesis example above, we looked at many simple hypotheses and picked the one that performed best in the past. In overfitting we first look at the past and then construct a single complex hypothesis that fits well with what happened. For example if I look at the USD/JPY rate over the past 10 days, I might see that the daily closes did this:

up, up, down, up, up, up, down, down, down, up.

Got it? See the pattern? Yeah, neither do I actually. But if I wanted to use this data to suggest a hypothesis, I might come up with...

My amazing hypothesis:

If the closing price goes up twice in a row then down for one day, or if it goes down for three days in a row we should buy,

but if the closing price goes up three days in a row we should sell,

but if it goes up three days in a row and then down three days in a row we should buy.

Huh? Sounds like a whacky hypothesis right? But if we had used this strategy over the past 10 days, we would have been right on every single trade we made! The "overfitter" uses backtesting and data mining differently than the "multiple hypothesis makers" do. The "overfitter" doesn't come up with 400 different strategies to backtest. No way! The "overfitter" uses data mining tools to figure out just one strategy, no matter how complex, that would have had the best performance over the backtesting period. Will it work in the future?

Not likely, but we could always keep tweaking the model and testing the strategy in different samples (out of sample testing again) to see if our performance improves. When we stop getting performance improvements and the only thing that's rising is the complexity of our model, then we know we've crossed the line into overfitting.
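As a toy check of the claim that the whacky rules above called every move in that ten-day sample (under one literal reading, where "up twice in a row" means exactly twice):

    def signal(h):
        # h is the history of moves so far, e.g. "UUD".
        if h.endswith("DDD"):                             # down three -> buy
            return "buy"                                  # (covers UUU-then-DDD too)
        if h.endswith("UUD") and not h.endswith("UUUD"):  # up exactly twice,
            return "buy"                                  # then down one day
        if h.endswith("UUU"):                             # up three -> sell
            return "sell"
        return None

    moves = "UUDUUUDDDU"                 # the ten closes from the text
    for i in range(3, len(moves)):
        s = signal(moves[:i])
        if s:
            hit = moves[i] == ("U" if s == "buy" else "D")
            print(f"day {i + 1}: {s} -> {'right' if hit else 'wrong'}")
    # Three signals fire (days 4, 7, and 10) and all three come out
    # "right" - on this sample, and on this sample only.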



Source: http://ezinearticles.com/?Backtesting-and-Data-Mining&id=341468

Data Mining Questions? Some Back-Of-The-Envelope Answers

Data mining, the discovery and modeling of hidden patterns in large volumes of data, is becoming a mainstream technology. And yet, for many, the prospect of initiating a data mining (DM) project remains daunting. Chief among the concerns of those considering DM is, "How do I know if data mining is right for my organization?"

A meaningful response to this concern hinges on three underlying questions:

    Economics - Do you have a pressing business/economic need, a "pain" that needs to be addressed immediately?
    Data - Do you have, or can you acquire, sufficient data that are relevant to the business need?
    Performance - Do you need only a moderate gain in business performance, compared to current practice?

By the time you finish reading this article, you will be able to answer these questions for yourself on the back of an envelope. If all answers are yes, data mining is a good fit for your business need. Any no answers indicate areas to focus on before proceeding with DM.

In the following sections, we'll consider each of the above questions in the context of a sales and marketing case study. Since DM applies to a wide spectrum of industries, we will also generalize each of the solution principles.

To begin, suppose that Donna is the VP of Marketing for a trade organization. She is responsible for several trade shows and a large annual meeting. Attendance was good for many years, and she and her staff focused their efforts on creating an excellent meeting experience (program plus venue). Recently, however, there has been declining response to promotions, and a simultaneous decline in attendance. Is data mining right for Donna and her organization?

Economics - Begin with economics - Is there a pressing business need? Donna knows that meeting attendance was down 15% this year. If that trend continues for two more years, turnout will be only about 60% of its previous level (85% x 85% x 85%), and she knows that the annual meeting is not sustainable at that level. It is critical, then, to improve the attendance, but to do so profitably. Yes, Donna has an economic need.

Generally speaking, data mining can address a wide variety of business "pains". If your company is experiencing rapid growth, DM can identify promising new retail locations or find more prospects for your online service. Conversely, if your organization is facing declining sales, DM can improve retention or identify your best existing customers for cross-selling and upselling. It is not advisable, however, to start a data mining effort without explicitly identifying a critical business need. Vast sums have been spent wastefully on mining data for "nuggets" of knowledge that have little or no value to the enterprise.

Data - Next, consider your data assets - Are sufficient, relevant data available? Donna has a spreadsheet that captures several years of meeting registrations (who attended). She also maintains a promotion history (who was sent a meeting invitation) in a simple database. So, information is available about the stimulus (sending invitations) and the response (did/did not attend). This data is clearly relevant to understanding and improving future attendance.

Donna's multi-year registration spreadsheet contains about 10,000 names. The promotion history database is even larger because many invitations are sent for each meeting, both to prior attendees and to prospects who have never attended. Sounds like plenty of data, but to be sure, it is useful to think about the factors that might be predictive of future attendance. Donna consults her intuitive knowledge of the meeting participants and lists four key factors:

    attended previously
    age
    size of company
    industry

To get a reasonable estimate for the amount of data required, we can use the following rule of thumb, developed from many years of experience:

Number of records needed ≥ 60 x 2^N (where N is the number of factors)

Since Donna listed 4 key factors, the above formula estimates that she needs 960 records (60 x 2^4 = 60 x 16). Since she has more than 10,000, we conclude Yes, Donna has relevant and sufficient data for DM.
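That rule of thumb fits on an actual envelope, or in two lines of code:

    def records_needed(n_factors):
        return 60 * 2 ** n_factors   # rule of thumb: 60 x 2^N

    print(records_needed(4))         # Donna's four factors -> 960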

More generally, in considering your own situation, it is important to have data that represents:

    stimulus and response (what was done and what happened)
    positive and negative outcomes

Simply put, you need data on both what works and what doesn't.

Performance - Finally, performance - Is a moderate improvement required relative to current benchmarks? Donna would like to increase attendance back to its previous level without increasing her promotion costs. She determines that the response rate to promotions needs to increase from 2% to 2.5% to meet her goals. In data mining terms, a moderate improvement is generally in the range of 10% to 100%. Donna's need is in this interval, at 25%. For her, Yes, a moderate performance increase is needed.

The performance question is typically the hardest one to address prior to starting a project. Performance is an outcome of the data mining effort, not a precursor to it. There are no guarantees, but we can use past experience as a guide. As noted for Donna above, incremental-to-moderate improvements are reasonable to expect with data mining. But don't expect DM to produce a miracle.

Conclusion

Summarizing, to determine if data mining fits your organization, you must consider:

    your business need
    your available data assets
    the performance improvement required

In the case study, Donna answered yes to each of the questions posed. She is well-positioned to proceed with a data mining project. You, too, can apply the same thought process before you spend a single dollar on DM. If you decide there is a fit, this preparation will serve you well in talking with your staff, vendors, and consultants who can help you move a data mining project forward.



Source: http://ezinearticles.com/?Data-Mining-Questions?-Some-Back-Of-The-Envelope-Answers&id=6047713

Thursday, 13 June 2013

Is Web Scraping Relevant in Today's Business World?

Different techniques and processes have been created and developed over time to collect and analyze data. Web scraping is one of the processes that has hit the business market recently. It is a great process that offers businesses vast amounts of data from different sources such as websites and databases.

It is good to clear the air and let people know that data scraping is a legal process. The main reason is that the information is already publicly available on the internet; it is not a process of stealing information but a process of collecting reliable information. Some people have nevertheless regarded the technique as unsavory behavior, their main argument being that over time the web will be flooded with copied content, blurring the line with plagiarism.

We can therefore simply define web scraping as a process of collecting data from a wide variety of websites and databases. The process can be carried out either manually or by the use of software. The rise of data mining companies has led to more use of web extraction and web crawling processes; other main functions of such companies are processing and analyzing the harvested data. Importantly, these companies employ experts who know the viable keywords, the kind of information that can produce usable statistics, and the pages that are worth the effort. The role of data mining companies is therefore not limited to mining data; they also help their clients identify relationships and build models.

Some of the common methods of web scraping include web crawling, text grepping, DOM parsing, and expression matching. The latter can be achieved through parsers, HTML pages, or even semantic annotation. There are many different ways of scraping data, but they all work toward the same goal: to retrieve and compile the data contained in databases and websites. This is a necessary process for a business that wants to remain relevant in the business world.

The main questions asked about web scraping touch on relevance. Is the process relevant in the business world? The answer is yes. The fact that it is employed by large companies around the world, and has delivered many rewards, says it all. It is worth noting that while some people regard this technology as a plagiarism tool, others consider it a useful tool for harvesting the data required for business success.

Using the web scraping process to extract data from the internet for competition analysis is highly recommended. If this is your aim, then you must be sure to spot any pattern or trend that can work in a given market.



Source: http://ezinearticles.com/?Is-Web-Scraping-Relevant-in-Todays-Business-World?&id=7091414

Tuesday, 11 June 2013

All About PC Data Recovery

PC data recovery is the procedure of retrieving data from databases or storage systems. You can recover data from floppy discs, DVDs, hard drives, CDs, memory cards, etc. It helps you recover corrupt or lost data in a professional, secure, and fast manner. For businesses and IT companies, data recovery is important for preserving data appropriately. The experts at computer repair Sydney discuss some tips for it.

Maybe you are under a lot of mental stress and worried about how to retrieve the lost data as quickly as possible. The time for preventing your data from getting corrupted or lost has passed - the problem at hand is PC data recovery.

Firstly, you could get hold of your tech-savvy relatives or friends; if you are lucky, they will help you out, and if you are really fortunate they might even have data recovery software. If you are not lucky, however, you will have to get your wallet out, because data recovery can be an expensive affair. Also, prepare yourself for a mundane and time-consuming task. Try to identify the problem with your hard disc: either your computer fails to boot up or, if it boots up, it does not show other drives. Listen carefully to your hard drive; if it makes noises like ticking, scratching, or scraping, then you have to take it to a PC data recovery center where experts can solve your problem. As these services are time-consuming and expensive, you have to decide the worth of the data stored on the hard disc:

If it is only a set of downloaded music or a few games, then you can write it off and accept the data loss.

On the other hand, if it is important information, such as a product or book that you have been working on for years, then you should take your system to a data recovery center for an evaluation - this generally costs nothing.

If the hard drive is physically sound, you have a decent chance of retrieving the data yourself. Firstly, you have to download software that will help in recovering data. Unfortunately, reputable software is expensive; the good news is that many companies allow you to use it on a trial basis. There are some freeware versions, but they are not easy to use. The further procedure depends on the hard drive setup:

    If your system has a single hard drive that is not partitioned, then you have to attach the hard drive to another system that has ample space to store all the lost data. This is technical, so if you do not have any technical knowledge, get a computer-savvy relative or friend to help you with PC data recovery.
    If your computer system has a multiple-drive setup and it boots up fine, then all you have to do is download the software and read the files.




Source: http://ezinearticles.com/?All-About-PC-Data-Recovery&id=3240328

Friday, 7 June 2013

Groupon Is a Straight-Up Ponzi Scheme

I would love to be wrong about this. Especially given the fallout in the tech economy if Groupon blows up. But isn’t it really pretty obvious that Groupon is a massive Ponzi scheme?

Let me first say that Groupon filing to go public is not proof of a tech bubble. There is no tech bubble, just a micro bubble here or there. Nor is the Groupon story even particularly interesting or important compared to what’s happening in Europe right now. But since it has filed for IPO and since all of us in the tech economy now must spend the next years hearing the breathless gossip, IPO hysteria, and requisite recriminations over the inevitable implosion — let us briefly examine the tulip mania that is Groupon.

Why is Groupon not merely a tech-bubble datum but a Ponzi scheme? Simple: Groupon has found that you can get local merchants to try anything once if it brings them new customers. A few local merchants in Chicago get them started, and Groupon shows good revenues. In fact, Groupon immediately remits half of those “revenues” back to the local merchant — they were never Groupon revenues in any meaningful sense of the word. But, optically, Groupon revenues look high — which they use to raise a financing round at a high valuation. Then they use the proceeds to hire vast armies of salespeople to dig deeper into Chicago’s local merchant community and repeat the trick in other cities.

Meanwhile, many early-adopting merchants find that the burst in customers immediately disappears, and since they can’t perpetually discount 75%, those merchants stop using Groupon. But Groupon’s sales force adds many more new merchants than it loses (for now). And Groupon goes out and raises another round at an even higher valuation; they hire even more salespeople and expand into even more virgin territory. Lather, rinse, repeat.

The model is only sustainable if it pays off for local merchants — and to justify Groupon’s current size, it now must pay off for local merchants ubiquitously and flamboyantly. If not, Groupon is mostly a Ponzi scheme.

Groupon argues that it helps merchants attract new customers who become loyal patrons, and that pays for the expense of winning them via Groupon. This is the fundamental argument Groupon’s sales force uses to close local merchants. Let’s get past the sales speak to what this really means. The typical Groupon “deal” is 50 percent off retail, with half of the proceeds going to Groupon. So the merchant gets 25 percent of the revenue s/he would have received if the same number of customers had arrived via walk-in traffic. Except that all that Groupon revenue is unprofitable — so more and more Groupon revenue is actually bad.

The vast majority of local merchants can’t discount more than 10 percent. Some can go maybe 25 percent in special situations. But 75 percent is a wholly unsustainable number. If all local merchants begin using Groupon then it can’t send loyal customers to anyone; Groupon can only send discount chasers to merchants. Which means that as Groupon grows, both local merchants and their competitors will find that Groupon’s main argument no longer works (if it ever did) — Groupon simply can’t send them loyal new business. So they all stop using Groupon in its current form.

Perhaps Groupon management thinks it is creating a sustainable Prisoner’s Dilemma, one that ultimately destroys value for the local merchant ecosystem but benefits Groupon. In other words, Groupon could grow so big that local merchants have to use it, even though it ultimately hurts them. In game theory terms, Groupon creates an equilibrium point at “All Local Merchants Defect,” and then, having forced merchants into this value-destroying equilibrium, takes a cut for having rigged the game. Obviously, Groupon couldn’t share this thinking publicly. They would just continue to use the attract-loyal-new-customers argument even though it no longer makes any sense for a ginormous Groupon.

This may sound cynical. But if this is Groupon’s game plan, it isn’t cynical. It’s naïve. Most local merchants simply don’t have enough value in their collective ecosystem to share anything remotely like this much value with Groupon. This isn’t a stable equilibrium, it’s a suicidal one. The local merchants will have to stop using Groupon en masse not long after they first start experimenting with it.

Due to its size, Wal-Mart can squeeze its suppliers on price and its suppliers will comply. Lower prices create value for Wal-Mart’s customers. But it’s sustainable only because it also creates value for Wal-Mart’s suppliers who are large enough that they can find efficiencies in their manufacturing processes (generally by outsourcing manufacturing to low-wage economies like China). That’s bad for American workers. But it’s value-creating for Wal-Mart suppliers because they get to sell stuff through Wal-Mart (which means they can sell more of it) at margins that are acceptable due to reduced manufacturing costs.

But most Groupon local merchants are nothing whatsoever like Wal-Mart’s suppliers. They generally have no margin to spare or wiggle room in their operating costs. Therefore, they cannot continue using Groupon.

Let’s consider the exceptions because there are some. A local merchant with huge gross margins — 70 to 90 percent — can use Groupon sustainably (though it still isn’t clear that they should). Or, a large local merchant who does a lot of expensive customer acquisition (i.e., local television) can use Groupon sustainably but only if Groupon is better than its traditional customer acquisition methods (doing both and doubling customer acquisition costs will not double the local market size).

This is why Groupon must ultimately implode — there just aren't that many businesses that fit either of these descriptions.

Groupon’s management publicly avers that “local merchants come back” — well, sure, some of them do. For a while. But what do the audited numbers look like? Just what percentage of local merchants come back? How many times? Do local merchants show a strong tendency to decline in participation over time?

Groupon management won’t release these numbers, and certainly won’t release thoroughly audited and vetted versions of these numbers. Instead, what Groupon management is doing is withdrawing an astonishing amount of cash out of the company. It’s also creating a new class of B shares so that it can keep control of the company in the hands of management — all the better to keep the Ponzi scheme going for as long as possible.

Again, there isn’t a bubble in tech. High valuations for many of the big tech companies like LinkedIn, Facebook, and Twitter make sense due to those companies’ incredible network effects and the fact that, fundamentally, these companies are creating value and will get better over time at monetizing that value. Net-net, Groupon is unsustainably destroying value and will implode sometime in the next five years. When that happens, it will almost certainly, and totally unfairly, wreak havoc throughout the tech ecosystem.


Source: http://www.knewton.com/blog/knewton/2011/06/03/groupon-is-a-straight-up-ponzi-scheme/

Wednesday, 5 June 2013

Using Charts For Effective Data Mining

The modern world is one where data is gathered voraciously. Modern computers, with all their advanced hardware and software, are bringing all of this data to our fingertips. In fact, one survey says that the amount of data gathered doubles every year. That is quite some data to understand and analyze, and doing so takes a lot of time, effort, and money. That is where advancements in the field of data mining have proven so useful.

Data mining is basically a process of identifying underlying patterns and relationships among sets of data that are not apparent at first glance. It is a method by which large and unorganized amounts of data are analyzed to find underlying connections which might give the analyzer useful insight into the data being analyzed.

Its uses are varied. In marketing, it can be used to reach a product to particular customers. For example, suppose a supermarket, while mining through its records, notices customers preferring to buy a particular brand of a particular product. The supermarket can then promote that product even further through discounts, promotional offers, and the like. A medical researcher analyzing DNA strands can, and will have to, use data mining to find relationships existing among the strands. Apart from bioinformatics, data mining has found applications in several other fields like genetics, pure medicine, and engineering, and even in education.

The Internet is also a domain where mining is used extensively. The World Wide Web is a mine of information that needs to be sorted, grouped, and analyzed, and data mining is used extensively here. For example, one of the most important aspects of the net is search. Every day, several million people search for information over the World Wide Web. If each search query is stored, extensively large amounts of data are generated. Mining can then be used to analyze all of this data and help return better and more direct search results, which leads to better usability of the Internet.

Data mining requires advanced techniques to implement. Statistical models, mathematical algorithms, or the more modern machine learning methods may be used to sift through tons and tons of data in order to make sense of it all.

Foremost among these is the method of charting, in which data is plotted in the form of charts and graphs. Data visualization, as it is often called, is a tried and tested technique of data mining. If visually depicted, data easily reveals relationships that would otherwise be hidden. Bar charts, pie charts, line charts, scatter plots, bubble charts, etc. provide simple, easy techniques for data mining.
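As a minimal charting sketch with made-up data (two hypothetical customer groups), a scatter plot reveals a grouping that a raw table would hide:

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng(0)
    # Two made-up customer groups: frequent small baskets vs. rare large ones.
    visits = np.concatenate([rng.normal(12, 2, 50), rng.normal(4, 1, 50)])
    basket = np.concatenate([rng.normal(25, 8, 50), rng.normal(80, 15, 50)])

    plt.scatter(visits, basket, alpha=0.6)
    plt.xlabel("Visits per month")
    plt.ylabel("Average basket size ($)")
    plt.title("Two customer groups emerge at a glance")
    plt.show()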

Thus a clear, simple truth emerges: in today's world of heavy data loads, mining is necessary, and charts and graphs are among the surest methods of doing it. If current trends are anything to go by, the importance of data mining cannot be overstated in the near future.



Source: http://ezinearticles.com/?Using-Charts-For-Effective-Data-Mining&id=2644996

Monday, 3 June 2013

Data Extraction Services - A Helpful Hand For Large Organization

Data extraction is the process of extracting and structuring data from unstructured and semi-structured electronic documents, as found on the web and in various data warehouses. It is extremely useful for huge organizations that deal with considerable amounts of data daily, which must be transformed into significant information and stored for later use.

Your company may have tons of data that is difficult to control and convert into useful information. Without the right information at the right time, and working from half-accurate information, decision makers within a company waste time making wrong strategic decisions. In the highly competitive world of business, essential statistics such as customer information, competitors' operational figures, and internal sales figures play a big role in strategic decision-making. Data extraction can help you take the strategic business decisions that shape your business's goals.

Outsourcing companies provide services custom made to the client's requirements. A few of the areas where data extraction can be used are generating sales leads, extracting and harvesting product pricing data, capturing financial data, acquiring real estate data, conducting market research, surveys and analysis, conducting product research and analysis, and duplicating an online database.

The different types of Data Extraction Services:

    Database Extraction:
    Organized data from multiple databases, such as statistics about competitors' products, pricing, latest offers, and customer opinions and reviews, can be extracted and stored as per the requirements of the company.
    Web Data Extraction:
    Web data extraction, also known as web scraping, usually refers to the practice of extracting or reading text data from a targeted website.

Businesses have now realized the huge benefits they can get by outsourcing their services, making outsourcing a profitable option. Since all projects are custom based to suit the exact needs of the customer, huge savings in terms of time, money, and infrastructure are among the many advantages that outsourcing brings.

Advantages of Outsourcing Data Extraction Services:

    Improved technology scalability
    Skilled and qualified technical staff who are proficient in English
    Advanced infrastructure resources
    Quick turnaround time
    Cost-effective prices
    Secure Network systems to ensure data safety
    Increased market coverage

By outsourcing, you can definitely increase your competitive advantages. Outsourcing of services helps businesses to manage their data effectively, which in turn would enable them to experience an increase in profits.


Source: http://ezinearticles.com/?Data-Extraction-Services---A-Helpful-Hand-For-Large-Organization&id=2477589