Thursday, 14 November 2013

What is data scraping and how can I stop it?

Data scraping (also called web scraping) is the process of extracting information from websites. Data scraping focuses on transforming unstructured website content (usually HTML) into structured data which can be stored in a database or spreadsheet.

The way data is scraped from a website is similar to that used by search bots – human web browsing is simulated by using programs (bots) which extract (scrape) the data from a website.

Unfortunately, there is no efficient way to fully protect your website from data scraping. This is so because data scraping programs (also called data scrapers or web scrapers) obtain the same information as your regular web visitors.

Even if you block the IP address of a data scraper, this will not prevent it from accessing your website. Most data scraping bots use large IP address pools and automatically switch the IP address in case one IP gets blocked. And if you block too many IPs, you will most probably block many of your legitimate visitors.

One of the best ways to protect globally accessible data on a website is through copyright protection. This way you can legally protect the intellectual ownership of your website content.

Another way to protect your site content is to password protect it. This way your website data will be available only to people who can authenticate with the correct username and password.


Source: http://kb.siteground.com/what_is_data_scraping_and_how_can_i_stop_it/

No comments:

Post a Comment