Tuesday 11 June 2013

Web Screen Scraping, Screen Scraping, Website Scraping, Web Page Scraping, Web Scraper

Web scraping is a technique for extracting information from websites. It relies on software programs that simulate a person browsing the internet to gather information. A human user would enter the URL, request the web page, copy the information and paste it elsewhere. In the same way, a scraping program or script establishes a connection with the server and requests the web page, and the server responds with the requested page. The script then captures the data and stores it in a structured form.
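The request step described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the method of any particular scraping tool; the function name fetch_page and the User-Agent string are assumptions made up for the example.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10.0):
    """Request a page much as a browser would and return its body as text."""
    # Hypothetical User-Agent value, chosen only for this illustration.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (needs network access; example.com is a reserved demo domain):
# html = fetch_page("https://example.com/")
```

The returned string is still raw HTML at this point; turning it into structured data is a separate step.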

Web scraping is implemented either by issuing HTTP requests directly or by embedding a web browser. The aim is to capture unstructured data from the target websites and convert it into structured data that can be stored and maintained in a database for future use. With the growing use of the internet for daily activities such as weather monitoring, information gathering and price comparison, web scraping has become a necessity.

Most of the data on websites is published as HTML, which is laid out for human readers but can still be parsed by programs. The process of extracting data from HTML web pages is called web screen scraping. Screen scraping uses software programs or scripts that read the data from a terminal port or the screen rather than from the database. This makes it possible to extract data that is only presented in human-readable form.
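As a rough sketch of what reading data out of HTML can look like, the example below uses Python's built-in html.parser module to collect the cells of a table. The class name TableScraper, the helper scrape_table and the sample page are all invented for illustration; real pages are usually messier and often call for a more forgiving parser.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table cells in html as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page, standing in for HTML fetched from a website.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Temp (C)</th></tr>
  <tr><td>Oslo</td><td>12</td></tr>
  <tr><td>Cairo</td><td>34</td></tr>
</table>
"""
```

Calling scrape_table(SAMPLE_HTML) gives one list per table row, which is already much closer to structured data than the raw markup.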

Website scraping makes it possible to extract information from the various websites where it is stored, reaching even the hidden ones. Web page scraping involves collecting information from target websites and saving the data in a new database so that it can be filtered and sorted easily. Web scrapers are designed to gather the required information, convert the unstructured data into a structured format, and save the assembled data for future use. The output can be saved in a database, a spreadsheet, a text file or any other required format.
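To illustrate the idea of converting scraped strings into structured records and saving them in a database for filtering and sorting, here is a small sketch using Python's built-in sqlite3 module. The product data, the table name products and the helper functions are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical raw strings, as a scraper might capture them from a page.
RAW_ROWS = [
    ("  Widget A ", "$19.99"),
    ("Widget B", "$5.00"),
    ("Widget C ", "$12.50"),
]


def clean(raw_rows):
    """Turn raw scraped strings into typed (name, price) records."""
    return [(name.strip(), float(price.lstrip("$"))) for name, price in raw_rows]


def store(records, connection):
    """Save the records, replacing any earlier row with the same name."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS products (name TEXT PRIMARY KEY, price REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO products (name, price) VALUES (?, ?)", records
    )
    connection.commit()


def cheaper_than(connection, limit):
    """Filter and sort the stored data with an ordinary SQL query."""
    cursor = connection.execute(
        "SELECT name, price FROM products WHERE price < ? ORDER BY price",
        (limit,),
    )
    return cursor.fetchall()
```

With an in-memory database opened via sqlite3.connect(":memory:"), calling store(clean(RAW_ROWS), connection) followed by cheaper_than(connection, 15) returns only the products under that price, cheapest first.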

The major advantages of using web scraping tools are accuracy and efficiency. Searching for information, gathering the data, and copying and pasting it by hand takes a lot of time and makes the job boring and tiresome. Web scrapers complete the task in far less time and make the whole process easier. Manual work is also prone to errors, while web page scraping tools provide great accuracy. These tools can also retrieve many types of data and images, for example text, Word, PDF, JPEG or GIF files, from websites built with different technologies such as PHP, HTML, JSP, ASP, JavaScript and AJAX. The scraped data can also be converted into a desired format such as XML, CSV or Excel, or loaded into databases such as MS Access, MS SQL and MySQL.
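The format conversion mentioned above might look like the following sketch, which writes the same records as CSV and as XML with Python's standard csv and xml.etree.ElementTree modules. The records, field names and element names are assumptions made up for the example, not the output format of any specific tool.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records, as they might look after scraping and cleaning.
RECORDS = [
    {"name": "Widget A", "price": "19.99"},
    {"name": "Widget B", "price": "5.00"},
]


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records):
    """Serialize the records as one <product> element per record."""
    root = ET.Element("products")
    for record in records:
        item = ET.SubElement(root, "product")
        for key, value in record.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")
```

Both functions return plain strings, so the result can be written to a file or handed to another program without changing the scraping code.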

With web scraping tools available, gathering information is no longer time consuming. There is no need to spend hours on such a simple task. The scrapers do the work for you.


Source: http://www.articlesnatch.com/Article/Web-Screen-Scraping--Screen-Scraping--Website-Scraping--Web-Page-Scraping--Web-Scraper/923102#.Ubgpa9gY_Dc
