Overview:
All screen scraping starts with a manual review of the page you want to extract
resources from. With AJAX pages you usually need to analyze a bit more than just
the HTML.
AJAX here simply means that the value you want is not in the initial HTML document
you requested. Instead, JavaScript is executed in the browser and asks the server
for the extra information.
You can therefore usually analyze the JavaScript, see which request it makes, and
call that URL directly from the start.
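As a rough sketch of that analysis step, the snippet below scans a script's source text for open() calls and lists the method and URL each one requests. The helper name findAjaxRequests and the sample line are hypothetical, and the regular expression only catches URLs written as plain string literals, so treat it as an aid to manual review rather than a complete parser.

```javascript
// Hypothetical helper: list the (method, url) pairs passed to open() calls
// in a piece of JavaScript source. Only string-literal arguments are matched.
function findAjaxRequests(scriptSource) {
  const pattern = /\.open\(\s*["'](\w+)["']\s*,\s*["']([^"']+)["']/g;
  const requests = [];
  for (const match of scriptSource.matchAll(pattern)) {
    requests.push({ method: match[1].toUpperCase(), url: match[2] });
  }
  return requests;
}

// Made-up sample line, used only for illustration.
const sample = 'xhr.open("GET", "/api/items", true);';
console.log(findAjaxRequests(sample));
```

URLs that are built dynamically (for example by string concatenation) will not be found this way. In that case, watching the requests in the browser's network panel is usually the easier route.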
Example:
Assume the page you want to scrape contains the following script:
<script type="text/javascript">
function ajaxFunction()
{
  var xmlHttp;
  try
  {
    // Firefox, Opera 8.0+, Safari
    xmlHttp=new XMLHttpRequest();
  }
  catch (e)
  {
    // Internet Explorer
    try
    {
      xmlHttp=new ActiveXObject("Msxml2.XMLHTTP");
    }
    catch (e)
    {
      try
      {
        xmlHttp=new ActiveXObject("Microsoft.XMLHTTP");
      }
      catch (e)
      {
        alert("Your browser does not support AJAX!");
        return false;
      }
    }
  }
  xmlHttp.onreadystatechange=function()
  {
    if(xmlHttp.readyState==4)
    {
      document.myForm.time.value=xmlHttp.responseText;
    }
  }
  xmlHttp.open("GET","time.asp",true);
  xmlHttp.send(null);
}
</script>
Then all you need to do is make an HTTP request to time.asp on the same server
yourself. The response body is the value that the script would have written into
the form's time field. The example is from w3schools.
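A minimal sketch of that direct request might look like the following. It assumes a runtime with a global fetch (for example Node 18 or later). The function names and the www.example.com page address are placeholders, since the original does not say which server hosts the page.

```javascript
// Resolve the script's relative URL against the address of the page it came from.
function resolveAjaxUrl(pageUrl, ajaxPath) {
  return new URL(ajaxPath, pageUrl).href;
}

// Request the AJAX endpoint directly and return the response body as text.
// Assumes a global fetch is available (e.g. Node 18+ or a browser).
async function fetchAjaxValue(pageUrl, ajaxPath) {
  const response = await fetch(resolveAjaxUrl(pageUrl, ajaxPath));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (needs network access; the page address is a placeholder):
// fetchAjaxValue("http://www.example.com/ajax/page.html", "time.asp")
//   .then((value) => console.log(value));
```

Resolving against the page URL matters because the script requests "time.asp" as a relative path, so the real endpoint depends on the directory the page was served from.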
Source: http://stackoverflow.com/questions/260540/how-do-you-scrape-ajax-pages